CURRICULUM VITAE

Development of methods and algorithms for Digital Signal Processing (feature extraction, HMM creation and training, situational and associative analysis) and Human-Computer Interaction (voice operated control systems, dialogue systems, multimodal systems).
Creation, debugging and testing of software for speech communication between a human and a computer (aircraft emulator, technological process, telecommunication equipment, Web-portals with speech interface, etc.).
Scientific Awards:
- Medal of the Russian Academy of Sciences for young Russian scientists for the best research work in the field of computer science, engineering and automation in 2011;
- Grant of the President of Russia for young PhDs, 2012-2013;
- Grant of the President of Russia for young PhDs, 2010-2011;
- Award "Distinguished PhD of the Russian Academy of Sciences", 2008-2009;
- Winner of the competition for personal grants for young PhDs from the Administration of St. Petersburg, 2008-2010;
- Winner of the project competition in the nomination "IT-technologies" organized by the St. Petersburg Scientific Center of RAS, 2009;
- Award "Distinguished PhD student of the Russian Academy of Sciences", 2005-2006;
- Grand Prix at the Loco Mummy Contest 2006 in the nomination "Best PC Multimodal User Interface Software", 2006 (Belgium).
- Laureate of the competition "Grant of Saint-Petersburg 2004" of the International Soros Science Education Program (ISSEP) in specialty “Mathematics”.
2002-present: Senior researcher of the Speech and Multimodal Interfaces Laboratory of the Institution of the Russian Academy of Sciences St. Petersburg Institute for Informatics and Automation of RAS (SPIIRAS).
Research areas: theoretical and experimental research in the area of automatic Russian speech recognition and understanding, audio-visual speech recognition and multimodal interfaces. Development of automatic speech recognition system SIRIUS (SPIIRAS Interface for Recognition and Integral Understanding of Speech), and multimodal system ICANDO (Intellectual Computer AssistaNt for Disabled Operators) based on speech recognition and head tracking. Fundamental and applied investigations (multimodal interfaces, speech recognition, natural language understanding, situational models and analysis, research of the HMM and methods for training the acoustical models), programming (development of demo-versions of speech recognition software, moving objects emulators, Web-programming), system and network administration. Member of Organizing Committee of the International Conferences “Speech and Computer” SPECOM in 2009, 2006, 2004, 2002 as well as INTAS Strategic Scientific Workshop “Development of perspective applications of Human-Computer Interaction for Information Society” in 2004.
Responsible researcher in the following International and Russian research projects:
- Grant of the President of Russia “Development of a computer multi-modal system for audio-visual synthesis of conversational Russian speech and sign language of the deaf”, # MK-64898.2010.8, 2010-2011;
- Project “Development of methods, models and algorithms for automatic recognition of audio-visual Russian speech” funded by the Ministry of Education and Science of the Russian Federation within the framework of the Russian Federal Targeted Program “Research and Research-Human Resources for Innovating Russia”, Contract # 2579, 2009-2011;
- Project “Development of methods for human-machine interaction and multimodal user interfaces for intelligent information systems” funded by the Ministry of Education and Science of the Russian Federation within the framework of the Russian Federal Targeted Program “Research and Research-Human Resources for Innovating Russia”, Contract # 2360, 2009-2011;
- Bilateral Turkish-Russian project funded by RFBR-TUBITAK foundations “Methods and multimodal interfaces for contactless communication of handicapped people with information inquiry systems”, # 09-07-91220-CT_a, 2009-2010;
- Project funded by the Human Capital Foundation “Multi-modal assistive system based on technologies of Russian speech recognition and computer vision”, 2009-2010;
- Bilateral Russian-Byelorussian project funded by RFBR-BRFFI foundations “Model of Audio-Visual Speech Synthesis and Recognition for Intellectual Queuing Devices”, # 08-07-90002-Bel_a, 2008-2009;
- Project of the European Union in Framework Program 6 - SIMILAR Network of Excellence “The European taskforce creating human-machine interfaces SIMILAR to human-human communication”, FP6-IST-2002-507609, www.similar.cc, 2003‑2007;
- Russian Foundation for Basic Research project “Investigation of multimodal interaction by an information kiosk”, # 07-07-00073-a, 2007-2009;
- European INTAS innovative project “Introduction of the automatic Russian speech recognition system SIRIUS in telecommunication”, # 05-1000007-426, 2006-2008;
- European INTAS research project “Development of multi-voice and multi-language Text-to-Speech (TTS) and Speech-to-Text (STT) conversion system (languages: Polish, Russian, Belarussian)”, # 04-77-7404, www.spiiras.nw.ru/speech/intas, 2005-2007;
- Project funded by the Human Capital Foundation “Development of the service for voice access to inquiry system”, # 64, 2006;
- ISTC-EOARD Project “Voice Operated Flying Object”, #1993P, task # 4, 2000-2003.
Participation with papers and/or presentations in the following International scientific events:
- 12th International Conference Interspeech’2011, Florence, Italy, August 2011;
- 17th International Congress of Phonetic Sciences, Hong Kong, China, August 2011;
- 14th International Conference Human-Computer Interaction International, Orlando, USA, July 2011;
- 7th Summer Workshop on Multimodal Interfaces, Plzen, Czech Republic, July 2011;
- 11th International Conference Interspeech’2010, Makuhari, Japan, September 2010;
- 13th International Conference “Text, Speech and Dialogue” TSD’2010, Brno, Czech Republic, September 2010;
- 20th International Conference on Pattern Recognition ICPR’2010, Istanbul, Turkey, August 2010;
- 6th Summer Workshop on Multimodal Interfaces, Amsterdam, The Netherlands, July 2010;
- 10th International Conference “Pattern Recognition and Image Analysis”, St. Petersburg, Russia, Dec 2010;
- 20th International Conference Graphicon-2010, St. Petersburg, Russia, August 2010;
- 10th International Conference Interspeech’2009, Brighton, UK, September 2009;
- 13th International Conference on Speech and Computer SPECOM’2009, St. Petersburg, Russia, June 2009;
- 16th European Signal Processing Conference EUSIPCO’2008, Lausanne, Switzerland, August 2008;
- 4th Summer Workshop on Multimodal Interfaces, Orsay, France, August 2008;
- 9th International Conference “Pattern Recognition and Image Analysis”, Nizhny Novgorod, Russia, 2008;
- 19th International Congress on Acoustics, Madrid, Spain, September 2007;
- 3rd Summer Workshop on Multimodal Interfaces, Istanbul, Turkey, July-August 2007;
- SIMILAR International Day, Louvain-la-Neuve, Belgium, December 2006;
- Interspeech’2006 - International Conference on Spoken Language Processing, Pittsburgh, PA, USA, September 2006;
- 14th European Signal Processing Conference EUSIPCO’2006, Florence, Italy, September 2006;
- International Conference “Intelligent and Multiprocessor systems” IMS’2006, Kaciveli, Ukraine, September 2006;
- 11th International Conference on Speech and Computer SPECOM’2006, St. Petersburg, Russia, June 2006;
- SIMILAR General Assembly Meeting, Louvain-la-Neuve, Belgium, December 2005;
- 13th European Signal Processing Conference EUSIPCO’2005, Antalya, Turkey, September 2005;
- International Conference “Intelligent and Multiprocessor systems” IMS’2005, Divnomorskoe, Russia, September 2005;
- 1st Summer Workshop on Multimodal Interfaces, Mons, Belgium, July-August 2005;
- International Conference Intelligent Information Systems: Intelligent Information Processing and Web Mining, Gdansk, Poland, June 2005;
- 3rd International IEEE Conference: Sciences of Electronic, Technologies of Information and Telecommunications SETIT’2005, Tunisia, March 2005;
- Lecture “Speech recognition and multimodal applications for Russian language” at NISLab of University of Southern Denmark, Odense, Denmark, February 2005;
- 9th International Conference on Speech and Computer SPECOM’2004, St. Petersburg, September 2004;
- 3rd All-Russian Conference “Theory and Practice of speech investigations” ARSO-2003, Moscow, September 2003;
- 7th International Workshop on Speech and Computer SPECOM’2002, St. Petersburg, September 2002.
2003 - 2007 PhD student of Saint-Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences.
Specialty: 05.13.11 “Mathematical and software support of computers, complexes and computer networks”.
PhD Thesis: “Models and software realization for Russian speech recognition based on morphemic analysis” was defended in March 2007:
http://theses.eurasip.org/search/theses/?q=Karpov
1996 - 2002 Student of Saint-Petersburg State University of Aerospace Instrumentation. Cum laude diploma.
Specialty: “Computational and radio-electronic systems”.
Master Thesis: "Development of the client/server architecture for object-oriented database management system" was defended in February 2002.
- Member of the European Association for Signal Processing (EURASIP): http://www.eurasip.org
- Local Liaison Officer in Russia of the EURASIP Association: http://www.eurasip.org/index.php?option=com_content&view=article&id=83&Itemid=91
- Member of the OpenInterface foundation: http://www.openinterface.org/foundation
- Co-chair of the Organizing Committee of the series of the International Conferences on Speech and Computer – SPECOM: www.specom.nw.ru
Book
- A. Ronzhin, A. Karpov, I. Li. Speech and Multimodal Interfaces, Moscow: Nauka, 2006, 173 p. (in Rus.): http://www.knigoprovod.ru/?topic_id=23;book_id=1865
Book Chapter
- A. Karpov, S. Carbini, A. Ronzhin, J.E. Viallet. Chapter “Two Similar Different Speech and Gestures Multimodal Interfaces” in book "Multimodal User Interfaces: From Signals to Interaction", D. Tzovaras (Ed.), Springer, 325 p., 2008, ISBN: 978-3-540-78344-2: http://www.springerlink.com/content/p5g181/?p=e5b13daec3f84a4aa66ba95f38e0fe85&pi=0
Publications in International Scientific Journals
1) A. Karpov, A. Ronzhin. Information Enquiry Kiosk with Multimodal User Interface // Pattern Recognition and Image Analysis, Pleiades Publishing, Vol. 19, No. 3, 2009, pp. 546-558:
http://www.springerlink.com/content/ym766064658w12l7/?p=ad4e76356897411e90554fc1094cb60d&pi=6
2) S. Argyropoulos, K. Moustakas, A. Karpov, O. Aran, D. Tzovaras, T. Tsakiris, G. Varni, B. Kwon. A Multimodal Framework for the Communication of the Disabled // Journal on Multimodal User Interfaces, Springer Berlin/Heidelberg, Vol. 2, No. 2, 2008, pp. 105-116:
http://www.springerlink.com/content/a502q41464m3p858/?p=b9b3dbf5baec46b09d46ebca966dad4b&pi=7
3) A. Ronzhin, A. Karpov. Russian Voice Interface // Pattern Recognition and Image Analysis, Pleiades Publishing, Vol. 17, No. 2, 2007, pp. 321–336:
http://www.springerlink.com/content/376u33001458177p/?p=ad4e76356897411e90554fc1094cb60d&pi=3
4) A. Karpov, A. Ronzhin. ICANDO: Low Cost Multimodal Interface for Hand Disabled People // Journal on Multimodal User Interfaces, Springer Berlin/Heidelberg, Vol. 1, No. 2, 2007, pp. 21-29:
http://www.springerlink.com/content/h53t18507u97h4h6/?p=b9b3dbf5baec46b09d46ebca966dad4b&pi=1
5) M. Hruz, P. Campr, E. Dikici, A. Kindirouglu, Z. Krnoul, Al. Ronzhin, H. Sak, D. Schorno, L. Akarun, O. Aran, A. Karpov, M. Saraclar, M. Zelezny. Automatic Fingersign to Speech Translation System // Journal on Multimodal User Interfaces, Springer Berlin/Heidelberg, Vol. 4, No. 2, 2011, pp. 61-79:
http://www.springerlink.com/content/p274351661517j75/
6) A. Kindiroglu, H. Yalcın, O. Aran, M. Hruz, P. Campr, L. Akarun, A. Karpov. Multi-lingual Fingerspelling Recognition in a Handicapped Kiosk // Pattern Recognition and Image Analysis, Pleiades Publishing, Vol. 21, No. 3, 2011, pp. 402-406:
http://www.springerlink.com/content/ax23232447264486/
7) A. Kindiroglu, H. Yalcın, O. Aran, M. Hruz, P. Campr, L. Akarun, A. Karpov. Automatic Recognition of Fingerspelling Gestures in Multiple Languages for a Communication Interface for the Disabled // Pattern Recognition and Image Analysis. Pleiades Publishing, Vol. 22, 2012.
Publications in Reviewed Proceedings of Top International Conferences
1) A. Karpov, I. Kipyatkova, A. Ronzhin. Very Large Vocabulary ASR for Spoken Russian with Syntactic and Morphemic Analysis. In Proc. INTERSPEECH-2011 International Conference, ISCA Association, Florence, Italy, 2011, pp. 3161-3164.
2) A. Karpov, A. Ronzhin, K. Markov, M. Zelezny. Viseme-Dependent Weight Optimization for CHMM-Based Audio-Visual Speech Recognition. In Proc. INTERSPEECH-2010 International Conference, ISCA Association, Makuhari, Chiba, Japan, 2010, pp. 2678-2681.
3) A. Karpov, L. Tsirulnik, Z. Krnoul, A. Ronzhin, B. Lobanov, M. Zelezny. Audio-Visual Speech Asynchrony Modeling in a Talking Head. In Proc. INTERSPEECH-2009 International Conference, ISCA Association, Brighton, UK, 2009, pp. 2911-2914.
4) A. Karpov, A. Ronzhin, A. Cadiou. A Multi-Modal System ICANDO: Intellectual Computer AssistaNt for the Disabled Operators. In Proc. INTERSPEECH-2006 International Conference, ISCA Association, Pittsburgh, PA, USA, 2006, pp. 1998-2001.
5) A. Karpov, A. Ronzhin, I. Kipyatkova, Al. Ronzhin, L. Akarun. Multimodal Human Computer Interaction with MIDAS Intelligent Infokiosk. In Proc. 20th International Conference on Pattern Recognition ICPR-2010, IAPR Association, Turkey, Istanbul, 2010, pp. 3862-3865.
6) A. Karpov, S. Carbini, A. Ronzhin, J.E. Viallet. Two Different SIMILAR Speech and Gestures Multimodal Interfaces. In Proc. 16th European Signal Processing Conference EUSIPCO-2008, EURASIP Association, Lausanne, Switzerland, 2008.
7) A. Karpov, A. Ronzhin. ICANDO: Intelligent Computer AssistaNt for Disabled Operators. In Proc. 14th European Signal Processing Conference EUSIPCO-2006, EURASIP Association, Florence, Italy, 2006.
8) A. Karpov, A. Ronzhin, A. Nechaev, S. Chernakova. Multimodal system for hands-free PC control. In Proc. 13th European Signal Processing Conference EUSIPCO-2005, EURASIP Association, Antalya, Turkey, 2005.
9) A. Karpov, A. Ronzhin, I. Kipyatkova. An Assistive Bi-Modal User Interface Integrating Multi-Channel Speech Recognition and Computer Vision. In Proc. 14th International Conference on Human-Computer Interaction HCI International-2011, Springer-Verlag Berlin Heidelberg, LNCS 6762, Orlando, FL, USA, 2011, pp. 454-463.
10) A. Ronzhin, A. Karpov, I. Kipyatkova. Designing Cognition-centric Smart Room Predicting Inhabitant Activities. In Proc. 13th International Conference on Human-Computer Interaction HCI International-2009, Springer, LNAI 5638, D. Schmorrow et al. (Eds.): Augmented Cognition, San Diego, CA, USA, 2009, pp. 78–87.
11) A. Ronzhin, A. Karpov, M. Zelezny, R. Mesheryakov. Smart Multimodal Assistant for Disabled. In Proc. 12th International Conference on Human-Computer Interaction HCI International-2007, Springer-Verlag Berlin Heidelberg, LNCS series vol. 4550-4566, Beijing, PR China, 2007, pp. 201-205.
12) A. Ronzhin, A. Karpov, I. Kipyatkova, M. Zelezny. Client and Speech Detection System for Intelligent Infokiosk. In Proc. 13th International Conference on Text, Speech and Dialog TSD-2010, Springer LNAI, Brno, Czech Republic, 2010, pp. 560–567.
13) A. Karpov, A. Ronzhin, A. Leontyeva. A Semi-automatic Wizard of Oz Technique for Let’sFly Spoken Dialogue System. In Proc. 11th International Conference on Text, Speech and Dialog TSD-2008, Springer LNAI 5246, Brno, Czech Republic, 2008, pp. 585‑592.
14) A. Karpov, A. Ronzhin, I. Kipyatkova, M. Zelezny. Influence of Phone-viseme Temporal Correlations on Audiovisual STT and TTS Performance. In Proc. 17th International Congress of Phonetic Sciences ICPhS-2011, Hong Kong, China, 2011, pp. 1030-1033.
15) Yu. Kosarev, A. Ronzhin, A. Karpov, I. Lee. Continuous Speech Recognition without Use of High-Level Information. In Proc. 15th International Congress of Phonetic Sciences ICPhS-2003, Barcelona, Spain, 2003, pp. 1373-1376.
16) A. Karpov, A. Ronzhin. Russian Speech Recognition Model with Morphemic Analysis and Synthesis. In Proc. 19th International Congress on Acoustics 2007, Madrid, Spain, 2007.
17) M. Sargin, O. Aran, A. Karpov, F. Ofli, Y. Yasinnik, S. Wilson, E. Erzin, Y. Yemez, M. Tekalp. Combined Gesture-Speech Analysis and Speech Driven Gesture Synthesis. In Proc. IEEE International Conference on Multimedia & Expo ICME-2006, Toronto, Canada, 2006.
18) P. Campr, E. Dikici, M. Hruz, A. Kindiroglu, Z. Krnoul, Al. Ronzhin, H. Sak, D. Schorno, L. Akarun, O. Aran, A. Karpov, M. Saraclar, M. Zelezny. Automatic Fingersign to Speech Translator. In Proc. 6th Summer Workshop on Multimodal Interfaces eNTERFACE-2010, Amsterdam, The Netherlands, 2010, pp. 69-82.
19) O. Aran, P. Campr, M. Hruz, A. Karpov, P. Santemiz, M. Zelezny. Sign-language-enabled Information Kiosk. In Proc. 4th Summer Workshop on Multimodal Interfaces eNTERFACE-2009, Orsay, France, 2009, pp. 24-33.
20) S. Argyropoulos, K. Moustakas, A. Karpov, O. Aran, D. Tzovaras, T. Tsakiris, G. Varni, B. Kwon. A Multimodal Framework for the Communication of the Disabled. In Proc. 3rd Summer Workshop on Multimodal Interfaces eNTERFACE-2007, Istanbul, Turkey, 2007, pp. 27-36.
21) A. Karpov, L. Tsirulnik, M. Zelezny, Z. Krnoul, A. Ronzhin, B. Lobanov. Study of Audio-Visual Asynchrony of Russian Speech for Improvement of Talking Head Naturalness. In Proc. 13th International Conference SPECOM-2009, St. Petersburg, Russia, 2009, pp. 130-135.
22) M. Markaki, A. Karpov, E. Apostolopoulos, M. Astrinaki, Y. Stylianou, A. Ronzhin. A Hybrid System for Audio Segmentation and Speech-Endpoint Detection of Broadcast News. In Proc. 12th International Conference on Speech and Computer SPECOM-2007, Moscow, Russia, 2007, pp. 691-696.
23) P. Cisar, J. Zelinka, M. Zelezny, A. Karpov, A. Ronzhin. Audio-Visual Speech Recognition for Slavonic Languages (Czech and Russian). In Proc. 11th International Conference SPECOM-2006, St. Petersburg, Russia, 2006, pp. 493-498.
24) M. Železný, P. Císar, Z. Krnoul, A. Ronzhin, I. Li, A. Karpov. Design of Russian Audio-Visual Speech Corpus for Bimodal Speech Recognition. In Proc. 10th International Conference on Speech and Computer SPECOM-2005, Patras, Greece, pp. 397-400.
25) A.L. Ronzhin, A.A. Karpov. Implementation of morphemic analysis for Russian speech recognition. In Proc. 9th International Conference on Speech and Computer SPECOM-2004, St. Petersburg, Russia, 2004, pp. 291-296.
Programmer: MS Visual C++, Borland C++ Builder.
Speech Processing Software: Hidden Markov Model Toolkit (HTK), ASR Julius, CMUSLM toolkit, Microsoft Speech API, Cool Edit, Praat.
Image Processing Software: OpenCV library, Intel AVCSR toolkit, Virtual Dub.
English (fluent), German (basic)
Married, wife Elena Karpova, one child
Information Technologies, traveling, “Zenit St. Petersburg”, football, ping pong