CURRICULUM VITAE

 

 

1. Full Name: Alexey A. KARPOV

2. Qualification: Senior researcher, PhD (in computer science)

3. Citizenship: Russia

4. Date of birth: 17.11.1978

5. Contacts: E-mail: karpov@iias.spb.su, karpov_a@mail.ru; ICQ: 161649073

 

6. Scientific Fields of Expertise and Interests

Development of methods and algorithms for Digital Signal Processing (feature extraction, HMM creation and training, situational and associative analysis) and Human-Computer Interaction (voice-operated control systems, dialogue systems, multimodal systems).

Creation, debugging and testing of software for speech communication between a human and a computer (aircraft emulator, technological process, telecommunication equipment, Web portals with a speech interface, etc.).

 

Scientific Awards:

-        Medal of the Russian Academy of Sciences for young Russian scientists for the best research work in the field of computer science, engineering and automation in 2011;

-        Grant of the President of Russia for young PhDs, 2012-2013;

-        Grant of the President of Russia for young PhDs, 2010-2011;

-        Award "Distinguished PhD of the Russian Academy of Sciences", 2008-2009;

-        Winner of the competition for personal grants for young PhDs from the Administration of St. Petersburg, 2008-2010;

-        Winner of the project competition in the category "IT technologies" organized by the St. Petersburg Scientific Center of RAS, 2009;

-        Award "Distinguished PhD student of the Russian Academy of Sciences", 2005-2006;

-        Grand Prix at the Loco Mummy Contest 2006 in the category "Best PC Multimodal User Interface Software", 2006 (Belgium);

-        Laureate of the competition "Grant of Saint-Petersburg 2004" of the International Soros Science Education Program (ISSEP) in specialty “Mathematics”.

7. Scientific Career

2002-present: Senior researcher of the Speech and Multimodal Interfaces Laboratory of the Institution of the Russian Academy of Sciences St. Petersburg Institute for Informatics and Automation of RAS (SPIIRAS).

Research areas: theoretical and experimental research in automatic Russian speech recognition and understanding, audio-visual speech recognition and multimodal interfaces. Development of the automatic speech recognition system SIRIUS (SPIIRAS Interface for Recognition and Integral Understanding of Speech) and the multimodal system ICANDO (Intellectual Computer AssistaNt for Disabled Operators), which is based on speech recognition and head tracking. Fundamental and applied investigations (multimodal interfaces, speech recognition, natural language understanding, situational models and analysis, research on HMMs and methods for training acoustic models), programming (development of demo versions of speech recognition software, moving-object emulators, Web programming), system and network administration. Member of the Organizing Committee of the International Conference “Speech and Computer” (SPECOM) in 2009, 2006, 2004 and 2002, as well as of the INTAS Strategic Scientific Workshop “Development of perspective applications of Human-Computer Interaction for Information Society” in 2004.

Responsible researcher in the following International and Russian research projects:

 

-        Grant of the President of Russia “Development of a computer multi-modal system for audio-visual synthesis of conversational Russian speech and sign language of the deaf”, # MK-64898.2010.8, 2010-2011;

-        Project “Development of methods, models and algorithms for automatic recognition of audio-visual Russian speech” funded by the Ministry of Education and Science of the Russian Federation in the framework of the Russian Federal Targeted Program “Research and Research-Human Resources for Innovating Russia”, Contract # 2579, 2009-2011;

-        Project “Development of methods for human-machine interaction and multimodal user interfaces for intelligent information systems” funded by the Ministry of Education and Science of the Russian Federation in the framework of the Russian Federal Targeted Program “Research and Research-Human Resources for Innovating Russia”, Contract # 2360, 2009-2011;

-        Bilateral Turkish-Russian project funded by RFBR-TUBITAK foundations “Methods and multimodal interfaces for contactless communication of handicapped people with information inquiry systems”, # 09-07-91220-CT_a, 2009-2010;

-        Project funded by the Human Capital Foundation “Multi-modal assistive system based on technologies of Russian speech recognition and computer vision”, 2009-2010;

-        Bilateral Russian-Byelorussian project funded by RFBR-BRFFI foundations “Model of Audio-Visual Speech Synthesis and Recognition for Intellectual Queuing Devices”, # 08-07-90002-Bel_a, 2008-2009;

-        Project of the European Union in Framework Program 6 - SIMILAR Network of Excellence “The European taskforce creating human-machine interfaces SIMILAR to human-human communication”, FP6-IST-2002-507609, www.similar.cc, 2003‑2007;

-        Russian Foundation for Basic Research project “Investigation of multimodal interaction by an information kiosk”, # 07-07-00073-a, 2007-2009;

-        European INTAS innovative project “Introduction of the automatic Russian speech recognition system SIRIUS in telecommunication”, # 05-1000007-426, 2006-2008;

-        European INTAS research project “Development of multi-voice and multi-language Text-to-Speech (TTS) and Speech-to-Text (STT) conversion system (languages: Polish, Russian, Belarussian)”, # 04-77-7404, www.spiiras.nw.ru/speech/intas, 2005-2007;

-        Project funded by the Human Capital Foundation “Development of the service for voice access to inquiry system”, # 64, 2006;

-        ISTC-EOARD Project “Voice Operated Flying Object”, #1993P, task # 4, 2000-2003.

 

Participation with papers and/or presentations in the following International scientific events:

-        12th International Conference Interspeech’2011, Florence, Italy, August 2011;

-        17th International Congress of Phonetic Sciences, Hong Kong, China, August 2011;

-        14th International Conference Human-Computer Interaction International, Orlando, USA, July 2011;

-        7th Summer Workshop on Multimodal Interfaces, Plzen, Czech Republic, July 2011;

-        11th International Conference Interspeech’2010, Makuhari, Japan, September 2010;

-        13th International Conference “Text, Speech and Dialogue” TSD’2010, Brno, Czech Republic, September 2010;

-        20th International Conference on Pattern Recognition ICPR’2010, Istanbul, Turkey, August 2010;

-        6th Summer Workshop on Multimodal Interfaces, Amsterdam, The Netherlands, July 2010;

-        10th International Conference “Pattern Recognition and Image Analysis”, St. Petersburg, Russia, December 2010;

-        20th International Conference Graphicon-2010, St. Petersburg, Russia, August 2010;

-        10th International Conference Interspeech’2009, Brighton, UK, September 2009;

-        13th International Conference on Speech and Computer SPECOM’2009, St. Petersburg, Russia, June 2009;

-        16th European Signal Processing Conference EUSIPCO’2008, Lausanne, Switzerland, August 2008;

-        4th Summer Workshop on Multimodal Interfaces, Orsay, France, August 2008;

-        9th International Conference “Pattern Recognition and Image Analysis”, Nizhny Novgorod, Russia, 2008;

-        19th International Congress on Acoustics, Madrid, Spain, September 2007;

-        3rd Summer Workshop on Multimodal Interfaces, Istanbul, Turkey, July-August 2007;

-        SIMILAR International Day, Louvain-la-Neuve, Belgium, December 2006;

-        Interspeech’2006 - International Conference on Spoken Language Processing, Pittsburgh, PA, USA, September 2006;

-        14th European Signal Processing Conference EUSIPCO’2006, Florence, Italy, September 2006;

-        International Conference “Intelligent and Multiprocessor systems” IMS’2006, Kaciveli, Ukraine, September 2006;

-        11th International Conference on Speech and Computer SPECOM’2006, St. Petersburg, Russia, June 2006;

-        SIMILAR General Assembly Meeting, Louvain-la-Neuve, Belgium, December 2005;

-        13th European Signal Processing Conference EUSIPCO’2005, Antalya, Turkey, September 2005;

-        International Conference “Intelligent and Multiprocessor systems” IMS’2005, Divnomorskoe, Russia, September 2005;

-        1st Summer Workshop on Multimodal Interfaces, Mons, Belgium, July-August 2005;

-        International Conference Intelligent Information Systems: Intelligent Information Processing and Web Mining, Gdansk, Poland, June 2005;

-        3rd International IEEE Conference: Sciences of Electronic, Technologies of Information and Telecommunications SETIT’2005, Tunisia, March 2005;

-        Lecture “Speech recognition and multimodal applications for Russian language” at NISLab of University of Southern Denmark, Odense, Denmark, February 2005;

-        9th International Conference on Speech and Computer SPECOM’2004, St. Petersburg, September 2004;

-        3rd All-Russian Conference “Theory and Practice of speech investigations” ARSO-2003, Moscow, September 2003;

-        7th International Workshop on Speech and Computer SPECOM’2002, St. Petersburg, September 2002.

8. Education

2003 - 2007    PhD student at the Saint-Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences.

Specialty: 05.13.11 “Mathematical and software support of computers, complexes and computer networks”.

PhD Thesis: “Models and software realization for Russian speech recognition based on morphemic analysis” was defended in March 2007:

http://theses.eurasip.org/search/theses/?q=Karpov

 

1996 - 2002    Student at the Saint-Petersburg State University of Aerospace Instrumentation. Diploma cum laude.

Specialty: “Computational and radio-electronic systems”.

Master's Thesis: "Development of the client/server architecture for object-oriented database management system" was defended in February 2002.

9. Membership

-        Member of the International Speech Communication Association (ISCA): http://www.isca-speech.org

-        Member of the International Association on Pattern Recognition (IAPR): http://www.iapr.org

-        Member of the European Association for Signal Processing (EURASIP): http://www.eurasip.org

-        Local Liaison Officer in Russia of the EURASIP Association: http://www.eurasip.org/index.php?option=com_content&view=article&id=83&Itemid=91

-        Member of the OpenInterface foundation: http://www.openinterface.org/foundation

-        Co-chair of the Organizing Committee of the series of the International Conferences on Speech and Computer – SPECOM: www.specom.nw.ru

10. Main Publications

Book

-          A. Ronzhin, A. Karpov, I. Li. Speech and Multimodal Interfaces, Moscow: Nauka, 2006, 173 p. (in Rus.): http://www.knigoprovod.ru/?topic_id=23;book_id=1865

 

Book Chapter

-          A. Karpov, S. Carbini, A. Ronzhin, J.E. Viallet. Chapter “Two Similar Different Speech and Gestures Multimodal Interfaces” in book "Multimodal User Interfaces: From Signals to Interaction", D. Tzovaras (Ed.), Springer, 325 p., 2008, ISBN: 978-3-540-78344-2: http://www.springerlink.com/content/p5g181/?p=e5b13daec3f84a4aa66ba95f38e0fe85&pi=0

 

in International Scientific Journals

1)      A. Karpov, A. Ronzhin. Information Enquiry Kiosk with Multimodal User Interface // Pattern Recognition and Image Analysis, Pleiades Publishing, Vol. 19, No. 3, 2009, pp. 546-558:

http://www.springerlink.com/content/ym766064658w12l7/?p=ad4e76356897411e90554fc1094cb60d&pi=6

2)      S. Argyropoulos, K. Moustakas, A. Karpov, O. Aran, D. Tzovaras, T. Tsakiris, G. Varni, B. Kwon. A Multimodal Framework for the Communication of the Disabled // Journal on Multimodal User Interfaces, Springer Berlin/Heidelberg, Vol. 2, No. 2, 2008, pp. 105-116:

http://www.springerlink.com/content/a502q41464m3p858/?p=b9b3dbf5baec46b09d46ebca966dad4b&pi=7

3)      A. Ronzhin, A. Karpov. Russian Voice Interface // Pattern Recognition and Image Analysis, Pleiades Publishing, Vol. 17, No. 2, 2007, pp. 321–336:

http://www.springerlink.com/content/376u33001458177p/?p=ad4e76356897411e90554fc1094cb60d&pi=3

4)      A. Karpov, A. Ronzhin. ICANDO: Low Cost Multimodal Interface for Hand Disabled People // Journal on Multimodal User Interfaces, Springer Berlin/Heidelberg, Vol. 1, No. 2, 2007, pp. 21-29:

http://www.springerlink.com/content/h53t18507u97h4h6/?p=b9b3dbf5baec46b09d46ebca966dad4b&pi=1

5)      M. Hruz, P. Campr, E. Dikici, A. Kindiroglu, Z. Krnoul, Al. Ronzhin, H. Sak, D. Schorno, L. Akarun, O. Aran, A. Karpov, M. Saraclar, M. Zelezny. Automatic Fingersign to Speech Translation System // Journal on Multimodal User Interfaces, Springer Berlin/Heidelberg, Vol. 4, No. 2, 2011, pp. 61-79:

http://www.springerlink.com/content/p274351661517j75/

6)      A. Kindiroglu, H. Yalcın, O. Aran, M. Hruz, P. Campr, L. Akarun, A. Karpov. Multi-lingual Fingerspelling Recognition in a Handicapped Kiosk // Pattern Recognition and Image Analysis, Pleiades Publishing, Vol. 21, No. 3, 2011, pp. 402-406:

http://www.springerlink.com/content/ax23232447264486/

7)      A. Kindiroglu, H. Yalcın, O. Aran, M. Hruz, P. Campr, L. Akarun, A. Karpov. Automatic Recognition of Fingerspelling Gestures in Multiple Languages for a Communication Interface for the Disabled // Pattern Recognition and Image Analysis. Pleiades Publishing, Vol. 22, 2012.

 

in Reviewed Proceedings of Top International Conferences

1)        A. Karpov, I. Kipyatkova, A. Ronzhin. Very Large Vocabulary ASR for Spoken Russian with Syntactic and Morphemic Analysis. In Proc. INTERSPEECH-2011 International Conference, ISCA Association, Florence, Italy, 2011, pp. 3161-3164.

2)        A. Karpov, A. Ronzhin, K. Markov, M. Zelezny. Viseme-Dependent Weight Optimization for CHMM-Based Audio-Visual Speech Recognition. In Proc. INTERSPEECH-2010 International Conference, ISCA Association, Makuhari, Chiba, Japan, 2010, pp. 2678-2681.

3)        A. Karpov, L. Tsirulnik, Z. Krnoul, A. Ronzhin, B. Lobanov, M. Zelezny. Audio-Visual Speech Asynchrony Modeling in a Talking Head. In Proc. INTERSPEECH-2009 International Conference, ISCA Association, Brighton, UK, 2009, pp. 2911-2914.

4)        A. Karpov, A. Ronzhin, A. Cadiou. A Multi-Modal System ICANDO: Intellectual Computer AssistaNt for the Disabled Operators. In Proc. INTERSPEECH-2006 International Conference, ISCA Association, Pittsburgh, PA, USA, 2006, pp. 1998-2001.

5)        A. Karpov, A. Ronzhin, I. Kipyatkova, Al. Ronzhin, L. Akarun. Multimodal Human Computer Interaction with MIDAS Intelligent Infokiosk. In Proc. 20th International Conference on Pattern Recognition ICPR-2010, IAPR Association, Istanbul, Turkey, 2010, pp. 3862-3865.

6)        A. Karpov, S. Carbini, A. Ronzhin, J.E. Viallet. Two Different SIMILAR Speech and Gestures Multimodal Interfaces. In Proc. 16th European Signal Processing Conference EUSIPCO-2008, EURASIP Association, Lausanne, Switzerland, 2008.

7)        A. Karpov, A. Ronzhin. ICANDO: Intelligent Computer AssistaNt for Disabled Operators. In Proc. 14th European Signal Processing Conference EUSIPCO-2006, EURASIP Association, Florence, Italy, 2006.

8)        A. Karpov, A. Ronzhin, A. Nechaev, S. Chernakova. Multimodal system for hands-free PC control. In Proc. 13th European Signal Processing Conference EUSIPCO-2005, EURASIP Association, Antalya, Turkey, 2005.

9)        A. Karpov, A. Ronzhin, I. Kipyatkova. An Assistive Bi-Modal User Interface Integrating Multi-Channel Speech Recognition and Computer Vision. In Proc. 14th International Conference on Human-Computer Interaction HCI International-2011, Springer-Verlag Berlin Heidelberg, LNCS 6762, Orlando, FL, USA, 2011, pp. 454-463.

10)    A. Ronzhin, A. Karpov, I. Kipyatkova. Designing Cognition-centric Smart Room Predicting Inhabitant Activities. In Proc. 13th International Conference on Human-Computer Interaction HCI International-2009, Springer, LNAI 5638, D. Schmorrow et al. (Eds.): Augmented Cognition, San Diego, CA, USA, 2009, pp. 78–87.

11)    A. Ronzhin, A. Karpov, M. Zelezny, R. Mesheryakov. Smart Multimodal Assistant for Disabled. In Proc. 12th International Conference on Human-Computer Interaction HCI International-2007, Springer-Verlag Berlin Heidelberg, LNCS series vol. 4550-4566, Beijing, PR China, 2007, pp. 201-205.

12)    A. Ronzhin, A. Karpov, I. Kipyatkova, M. Zelezny. Client and Speech Detection System for Intelligent Infokiosk. In Proc. 13th International Conference on Text, Speech and Dialogue TSD-2010, Springer LNAI, Brno, Czech Republic, 2010, pp. 560–567.

13)    A. Karpov, A. Ronzhin, A. Leontyeva. A Semi-automatic Wizard of Oz Technique for Let’sFly Spoken Dialogue System. In Proc. 11th International Conference on Text, Speech and Dialogue TSD-2008, Springer LNAI 5246, Brno, Czech Republic, 2008, pp. 585‑592.

14)    A. Karpov, A. Ronzhin, I. Kipyatkova, M. Zelezny. Influence of Phone-viseme Temporal Correlations on Audiovisual STT and TTS Performance. In Proc. 17th International Congress of Phonetic Sciences ICPhS-2011, Hong Kong, China, 2011, pp. 1030-1033.

15)    Yu. Kosarev, A. Ronzhin, A. Karpov, I. Lee. Continuous Speech Recognition without Use of High-Level Information. In Proc. 15th International Congress of Phonetic Sciences ICPhS-2003, Barcelona, Spain, 2003, pp. 1373-1376.

16)    A. Karpov, A. Ronzhin. Russian Speech Recognition Model with Morphemic Analysis and Synthesis. In Proc. 19th International Congress on Acoustics 2007, Madrid, Spain, 2007.

17)    M. Sargin, O. Aran, A. Karpov, F. Ofli, Y. Yasinnik, S. Wilson, E. Erzin, Y. Yemez, M. Tekalp. Combined Gesture-Speech Analysis and Speech Driven Gesture Synthesis. In Proc. IEEE International Conference on Multimedia & Expo ICME-2006, Toronto, Canada, 2006.

18)    P. Campr, E. Dikici, M. Hruz, A. Kindiroglu, Z. Krnoul, Al. Ronzhin, H. Sak, D. Schorno, L. Akarun, O. Aran, A. Karpov, M. Saraclar, M. Zelezny. Automatic Fingersign to Speech Translator. In Proc. 6th Summer Workshop on Multimodal Interfaces eNTERFACE-2010, Amsterdam, The Netherlands, 2010, pp. 69-82.

19)    O. Aran, P. Campr, M. Hruz, A. Karpov, P. Santemiz, M. Zelezny. Sign-language-enabled Information Kiosk. In Proc. 4th Summer Workshop on Multimodal Interfaces eNTERFACE-2008, Orsay, France, 2008, pp. 24-33.

20)    S. Argyropoulos, K. Moustakas, A. Karpov, O. Aran, D. Tzovaras, T. Tsakiris, G. Varni, B. Kwon. A Multimodal Framework for the Communication of the Disabled. In Proc. 3rd Summer Workshop on Multimodal Interfaces eNTERFACE-2007, Istanbul, Turkey, 2007, pp. 27-36.

21)    A. Karpov, L. Tsirulnik, M. Zelezny, Z. Krnoul, A. Ronzhin, B. Lobanov. Study of Audio-Visual Asynchrony of Russian Speech for Improvement of Talking Head Naturalness. In Proc. 13th International Conference SPECOM-2009, St. Petersburg, Russia, 2009, pp. 130-135.

22)    M. Markaki, A. Karpov, E. Apostolopoulos, M. Astrinaki, Y. Stylianou, A. Ronzhin. A Hybrid System for Audio Segmentation and Speech-Endpoint Detection of Broadcast News. In Proc. 12th International Conference on Speech and Computer SPECOM-2007, Moscow, Russia, 2007, pp. 691-696.

23)    P. Cisar, J. Zelinka, M. Zelezny, A. Karpov, A. Ronzhin. Audio-Visual Speech Recognition for Slavonic Languages (Czech and Russian). In Proc. 11th International Conference SPECOM-2006, St. Petersburg, Russia, 2006, pp. 493-498.

24)    M. Železný, P. Císar, Z. Krnoul, A. Ronzhin, I. Li, A. Karpov. Design of Russian Audio-Visual Speech Corpus for Bimodal Speech Recognition. In Proc. 10th International Conference on Speech and Computer SPECOM-2005, Patras, Greece, 2005, pp. 397-400.

25)    A.L. Ronzhin, A.A. Karpov. Implementation of morphemic analysis for Russian speech recognition. In Proc. 9th International Conference on Speech and Computer SPECOM-2004, St. Petersburg, Russia, 2004, pp. 291-296.

11. Computer skills

Programming: MS Visual C++, Borland C++ Builder, Delphi, Assembler, HTML, DirectX, CORBA, COM, WinSockets, SQL.

Speech Processing Software: Hidden Markov Model Toolkit (HTK), ASR Julius, CMUSLM toolkit, Microsoft Speech API, Cool Edit, Praat.

Image Processing Software: OpenCV library, Intel AVCSR toolkit, Virtual Dub.

12. Foreign languages

English (fluent), German (basic)

13. Marital status

Married, wife Elena Karpova, one child

14. Hobbies and personal activities

Information Technologies, traveling, “Zenit St. Petersburg”, football, ping pong

 
