Overview
The Speech and Multimodal Interfaces Laboratory of SPIIRAS was established in 2008 on the basis of the Speech Informatics Group, which was organized in 1984. Our main research areas are speech recognition and understanding, speech dialogue systems, intelligent multimodal interfaces, biometric systems based on speech analysis, and machine translation.
FUNDAMENTAL RESEARCH:
- Development of a large-vocabulary Russian speech recognition system based on morphemic analysis of speech and language.
- Automatic speech recognition and understanding: digital signal processing, recognition and modeling of speech patterns, and integrated processing of linguistic and extralinguistic data during speech understanding.
- Multimodal systems for human-machine interaction.
- Methods for knowledge representation for automatic speech understanding systems.
- Estimation of a person's psychophysiological state from speech and other biometric data.
- Applied speech dialogue systems: voice control of moving objects, machine translation of spoken language, voice access to Internet resources, and human-computer interaction.
APPLIED WORKS:
- Speech Interface for Internet Service "Yellow Pages of Saint-Petersburg";
- Speech control for aircraft;
- Speech control for domestic equipment (TV set, radio, etc.);
- Speech interface for infotelecommunication services;
- Dialogue phrase-book system;
- Voice control system for robots;
- Speech dialogue for education;
- Multimodal system for hands-free PC control based on speech recognition and head tracking;
- Multimodal interface for car control.
MAIN SCIENTIFIC RESULTS:
- A speaker-independent model of Russian continuous speech recognition was developed on the basis of morphemic analysis. Dividing word forms into morphemes reduced the vocabulary of recognized lexical units by orders of magnitude. This processing makes recognition robust to grammatical deviations and increases the speed of Russian speech recognition. The automatic system SIRIUS (SPIIRAS Interface for Recognition and Integral Understanding of Speech) was developed for large-vocabulary Russian speech recognition.
- An original method for full integration of knowledge about speech, language, and the application domain has been developed. The method achieves high understanding accuracy and makes the system robust. Other kinds of knowledge (gestures, tactile input, etc.) can also be easily incorporated to create a multimodal understanding system.
- The laboratory biennially holds the International Conference on Speech and Computer SPECOM in St. Petersburg.
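The vocabulary reduction mentioned in the first result above can be illustrated with a toy count. The sketch below is not the SIRIUS implementation; the transliterated stems and endings, and the function names, are hypothetical and chosen only to show why a vocabulary of morphemes grows additively while a vocabulary of full word forms grows multiplicatively.

```python
from itertools import product

# Hypothetical toy data (assumption, not taken from SIRIUS):
# transliterated noun stems and inflectional endings.
STEMS = ["stol", "dom", "gorod", "les"]
ENDINGS = ["a", "u", "om", "e", "y", "ov", "am", "ami", "ah"]


def word_form_vocabulary(stems, endings):
    """Every stem+ending combination is a separate vocabulary entry."""
    return {stem + ending for stem, ending in product(stems, endings)}


def morpheme_vocabulary(stems, endings):
    """Stems and endings are stored once each and combined at recognition time."""
    return set(stems) | set(endings)


word_forms = word_form_vocabulary(STEMS, ENDINGS)
morphemes = morpheme_vocabulary(STEMS, ENDINGS)

print(f"word-form units: {len(word_forms)}")
print(f"morpheme units:  {len(morphemes)}")
```

With S stems and E endings the word-form vocabulary has up to S × E entries, while the morpheme vocabulary has about S + E, so the gap widens quickly as the lexicon grows.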
MAIN PUBLICATIONS:
- A. Ronzhin, A. Karpov, I. Li. Book "Speech and Multimodal Interfaces". - M.: Science, 2006 - (Computer science: unlimited abilities and possible limits), 173 p. (in Russian)
- A. Karpov, A. Ronzhin. Information Enquiry Kiosk with Multimodal User Interface // Pattern Recognition and Image Analysis, Springer, Vol. 19, No. 3, 2009, pp. 546-558.
- A. Karpov, A. Ronzhin. ICANDO: Low Cost Multimodal Interface for Hand Disabled People // Journal on Multimodal User Interfaces, Springer, Vol. 1, No. 2, 2007, pp. 21-29.
- A. Ronzhin, A. Karpov. Russian Voice Interface // Pattern Recognition and Image Analysis, Springer, Vol. 17, No. 2, 2007, pp. 321–336.
- R. Yusupov, A. Ronzhin, M. Prischepa, Al. Ronzhin. Models and Hardware-Software Solutions for Automatic Control of Intelligent Hall // Automation and Remote Control, Springer, Vol. 72, No. 7, 2011, pp. 1389–1397.
- Y. Kosarev, A. Ronzhin. Chapter "Quantitative methods in speech processing" in the book "Quantitative Linguistics", Berlin: New York, DeGruyter, 2005, pp. 834-846.
- A. Karpov, S. Carbini, A. Ronzhin, J.E. Viallet. Chapter "Two Similar Different Speech and Gestures Multimodal Interfaces" in the book "Multimodal User Interfaces: From Signals to Interaction", D. Tzovaras (Ed.), Springer Berlin, 325 p., 2008.