APPROACH OF DIFFERENT MODELS OF MACHINE LEARNING IN AUTOMATIC SPEECH RECOGNITION OF BALKAN LANGUAGES

Authors

  • Dejan Dodić, EDUKOM d.o.o., Vranje, Serbia

Keywords:

Automatic speech recognition (ASR), Gaussian mixture models, Hidden Markov models, Machine learning, Deep learning and speech, Java programming, Speech recognition (SR), e-Dictate

Abstract

Over the last few decades, the machine learning paradigms used in automatic speech recognition (ASR) have developed tremendously, with applications ranging from home automation (the smart home) to space exploration. Commercial speech recognizers are available for certain well-defined applications such as dictation and transcription, and this paper places special emphasis on e-Dictate applications for recognizing Balkan languages. Even so, many ASR problems, including recognition in noisy environments, multilingual recognition, and multimodal recognition, have yet to be addressed efficiently. The paper provides a comprehensive overview of common machine learning (ML) techniques used in ASR, such as artificial neural networks, support vector machines, and Gaussian mixture models, together with hidden Markov models.
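The pairing of Gaussian mixture models with hidden Markov models mentioned above can be illustrated with a minimal sketch. It assumes diagonal-covariance GMMs as the per-state emission densities and scores an utterance with the HMM forward algorithm in the log domain. The function names (`log_gaussian_diag`, `log_gmm`, `forward_log_likelihood`) and the toy parameters are hypothetical illustrations, not the paper's own implementation (its keywords point to Java); Python is used only for brevity.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_gmm(x, weights, means, variances):
    """Log-likelihood of x under one HMM state's Gaussian mixture (weights > 0)."""
    return log_sum_exp([
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ])


def forward_log_likelihood(frames, log_init, log_trans, state_gmms):
    """HMM forward algorithm: returns log P(frames | model).

    log_init[j]     : log probability of starting in state j (-math.inf if impossible)
    log_trans[i][j] : log probability of moving from state i to state j
    state_gmms[j]   : (weights, means, variances) of state j's emission GMM
    """
    n = len(state_gmms)
    # alpha[j] = log P(o_1..o_t, q_t = j), initialised with the first frame
    alpha = [log_init[j] + log_gmm(frames[0], *state_gmms[j]) for j in range(n)]
    for x in frames[1:]:
        # Sum over predecessor states, then emit the current frame from state j
        alpha = [
            log_sum_exp([alpha[i] + log_trans[i][j] for i in range(n)])
            + log_gmm(x, *state_gmms[j])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


if __name__ == "__main__":
    # Hypothetical 2-state left-to-right model over 1-D features
    neg_inf = -math.inf
    log_init = [0.0, neg_inf]  # always start in state 0
    log_trans = [[math.log(0.6), math.log(0.4)],
                 [neg_inf, 0.0]]  # no transition back from state 1 to state 0
    state_gmms = [
        ([0.5, 0.5], [[0.0], [1.0]], [[1.0], [1.0]]),  # state 0: two components
        ([1.0], [[5.0]], [[1.0]]),                     # state 1: one component
    ]
    frames = [[0.2], [0.8], [4.9], [5.1]]
    print(forward_log_likelihood(frames, log_init, log_trans, state_gmms))
```

Working in the log domain avoids the numerical underflow that multiplying many small per-frame likelihoods would cause. In an isolated-word recognizer built this way, one such model per word would be scored against the same frames and the highest-scoring model chosen.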

References

Dudley, H. (1939). “The vocoder,” Bell Labs Rec., Vol. 17, pp. 122–126.

Dudley, H., R. R. Riesz, and S. A. Watkins (1939). “A synthetic speaker,” J. Franklin Inst., Vol. 227, pp. 739–764.

Davis, K. H., R. Biddulph, and S. Balashek (1952). “Automatic recognition of spoken digits,” J. Acoust. Soc. Amer., Vol. 24, no. 6, pp. 637–642, Nov.

Olson, H. F., and H. Belar (1956). “Phonetic typewriter,” J. Acoust. Soc. Amer., Vol. 28, no. 6, pp. 1072–1081, Nov.

Fry, D. B. (1959). “Theoretical aspects of the mechanical speech recognition,” J. Br. Inst. Radio Eng., Vol. 19, no. 4, pp. 211–229.

Vintsyuk, T. K. (1968). “Speech discrimination by dynamic programming,” Kibernetika, Vol. 4, pp. 81–88.

Rabiner, L. R., S. E. Levinson, A. E. Rosenberg, and J. G. Wilpon (1979). “Speaker independent recognition of isolated words using clustering techniques,” IEEE Trans. Acoust. Speech Signal Process., Vol. 27, pp. 336–349.

Wilpon, J. G., L. R. Rabiner, C. H. Lee, and E. Goldman (1990). “Automatic recognition of keywords in unconstrained speech using hidden Markov models,” IEEE Trans. Acoust. Speech Signal Process., Vol. 38, pp. 1870.

Sahoo, S. K., T. Choubisa, and S. R. M. Prasanna (2012). “Multimodal biometric person authentication: a review,” IETE Tech. Rev., Vol. 29, no. 1, pp. 54–75.

Pati, D., and S. R. M. Prasanna (2010). “Speaker recognition from excitation source perspective,” IETE Tech. Rev., Vol. 27, no. 2, pp. 138–157.

Jayanna, H. S., and S. R. M. Prasanna (2009). “Analysis, feature extraction, modeling and testing techniques for speaker recognition,” IETE Tech. Rev., Vol. 26, no. 3, pp. 181–190, Sep.

Singh, S., M. Tripathy, and R. S. Anand (2014). “Subjective and objective analysis of speech enhancement algorithms for single channel speech patterns of Indian and English languages,” IETE Tech. Rev., Vol. 31, no. 1, pp. 34–46.

Mitchell, T. M. (1997). Machine Learning. New York, NY: McGraw-Hill, International Edition.

Baker, J. (1976). “Stochastic modeling for automatic speech recognition,” in Speech Recognition, R. Reddy, Ed. New York, NY: Academic Press, pp. 297–307.

Jelinek, F. (1976). “Continuous speech recognition by statistical methods,” Proc. IEEE, Vol. 64, no. 4, pp. 532–557.

Baker, J., L. Deng, J. Glass, S. Khudanpur, C.-H. Lee, N. Morgan, and D. O’Shaughnessy (2009). “Research developments and directions in speech recognition and understanding, Part I,” IEEE Signal Process. Mag., Vol. 26, no. 3, pp. 75–80.

Rabiner, L., and B.-H. Juang (1993). Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice-Hall.

Juang, B.-H., S. E. Levinson, and M. M. Sondhi (1986). “Maximum likelihood estimation for mixture multivariate stochastic observations of Markov chains,” IEEE Trans. Inf. Theory, Vol. 32, no. 2, pp. 307–309.

Deng, L., P. Kenny, M. Lennig, V. Gupta, F. Seitz, and P. Mermelstein (1991). “Phonemic hidden Markov models with continuous mixture output densities for large vocabulary word recognition,” IEEE Trans. Acoust. Speech Signal Process., Vol. 39, no. 7, pp. 1677–1681.

Bilmes, J. A. (2006). “What HMMs can do,” IEICE Trans. Inf. Syst., Vol. E89-D, no. 3, pp. 869–891, Mar.

Bourlard, H., and C. J. Wellekens (1989). “Links between Markov models and multilayer perceptrons,” in Advances in Neural Information Processing Systems, D. S. Touretzky, Ed. San Mateo, CA: Morgan Kaufmann, pp. 502–510.

Morgan, N., and H. Bourlard (1990). “Continuous speech recognition using multilayer perceptrons with hidden Markov models,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Albuquerque, NM, pp. 413–416.

Morgan, N., H. Hermansky, H. Bourlard, P. Kohn, and C. Wooters (1991). “Continuous speech recognition using PLP analysis with multilayer perceptrons,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Toronto, ON, pp. 49–52.

Stadermann, J., and G. Rigoll (2004). “A hybrid SVM/HMM acoustic modeling approach to automatic speech recognition,” in Proceedings of Interspeech, Jeju Island, Korea, pp. 661–664.

Zhang, S., A. Ragni, and M. Gales (2010). “Structured log linear models for noise robust speech recognition,” IEEE Signal Process. Lett., Vol. 17, pp. 945–948, Nov.

Landauer, T. K., C. A. Kamm, and S. Singhal (1987). “Teaching a minimally structured back propagation network to recognize speech,” in Proceedings of the Ninth Annual Conference of the Cognitive Science Society, Seattle, Washington, pp. 531–536.

“Transformer-based acoustic modeling for hybrid speech recognition,” arXiv:1910.09799. https://arxiv.org/abs/1910.09799


Published

2021-10-07

How to Cite

Dodić, D. (2021). APPROACH OF DIFFERENT MODELS OF MACHINE LEARNING IN AUTOMATIC SPEECH RECOGNITION OF BALKAN LANGUAGES. KNOWLEDGE - International Journal, 48(4), 659–663. Retrieved from https://ikm.mk/ojs/index.php/kij/article/view/4886