A NOVEL SPEECH RECOGNITION SYSTEM USING FUZZY NEURAL NETWORK
Abstract
The human voice is an important subject of digital speech processing, and speech recognition has long been an active topic in signal processing and artificial intelligence research. Speech recognition is an interdisciplinary subfield of Natural Language Processing (NLP) that enables machines to recognize spoken language and transform it into text. It therefore has the potential to make communication with machines easier and has enabled technology that can communicate in real time. However, open issues remain, such as speaker variability due to factors like age, gender, speaking rate, pronunciation differences, and background noise. Classification of age and gender is important for speech processing, and a great deal of work has been done to enhance each processing phase in order to obtain better and more accurate results. The main goal of this analysis is the integration of machine learning into the speech recognition system; hence, it presents a novel speech recognition system using a fuzzy neural network. The system consists of four phases: pre-processing, speech signal segmentation, speech feature extraction, and speaker recognition. A fuzzy neural network is used to identify the speaker's age and gender, and the fuzzy model's performance is evaluated using Accuracy, Precision, and F1-score.
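To make the pipeline concrete, the sketch below walks through the four phases named in the abstract with a toy fuzzy neural classifier. Every name here (the pre-emphasis coefficient, the log-energy feature, the Gaussian membership functions, the fuzzy-set centres, and the four output classes) is an illustrative assumption for this sketch, not the paper's published configuration.

```python
# Minimal sketch of the pipeline in the abstract: pre-processing,
# segmentation, feature extraction, and a toy fuzzy neural classifier.
# All parameters and membership functions are illustrative assumptions.
import numpy as np

def preprocess(signal: np.ndarray) -> np.ndarray:
    """Pre-processing: normalise amplitude, then apply pre-emphasis."""
    signal = signal / (np.max(np.abs(signal)) + 1e-8)
    return np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

def segment(signal: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Segmentation: split the signal into overlapping frames."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])

def extract_features(frames: np.ndarray) -> np.ndarray:
    """Feature extraction: log frame energy as a stand-in for e.g. MFCCs."""
    return np.log(np.sum(frames ** 2, axis=1) + 1e-8).reshape(-1, 1)

def gaussian_membership(x: np.ndarray, centres: np.ndarray, width: float = 1.0) -> np.ndarray:
    """Fuzzification layer: Gaussian membership degree of x to each fuzzy set."""
    return np.exp(-((x - centres) ** 2) / (2 * width ** 2))

def fuzzy_nn_classify(features: np.ndarray, centres: np.ndarray,
                      weights: np.ndarray) -> int:
    """Toy fuzzy neural network: fuzzify the utterance-level feature,
    score classes with one dense layer, and return the argmax class."""
    mu = gaussian_membership(features.mean(axis=0), centres)  # fuzzy layer
    scores = mu @ weights                                     # dense scoring layer
    return int(np.argmax(scores))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(16000)        # 1 s of synthetic audio at 16 kHz
    feats = extract_features(segment(preprocess(speech)))
    centres = np.array([-6.0, -4.0, -2.0])     # assumed fuzzy-set centres
    weights = rng.standard_normal((3, 4))      # 4 classes, e.g. gender x age group
    print("predicted class:", fuzzy_nn_classify(feats, centres, weights))
```

In a real system the feature extractor would produce richer descriptors such as MFCCs, and the membership centres and layer weights would be learned from labelled speech rather than fixed by hand as they are in this sketch.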
Article Details
This work is licensed under a Creative Commons Attribution 4.0 International License.