Understanding Emotions with Deep Learning: A Model for Detecting Speech and Facial Expressions
Abstract
In recent years, artificial intelligence, machine learning, and human-machine interaction have advanced significantly. Voice interaction is an increasingly popular modality: people can command machines to perform specific tasks, and devices such as smartphones and smart speakers now ship with voice assistants like Siri, Alexa, Cortana, and Google Assistant. Despite these advances, machines remain limited as true conversational partners; they struggle to recognize human emotions and respond appropriately, which is why emotion recognition from speech has become a cutting-edge research topic in human-machine interaction. As machines become more ingrained in daily life, the demand for a more robust human-machine communication system grows, and many researchers are now dedicated to speech emotion recognition (SER) with the aim of improving interactions between humans and machines. The ultimate goal is a computer that can recognize emotional states and react to them much as humans do. Achieving this depends on accurately extracting emotional features from speech and employing effective classifiers. In this project, we focused on identifying four fundamental emotions from speech: anger, sadness, neutral, and happiness. To do so, we used a convolutional neural network (CNN) together with Mel-frequency cepstral coefficients (MFCCs) as the speech feature-extraction technique. Simulation results demonstrated that the proposed MFCC-CNN model outperforms existing approaches. This promising outcome brings us closer to an emotionally intelligent human-machine interaction system, paving the way for more natural and meaningful exchanges between humans and technology.
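To make the feature-extraction stage concrete, the following pure-NumPy sketch computes MFCCs from a raw waveform via the standard pipeline: pre-emphasis, framing, Hamming windowing, power spectrum, mel filterbank, log compression, and a DCT-II. All parameter values here (16 kHz sample rate, 25 ms frames, 10 ms hop, 26 mel filters, 13 coefficients) are common defaults chosen for illustration; the abstract does not specify the paper's actual MFCC configuration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_mels=26, n_ceps=13):
    """Compute an (n_frames, n_ceps) MFCC matrix from a 1-D waveform."""
    # Pre-emphasis: boost high frequencies to balance the spectrum.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)

    # Power spectrum of each frame (rfft zero-pads to n_fft).
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft

    # Triangular mel filterbank, equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)

    # Log mel energies, then DCT-II to decorrelate into cepstral coefficients.
    log_mel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_mels)))
    return log_mel @ basis.T

# Usage: one second of a 440 Hz tone at 16 kHz -> 98 frames x 13 coefficients.
sr = 16000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(feats.shape)  # (98, 13)
```

A matrix like this, treated as a 2-D time-frequency "image", is what makes a CNN a natural choice for the classification stage the abstract describes.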
Article Details
Licensing
TURCOMAT publishes articles under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This licensing allows for any use of the work, provided the original author(s) and source are credited, thereby facilitating the free exchange and use of research for the advancement of knowledge.
Detailed Licensing Terms
Attribution (BY): Users must give appropriate credit, provide a link to the license, and indicate if changes were made. Users may do so in any reasonable manner, but not in any way that suggests the licensor endorses them or their use.
No Additional Restrictions: Users may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.