Understanding Emotions with Deep Learning: A Model for Detecting Speech and Facial Expressions


Kondragunta Rama Krishnaiah, Alahari Hanumant Prasad

Abstract

Recent years have seen rapid progress in artificial intelligence, machine learning, and human-machine interaction. Voice interaction is increasingly common: people command machines to perform specific tasks, and devices such as smartphones and smart speakers now ship with voice assistants like Siri, Alexa, Cortana, and Google Assistant. Despite these advances, machines remain limited as true conversational partners because they struggle to recognize human emotions and respond appropriately, which has made emotion recognition from speech a cutting-edge research topic in human-machine interaction. As machines become more deeply embedded in daily life, the demand for a more robust human-machine communication system continues to grow. Many researchers now work on speech emotion recognition (SER) to improve interaction between humans and machines, with the ultimate goal of computers that can recognize emotional states and react to them much as humans do. Achieving this goal depends on accurately extracting emotional features from speech and on employing effective classifiers. In this work, we focus on identifying four fundamental emotions from speech: anger, sadness, neutral, and happiness. We use a convolutional neural network (CNN) with Mel Frequency Cepstral Coefficients (MFCC) as the speech feature-extraction technique. Simulation results demonstrate that the proposed MFCC-CNN model outperforms existing approaches. This promising outcome brings us closer to an emotionally intelligent human-machine interaction system, paving the way for more natural and meaningful exchanges between humans and technology.
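The pipeline described in the abstract first converts each utterance into MFCC features, which are then fed to a CNN classifier. As an illustrative sketch only (not the paper's implementation; the frame length, hop size, filter count, and coefficient count below are assumed values typical for 16 kHz speech), MFCC extraction can be written from first principles in NumPy:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale from 0 to Nyquist.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge of triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sr, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    # Pre-emphasis boosts the high frequencies that carry formant detail.
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(sig) - frame_len) // hop
    frames = np.stack([sig[i * hop : i * hop + frame_len] for i in range(n_frames)])
    frames *= np.hamming(frame_len)
    # Per-frame power spectrum.
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft
    # Mel filterbank energies -> log -> DCT-II yields the cepstral coefficients.
    energies = np.maximum(power @ mel_filterbank(n_filters, n_fft, sr).T, 1e-10)
    log_e = np.log(energies)
    n = np.arange(n_filters)
    dct_basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return log_e @ dct_basis.T   # shape: (n_frames, n_ceps)

# Example: one second of a synthetic 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440 * t), sr)
print(feats.shape)  # (98, 13): 98 frames, 13 coefficients each
```

The resulting (frames x coefficients) matrix is the kind of 2-D input a CNN classifier can consume, treating it like a single-channel image; in practice, libraries such as librosa provide optimized MFCC implementations.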


Article Details

How to Cite
Kondragunta Rama Krishnaiah, Alahari Hanumant Prasad. (2023). Understanding Emotions with Deep Learning: A Model for Detecting Speech and Facial Expressions. Turkish Journal of Computer and Mathematics Education (TURCOMAT), 13(03), 1445–1455. https://doi.org/10.17762/turcomat.v13i03.14011
Section
Research Articles