Speech emotion recognition using deep learning

Tanmoy Roy, Marwala Tshilidzi, Snehashish Chakraverty

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review



This chapter explores the deep neural network (DNN) approach to speech emotion recognition (SER). DNNs have solved difficult problems in artificial intelligence and its subdomains, such as computer vision. SER remains an unsolved problem, and researchers continue to propose models that address its open issues. The chapter first briefly reviews the existing deep learning (DL) approaches used in SER. It then builds a novel DL model whose results point toward more robust solutions for SER. The dataset is EmoDB, a popular dataset for SER research, and the data are augmented using a random displacement technique. The network is a feedforward neural network with four hidden layers. The model achieves an improvement of approximately 10% in cross-validation accuracy over models trained on nonaugmented data.
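The two technical ingredients named in the abstract, augmentation by random displacement and a feedforward network with four hidden layers, can be illustrated with a minimal sketch. It is not the chapter's implementation. The abstract does not define "random displacement", so the code assumes it means a random circular time shift of the signal. The function names, the hidden layer widths, the ReLU and softmax activations, and the 40-value input size are all hypothetical choices made for illustration. The seven output classes correspond to the seven emotion categories in EmoDB.

```python
import math
import random


def random_displacement(signal, max_shift, rng):
    """Return a copy of `signal` circularly shifted by a random offset.

    ASSUMPTION: "random displacement" is interpreted as a random time
    shift; the abstract does not specify the exact procedure.
    """
    if not signal:
        return []
    shift = rng.randint(-max_shift, max_shift)
    k = shift % len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def augment(dataset, copies, max_shift, seed=0):
    """Return the original (signal, label) pairs plus `copies` shifted
    versions of each signal, all keeping the original label."""
    rng = random.Random(seed)
    out = list(dataset)
    for signal, label in dataset:
        for _ in range(copies):
            out.append((random_displacement(signal, max_shift, rng), label))
    return out


def init_mlp(n_in, hidden_sizes, n_out, seed=0):
    """Create randomly initialised (weights, biases) for each layer."""
    rng = random.Random(seed)
    sizes = [n_in] + list(hidden_sizes) + [n_out]
    layers = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        scale = (2.0 / fan_in) ** 0.5
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def forward(layers, x):
    """Forward pass: ReLU on hidden layers, softmax on the output layer.

    ASSUMPTION: the activation functions are illustrative choices.
    """
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sizes: 40 input features, four hidden layers, 7 emotions.
model = init_mlp(n_in=40, hidden_sizes=[64, 64, 32, 32], n_out=7)
probabilities = forward(model, [0.1] * 40)
```

In practice, the augmented set would be built only from the training folds, so that shifted copies of a validation utterance never leak into training during cross-validation.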

Original language: English
Title of host publication: New Paradigms in Computational Modeling and Its Applications
Number of pages: 11
ISBN (Electronic): 9780128221334
ISBN (Print): 9780128221686
Publication status: Published - 1 Jan 2021


Keywords

  • Data augmentation
  • Deep neural network
  • Feature extraction
  • Speech emotion recognition

ASJC Scopus subject areas

  • General Biochemistry, Genetics and Molecular Biology

