Abstract
This chapter explores the deep neural network (DNN) approach to speech emotion recognition (SER). DNNs are solving difficult problems in artificial intelligence and its subdomains, such as computer vision. SER remains an unsolved problem, and researchers continue to propose models that address the open issues. The chapter first briefly reviews the existing deep learning (DL) approaches used in SER. It then builds a novel model using a DL architecture, with results that point toward more robust solutions for SER. The dataset used is EmoDB, a popular dataset for SER research, and the data are augmented using a random displacement technique. The network model is a feedforward neural network with four hidden layers. The model achieves an approximately 10% improvement in cross-validation accuracy over models trained on nonaugmented data.
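The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the abstract does not define "random displacement", so it is assumed here to mean a random time shift with zero padding, and the layer widths, input size, and function names (`random_displacement`, `init_mlp`, `forward`) are hypothetical. Only the four hidden layers come from the abstract; the seven output classes correspond to the seven emotion categories in EmoDB. The toy input vector stands in for extracted speech features.

```python
import math
import random


def random_displacement(signal, max_shift, rng):
    """Shift `signal` by a random offset, zero-padding the vacated samples.

    Assumption: "random displacement" is read as a random time shift;
    the chapter may define the technique differently.
    """
    n = len(signal)
    max_shift = min(max_shift, n)  # keep the output length equal to n
    shift = rng.randint(-max_shift, max_shift)
    if shift >= 0:
        shifted = [0.0] * shift + list(signal[: n - shift])
    else:
        shifted = list(signal[-shift:]) + [0.0] * (-shift)
    return shifted, shift


def init_mlp(layer_sizes, rng):
    """Create one (weights, biases) pair per layer with small random weights."""
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, features):
    """Forward pass: ReLU on hidden layers, softmax on the output layer."""
    activation = list(features)
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activation = softmax(z) if is_output else [max(0.0, v) for v in z]
    return activation


rng = random.Random(0)
# Hypothetical sizes: 40 inputs, four hidden layers, 7 emotion classes.
layers = init_mlp([40, 64, 64, 32, 32, 7], rng)
toy_input = [math.sin(0.3 * i) for i in range(40)]  # placeholder, not real features
augmented, shift = random_displacement(toy_input, 5, rng)
probs = forward(layers, augmented)
```

With untrained weights the class probabilities in `probs` are meaningless; the sketch only shows how an augmented copy of an input passes through a network with four hidden layers. In practice the augmentation would be applied to the training recordings before feature extraction, and the weights would be learned.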
Original language | English |
---|---|
Title of host publication | New Paradigms in Computational Modeling and Its Applications |
Publisher | Elsevier |
Pages | 177-187 |
Number of pages | 11 |
ISBN (Electronic) | 9780128221334 |
ISBN (Print) | 9780128221686 |
Publication status | Published - 1 Jan 2021 |
Keywords
- Data augmentation
- Deep neural network
- Feature extraction
- Speech emotion recognition
ASJC Scopus subject areas
- General Biochemistry, Genetics and Molecular Biology