Abstract
Speech emotion recognition (SER) is a machine learning problem in which speech utterances are classified according to their underlying emotions. This chapter presents an overview of the prominent classification techniques used in SER. Classifiers used in SER fall into two broad categories: linear and non-linear. The chapter details the feature sets that have been heavily used in SER research and have performed well in the classification stage. Speech features used in SER fall into three prominent categories: prosodic features, spectral or vocal tract features, and excitation source features. The chapter discusses the four most prominent classifiers for SER, namely the hidden Markov model, the Gaussian mixture model, the support vector machine, and the deep neural network, to illustrate SER-specific implementation techniques. It also describes the difficulties encountered in SER studies.
| Original language | English |
|---|---|
| Title of host publication | Mathematical Methods in Interdisciplinary Sciences |
| Publisher | Wiley |
| Pages | 33-48 |
| Number of pages | 16 |
| ISBN (Electronic) | 9781119585640 |
| ISBN (Print) | 9781119585503 |
| DOIs | |
| Publication status | Published - 1 Jan 2020 |
Keywords
- Gaussian mixture model
- deep neural network
- hidden Markov model
- speech emotion recognition
- speech features
- support vector machine
ASJC Scopus subject areas
- General Mathematics