A Survey of Classification Techniques in Speech Emotion Recognition

Tanmoy Roy, Tshilidzi Marwala, Snehashish Chakraverty

Research output: Chapter in Book/Report/Conference proceeding > Chapter > peer-review

13 Citations (Scopus)

Abstract

Speech emotion recognition (SER) is a machine learning problem in which speech utterances are classified according to their underlying emotions. This chapter presents an overview of the prominent classification techniques used in SER. Classifiers used in SER fall into two broad categories: linear and non-linear. The chapter details the feature sets that have been used most heavily in SER research and that have performed well in the classification stage. Speech features used in SER fall into three prominent categories: prosodic features, spectral (vocal tract) features, and excitation source features. The chapter discusses the four most prominent classifiers, namely the hidden Markov model, Gaussian mixture model, support vector machine, and deep neural network, to illustrate SER-specific implementation techniques. It also describes the difficulties encountered in SER studies.

Original language: English
Title of host publication: Mathematical Methods in Interdisciplinary Sciences
Publisher: Wiley
Pages: 33-48
Number of pages: 16
ISBN (Electronic): 9781119585640
ISBN (Print): 9781119585503
DOIs
Publication status: Published - 1 Jan 2020

Keywords

  • Gaussian mixture model
  • deep neural network
  • hidden Markov model
  • speech emotion recognition
  • speech features
  • support vector machine

ASJC Scopus subject areas

  • General Mathematics
