Precise detection of speech endpoints dynamically: A wavelet convolution based approach

Tanmoy Roy, Tshilidzi Marwala, Snehashish Chakraverty

Research output: Contribution to journal › Article › peer-review

13 Citations (Scopus)

Abstract

Precise detection of speech endpoints is an important factor affecting the performance of systems in which speech utterances must be extracted from the speech signal, such as Automatic Speech Recognition (ASR) systems. Existing endpoint detection (EPD) methods mostly use Short-Term Energy (STE) and Zero-Crossing Rate (ZCR) based approaches and their variants. However, STE- and ZCR-based EPD algorithms often fail in the presence of Non-speech Sound Artifacts (NSAs) produced by the speakers. Pattern recognition and classification techniques have also been applied, but those methods require labeled data for training. In this article, a novel approach to extracting speech endpoints is proposed, termed Wavelet Convolution based Speech Endpoint Detection (WCSED). WCSED decomposes the speech signal into high-frequency and low-frequency components using wavelet convolution and then computes information-entropy based thresholds for the two frequency components. The low-frequency thresholds are used to extract voiced speech segments, whereas the high-frequency thresholds are used to extract unvoiced speech segments by filtering out the NSAs. WCSED does not require any labeled data for training and can extract speech segments automatically. Experiments are carried out on two speech databases, and the results are promising even in the presence of NSAs.
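
The decomposition step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the complex Morlet wavelet, the 200 Hz and 3000 Hz centre frequencies, the helper names, and the histogram-based entropy estimate are all hypothetical choices. The abstract does not say which wavelet or bands WCSED uses, nor how entropy is turned into thresholds, so the sketch stops at one envelope per band and an entropy value for each.

```python
import numpy as np

def morlet_wavelet(center_hz, fs, n_cycles=6.0):
    """Complex Morlet wavelet centred on center_hz (assumed wavelet family)."""
    sigma_t = n_cycles / (2.0 * np.pi * center_hz)   # Gaussian width in seconds
    t = np.arange(-4.0 * sigma_t, 4.0 * sigma_t, 1.0 / fs)
    wavelet = np.exp(2j * np.pi * center_hz * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
    return wavelet / np.sum(np.abs(wavelet))         # normalise to unit L1 norm

def band_envelope(signal, center_hz, fs):
    """Magnitude of the signal convolved with the wavelet (band activity)."""
    wavelet = morlet_wavelet(center_hz, fs)
    return np.abs(np.convolve(signal, wavelet, mode="same"))

def shannon_entropy(values, bins=32):
    """Shannon entropy in bits of the histogram of values (assumed estimator)."""
    counts, _ = np.histogram(values, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# Synthetic stand-in for speech: a low tone followed by a high tone.
fs = 8000
t = np.arange(fs) / fs
x = np.where(t < 0.5, np.sin(2 * np.pi * 200 * t), np.sin(2 * np.pi * 3000 * t))

low_env = band_envelope(x, 200.0, fs)     # hypothetical "low-frequency" band
high_env = band_envelope(x, 3000.0, fs)   # hypothetical "high-frequency" band
low_entropy = shannon_entropy(low_env)
high_entropy = shannon_entropy(high_env)
```

In this toy signal the low-band envelope is large only in the first half and the high-band envelope only in the second, which is the kind of separation a per-band threshold would act on. Any rule that maps `low_entropy` or `high_entropy` to an actual threshold would have to come from the paper itself.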

Original language: English
Pages (from-to): 162-175
Number of pages: 14
Journal: Communications in Nonlinear Science and Numerical Simulation
Volume: 67
Publication status: Published - Feb 2019

Keywords

  • Pattern recognition
  • Signal processing
  • Speech endpoint detection
  • Speech recognition
  • Wavelet convolution

ASJC Scopus subject areas

  • Numerical Analysis
  • Modeling and Simulation
  • Applied Mathematics
