Synthetic Speech Data Generation Using Generative Adversarial Networks

Michael Norval, Zenghui Wang, Yanxia Sun

Research output: Chapter in Book/Report/Conference proceeding, Chapter, peer-review


The capabilities of artificial intelligence (AI) and deep learning are growing rapidly with increasing computing power and specialized microprocessors. Generative adversarial networks (GANs), a particularly interesting architecture, are at the forefront of this innovation. GANs are used, for example, for text-to-image translation, image editing and manipulation, recreating images at higher resolution, and creating three-dimensional objects. For audio, Google's WaveNet and Parallel WaveNet, together with Tacotron and Tacotron 2, are the frameworks of choice for creating synthetic audio. Where there is not enough training data, data can be generated synthetically for further research and training. Methodologically, quality data samples can be synthetically generated for any language; this paper showcases data generation for Afrikaans. We used a trained network to create Afrikaans speech clips from text. When the same sentence is generated multiple times, the resulting clips have different emotional states. These clips are then verified, categorized, and used to train another network.
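The data-generation loop described in the abstract (synthesize the same sentence several times, verify each clip, categorize it, and keep it as training data) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: `build_dataset`, `toy_synthesize`, `toy_verify`, `toy_categorize`, the `takes` parameter, and the loudness-based "calm"/"excited" labels are all assumptions. A real pipeline would call a trained text-to-speech model (such as a Tacotron 2 system) and a proper verification and emotion-classification step instead of the random stand-ins.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

# A waveform is represented here as a plain list of float samples.
Waveform = List[float]


@dataclass
class Clip:
    text: str        # the sentence that was synthesized
    take: int        # which repetition of the sentence this is
    audio: Waveform  # synthesized samples
    label: str       # category assigned after verification


def build_dataset(
    sentences: List[str],
    synthesize: Callable[[str], Waveform],
    verify: Callable[[str, Waveform], bool],
    categorize: Callable[[Waveform], str],
    takes: int = 3,
) -> Dict[str, List[Clip]]:
    """Synthesize each sentence several times, keep verified clips, group by label."""
    dataset: Dict[str, List[Clip]] = defaultdict(list)
    for sentence in sentences:
        for take in range(takes):
            audio = synthesize(sentence)     # each call may sound different
            if not verify(sentence, audio):  # discard unusable clips
                continue
            label = categorize(audio)
            dataset[label].append(Clip(sentence, take, audio, label))
    return dict(dataset)


# --- Hypothetical stand-ins; a real setup would call a trained TTS model. ---
rng = random.Random(0)


def toy_synthesize(text: str) -> Waveform:
    # Random gain mimics take-to-take variation; length scales with the text.
    gain = rng.uniform(0.0, 1.0)
    return [gain * rng.uniform(-1.0, 1.0) for _ in range(10 * len(text))]


def toy_verify(text: str, audio: Waveform) -> bool:
    # Reject empty or near-silent clips.
    return len(audio) > 0 and max(abs(s) for s in audio) > 0.05


def toy_categorize(audio: Waveform) -> str:
    # Crude proxy for emotional state: louder clips "excited", quieter "calm".
    energy = sum(abs(s) for s in audio) / len(audio)
    return "excited" if energy > 0.25 else "calm"


if __name__ == "__main__":
    sentences = ["Goeie môre.", "Hoe gaan dit met jou?"]
    data = build_dataset(sentences, toy_synthesize, toy_verify, toy_categorize)
    for label, clips in sorted(data.items()):
        print(label, [(c.text, c.take) for c in clips])
```

Passing the synthesizer, verifier, and categorizer in as callables keeps the loop independent of any particular text-to-speech framework, so the stand-ins can be swapped for real components without changing `build_dataset`.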

Original language: English
Title of host publication: Signals and Communication Technology
Publisher: Springer Science and Business Media Deutschland GmbH
Number of pages: 10
Publication status: Published - 2024

Publication series

Name: Signals and Communication Technology
Volume: Part F2203
ISSN (Print): 1860-4862
ISSN (Electronic): 1860-4870

Keywords

  • Artificial intelligence
  • Generative adversarial networks
  • Synthesize
  • Tacotron 2
  • WaveNET

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Signal Processing
  • Computer Networks and Communications
  • Electrical and Electronic Engineering

