TY - CHAP
T1 - Synthetic Speech Data Generation Using Generative Adversarial Networks
AU - Norval, Michael
AU - Wang, Zenghui
AU - Sun, Yanxia
N1 - Publisher Copyright: © 2024, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2024
Y1 - 2024
N2 - The capabilities of artificial intelligence (AI) and deep learning are increasing rapidly as computing power grows and specialized microprocessors become available. Generative adversarial networks (GANs) are one architecture at the forefront of this innovation. GANs are used for text-to-image translation, image editing and manipulation, reconstructing images at higher resolution, and creating three-dimensional objects. For audio, Google WaveNET, Parallel WaveNET, and their successors Tacotron 1 and 2 are the frameworks of choice for creating synthetic audio. Where there is not enough training data, data can be generated synthetically for further research and training. In terms of methodology, qualitative data samples can be synthetically generated for any language. This paper showcases data generation for the Afrikaans language: a trained network is used to create Afrikaans speech clips from text. When the same sentence is generated multiple times, the resulting clips have different emotional states. These clips are then verified, categorized, and used to train another network.
AB - The capabilities of artificial intelligence (AI) and deep learning are increasing rapidly as computing power grows and specialized microprocessors become available. Generative adversarial networks (GANs) are one architecture at the forefront of this innovation. GANs are used for text-to-image translation, image editing and manipulation, reconstructing images at higher resolution, and creating three-dimensional objects. For audio, Google WaveNET, Parallel WaveNET, and their successors Tacotron 1 and 2 are the frameworks of choice for creating synthetic audio. Where there is not enough training data, data can be generated synthetically for further research and training. In terms of methodology, qualitative data samples can be synthetically generated for any language. This paper showcases data generation for the Afrikaans language: a trained network is used to create Afrikaans speech clips from text. When the same sentence is generated multiple times, the resulting clips have different emotional states. These clips are then verified, categorized, and used to train another network.
KW - Artificial intelligence
KW - Generative adversarial networks
KW - Synthesize
KW - Tacotron 2
KW - WaveNET
UR - http://www.scopus.com/inward/record.url?scp=85183371007&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-47100-1_11
DO - 10.1007/978-3-031-47100-1_11
M3 - Chapter
AN - SCOPUS:85183371007
T3 - Signals and Communication Technology
SP - 117
EP - 126
BT - Signals and Communication Technology
PB - Springer Science and Business Media Deutschland GmbH
ER -