The news in black and white: word embeddings quantify racism in South African news

Research output: Contribution to journal › Article › peer-review

Abstract

Does race bias manifest in South African news, and how can computational methods like word embeddings reveal it? After apartheid’s end in 1994, South Africa implemented policies to address racial and economic divides and transform institutions and structures, including the news media. This study introduces a computational approach to quantify race bias in South African news using neural embeddings. We trained word2vec word embeddings on COVID-19 vaccination news articles from 76 South African news sources. These large-scale embeddings are unbiased by design but can detect and reveal hidden biases in language. We found consistent race bias in the coverage of socioeconomic phenomena, while health results were weaker, mixed and likely corpus-dependent. COVID-19 may have also amplified associations between “Black” and unhealthy terms in news coverage. Our methodology complements traditional qualitative techniques and allows for a more objective and representative way of investigating racism in South African news. Findings are validated through multiple methods, including human ratings, and have implications for South African news and this research field.
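To make the idea of quantifying bias with embeddings concrete, the following is a minimal sketch of one common approach: comparing how close a target word sits to two sets of group terms using cosine similarity. It is an illustration under stated assumptions, not the study's actual procedure; the function names (`cosine`, `mean_vector`, `association_bias`) and the three-dimensional toy vectors are hypothetical and stand in for vectors that would come from a trained word2vec model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise average of a list of vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def association_bias(word_vec, group_a_vecs, group_b_vecs):
    """Relative association of a word with group A versus group B.

    Positive values mean the word is closer to the average group A
    vector; negative values mean it is closer to group B.
    """
    sim_a = cosine(word_vec, mean_vector(group_a_vecs))
    sim_b = cosine(word_vec, mean_vector(group_b_vecs))
    return sim_a - sim_b


# Hypothetical toy vectors (assumption: real vectors would be read
# from a trained embedding model and have far more dimensions).
toy = {
    "group_a_1": [1.0, 0.1, 0.0],
    "group_a_2": [0.9, 0.2, 0.0],
    "group_b_1": [0.1, 1.0, 0.0],
    "group_b_2": [0.2, 0.9, 0.0],
    "target": [0.8, 0.3, 0.1],
}

score = association_bias(
    toy["target"],
    [toy["group_a_1"], toy["group_a_2"]],
    [toy["group_b_1"], toy["group_b_2"]],
)
print(f"bias score for target word: {score:.3f}")
```

In this toy setup the target vector points mostly along the same direction as the group A terms, so the score comes out positive. With real embeddings, the same comparison would be repeated over many target words (for example, socioeconomic or health terms) and checked against other evidence such as human ratings, as the abstract describes.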

Original language: English
Article number: 83
Journal: EPJ Data Science
Volume: 14
Issue number: 1
Publication status: Published - Dec 2025

Keywords

  • COVID-19 vaccination
  • Computational social science
  • Natural language processing
  • News media
  • Race bias
  • South Africa
  • Speaker names
  • Word embedding

ASJC Scopus subject areas

  • Modeling and Simulation
  • Computer Science Applications
  • Computational Mathematics
