Anchoring race: improving the construction of race dimensions in word embeddings

Research output: Contribution to journal › Article › peer-review

Abstract

Word embeddings have become powerful tools for detecting social biases encoded in language, yet research on measuring race bias through embeddings remains underdeveloped compared to studies on gender bias. This gap largely stems from the complexity of constructing race dimensions, which involve socially contested meanings and less clear semantic oppositions. Existing studies on race bias often rely on intuition and context-specific approaches when choosing anchor terms. In this paper, we address this methodological gap by providing statistical metrics to evaluate the quality and adaptability of race categories in embeddings. We apply these metrics to race categories across three embeddings: Google News (U.S.-centric), South African News (South African context), and Wikipedia (neutral, general-purpose). We find that names are effective for constructing race dimensions: sub-Saharan African (SSA)/European name categories produce more stable and generalisable dimensions than other categories, while American names are less generalisable. Validation shows that SSA/European name embeddings correlate most strongly with human ratings and demonstrates that our metrics capture the human-perceived semantic structure of race. This research provides a framework for constructing robust race dimensions for measuring race bias in word embeddings.
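
To make the idea of a "race dimension" built from anchor terms concrete, the following is a minimal sketch of one common construction: the normalized difference between the mean vectors of two anchor sets, onto which other words are then projected. It is not taken from the paper. The difference-of-means approach, the function names, the placeholder anchor labels, and the three-dimensional toy vectors are all assumptions for illustration, and the paper's statistical metrics are not reproduced here.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def build_dimension(embeddings, pole_a, pole_b):
    """Hypothetical helper: unit-length difference between the mean
    unit vectors of two anchor sets (e.g. two lists of names)."""
    a = mean_vector([normalize(embeddings[w]) for w in pole_a])
    b = mean_vector([normalize(embeddings[w]) for w in pole_b])
    return normalize([x - y for x, y in zip(a, b)])

def project(embeddings, word, dimension):
    """Cosine similarity between a word vector and the dimension.
    Positive values lean toward pole_a, negative toward pole_b."""
    v = normalize(embeddings[word])
    return sum(x * y for x, y in zip(v, dimension))

# Toy vectors standing in for a real embedding (assumed values only).
toy = {
    "anchor_a1": [1.0, 0.1, 0.0],
    "anchor_a2": [0.9, 0.0, 0.1],
    "anchor_b1": [-1.0, 0.1, 0.0],
    "anchor_b2": [-0.9, 0.0, 0.1],
    "target_x": [0.5, 0.5, 0.0],
    "target_y": [-0.5, 0.5, 0.0],
}

axis = build_dimension(toy, ["anchor_a1", "anchor_a2"], ["anchor_b1", "anchor_b2"])
score_x = project(toy, "target_x", axis)
score_y = project(toy, "target_y", axis)
print(score_x > 0, score_y < 0)
```

With a real embedding, the dictionary would be replaced by lookups into the trained vectors, and the choice of anchor names is exactly the step whose quality the abstract's metrics are meant to evaluate.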

Original language: English
Article number: 20
Journal: Journal of Computational Social Science
Volume: 9
Issue number: 1
Publication status: Published - Feb 2026

Keywords

  • Computational social science
  • Cultural measurement
  • Race axes
  • Race bias
  • Word embeddings

ASJC Scopus subject areas

  • Transportation
  • Artificial Intelligence
