How Good Are Large Language Models at Arithmetic Reasoning in Low-Resource Language Settings?—A Study on Yorùbá Numerical Probes with Minimal Contamination

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

We study the performance of large language models (LLMs) on natural language understanding and natural language reasoning tasks in a low-resource language (LRL) setting. Using Yorùbá, an LRL, we curated a set of numerical probes with minimal contamination. The probes comprise three sets of questions: the first covers basic arithmetic, the second covers dates and times (the calendar system), and the third focuses on numerals and counting systems. Three LLMs (ChatGPT, Gemini, and PaLM) were evaluated in a zero-shot setup using several metrics. The best-performing model, ChatGPT, generated some correct answers and showed logical steps towards those answers in Yorùbá (with an accuracy of 56% on set one and 44% on set two). The second-best model was Gemini (56% on set one and 32% on set two). PaLM (16% on set one and 8% on set two) produced answers without showing its reasoning. All three models performed poorly on the Yorùbá numerals question set (ChatGPT scored 8%, while Gemini and PaLM each scored 0%). The study also revealed significant room for improvement in state-of-the-art LLMs when it comes to Yorùbá numerals.
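The zero-shot evaluation described in the abstract amounts to querying each model with a probe question (no in-context examples) and scoring the fraction of exact-match answers per question set. A minimal sketch in Python; the probe questions, gold answers, and `query_model` function are placeholders for illustration, not the paper's actual data or setup:

```python
# Sketch of per-set accuracy scoring for zero-shot numerical probes.
# All probe data and the model call below are hypothetical placeholders.

def accuracy(predictions, gold):
    """Fraction of predictions exactly matching the gold answers."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Hypothetical Yorùbá probe sets: (question, gold answer) pairs.
probe_sets = {
    "arithmetic": [("Kí ni 2 + 3?", "5"), ("Kí ni 10 - 4?", "6")],
    "date_time":  [("Ọjọ́ mélòó ni ó wà nínú ọ̀sẹ̀ kan?", "7")],
}

def query_model(question):
    # Placeholder for a zero-shot LLM call (the prompt contains only
    # the question itself, no worked examples).
    return "5" if "2 + 3" in question else "?"

for name, probes in probe_sets.items():
    preds = [query_model(q) for q, _ in probes]
    gold = [a for _, a in probes]
    print(f"{name}: accuracy = {accuracy(preds, gold):.0%}")
```

In a real evaluation, `query_model` would wrap the API of each assessed model, and the exact-match check might be relaxed to handle formatting variation in the model's answer.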

Original language: English
Article number: 4459
Journal: Applied Sciences (Switzerland)
Volume: 15
Issue number: 8
DOIs
Publication status: Published - Apr 2025

Keywords

  • large language models
  • low-resource languages
  • natural language reasoning
  • natural language understanding
  • Yorùbá numerical probes

ASJC Scopus subject areas

  • General Materials Science
  • Instrumentation
  • General Engineering
  • Process Chemistry and Technology
  • Computer Science Applications
  • Fluid Flow and Transfer Processes
