Modifying Class Distributions to Improve the Classification of Minority Group Examples in a Class-Imbalanced Dataset

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Class-imbalanced datasets are common in real-world applications. The imbalance between minority and majority classes arises when one class is over-represented relative to another in a dataset. The class imbalance might reflect a system’s behaviour over time. However, it causes sub-optimal performance in machine learning models that predict the system’s future behaviour. Various techniques are used to reduce the negative impact of class-imbalanced datasets on machine learning models. Data resampling is one of the main approaches, and its subdivisions include oversampling and undersampling. Oversampling techniques have outperformed undersampling techniques in most studies, and most data resampling techniques are derived from oversampling. However, some oversampling techniques are ineffective when used on minority-class datasets that lack within-class variation and have a high class imbalance. In this study, an analysis was performed on nine datasets to understand how within-class variation changes before and after oversampling. Additionally, classification performance was measured for datasets oversampled with standard and hybrid techniques. A novel hybrid oversampling technique that uses k-Means and ADASYN was implemented. Across the nine datasets, hybrid oversampling techniques generated synthetic examples that only marginally changed the within-class variation and achieved the highest F1 score compared to standard oversampling techniques.

Original language: English
Title of host publication: Artificial Intelligence Research - 6th Southern African Conference, SACAIR 2025, Proceedings
Editors: Aurona Gerber, Anban W. Pillay
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 112-126
Number of pages: 15
ISBN (Print): 9783032117328
DOIs
Publication status: Published - 2026
Event: 6th Southern African Conference for Artificial Intelligence Research, SACAIR 2025 - Cape Town, South Africa
Duration: 1 Dec 2025 to 5 Dec 2025

Publication series

Name: Communications in Computer and Information Science
Volume: 2784 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 6th Southern African Conference for Artificial Intelligence Research, SACAIR 2025
Country/Territory: South Africa
City: Cape Town
Period: 1/12/25 to 5/12/25

Keywords

  • ADASYN
  • Class imbalance
  • Classification algorithm
  • Oversampling
  • Within-class variation

ASJC Scopus subject areas

  • General Computer Science
  • General Mathematics
