Performance analysis of cost-sensitive learning methods with application to imbalanced medical data

Ibomoiye Domor Mienye, Yanxia Sun

Research output: Contribution to journalArticlepeer-review

87 Citations (Scopus)

Abstract

Many real-world machine learning applications require building models using highly imbalanced datasets. Usually, in medical datasets, the healthy patients or samples are dominant, making them the majority class, while the sick patients are few, making them the minority class. Researchers have proposed numerous machine learning methods to predict medical diagnosis. Still, the class imbalance problem makes it difficult for classifiers to adequately learn and distinguish between the minority and majority classes. Cost-sensitive learning and resampling techniques are used to deal with the class imbalance problem. This research focuses on developing robust cost-sensitive classifiers by modifying the objective functions of some well-known algorithms, such as logistic regression, decision tree, extreme gradient boosting, and random forest, which are then used to efficiently predict medical diagnosis. Meanwhile, as opposed to resampling techniques, our approach does not alter the original data distribution. Firstly, we implement the standard versions of these algorithms to provide a baseline for performance comparison. Secondly, we develop their corresponding cost-sensitive algorithms. For the proposed approaches, it is not necessary to change the distribution of the original data as the modified algorithms consider the imbalanced class distribution during training, thereby resulting in more reliable performance than when the data is resampled. Four popular medical datasets, including the Pima Indians Diabetes, Haberman Breast Cancer, Cervical Cancer Risk Factors, and Chronic Kidney Disease datasets, are used in the experiments to validate the performance of the proposed approach. The experimental results show that the cost-sensitive methods yield superior performance compared to the standard algorithms.

Original languageEnglish
Article number100690
JournalInformatics in Medicine Unlocked
Volume25
DOIs
Publication statusPublished - Jan 2021

Keywords

  • Cost-sensitive learning
  • Imbalanced classification
  • Machine learning
  • Medical diagnosis

ASJC Scopus subject areas

  • Health Informatics

Fingerprint

Dive into the research topics of 'Performance analysis of cost-sensitive learning methods with application to imbalanced medical data'. Together they form a unique fingerprint.

Cite this