Effective Feature Selection for Improved Prediction of Heart Disease

Ibomoiye Domor Mienye, Yanxia Sun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

11 Citations (Scopus)

Abstract

Heart disease is among the most prevalent medical conditions globally, and early diagnosis is vital to reducing the number of deaths. Machine learning (ML) has been used to identify people at risk of heart disease. Feature selection and data resampling are crucial for obtaining a reduced feature set and balanced data, which in turn improve classifier performance, and estimating the optimum feature subset is a fundamental issue in most ML applications. This study first employs the hybrid Synthetic Minority Oversampling Technique-Edited Nearest Neighbor (SMOTE-ENN) to balance the heart disease dataset. Second, it selects the most relevant features for the prediction of heart disease. Feature selection is achieved using multiple base algorithms at the core of the recursive feature elimination (RFE) technique. The relevant features identified by the various RFE implementations are then combined using set theory to obtain the optimum feature subset. The reduced feature set is used to build six ML models using logistic regression, decision tree, random forest, linear discriminant analysis, naïve Bayes, and extreme gradient boosting (XGBoost) algorithms. We conduct experiments using the complete and reduced feature sets. The results show that data resampling and feature selection lead to improved classifier performance. The XGBoost classifier achieved the best performance with an accuracy of 95.6%. Compared to some recently developed heart disease prediction methods, our approach obtains superior performance.
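
The selection step described in the abstract (several RFE runs, each driven by a different base algorithm, with the resulting subsets combined by set operations) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the feature names and importance tables are hypothetical stand-ins for fitted models, and because the abstract does not say which set operation is used, the vote threshold `min_votes` is an assumption that covers union, intersection, and majority voting.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set

# A ranker maps the remaining features to importance scores.
# In practice this would refit a model each round; here it is a stand-in.
Ranker = Callable[[Sequence[str]], Dict[str, float]]


def rfe(features: Sequence[str], rank: Ranker, n_keep: int) -> Set[str]:
    """Recursive feature elimination: drop the weakest feature until n_keep remain."""
    remaining = list(features)
    while len(remaining) > n_keep:
        scores = rank(remaining)
        remaining.remove(min(remaining, key=scores.__getitem__))
    return set(remaining)


def table_ranker(table: Dict[str, float]) -> Ranker:
    """Hypothetical ranker backed by a fixed importance table (assumption)."""
    def rank(remaining: Sequence[str]) -> Dict[str, float]:
        return {f: table[f] for f in remaining}
    return rank


def combine(subsets: List[Set[str]], min_votes: int) -> Set[str]:
    """Keep features chosen by at least min_votes RFE runs.

    min_votes=1 gives the union, min_votes=len(subsets) the intersection.
    """
    votes = Counter(f for s in subsets for f in s)
    return {f for f, n in votes.items() if n >= min_votes}


# Hypothetical feature names and importance values, for illustration only.
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "thalach"]
IMPORTANCES = {
    "logistic_regression": {"age": 0.30, "sex": 0.10, "cp": 0.50,
                            "trestbps": 0.05, "chol": 0.20, "thalach": 0.40},
    "decision_tree": {"age": 0.25, "sex": 0.02, "cp": 0.45,
                      "trestbps": 0.15, "chol": 0.03, "thalach": 0.35},
    "random_forest": {"age": 0.05, "sex": 0.20, "cp": 0.40,
                      "trestbps": 0.10, "chol": 0.30, "thalach": 0.35},
}

# One RFE run per base algorithm, each keeping three features.
subsets = [rfe(FEATURES, table_ranker(t), n_keep=3) for t in IMPORTANCES.values()]

union = combine(subsets, min_votes=1)
intersection = combine(subsets, min_votes=len(subsets))
majority = combine(subsets, min_votes=2)
```

With these made-up tables, the intersection keeps only the features every run agrees on, while the union keeps anything selected at least once; which trade-off the paper actually adopts is not stated in the abstract.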

Original language: English
Title of host publication: Pan-African Artificial Intelligence and Smart Systems - 1st International Conference, PAAISS 2021, Proceedings
Editors: Telex Magloire Ngatched, Isaac Woungang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 94-107
Number of pages: 14
ISBN (Print): 9783030933135
Publication status: Published - 2022
Event: 1st International Conference on Pan-African Intelligence and Smart Systems, PAAISS 2021 - Windhoek, Namibia
Duration: 6 Sept 2021 - 8 Sept 2021

Publication series

Name: Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST
Volume: 405 LNICST
ISSN (Print): 1867-8211
ISSN (Electronic): 1867-822X

Conference

Conference: 1st International Conference on Pan-African Intelligence and Smart Systems, PAAISS 2021
Country/Territory: Namibia
City: Windhoek
Period: 6/09/21 - 8/09/21

Keywords

  • Feature selection
  • Heart disease
  • Machine learning
  • SMOTE-ENN
  • XGBoost

ASJC Scopus subject areas

  • Computer Networks and Communications
