Abstract
Heart disease is among the most prevalent medical conditions globally, and early diagnosis is vital to reducing the number of deaths. Machine learning (ML) has been used to identify people at risk of heart disease. Feature selection and data resampling are crucial for obtaining a reduced feature set and balanced data, both of which improve classifier performance, and estimating the optimum feature subset is a fundamental issue in most ML applications. First, this study employs the hybrid Synthetic Minority Oversampling Technique-Edited Nearest Neighbor (SMOTE-ENN) method to balance the heart disease dataset. Second, it selects the most relevant features for the prediction of heart disease. Feature selection is achieved using multiple base algorithms at the core of the recursive feature elimination (RFE) technique. The relevant features selected by the various RFE implementations are then combined using set theory to obtain the optimum feature subset. The reduced feature set is used to build six ML models using the logistic regression, decision tree, random forest, linear discriminant analysis, naïve Bayes, and extreme gradient boosting (XGBoost) algorithms. We conduct experiments using both the complete and the reduced feature sets. The results show that data resampling and feature selection lead to improved classifier performance. The XGBoost classifier achieved the best performance, with an accuracy of 95.6%. Compared to some recently developed heart disease prediction methods, our approach obtains superior performance.
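
The step in which several RFE runs are merged "using set theory" can be illustrated with a minimal sketch. The abstract does not say which set operation is used, so the sketch offers intersection, union, and majority vote as possible readings. The function name `combine_feature_subsets`, the feature names, and the per-estimator subsets are hypothetical placeholders, not the authors' implementation or results.

```python
from collections import Counter


def combine_feature_subsets(subsets, mode="intersection"):
    """Combine feature subsets chosen by several RFE runs.

    subsets: dict mapping a base-estimator name to the features it kept.
    mode: "intersection", "union", or "majority" (assumed options; the
    abstract does not state which operation is used).
    """
    sets = [set(features) for features in subsets.values()]
    if not sets:
        return set()
    if mode == "intersection":
        # Keep only the features every base estimator agreed on.
        return set.intersection(*sets)
    if mode == "union":
        # Keep every feature that any base estimator selected.
        return set.union(*sets)
    if mode == "majority":
        # Keep features selected by more than half of the estimators.
        counts = Counter(f for s in sets for f in s)
        return {f for f, c in counts.items() if c > len(sets) / 2}
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical RFE outputs, one list per base estimator (illustrative only).
rfe_selected = {
    "logistic_regression": ["age", "cp", "thalach", "oldpeak", "ca", "chol"],
    "decision_tree": ["cp", "thalach", "oldpeak", "ca", "thal"],
    "random_forest": ["age", "cp", "oldpeak", "ca", "thal"],
}

common = combine_feature_subsets(rfe_selected, "intersection")
anywhere = combine_feature_subsets(rfe_selected, "union")
majority = combine_feature_subsets(rfe_selected, "majority")
print(sorted(common))
print(sorted(anywhere))
print(sorted(majority))
```

Intersection yields the smallest and most conservative subset, union the largest, and majority vote sits between the two. Which of these matches the published method would have to be checked against the full paper.
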
| Original language | English |
|---|---|
| Title of host publication | Pan-African Artificial Intelligence and Smart Systems - 1st International Conference, PAAISS 2021, Proceedings |
| Editors | Telex Magloire Ngatched, Isaac Woungang |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 94-107 |
| Number of pages | 14 |
| ISBN (Print) | 9783030933135 |
| Publication status | Published - 2022 |
| Event | 1st International Conference on Pan-African Intelligence and Smart Systems, PAAISS 2021 - Windhoek, Namibia. Duration: 6 Sept 2021 → 8 Sept 2021 |
Publication series
| Name | Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST |
|---|---|
| Volume | 405 LNICST |
| ISSN (Print) | 1867-8211 |
| ISSN (Electronic) | 1867-822X |
Conference
| Conference | 1st International Conference on Pan-African Intelligence and Smart Systems, PAAISS 2021 |
|---|---|
| Country/Territory | Namibia |
| City | Windhoek |
| Period | 6/09/21 → 8/09/21 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Keywords
- Feature selection
- Heart disease
- Machine learning
- SMOTE-ENN
- XGBoost
ASJC Scopus subject areas
- Computer Networks and Communications
Fingerprint
Dive into the research topics of 'Effective Feature Selection for Improved Prediction of Heart Disease'. Together they form a unique fingerprint.