TY - GEN
T1 - Effective Feature Selection for Improved Prediction of Heart Disease
AU - Mienye, Ibomoiye Domor
AU - Sun, Yanxia
N1 - Publisher Copyright: © 2022, ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering.
PY - 2022
Y1 - 2022
N2 - Heart disease is among the most prevalent medical conditions globally, and early diagnosis is vital to reducing the number of deaths. Machine learning (ML) has been used to identify people at risk of heart disease. Feature selection and data resampling are crucial in obtaining a reduced feature set and balanced data to improve the performance of the classifiers, and estimating the optimum feature subset is a fundamental issue in most ML applications. First, this study employs the hybrid Synthetic Minority Oversampling Technique-Edited Nearest Neighbor (SMOTE-ENN) to balance the heart disease dataset. Second, the study aims to select the most relevant features for the prediction of heart disease. The feature selection is achieved using multiple base algorithms at the core of the recursive feature elimination (RFE) technique. The relevant features selected by the various RFE implementations are then combined using set theory to obtain the optimum feature subset. The reduced feature set is used to build six ML models using logistic regression, decision tree, random forest, linear discriminant analysis, naïve Bayes, and extreme gradient boosting (XGBoost) algorithms. We conduct experiments using the complete and reduced feature sets. The results show that the data resampling and feature selection lead to improved classifier performance. The XGBoost classifier achieved the best performance with an accuracy of 95.6%. Compared to some recently developed heart disease prediction methods, our approach obtains superior performance.
KW - Feature selection
KW - Heart disease
KW - Machine learning
KW - SMOTE-ENN
KW - XGBoost
UR - http://www.scopus.com/inward/record.url?scp=85126989762&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-93314-2_6
DO - 10.1007/978-3-030-93314-2_6
M3 - Conference contribution
AN - SCOPUS:85126989762
SN - 9783030933135
T3 - Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST
SP - 94
EP - 107
BT - Pan-African Artificial Intelligence and Smart Systems - 1st International Conference, PAAISS 2021, Proceedings
A2 - Ngatched, Telex Magloire
A2 - Woungang, Isaac
PB - Springer Science and Business Media Deutschland GmbH
T2 - 1st International Conference on Pan-African Intelligence and Smart Systems, PAAISS 2021
Y2 - 6 September 2021 through 8 September 2021
ER -