TY - GEN
T1 - Predicting HIV Status Using Machine Learning Techniques and Bio-Behavioural Data from the Zimbabwe Population-Based HIV Impact Assessment (ZIMPHIA15-16)
AU - Chingombe, Innocent
AU - Musuka, Godfrey
AU - Mbunge, Elliot
AU - Chemhaka, Garikayi
AU - Cuadros, Diego F.
AU - Murewanhema, Grant
AU - Chaputsira, Simbarashe
AU - Batani, John
AU - Muchemwa, Benhildah
AU - Mapingure, Munyaradzi P.
AU - Dzinamarira, Tafadzwa
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - HIV and AIDS remain a significant global public health concern, with about 36 million people currently living with HIV. Several HIV interventions have been implemented to intensify transmission prevention, screening, and diagnosis in sub-Saharan African countries, including Zimbabwe. HIV prevalence remains high in Zimbabwe despite the significant progress made in previous years. As the country moves closer to attaining epidemic control, there is a need for targeted HIV interventions focusing on individuals at high risk of HIV. Most current HIV interventions are based on evidence about specific sub-population groups, overlooking the diversity of individual risk levels within such groups. Therefore, this study applied a random forest classifier, a support vector machine, and logistic regression to predict HIV status using Zimbabwe Population-Based HIV Impact Assessment data, in order to identify high-risk individuals and develop targeted interventions based on risk. Logistic regression outperformed the random forest classifier and the support vector machine, with a prediction accuracy of 85%, recall of 98%, and F1-score of 92%. However, the random forest classifier had the highest precision of the three models, at 87%. The support vector machine outperformed the random forest classifier in recall and F1-score, with a recall of 96% and an F1-score of 91%. Machine learning models can help identify individuals at high risk of contracting HIV and assist policymakers in developing targeted HIV prevention and screening strategies informed by socio-demographic and risk-behaviour data. However, this study used only socio-demographic and behavioural predictors to predict HIV status. There is a need to include clinical HIV predictors to further optimise HIV status prediction models and to integrate them into real-world healthcare settings.
KW - HIV/AIDS
KW - Logistic regression
KW - Machine learning
KW - Prediction
KW - Random forest classifier
KW - Support vector machine
KW - ZIMPHIA
UR - http://www.scopus.com/inward/record.url?scp=85135075691&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-09076-9_24
DO - 10.1007/978-3-031-09076-9_24
M3 - Conference contribution
AN - SCOPUS:85135075691
SN - 9783031090752
T3 - Lecture Notes in Networks and Systems
SP - 247
EP - 258
BT - Artificial Intelligence Trends in Systems - Proceedings of 11th Computer Science On-line Conference 2022, Vol 2
A2 - Silhavy, Radek
PB - Springer Science and Business Media Deutschland GmbH
T2 - 11th Computer Science On-line Conference, CSOC 2022
Y2 - 26 April 2022 through 26 April 2022
ER -