TY - JOUR
T1 - Socioeconomic and demographic factors associated with anaemia among women of reproductive age in Zimbabwe
T2 - a supervised machine learning approach
AU - Chemhaka, Garikayi
AU - Mbunge, Elliot
AU - Dzinamarira, Tafadzwa
AU - Musuka, Godfrey
AU - Batani, John
AU - Muchemwa, Benhildah
AU - Fashoto, Stephen
AU - Mapingure, Munyaradzi
AU - Makota, Rutendo Birri
AU - Petrus, Ester
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/12
Y1 - 2025/12
AB - Anaemia affects approximately one-third of women of reproductive age globally, with the highest burden observed in resource-limited countries. Therefore, this study aimed to determine the socioeconomic and demographic factors associated with anaemia and predict anaemia among women in Zimbabwe. Using nationally representative, cross-sectional data from the 2015 Zimbabwe Demographic and Health Survey (DHS), a dataset from a sample of 5412 women of reproductive age was analyzed. The Chi-square test and multivariate logistic regression were employed to identify independent predictors of anaemia, while Elastic Net was used for feature importance scoring. To address the class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was applied. The prevalence of anaemia among women in Zimbabwe was 24.1%. Multivariate logistic regression revealed significant associations between anaemia and several factors, including older age (35–49 years) (adjusted Odds Ratio [aOR] = 1.31), marital status (being married) (aOR = 0.72), higher education (aOR = 0.47), middle household wealth (aOR = 1.32), professional occupation (aOR = 1.60), current use of modern contraceptives (aOR = 0.59), and overweight/obesity (aOR = 0.56). The highest burden was observed in Matabeleland South province (aOR = 3.44). Among prediction models, the random forest classifier outperformed K-Nearest Neighbors (KNN) and decision trees, achieving an accuracy of 74%, recall of 78%, F1-score of 75%, precision of 72%, and an Area Under the Curve (AUC) of 81.5%. Targeted interventions focusing on key socioeconomic and demographic characteristics could help reduce anaemia in women of reproductive age. Predictive models can aid healthcare practitioners in identifying at-risk individuals and implementing timely interventions to mitigate the impact of anaemia.
KW - Anaemia
KW - Decision Trees
KW - Demographic
KW - K-Nearest Neighbors
KW - Logistic regression
KW - Machine learning
KW - Random Forest
KW - Socioeconomic
KW - Survey
UR - https://www.scopus.com/pages/publications/105003217767
U2 - 10.1186/s12982-025-00524-7
DO - 10.1186/s12982-025-00524-7
M3 - Article
AN - SCOPUS:105003217767
SN - 1742-7622
VL - 22
JO - Discover Public Health
JF - Discover Public Health
IS - 1
M1 - 142
ER -