Evaluating the impact of missing data imputation

Adam Pantanowitz, Tshilidzi Marwala

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

18 Citations (Scopus)

Abstract

This paper presents an impact assessment for the imputation of missing data. The assessment measures the impact of missing data on the statistical nature of the data, on a classifier, and on a logistic regression system. The data set is HIV seroprevalence data from an antenatal clinic survey conducted in 2001. Imputation is performed with Random Forests, selected because they gave the best imputation performance when compared with five other techniques. Test sets are constructed from the original data and from imputed data in which varying numbers of specifically selected missing variables are imputed. Results indicate that, for this data set, the evaluated properties and tested paradigms are fairly immune to missing data imputation. The impact is not highly significant: for example, under the logistic regression analysis, the HIV status probability predictions from the full set and from a set with two imputed variables have a linear correlation of 96%.
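The comparison described in the abstract (predictions from a full data set versus predictions from a set with imputed variables, summarised by a linear correlation) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the weights in `true_w` are hypothetical, the model is fitted and evaluated on the same rows for brevity, and a simple column-mean fill stands in for the Random Forest imputer used in the paper. All function names are made up for this sketch.

```python
import math
import random


def pearson(xs, ys):
    """Pearson linear correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, row))) - target
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X):
    """Predicted probability of the positive class for each row."""
    w, b = model
    return [sigmoid(b + sum(wi * xi for wi, xi in zip(w, row))) for row in X]


def impute_with_mean(X, cols, col_means):
    """Copy X, overwriting the chosen columns with their column means.

    Assumption: a mean fill is a stand-in for the Random Forest imputer.
    """
    return [[col_means[j] if j in cols else v for j, v in enumerate(row)]
            for row in X]


def run_demo(seed=0, n=300, cols=(2,)):
    """Correlate predictions on full data with predictions on imputed data."""
    rng = random.Random(seed)
    true_w = [1.5, -1.0, 0.5]  # hypothetical weights for the synthetic data
    X = [[rng.gauss(0, 1) for _ in true_w] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(sum(w * x for w, x in zip(true_w, row)))
         else 0 for row in X]
    model = fit_logistic(X, y)
    col_means = [sum(row[j] for row in X) / n for j in range(len(true_w))]
    X_imputed = impute_with_mean(X, set(cols), col_means)
    return pearson(predict_proba(model, X), predict_proba(model, X_imputed))


if __name__ == "__main__":
    print(f"correlation, full vs imputed predictions: {run_demo():.3f}")
```

Passing a different `cols` tuple to `run_demo` changes which variables are treated as missing, which loosely mirrors the paper's test sets with varying numbers of imputed variables.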

Original language: English
Title of host publication: Advanced Data Mining and Applications - 5th International Conference, ADMA 2009, Proceedings
Pages: 577-586
Number of pages: 10
Publication status: Published - 2009
Externally published: Yes
Event: 5th International Conference on Advanced Data Mining and Applications, ADMA 2009 - Beijing, China
Duration: 17 Aug 2009 → 19 Aug 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5678 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 5th International Conference on Advanced Data Mining and Applications, ADMA 2009
Country/Territory: China
City: Beijing
Period: 17/08/09 → 19/08/09

Keywords

  • Impact
  • Imputation
  • Missing data
  • Random forest
  • Sensitivity

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
