TY - GEN
T1 - Using principal component analysis and autoassociative Neural Networks to estimate missing data in a database
AU - Mistry, Jaisheel
AU - Nelwamondo, Fulufhelo V.
AU - Marwala, Tshilidzi
PY - 2008
Y1 - 2008
N2 - In this paper, three new methods for estimating missing data in a database using Neural Networks, Principal Component Analysis and Genetic Algorithms are presented. The proposed methods are tested on a set of data obtained from the South African Antenatal Survey. The data is a collection of demographic properties of patients. The proposed methods use Principal Component Analysis to remove redundancies and reduce the dimensionality of the data. Variations of autoassociative Neural Networks are used to further reduce the dimensionality of the data. A Genetic Algorithm is then used to find the missing data by optimizing the error function of the three variants of the Autoencoder Neural Network. The proposed system was tested on data with 1 to 6 missing fields in a single record, and the accuracy of the estimated values was calculated and recorded. All methods are as accurate as a conventional feedforward neural network structure; however, the newly proposed methods employ neural network architectures that have fewer hidden nodes.
AB - In this paper, three new methods for estimating missing data in a database using Neural Networks, Principal Component Analysis and Genetic Algorithms are presented. The proposed methods are tested on a set of data obtained from the South African Antenatal Survey. The data is a collection of demographic properties of patients. The proposed methods use Principal Component Analysis to remove redundancies and reduce the dimensionality of the data. Variations of autoassociative Neural Networks are used to further reduce the dimensionality of the data. A Genetic Algorithm is then used to find the missing data by optimizing the error function of the three variants of the Autoencoder Neural Network. The proposed system was tested on data with 1 to 6 missing fields in a single record, and the accuracy of the estimated values was calculated and recorded. All methods are as accurate as a conventional feedforward neural network structure; however, the newly proposed methods employ neural network architectures that have fewer hidden nodes.
KW - Auto associative Neural Network
KW - Autoencoder Neural Networks
KW - Missing data
KW - Principal Component Analysis
KW - Genetic Algorithm
UR - http://www.scopus.com/inward/record.url?scp=70349110621&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:70349110621
SN - 1934272353
SN - 9781934272350
T3 - WMSCI 2008 - The 12th World Multi-Conference on Systemics, Cybernetics and Informatics, Jointly with the 14th International Conference on Information Systems Analysis and Synthesis, ISAS 2008 - Proc.
SP - 24
EP - 29
BT - WMSCI 2008 - The 12th World Multi-Conference on Systemics, Cybernetics and Informatics, Jointly with the 14th International Conference on Information Systems Analysis and Synthesis, ISAS 2008 - Proc.
T2 - 12th World Multi-Conference on Systemics, Cybernetics and Informatics, WMSCI 2008, Jointly with the 14th International Conference on Information Systems Analysis and Synthesis, ISAS 2008
Y2 - 29 June 2008 through 2 July 2008
ER -