TY - JOUR
T1 - Predictive modeling with missing data using an automatic relevance determination ensemble
T2 - A comparative study
AU - Duma, Mlungisi
AU - Twala, Bhekisipho
AU - Nelwamondo, Fulufhelo
AU - Marwala, Tshilidzi
PY - 2012
Y1 - 2012
AB - The objective of this article is to present an automatic relevance determination ensemble as an effective variable extraction method for insurance datasets with large numbers of variables. Automatic relevance determination is a method that uses a Bayesian neural network and the evidence framework to rank variables in order of relevance to the target variable. The current approach uses a single Bayesian neural network that searches only for local minima or maxima. In large datasets with numerous variables, this is a concern because we cannot be certain that the outcome is an optimal one. To address this issue, this study uses an automatic relevance determination ensemble with various configurations (or structures) of the Bayesian neural networks. Each outcome in the ensemble is determined by using a confidence factor rather than by scrutinizing the most probable weight values or hyperparameters directly. To evaluate performance, the extraction method is used with the repeated incremental pruning to produce error reduction, logistic discriminant analysis, and k-nearest neighbor models. Furthermore, the datasets employed contain escalating levels of missing data to measure the accuracy and resilience of the models when they are used with the proposed ensemble. The ensemble is compared with the principal component analysis method. The results show that the models achieve higher accuracy with the automatic relevance determination ensemble than with principal component analysis. Furthermore, the resilience and strength of the models are higher when using the ensemble than when using the principal component analysis method.
UR - http://www.scopus.com/inward/record.url?scp=84870275930&partnerID=8YFLogxK
U2 - 10.1080/08839514.2012.741377
DO - 10.1080/08839514.2012.741377
M3 - Article
AN - SCOPUS:84870275930
SN - 0883-9514
VL - 26
SP - 967
EP - 984
JO - Applied Artificial Intelligence
JF - Applied Artificial Intelligence
IS - 10
ER -