TY - GEN
T1 - Improving the performance of the Ripper in insurance risk classification
T2 - 8th International Conference on Informatics in Control, Automation and Robotics, ICINCO 2011
AU - Duma, Mlungisi
AU - Twala, Bhekisipho
AU - Marwala, Tshilidzi
AU - Nelwamondo, Fulufhelo V.
PY - 2011
Y1 - 2011
N2 - The Ripper algorithm is designed to generate rule sets for large datasets with many features. However, it has been shown that its classification performance degrades in the presence of missing data: the algorithm struggles to classify instances as data quality deteriorates with an increasing proportion of missing values. In this paper, feature selection techniques are used to improve the classification performance of the Ripper algorithm. Principal component analysis and evidence automatic relevance determination are the techniques chosen, and the two are compared to determine which improves the algorithm the most. Training datasets with completely observable data were used to construct the algorithm, and testing datasets with missing values were used to measure accuracy. The results show that principal component analysis is the better feature selection technique for the Ripper: with it, classification performance improves significantly, as does resilience to escalating missing data.
AB - The Ripper algorithm is designed to generate rule sets for large datasets with many features. However, it has been shown that its classification performance degrades in the presence of missing data: the algorithm struggles to classify instances as data quality deteriorates with an increasing proportion of missing values. In this paper, feature selection techniques are used to improve the classification performance of the Ripper algorithm. Principal component analysis and evidence automatic relevance determination are the techniques chosen, and the two are compared to determine which improves the algorithm the most. Training datasets with completely observable data were used to construct the algorithm, and testing datasets with missing values were used to measure accuracy. The results show that principal component analysis is the better feature selection technique for the Ripper: with it, classification performance improves significantly, as does resilience to escalating missing data.
KW - Artificial neural network
KW - Automatic relevance determination
KW - Missing data
KW - Principal component analysis
KW - Ripper
UR - http://www.scopus.com/inward/record.url?scp=80052566094&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:80052566094
SN - 9789898425744
T3 - ICINCO 2011 - Proceedings of the 8th International Conference on Informatics in Control, Automation and Robotics
SP - 203
EP - 210
BT - ICINCO 2011 - Proceedings of the 8th International Conference on Informatics in Control, Automation and Robotics
Y2 - 28 July 2011 through 31 July 2011
ER -