Abstract
This study explores the application of machine learning (ML) algorithms to predicting lapses in investment policies, a significant challenge for insurance and financial services companies. It compares three ensemble techniques, random forest (RF), gradient boosting (GB), and extreme gradient boosting (XGBoost), to identify the most effective model for predicting policy lapses and to determine the key factors influencing these predictions. The dataset used for this analysis was obtained via Kaggle from an anonymous insurance and financial services company and covers 51,685 policies from 2017 to 2020. Thorough data pre-processing, including handling missing values, treating outliers, and scaling features, is performed before the models are trained and evaluated. The results reveal that features such as tenure, number of missed payments, and total sum assured play a significant role in predicting lapses, and RF is identified as the top-performing model. Furthermore, local interpretable model-agnostic explanations (LIME) are used to improve interpretability, offering detailed insights into feature contributions. These findings suggest that ML models, particularly RF, are highly effective at predicting investment policy lapses, offering valuable insights that can help insurance and financial services companies manage and reduce such lapses.
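The abstract names three pre-processing steps (missing-value handling, outlier treatment, and feature scaling) without specifying the methods. The sketch below illustrates one common choice for each on a single numeric column, using only the Python standard library. Median imputation, Tukey IQR capping with `k=1.5`, and min-max scaling are assumptions made for illustration, not necessarily the techniques the study used, and the `tenure` values are hypothetical.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def cap_outliers_iqr(values, k=1.5):
    """Clip values to the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

    Assumed outlier treatment; requires at least two data points.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(values):
    """Apply imputation, outlier capping and scaling in that order."""
    return min_max_scale(cap_outliers_iqr(impute_median(values)))


if __name__ == "__main__":
    # Hypothetical policy tenures in months, with one missing value and one extreme value.
    tenure = [2, 5, None, 7, 3, 120]
    print(preprocess(tenure))
```

Imputation runs first so that the quartiles are computed on a complete column, and capping runs before scaling so that a single extreme tenure does not compress every other value toward zero.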
| Original language | English |
|---|---|
| Journal | Journal of Management Analytics |
| DOIs | |
| Publication status | Accepted/In press - 2025 |
Keywords
- LIME
- bagging
- boosting
- ensemble models
- investment policy lapses
- machine learning
ASJC Scopus subject areas
- Statistics and Probability
- Business, Management and Accounting (miscellaneous)
- Statistics, Probability and Uncertainty