Advancing Model Performance with ADASYN and Recurrent Feature Elimination and Cross-Validation in Machine Learning-Assisted Credit Card Fraud Detection: A Comparative Analysis

Emmanuel Ileberi, Yanxia Sun

Research output: Contribution to journal › Article › peer-review

Abstract

Online card transactions have become more frequent with the growth of e-commerce and financial technology apps. This growth also creates more opportunities for credit card fraud, which affects banks, retailers, and card issuers, so systems are needed that protect the security and integrity of credit card transactions. In this study, we use the Adaptive Synthetic Minority Oversampling Technique (ADASYN) to balance an imbalanced dataset and combine it with Recursive Feature Elimination with Cross Validation (RFECV) to enhance the performance of credit card fraud detection systems. We compare five models, namely Decision Tree, Random Forest, Extreme Gradient Boosting, Light Gradient Boosting Machine, and Linear Regression, on the original imbalanced dataset and on the dataset resampled with ADASYN and then reduced with RFECV, with the aim of finding the best model and method for detecting credit card fraud. During training, k-fold cross-validation is applied to both sets of models to prevent overfitting and improve classification. Our results show that the ADASYN and RFECV modified dataset reduced overall classification errors relative to the baseline dataset. Specifically, the best performing models were Extreme Gradient Boosting and Random Forest, with Matthews Correlation Coefficients (MCC) of 0.8794 and 0.8622, respectively, on the baseline dataset, and 0.9994 and 0.9991, respectively, on the ADASYN and RFECV dataset. Furthermore, the Light Gradient Boosting Machine model recorded the largest improvement in MCC, from 0.3394 on the baseline dataset to 0.9980 on the ADASYN modified dataset, an increase of 194%.
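The headline metric and the reported 194% figure can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code: the function names and the confusion-matrix counts are hypothetical, and only the two Light Gradient Boosting Machine scores (0.3394 and 0.9980) come from the abstract. It computes the Matthews Correlation Coefficient from binary confusion-matrix counts and the relative increase between two scores.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention),
    e.g. when a model never predicts the positive class.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def relative_increase_pct(before, after):
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100.0


# Scores reported in the abstract for Light Gradient Boosting Machine.
lgbm_gain = relative_increase_pct(0.3394, 0.9980)
print(f"Relative MCC increase: {lgbm_gain:.0f}%")

# Hypothetical counts (not from the paper): 10 frauds among 1000
# transactions, and a model that labels every transaction as legitimate.
accuracy = (0 + 990) / 1000
mcc_all_negative = matthews_corrcoef(tp=0, tn=990, fp=0, fn=10)
print(f"accuracy={accuracy:.2f}, MCC={mcc_all_negative:.2f}")
```

The hypothetical all-negative classifier reaches 99% accuracy yet scores an MCC of 0, which illustrates why a metric of this kind is more informative than accuracy on heavily imbalanced fraud data.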

Original language: English
Pages (from-to): 133315-133327
Number of pages: 13
Journal: IEEE Access
Volume: 12
Publication status: Published - 2024

Keywords

  • ADASYN
  • Credit card fraud detection
  • classification
  • imbalanced classes
  • machine learning
  • predictive modeling
  • recursive feature elimination

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
