TY - JOUR
T1 - Prediction of antigenic peptides of SARSCoV-2 pathogen using machine learning
AU - Bukhari, Syed Nisar Hussain
AU - Ogudo, Kingsley A.
N1 - Publisher Copyright:
© 2024 Bukhari and Ogudo
PY - 2024
Y1 - 2024
N2 - Antigenic peptides (APs), also known as T-cell epitopes (TCEs), represent the immunogenic segment of pathogens capable of inducing an immune response, making them potential candidates for epitope-based vaccine (EBV) design. Traditional wet lab methods for identifying TCEs are expensive, challenging, and time-consuming. Alternatively, computational approaches employing machine learning (ML) techniques offer a faster and more cost-effective solution. In this study, we present a robust XGBoost ML model for predicting TCEs of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus as potential vaccine candidates. The peptide sequences comprising TCEs and non-TCEs retrieved from Immune Epitope Database Repository (IEDB) were subjected to feature extraction process to extract their physicochemical properties for model training. Upon evaluation using a test dataset, the model achieved an impressive accuracy of 97.6%, outperforming other ML classifiers. Employing a five-fold cross-validation a mean accuracy of 97.58% was recorded, indicating consistent and linear performance across all iterations. While the predicted epitopes show promise as vaccine candidates for SARS-CoV-2, further scientific examination through in vivo and in vitro studies is essential to validate their suitability.
AB - Antigenic peptides (APs), also known as T-cell epitopes (TCEs), represent the immunogenic segment of pathogens capable of inducing an immune response, making them potential candidates for epitope-based vaccine (EBV) design. Traditional wet lab methods for identifying TCEs are expensive, challenging, and time-consuming. Alternatively, computational approaches employing machine learning (ML) techniques offer a faster and more cost-effective solution. In this study, we present a robust XGBoost ML model for predicting TCEs of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus as potential vaccine candidates. The peptide sequences comprising TCEs and non-TCEs retrieved from Immune Epitope Database Repository (IEDB) were subjected to feature extraction process to extract their physicochemical properties for model training. Upon evaluation using a test dataset, the model achieved an impressive accuracy of 97.6%, outperforming other ML classifiers. Employing a five-fold cross-validation a mean accuracy of 97.58% was recorded, indicating consistent and linear performance across all iterations. While the predicted epitopes show promise as vaccine candidates for SARS-CoV-2, further scientific examination through in vivo and in vitro studies is essential to validate their suitability.
KW - Antigenic peptide
KW - COVID-19
KW - Epitope-based vaccine
KW - Machine learning
KW - SARS-CoV-2
KW - T-cell epitope
KW - XGBoost
UR - http://www.scopus.com/inward/record.url?scp=85207089446&partnerID=8YFLogxK
U2 - 10.7717/PEERJ-CS.2319
DO - 10.7717/PEERJ-CS.2319
M3 - Article
AN - SCOPUS:85207089446
SN - 2376-5992
VL - 10
JO - PeerJ Computer Science
JF - PeerJ Computer Science
M1 - e2319
ER -