TY - JOUR
T1 - Hybrid Predictive Machine Learning Model for the Prediction of Immunodominant Peptides of Respiratory Syncytial Virus
AU - Bukhari, Syed Nisar Hussain
AU - Ogudo, Kingsley A.
N1 - Publisher Copyright: © 2024 by the authors.
PY - 2024/8
Y1 - 2024/8
N2 - Respiratory syncytial virus (RSV) is a common respiratory pathogen that infects the human lungs and respiratory tract, often causing symptoms similar to the common cold. Vaccination is the most effective strategy for managing viral outbreaks. Currently, extensive efforts are focused on developing a vaccine for RSV. Traditional vaccine design typically involves using an attenuated form of the pathogen to elicit an immune response. In contrast, peptide-based vaccines (PBVs) aim to identify and chemically synthesize specific immunodominant peptides (IPs), known as T-cell epitopes (TCEs), to induce a targeted immune response. Despite their potential for enhancing vaccine safety and immunogenicity, PBVs have received comparatively less attention. Identifying IPs for PBV design through conventional wet-lab experiments is challenging, costly, and time-consuming. Machine learning (ML) techniques offer a promising alternative, accurately predicting TCEs and significantly reducing the time and cost of vaccine development. This study proposes the development and evaluation of eight hybrid ML predictive models created through the permutations and combinations of two classification methods, two feature weighting techniques, and two feature selection algorithms, all aimed at predicting the TCEs of RSV. The models were trained using the experimentally determined TCEs and non-TCE sequences acquired from the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) repository. The hybrid model composed of the XGBoost (XGB) classifier, chi-squared (ChST) weighting technique, and backward search (BST) as the optimal feature selection algorithm (ChST-BST-XGB) was identified as the best model, achieving an accuracy, sensitivity, specificity, F1 score, AUC, precision, and MCC of 97.10%, 0.98, 0.97, 0.98, 0.99, 0.99, and 0.96, respectively. Additionally, K-fold cross-validation (KFCV) was performed to ensure the model’s reliability and an average accuracy of 97.21% was recorded for the ChST-BST-XGB model. The results indicate that the hybrid XGBoost model consistently outperforms other hybrid approaches. The epitopes predicted by the proposed model may serve as promising vaccine candidates for RSV, subject to in vitro and in vivo scientific assessments. This model can assist the scientific community in expediting the screening of active TCE candidates for RSV, ultimately saving time and resources in vaccine development.
KW - T-cell epitope
KW - hybrid
KW - immunodominant peptides
KW - machine learning
KW - peptide-based vaccine
KW - predictive model
KW - respiratory syncytial virus
UR - http://www.scopus.com/inward/record.url?scp=85202618996&partnerID=8YFLogxK
U2 - 10.3390/bioengineering11080791
DO - 10.3390/bioengineering11080791
M3 - Article
AN - SCOPUS:85202618996
SN - 2306-5354
VL - 11
JO - Bioengineering
JF - Bioengineering
IS - 8
M1 - 791
ER -