Abstract
Proteins are central to the biology of every organism, so it is essential to understand how they communicate at the molecular level, and the prediction of protein-protein interactions (PPIs) plays the main role in that understanding. This article proposes a new probabilistic feature extraction technique, termed the centroid-based feature (CF) and abbreviated as CF-PPI, that generates a new feature from the protein sequence; a random forest is then used as the classifier to predict PPIs. CF-PPI considers the residual energy of the protein bond to detect interactions between proteins, and it resolves the variation in protein length by using probabilistic feature vectors. The PPI datasets used in this article are S. cerevisiae, H. pylori, and Human, on which CF-PPI with a random forest classifier achieved average accuracies of 96.25%, 97.68%, and 97.69%, respectively, outperforming the existing results used for comparison. The AUC score is also evaluated. Additionally, a blind test is performed with the same feature approach on datasets from five other species that are independent of the training set. The experimental results show that CF-PPI is promising and beneficial for future proteomics research.
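The abstract does not spell out how the centroid-based feature is computed, so the sketch below does not implement it. It only illustrates the general idea of turning sequences of different lengths into fixed-length probabilistic vectors, using plain amino-acid composition as a stand-in. The names `composition_vector` and `pair_features` and the example sequences are hypothetical and are not taken from the article.

```python
from collections import Counter

# The 20 standard amino-acid residues, in a fixed order so that every
# protein maps to a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(sequence):
    """Return the relative frequency of each standard residue (length 20).

    Stand-in feature only: this is amino-acid composition, NOT the
    centroid-based feature proposed in the article.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(seq_a, seq_b):
    """Concatenate two per-protein vectors into one pair-level vector."""
    return composition_vector(seq_a) + composition_vector(seq_b)


# Two made-up sequences of different lengths yield vectors of equal size.
short_seq = "MKTAYIAK"
long_seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
features = pair_features(short_seq, long_seq)
print(len(features))
```

Vectors of this kind, one per protein pair with an interacting or non-interacting label, could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`. Whether the article's pipeline combines the two proteins by concatenation is an assumption made here for illustration.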
Original language | English |
---|---|
Pages (from-to) | 1037-1057 |
Number of pages | 21 |
Journal | Journal of Experimental and Theoretical Artificial Intelligence |
Volume | 35 |
Issue number | 7 |
DOIs | |
Publication status | Published - 2023 |
Externally published | Yes |
Keywords
- centroid
- energy level
- machine learning
- Protein-protein interactions
- random forest
- residual energy
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Artificial Intelligence