Sequence-based Prediction of Pathogen-host Interaction Using an Ensemble Learning Classifier and Moran Autocorrelation Feature Encoding Method

Mohamad Irlin Sunggawa, Alhadi Bustamam, Titin Siswantining

Abstract

Pathogen–host protein interaction (PHI) is an interaction between two proteins from different organisms. Knowledge of the
effects of a PHI helps researchers study how a virus infects an organism and supports the design of drugs to treat the
corresponding disease. Many computational methods have been developed to predict whether a pair of proteins interacts, so that
researchers can study PHIs more efficiently, especially in terms of cost and time. One such approach predicts the likelihood
of a protein interaction using only the amino acid sequences of the two proteins. This paper examines a method of PHI
prediction that uses Moran autocorrelation (MAC) as the feature encoding. We develop an ensemble learning classifier (ELC)
that combines SVM, RF, and GBDT classifiers. We also compare the results of the proposed method with those of other machine
learning methods: gradient boosting, random forest, support vector machine, and recurrent neural network. The ELC was superior
to the others in terms of accuracy: MAC-ELC achieved an average accuracy of up to 77.85%, while the others remained below 77%.
The proposed method also performed well on the other metrics, with an average sensitivity of 81.69%, specificity of 73.90%,
and F1 score of 78.92%.
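
As an illustration of the encoding step, the following is a minimal sketch of a Moran autocorrelation descriptor computed from an amino acid sequence. It is not the authors' implementation. The abstract does not state which physicochemical properties or how many lags were used, so the Kyte-Doolittle hydropathy index, the default `max_lag=5`, and the helper names `moran_autocorrelation` and `encode_pair` are all assumptions made for this example. For lag d, the descriptor is the lag-d covariance of the property values along the sequence divided by their variance.

```python
from statistics import fmean

# Kyte-Doolittle hydropathy index. Assumption: used here only as an
# illustrative property; the paper may use other or additional properties.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def moran_autocorrelation(sequence, max_lag=5, scale=HYDROPATHY):
    """Return Moran autocorrelation values for lags 1..max_lag.

    Raises KeyError for residues missing from `scale` (e.g. 'X').
    """
    values = [scale[aa] for aa in sequence.upper()]
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")

    mean = fmean(values)
    centered = [v - mean for v in values]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        # A sequence with a constant property value carries no signal.
        return [0.0] * max_lag

    features = []
    for lag in range(1, max_lag + 1):
        # Average product of centered values that are `lag` positions apart.
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        features.append(cov / variance)
    return features


def encode_pair(pathogen_seq, host_seq, max_lag=5):
    """Concatenate the descriptors of both proteins into one feature vector."""
    return (moran_autocorrelation(pathogen_seq, max_lag)
            + moran_autocorrelation(host_seq, max_lag))
```

A vector produced by `encode_pair` could then be passed to any classifier. Concatenating the two descriptors is one common way to represent a protein pair and is likewise an assumption here, not a detail stated in the abstract.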
