An enhanced methodology for predicting protein-protein interactions between human and hepatitis C virus via ensemble learning algorithms


Hepatitis C virus (HCV) is responsible for a variety of life-threatening human diseases, including liver cirrhosis, chronic hepatitis, fibrosis and hepatocellular carcinoma (HCC). Computational study of protein-protein interactions between human and HCV could accelerate the discovery of antiviral drugs for HCV therapy and might help optimize treatment procedures for HCV infections. In this study, we constructed a prediction model for protein-protein interactions between HCV and human using features generated by pseudo amino acid compositions, and carried out the analysis at two levels: feature categories and individual features. In brief, extra-tree was first used for feature selection and SVM was then used to build the classification model. After that, the most suitable model for each category and each feature was selected by comparison with three ensemble learning algorithms: Random Forest, AdaBoost, and XGBoost. According to our results, profile-based features were the most suitable of the four categories for building predictive models. The model constructed with the XGBoost algorithm reached an AUC of 92.66% on the independent dataset. Moreover, among the 17 features, Distance-based Residue, Physicochemical Distance Transformation and Profile-based Physicochemical Distance Transformation performed much better than the rest. The AdaBoost classifier built on Profile-based Physicochemical Distance Transformation achieved an AUC of 93.74% on the independent dataset. Taken together, this study proposes a model with improved prediction capacity for protein-protein interactions between human and HCV, which could provide a practical reference for further experimental investigation into HCV-related diseases.
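The workflow summarized above (extra-trees feature selection, an SVM classifier, and a comparison against ensemble learners by AUC on held-out data) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, synthetic data from make_classification in place of the pseudo amino acid composition features, a mean-importance selection threshold, and scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost so that no extra dependency is needed. The function name compare_models and all hyperparameters are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, seed=0):
    """Return {model name: AUC on a held-out split} after extra-trees feature selection."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)

    # Feature selection: keep the features whose extra-trees importance
    # is at least the mean importance (an assumed threshold).
    selector = SelectFromModel(
        ExtraTreesClassifier(n_estimators=200, random_state=seed),
        threshold="mean")
    selector.fit(X_train, y_train)
    X_train_sel = selector.transform(X_train)
    X_test_sel = selector.transform(X_test)

    models = {
        # SVM baseline; inputs are standardized first.
        "SVM": make_pipeline(StandardScaler(),
                             SVC(probability=True, random_state=seed)),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "AdaBoost": AdaBoostClassifier(n_estimators=200, random_state=seed),
        # Assumption: gradient boosting from scikit-learn stands in for XGBoost.
        "Gradient boosting": GradientBoostingClassifier(random_state=seed),
    }

    aucs = {}
    for name, model in models.items():
        model.fit(X_train_sel, y_train)
        # Score the positive class and compute the area under the ROC curve.
        scores = model.predict_proba(X_test_sel)[:, 1]
        aucs[name] = roc_auc_score(y_test, scores)
    return aucs


# Synthetic stand-in for a feature matrix of interacting / non-interacting pairs.
X, y = make_classification(n_samples=600, n_features=60,
                           n_informative=10, random_state=0)
results = compare_models(X, y)
for name, auc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: AUC = {auc:.3f}")
```

Fitting the selector on the training split only keeps the held-out AUC from being inflated by information leaking from the test data; the same reasoning would apply to an independent dataset like the one described above.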


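In recent years, computational studies of virus-host interactions have attracted growing attention. Finding suitable feature representations and machine learning algorithms is the foundation of every computational method. In this paper, we compared several machine learning algorithms on features generated by PSE-in-One and identified two models that are better suited to predicting HCV-human protein-protein interactions. We also used the model to predict four human proteins that potentially interact with HCV NS4B. Although both models perform well, several issues remain to be addressed. First, we did not analyze the effect of different feature combinations; in future work, we will use group feature selection strategies such as LASSO to build more satisfactory predictive models. Second, we did not consider interactions among the HCV proteins themselves. For example, the interaction between NS5A and the core protein can promote viral assembly near lipids, and this will also be addressed in our future work. Finally, a web server for HCV-host protein-protein interactions built on the proposed models would be a more useful tool for biologists interested in virus-host interactions.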


Xin Liu, Liang Wang, Cheng-Hao Liang, Ya-Ping Lu, Ting Yang & Xiao Zhang (2021): An enhanced methodology for predicting protein-protein interactions between human and hepatitis C virus via ensemble learning algorithms, Journal of Biomolecular Structure and Dynamics, DOI: 10.1080/07391102.2021.1946429