Phosphorylation Site Prediction Model Based on Multi-head Attention Mechanism
Wu Jun (吴军); Ouyang Aijia (欧阳艾嘉); Zhang Lin (张琳)
Abstract:
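Computational prediction of protein phosphorylation sites is commonly used in the preliminary screening stage of site identification. To improve the accuracy of this screening, this paper proposes a deep learning model, MAPhos. The model first represents each amino acid residue as the sum of an amino acid embedding vector and a position embedding vector. It then uses a bidirectional GRU (BiGRU) recurrent neural network to capture the features of each residue. Next, a multi-head attention mechanism computes one sub-context vector per attention head, and these sub-context vectors are concatenated to form the context vector of the peptide. Finally, a fully connected neural network applies a nonlinear transformation and produces the prediction. Experiments on real datasets show that MAPhos outperforms both feature-extraction-based models and convolutional neural network (CNN)-based models in AUC, sensitivity, accuracy, precision, and F1 score, and that it is more interpretable than the CNN-based models. These results indicate that MAPhos is better suited to the preliminary screening stage of phosphorylation site identification.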
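The abstract describes the MAPhos pipeline only in words. The NumPy sketch below illustrates three of its steps under stated assumptions: the residue representation (amino acid embedding plus position embedding), multi-head attention pooling in which each head yields a sub-context vector and the sub-contexts are concatenated into the peptide context vector, and a fully connected prediction layer. The abstract does not specify the attention scoring function, the dimensions, the window length, or the activations. The per-head tanh projection with a learned query vector (an additive-style attention), all sizes, the ReLU hidden layer, the 15-residue example window, and every parameter name (`W`, `u`, `W_fc`, and so on) are therefore assumptions. The BiGRU step is omitted, and its per-residue outputs are replaced by the residue representations. This is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

# 20 standard amino acids plus "X" for unknown/padding positions (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWYX"
AA_INDEX = {a: i for i, a in enumerate(AMINO_ACIDS)}


def residue_representation(peptide, aa_emb, pos_emb):
    """Residue i -> amino-acid embedding of peptide[i] + position embedding of i."""
    if len(peptide) > pos_emb.shape[0]:
        raise ValueError("peptide longer than the position-embedding table")
    idx = [AA_INDEX.get(a, AA_INDEX["X"]) for a in peptide]
    return aa_emb[idx] + pos_emb[: len(peptide)]  # shape (L, d)


def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()


def multi_head_attention_pool(H, W, u):
    """Pool per-residue features H (L, d) into one peptide context vector.

    W: (heads, d, d_head) per-head projection matrices (hypothetical parameters).
    u: (heads, d_head) per-head query vectors (hypothetical parameters).
    Returns (context, weights): the concatenated sub-context vectors,
    shape (heads * d_head,), and the attention weights, shape (heads, L).
    """
    sub_contexts, weights = [], []
    for k in range(W.shape[0]):
        P = np.tanh(H @ W[k])           # (L, d_head) projected residue features
        alpha = softmax(P @ u[k])       # (L,) one weight per residue for head k
        sub_contexts.append(alpha @ P)  # (d_head,) sub-context vector of head k
        weights.append(alpha)
    return np.concatenate(sub_contexts), np.stack(weights)


def predict(context, W_fc, b_fc, w_out, b_out):
    """Fully connected layer + sigmoid output (the ReLU hidden layer is an assumption)."""
    hidden = np.maximum(0.0, context @ W_fc + b_fc)
    logit = hidden @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logit))  # P(centre residue is phosphorylated)


# Usage with random (untrained) parameters and an illustrative 15-residue window.
rng = np.random.default_rng(0)
d, d_head, heads, max_len = 16, 8, 4, 33
aa_emb = rng.normal(size=(len(AMINO_ACIDS), d))
pos_emb = rng.normal(size=(max_len, d))
W = rng.normal(size=(heads, d, d_head)) / np.sqrt(d)
u = rng.normal(size=(heads, d_head))
W_fc = rng.normal(size=(heads * d_head, 16)) / np.sqrt(heads * d_head)
b_fc, w_out, b_out = np.zeros(16), rng.normal(size=16), 0.0

peptide = "GGKAPRKSLATKAAR"  # illustrative window; centre residue (index 7) is S
X = residue_representation(peptide, aa_emb, pos_emb)
# The BiGRU is omitted here: X stands in for its per-residue outputs.
context, attn = multi_head_attention_pool(X, W, u)
prob = predict(context, W_fc, b_fc, w_out, b_out)
```

Returning the per-head weights `attn` is what makes a model of this kind inspectable: each row shows how strongly one head attends to each residue in the window. This is the sort of evidence attention-based models typically offer in support of an interpretability claim, whereas a CNN's learned filters are harder to map back to individual residues.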
Keywords: deep learning; bioinformatics; phosphorylation site identification; multi-head attention mechanism; residue representation
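Foundation: National Natural Science Foundation of China (62066049); Youth Science and Technology Talent Growth Project of the Guizhou Provincial Department of Education (黔教合KY字[2017]250); Joint Fund of the Guizhou Provincial Department of Science and Technology (黔科合LH字[2017]7069)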
Authors: Wu Jun (吴军); Ouyang Aijia (欧阳艾嘉); Zhang Lin (张琳)
DOI: 10.16088/j.issn.1001-6600.2021071301