Title: Prediction of Disulfide Connectivity from Protein Sequences
Authors: Chen Yu-Ching (陈玉菁)
Hwang Jenn-Kang (黄镇刚)
Institute of Bioinformatics and Systems Biology
Keywords: disulfide bond; disulfide connectivity pattern; support vector machine; genetic algorithm; feature selection; S-S Predictor
Publication date: 2007
Abstract: Disulfide bonds strongly influence both the stability of protein structures and the regulation of protein function. Far more protein sequences are known than protein structures, so a computational method that predicts disulfide connectivity directly from sequence would benefit the study of disulfide-bonded proteins. The difficulty is that disulfide bonds are nonlocal: a bond links two half-cystines that are close in space, not necessarily neighbors in sequence, and the paired cysteines are often far apart along the chain. Many methods have been proposed, but the problem remains challenging. Existing methods are limited to proteins with five or fewer disulfide bonds, because the number of possible connectivity patterns grows rapidly with the number of bonds, which makes prediction harder.
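The growth in the number of connectivity patterns can be made concrete with a short sketch. It assumes that every cysteine in the protein takes part in exactly one intrachain disulfide bond, which is an assumption of this illustration and not a statement from the thesis. Under that assumption, a protein with n bonds has (2n)! / (2^n n!) possible patterns. The function names are illustrative only.

```python
from math import factorial


def count_patterns(n_bonds: int) -> int:
    """Number of ways to pair 2*n_bonds cysteines into n_bonds bonds.

    Assumes every cysteine is bonded exactly once (illustrative assumption).
    """
    return factorial(2 * n_bonds) // (2 ** n_bonds * factorial(n_bonds))


def enumerate_patterns(cysteines):
    """Yield every complete pairing of the given cysteine positions."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    # Pair the first cysteine with each possible partner, then recurse.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for pattern in enumerate_patterns(remaining):
            yield [(first, partner)] + pattern


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, count_patterns(n))
```

For two bonds the count is 3, for five bonds it is 945, and for six bonds it is already 10395, which illustrates why methods that treat each pattern as a separate class struggle beyond five bonds.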
In this research, I developed a method for predicting disulfide connectivity, named S-S predictor, that combines sequence alignment with machine learning. The alignment step searches a dataset of disulfide proteins with known structures for homologs of the query protein, and so integrates sequence and structure information in the prediction. When no homolog of the query protein can be found, support vector machines are used instead. I identified several useful features for this step: the coupled evolutionary information of the local sequence environments around each cysteine pair, the sequence separation between the two cysteines, and a global sequence descriptor, the amino acid content of the whole protein over the twenty amino acid types. On a dataset in which the sequence identity between any two proteins is below 30%, S-S predictor reaches an accuracy of 0.81 at the level of connectivity patterns, where a protein counts as correct only if its entire pattern is predicted correctly, and 0.84 at the level of individual disulfide bonds. This accuracy is higher than that of other methods, and there is no limit on the number of disulfide bonds. S-S predictor is a useful and practical tool for studying disulfide connectivity and is available at http://140.113.239.214/~ssbond.
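The kinds of features named in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract describes evolutionary information for the local environments, whereas the sketch below substitutes the raw residue window around each cysteine as a stand-in. The function names, the record layout, and the window size are all hypothetical.

```python
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty standard amino acids


def amino_acid_content(sequence):
    """Fraction of each standard amino acid over the whole sequence."""
    n = len(sequence)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [sequence.count(aa) / n for aa in AMINO_ACIDS]


def cysteine_pair_features(sequence, window=2):
    """Build one illustrative feature record per candidate cysteine pair.

    `window` is a hypothetical half-width for the local environment.
    """
    positions = [i for i, aa in enumerate(sequence) if aa == "C"]
    content = amino_acid_content(sequence)  # global descriptor, shared by all pairs
    records = []
    for i, j in combinations(positions, 2):
        records.append({
            "pair": (i, j),
            "separation": j - i,  # distance along the sequence
            # Raw residue windows stand in for evolutionary profiles here.
            "env_i": sequence[max(0, i - window): i + window + 1],
            "env_j": sequence[max(0, j - window): j + window + 1],
            "content": content,
        })
    return records


if __name__ == "__main__":
    for record in cysteine_pair_features("ACDCGHC"):
        print(record["pair"], record["separation"], record["env_i"], record["env_j"])
```

In a pipeline of the kind the abstract describes, records like these would be encoded numerically and passed to a support vector machine. That encoding step is omitted here because the abstract does not specify it.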
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009251802
http://hdl.handle.net/11536/77498
Appears in collections: Thesis


Files in this item:

  1. 180201.pdf
