Title: Prediction of Disulfide Connectivity from Protein Sequences (利用蛋白質序列預測雙硫鍵鍵結情形)
Authors: Chen Yu-Ching (陳玉菁)
Hwang Jenn-Kang (黃鎮剛)
Institute of Bioinformatics and Systems Biology
Keywords: disulfide bond; disulfide connectivity pattern; support vector machine; genetic algorithm; feature selection; S-S Predictor
Issue Date: 2007
Abstract: Disulfide bonds strongly influence the stability of protein structures and the regulation of protein function. Far more protein sequences are currently available than protein structures; a computational method that predicts disulfide connectivity from sequence would therefore aid the study of disulfide-containing proteins. The difficulty is that a disulfide bond does not link two half-cystines that are adjacent in the sequence but two that are close in space, so disulfide bridges are nonlocal and often involve cysteine pairs at large sequence separation. Various methods have been developed for this problem, but existing methods are limited to proteins with five or fewer disulfide bonds, because the number of possible connectivity patterns grows rapidly with the number of disulfide bonds and prediction becomes correspondingly harder.
In this research, I developed a method for predicting disulfide connectivity, named S-S predictor, that combines sequence alignment with machine learning. Sequence alignment searches a dataset of disulfide proteins with known structures for homologs of the query protein, so that sequence and structure information are integrated to make the prediction. When no homolog of the query protein can be found, support vector machines are used instead. The features found useful in this research include the coupled evolutionary information of the local sequence environments around each cysteine pair, the sequence separation between the cysteines of each pair, and a global sequence descriptor, the content of the twenty amino acids over the whole protein sequence. On a dataset in which the sequence identity between any two proteins is lower than 30%, S-S predictor reaches an accuracy of 0.81 when a prediction counts as correct only if the whole connectivity pattern is correct, and 0.84 for individual disulfide bonds. This accuracy is higher than that of other methods, and there is no limit on the number of disulfide bonds. S-S predictor is a convenient and practical tool for studying disulfide connectivity and is available at http://140.113.239.214/~ssbond.
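As a rough illustration of two points in the abstract, the sketch below counts the possible connectivity patterns for a given number of disulfide bonds and builds a simple feature vector for each cysteine pair from sequence separation and whole-sequence amino acid content. It is not the implementation used in the thesis: the function names, the toy sequence, and the normalisation of the separation by sequence length are assumptions made for this example, and the evolutionary-information features are omitted because they require a profile search that is not described here. The count assumes every cysteine in the protein takes part in a disulfide bond.

```python
from itertools import combinations
from math import factorial

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def count_connectivity_patterns(n_bonds: int) -> int:
    """Ways to pair 2n bonded cysteines into n disulfide bonds: (2n)! / (n! * 2^n)."""
    return factorial(2 * n_bonds) // (factorial(n_bonds) * 2 ** n_bonds)


def amino_acid_content(sequence: str) -> list[float]:
    """Fraction of each standard amino acid over the whole sequence."""
    length = len(sequence)
    return [sequence.count(aa) / length for aa in AMINO_ACIDS]


def cysteine_pair_features(sequence: str, cys_positions: list[int]) -> dict:
    """Map each cysteine pair (i, j), i < j, to [separation] + amino acid content.

    Positions are 0-based indices into the sequence. Dividing the separation
    by the sequence length is an assumption of this sketch.
    """
    content = amino_acid_content(sequence)
    features = {}
    for i, j in combinations(sorted(cys_positions), 2):
        separation = (j - i) / len(sequence)
        features[(i, j)] = [separation] + content
    return features


# Hypothetical toy sequence with four cysteines (two disulfide bonds).
toy_sequence = "ACDCEFCGHC"
toy_cysteines = [k for k, aa in enumerate(toy_sequence) if aa == "C"]
toy_features = cysteine_pair_features(toy_sequence, toy_cysteines)

for n in range(1, 6):
    print(n, count_connectivity_patterns(n))  # 1, 3, 15, 105, 945 patterns
```

With five bonds there are already 945 candidate patterns, which is why the number of classes quickly becomes the bottleneck. In a pairwise setup such as this one, each vector in `toy_features` would be scored by a classifier, and the scores would then be combined to pick a full connectivity pattern.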
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009251802
http://hdl.handle.net/11536/77498
Appears in Collections: Thesis


Files in This Item:

  1. 180201.pdf
