Title: Protein Tyrosine Sulfation Sites Prediction: Based on Support Vector Machine and Pairwise Position Weighted Matrix of Amino Acid Sequence
Authors: Huang, Po-Tsun (黃柏淳)
Ching, Yu-Tai (荊宇泰)
Institute of Biomedical Engineering
Keywords: Post-translational modification; tyrosine; sulfation; support vector machine; prediction; pairwise position weighted matrix; sulfotyrosine
Issue Date: 2012
Abstract: Protein tyrosine sulfation is a common post-translational modification. Knowing where tyrosine sulfation occurs helps biologists identify the amino acids that matter for subsequent biochemical interactions. However, no known feature fully determines which tyrosines are sulfated. Moreover, few sulfotyrosine sites have been identified experimentally, and non-sulfated tyrosine sites outnumber them by a factor of 26. This thesis presents a prediction method that applies a support vector machine (SVM) to amino acid sequences encoded with a pairwise position weighted matrix (PPWM). To handle the imbalance between sulfated and non-sulfated sites, we repeatedly resample the known data to train multiple SVM models and take a majority vote over their predictions as the final prediction. A single SVM model achieves an average accuracy of 99.2% under five-fold cross-validation, and the voting method achieves an accuracy of 98.3% when tested on all known tyrosine sites. Finally, by analyzing the PPWM we found several patterns in how amino acids occur in pairs: for example, acidic amino acids appear on both sides of a sulfated tyrosine relatively often, as does tryptophan (W) paired with an acidic amino acid. These patterns may help biologists study tyrosine sulfation further.
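The resample-and-vote step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation and rests on several assumptions: "resampling" is taken to mean drawing, for each model, a random subset of negative sites equal in size to the positive set; the base learner is a nearest-centroid stand-in rather than an SVM, so that the example needs only the standard library; and the toy 2-D features stand in for PPWM-encoded sequences. All function names (`train_centroid`, `train_balanced_ensemble`, `predict_majority`, `make_toy_data`) are hypothetical.

```python
import random
from collections import Counter


def train_centroid(X, y):
    """Stand-in base learner (NOT an SVM): classify by nearest class centroid."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(x):
        # Return the label whose centroid has the smallest squared distance to x.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


def train_balanced_ensemble(X, y, n_models=5, train=train_centroid, seed=0):
    """Train n_models learners, each on all positives plus an equal-sized
    random sample of negatives (assumed reading of 'resampling')."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    models = []
    for _ in range(n_models):
        sampled_neg = rng.sample(neg, len(pos))  # undersample the majority class
        idx = pos + sampled_neg
        models.append(train([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_majority(models, x):
    """Final label = the label predicted by the most models.
    Use an odd n_models so a two-class vote cannot tie."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def make_toy_data(n_pos=10, ratio=26, seed=1):
    """Toy data with the 26:1 negative-to-positive ratio from the abstract.
    The 2-D Gaussian features are purely illustrative."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_pos):
        X.append([rng.gauss(1.0, 0.3), rng.gauss(1.0, 0.3)])
        y.append(1)
    for _ in range(n_pos * ratio):
        X.append([rng.gauss(-1.0, 0.3), rng.gauss(-1.0, 0.3)])
        y.append(0)
    return X, y


X, y = make_toy_data()
models = train_balanced_ensemble(X, y, n_models=5)
label = predict_majority(models, [0.9, 1.1])
```

Passing a different `train` callable (for instance, one that wraps an SVM from a machine learning library) would bring the sketch closer to the setup the abstract describes; the voting logic stays the same.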
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079930509
http://hdl.handle.net/11536/49997
Appears in Collections: Thesis