Title: 從蛋白質序列預測殘基相對溶劑可接觸性
Prediction of Protein Relative Solvent Accessibility from Amino Acid Sequence
Author: 徐蔚倫
黃鎮剛
Institute of Bioinformatics and Systems Biology
Keywords: structure prediction; solvent; protein sequence; solvent accessibility; protein structure; prediction
Issue Date: 2004
Abstract: Predicting the three-dimensional structure of a protein from its sequence is one of the most important goals of modern biology, and accurate prediction of relative solvent accessibility (RSA) provides information useful for predicting tertiary structure. RSA is the degree to which a residue in a protein is accessible to solvent molecules. Because the binding sites of a protein are usually located on its surface, accurately predicting surface residues is an important step toward determining the protein's function. In addition, the distribution of surface and buried residues has been observed to correlate with a protein's subcellular environment, so information about surface residues may improve the prediction of subcellular localization.
My method uses support vector machines that draw on both local and global protein information. The best results are obtained with input feature vectors built from a sliding window of local environment descriptors: the position-specific scoring matrix (PSSM), the secondary structure profile, and hydropathy indexes. With a two-state model, RSA is predicted with 77.2%, 77.1%, 80.4%, and 88.4% accuracy at thresholds of 25%, 16%, 5%, and 0%, respectively. The 77.2% figure at the 25% threshold was obtained on the RS126 data set and is comparable to the 75.0-78.3% accuracy reported by other methods in recent years. With a 10-state model, the method reaches a mean absolute error of 15.2% and a correlation of 0.51 between predicted and experimental values.
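The sliding-window encoding, the two-state labeling, and the evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes the Kyte-Doolittle scale for the hydropathy index (the abstract does not name a scale), a hypothetical window width of 5, zero padding at the sequence ends, and the convention that a residue is exposed when its RSA is strictly greater than the threshold. PSSM and secondary structure columns are omitted, and the function names are illustrative only.

```python
from math import sqrt

# Kyte-Doolittle hydropathy index (assumed scale; the abstract names none).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(seq, width=5):
    """Return one hydropathy vector per residue, centred on that residue.

    `width` is assumed odd (a hypothetical choice, not from the thesis).
    Positions outside the sequence and unknown residues are encoded as 0.0.
    """
    half = width // 2
    rows = []
    for i in range(len(seq)):
        row = []
        for j in range(i - half, i + half + 1):
            inside = 0 <= j < len(seq)
            row.append(KD.get(seq[j], 0.0) if inside else 0.0)
        rows.append(row)
    return rows


def two_state(rsa_percent, threshold):
    """Label residues 1 (exposed) if RSA > threshold, else 0 (buried)."""
    return [1 if r > threshold else 0 for r in rsa_percent]


def accuracy(pred, true):
    """Fraction of residues whose predicted state matches the true state."""
    return sum(p == t for p, t in zip(pred, true)) / len(true)


def mean_absolute_error(pred, true):
    """Mean absolute difference between predicted and observed RSA values."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)
```

In a full pipeline, each row from `window_features` would be extended with the PSSM and secondary structure profile columns for the same window positions before being passed to a support vector machine. `two_state` would be applied once per threshold (25%, 16%, 5%, 0%) to produce the training labels.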
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009251507
http://hdl.handle.net/11536/77488
Appears in Collections: Thesis


Files in This Item:

  1. 150701.pdf
