Title: 利用支持向量機器預測蛋白質中金屬鍵結區域
Prediction of Metal-Binding Site Residues Using Support Vector Machine
Authors: 林肇基
Jau-Ji Lin
黃鎮剛
Jenn-Kang Hwang
Institute of Bioinformatics and Systems Biology
Keywords: Metal;Binding;Site;Support;Vector;Machine;Learning;Training;Prediction;Protein;Data;Bank;PDB;Sequence;Structure
Issue Date: 2004
Abstract: Correctly identifying and analyzing metal-binding sites in proteins provides useful clues for modeling and designing binding sites for industrial and therapeutic purposes. As experimental techniques advance and biological databases grow rapidly, machine-learning approaches to prediction have become more practical and reliable than before. We have developed a method that uses a support vector machine (SVM) to predict metal-binding site residues in proteins containing metal ions. Each protein chain is encoded using both one-dimensional amino acid sequence information (sequence profiles) and three-dimensional structural features. The results show that using a buffer zone to separate binding from non-binding residues effectively improves the true positive rate (TPR) of the prediction. In five-fold cross-validation, we obtain an average prediction accuracy of 97.4% and a TPR of 46.2% at a 5% false positive rate (FPR). These results indicate that an SVM with a suitable encoding scheme is an effective way to predict metal-binding sites in proteins.
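The abstract reports a TPR at a fixed 5% FPR, which is one operating point on the ROC curve of the classifier. The following is a minimal sketch of how such a figure can be computed from per-residue decision values; it is not the thesis's own code. The function name `tpr_at_fpr`, the score convention (higher means more likely binding), and the toy data are all assumptions made for illustration.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.05):
    """Return the highest TPR reachable while keeping FPR <= max_fpr.

    scores: classifier decision values (assumed: higher = more likely binding).
    labels: 1 for a binding residue, 0 for a non-binding residue.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Rank residues by descending score and lower the threshold step by step.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    best_tpr = 0.0
    i = 0
    while i < len(ranked):
        # Residues with identical scores cross the threshold together.
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / n_neg > max_fpr:
            break  # FPR only grows from here, so stop sweeping
        best_tpr = tp / n_pos
    return best_tpr


# Hypothetical toy data: 3 binding residues, 7 non-binding residues.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
toy_result = tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.2)
```

In a cross-validation setting, the same function would typically be applied to the held-out scores of each fold and the results averaged; whether the thesis averaged per fold or pooled the predictions is not stated in the abstract.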
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009251508
http://hdl.handle.net/11536/77489
Appears in Collections: Thesis


Files in This Item:

  1. 150801.pdf
