Title: Two-Stage QuickRBF and Fuzzy ARTMAP to Protein Secondary Structure Prediction
Authors: Yen-Hung Lin (林彥宏)
Jyh-Yeong Chang (張志永)
Institute of Electrical and Control Engineering (電控工程研究所)
Keywords: genome; bioinformatics; protein; classification; QuickRBF; SVM
Issue date: 2005
Abstract: With most human coding regions sequenced and several genome sequencing projects completed, the volume of sequence data is growing rapidly, and analyzing these sequences efficiently has become increasingly important. In light of the Central Dogma, protein function and structure are crucial research issues in bioinformatics. At present, scientists determine protein structures experimentally by X-ray diffraction or nuclear magnetic resonance (NMR). These experiments are highly accurate, but they are costly and time-consuming. Computational tools are therefore applied as a promising way to reduce time and cost while still giving reliable predictions. Understanding the full function of a protein requires its three-dimensional (tertiary) structure, but predicting the tertiary structure directly from the sequence is very difficult, so protein secondary structure (PSS) prediction serves as an intermediate but useful step. Previous work has usually divided protein secondary structure into three classes: helix, strand (sheet), and coil (loop), so the prediction task can be treated as a general classification problem. In this thesis, we propose a method for protein secondary structure prediction based on a dual-layer QuickRBF (Quick Radial Basis Function) network classifier, a technique that has been successfully applied to problems in bioinformatics. QuickRBF builds its classifier model quickly, and its prediction accuracy is comparable to that of the support vector machine (SVM), a state-of-the-art and widely used machine learning approach. Performance was further improved by combining QuickRBF with position-specific scoring matrix (PSSM) profiles generated by PSI-BLAST, which contain important evolutionary information. Finally, the classification results of the individual layers are merged, which effectively raises prediction accuracy; the final predictions were generated by the first fusion method. On the well-known RS126 dataset with PSI-BLAST profiles, the best prediction accuracy achieved was 76.7%.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009312515
http://hdl.handle.net/11536/78195
Appears in collections: Theses


Files in this item:

  1. 251501.pdf
