Title: Improvement of Protein Structure Prediction and Analysis by Data Fusion and Mountain Clustering Approaches (資料融合及山峰群聚法應用於改善蛋白質結構預測與分析)
Authors: Lin, Ken-Li (林肯豊)
Lin, Chin-Teng (林進燈)
Institute of Electrical and Control Engineering
Keywords: Protein Structure; Data Fusion; Mountain Clustering; Two Stage Clustering
Publication date: 2008
Abstract: This dissertation addresses two problems in protein structure prediction and analysis. First, we apply a two-level classification strategy, the hierarchical learning architecture (HLA), built on neural networks to classify proteins by structural class and then by folding pattern. We extend it with combinatorial fusion, a data fusion technique that uses the diversity rank/score graph to select and combine classification features systematically. With this approach, prediction accuracy reaches 87% at the first level (four classes) and 69.6% at the second level (27 folding categories). These rates are higher than previously reported results, which indicates that data fusion is a viable method for feature selection and combination in protein structure classification.
Second, we propose the Structural Mountain Clustering Method (SMCM), which finds a library of short 3-D structural motifs (building blocks) for constructing the 3-D structures of proteins and peptides. SMCM adapts the conventional mountain clustering method to 3-D structural fragments by measuring the distance between two fragments after Best Molecular Fit (BMF) alignment. The algorithm selects building blocks from an estimate of the local "density" of fragments under this structural similarity measure. Tested on two well-known benchmark datasets, SMCM reconstructs the test peptides successfully in terms of both global-fit and local-fit root-mean-square (RMS) errors. The good local-fit RMS errors indicate that the extracted motifs model nearby fragments quite accurately. We also analyze the computational complexity of SMCM and propose an incremental version for large training datasets, where processing all the data at once is too time-consuming. In addition to the global-fit and local-fit RMS errors, we propose and use two alternative ways to compare the quality of quantization and reconstruction, and both show that SMCM outperforms the Two Stage Clustering Algorithm (TSCA).
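The density-based selection of building blocks described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation. It assumes that the Kabsch superposition stands in for Best Molecular Fit, that the mountain function is a Gaussian sum evaluated at the fragments themselves, and that selection stops once the remaining peak falls below a fraction of the first peak. The names `bmf_rmsd` and `mountain_centers` and the parameters `alpha`, `beta`, and `stop_ratio` are hypothetical.

```python
import numpy as np


def bmf_rmsd(a, b):
    """RMSD between two (n, 3) fragments after optimal rigid superposition.

    Assumption: the Kabsch algorithm is used here as the best-fit alignment.
    """
    a0 = a - a.mean(axis=0)          # remove translation
    b0 = b - b.mean(axis=0)
    u, _, vt = np.linalg.svd(a0.T @ b0)
    # Force a proper rotation (determinant +1), not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a0 @ rot.T - b0           # rotate a onto b, then compare
    return float(np.sqrt((diff ** 2).sum() / len(a)))


def mountain_centers(fragments, alpha=1.0, beta=1.0,
                     max_centers=5, stop_ratio=0.1):
    """Return indices of fragments chosen as building blocks.

    alpha, beta and stop_ratio are illustrative values, not taken
    from the dissertation.
    """
    n = len(fragments)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = bmf_rmsd(fragments[i], fragments[j])

    # Mountain (density) value of each fragment: many close neighbours
    # under the structural distance give a high value.
    potential = np.exp(-alpha * dist ** 2).sum(axis=1)
    first_peak = potential.max()

    centers = []
    while len(centers) < max_centers:
        c = int(np.argmax(potential))
        peak = potential[c]
        if peak < stop_ratio * first_peak:
            break
        centers.append(c)
        # Flatten the mountain around the chosen center so the next
        # peak comes from a different region of structure space.
        potential = potential - peak * np.exp(-beta * dist[:, c] ** 2)
    return centers
```

Calling `mountain_centers` on a list of equal-length coordinate arrays returns the indices of the selected representatives. The pairwise distance matrix makes the cost grow quadratically with the number of fragments, which is the kind of cost that motivates an incremental variant for large training sets.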
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009012813
http://hdl.handle.net/11536/80980
Appears in Collections: Thesis


Files in This Item:

  1. 281301.pdf

If the file is a zip archive, download and unzip it, then open index.html in the extracted folder with a browser to view the full text.