Title: Protein Sequence Feature Selection and Classification Based on Artificial Neural Networks (基於類神經網路之蛋白質序列特徵選擇及分類)
Authors: Ya-Hsin Shen (沈亞欣)
Dr. Chi-Cheng Jou (周志成)
Prof. Chin-Teng Lin (林進燈)
Institute of Electrical and Control Engineering
Keywords: artificial neural networks (ANNs); protein sequences; Structural Classification of Proteins (SCOP); classification; feature selection; coding; principal component analysis (PCA)
Issue Date: 2002
Abstract: This thesis proposes an artificial neural network classifier with automatic feature selection. The classifier performs feature selection on protein sequences, and the resulting classification is compared with the basis of classification used by SCOP, a manually curated protein classification. Secondary-structure and evolutionary information is added to the protein sequences, and the resulting feature sequences are encoded with two new coding methods, GLOBAL DESCRIPTOR and LOCAL DESCRIPTOR. Principal component analysis (PCA) is then applied to the encoded sequences, reducing the number of features by 95% on average. Compared with the basis of classification at the three levels of SCOP, the selected features agree at the class level and largely agree at the fold and superfamily levels. The final prediction accuracy is 90.71% at the class level, 61.67% at the fold level, and 88% at the superfamily level.
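The abstract reports that PCA removes about 95% of the features. As a rough illustration of that one step, the following is a minimal pure-Python sketch of PCA using power iteration. The function name `pca_reduce`, its parameters, and the toy data are assumptions made for illustration; the thesis's own implementation, its GLOBAL DESCRIPTOR and LOCAL DESCRIPTOR encodings, and its data are not reproduced here.

```python
import random


def pca_reduce(rows, k, iters=200, seed=0):
    """Project samples onto their top-k principal components.

    rows: list of equal-length lists of floats (one list per sample).
    Assumes k does not exceed the rank of the centered data.
    Returns (projected_rows, components).
    """
    n, d = len(rows), len(rows[0])

    # Center every feature (column) at zero mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Sample covariance matrix, d x d.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(k):
        # Random unit-length starting vector.
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(iters):
            # Power-iteration step: multiply by the covariance matrix.
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            # Remove directions already found so components stay orthogonal.
            for c in components:
                dot = sum(wi * ci for wi, ci in zip(w, c))
                w = [wi - dot * ci for wi, ci in zip(w, c)]
            norm = sum(x * x for x in w) ** 0.5
            if norm < 1e-12:  # no variance left in the remaining directions
                break
            v = [x / norm for x in w]
        components.append(v)

    # Coordinates of each centered sample along each component.
    projected = [[sum(ci * xi for ci, xi in zip(c, row)) for c in components]
                 for row in centered]
    return projected, components


# Toy data: 8 samples with 20 made-up features each.
toy = [[(i + 1) * (j % 3) + 0.1 * ((i * j) % 5) for j in range(20)]
       for i in range(8)]
reduced, comps = pca_reduce(toy, k=1)
print(len(toy[0]), "->", len(reduced[0]))  # feature count before and after
```

Keeping 1 of 20 features in the toy call mirrors a 95% reduction only by construction; how many components to keep on real data would depend on how much variance each one explains.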
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT910591091
http://hdl.handle.net/11536/71065
Appears in Collections: Thesis