Protein Secondary Structure Prediction Using Genetic Algorithm

Title:	Protein Secondary Structure Prediction Using Genetic Algorithm (蛋白質二級結構預測 - 使用基因演算法)
Authors:	Fan Kai Lin (林凡凱); Dr. Chuen-Tsai Sun (孫春在); Institute of Computer Science and Engineering
Keywords:	Protein Secondary Structure Prediction; Genetic Algorithm; Schema
Issue Date:	2001
Abstract:	Current protein secondary structure prediction methods reach an accuracy of about 75%, which is not high enough for their output to be fully trusted; they can serve only as an auxiliary tool. A reported overall accuracy also does not tell a user whether the prediction for a previously unseen protein is equally reliable. A method that gives users additional information for judging the quality of each prediction would therefore be more useful. Most earlier methods, such as neural networks, are "black boxes": the user receives only a prediction and a large number of weights (thousands or more) that carry no interpretable meaning, which makes the result hard to analyze. In this thesis, a genetic algorithm is used to find schemas in protein primary structure, and these schemas are then used to predict secondary structure. Without any multiple sequence alignment information, the model achieves a prediction accuracy of roughly 60% or more. For each prediction, users can see which protein a schema comes from, which substitution matrix was used to find it, and which schema was used to make the prediction, and they can use this information to analyze the result. The search also found schemas that predict sheets with about 70% accuracy, although sheets have been the hardest secondary structure to predict in past research. Finally, a search over the proteins in NRPDB (non-redundant PDB) found several schemas that occur more than 300 times and predict with accuracies of 80% or even 90%.
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT900394049 http://hdl.handle.net/11536/68575
Appears in Collections:	Thesis
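
The abstract describes predicting secondary structure by matching schemas against the primary structure. The following is a minimal sketch of that idea only, not the thesis's actual method: it assumes a schema is a fixed-length residue pattern with `*` as a wildcard, that each schema carries one three-state label (`H` helix, `E` sheet, `C` coil), and that matching is exact. The substitution-matrix scoring and the genetic-algorithm search mentioned in the abstract are omitted, and the function names and example schemas are hypothetical.

```python
from typing import Dict

WILDCARD = "*"  # assumed "don't care" symbol, as in classic GA schemas


def schema_matches(schema: str, window: str) -> bool:
    """Return True if every non-wildcard schema position equals the residue."""
    return len(schema) == len(window) and all(
        s == WILDCARD or s == r for s, r in zip(schema, window)
    )


def predict(sequence: str, schemas: Dict[str, str], default: str = "C") -> str:
    """Label each residue; residues covered by a matching schema take its label.

    `schemas` maps a pattern to a label (hypothetical format). When several
    schemas cover the same residue, the one processed last wins.
    """
    labels = [default] * len(sequence)
    for schema, label in schemas.items():
        k = len(schema)
        for start in range(len(sequence) - k + 1):
            if schema_matches(schema, sequence[start:start + k]):
                labels[start:start + k] = [label] * k
    return "".join(labels)


def accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted label equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        return 0.0
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


if __name__ == "__main__":
    # Made-up schemas and sequence, for illustration only.
    example_schemas = {"V*V*V": "E", "A**LA": "H"}
    print(predict("GVTVKVGAEELAG", example_schemas))
```

Because every predicted label traces back to one named schema, a user can inspect which pattern produced it, which is the kind of interpretability the abstract contrasts with weight-based "black box" models.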