Title: Prediction and Characterization of Protein Functions Using a Scoring Card Method (使用計分卡分析系統來預測並詮釋蛋白質功能)
Authors: Liou, Yi-Fan (劉一帆)
Ho, Shinn-Ying (何信瑩)
Institute of Bioinformatics and Systems Biology
Keywords: scoring card method; protein function prediction; intelligent genetic algorithm
Issue Date: 2016
Abstract: This study proposes a protein analysis system that predicts protein functions and characterizes proteins from their sequences alone. The system has three parts: protein sampling, machine learning, and analysis. Unlike existing protein function predictors, it not only predicts protein function but also identifies characteristic features of the proteins. The sampling part draws on Gene Ontology (GO) terms and the SwissProt database, and the machine learning part uses the scoring card method (SCM) because it is easy to interpret. The analysis part consists of two methods: SCM-PCPI, which identifies highly correlated physicochemical properties, and SCM-VISU, which visualizes how scoring card scores are distributed along a protein. To evaluate the system, named the Scoring Card Inspired Protein Analysis System (SCIPAS), it was applied to proteins with four different functions.
The first topic is the prediction of heme-binding proteins (HBPs). Analysis of the scoring card suggests that HBPs have more rigid structures than non-HBPs, which implies that HBPs are structurally stable. A possible explanation is that environmental conditions such as temperature and pH affect protein structure, and structure in turn determines whether a protein functions properly; HBPs may therefore need structures stable enough to resist denaturation as they pass through different working environments. The second topic is the prediction of photosynthetic proteins (PSPs), which convert solar energy into chemical energy that living organisms can use. PSPs work in environments containing reactive oxygen species (ROS), high-energy compounds that often attack proteins and cause them to lose function. After the PSP predictor was built, its scoring card was analyzed with SCM-PCPI. The results show that PSPs tend to be composed of amino acids that can neutralize ROS, possibly because PSPs also need to scavenge ROS to protect other proteins from ROS attack. The third topic concerns membrane transporter proteins (MTPs). Physicochemical property analysis and scoring card visualization show that the amino acids forming the channels are not entirely hydrophilic. This may be because MTPs must fold into functional proteins within the membrane environment, and because it lets water molecules pass quickly, which improves transport efficiency. The fourth topic concerns carbohydrate-binding proteins (CBPs). Interlacing patterns were found at the binding sites of CBPs, and four criteria for CBP engineering are proposed as a reference for protein engineering.
Across these four topics, the scoring card analysis results agree with evidence reported in previous studies and also reveal additional characteristics of the proteins.
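Illustrative note: the abstract does not spell out how a scoring card turns a sequence into a prediction, so the following is only a minimal sketch under stated assumptions. It assumes the card assigns a score to each dipeptide, that a sequence is scored by the mean card score of its overlapping dipeptides, and that a threshold separates the two classes. The card values, the default score, the threshold, and all function names are hypothetical. The per-residue profile is a simple stand-in for the kind of score distribution SCM-VISU displays, not the thesis's actual implementation.

```python
# Illustrative sketch only: these card values are made up, not taken from the thesis.
TOY_CARD = {"AH": 600, "HC": 900, "CH": 850, "AA": 300}
DEFAULT_SCORE = 500  # assumed neutral score for dipeptides absent from the toy card


def dipeptide_scores(seq, card=TOY_CARD, default=DEFAULT_SCORE):
    """Look up the card score of every overlapping dipeptide in seq."""
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    return [card.get(seq[i:i + 2], default) for i in range(len(seq) - 1)]


def sequence_score(seq, card=TOY_CARD, default=DEFAULT_SCORE):
    """Composition-weighted score: the mean card score over all dipeptides."""
    scores = dipeptide_scores(seq, card, default)
    return sum(scores) / len(scores)


def predict(seq, threshold, card=TOY_CARD, default=DEFAULT_SCORE):
    """Classify as positive when the sequence score exceeds the threshold."""
    return sequence_score(seq, card, default) > threshold


def residue_profile(seq, card=TOY_CARD, default=DEFAULT_SCORE):
    """Per-residue score: mean over the one or two dipeptides containing it."""
    scores = dipeptide_scores(seq, card, default)
    profile = []
    for i in range(len(seq)):
        # Residue i belongs to dipeptide i-1 (if any) and dipeptide i (if any).
        touching = scores[max(i - 1, 0):i + 1]
        profile.append(sum(touching) / len(touching))
    return profile


if __name__ == "__main__":
    seq = "AHCH"
    print(sequence_score(seq))
    print(predict(seq, threshold=700))
    print(residue_profile(seq))
```

Because every prediction reduces to a lookup table and an average, the card itself can be inspected directly, which is the interpretability the abstract refers to.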
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT079851812
http://hdl.handle.net/11536/143048
Appears in Collections: Thesis