Title: Analyzing and grouping lysine acetylation using protein domain information and building a prediction system
Predicting lysine acetylation site based on protein domain information and multilevel support vector machines
Authors: 莫士逸
Mo, Shi-I
胡毓志
Hu, Yuh-Jyh
Institute of Biomedical Engineering
Keywords: lysine acetylation prediction; machine learning; protein domain
Issue date: 2012
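Abstract: Lysine acetylation is one of the common and important post-translational modifications of proteins. In biological processes it alters or extends protein function, contributing to gene regulation, DNA replication and repair, and the maintenance of normal cellular metabolism. In medicine, lysine acetylation on histones has likewise been found to affect the onset of diseases such as cardiovascular disease and prostate cancer, and measuring the degree of lysine acetylation on these proteins can also serve preventive purposes. Because lysine acetylation plays an increasingly important role in biology and medicine, a stable lysine acetylation prediction system that can quickly analyze and predict large numbers of protein sequences would be of considerable help to future research and discovery. This thesis collects the locations of all lysine acetylation samples within protein domains, together with secondary structure data, analyzes all lysine samples, and regroups them into three datasets, each evaluated by cross-validation. The experimental results show that after cross-validation the three datasets reach average prediction accuracies of 80.0%, 76.3%, and 79.3%, respectively. After a prediction model is built for each dataset, predictions on independent test data also reach accuracies of 80.8%, 77.9%, and 75.2%, which are stable and better than those of other prediction tools. The system can thus help researchers obtain more accurate lysine acetylation predictions by analyzing protein domains.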
Lysine acetylation is an important and well-known post-translational protein modification with a role in regulating gene expression, DNA repair, and DNA replication. Lysine acetylation on histones is also linked to diseases such as cardiovascular disease and cancer. Computational identification of acetyllysine sites is therefore needed to support experimental work in the wet lab. In this work, we use protein domain information to partition all lysine samples into three training sets. The physicochemical properties of amino acids, the secondary structure, and the Position-Specific Scoring Matrix of each protein are collected as features for multilevel SVM training. The 10-fold cross-validation accuracies of the three individual training sets are 80.0%, 76.3%, and 79.3%. The predictive accuracies on independent data are 80.8%, 77.9%, and 75.2%, higher than those of other predictors, so the system can provide users with more accurate identification of lysine acetylation sites.
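The data handling described in the abstract (collecting lysine samples, partitioning them into groups, and running 10-fold cross-validation per group) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the thesis: the fixed-width sequence window around each lysine, the padding character, the function names, and the caller-supplied grouping function. The record does not state the actual grouping criteria, feature encoding, or SVM configuration, so those are left out.

```python
from collections import defaultdict


def lysine_windows(sequence, half_width=3, pad="X"):
    """Return (1-based position, window) for every lysine (K) in the sequence.

    The window size and the pad character are illustrative assumptions.
    """
    padded = pad * half_width + sequence + pad * half_width
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # In the padded string the lysine sits at the window centre.
            windows.append((i + 1, padded[i:i + 2 * half_width + 1]))
    return windows


def partition_by_group(samples, group_of):
    """Split samples into groups using a caller-supplied labelling function.

    group_of is hypothetical: the real domain-based criteria are not
    given in this record.
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[group_of(sample)].append(sample)
    return dict(groups)


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k roughly equal folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test
```

In such a setup, each group returned by `partition_by_group` would be cross-validated separately with `k_fold_indices`, and the per-fold accuracies averaged to give one figure per group.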
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079830519
http://hdl.handle.net/11536/72401
Appears in collections: Theses