Title: Predicting and analyzing pore-forming subunits of human membrane transport proteins
Authors: Li, Ming-Che (李銘哲); Ho, Shinn-Ying (何信瑩); Huang, Hui-Ling (黃慧玲)
Institute of Bioinformatics and Systems Biology
Keywords: membrane transport proteins; pore-forming subunits; database; code; Inheritable Bi-objective Genetic Algorithm; predictor
Publication date: 2011
Abstract:
A membrane transport protein is a protein complex composed of several subunits. Among these subunits, the pore-forming subunits form the transmembrane pore, which controls the passage of substances across the plasma membrane and organelle membranes and thereby regulates many important biological processes. Because many diseases, such as channelopathies and hemophagocytic syndrome, are caused by defects in pore-forming subunits, and because many drugs are designed to target pore-forming subunits (with attention to drug resistance and specificity), identifying and studying pore-forming subunits has long been a focus of biological and medical research. Many recent studies use various bioinformatics tools for preliminary screening before experimental identification, but the screening methods must be designed, and their results judged, by the users themselves; consequently, the number of proteins left for validation differs from one screening method to another. For new classes of pore-forming subunits that have not been found before, the number of proteins remaining after bioinformatic screening is still large, so the subsequent validation costs considerable time and money. An accurate predictor of pore-forming subunits is therefore needed to shorten identification time and reduce cost. Recent work on predicting membrane transport proteins has mostly predicted membrane transport proteins and classified them into their Transporter Classification (TC) families, rather than predicting their pore-forming subunits. To date, there has been no study on predicting pore-forming subunits and no database dedicated to them. In this work, we therefore started from the world's largest manually annotated protein database and searched the literature for evidence identifying human pore-forming subunits.
From the information collected, we constructed the Pore-fOrming Subunits of humAn Transporter Set (POSATS). Based on this dataset, we then built a predictor for human pore-forming subunits (POSTPred 1.0) and analyzed its results. POSATS comprises 5176 human transmembrane proteins spanning 9 superfamilies and 916 families: 728 pore-forming subunits, 190 potential pore-forming subunits, and 4258 non-pore-forming subunits, supported by 379 curated literature references. For prediction, we took the 728 pore-forming subunits and 728 randomly chosen non-pore-forming subunits, and randomly selected 2/3 of each class as input to IBCGA-SVM, which couples the inheritable bi-objective genetic algorithm for feature selection with a support vector machine (SVM) classifier. After 30 independent runs, IBCGA-SVM produced an informative feature set, which was used to construct POSTPred 1.0; the remaining 1/3 was held out to test its performance. The optimized feature set contains 18 features, and POSTPred 1.0 achieved a 5-fold cross-validation accuracy of 86.42% and an independent test accuracy of 84.71%. Main effect difference (MED) analysis showed that the two most influential of the 18 features both measure the energy required to transfer amino acids from the interior of a protein to its surface, a property closely related to pore formation by pore-forming subunits. We conclude that, in addition to achieving good prediction accuracy, POSTPred 1.0 uses a feature set that effectively distinguishes pore-forming subunits from non-pore-forming subunits.
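The dataset preparation and feature-ranking steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's code: the helper names `balanced_split`, `main_effect_differences`, and `rank_features` are hypothetical; the 2/3 training fraction is assumed to be drawn separately from each class; and MED is assumed to be the standard two-level orthogonal-array main effect difference (mean accuracy of runs that use a feature minus mean accuracy of runs that omit it). The genetic-algorithm search and SVM training inside IBCGA-SVM are not shown, and the accuracies in the toy example are made up.

```python
# Hypothetical sketch. Assumptions: per-class 2/3 / 1/3 split, and MED computed
# from a two-level orthogonal array (1 = feature used, 0 = feature omitted).
import random
from typing import List, Sequence, Tuple

Labeled = List[Tuple[str, int]]  # (protein identifier, label); 1 = pore-forming, 0 = not


def balanced_split(positives: Sequence[str], negatives: Sequence[str],
                   train_fraction: float = 2 / 3,
                   seed: int = 0) -> Tuple[Labeled, Labeled]:
    """Balance the classes by undersampling negatives, then split each class
    separately into a training part (train_fraction) and a held-out test part."""
    rng = random.Random(seed)
    # Draw as many negatives as there are positives (e.g. 728 of 4258).
    sampled_negatives = rng.sample(list(negatives), len(positives))
    train: Labeled = []
    test: Labeled = []
    for label, items in ((1, list(positives)), (0, sampled_negatives)):
        rng.shuffle(items)
        cut = round(len(items) * train_fraction)
        train += [(x, label) for x in items[:cut]]
        test += [(x, label) for x in items[cut:]]
    return train, test


def main_effect_differences(oa: Sequence[Sequence[int]],
                            accuracy: Sequence[float]) -> List[float]:
    """For each feature j: mean accuracy of the runs that use j (level 1)
    minus mean accuracy of the runs that omit j (level 0)."""
    meds = []
    for j in range(len(oa[0])):
        used = [a for row, a in zip(oa, accuracy) if row[j] == 1]
        omitted = [a for row, a in zip(oa, accuracy) if row[j] == 0]
        meds.append(sum(used) / len(used) - sum(omitted) / len(omitted))
    return meds


def rank_features(meds: Sequence[float]) -> List[int]:
    """Feature indices ordered from most to least influential (largest |MED| first)."""
    return sorted(range(len(meds)), key=lambda j: abs(meds[j]), reverse=True)


if __name__ == "__main__":
    pos = [f"pos{i}" for i in range(728)]   # hypothetical identifiers
    neg = [f"neg{i}" for i in range(4258)]
    train, test = balanced_split(pos, neg)
    print(len(train), len(test))  # 970 486
    # Toy L4(2^3) orthogonal array with made-up accuracies, for illustration only.
    oa = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
    meds = main_effect_differences(oa, [0.80, 0.82, 0.86, 0.84])
    print(rank_features(meds))  # [0, 2, 1]
```

In this sketch, splitting each class separately keeps both parts balanced (485 + 485 proteins for training and 243 + 243 for testing with 728 proteins per class). Ranking by absolute MED lets a feature with a strongly negative effect still count as influential.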
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079951520
http://hdl.handle.net/11536/50400
Appears in Collections: Theses