Title: GAINER: Genetic Algorithm based INformation minER for subcellular localization signatures
Authors: Wan-Tien Chiang (江萬田); Ming-Jing Hwang (黃明經); Yuh-Jyh Hu (胡毓志)
Department: Institute of Computer Science and Engineering
Keywords: genetic algorithm; information miner; data mining; subcellular localization; protein localization; protein signatures; GAINER
Date of publication: 2004
Abstract: The first step toward understanding the function(s) of a protein is often to identify its subcellular location(s). Despite years of effort, an effective and practical method for determining protein subcellular location(s) has yet to be achieved. Here, we introduce GAINER, a novel integrative system based on a genetic algorithm for discovering protein subcellular localization signatures. GAINER encodes commonly used amino acid indices, alphabet indexing, and approximate patterns as signature candidates, and uses proteins of known subcellular location as training data to mine sets of discriminative signatures. We also developed GALOP, a Bayesian classifier that, for any given amino acid sequence, detects which GAINER-mined signatures the sequence contains and then applies Bayes' theorem to compute the probability that the protein resides in each subcellular location. By comparison with the well-known tools TargetP and iPSORT, we show that GAINER can discover protein subcellular localization signatures effectively and efficiently. Organizing and summarizing these signatures also reveals their biochemical significance, helping biologists understand the mechanisms of protein subcellular sorting and targeting. Finally, GALOP can annotate relevant databases accurately and thoroughly, supporting further analysis of protein function in proteomics research.
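GALOP is described as scoring a sequence by the probabilities of its detected signatures in each subcellular location. The following is a minimal sketch of that idea, not the thesis implementation. It assumes a naive Bayes model (signatures treated as independent given the location) with Laplace smoothing. The three regular-expression "signatures", the location labels, the helper names `detect`, `train` and `posterior`, and the toy sequences are all hypothetical placeholders rather than GAINER output.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in signatures. The abstract does not list the motifs
# GAINER actually mines, so these regular expressions are placeholders only.
SIGNATURES = {
    "basic_cluster": re.compile(r"[KR]{3,}"),          # run of basic residues
    "hydrophobic_run": re.compile(r"[AILMFVW]{6,}"),   # long hydrophobic stretch
    "c_term_skl": re.compile(r"SKL$"),                 # C-terminal Ser-Lys-Leu
}


def detect(seq):
    """Return one boolean per signature: does it occur in seq?"""
    return tuple(bool(p.search(seq)) for p in SIGNATURES.values())


def train(examples, alpha=1.0):
    """Estimate priors P(loc) and smoothed P(signature present | loc)."""
    loc_counts = Counter(loc for _, loc in examples)
    n_sigs = len(SIGNATURES)
    hits = {loc: [0] * n_sigs for loc in loc_counts}
    for seq, loc in examples:
        for i, present in enumerate(detect(seq)):
            hits[loc][i] += int(present)
    total = len(examples)
    priors = {loc: c / total for loc, c in loc_counts.items()}
    # Laplace (add-alpha) smoothing keeps every probability strictly in (0, 1).
    likelihoods = {
        loc: [(hits[loc][i] + alpha) / (loc_counts[loc] + 2 * alpha)
              for i in range(n_sigs)]
        for loc in loc_counts
    }
    return priors, likelihoods


def posterior(seq, priors, likelihoods):
    """P(loc | detected signatures), assuming signatures are independent given loc."""
    features = detect(seq)
    log_scores = {}
    for loc, prior in priors.items():
        score = math.log(prior)
        for p, present in zip(likelihoods[loc], features):
            score += math.log(p if present else 1.0 - p)
        log_scores[loc] = score
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    unnorm = {loc: math.exp(s - top) for loc, s in log_scores.items()}
    z = sum(unnorm.values())
    return {loc: v / z for loc, v in unnorm.items()}


# Toy training set (invented sequences, for illustration only).
training = [
    ("MPKKKRKVEDA", "nucleus"),
    ("MSRRKRGSTLA", "nucleus"),
    ("MLLAVLALLAGSA", "secreted"),
    ("MKAILVLFAVSTA", "secreted"),
    ("MADEQGLSKL", "peroxisome"),
    ("MSTNEGHASKL", "peroxisome"),
]
priors, likelihoods = train(training)
probs = posterior("MQKRRKSEDL", priors, likelihoods)
print(max(probs, key=probs.get), round(probs["nucleus"], 3))  # nucleus 0.818
```

Because `posterior` returns a full distribution over locations rather than a single label, a multi-location prediction (the "location(s)" in the abstract) could be obtained by thresholding it. That threshold is an assumption not specified in the source.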
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009123615
http://hdl.handle.net/11536/53691
Collection: Theses