Title: | Predicting DNA-binding proteins using a physicochemical-property-based optimization method |
Authors: | Lin, I-Che; Ho, Shinn-Ying; Huang, Hui-Lin; Department of Biological Science and Technology |
Keywords: | DNA-binding protein; physicochemical property; support vector machines |
Issue Date: | 2009 |
Abstract: | Identifying DNA-binding proteins is not only a key challenge in genome annotation but also important for investigating gene regulation, from DNA replication to the control of gene expression. In recent years, many studies have predicted DNA-binding proteins using physicochemical properties as features, but a comprehensive study of these properties is still lacking. In this thesis, we use an optimization approach to select informative physicochemical properties from the AAindex database for predicting DNA-binding proteins. We propose SVM-PCP, a prediction method that uses a support vector machine (SVM) with informative physicochemical properties as features to predict DNA-binding domains and proteins. SVM-PCP uses an inheritable bi-objective genetic algorithm to identify a small set of informative physicochemical properties while maximizing prediction accuracy. Whereas biologists generally need domain knowledge to choose physicochemical properties for analyzing and predicting DNA-binding domains and proteins, the proposed method selects them by optimization. It can also be used to analyze the similarities and differences between DNA-binding and non-DNA-binding domains/proteins, which helps to further understand the functions and characteristics of DNA-binding domains and proteins.
Several data sets from the literature were used to evaluate the proposed method, including two data sets of DNA-binding domains and proteins. SVM-PCP identified 22 and 28 physicochemical properties from AAindex for predicting DNA-binding domains and proteins, respectively, and its performance is comparable to that of an existing method that uses position-specific scoring matrix (PSSM) features. The selected physicochemical properties were further clustered with the fuzzy C-means algorithm to characterize the differences between DNA-binding domains and proteins and to better understand their functions and characteristics. The proposed physicochemical-property-based optimization method can serve as the core for designing predictors for various DNA-binding problems. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079728501 http://hdl.handle.net/11536/45277 |
Appears in Collections: | Thesis |
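The abstract does not spell out how a sequence becomes a feature vector or how the two objectives (higher accuracy, fewer properties) are traded off. The sketch below is a minimal illustration under stated assumptions: each selected property is encoded as its average value over the residues of a sequence (one common encoding; the thesis's exact encoding may differ), and candidate property subsets are compared by Pareto dominance. It is not the inheritable bi-objective genetic algorithm itself and it contains no SVM. The function names, the simplified charge scale, and the toy scores are hypothetical; only the Kyte-Doolittle values are a real AAindex scale (KYTJ820101).

```python
from statistics import mean

# Kyte-Doolittle hydropathy index (AAindex entry KYTJ820101).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical, simplified side-chain charge scale (illustration only, not an
# AAindex entry): K and R count as +1, D and E as -1, all other residues 0.
SIMPLE_CHARGE = {aa: 0.0 for aa in KYTE_DOOLITTLE}
SIMPLE_CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})

# Hypothetical property library; a real run would load many AAindex entries.
PROPERTIES = {"hydropathy": KYTE_DOOLITTLE, "charge": SIMPLE_CHARGE}


def encode(sequence, selected):
    """Map a protein sequence to one feature per selected property: the
    average property value over its residues (unknown residues are skipped)."""
    features = []
    for name in selected:
        scale = PROPERTIES[name]
        values = [scale[aa] for aa in sequence.upper() if aa in scale]
        features.append(mean(values) if values else 0.0)
    return features


def dominates(a, b):
    """True if candidate a Pareto-dominates candidate b. Each candidate is a
    tuple (accuracy, n_properties): higher accuracy and fewer properties are better."""
    acc_a, n_a = a
    acc_b, n_b = b
    no_worse = acc_a >= acc_b and n_a <= n_b
    strictly_better = acc_a > acc_b or n_a < n_b
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]
```

For example, `encode("MKRKR", ["hydropathy", "charge"])` gives approximately `[-2.98, 0.8]`, reflecting a hydrophilic, positively charged peptide. In a full pipeline, each candidate subset would first be scored (for instance by cross-validated SVM accuracy) and `pareto_front` would then keep the subsets offering the best accuracy for a given number of properties.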
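The abstract states that the selected physicochemical properties were grouped with the fuzzy C-means algorithm. The following is a minimal pure-Python sketch of standard fuzzy C-means, in which each point receives a membership degree in every cluster instead of a single hard label. The function name, the fuzzifier `m = 2`, the random initialization, and the toy one-dimensional data are assumptions for illustration, not the thesis's settings; representing each property by its vector of per-residue values would be one plausible input, but that too is an assumption.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Cluster points (lists of floats) with standard fuzzy C-means.

    Returns (centers, memberships), where memberships[i][j] is the degree to
    which point i belongs to cluster j; each membership row sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial membership matrix, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Update centers: weighted means of the points with weights u_ij ** m.
        centers = []
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append([sum(w * p[d] for w, p in zip(weights, points)) / wsum
                            for d in range(dim)])

        # Update memberships: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        # Distances are clamped away from 0 to avoid division by zero.
        new_u = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            new_u.append([1.0 / sum((dists[j] / dk) ** (2.0 / (m - 1.0)) for dk in dists)
                          for j in range(n_clusters)])

        # Stop once no membership changes by more than tol.
        shift = max(abs(a - b) for r1, r2 in zip(u, new_u) for a, b in zip(r1, r2))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

On two well-separated groups such as `[[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]` with `n_clusters=2`, the centers settle near 0.1 and 5.1 and each point's largest membership identifies its group. The soft memberships are what make fuzzy C-means useful for properties that partly resemble more than one group.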