Title: Predicting DNA-binding proteins using a physicochemical-property-based optimization method
Authors: 林意哲 (Lin, I-Che)
何信莹 (Ho, Shinn-Ying)
黄慧玲 (Huang, Hui-Lin)
Department of Biological Science and Technology
Keywords: DNA-binding protein; physicochemical property; support vector machine
Issue Date: 2009
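Abstract: Identifying deoxyribonucleic acid (DNA)-binding proteins is an important challenge in genome annotation and also plays a key role in the study of gene regulation, from DNA replication to the control of gene expression. In recent years, many studies have predicted DNA-binding proteins using physicochemical properties as features, but a comprehensive study of how to use physicochemical properties is still lacking. This study uses an optimization approach to select among the many physicochemical properties reported in the literature in order to predict whether a protein sequence is a DNA-binding protein. We propose a physicochemical property selection method based on an inheritable bi-objective genetic algorithm, which combines a support vector machine with physicochemical properties to obtain a set of properties for predicting whether a protein binds DNA. In general, biologists need domain knowledge to choose effective physicochemical properties for protein analysis and prediction. The proposed method can be used to understand the differences between DNA-binding and non-DNA-binding proteins, and it is an effective method that is easy to apply to predicting and understanding the functions and characteristics of various binding proteins.
The experiments use several data sets from the literature for analysis and comparison, covering the prediction of both DNA-binding proteins and DNA-binding domains. The two predictors used 22 and 28 physicochemical properties selected from the AAindex database and achieved prediction accuracy close to that of methods in the literature. In the analysis of the physicochemical properties, we further cluster the properties with the Fuzzy C-means algorithm to understand the characteristic differences between binding proteins and binding domains. This optimization method for selecting physicochemical properties as features can serve as a core method for designing predictors for other DNA-binding protein problems.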
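The fuzzy C-means step mentioned above can be illustrated with a minimal pure-Python sketch. It is not the thesis implementation: the function name `fuzzy_c_means`, the fuzzifier `m = 2.0`, the fixed iteration count, and the representation of each property as a short numeric vector are all assumptions made for illustration.

```python
import random


def fuzzy_c_means(points, c=2, m=2.0, iters=100, seed=0):
    """Cluster points (lists of floats) into c fuzzy clusters.

    Returns (centers, memberships), where memberships[i][k] is the degree
    to which point i belongs to cluster k; each row sums to 1.
    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Start from random memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(iters):
        # Update each center as a membership-weighted mean (weights u^m).
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Update memberships from squared distances to the centers.
        for i in range(n):
            d2 = [
                sum((points[i][d] - centers[k][d]) ** 2 for d in range(dim)) + 1e-12
                for k in range(c)
            ]
            inv = [(1.0 / x) ** (1.0 / (m - 1.0)) for x in d2]
            s = sum(inv)
            u[i] = [v / s for v in inv]
    return centers, u


# Toy data standing in for property vectors: two well-separated groups.
toy = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
toy_centers, toy_memberships = fuzzy_c_means(toy, c=2)
toy_labels = [max(range(2), key=lambda k: row[k]) for row in toy_memberships]
```

Unlike hard clustering, each item keeps a graded membership in every cluster, which suits properties that plausibly relate to more than one group.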
Identifying DNA-binding proteins is a key challenge in genome annotation and plays an important role in investigating gene regulation, from DNA replication to gene expression control. In recent years, many studies have predicted DNA-binding proteins using physicochemical properties as features, but the use of physicochemical properties has not yet been studied comprehensively. In this thesis, we use an optimization approach to select informative physicochemical properties from the AAindex database for predicting DNA-binding proteins. We propose a prediction method, SVM-PCP, which uses a support vector machine (SVM) with informative physicochemical properties as features to predict DNA-binding domains and proteins. SVM-PCP uses an inheritable bi-objective genetic algorithm to identify a small set of informative physicochemical properties while maximizing prediction accuracy. Generally, biologists need domain knowledge to identify physicochemical properties for analyzing and predicting DNA-binding domains and proteins. The computational method in this thesis can be used to analyze the similarities and differences between DNA-binding and non-DNA-binding domains/proteins, making it an effective way to further understand the functions of DNA-binding domains and proteins. Several data sets were used in the experiments to evaluate the proposed method, including two data sets of DNA-binding domains and proteins. SVM-PCP identified 22 and 28 physicochemical properties from the AAindex database for predicting DNA-binding domains and proteins, respectively. The performance of SVM-PCP is comparable to that of an existing method that uses PSSM (position-specific scoring matrix) features. The physicochemical properties are clustered using a fuzzy C-means algorithm to further understand the functions and characteristics of DNA-binding domains and proteins. From the analysis of the informative physicochemical properties, some knowledge of DNA-binding and non-DNA-binding proteins can be further investigated. The proposed physicochemical-property-based optimization method can conveniently serve as the core for designing predictors for various DNA-binding problems.
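To make the feature and selection ideas in the abstract concrete, here is a small sketch under stated assumptions. The abstract does not say how a sequence is turned into features, so averaging each property over the residues is an assumption, as are the helper names `mean_property`, `encode`, and `dominates`. The Kyte-Doolittle hydropathy scale stands in for one AAindex-style property. The `dominates` check shows one common way to compare candidates in a bi-objective search (higher accuracy, fewer properties). It is not the inheritable bi-objective genetic algorithm itself.

```python
# Kyte-Doolittle hydropathy index, used here as one example property scale.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_property(sequence, table):
    """Average a per-residue property over a protein sequence.

    Residues missing from the table (e.g. 'X') are skipped; this handling
    is an illustrative assumption.
    """
    values = [table[aa] for aa in sequence.upper() if aa in table]
    if not values:
        raise ValueError("sequence contains no residues found in the table")
    return sum(values) / len(values)


def encode(sequence, tables):
    """Return one feature per selected property table (assumed encoding)."""
    return [mean_property(sequence, t) for t in tables]


def dominates(a, b):
    """Pareto check for (accuracy, n_properties) pairs.

    a dominates b if it is at least as accurate, uses no more properties,
    and is strictly better in at least one of the two objectives.
    """
    acc_a, k_a = a
    acc_b, k_b = b
    return acc_a >= acc_b and k_a <= k_b and (acc_a > acc_b or k_a < k_b)


# Hypothetical usage: a one-property feature vector for a toy sequence.
features = encode("MKRLAV", [KD_HYDROPATHY])
```

The resulting feature vectors could then be passed to any SVM implementation. A candidate property subset would be kept only if no other candidate dominates it.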
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079728501
http://hdl.handle.net/11536/45277
Appears in Collections: Thesis


Files in This Item:

  1. 850101.pdf
