Title: | Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method |
Authors: | Huang, Wen-Lin; Chen, Hung-Ming; Hwang, Shiow-Fen; Ho, Shinn-Ying; Department of Biological Science and Technology; Institute of Bioinformatics and Systems Biology |
Keywords: | amino acid composition; enzyme subfamily class prediction; fuzzy theory; k-nearest neighbor; support vector machine |
Issue Date: | 1-Sep-2007 |
Abstract: | Amphiphilic pseudo-amino acid composition (Am-Pse-AAC) with extra sequence-order information is a useful feature for representing enzymes. This study first utilizes the k-nearest neighbor (k-NN) rule to analyze the distribution of enzymes in the Am-Pse-AAC feature space. This analysis indicates that the distributions of multiple classes of enzymes overlap heavily. To cope with the overlap problem, this study proposes an efficient non-parametric classifier for predicting enzyme subfamily class using an adaptive fuzzy k-nearest neighbor (AFK-NN) method, where k and a fuzzy strength parameter m are adaptively specified. The fuzzy membership values of a query sample Q are dynamically determined according to the position of Q and its weighted distances to the k nearest neighbors. Using the same enzymes of the oxidoreductases family for comparison, the prediction accuracy of AFK-NN is 76.6%, which is better than those of Support Vector Machine (73.6%), the decision tree method C5.0 (75.4%) and the existing covariant-discriminant algorithm (70.6%) using a jackknife test. To evaluate the generalization ability of AFK-NN, datasets for all six families of entirely sequenced enzymes are established from the newly updated SWISS-PROT and ENZYME databases. The accuracy of AFK-NN on the new large-scale dataset of the oxidoreductases family is 83.3%, and the mean accuracy over the six families is 92.1%. (c) 2006 Elsevier Ireland Ltd. All rights reserved. |
URI: | http://dx.doi.org/10.1016/j.biosystems.2006.10.004 http://hdl.handle.net/11536/10347 |
ISSN: | 0303-2647 |
DOI: | 10.1016/j.biosystems.2006.10.004 |
Journal: | BIOSYSTEMS |
Volume: | 90 |
Issue: | 2 |
Start page: | 405 |
End page: | 413 |
Appears in Collections: | Journal Articles |
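
The abstract describes a fuzzy k-nearest neighbor classifier in which k and the fuzzy strength parameter m are chosen adaptively and accuracy is measured with a jackknife (leave-one-out) test, but it gives no formulas. The following is only a minimal sketch of that general idea, not the authors' AFK-NN implementation. It assumes the common inverse-distance membership weighting 1 / d^(2/(m-1)), Euclidean distance on plain feature tuples rather than Am-Pse-AAC vectors, and a simple grid search as a stand-in for the adaptive choice of k and m. All function names and the candidate grids are hypothetical.

```python
import math


def fuzzy_knn_memberships(query, samples, labels, k=3, m=2.0):
    """Return {class_label: membership} for `query` from its k nearest samples.

    Assumption: inverse-distance weighting with exponent 2 / (m - 1),
    a common fuzzy k-NN rule; the abstract does not state the formula.
    """
    if m <= 1.0:
        raise ValueError("m must be greater than 1")
    # Keep the k training samples closest to the query (Euclidean distance).
    nearest = sorted(
        ((math.dist(query, x), y) for x, y in zip(samples, labels)),
        key=lambda pair: pair[0],
    )[:k]
    exponent = 2.0 / (m - 1.0)
    eps = 1e-12  # guards against division by zero for a zero distance
    weights = [(1.0 / (d ** exponent + eps), y) for d, y in nearest]
    total = sum(w for w, _ in weights)
    memberships = {}
    for w, y in weights:
        # Normalised weights, so memberships over all classes sum to 1.
        memberships[y] = memberships.get(y, 0.0) + w / total
    return memberships


def predict(query, samples, labels, k=3, m=2.0):
    """Assign the class with the largest fuzzy membership."""
    memberships = fuzzy_knn_memberships(query, samples, labels, k, m)
    return max(memberships, key=memberships.get)


def jackknife_accuracy(samples, labels, k, m):
    """Leave-one-out accuracy: each sample is predicted from all the others."""
    hits = 0
    for i in range(len(samples)):
        rest_x = samples[:i] + samples[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        hits += predict(samples[i], rest_x, rest_y, k, m) == labels[i]
    return hits / len(samples)


def select_k_and_m(samples, labels, ks=(1, 3, 5), ms=(1.5, 2.0, 3.0)):
    """Hypothetical stand-in for the adaptive step: pick the (k, m) pair
    with the best jackknife accuracy over a small candidate grid."""
    candidates = [(k, m) for k in ks for m in ms]
    return max(candidates, key=lambda km: jackknife_accuracy(samples, labels, *km))


if __name__ == "__main__":
    # Toy two-class data, purely illustrative (not enzyme features).
    xs = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    ys = ["a", "a", "a", "b", "b", "b"]
    k, m = select_k_and_m(xs, ys, ks=(1, 3))
    print(k, m, predict((0.2, 0.2), xs, ys, k, m))
```

Normalising the weights keeps the memberships comparable across queries, which is what lets a sample in an overlapping region receive partial membership in several classes instead of a hard vote.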