Title: Designing Optimization Methods to Predict Protein Functions and Recognize Neuron Images
Authors: 李光成 (Phasit Charoenkwan)
何信瑩 (Shinn-Ying Ho)
Institute of Bioinformatics and Systems Biology
Keywords: Protein prediction; Neuron image classification; Feature extraction; Feature selection; Optimization
Issue Date: 2013
Abstract: The massive growth of protein sequence and neuron image datasets has created a need for computational methods to predict and analyse their biological functions. Machine-learning-based classifiers are commonly proposed for predicting protein functions and recognizing neuron images. At present, a desirable predictor of protein functions should provide both prediction efficiency and knowledge discovery. Meanwhile, identifying informative features for recognizing neuron images is difficult because of the large number of available image features. This dissertation develops optimization methodologies, based on an intelligent genetic algorithm (IGA), for both predicting protein functions and recognizing neuron images. The scoring card method (SCM) is a simple and highly interpretable method for the prediction and analysis of protein functions. The SCM calculates dipeptide propensity scores for a protein function of interest from the difference in dipeptide composition between positive and negative sequences. The propensity scores of the 400 dipeptides are optimized by the IGA to enhance prediction accuracy while conserving the original characteristics of the amino acid composition. A score is then derived for each sequence from these propensity scores and used to predict the function of that sequence. Two SCM-based methods, SCMSOL and SCMCRYS, are proposed for the prediction and analysis of protein solubility and crystallizability. Their test accuracies are 84.3% and 76.1%, respectively, which are comparable to those of support-vector-machine-based methods using the same dipeptide composition features. Moreover, biological knowledge discovery and mutagenesis analysis for soluble and crystallizable proteins based on the propensity scores are illustrated. The procedure for developing SCM-based methods can also be applied to design other methods for predicting protein functions with high prediction performance and highly interpretable results.
This dissertation also presents an automated neuron image feature identification system (Auto-NIFI), a user-friendly tool that automatically extracts and identifies a small set of informative neuron image features using an inheritable bi-objective combinatorial genetic algorithm (IBCGA). The feature selection of Auto-NIFI allows biologists to construct a suitable classifier for a particular neuron image classification problem. To identify neuron image features, Auto-NIFI provides a comprehensive set of image feature extraction modules together with the IBCGA feature selection modules. Notably, owing to the large collection of image feature extraction modules available in this tool, the system can also be applied to a wide variety of biological image classification problems. Two methods, HCS-Neurons and DescNeuro, are proposed for neuron image classification. In the HCS-Neurons method, the usefulness of Auto-NIFI is demonstrated by identifying phenotypic changes in multi-neuron images in response to drug treatments in high-content screening. The three identified morphological features achieved an independent test accuracy of 90.28% in classifying neurons into six classes corresponding to six different nocodazole concentrations. Using Auto-NIFI, DescNeuro can recognize a neuron in the 3D Drosophila neuron database from a 2D image with promising recognition results.
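The scoring idea described in the abstract can be illustrated with a minimal Python sketch. It covers only the initial propensity scores (difference in mean dipeptide composition between positive and negative sequences) and the resulting sequence score. The IGA optimization step is omitted, and the function names, the toy sequences, and the lack of score normalization are assumptions made for illustration, not details taken from the dissertation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered pairs of standard amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return the fraction of each of the 400 dipeptides in seq."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return {dp: (c / total if total else 0.0) for dp, c in counts.items()}


def initial_propensity_scores(positives, negatives):
    """Mean composition over positives minus mean composition over negatives.

    Assumption: no rescaling is applied, and no IGA refinement follows.
    """
    def mean_composition(seqs):
        comps = [dipeptide_composition(s) for s in seqs]
        return {dp: sum(c[dp] for c in comps) / len(comps) for dp in DIPEPTIDES}

    pos = mean_composition(positives)
    neg = mean_composition(negatives)
    return {dp: pos[dp] - neg[dp] for dp in DIPEPTIDES}


def sequence_score(seq, scores):
    """Composition-weighted sum of propensity scores for one sequence."""
    comp = dipeptide_composition(seq)
    return sum(comp[dp] * scores[dp] for dp in DIPEPTIDES)


# Hypothetical toy data, not real protein sequences.
positives = ["AKAKAKEE", "KAKAEEKA"]
negatives = ["WWFWFFWW", "FWWFWFFW"]
scores = initial_propensity_scores(positives, negatives)
print(sequence_score("AKAKEE", scores) > sequence_score("WFWFWW", scores))
```

In a sketch like this, a sequence would be assigned to the positive class when its score exceeds a threshold chosen on training data. Because each dipeptide carries a single score, the 400 values can be inspected directly, which is the kind of interpretability the abstract attributes to the SCM.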
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079855863
http://hdl.handle.net/11536/75086
Appears in Collections: Thesis