Full metadata record
DC Field | Value | Language
dc.contributor.author | Huang, CD | en_US
dc.contributor.author | Liang, SF | en_US
dc.contributor.author | Lin, CT | en_US
dc.contributor.author | Wu, RC | en_US
dc.date.accessioned | 2014-12-08T15:18:50Z | -
dc.date.available | 2014-12-08T15:18:50Z | -
dc.date.issued | 2005-07-01 | en_US
dc.identifier.issn | 1016-2364 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/13539 | -
dc.description.abstract | In machine learning, both the network used and the features selected are important factors that should be considered carefully. These two factors influence the result, for better or worse. In bioinformatics, the number of features may be too large for machine learning to be practical. In this study we introduce the idea of feature selection into a bioinformatics problem. We use neural networks to complete our task, where each input node is associated with a gate. At the beginning of training, all gates are almost closed and no features are allowed to enter the network. During the training phase, gates are either opened or closed, depending on the requirements. After the selection training phase has completed, gates corresponding to the helpful features are completely opened, while gates corresponding to the useless features are closed more tightly. Some gates may be partially open, depending on the importance of the corresponding features. So the network can not only select features in an online manner during learning, but it also performs some feature extraction. We combine feature selection with our novel hierarchical machine learning architecture and apply it to multi-class protein fold classification. At the first level, the network classifies the data into four major folds: all alpha, all beta, alpha + beta, and alpha/beta. At the next level, another set of networks further classifies the data into twenty-seven folds. This approach helps achieve the following. The gating network is found to reduce the number of features drastically: it is interesting to observe that, for the first level, using just 50 features selected by the gating network we can get a test accuracy comparable to that obtained using 125 features in neural classifiers. The process also gives us better insight into the folding process. For example, by tracking the evolution of the different gates, we can find which characteristics (features) of the data are more important for the folding process. Finally, it reduces the computation time. The use of the hierarchical architecture also improves performance. | en_US
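The gating mechanism the abstract describes can be illustrated with a small sketch. The paper's exact gate formulation and training rule are not given in this record, so the following is a hypothetical minimal version: each input feature is multiplied by a sigmoid gate that starts almost closed, and both the gates and the classifier weights are updated by gradient descent, so gates on useful features open while gates on irrelevant features stay closed.

```python
import numpy as np

# Hypothetical sketch of a gated input layer for online feature selection.
# Each feature x_i is scaled by a gate g_i = sigmoid(a_i); all gates start
# almost closed (a_i = -2), and training opens the gates whose features
# help the classifier. The logistic "network" and the toy data are
# illustrative assumptions, not the paper's actual architecture.

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 6 features, but only the first 2 determine the label.
X = rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

a = np.full(6, -2.0)                 # gate parameters: gates start almost closed
w = rng.normal(scale=0.1, size=6)    # classifier weights
b = 0.0
lr = 0.5

for _ in range(2000):
    g = sigmoid(a)                   # gate openings in (0, 1)
    p = sigmoid((X * g) @ w + b)     # logistic classifier on gated inputs
    err = p - y                      # gradient of cross-entropy w.r.t. the logit
    grad_w = (X * g).T @ err / len(y)
    grad_a = (X * w).T @ err / len(y) * g * (1 - g)
    w -= lr * grad_w
    a -= lr * grad_a
    b -= lr * err.mean()

g = sigmoid(a)
print(np.round(g, 2))  # gates for features 0 and 1 open; the rest stay nearly closed
```

After training, reading off the gate values gives the feature ranking the abstract alludes to: fully open gates mark helpful features, partially open gates mark features of intermediate importance, and tightly closed gates mark features that can be dropped.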
dc.language.iso | en_US | en_US
dc.subject | machine learning | en_US
dc.subject | hierarchical architecture | en_US
dc.subject | feature selection | en_US
dc.subject | gate | en_US
dc.subject | neural network | en_US
dc.subject | protein fold | en_US
dc.subject | bioinformatics | en_US
dc.title | Machine learning with automatic feature selection for multi-class protein fold classification | en_US
dc.type | Article | en_US
dc.identifier.journal | JOURNAL OF INFORMATION SCIENCE AND ENGINEERING | en_US
dc.citation.volume | 21 | en_US
dc.citation.issue | 4 | en_US
dc.citation.spage | 711 | en_US
dc.citation.epage | 720 | en_US
dc.contributor.department | 電控工程研究所 | zh_TW
dc.contributor.department | Institute of Electrical and Control Engineering | en_US
dc.identifier.wosnumber | WOS:000230500600003 | -
dc.citation.woscount | 1 | -
Appears in Collections: Journal Articles


Files in This Item:

  1. 000230500600003.pdf
