Title: 運用整體學習分類法對癌症作樣本分類
Using Ensemble Classifier Learning for Cancer Classification
Authors: 李怡萱
Yi-Syuan Lee
謝筱齡
蔡文能
Sheau-Ling Hsieh
Wen-Nung Tsai
Institute of Network Engineering
Keywords: ensemble learning; neural fuzzy (fuzzy neural network); KNN (k-nearest neighbors); quadratic classifier; information gain; microarray technology
Issue Date: 2008
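Abstract: There are currently more than two hundred different types of cancer, and the symptoms and treatments of each differ; even professional medical staff have difficulty classifying cancer samples correctly. We therefore use feature selection and ensemble learning classifiers to classify cancer samples. A microarray chip carries a high density of biological probes on a very small area and serves as a tool for large-scale screening and parallel analysis of thousands of genes. Not all of these thousands of genes help in classifying cancer samples, so we use information gain, a feature selection method, to pick the features that do help, and then classify the cancer data after feature selection. The cancer data we use cover leukemia and breast cancer. We classify the samples with neural fuzzy networks, k-nearest neighbors, the quadratic classifier, an ensemble of each individual method, and an ensemble that integrates all three. The experimental results show that ensemble learning can improve the accuracy of the individual classifiers.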
There are over 200 types of cancer, each with its own clinical characteristics and chances of being cured. Unfortunately, even experienced specialists sometimes find it difficult to distinguish among particular cancers and their subtypes. We therefore use feature selection methods and machine learning classifiers for cancer classification. DNA microarray technology can simultaneously monitor the expression of thousands of genes, and it can provide gene expression analyses to physicians for diagnosing cancer or for research on cancer classification. To classify cancer accurately, we need to select the relevant genes, because some genes extracted from microarrays are useless for classification. In this study we classify two kinds of cancer data: a leukemia gene expression data set and a breast cancer medical diagnostic data set. Information gain is used for feature selection. Neural fuzzy (NF), k-nearest neighbor (KNN), and quadratic classifier (QC) models, their individual ensemble models, and an ensemble combining all three are used for classification. Experimental results show that ensemble learning performs better than the individual classifiers.
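The two steps named in the abstract, information-gain feature selection followed by an ensemble of classifiers, can be illustrated with a minimal sketch. The abstract does not say how expression values are discretized or how the classifiers' outputs are combined, so the single-threshold split and the majority vote below are assumptions, and the function names, class labels, and toy values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels, threshold):
    """Entropy reduction from splitting one numeric feature at a threshold.

    Assumption: a single binary split; the thesis may discretize differently.
    """
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part)
                    for part in (left, right) if part)
    return entropy(labels) - remainder


def majority_vote(predictions):
    """Combine per-classifier label lists by majority vote per sample.

    Assumption: plain voting; the thesis may use another combination rule.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical toy data: four samples, two classes, two candidate genes.
labels = ["class_a", "class_a", "class_b", "class_b"]
informative_gene = [0.1, 0.2, 0.8, 0.9]    # separates the classes at 0.5
uninformative_gene = [0.1, 0.8, 0.2, 0.9]  # mixes the classes at 0.5

gain_informative = information_gain(informative_gene, labels, 0.5)
gain_uninformative = information_gain(uninformative_gene, labels, 0.5)

# Hypothetical outputs of three base classifiers (e.g. NF, KNN, QC).
voted = majority_vote([
    ["class_a", "class_b", "class_b"],
    ["class_a", "class_a", "class_b"],
    ["class_b", "class_a", "class_b"],
])
```

Ranking genes by `information_gain` and keeping the highest-scoring ones is one way to realize the feature selection step before the selected features are passed to the base classifiers.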
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009556556
http://hdl.handle.net/11536/39652
Appears in Collections: Thesis