Title: | Cancer Classifier – Feature Selection and Gene Feature Combinations |
Authors: | Wu, Pei-Tzu (吳珮慈); Hsieh, Sheau-Ling (謝筱齡); Lin, Cheng-Chung (林正中); Institute of Computer Science and Engineering |
Keywords: | ensemble classification; K-nearest neighbor; quadratic classifier; support vector machine; information gain; breast diagnostic data |
Issue Date: | 2009 |
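Abstract: | Malignant tumors have long ranked first among the ten leading causes of death nationally, and among cancers affecting women, breast cancer has a particularly high incidence. Classifying samples from cancer gene data has been a long-standing research goal. We therefore apply feature selection and ensemble classification to classify breast cancer samples.
Microarray chips carry high-density biological probes and can serve as a tool for large-scale screening and parallel analysis of many genes. However, not every gene in a cancer dataset has a significant effect on a classification algorithm. We use feature selection to find a subset of gene features with high information content, ranked by importance, and then analyze cancer samples using these subsets and combinations of gene features.
The cancer gene data used in this study are breast cancer samples, analyzed with K-nearest neighbor, quadratic classifier, and support vector machine models as individual classifiers, as individual ensemble classifiers, and as an integrated ensemble classifier. The experimental results effectively improve recognition accuracy and identify the best combination of gene features for determining breast cancer.
Breast cancer is a leading cause of death for women. Many researchers are dedicated to the investigation of cancer classification, attempting to detect malignant tumors and direct therapies at early stages. Therefore, we use feature selection methods and ensemble classifier models to identify and predict breast cancer classes. Breast cancer diagnostic data provide informative and significant knowledge for cancer classification. Thus, we apply a feature selection technique to retrieve the attributes and rank their importance, then classify using varied combinations of the selected attributes. The study applies K-nearest neighbor, quadratic classifier, and support vector machine models to breast cancer datasets as individual classifiers, as ensemble models, and as a combined model. The goal is to construct an efficient classification model that improves accuracy and identifies the most significant features for identifying malignant breast cancer. |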
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079755582 http://hdl.handle.net/11536/45927 |
Appears in Collections: | Thesis |
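
The abstract names two ingredients: ranking features by information gain and combining several classifiers into an ensemble. The sketch below illustrates both ideas in plain Python under stated assumptions; it is not the thesis's implementation. The function names, the toy data, the restriction to discrete-valued features, and the use of simple majority voting as the combination rule are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction from splitting the labels on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(columns, labels):
    """Feature names sorted from highest to lowest information gain."""
    return sorted(columns,
                  key=lambda name: information_gain(columns[name], labels),
                  reverse=True)


def majority_vote(predictions):
    """Combine the predictions of several classifiers for one sample.

    Majority voting is an assumed combination rule, not taken from the record.
    """
    return Counter(predictions).most_common(1)[0][0]


# Toy, made-up data: "M" = malignant, "B" = benign (hypothetical labels).
labels = ["M", "M", "B", "B"]
columns = {
    "feature_a": ["high", "high", "low", "low"],  # separates the classes
    "feature_b": ["high", "low", "high", "low"],  # carries no information
}

ranking = rank_features(columns, labels)
# Hypothetical outputs of three classifiers (e.g. KNN, quadratic, SVM).
vote = majority_vote(["M", "B", "M"])
```

Continuous measurements would need to be discretized (or scored with a threshold search) before this kind of ranking applies; that step is omitted here.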