Title: Nonlinear Gene Selection Method (非線性基因選取方法)
Authors: 張淑淨
Shu-Jing Chang
洪慧念
洪志真
Hui-Nien Hung
Jyh-Jen Horng Shiau
Institute of Statistics
Keywords: Nonlinear; Gene Selection
Issue Date: 2003
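Abstract: Microarray data typically contain a very large number of genes (several thousand), whereas the number of tumor samples is comparatively small, fewer than 100. Picking out, from this large pool, the genes that are significantly related to classification is called gene selection (or feature selection). In this paper we review several gene selection methods and the ways statisticians have handled the "large p, small n" problem. The method we focus on is Support Vector Machines (SVMs), and we use simulation experiments to study both linear and nonlinear classification problems. For linear classification problems, we mainly examine the effect of correlation among genes and the case where the classes partially overlap. For nonlinear classification problems, we apply two gene selection methods and compare the important genes they select and their classification accuracy.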
Microarray data contain a large number p of genes (usually several thousand) and a small number n of patients (usually about 100 or fewer). The problem of identifying the features that best discriminate among the classes, so as to improve the performance of a classifier, is known as feature selection. Some current feature selection methods and approaches to the "large p, small n" problem are reviewed. Support Vector Machines (SVMs) have proved to perform excellently in practice as a classification methodology. For the linear classification problem, this paper studies the following two issues: (i) how the number of surrogates a gene has affects the importance of that gene; (ii) the case of overlapping classes. For the nonlinear classification problem, we use two procedures: (1) mapping the original data, which are not linearly separable, to a high-dimensional space and then using SVM RFE with a linear kernel to find crucial genes; (2) using SVM RFE with a nonlinear kernel. We then compare these two methods on a nonlinear toy problem.
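The first procedure in the abstract (explicit mapping followed by SVM RFE with a linear kernel) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the XOR-like toy data, the degree-2 product mapping, the subgradient-descent trainer standing in for a real SVM solver, and the helper names `map_to_products`, `train_linear_svm`, and `svm_rfe`. Features are ranked by the usual SVM RFE criterion, the squared weight w_i^2, removing one feature per round.

```python
import itertools
import random


def map_to_products(X):
    """Map each sample to all degree-2 monomials x_i * x_j with i <= j."""
    p = len(X[0])
    pairs = list(itertools.combinations_with_replacement(range(p), 2))
    return [[row[i] * row[j] for i, j in pairs] for row in X], pairs


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    """Full-batch subgradient descent on the L2-regularised hinge loss.

    A simple stand-in for a proper SVM solver (assumption); no bias term.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wk for wk in w]
        for xi, yi in zip(X, y):
            margin = yi * sum(wk * xk for wk, xk in zip(w, xi))
            if margin < 1.0:  # this sample violates the margin
                for k in range(d):
                    grad[k] -= yi * xi[k] / n
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def svm_rfe(X, y):
    """Recursive feature elimination: retrain, drop the smallest w_i^2, repeat.

    Returns feature indices ordered from most to least important.
    """
    active = list(range(len(X[0])))
    eliminated = []
    while active:
        X_sub = [[row[k] for k in active] for row in X]
        w = train_linear_svm(X_sub, y)
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        eliminated.append(active.pop(worst))
    return eliminated[::-1]


# Hypothetical toy problem: 4 "genes", class depends only on sign(x0 * x1),
# so the classes are not linearly separable in the original space.
rng = random.Random(0)
n_samples, n_genes = 150, 4
X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
y = [1 if row[0] * row[1] > 0 else -1 for row in X]

Z, pairs = map_to_products(X)   # explicit map to the higher-dimensional space
ranking = svm_rfe(Z, y)         # linear SVM RFE on the mapped features
top_pair = pairs[ranking[0]]
print("top-ranked mapped feature: x%d * x%d" % top_pair)
```

In this sketch the genes appearing in the top-ranked mapped features are read off as the candidate crucial genes. The second procedure, SVM RFE with a nonlinear kernel, would instead rank the original genes directly and is not shown here.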
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009126522
http://hdl.handle.net/11536/55590
Appears in Collections: Thesis


