Title: Redundancy-Reducing Feature Selection from Microarray Data Based on Gene-Grouping
Authors: 張寶文 (Bao-wen Chang); 洪志真 (Jyh-Jen Horng Shiau); 洪慧念 (Hui-Nien Hung)
Institute of Statistics
Keywords: microarray; gene selection; clustering
Date issued: 2003
Abstract: A microarray dataset typically contains thousands of genes but only tens of samples. This so-called "large p (genes), small n (samples)" property creates difficulties for statistical analysis, and gene selection is a typical way of dealing with it. Filters and wrappers are two commonly used gene selection methods. Filters decide whether to select a gene according to a ranking criterion; they are therefore very fast to compute, but they may select highly correlated genes, which introduces redundancy. Wrappers, on the other hand, usually select a small, non-redundant subset of genes but require extensive computation. This study combines the two approaches. We first filter out genes that do not help classification according to a ranking criterion, and then group the remaining genes with the K-means clustering algorithm to avoid redundancy. The SVM-RFE gene selection method proposed by Guyon et al. (2002) is then applied to a list of candidate genes selected from each cluster. Finally, the proposed method is used to analyze three popular cancer datasets. The results show that, when the number of selected genes is small, the proposed method performs better than the three filter methods under study.
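The four-stage pipeline in the abstract (filter, K-means grouping, per-cluster candidates, SVM-RFE) can be sketched as below. This is a hypothetical, dependency-free illustration, not the thesis's implementation. Several details are assumptions because the record does not specify them: the ranking criterion is taken to be the signal-to-noise ratio; genes are clustered on z-scored expression profiles with a deterministic farthest-point initialisation; exactly one candidate (the best-ranked gene) is taken per cluster; and the linear SVM inside SVM-RFE is fitted by simple subgradient descent on the hinge loss rather than by a proper QP solver. All function names are made up for this example.

```python
import math

# Hypothetical sketch of filter -> K-means grouping -> SVM-RFE (assumptions noted inline).


def s2n_scores(X, y):
    """Assumed ranking criterion: |mu_pos - mu_neg| / (sd_pos + sd_neg) per gene.
    X: list of samples (rows) x genes (columns); y: labels in {+1, -1}."""
    scores = []
    for g in range(len(X[0])):
        pos = [row[g] for row, lab in zip(X, y) if lab == 1]
        neg = [row[g] for row, lab in zip(X, y) if lab == -1]
        m1, m2 = sum(pos) / len(pos), sum(neg) / len(neg)
        s1 = math.sqrt(sum((v - m1) ** 2 for v in pos) / len(pos))
        s2 = math.sqrt(sum((v - m2) ** 2 for v in neg) / len(neg))
        scores.append(abs(m1 - m2) / (s1 + s2 + 1e-12))  # epsilon avoids 0/0
    return scores


def zscore(v):
    m = sum(v) / len(v)
    s = math.sqrt(sum((a - m) ** 2 for a in v) / len(v)) or 1.0  # constant gene -> zeros
    return [(a - m) / s for a in v]


def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50):
    """Plain K-means with (assumed) farthest-point initialisation; returns a label per point."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Soft-margin linear SVM via full-batch subgradient descent on the
    regularised hinge loss (a simple stand-in for a QP solver)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        eta = lr / math.sqrt(t)
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:  # margin violated
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - eta * g for wj, g in zip(w, gw)]
        b -= eta * gb
    return w, b


def svm_rfe(X, y, features, n_keep):
    """SVM-RFE: retrain, then drop the one feature with the smallest squared weight."""
    remaining = list(features)
    while len(remaining) > n_keep:
        Xs = [[row[g] for g in remaining] for row in X]
        w, _ = train_linear_svm(Xs, y)
        remaining.pop(min(range(len(remaining)), key=lambda j: w[j] ** 2))
    return remaining


def select_genes(X, y, n_filter, n_clusters, n_final):
    scores = s2n_scores(X, y)
    # 1. Filter: keep the n_filter top-ranked genes.
    kept = sorted(range(len(scores)), key=lambda g: -scores[g])[:n_filter]
    # 2. Group kept genes by K-means on standardised expression profiles.
    profiles = [zscore([row[g] for row in X]) for g in kept]
    k = min(n_clusters, len(kept))
    labels = kmeans(profiles, k)
    # 3. One candidate per cluster (assumption): its best-ranked gene.
    candidates = []
    for c in range(k):
        members = [g for g, lab in zip(kept, labels) if lab == c]
        if members:
            candidates.append(max(members, key=lambda g: scores[g]))
    # 4. Run SVM-RFE over the candidates only.
    return svm_rfe(X, y, candidates, n_final)
```

Clustering on z-scored profiles makes Euclidean distance track correlation, so highly correlated (redundant) genes tend to fall into the same cluster and contribute at most one candidate; SVM-RFE, the expensive wrapper step, then only has to run on a short candidate list instead of on all filtered genes.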
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009126506
http://hdl.handle.net/11536/55423
Appears in collections: Theses


Files in this item:

  1. 650601.pdf
