Title: | 在微阵列资料上利用基因分群以减少冗赘之基因选取方法 Redundancy-Reducing Feature Selection from Microarray Data Based on Gene-Grouping |
Authors: | 张宝文 Bao-wen Chang; 洪志真 Jyh-Jen Horng Shiau; 洪慧念 Hui-Nien Hung; Institute of Statistics |
Keywords: | microarray; gene selection; clustering |
Issue Date: | 2003 |
Abstract: | A microarray dataset typically contains thousands of genes but only tens of samples. This so-called “large p (genes), small n (samples)” property creates difficulties for statistical analysis. Gene selection is a typical approach to this problem, and filters and wrappers are its two common forms. Filters decide whether to select a gene based on a ranking criterion; they are therefore computationally very fast but may select highly correlated genes, which causes redundancy. Wrappers, on the other hand, usually select a small set of non-redundant genes but require extensive computation. This study adopts a combination of the two. We first filter out irrelevant genes according to a ranking criterion and then group the remaining genes with the K-means clustering algorithm to avoid redundancy. The SVM-RFE gene selection method proposed by Guyon et al. (2002) is then applied to a list of candidate genes selected from each cluster. Finally, three popular cancer data sets are analyzed with the proposed method. The results show that the proposed method performs better than the three filter methods under study when the number of selected genes is small. |
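The filter-then-group stage described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the record does not say which ranking criterion was used, so a Welch-style t statistic stands in for it; standardizing each gene profile before clustering is also an assumption; and all function names (`t_like_score`, `kmeans`, `select_candidates`) are hypothetical. The final SVM-RFE step of Guyon et al. (2002) is not shown; the returned candidate list is what would be passed to it.

```python
import random
from statistics import mean, stdev


def t_like_score(values, labels):
    """Absolute Welch-style t statistic of one gene between classes 0 and 1.

    Assumed stand-in for the (unspecified) ranking criterion.
    """
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    denom = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / denom if denom > 0 else 0.0


def standardize(profile):
    """Scale a gene profile to zero mean and unit standard deviation."""
    m, s = mean(profile), stdev(profile)
    return [(v - m) / s if s > 0 else 0.0 for v in profile]


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = None
    for _ in range(n_iter):
        new_assign = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [mean(col) for col in zip(*members)]
    return assign


def select_candidates(X, y, n_keep, k, per_cluster=1):
    """Filter genes by score, cluster the survivors, keep the best per cluster.

    X is a list of samples (rows), each a list of gene expression values;
    y holds the class label (0 or 1) of each sample.
    """
    genes = list(zip(*X))  # one expression profile per gene
    scores = [t_like_score(g, y) for g in genes]
    kept = sorted(range(len(genes)), key=lambda j: -scores[j])[:n_keep]
    assign = kmeans([standardize(genes[j]) for j in kept], k)
    candidates = []
    for c in range(k):
        members = [j for j, a in zip(kept, assign) if a == c]
        members.sort(key=lambda j: -scores[j])
        candidates.extend(members[:per_cluster])
    return sorted(candidates)
```

Taking only the top-scoring genes from each cluster is what keeps two highly correlated genes from both reaching the wrapper stage; the number of clusters `k` and the `per_cluster` quota are tuning choices that this record does not specify.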
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009126506 http://hdl.handle.net/11536/55423 |
Appears in Collections: | Thesis |