A Study on Simulating Realistic Gene Expression Microarray Data

Title:	生成基因表現量晶片資料方法之研究 (A Study on Simulating Realistic Gene Expression Microarray Data)
Author:	黃冠華 (Huang Guan-Hua), Institute of Statistics, National Chiao Tung University
Keywords:	Affymetrix GeneChip; cloud computing; gene expression microarray; microarray data archive; parallel computing; simulation
Issue date:	2012
Abstract:	Microarray gene expression analysis has become one of the most widely used functional genomics tools, and many analytical methods have been proposed for it. It is therefore desirable to develop realistic empirical models for simulating the expression values of each gene, so that the simulated data can be used to assess analysis methods and testing approaches. To cover a diverse range of tissues, we collect publicly available raw data for the Affymetrix HG-U133A platform from two public repositories, Gene Expression Omnibus and ArrayExpress. After preprocessing these data, we obtain an empirical distribution of expression intensity for each of the 22,283 genes on the array, and we use these 22,283 distributions to simulate realistic gene expression data. The proposed method has several unique features that address shortcomings of previous research. In this project we describe the steps of the simulation method. To evaluate it, we will examine the distributions of housekeeping genes, compare simulated and real gene expression data, and simulate expression intensities that mimic the patterns in the HG-U133A tag spike-in dataset, using several spike-in datasets with different numbers of arrays, in order to determine the sensitivity and specificity of various methods for detecting differential expression. The project also uses OpenMP and MPI parallel computing to reduce the computing time needed to preprocess the large volume of downloaded raw microarray data. We will compare the parallel efficiency of OpenMP and MPI in three computing environments: a high-performance personal workstation, the National Center for High-performance Computing, and the Amazon EC2 cloud. The results and experience gained can be applied to future high-dimensional genomic data computation. This study was proposed in 2011 as a three-year project, but funding was granted for one year only. Our team has finished implementing high-performance parallel computing for preprocessing the downloaded raw microarray data and evaluating its efficiency. We now plan to carry out the unfinished parts of the study over the coming two years.
Official document no.:	NSC101-2118-M009-004-MY2
URI:	http://hdl.handle.net/11536/98666 https://www.grb.gov.tw/search/planDetail?id=2590004&docId=391241
Appears in Collections:	Research Plans
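
The abstract describes building one empirical distribution per gene from preprocessed arrays and drawing simulated expression values from it. The record does not give the project's actual procedure, so the following is only a minimal sketch of that general idea. It assumes a toy table of log-scale intensities keyed by hypothetical probe names (`probe_A`, `probe_B`), samples each gene independently by inverse-transform sampling from its empirical CDF with linear interpolation between order statistics, and ignores any correlation between genes. All function names are illustrative.

```python
import random


def build_empirical_distributions(expression_by_probe):
    """Return, for each probe, the sorted list of its observed intensities."""
    return {probe: sorted(values) for probe, values in expression_by_probe.items()}


def sample_expression(sorted_values, rng):
    """Draw one value from the empirical distribution of a single probe.

    Assumption: linear interpolation between neighbouring order statistics,
    so every draw lies between the observed minimum and maximum.
    """
    n = len(sorted_values)
    if n == 1:
        return sorted_values[0]
    position = rng.random() * (n - 1)  # uniform on [0, n - 1)
    lower = int(position)
    fraction = position - lower
    return sorted_values[lower] + fraction * (
        sorted_values[lower + 1] - sorted_values[lower]
    )


def simulate_arrays(distributions, n_arrays, seed=0):
    """Simulate n_arrays arrays; each probe is sampled independently (assumption)."""
    rng = random.Random(seed)
    return [
        {probe: sample_expression(values, rng) for probe, values in distributions.items()}
        for _ in range(n_arrays)
    ]


# Toy input: hypothetical log2 intensities, not real HG-U133A data.
observed = {
    "probe_A": [7.1, 7.4, 6.9, 7.8, 7.2],
    "probe_B": [10.3, 11.0, 10.7, 10.1],
}
distributions = build_empirical_distributions(observed)
simulated = simulate_arrays(distributions, n_arrays=5, seed=42)
for array in simulated:
    print({probe: round(value, 3) for probe, value in array.items()})
```

Sampling from the sorted observed values keeps each simulated intensity within the range seen in the collected arrays. A scheme that preserves correlation between genes would need a different design, for example resampling whole arrays.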