Title: A Bayesian clustering approach for detecting gene-gene interactions in high-dimensional genotype data
Authors: 陳穗碧
Chen, Sui-Pi
黃冠華
Huang, Guan-Hua
Institute of Statistics
Keywords: Dirichlet process mixtures; Epistasis; Permutation test; Stochastic search
Publication date: 2011
Abstract: Detecting susceptibility genes for complex diseases is a major challenge for human geneticists. Epistasis, or gene-gene interaction, is particularly difficult for traditional statistical techniques to handle. Over the past few decades, three kinds of approaches have been proposed to address this issue. First, approaches based on modified logistic regression yield directly interpretable results, but they are limited by a parametric model that does not describe the nonlinear relationship between epistasis and the phenotype. Second, data-mining or machine-learning methods, such as MDR and CART, do not fit a single prespecified model; instead they attempt to step through the space of possible models in a computationally efficient way, searching level by level according to the number of interacting loci, to overcome the limitations of the regression-based approaches. However, as genomic technologies rapidly advance, the explosion in the number of markers makes exhaustive searches of multilocus combinations computationally infeasible. Third, Bayesian model selection techniques offer an alternative for selecting the loci, and the interactions between them, that best predict the phenotype. A representative algorithm is Bayesian epistasis association mapping (BEAM). This thesis proposes the Bayesian clustering for detecting epistasis (BCDE) model, a Bayesian clustering formulation that slightly modifies the BEAM model to identify gene-gene interactions in case-control studies. The BCDE model uses Dirichlet process mixtures to model partitions of SNP markers and Gibbs weighted Chinese restaurant sampling to simulate the posterior distribution of these partitions. Unlike BEAM, which partitions markers into exactly three groups, the BCDE model can be evaluated at any given partition, regardless of the number of groups.
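As a rough illustration of how a Dirichlet process induces partitions with an unrestricted number of groups, the sketch below draws one partition of markers from a plain Chinese restaurant process prior. It is not the Gibbs weighted Chinese restaurant sampler of the BCDE model, which additionally weights each assignment by the data likelihood. The function name `crp_partition`, the concentration parameter `alpha`, and the choice of 20 markers are assumptions made for this example only.

```python
import random

def crp_partition(n_items, alpha, rng):
    """Draw one partition of n_items from a Chinese restaurant process prior.

    Returns a list of cluster labels, one per item; labels start at 0.
    This samples from the prior only (no likelihood weighting).
    """
    labels = []
    sizes = []  # sizes[k] = number of items currently in cluster k
    for _ in range(n_items):
        # An item joins existing cluster k with weight sizes[k],
        # or opens a new cluster with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # new cluster
        else:
            sizes[k] += 1     # existing cluster
        labels.append(k)
    return labels

rng = random.Random(0)  # fixed seed so the draw is reproducible
labels = crp_partition(n_items=20, alpha=1.0, rng=rng)
n_clusters = max(labels) + 1
```

The number of groups is not fixed in advance: it is whatever the draw produces, and larger values of `alpha` tend to give more groups.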
We further develop a permutation test to validate the disease association of the SNP subsets identified by the BCDE model, which yields results that are more robust to model specification and prior assumptions. The performance of the BCDE model is compared with that of BEAM on various simulated datasets and a schizophrenia SNP dataset.
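The general idea of a permutation test for a candidate SNP subset can be sketched as follows. This is a generic label-shuffling test, not the specific procedure developed in the thesis: it assumes a Pearson chi-square statistic on the table of joint genotype patterns against case/control status, and the function names and toy data are hypothetical.

```python
import random
from collections import Counter

def chi_square_stat(genotypes, status):
    """Pearson chi-square for joint genotype pattern vs. case/control status."""
    n = len(status)
    row = Counter(genotypes)                 # counts per joint genotype pattern
    col = Counter(status)                    # counts of cases (1) and controls (0)
    cell = Counter(zip(genotypes, status))   # observed cell counts
    stat = 0.0
    for g, row_total in row.items():
        for s, col_total in col.items():
            expected = row_total * col_total / n
            stat += (cell[(g, s)] - expected) ** 2 / expected
    return stat

def permutation_p_value(genotypes, status, n_perm, rng):
    """Estimate a p-value by shuffling case/control labels n_perm times."""
    observed = chi_square_stat(genotypes, status)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if chi_square_stat(genotypes, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Toy data (made up): each tuple is the genotype at two SNPs, coded 0/1/2.
genotypes = [(2, 2)] * 15 + [(0, 1)] * 5 + [(0, 1)] * 15 + [(2, 2)] * 5
status = [1] * 20 + [0] * 20   # first 20 individuals are cases

observed_stat = chi_square_stat(genotypes, status)
p_value = permutation_p_value(genotypes, status, n_perm=999, rng=random.Random(1))
```

Because the reference distribution comes from reshuffled labels rather than from a fitted model, the resulting p-value does not depend on the prior or likelihood used to propose the SNP subset, which is the robustness property the abstract refers to.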
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079226514
http://hdl.handle.net/11536/40422
Appears in collections: Theses