Title: | A Multifunctional Bayesian Procedure for Detecting Copy Number Variations from Sequencing Read Depths |
Authors: | 魏裕中 Wei, Yu-Chung; 黃冠華 Huang, Guan-Hua; Institute of Statistics |
Keywords: | Bayesian inference; copy number variations; next generation sequencing; reversible jump Markov chain Monte Carlo |
Publication date: | 2013 |
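Abstract: | Copy number variations (CNVs) are abnormal numbers of copies of genomic sequence caused by deletions or insertions of segments ranging from hundreds to millions of base pairs in length. Advances in next generation sequencing help detect this type of variation more precisely; in particular, read depth data, obtained by piling up sequenced reads, reflect copy number changes relatively directly. Because of differences in their underlying model assumptions, existing methods that estimate CNVs from read depths can usually be applied only to either contiguous whole-genome data or fragmented exome data. In addition, because their data-processing procedures differ, these tools either use a single sample to detect the absolute copy number or require paired case-control samples to estimate the relative copy number variations specific to the patient. Given these limitations, developing a CNV detection method that applies broadly across data types is challenging but valuable.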
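This thesis proposes a complete procedure, called CONY, for detecting CNVs from sequencing data. It uses a Bayesian hierarchical model to relate the data to the copy number variations, and then applies reversible jump Markov chain Monte Carlo, with newly designed moves between states, to estimate the regions carrying CNVs. The model incorporates the relative position of each data point, so the method is not restricted to contiguous whole-genome data and also applies to non-contiguous exome data. Furthermore, with a simple data transformation and appropriate parameter settings, the method can be applied to both single-sample and paired-sample data to detect absolute copy numbers and patient-specific relative copy numbers.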
The accuracy of existing methods often depends on sequencing coverage, and their model settings tend to limit them to CNV regions of a particular size. This thesis modifies how read depth is computed to reduce the loss of accuracy caused by insufficient coverage, and selects an appropriate analysis segment length so that CNV regions of various sizes can be detected effectively. Data from the 1000 Genomes Project and simulated data are used to assess the overall performance of the procedure.
Copy number variations (CNVs) are genomic structural mutations involving abnormal numbers of copies of gene fragments. In next generation sequencing data, the read depth signal reflects these variants directly. Several tools have been published to predict CNVs from read depths, but most of them apply only to a specific data type. Providing a multifunctional detection algorithm that can readily use a variety of data types is difficult but valuable. We develop CONY, a multifunctional COpy Number variation detection tool based on a BaYesian procedure, which adopts an efficient reversible jump Markov chain Monte Carlo inference algorithm for analyzing sequencing read depths. CONY is suitable for reads from both whole genome and targeted exome sequencing. Additionally, CONY can be applied not only to a single individual to estimate the absolute number of copies but also to case-control samples to detect patient-specific variations. We evaluate the performance of CONY and compare it with competing approaches using both simulations and real data from the 1000 Genomes Project. Gap settings that preserve the location information of the depths, together with the choice of analytic genomic length to reduce the effects of statistical imbalance, enhance the CNV detection capability of the procedure. Moreover, CONY performs well regardless of whether data coverage is high or low. CONY outperforms other existing methods in accuracy for both whole genome and targeted exome data. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079726801 http://hdl.handle.net/11536/74782 |
Appears in Collections: | Thesis |