Title: The Implementation of a Paired-End Adapter Sequence Detecting Algorithm and its Performance Evaluation (次世代定序配接序列偵測演算法實作及其效能驗證)
Authors: Weng, Jui-Cheng (翁瑞成)
Hung, Jui-Hung (洪瑞鴻)
Institute of Bioinformatics and Systems Biology
Keywords: adapter trimmer; next-generation sequencing (NGS)
Issue date: 2013
Abstract: Next-generation sequencing (NGS) has become the primary technology for probing the genetic information of genomes. In a typical sequencing protocol, the short DNA fragments read by the sequencer are contaminated by adapter sequences, unwanted bases appended to the 3' end of the read. We argue that no robust method yet exists for removing this contamination efficiently. Although these short fragments make up only a small portion of the entire library, they may still carry information useful to researchers. We therefore developed an accurate and fast adapter detection algorithm for paired-end sequencing, named PEAT (Paired-End Adapter Trimmer), designed specifically for paired-end sequencing data. Besides its speed and accuracy, the biggest advantage of PEAT is that it does not require a priori adapter sequences. For many existing adapter detection algorithms, the length and correctness of the supplied adapter sequence directly affect the tool's performance; PEAT avoids this input altogether, which makes it especially suitable for large-scale processing of sequencing datasets from different sources. We compared PEAT with existing adapter detection tools on simulated data and on real sequencing data downloaded from the GEO database, and PEAT outperformed the other tools we tested on both. Finally, we took three datasets from different sequencing applications (ChIP-seq, RNA-seq and MNase-seq) and, using the downstream analysis appropriate to each, compared how data processed by PEAT and data processed by the Bowtie2 aligner in local alignment mode (that is, without PEAT) behave when mapped back to the reference sequence. These analyses again showed that data processed by PEAT retain more short reads, and that this significantly affects the interpretation of the data. We therefore recommend the use of an efficient and accurate adapter trimmer, such as PEAT, in the analysis of NGS data.
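The abstract does not spell out how adapters can be detected without knowing their sequences. One general principle that applies to paired-end data, sketched here for illustration only and not as PEAT's actual implementation, is that when the DNA fragment is shorter than the read length, read 1 and the reverse complement of read 2 cover the same fragment, so anything beyond that overlap must be adapter. The function names, the `min_len` and `max_mismatch_rate` parameters, and the example sequences are all hypothetical.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T, N."""
    return seq.translate(_COMPLEMENT)[::-1]


def detect_insert_length(read1: str, read2: str, min_len: int = 10,
                         max_mismatch_rate: float = 0.1) -> Optional[int]:
    """Estimate the fragment length when it is shorter than the read length.

    For a fragment of length L, read1[:L] should equal the reverse
    complement of read2[:L]. Candidates are tried from longest to shortest
    so that a short chance match is not preferred. Returns None when no
    candidate passes, i.e. no adapter read-through is detected.
    (Hypothetical helper; thresholds are assumptions, not PEAT's values.)
    """
    n = min(len(read1), len(read2))
    for length in range(n - 1, min_len - 1, -1):
        candidate = reverse_complement(read2[:length])
        mismatches = sum(a != b for a, b in zip(read1[:length], candidate))
        if mismatches <= max_mismatch_rate * length:
            return length
    return None


def trim_pair(read1: str, read2: str, **kwargs) -> Tuple[str, str]:
    """Cut both reads back to the detected fragment; leave them as-is otherwise."""
    length = detect_insert_length(read1, read2, **kwargs)
    if length is None:
        return read1, read2
    return read1[:length], read2[:length]


if __name__ == "__main__":
    # Made-up 20-base fragment and made-up adapter prefixes, 30-base reads.
    fragment = "ACGTTGCAAGGCTTACGATC"
    read1 = fragment + "AGATCGGAAG"
    read2 = reverse_complement(fragment) + "CTGTCTCTTA"
    print(detect_insert_length(read1, read2))  # 20
    print(trim_pair(read1, read2))
```

Note that neither adapter sequence is passed to the functions; only the two reads of a pair are compared, which is what makes this style of detection independent of the library preparation kit. A production tool would also need to handle base qualities, indels, and efficient search, none of which this sketch attempts.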
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070157216
http://hdl.handle.net/11536/75469
Appears in collections: Thesis