高度可平行化的長片段生物序列之FM-index變體

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.author	李昀龍	en_US
dc.contributor.author	Li, Yun-Lung	en_US
dc.contributor.author	洪瑞鴻	en_US
dc.contributor.author	Hung, Jui-Hung	en_US
dc.date.accessioned	2015-11-26T00:55:19Z	-
dc.date.available	2015-11-26T00:55:19Z	-
dc.date.issued	2015	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#GT070157219	en_US
dc.identifier.uri	http://hdl.handle.net/11536/125690	-
dc.description.abstract	基於Burrows-Wheeler轉換之Ferragina-Manzini index(FM-index)目前已經被廣泛地應用在針對基因體序列進行快速字串搜尋的應用中。在實際應用FM-index時，我們會進入建表階段，於其中針對目標字串建立index資料結構，據此我們便可在之後的字串搜尋階段中，利用index資料結構達到在目標字串中進行快速字串搜尋的目的。在現有技術中，FM-index的建表操作需執行大量的運算，而且整個建表實作亦十分複雜，且整體的執行效率不彰。而de novo assembly一類利用FM-index來進行overlap graph資料結構建立的應用來說，其需動態地(On-the-fly)進行FM-index建表操作，而 FM-index低落的建表效率亦使得此類應用的整體運算效率不彰。在本論文中，我們提出了k-ordered FM-index，其可透過簡化的演算法來完成其資料結構的建表，同時仍可針對高度重複(Highly Repetitive)的目標字串支援快速的字串搜尋操作。此外，簡化的k-ordered FM-index建表演算法中的桶排序(Bucket sort)階段更具有可輕易利用硬體平行來加速的特性，這使得k-ordered FM-index整體的建表階段可以在未來，進一步地利用硬體平行的方，式來進行加速。	zh_TW
dc.description.abstract	The Ferragina-Manzini index (FM-index) derived from the Burrows-Wheeler transform (BWT) is broadly used for fast string searching in genomes. An indexing phase, where the index structure of a target text is constructed, is entered beforehand, so that efficient string searching among the indexed target text can be facilitated in the following string searching phase. Conventionally, FM-index construction takes a great deal of computation, and the implementation of FM-index construction is also extremely complex and handled in an ineffective manner. On-the-fly indexing demanding applications, such as constructing overlap graphs for de novo assembly, implemented with current FM-index schemes also suffers greatly from low performance. In this work, we provide a k-ordered FM-index that can be built with significantly simplified algorithms, and still suffices fast string searching and matching in highly repetitive references. Besides, the proposed k-ordered FM-index is constructed with a much-simplified algorithm with a bucket sort stage that is highly parallelizable, so that performance of the indexing phase of the proposed k-ordered FM-index can be further enhanced with future parallelization implementation.	en_US
dc.language.iso	en_US	en_US
dc.subject	次世代定序	zh_TW
dc.subject	FM-index	en_US
dc.subject	Ferragina-Manzini index	en_US
dc.subject	sort transform	en_US
dc.subject	next generation sequencing	en_US
dc.title	高度可平行化的長片段生物序列之FM-index變體	zh_TW
dc.title	A highly parallelizable FM-index variant for long biological sequences	en_US
dc.type	Thesis	en_US
dc.contributor.department	生物資訊及系統生物研究所	zh_TW
顯示於類別：	畢業論文