Title: | 建構節省空間的去氧核醣核酸序列後置樹 Space Economical Suffix Trees for DNA Sequences |
Authors: | 陳昀慶 Chen, Yun-Ching 李素瑛 Lee, Suh-Yin 資訊科學與工程研究所 |
Keywords: | 後置樹;核醣核酸;Suffix Trees;DNA Sequences;Prefix Table;Bit Layout |
Issue Date: | 2002 |
Abstract: | 近年來由於生物資料大量增加,使得生物資訊學變得越來越重要。因為利用電腦科技才能對這樣大量的資料進行快速的分析。在資訊領域中,後置樹是一個快速且強大的字串分析資料結構,且剛好能解決許多基因體序列的相關問題。但它唯一的致命缺點,就是在處理龐大的基因序列時會耗費太多記憶體空間。因此,我們在這篇論文中利用基因體序列的特性縮小後置樹所需要使用的空間。我們對每一個節點提出了更少資料量的儲存方式。再者,我們利用自己提出的前置陣列取代靠近樹根的眾多節點,讓使用空間更進一步的縮小。其三,我們根據新型後置樹的架構提出了一個加快建構速度的前置處理方法。最後,實驗結果顯示在針對基因體序列的表現上,我們提出的架構使用最少的空間,且在建構速度上也有不錯的表現。 In recent years, bioinformatics becomes an important research field because there are more and more genetic data to be analyzed. The suffix tree is a powerful data structure for string analysis and has many applications on bioinformatics. Besides, its linear construction time, linear construction space and short search time all make it very impressive. However, consuming huge space is a fatal drawback especially while using suffix trees to handle the large amount of DNA sequences. In this thesis, we utilize some characteristics of DNA sequences to reduce the space requirement of suffix trees. A new bit layout is proposed for the node of a suffix tree which requires less space than others. We also use an index table, called “prefix table”, which can reduce the number of internal nodes in suffix trees. In addition, we propose a preprocessing technique to improve the construction time based on our data structure. The experiments shows that our proposed method is the most space-parsimony implementation of suffix trees for DNA sequences and it also has a good performance in construction time. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT910392022 http://hdl.handle.net/11536/70093 |
Appears in Collections: | Thesis |