標題: 有效率的位元平行演算法以解決近似字串比對問題
Efficient Bit-Parallel Algorithms for Solving the Approximation String Matching Problem
作者: 盧錦隆
LU CHIN LUNG
國立交通大學生物科技學系(所)
關鍵字: 近似字串比對;樣本辨識;計算生物;動態規劃;位元平行計演算法;邏輯向量;邏輯運算;Approximation String Matching;Pattern Recognition;Computational Biology;Dynamic Programming;Bit-Parallel Algorithm;Logical Vector;Logical Operation
公開日期: 2011
摘要: 在傳統的字串演算法裡,近似字串比對是主要的問題之一,因為它可應用在許多不同的地方包括文章 搜尋、字串辨識與計算生物等等。給予一個全文字串T t1t2tn  與一個樣本字串m P p1 p2p  ,近 似字串比對問題的目的是要去尋找全文字串裡所有的位置使得結束在這些位置的子字串都與樣本字 串之間的編輯距離小於或等於一個事先給定的錯誤邊界值k,或者同等於對全文字串裡的每一個位置 i,去找出T(1,i)的字尾使得這個字尾與樣本字串之間的編輯距離小於或等於k,其中n T t1t2t  (即T 中長度為i 的字首)。傳統解決這個近似字串比對問題的方法就是利用一個動態規劃演算法去填寫一個 大小為(m+1)×(n+1)的矩陣,因此它的時間複雜度為O(mn)。到目前為止,這個動態規劃演算法的執行 效能已經被利用許多不同的方法加以改進了許多次,其中之一便是所謂的位元平行演算法,它是利用 硬體上位元運算的平行特行來加速執行效能。如果同時允許插入、刪除與替換這三種編輯指令的話, 那麼Myers 在1999 年所提出的位元平行演算法是目前最好的之一,時間複雜度為O(mn/w),其中w 為電腦字的長度。然而,在實際的應用上,位元平行演算法經常得飽受一個嚴重的問題,那就是樣本 字串的長度得受限於電腦字的長度。如何讓位元平行演算法能以一種比較有效率的方法來處理長的樣 本字串目前仍是一個挑戰。因此在本計劃中,我們將去研究如何利用邏輯向量與邏輯運算來設計出一 個比較有效率的位元平行演算法可以打破樣本字串的長度限制並且能夠更有效率地來解決近似字串 比對問題。
Approximate string matching is one of the main problems in classical string algorithms, with applications to text searching, pattern recognition, computational biology, etc. Given a text string T t1t2tn  and a pattern string m P p1 p2p  , the approximation string matching problem is to find all text positions such that there exists substrings ending at these positions which match P with up to edit distance less than or equal to a pre-specified error bound k, or equivalently for each location i in text T , to find the suffix of T(1,i) whose edit distance with P is less than or equal to k , where i T i t1t2t (1, )  (i.e., the prefix of length i in T). The classical solution to the approximation string matching problem is based on filling out an (m+1)× (n+1) dynamic programming matrix and therefore needs O(mn) time. So far, the performance of this dynamic programming method has been further improved many times using several different approaches. One thread of these approaches is the so-called bit-parallel algorithms that exploit the hardware parallelism of bit-vector operations to speed up their performance. If all the edit operations of insertions, deletions and substitutions are allowed, the bit-parallel algorithm proposed by Myers in 1999 is one of the best current algorithms for the approximation string matching problem and its time complexity is O(mn/w), where w is the computer word size. In practical applications, however, the bit-parallel algorithms often suffer from a serious problem that the pattern length is limited by the computer word size. Currently, it is still a challenge how to make the bit-parallel algorithms to deal with long patterns in a more efficient way. In this project, therefore, we are going to study how to use logical vectors and operations to design a more efficient bit-parallel algorithm so that it can break through the limitation of pattern length and solve the approximation string matching problem more efficiently.
官方說明文件#: NSC100-2221-E009-101
URI: http://hdl.handle.net/11536/99049
https://www.grb.gov.tw/search/planDetail?id=2324637&docId=364082
顯示於類別:研究計畫