標題: | 應用於LZW壓縮序列之高效能字串比對機制 Efficient Pattern Matching Scheme in LZW Compressed Sequences |
作者: | 黃迺倫 Nai-Lun Huang 李程輝 Tsern-Huei Lee 電信工程研究所 |
關鍵字: | 字串比對;壓縮字串比對;字元串映像;平行處理;字典模式壓縮法;無失真壓縮;pattern matching;compressed pattern matching;bitmap;bit-parallelism;LZW compression;CPM |
公開日期: | 2005 |
摘要: | 壓縮字串比對(CPM; compressed pattern matching)乃一新興之研究領域,著眼於此類問題:給定一壓縮序列及一字串,在最少量之解壓縮作用(或無解壓縮作用)下,於序列中尋找字串的出現(pattern occurrences)。可應用於直接在壓縮檔案中做電腦病毒及機密資訊外洩漏與否的偵查。LZW為一廣受應用且壓縮效率高的壓縮演算法,本論文即記述了我們在壓縮字串比對上,對於處理LZW壓縮序列的研究成果。著名的Amir-Benson-Farach演算法是最早能在LZW壓縮序列中尋得字串的首次出現位置的演算法,我們針對此演算法提出一以位元串映像為基礎的(bitmap-based)實現方式,同時亦將其推廣為可尋得所有的字串出現,並回報它們在序列未壓縮前所處的絕對位置。程式模擬結果顯示,相較於解壓縮後再於解開之序列中搜尋的機制,我們採用的機制具有較高的時間效能;而相較於另一套同樣以位元串映像實現的壓縮字串比對演算法──Navarro-Raffinot機制,我們的機制對於中等長度以上的字串,在所需的記憶體空間上,是較為節省的。 Compressed pattern matching (CPM) is an emerging research field addressing the problem: Given a compressed sequence and a pattern, find the pattern occurrence(s) in the (uncompressed) sequence with minimal (or no) decompression. It can be applied to detection of computer virus and confidential information leakage in compressed files directly. In this thesis, we report our work of CPM in LZW compressed sequences. LZW is one of the most effective compression algorithms used extensively. We propose a simple bitmap-based realization of the well-known Amir-Benson-Farach algorithm. We also generalize the algorithm to find all pattern occurrences (rather than just the first one) and to report their absolute positions in the uncompressed sequence. Experiments are conducted to compare the performance of our proposed generalization with the decompress-then-search scheme. We found that our proposed generalization is much faster than the decompress-then-search scheme. The memory space requirement of our proposed generalization is compared with that of the Navarro-Raffinot scheme, an alternative CPM algorithm which can also be realized with bitmaps. Results show that our proposed generalization has better space performance than the Navarro-Raffinot scheme for moderate and long patterns. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009313523 http://hdl.handle.net/11536/78338 |
顯示於類別: | 畢業論文 |