Title: | An efficient pattern matching scheme in LZW compressed sequences |
Authors: | Lee, Tsern-Huei; Huang, Nai-Lun; Institute of Communications Engineering |
Keywords: | bit-parallelism;compressed pattern matching;information search and retrieval;LZW compression;malwares detection;string matching |
Date issued: | 1-Jul-2008 |
Abstract: | Compressed pattern matching (CPM) is an emerging research field addressing the following problem: given a compressed sequence and a pattern, process the sequence with minimal (or no) decompression to find the pattern occurrence(s) in the uncompressed sequence. It can be applied to detect malware and confidential information leakage in compressed files directly. In this paper, we report our work on CPM in Lempel-Ziv-Welch (LZW) compressed sequences. We propose an efficient bitmap-based realization of the Amir-Benson-Farach algorithm. We also generalize the algorithm to find all pattern occurrences and report their absolute positions in the uncompressed sequence. Experiments are conducted to test the space requirements of our proposed generalization and two related CPM schemes which can also be realized with bitmaps. Results show that our proposed generalization requires the least amount of storage for moderate and long patterns. We also conduct experiments to compare the throughput performance of our proposed generalization with these two related CPM schemes and the decompress-then-search scheme. Results show that our proposed generalization outperforms the decompress-then-search scheme significantly. When scanning a file with pattern occurrences, our proposed generalization performs slightly better than the two related CPM schemes. The difference is significant when scanning a file with no pattern occurrence. Copyright (c) 2008 John Wiley & Sons, Ltd. |
URI: | http://dx.doi.org/10.1002/sec.32 http://hdl.handle.net/11536/8693 |
ISSN: | 1939-0114 |
DOI: | 10.1002/sec.32 |
Journal: | SECURITY AND COMMUNICATION NETWORKS |
Volume: | 1 |
Issue: | 4 |
Start page: | 325 |
End page: | 335 |
Appears in Collections: | Articles |