Title: Pattern Matching in Compressed Files (壓縮檔案的字樣比對)
Author: 邱登煌
Keywords: pattern matching; compression
Publication date: 2006
Abstract: Pattern matching is an important technique for searching files for specific content, with applications in virus detection and information retrieval. As data compression becomes increasingly common, pattern matching on compressed files is unavoidable, and efficient methods for it are needed. This thesis presents approaches that improve matching efficiency on compressed files for these two applications. First, we propose a stream-based scanning scheme for gzip-compressed files that scans packets in real time as they arrive at a gateway. Second, we propose a scheme for regular-expression matching on LZW-compressed files without decompression; for short patterns it is more efficient than decompressing and then searching, and it can be applied to information retrieval systems.
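To make the stream-based scanning idea concrete, the following is a minimal Python sketch and not the mechanism proposed in the thesis. It assumes a single fixed byte pattern and uses the standard `zlib` module to decompress a gzip stream incrementally as chunks arrive; the function name `scan_gzip_stream` and the chunking into 64-byte "packets" are hypothetical choices for illustration. A real virus scanner would likely match many signatures at once with a multi-pattern automaton.

```python
import gzip
import zlib


def scan_gzip_stream(chunks, pattern: bytes) -> bool:
    """Return True if `pattern` occurs in the decompressed gzip stream.

    `chunks` is any iterable of bytes objects, for example packet
    payloads in arrival order. Only the last len(pattern) - 1
    decompressed bytes are carried over between chunks, so a match
    that straddles a chunk boundary is still found.
    """
    if not pattern:
        return True
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    keep = len(pattern) - 1
    tail = b""
    for chunk in chunks:
        window = tail + decomp.decompress(chunk)
        if pattern in window:
            return True
        # Guard keep == 0: window[-0:] would keep the whole window.
        tail = window[-keep:] if keep else b""
    # Check any output still buffered inside the decompressor.
    return pattern in tail + decomp.flush()


if __name__ == "__main__":
    # Hypothetical payload: a marker string buried in filler text.
    data = gzip.compress(b"header " * 1000 + b"SIGNATURE" + b" trailer" * 1000)
    packets = [data[i:i + 64] for i in range(0, len(data), 64)]
    print(scan_gzip_stream(packets, b"SIGNATURE"))
```

The sketch decompresses each chunk before matching, so it illustrates only the streaming aspect; matching directly on compressed data, as described for LZW files, requires a different technique.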