狀態性預先過濾器及高效能驗證引擎構成之字串比對系統

完整后设资料纪录

DC 栏位	值	语言
dc.contributor.author	黄乃伦	en_US
dc.contributor.author	Huang, Nai-Lun	en_US
dc.contributor.author	李程辉	en_US
dc.contributor.author	Lee, Tsern-Huei	en_US
dc.date.accessioned	2014-12-12T01:24:48Z	-
dc.date.available	2014-12-12T01:24:48Z	-
dc.date.issued	2012	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#GT079513802	en_US
dc.identifier.uri	http://hdl.handle.net/11536/41106	-
dc.description.abstract	本论文研究旨在处理字串比对议题。字串比对为在文字串列中搜寻特定关键字（组）的程序，其中，文字串列及关键字（组）皆由已知的字符表中的字符所构成。知名且被广为采用的Aho-Corasick（AC）演算法可同时比对多个关键字，并在任何情况下都确保线性时间效能。　　本论文中，我们首先提出一种可有效节省资料结构储存空间的AC字串比对机实现方式，称之为compressed AC。相较于原始的AC字串比对机，compressed AC达到98.90%的资料结构储存空间缩减率，此外，在我们习知的所有AC-based压缩机制中，compressed AC所需的储存空间是最少的，在比对效能方面，它也具有最佳表现。　　其后，我们进而提出一套由预先过滤器（pre-filter）及验证引擎（verification engine）所组成的字串比对架构。预先过滤器旨在迅速过滤掉不可能含有关键字（组）的文字串列片段，能过滤掉愈大量的片段愈好，它藉由询问事先根据关键字（组）所建造的资料结构，在文字串列中寻找可疑的关键字出现位置，验证引擎则负责验证这些可疑的位置是否真有关键字（组）的存在。　　我们的预先过滤器可累计询问结果，称之为状态性预先过滤器（stateful pre-filter）。我们提出了极易实现的实作方式，仅需一个存放位元映象（bitmap）的暂存器及简单的位元运算即可。这个实作方式等效于采用所有过去习得的询问结果来做过滤，因此在此层面上为最佳解。我们的验证引擎是compressed AC的变型，可以同时验证所有关键字，此外，它所需的资料结构储存空间比compressed AC更少。我们称这整套字串比对架构为Pre-filter+AC，无论在比对效能或储存空间上，它的表现都比compressed AC为佳。　　达成了利用所有过去的询问结果（及现阶段的询问结果）来决定预先过滤器现阶段可以过滤掉的片段之后，我们持续致力于在现阶段的询问结果获知之前就推测得知可以过滤掉的片段，提出stateful pre-filter的进化型，称之为speculation pre-filter。speculation pre-filter能推测有多少字符可能可以被过滤掉，若其在所推测可能可以过滤掉的字符数中，采用绝对安全的策略，仅选择最少量的过滤数，我们称其为保守（conservative）。无论是在平均上或是最糟情况下，保守的（conservative）speculation pre-filter都具有比stateful pre-filter更好的比对效能，值得一提的是，最糟情况下的效能提升为双倍。　　当然，本论文研究亦存在可以改进之处，我们的预先过滤器对于处理中等长度以上的关键字（组），具有良好的吞吐效能，然而，处理长度短的关键字（组）时，效能则会降低，事实上，所有无需检验文字串列中每一字符的预先过滤器都具有这种特性。对于短的关键字（组），预先过滤机制可能无法提升（甚或会阻碍）系统效能，庆幸的是，我们的状态性设计仅需一个位元映象的暂存器及简单的位元运算之代价，即可等效于采用所有已习得的询问结果来做过滤，提升吞吐效能。再者，我们相信此特性是能够根除的，信念来自此观察：验证引擎除了验证功能之外，其实可以具备担保过滤字符数量的能力，这着实也是一个引人入胜的研究主题啊。	zh_TW
dc.description.abstract	This dissertation addresses the problem of string matching, which can be defined as a process of searching for all occurrences of given keyword(s) in a text string. Here, the text string and keyword(s) are stings of symbols over a given alphabet. The Aho-Corasick (AC) algorithm, which can match multiple keywords simultaneously and guarantee linear-time deterministic performance under all circumstances, is a well-known string matching algorithm widely adopted. We first provided a memory-efficient implementation of the AC string matching machine. The implementation, called compressed AC, achieves 98.90% reduction of memory requirement compared with the original AC string matching machine. Among all the investigated AC-based compression schemes, our implementation requires the least memory space. As for the throughput performance, it also performs the best. Then, we proposed a string matching architecture consisting of a pre-filter and a verification engine. The pre-filter queries data structures built from keywords to find the starting positions of suspicious keyword occurrences. It aims to quickly skip as much as possible the pieces not in a match, while the verification engine is responsible for confirming whether the suspicious occurrences are true matches. Our pre-filter, named stateful pre-filter, can accumulate query results. We realize this with a bitmap and simple bitwise-AND and shift operations. The proposed stateful realization is proved to be optimal in the sense that it is equivalent to utilizing all previous query results. Our verification engine is designed based on the AC algorithm so that all candidate keywords can be verified simultaneously. It is a variant of our proposed compressed AC. Because of the adoption of a pre-filter, its required data structures can be further reduced. The resulting string matching architecture, named Pre-filter+AC, performs better than compressed AC regarding both throughput performance and memory requirements. After making use of all previous query results (and the current one) to determine current skipped distance of the pre-filter, we kept devoting to speculate skipped distances before the current query result is obtained. The proposed pre-filter, named speculation pre-filter, can speculate the number of symbols that are likely to be skipped. It is called conservative if it safely speculates the minimum possible skipped distance. The conservative speculation pre-filter outperforms our previously proposed stateful pre-filter on average. Moreover, the worst case performance of it doubles that of the stateful pre-filter. It should be pointed out that our proposed pre-filters are suitable for keywords of moderate or large lengths. For short keywords, the throughput performance degrades. This is a common drawback of schemes using skip-based pre-filters. Nevertheless, the stateful design improves throughput with little cost of a bitmap and simple operations. Although pre-filtering may not help (and possibly even hurts) system performance if there are short keywords, there exist one possible remedy, which is another interesting topic. Conceptually, one can utilize the verification results to help determine the skipped distances of a pre-filter.	en_US
dc.language.iso	en_US	en_US
dc.subject	字串比对演算法	zh_TW
dc.subject	预先过滤器	zh_TW
dc.subject	验证引擎	zh_TW
dc.subject	封包检验	zh_TW
dc.subject	入侵侦测/预防	zh_TW
dc.subject	恶意程式/恶意软体	zh_TW
dc.subject	电脑病毒	zh_TW
dc.subject	搜寻引擎	zh_TW
dc.subject	string matching algorithm	en_US
dc.subject	pre-filter	en_US
dc.subject	verification engine	en_US
dc.subject	deep packet inspection	en_US
dc.subject	intrusion detection/prevention	en_US
dc.subject	malware	en_US
dc.subject	virus	en_US
dc.subject	search engine	en_US
dc.title	状态性预先过滤器及高效能验证引擎构成之字串比对系统	zh_TW
dc.title	String matching system with stateful pre-filter and efficient verification engine	en_US
dc.type	Thesis	en_US
dc.contributor.department	电信工程研究所	zh_TW
显示于类别：	Thesis

APA	黄., Huang, N., 李., & Lee, T. (2012). 状态性预先过滤器及高效能验证引擎构成之字串比对系统. http://hdl.handle.net/11536/41106.
Bibtex	@article{黄乃伦 and Huang2012, title={状态性预先过滤器及高效能验证引擎构成之字串比对系统}, author={黄乃伦 and Huang, Nai-Lun and 李程辉 and Lee, Tsern-Huei}, journal={http://hdl.handle.net/11536/41106}, year={2012}, url={https://ir.lib.nycu.edu.tw/handle/11536/41106?mode=full}, }