標題: | Complete Fix-Free Codes for the Statistical Dictionary-Based String Matching Problem |
作者: | Suri, Meer Rini, Stefano 電機工程學系 Department of Electrical and Computer Engineering |
關鍵字: | Dictionary-based string matching;Small vs small algorithm;Inverted index;Complete fix-free codes |
公開日期: | 1-Jan-2019 |
摘要: | In the dictionary based string matching problem, the positions in which a short sequence of symbols, query, occurs in a longer sequence, source, have to be retrieved. To improve the query search efficiency, the source can be pre-processed to build an inverted index, that is a table storing positions of occurrences of certain substrings in the text. Given an inverted index, various algorithms can be utilized to retrieve the position of the query in the source. In the literature, motivated by text retrieval and web search applications, the small v/s small (SvS) algorithm is often considered for the case in which the substrings stored are k-gram codes, i.e. all possible substrings of length k. In [9] we have proposed a novel approach to the string matching problem by letting the strings generating the inverted index be a variable length code, i.e. posting code, to be optimally design for a given performance metric. Posting codes [9] generalize classic k-gram codes and enable new query retrieval algorithms with potentially lower complexity than the SvS algorithm. In this paper, we propose a new query retrieval algorithm, the forward-backward SvS (FB-SvS) algorithm which can be applied when the posting code is a complete fix-free code. The complexity of the FB-SvS algorithm for various fix-free codes is investigated and numerical experiments using the human DNA sequence as the source are presented. |
URI: | http://hdl.handle.net/11536/155065 |
ISBN: | 978-1-7281-4300-2 |
ISSN: | 1058-6393 |
期刊: | CONFERENCE RECORD OF THE 2019 FIFTY-THIRD ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS |
起始頁: | 1389 |
結束頁: | 1393 |
Appears in Collections: | Conferences Paper |