標題: Complete Fix-Free Codes for the Statistical Dictionary-Based String Matching Problem
作者: Suri, Meer
Rini, Stefano
電機工程學系
Department of Electrical and Computer Engineering
關鍵字: Dictionary-based string matching;Small vs small algorithm;Inverted index;Complete fix-free codes
公開日期: 1-Jan-2019
摘要: In the dictionary based string matching problem, the positions in which a short sequence of symbols, query, occurs in a longer sequence, source, have to be retrieved. To improve the query search efficiency, the source can be pre-processed to build an inverted index, that is a table storing positions of occurrences of certain substrings in the text. Given an inverted index, various algorithms can be utilized to retrieve the position of the query in the source. In the literature, motivated by text retrieval and web search applications, the small v/s small (SvS) algorithm is often considered for the case in which the substrings stored are k-gram codes, i.e. all possible substrings of length k. In [9] we have proposed a novel approach to the string matching problem by letting the strings generating the inverted index be a variable length code, i.e. posting code, to be optimally design for a given performance metric. Posting codes [9] generalize classic k-gram codes and enable new query retrieval algorithms with potentially lower complexity than the SvS algorithm. In this paper, we propose a new query retrieval algorithm, the forward-backward SvS (FB-SvS) algorithm which can be applied when the posting code is a complete fix-free code. The complexity of the FB-SvS algorithm for various fix-free codes is investigated and numerical experiments using the human DNA sequence as the source are presented.
URI: http://hdl.handle.net/11536/155065
ISBN: 978-1-7281-4300-2
ISSN: 1058-6393
期刊: CONFERENCE RECORD OF THE 2019 FIFTY-THIRD ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS
起始頁: 1389
結束頁: 1393
Appears in Collections:Conferences Paper