Title: Unique-order interpolative coding for fast querying and space-efficient indexing in information retrieval systems
Authors: Cheng, CS
Shann, JJJ
Chung, CP
Department of Computer Science
Keywords: inverted index compression;inverted file;prefix-free coding;interpolative coding;fast decoding
Issue Date: 1-Mar-2006
Abstract: This paper presents a size reduction method for the inverted file, the most suitable indexing structure for an information retrieval system (IRS). We notice that in an inverted file the document identifiers for a given word are usually clustered. While this clustering property can be used to reduce the size of the inverted file, good compression as well as fast decompression must both be available. In this paper, we present a method that can facilitate coding and decoding processes for interpolative coding using recursion elimination and loop unwinding. We call this method the unique-order interpolative coding. It can calculate the lower and upper bounds of every document identifier for a binary code without using a recursive process, hence the decompression time can be greatly reduced. Moreover, it can also exploit document identifier clustering to compress the inverted file efficiently. Compared with the other well-known compression methods, our method provides fast decoding speed and excellent compression. This method can also be used to support a self-indexing strategy. Therefore our research work in this paper provides a feasible way to build a fast and space-economical IRS. (c) 2005 Elsevier Ltd. All rights reserved.
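The bound calculation mentioned in the abstract can be illustrated with a minimal sketch of classic recursive binary interpolative coding. This is not the paper's unique-order method, which removes the recursion; it only shows how the lower and upper bounds of each document identifier follow from its neighbours. The function names, the bit-string output, and the use of a plain fixed-width binary code for each value are assumptions made for illustration.

```python
def _width(range_size):
    """Bits used by a fixed-width binary code over range_size possible values."""
    return (range_size - 1).bit_length()


def _encode(docs, lo, hi, out):
    # docs: sorted, distinct identifiers, all within [lo, hi].
    if not docs:
        return
    mid = len(docs) // 2
    x = docs[mid]
    low = lo + mid                       # mid smaller identifiers must fit below x
    high = hi - (len(docs) - mid - 1)    # the remaining ones must fit above x
    width = _width(high - low + 1)
    if width:                            # a forced value costs zero bits
        out.append(format(x - low, "0{}b".format(width)))
    _encode(docs[:mid], lo, x - 1, out)
    _encode(docs[mid + 1:], x + 1, hi, out)


def _decode(bits, pos, n, lo, hi, out):
    # Mirrors _encode; returns the position of the next unread bit.
    if n == 0:
        return pos
    mid = n // 2
    low = lo + mid
    high = hi - (n - mid - 1)
    width = _width(high - low + 1)
    x = low + (int(bits[pos:pos + width], 2) if width else 0)
    pos += width
    pos = _decode(bits, pos, mid, lo, x - 1, out)
    out.append(x)
    return _decode(bits, pos, n - mid - 1, x + 1, hi, out)


def interp_encode(docs, lo, hi):
    """Encode a sorted list of distinct identifiers in [lo, hi] as a bit string."""
    out = []
    _encode(docs, lo, hi, out)
    return "".join(out)


def interp_decode(bits, n, lo, hi):
    """Recover n identifiers in [lo, hi] from a bit string made by interp_encode."""
    out = []
    _decode(bits, 0, n, lo, hi, out)
    return out


if __name__ == "__main__":
    postings = [3, 8, 9, 11, 12, 13, 17]   # hypothetical posting list
    code = interp_encode(postings, 1, 20)
    print(len(code), "bits:", code)
    print(interp_decode(code, len(postings), 1, 20))
```

When the identifiers form a run that fills its interval exactly, every value is forced by its bounds and the sketch emits no bits at all, which is the clustering effect the abstract refers to.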
URI: http://dx.doi.org/10.1016/j.ipm.2005.02.002
http://hdl.handle.net/11536/12595
ISSN: 0306-4573
DOI: 10.1016/j.ipm.2005.02.002
Journal: INFORMATION PROCESSING & MANAGEMENT
Volume: 42
Issue: 2
Start page: 407
End page: 428
Appears in Collections: Journal Articles


Files in This Item:

  1. 000233061300005.pdf
