Title: Inverted file compression through document identifier reassignment
Authors: Shieh, WY
Chen, TF
Shann, JJJ
Chung, CP
Department of Computer Science
Keywords: information retrieval; inverted file; d-gap; document identifier reassignment; traveling salesman problem
Publication Date: 1-Jan-2003
Abstract: The inverted file is the most popular indexing mechanism for document search in an information retrieval system. Compressing an inverted file can greatly improve the document search rate. Traditionally, inverted files are compressed with the d-gap technique, which replaces document identifiers with gap values that are usually much smaller. However, fluctuating gap values cannot be compressed efficiently by some well-known prefix-free codes. To smooth and reduce the gap values, we propose a document identifier reassignment algorithm. The reassignment is based on a similarity factor between documents: we generate a reassignment order for all documents according to their similarity, so that documents with closer relationships receive closer identifiers. Simulation results show that the average gap values of sample inverted files can be reduced by 30%, and the compression rate of d-gapped inverted files with prefix-free codes can be improved by 15%. (C) 2002 Elsevier Science Ltd. All rights reserved.
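The abstract combines three ideas: d-gap encoding of posting lists, a prefix-free code for the gaps, and a similarity-driven reassignment of document identifiers. The sketch below shows how they fit together on a toy collection. The abstract does not specify the similarity factor or the exact ordering procedure, so the following are assumptions rather than the paper's method: similarity is taken to be the number of shared terms, the reassignment order is a greedy nearest-neighbour tour (a common heuristic for the traveling salesman problem named in the keywords), and Elias gamma stands in for the prefix-free code. All function names (d_gaps, gamma_bits, greedy_order, build_index, total_gamma_bits) are hypothetical.

```python
from itertools import accumulate


def d_gaps(postings):
    """Turn a sorted posting list of 1-based doc IDs into d-gaps.

    The first gap is the first ID itself; each later gap is the difference
    from the previous ID, so every gap is >= 1 and gamma-encodable.
    """
    return [doc - prev for prev, doc in zip([0] + postings[:-1], postings)]


def from_d_gaps(gaps):
    """Recover the original document IDs by prefix-summing the gaps."""
    return list(accumulate(gaps))


def gamma_encode(n):
    """Elias gamma code for n >= 1: (bit length - 1) zeros, then n in binary."""
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b


def gamma_bits(n):
    """Length of the Elias gamma code for n >= 1, without building the string."""
    return 2 * n.bit_length() - 1


def similarity(a, b):
    """Hypothetical similarity factor (assumption): count of shared terms."""
    return len(a & b)


def greedy_order(docs):
    """Greedy nearest-neighbour tour over documents (a TSP heuristic).

    Starting from document 0, always visit the most similar unvisited
    document next; ties go to the smallest original index.
    """
    if not docs:
        return []
    order = [0]
    unvisited = set(range(1, len(docs)))
    while unvisited:
        last = docs[order[-1]]
        nxt = max(unvisited, key=lambda j: (similarity(last, docs[j]), -j))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def build_index(docs, new_id):
    """Build term -> sorted list of (possibly reassigned) document IDs."""
    index = {}
    for old, terms in enumerate(docs):
        for term in terms:
            index.setdefault(term, []).append(new_id[old])
    for postings in index.values():
        postings.sort()
    return index


def total_gamma_bits(index):
    """Total bits to store every posting list as gamma-coded d-gaps."""
    return sum(gamma_bits(g) for p in index.values() for g in d_gaps(p))


if __name__ == "__main__":
    # Toy collection: documents alternate between two unrelated topics.
    docs = [{"a", "b"}, {"c", "d"}, {"a", "b"},
            {"c", "d"}, {"a", "b"}, {"c", "d"}]
    original = {i: i + 1 for i in range(len(docs))}
    order = greedy_order(docs)
    reassigned = {old: pos + 1 for pos, old in enumerate(order)}
    before = total_gamma_bits(build_index(docs, original))
    after = total_gamma_bits(build_index(docs, reassigned))
    print(order, before, after)  # [0, 2, 4, 1, 3, 5] 32 20
```

On this toy input, grouping similar documents turns the gap lists [1, 2, 2] and [2, 2, 2] into [1, 1, 1] and [4, 1, 1], and the gamma-coded size drops from 32 to 20 bits. The greedy tour needs O(n^2) similarity evaluations, so it only illustrates the idea; it says nothing about how the paper scales its reassignment to real collections.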
URI: http://dx.doi.org/10.1016/S0306-4573(02)00020-1
http://hdl.handle.net/11536/28202
ISSN: 0306-4573
DOI: 10.1016/S0306-4573(02)00020-1
Journal: INFORMATION PROCESSING & MANAGEMENT
Volume: 39
Issue: 1
Start Page: 117
End Page: 131