Title: A statistics-based approach to incrementally update inverted files
Authors: Shieh, WY
Chung, CP
Department of Computer Science
Keywords: information retrieval;inverted file;incremental update;statistical approach;spare space
Issue date: 1-Mar-2005
Abstract: Many information retrieval systems use the inverted file as their indexing structure. The inverted file, however, requires costly reorganization when new documents are added to an existing collection. Most studies suggest dealing with this problem by reserving free space in the inverted file for incremental updates. In this paper, we propose a run-time statistics-based approach to allocating this spare space. The approach estimates the space requirements of an inverted file using only a small amount of the most recent statistical data on space usage and the document update request rate. For the best indexing speed and space efficiency, the amount of spare space to allocate is determined by adaptively balancing the trade-off between reducing reorganization and maintaining space utilization. Experimental results show that the proposed space-sparing approach significantly reduces reorganization when updating an inverted file, while unused free space is kept well controlled so that file access speed is not affected. (C) 2003 Elsevier Ltd. All rights reserved.
URI: http://dx.doi.org/10.1016/j.ipm.2003.10.004
http://hdl.handle.net/11536/13941
ISSN: 0306-4573
DOI: 10.1016/j.ipm.2003.10.004
Journal: INFORMATION PROCESSING & MANAGEMENT
Volume: 41
Issue: 2
Start page: 275
End page: 288
Appears in collections: Journal articles
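
The abstract describes estimating spare space from a small window of recent statistics and trading reorganization avoidance against space utilization. The record does not give the paper's actual estimator, so the following is only a minimal sketch of the general idea under stated assumptions: the class name `SpareSpaceEstimator`, the `window`, `max_spare_ratio`, and `batches_ahead` parameters, and the moving-average rule are all hypothetical and are not taken from the paper.

```python
from collections import deque


class SpareSpaceEstimator:
    """Hypothetical spare-space estimator for one posting list.

    Not the paper's method: a moving average stands in for its
    run-time statistics, and a fixed cap stands in for its adaptive
    balancing of reorganization against space utilization.
    """

    def __init__(self, window=4, max_spare_ratio=0.5):
        # window: how many recent update batches are remembered (assumed)
        # max_spare_ratio: upper bound on spare space as a fraction of
        # the space already in use (assumed)
        self.growth = deque(maxlen=window)
        self.max_spare_ratio = max_spare_ratio

    def record_update(self, postings_added):
        """Record how many postings the latest update batch appended."""
        self.growth.append(postings_added)

    def spare_space(self, used, batches_ahead=2):
        """Return the number of spare posting slots to reserve.

        used: slots currently occupied by the posting list
        batches_ahead: how many future batches the reserve should absorb
        """
        if not self.growth:
            return 0  # no statistics yet, so reserve nothing
        avg_growth = sum(self.growth) / len(self.growth)
        wanted = avg_growth * batches_ahead   # fewer reorganizations
        cap = used * self.max_spare_ratio     # bounded unused space
        return int(min(wanted, cap))
```

A caller would invoke `record_update` after each batch of added documents and `spare_space` whenever a posting list has to be relocated. Raising `batches_ahead` favors fewer reorganizations, while lowering `max_spare_ratio` favors tighter space utilization.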


Files in this item:

  1. 000225323100007.pdf
