Title: Posting file partitioning and parallel information retrieval
Authors: Ma, YC; Chen, TF; Chung, CP
Department of Computer Science
Issue Date: 15-Aug-2002
Abstract: The rapid growth in Internet usage brings new challenges to the design of scalable information retrieval systems. To reduce the response time of a query to a large database, we parallelize both the CPU computation and the disk access of Boolean query processing on a cluster of workstations. The key issue is to partition the inverted file such that, during parallel query processing, each workstation consults only its own locally resident data to complete its task. To achieve this goal, we treat the set of all postings referring to a document ID as an object to be allocated in the data placement problem. Following this partitioning-by-document-ID principle, we develop posting file partitioning algorithms that transform a sequential information retrieval system into a parallel one. The advantage is that a better speed-up can be achieved by building on a fast sequential approach, the compressed posting file. The partitioning schemes are designed to balance the workload of the workstations during parallel query processing without increasing the average disk access time per posting. The experiment shows that almost linear speed-up can be achieved and that the performance bottleneck of previous work, which parallelizes only disk access, can be removed. This work shows that, by using parallel processing techniques, it is feasible to build a scalable information retrieval system. (C) 2001 Elsevier Science Inc. All rights reserved.
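As a rough illustration of the partitioning-by-document-ID idea, the sketch below treats all postings of one document as a single object, assigns these objects to workstations with a greedy largest-first heuristic, and evaluates a conjunctive (AND) Boolean query on each node using only its local postings. The balancing heuristic, the use of posting count as an object's weight, and all function names (`partition_by_document`, `build_local_index`, `parallel_and`) are assumptions for illustration, not the paper's own algorithms; posting compression is omitted, and the loop over nodes stands in for concurrent execution on separate workstations.

```python
import heapq
from collections import defaultdict


def partition_by_document(docs, num_nodes):
    """Assign each document, with all of its postings, to exactly one node.

    docs maps doc_id -> set of terms; a document's weight is its posting
    count (number of distinct terms). Hypothetical heuristic: largest
    document first, always to the currently least-loaded node.
    """
    heap = [(0, node) for node in range(num_nodes)]  # (load, node id)
    heapq.heapify(heap)
    assignment = defaultdict(list)
    for doc_id in sorted(docs, key=lambda d: (-len(docs[d]), d)):
        load, node = heapq.heappop(heap)
        assignment[node].append(doc_id)
        heapq.heappush(heap, (load + len(docs[doc_id]), node))
    return assignment


def build_local_index(docs, doc_ids):
    """Build a node-local inverted file: term -> sorted list of doc IDs."""
    index = defaultdict(list)
    for doc_id in sorted(doc_ids):
        for term in docs[doc_id]:
            index[term].append(doc_id)
    return index


def intersect(a, b):
    """Merge-style intersection of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def local_and(index, terms):
    """Evaluate an AND query using only this node's postings."""
    # Intersect shortest lists first to keep intermediate results small.
    lists = sorted((index.get(t, []) for t in terms), key=len)
    result = lists[0] if lists else []
    for plist in lists[1:]:
        result = intersect(result, plist)
    return result


def parallel_and(indexes, terms):
    """Each node answers independently; partial results are disjoint."""
    hits = []
    for index in indexes:  # stands in for concurrent execution per node
        hits.extend(local_and(index, terms))
    return sorted(hits)


docs = {
    1: {"parallel", "retrieval"},
    2: {"parallel", "disk"},
    3: {"retrieval", "parallel", "index"},
    4: {"retrieval"},
}
nodes = partition_by_document(docs, 2)
indexes = [build_local_index(docs, ids) for ids in nodes.values()]
print(parallel_and(indexes, ["parallel", "retrieval"]))  # [1, 3]
```

Because every posting for a given document resides on exactly one node, no node ever needs another node's data to answer its part of the query, and the partial results never overlap, so merging them is a simple concatenation. Balancing posting counts per node is used here only as a stand-in for balancing query workload.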
URI: http://dx.doi.org/10.1016/S0164-1212(01)00119-4
http://hdl.handle.net/11536/28582
ISSN: 0164-1212
DOI: 10.1016/S0164-1212(01)00119-4
Journal: JOURNAL OF SYSTEMS AND SOFTWARE
Volume: 63
Issue: 2
Start page: 113
End page: 127