Title: | Posting file partitioning and parallel information retrieval |
Authors: | Ma, YC; Chen, TF; Chung, CP; Department of Computer Science |
Issue Date: | 15-Aug-2002 |
Abstract: | The rapid growth in Internet usage brings new challenges in designing a scalable information retrieval system. To reduce the response time of a query to a large database, we parallelize both the CPU computation and the disk access of Boolean query processing on a cluster of workstations. The key issue is to partition the inverted file such that, during parallel query processing, each workstation consults only its own locally resident data to complete its task. To achieve this goal, we treat the set of all postings referring to a document ID as an object to be allocated in the data placement problem. Following the partitioning-by-document-ID principle, we develop posting file partitioning algorithms to transform a sequential information retrieval system into a parallel information retrieval system. The advantage is that a better speed-up can be achieved by deriving the parallel system from a fast sequential approach, the compressed posting file. The partitioning schemes are designed to balance the workload of the workstations in parallel query processing without increasing the average disk access time per posting. The experiment shows that almost linear speed-up can be achieved and that the performance bottleneck in previous work, which parallelizes only disk access, can be removed. This work shows that, by using parallel processing techniques, it is feasible to build a scalable information retrieval system. (C) 2001 Elsevier Science Inc. All rights reserved. |
URI: | http://dx.doi.org/10.1016/S0164-1212(01)00119-4 http://hdl.handle.net/11536/28582 |
ISSN: | 0164-1212 |
DOI: | 10.1016/S0164-1212(01)00119-4 |
Journal: | JOURNAL OF SYSTEMS AND SOFTWARE |
Volume: | 63 |
Issue: | 2 |
Start page: | 113 |
End page: | 127 |
Appears in Collections: | Articles |
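The partitioning-by-document-ID idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the round-robin placement (`doc_id % num_nodes`), the function names, and the toy documents are all assumptions made for illustration, and the nodes are simulated by a loop rather than run on a cluster. It shows only the property the abstract relies on: because all postings for a document live on one node, each node can answer a Boolean AND query from its local data, and the global answer is the union of the local answers.

```python
from collections import defaultdict


def build_partitions(docs, num_nodes):
    """Build one inverted index per node, placing each document on exactly one node."""
    partitions = [defaultdict(list) for _ in range(num_nodes)]
    for doc_id in sorted(docs):
        # Assumed placement rule (round-robin); the paper's schemes balance workload instead.
        node = doc_id % num_nodes
        for term in sorted(set(docs[doc_id].lower().split())):
            partitions[node][term].append(doc_id)
    return partitions


def local_and(index, terms):
    """Answer a Boolean AND query using only one node's local postings."""
    result = None
    for term in terms:
        postings = set(index.get(term, ()))
        result = postings if result is None else result & postings
    return result or set()


def parallel_and(partitions, terms):
    """Union the per-node answers; on a real cluster each local_and would run on its own node."""
    hits = set()
    for index in partitions:
        hits |= local_and(index, terms)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical toy collection, keyed by document ID.
    docs = {
        0: "parallel query processing",
        1: "parallel disk access",
        2: "query disk",
        3: "parallel query speed",
    }
    partitions = build_partitions(docs, num_nodes=2)
    print(parallel_and(partitions, ["parallel", "query"]))  # prints [0, 3]
```

No node needs postings held by another node to evaluate the conjunction, which is why no inter-node data exchange appears in `local_and`.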