Title: | Load and storage balanced posting file partitioning for parallel information retrieval |
Authors: | Ma, Yung-Cheng; Chung, Chung-Ping; Chen, Tien-Fu (Department of Computer Science) |
Keywords: | Load balancing; Storage balancing; Parallel information retrieval; Inverted file |
Issue Date: | 1-May-2011 |
Abstract: | Many recent major search engines on the Internet use a large-scale cluster to store a large database and cope with a high query arrival rate. To design a large-scale parallel information retrieval system, both performance and storage cost have to be considered in an integrated manner. Moreover, a quantitative method for designing the cluster systematically is required. This paper proposes a posting file partitioning algorithm that meets these requirements. The partitioning follows the partition-by-document-ID principle to eliminate communication overhead. The kernel of the partitioning is a data allocation algorithm that allocates variable-sized data items for both load and storage balancing. The data allocation algorithm is proven to satisfy a load balancing constraint with asymptotically 1-optimal storage cost. A probability model is established so that query processing throughput can be calculated from keyword popularities and the data allocation result. With these results, we show a quantitative method for designing a cluster systematically. This research provides a systematic approach to large-scale information retrieval system design. The approach has the following features: (1) the deviations from ideal load balancing and storage balancing are negligible in real-world applications; (2) load balancing and storage balancing can be considered together without conflict; (3) the data allocation algorithm can handle data items of variable sizes and variable loads. No previous algorithm has achieved all of these features together, and this combination is the key factor in achieving a load- and storage-balanced workstation cluster in a real-world environment. (C) 2011 Elsevier Inc. All rights reserved. |
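The abstract describes allocating variable-sized data items so that both access load and storage are balanced across cluster nodes. The sketch below is only an illustration of that two-objective problem, not the paper's data allocation algorithm. It assumes each item carries a storage size and an expected load (for example, one derived from keyword popularity), and it substitutes a simple greedy rule: items are placed in descending order of load, each on the node whose worse normalized utilization (load or storage) grows least. The names `Item`, `allocate`, and `imbalance` are hypothetical, and the heuristic carries none of the paper's proven guarantees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    size: float  # storage cost of the item (hypothetical unit)
    load: float  # expected access load, e.g. from keyword popularity (assumed)


def allocate(items, num_nodes):
    """Greedy illustration (not the paper's method): place items in
    descending load order on the node whose worse of the two normalized
    utilizations (load, storage) would be smallest after placement."""
    total_size = sum(i.size for i in items) or 1.0
    total_load = sum(i.load for i in items) or 1.0
    loads = [0.0] * num_nodes
    sizes = [0.0] * num_nodes
    assignment = [[] for _ in range(num_nodes)]
    for item in sorted(items, key=lambda i: (i.load, i.size), reverse=True):
        # Cost of putting this item on node n: the larger normalized share.
        def cost(n):
            return max((loads[n] + item.load) / total_load,
                       (sizes[n] + item.size) / total_size)
        best = min(range(num_nodes), key=cost)  # ties go to the lowest index
        loads[best] += item.load
        sizes[best] += item.size
        assignment[best].append(item.name)
    return assignment, loads, sizes


def imbalance(values):
    """Max-to-mean ratio; 1.0 means perfectly balanced."""
    mean = sum(values) / len(values)
    return max(values) / mean if mean else 1.0


if __name__ == "__main__":
    items = [Item("a", 4, 1), Item("b", 1, 4), Item("c", 3, 2), Item("d", 2, 3)]
    assignment, loads, sizes = allocate(items, 2)
    print(assignment, loads, sizes, imbalance(loads), imbalance(sizes))
```

On the four sample items, the rule pairs each storage-heavy item with a load-heavy one, which leaves both nodes at equal load and equal storage. Greedy placement like this can be far from balanced on adversarial inputs. That gap is what a provably bounded allocation algorithm, such as the one the abstract claims, is meant to close.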
URI: | http://dx.doi.org/10.1016/j.jss.2011.01.028 http://hdl.handle.net/11536/8971 |
ISSN: | 0164-1212 |
DOI: | 10.1016/j.jss.2011.01.028 |
Journal: | JOURNAL OF SYSTEMS AND SOFTWARE |
Volume: | 84 |
Issue: | 5 |
Start Page: | 864 |
End Page: | 884 |
Appears in Collections: | Journal Articles |