Title: Load and storage balanced posting file partitioning for parallel information retrieval
Authors: Ma, Yung-Cheng
Chung, Chung-Ping
Chen, Tien-Fu
Department of Computer Science
Keywords: Load balancing; Storage balancing; Parallel information retrieval; Inverted file
Publication date: 1-May-2011
Abstract: Many recent major Internet search engines use a large-scale cluster to store a large database and to cope with a high query arrival rate. To design a large-scale parallel information retrieval system, performance and storage cost have to be considered together. Moreover, a quantitative method for designing the cluster systematically is required. This paper proposes a posting file partitioning algorithm that meets these requirements. The partitioning follows the partition-by-document-ID principle to eliminate communication overhead. The kernel of the partitioning is a data allocation algorithm that allocates variable-sized data items for both load balancing and storage balancing. The data allocation algorithm is proven to satisfy a load balancing constraint with asymptotically 1-optimal storage cost. A probability model is established so that query processing throughput can be calculated from keyword popularities and the data allocation result. With these results, we show a quantitative method for designing a cluster systematically. This research provides a systematic approach to large-scale information retrieval system design. The approach has the following features: (1) the differences from ideal load balancing and ideal storage balancing are negligible in real-world applications; (2) load balancing and storage balancing can be considered together without conflict; (3) the data allocation algorithm can handle data items of variable sizes and variable loads. No previous algorithm has achieved all of these features together, and this combination is the key factor in achieving a load- and storage-balanced workstation cluster in a real-world environment. (C) 2011 Elsevier Inc. All rights reserved.
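To make the load/storage trade-off described in the abstract concrete, the following is a minimal Python sketch of a generic greedy allocator. It is a swapped-in illustration, not the data allocation algorithm proposed in the paper. It assumes each data item has a known size (storage cost) and a known load (for example, an access rate derived from keyword popularity); the names Item and greedy_allocate are hypothetical. Items are placed in descending order of load, each on the node that minimizes the larger of its normalized load and normalized storage after placement.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical data item: a chunk of posting data with a storage size
    # and an expected access load (e.g. derived from keyword popularity).
    name: str
    size: int
    load: float


def greedy_allocate(items, num_nodes):
    """Assign each item to exactly one node using a generic greedy rule.

    Heaviest-load items are placed first; each goes to the node whose
    max(normalized load, normalized storage) after placement is smallest.
    This is an illustrative heuristic, NOT the paper's algorithm, and it
    carries none of the paper's proven guarantees.
    """
    total_load = sum(i.load for i in items) or 1.0
    total_size = sum(i.size for i in items) or 1
    loads = [0.0] * num_nodes
    sizes = [0] * num_nodes
    placement = {}

    for item in sorted(items, key=lambda i: (i.load, i.size), reverse=True):
        # Cost of putting this item on node n: the worse of the two
        # normalized resources (load share, storage share) after placement.
        best = min(
            range(num_nodes),
            key=lambda n: max(
                (loads[n] + item.load) / total_load,
                (sizes[n] + item.size) / total_size,
            ),
        )
        loads[best] += item.load
        sizes[best] += item.size
        placement[item.name] = best

    return placement, loads, sizes


if __name__ == "__main__":
    demo = [Item("a", 40, 5.0), Item("b", 30, 4.0),
            Item("c", 20, 3.0), Item("d", 10, 2.0)]
    print(greedy_allocate(demo, 2))
```

With four items of sizes 40, 30, 20, 10 and loads 5, 4, 3, 2 on two nodes, this heuristic ends with both nodes at load 7.0 and storage 50. A greedy rule like this is easy to reason about, but in general it does not carry the proven load-balancing constraint and asymptotically 1-optimal storage bound stated in the abstract.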
URI: http://dx.doi.org/10.1016/j.jss.2011.01.028
http://hdl.handle.net/11536/8971
ISSN: 0164-1212
DOI: 10.1016/j.jss.2011.01.028
Journal: Journal of Systems and Software
Volume: 84
Issue: 5
Start page: 864
End page: 884