Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Ma, Yung-Cheng | en_US |
| dc.contributor.author | Chung, Chung-Ping | en_US |
| dc.contributor.author | Chen, Tien-Fu | en_US |
| dc.date.accessioned | 2014-12-08T15:11:42Z | - |
| dc.date.available | 2014-12-08T15:11:42Z | - |
| dc.date.issued | 2011-05-01 | en_US |
| dc.identifier.issn | 0164-1212 | en_US |
| dc.identifier.uri | http://dx.doi.org/10.1016/j.jss.2011.01.028 | en_US |
| dc.identifier.uri | http://hdl.handle.net/11536/8971 | - |
| dc.description.abstract | Many recent major search engines on the Internet use a large-scale cluster to store a large database and cope with a high query arrival rate. To design a large-scale parallel information retrieval system, both performance and storage cost have to be considered together. Moreover, a quantitative method for designing the cluster systematically is required. This paper proposes a posting file partitioning algorithm to meet these requirements. The partitioning follows the partition-by-document-ID principle to eliminate communication overhead. The kernel of the partitioning is a data allocation algorithm that allocates variable-sized data items for both load and storage balancing. The data allocation algorithm is proven to satisfy a load balancing constraint with asymptotically 1-optimal storage cost. A probability model is established such that query processing throughput can be calculated from keyword popularities and the data allocation result. With these results, we show a quantitative method to design a cluster systematically. This research provides a systematic approach to large-scale information retrieval system design. This approach has the following features: (1) The differences from ideal load balancing and storage balancing are negligible in real-world applications. (2) Both load balancing and storage balancing can be considered together without conflict. (3) The data allocation algorithm is capable of dealing with data items of variable sizes and variable loads. An algorithm having all these features together has not been achieved before and is the key factor in achieving a load- and storage-balanced workstation cluster in a real-world environment. (C) 2011 Elsevier Inc. All rights reserved. | en_US |
| dc.language.iso | en_US | en_US |
| dc.subject | Load balancing | en_US |
| dc.subject | Storage balancing | en_US |
| dc.subject | Parallel information retrieval | en_US |
| dc.subject | Inverted file | en_US |
| dc.title | Load and storage balanced posting file partitioning for parallel information retrieval | en_US |
| dc.type | Article | en_US |
| dc.identifier.doi | 10.1016/j.jss.2011.01.028 | en_US |
| dc.identifier.journal | JOURNAL OF SYSTEMS AND SOFTWARE | en_US |
| dc.citation.volume | 84 | en_US |
| dc.citation.issue | 5 | en_US |
| dc.citation.spage | 864 | en_US |
| dc.citation.epage | 884 | en_US |
| dc.contributor.department | 資訊工程學系 | zh_TW |
| dc.contributor.department | Department of Computer Science | en_US |
| dc.identifier.wosnumber | WOS:000289179300013 | - |
| dc.citation.woscount | 3 | - |
Appears in Collections: Journal Articles


Files in This Item:

  1. 000289179300013.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.