Title: 資料擷取系統中使用部分媒合之轉置檔快取技術
An Inverted File Cache with Partial Matching in Information Retrieval Systems
Authors: 蕭雅心
Ya-Hsin Hsiao
鍾崇斌
Dr. Chung-Ping Chung
Institute of Computer Science and Engineering
Keywords: 轉置檔快取系統;部分媒合機制;Inverted file cache;Partial match mechanism
Issue Date: 2000
Abstract: As the World Wide Web (WWW) grows in popularity, the databases that search engines must handle grow ever larger. Most information retrieval systems use an inverted file to index such a database and speed up document search. When a user submits a query, the server looks up the inverted file stored on disk to produce the results, and disk access is very time consuming. We therefore design a cache system with a partial match mechanism to reduce query response time. In this thesis, we propose a partial match mechanism for the inverted file cache in order to exploit the locality among user queries more effectively. Moreover, less than 20% of the data returned by the server is actually accessed by users. If the cache stores frequently issued queries together with their result lists, the partial match mechanism can compose the answer to a new query from cached partial results whenever possible. Even when no identical query is found in the cache, a disk access is then not always required, and the response time is reduced. We also modify the architecture of the inverted file cache to fit the behavior of modern user queries. Experimental results show that our design improves the hit rate of the inverted file cache by about 32% and reduces the average first-page response time of user queries by about 52%.
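The abstract does not spell out how partial matching is carried out, so the following Python sketch is only one possible reading, not the thesis's actual design. It assumes conjunctive (AND) queries, represents each cached query as a set of terms mapped to its result set, and answers a new query by intersecting the results of cached queries whose terms are a subset of the new query, reading the simulated on-disk inverted file only for terms the cache does not cover. The names PartialMatchCache, inverted_file, and disk_reads are hypothetical; ranking, result paging, and cache replacement are omitted.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple


class PartialMatchCache:
    """Hypothetical query-result cache with subset-based partial matching."""

    def __init__(self, inverted_file: Dict[str, List[int]]) -> None:
        # Assumption: this dict stands in for the inverted file stored on disk.
        self.inverted_file = inverted_file
        self.cache: Dict[FrozenSet[str], Set[int]] = {}
        self.disk_reads = 0  # counts simulated disk accesses

    def _read_postings(self, term: str) -> Set[int]:
        """Simulate one disk access that fetches the posting list of a term."""
        self.disk_reads += 1
        return set(self.inverted_file.get(term, ()))

    def query(self, terms: Iterable[str]) -> Tuple[Set[int], str]:
        """Answer an AND query; return (document ids, 'exact' | 'partial' | 'miss')."""
        q = frozenset(terms)
        if q in self.cache:
            return set(self.cache[q]), "exact"

        covered: Set[str] = set()
        result: Optional[Set[int]] = None
        # Cached queries whose terms all appear in q can contribute their results.
        candidates = sorted((k for k in self.cache if k <= q), key=len, reverse=True)
        for key in candidates:
            if not key <= covered:  # skip entries that add no new term
                docs = self.cache[key]
                result = set(docs) if result is None else result & docs
                covered |= key

        kind = "partial" if covered else "miss"
        # Terms not covered by any cached query still require a disk access.
        for term in sorted(q - covered):
            postings = self._read_postings(term)
            result = postings if result is None else result & postings

        answer = result if result is not None else set()
        self.cache[q] = set(answer)  # remember the new query and its result
        return answer, kind
```

In this sketch, once the queries "inverted file" and "cache" have both been answered, the query "cache inverted file" can be composed entirely from the two cached result sets, with no further disk access. Larger cached entries are tried first simply to reduce the number of intersections; that ordering is a design choice of the sketch.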
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT890392056
http://hdl.handle.net/11536/66845
Appears in Collections: Thesis