Title: 资料撷取系统中使用部分媒合之转置档快取技术
An Inverted File Cache with Partial Matching in Information Retrieval Systems
Authors: 萧雅心
Ya-Hsin Hsiao
钟崇斌
Dr. Chung-Ping Chung
Institute of Computer Science and Engineering
Keywords: 转置档快取系统; 部分媒合机制; Inverted file cache; Partial match mechanism
Issue Date: 2000
Abstract: As the World Wide Web (WWW) grows in popularity, the databases that search engines must handle keep growing. Most information retrieval systems use an inverted file to index such a large database and speed up document search. When a user submits a query, the server looks up the inverted file stored on disk to find the results, and disk access is very time consuming. We therefore design a cache system with a partial match mechanism to reduce the response time of user queries.
In this thesis, we propose a partial match mechanism for the inverted file cache that exploits the repetition (locality) among user queries more effectively. In addition, users actually access less than 20% of the data the server returns. If the cache stores frequently issued queries together with their result lists, the partial match mechanism can compose partial results for a new query from the cached entries whenever possible. Even when no identical query is found in the cache, the answer need not come from disk, so the response time is shortened. We also modify the architecture of the inverted file cache to fit the behavior of modern user queries. Experimental results show that our design improves the hit rate of the inverted file cache by about 32% and reduces the average first-page response time of user queries by about 52%.
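The abstract does not spell out how answers to new queries are composed from cached ones, so the following is only an illustrative sketch under stated assumptions, not the thesis's design. It assumes conjunctive (AND) queries, a cache keyed by the set of query terms, and cached result lists that are complete sets of document IDs. The class name `PartialMatchCache` and its methods are hypothetical. Under those assumptions, a new query can be answered without disk access when cached queries are subsets of it and together cover all of its terms, because intersecting their result sets gives the same documents as evaluating the full query.

```python
class PartialMatchCache:
    """Hypothetical query-result cache keyed by a frozenset of query terms.

    Assumptions (not taken from the thesis): queries use AND semantics and
    each cached entry holds the complete set of matching document IDs.
    """

    def __init__(self):
        self._results = {}  # frozenset(terms) -> set of document IDs

    def store(self, terms, doc_ids):
        """Cache the result list of a query."""
        self._results[frozenset(terms)] = set(doc_ids)

    def lookup(self, terms):
        """Return (doc_ids, kind); kind is 'exact', 'partial', or 'miss'."""
        query = frozenset(terms)

        # Exact match: the same set of terms was cached earlier.
        if query in self._results:
            return set(self._results[query]), "exact"

        # Partial match: collect cached queries that are proper subsets
        # of the new query.
        usable = [key for key in self._results if key < query]

        # The cached subsets must cover every term of the new query;
        # otherwise some term's postings would still have to come from disk.
        covered = frozenset().union(*usable)
        if covered != query:
            return None, "miss"

        # With AND semantics, the answer is the intersection of the
        # cached result sets.
        answer = None
        for key in usable:
            docs = self._results[key]
            answer = set(docs) if answer is None else answer & docs
        return answer, "partial"


cache = PartialMatchCache()
cache.store(["inverted", "file"], [1, 2, 3, 5])
cache.store(["cache"], [2, 3, 8])

composed = cache.lookup(["inverted", "file", "cache"])
exact = cache.lookup(["file", "inverted"])
missed = cache.lookup(["inverted", "disk"])
```

Here `composed` is built entirely from the two cached entries, `exact` is a direct hit regardless of term order, and `missed` falls through to the disk-resident inverted file because the term "disk" is covered by no cached query. A real system would also need an eviction policy and a way to handle truncated result lists, neither of which this sketch attempts.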
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT890392056
http://hdl.handle.net/11536/66845
Appears in Collections: Thesis