Full metadata record
DC Field | Value | Language
dc.contributor.author | 王俊程 | en_US
dc.contributor.author | Chung Cheng Wang | en_US
dc.contributor.author | 单智君 | en_US
dc.contributor.author | Jean Jyh-Jiun Shann | en_US
dc.date.accessioned | 2014-12-12T02:22:51Z | -
dc.date.available | 2014-12-12T02:22:51Z | -
dc.date.issued | 1999 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#NT880392047 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/65444 | -
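dc.description.abstract | As the databases behind Internet search systems keep growing, inverted files are widely used to speed up document retrieval. However, an inverted file is typically 1 to 3 times the size of the original data, so searching it directly causes frequent disk reads. An inverted file cache can therefore effectively reduce disk reads and improve the performance of the search system.
In this thesis, we first discuss the traditional indexing methods suited to the inverted file data structure, and then propose a new indexing method for the inverted file cache, called the enhanced hashing function with pseudo link. The method builds on the traditional hashing function and improves how it handles collisions and manages memory, so that it reduces the number of index table lookups when a collision occurs and increases both the utilization of the inverted file cache and its random access performance. Experimental results show that the inverted file cache achieves a hit rate of about 65%. Compared with a search engine without an inverted file cache, a cache using the traditional indexing method improves search engine performance by about 59.2%, and a cache using the new indexing method improves it by about 81%. | zh_TW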
dc.description.abstract | The databases behind modern Internet search engines keep growing. Inverted files are commonly used in these search engines to improve search performance. However, an inverted file is very large. If the inverted file is accessed directly, frequent disk I/O operations occur and become the bottleneck of the information retrieval system. Therefore, we add a cache inside the search engine, called the inverted file cache, to reduce the number of disk I/O operations.
In this thesis, we first discuss the traditional data structures that can be used for indexing the inverted file. We then propose a new method, called the enhanced hashing function with pseudo link, which is suited to indexing the inverted file cache. Compared with the traditional hashing function, the new method improves collision handling and memory management to increase the utilization of the inverted file cache and the performance of random accesses in the cache. The simulation results show that the hit rate of the inverted file cache is about 65%. Compared with a search engine without the inverted file cache, a search engine whose cache uses the traditional method performs about 59.2% better, and one whose cache uses the new method performs up to 81% better. | en_US
dc.language.iso | en_US | en_US
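dc.subject | 转置档 (inverted file) | zh_TW
dc.subject | 搜寻引擎 (search engine) | zh_TW
dc.subject | 杂凑函数 (hash function) | zh_TW
dc.subject | 冲突 (collision) | zh_TW
dc.subject | 转置档快取 (inverted file cache) | zh_TW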
dc.subject | inverted file | en_US
dc.subject | search engine | en_US
dc.subject | hashing function | en_US
dc.subject | hash function | en_US
dc.subject | collision | en_US
dc.subject | inverted file cache | en_US
dc.title | 现代网路搜寻引擎中转置档快取系统之设计 | zh_TW
dc.title | A Cache Mechanism for the Inverted Files in Modern Search Engines | en_US
dc.type | Thesis | en_US
dc.contributor.department | Institute of Computer Science and Engineering | zh_TW
Appears in Collections: Thesis