Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 王俊程 | en_US |
dc.contributor.author | Chung Cheng Wang | en_US |
dc.contributor.author | 单智君 | en_US |
dc.contributor.author | Jean Jyh-Jiun Shann | en_US |
dc.date.accessioned | 2014-12-12T02:22:51Z | - |
dc.date.available | 2014-12-12T02:22:51Z | - |
dc.date.issued | 1999 | en_US |
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#NT880392047 | en_US |
dc.identifier.uri | http://hdl.handle.net/11536/65444 | - |
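dc.description.abstract | As database search systems on the Internet grow ever larger, inverted file search systems are widely adopted to speed up document retrieval. However, an inverted file is usually 1 to 3 times the size of the original data, so searching it directly causes frequent disk reads. Designing a cache system for the inverted file can therefore effectively reduce disk reads and improve search system performance. In this thesis, we first discuss traditional indexing methods suited to the inverted file data structure, and then propose a new indexing method suited to the inverted file cache system, called the enhanced hashing function with pseudo link (具虚链之增强型杂凑函数). The method builds on the concept of the traditional hashing function and improves how collisions are handled and how memory is managed, so that it reduces the number of index table lookups when a collision occurs and increases the utilization and random-access performance of the inverted file cache system. Experimental results show that the inverted file cache system achieves a hit rate of about 65%. Compared with a search engine without an inverted file cache system, a cache using the traditional indexing method improves search engine performance by about 59.2%, and a cache using the new indexing method improves it by about 81%. | zh_TW |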
dc.description.abstract | The databases behind modern Internet search engines keep growing. Inverted files are commonly used to improve search engine performance. However, an inverted file is very large, so accessing it directly causes frequent disk I/O operations, which become the bottleneck of an information retrieval system. We therefore add a cache inside the search engine, called an inverted file cache, to reduce the number of disk I/O operations. In this thesis, we first discuss the traditional data structures that can be used to index the inverted file. We then propose a new method, called the enhanced hashing function with pseudo link, which is suited to indexing an inverted file cache. Compared with the traditional hashing function, the new method improves collision handling and memory management to increase the utilization of the inverted file cache and the performance of random accesses in the cache. The simulation results show that the hit rate of the inverted file cache is about 65%. Compared with a search engine without the inverted file cache, a search engine using the inverted file cache with the traditional method performs about 59.2% better, and one using the new method performs up to 81% better. | en_US |
dc.language.iso | en_US | en_US |
dc.subject | 转置档 | zh_TW |
dc.subject | 搜寻引擎 | zh_TW |
dc.subject | 杂凑函数 | zh_TW |
dc.subject | 冲突 | zh_TW |
dc.subject | 转置档快取 | zh_TW |
dc.subject | inverted file | en_US |
dc.subject | search engine | en_US |
dc.subject | hashing function | en_US |
dc.subject | hash function | en_US |
dc.subject | collision | en_US |
dc.subject | inverted file cache | en_US |
dc.title | 现代网路搜寻引擎中转置档快取系统之设计 | zh_TW |
dc.title | A Cache Mechanism for the Inverted Files in Modern Search Engines | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | 资讯科学与工程研究所 (Institute of Computer Science and Engineering) | zh_TW |
Appears in Collections: | Thesis |
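
The abstract describes placing a cache in front of the on-disk inverted file so that repeated term lookups avoid disk I/O. The record does not specify how the thesis's enhanced hashing function with pseudo link works, so the following is only a minimal sketch of the general idea, not the author's method. It assumes a least-recently-used (LRU) eviction policy and relies on Python's built-in hash table (`OrderedDict`) for indexing. The names `InvertedFileCache`, `load_postings`, `read_from_disk`, and the sample index are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable, List


class InvertedFileCache:
    """Illustrative LRU cache of posting lists, keyed by term.

    Assumption: LRU eviction and Python's built-in hashing stand in for
    the indexing scheme; the thesis's own method is not reproduced here.
    """

    def __init__(self, capacity: int, load_postings: Callable[[str], List[int]]):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.load_postings = load_postings  # stands in for a disk read
        self.entries: "OrderedDict[str, List[int]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, term: str) -> List[int]:
        """Return the posting list for a term, reading disk only on a miss."""
        if term in self.entries:
            self.hits += 1
            self.entries.move_to_end(term)  # mark as most recently used
            return self.entries[term]
        self.misses += 1
        postings = self.load_postings(term)  # simulated disk I/O
        self.entries[term] = postings
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return postings

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Hypothetical on-disk inverted file: term -> list of document IDs.
DISK_INDEX = {"cache": [1, 4], "hash": [2, 4, 7], "engine": [3]}
disk_reads: List[str] = []


def read_from_disk(term: str) -> List[int]:
    disk_reads.append(term)  # record each simulated disk access
    return DISK_INDEX.get(term, [])


cache = InvertedFileCache(capacity=2, load_postings=read_from_disk)
for query_term in ["cache", "hash", "cache", "engine", "hash"]:
    cache.lookup(query_term)

print(f"hits={cache.hits} misses={cache.misses} disk_reads={disk_reads}")
```

In this toy run only the second lookup of `cache` is served from memory. The hit rate of a real system depends on the query distribution and cache size, and this sketch says nothing about the figures reported in the abstract.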