Title: WebPuzzle: 網際網路中的真實物件搜尋
WebPuzzle: Object Discovery in World Wide Web
Authors: 蔡尚樺
Shang-Hua Tsai
彭文志
Wen-Chih Peng
Institute of Computer Science and Engineering
Keywords: object search; data retrieval; entity search; information retrieval
Publication Date: 2007
Abstract: Entity search has proved useful for finding data “entities” such as phone numbers and email addresses directly, without inspecting individual pages one by one. The technique transforms the web from a document view to an entity view, which makes searching for entities more direct and intuitive. However, because of limitations in previous entity search systems, a query for multiple entities can return only entities that appear together on the same page. In this paper, we model the entity search problem as a puzzle problem. The framework analyzes all pages, builds a global view of how entities should be combined, and constructs a graph of the relationships among entities. Using this entity graph, the framework can compose entities into tuples even when they are not directly related. We evaluate our system on real-world web pages and show that it is efficient and accurate for searching entity tuples.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009555529
http://hdl.handle.net/11536/39482
Appears in Collections: Thesis


Files in This Item:

  1. 552901.pdf