Title: Multi-Type Academic Search Based on the MMG Graph Model and Random Walk
Exploring Mixed Media Graph and Random Walk with Restart for Academic Search
Authors: 劉浚頡
Liou, Jiun-Jiue
彭文志
Peng, Wen-Chih
Institute of Computer Science and Engineering
Keywords: heterogeneous information network; random walk; random walk with restart
Issue Date: 2010
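Abstract: Numerous academic databases exist today, along with many academic search engines on the Web, such as Google Scholar and CiteSeerX. Most academic search engines rely on keyword matching: a document is typically selected only when its title or abstract exactly matches the query keywords entered by the user. In this thesis, we therefore use a graph model based on the Mixed Media Graph together with the Random Walk with Restart algorithm to address the problems of keyword search. In addition, our method lets users query for different types of items. Users can enter keywords, author names, or journal names, and our system returns relevant papers, keywords, author names, and journal names. We propose two methods in this thesis: Global-MMG and Net-MMG. Global-MMG applies the Random Walk with Restart algorithm directly to the entire academic network to find items related to the user's input. To reduce the running time of Global-MMG, we further propose Net-MMG, in which Random Walk with Restart only needs to run on some sub-graphs of the entire academic network, which saves a substantial amount of time. In our experiments, both Global-MMG and Net-MMG deliver good query quality. Moreover, Net-MMG reduces query time while maintaining a reasonable level of query quality.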
With the huge amount of bibliographic data now available, on-line academic search services have become widespread. Most on-line academic search services retrieve only those papers whose titles or abstracts contain terms that match the query terms. As a result, the query results suffer from the limitations of keyword matching. In this paper, we explore the Mixed Media Graph (abbreviated as MMG), in which each vertex represents one entity and edges reflect linkage relationships. Vertices in an MMG may represent different entity types, such as papers, authors and terms. Thus, MMG fully reflects linkage relationships among different entities. Prior works have demonstrated that, by using similarity search via cross-entity and identical-entity relationships, MMG is able to retrieve more relevant entities. Furthermore, our proposed academic search can return a variety of query results, such as relevant papers, relevant authors and relevant conferences, from a single query. Once the MMG is built, when a user submits a query, we use Random Walk with Restart (abbreviated as RWR) to retrieve relevant entities and determine their ranking scores. Specifically, given a whole bibliographic dataset, we propose Global-MMG, in which a global MMG is built for RWR. To reduce the query response time, we further develop Net-MMG (standing for NetClus-based MMG), which performs RWR on topic-based sub-graphs derived by the prior work NetClus. We implement our academic search and conduct extensive experiments on the ACM Digital Library to evaluate our proposed Global-MMG and Net-MMG. Experimental results show that, by exploring MMG and RWR, both Global-MMG and Net-MMG achieve good precision and accuracy. In addition, Net-MMG has a short query response time while still guaranteeing good quality of query results.
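To make the ranking step described in the abstract concrete, the following is a minimal sketch of Random Walk with Restart on a tiny mixed-type graph. It is not the thesis implementation: the toy entities, the undirected unweighted edges, the restart probability of 0.15, and the helper names `random_walk_with_restart` and `top_k` are all assumptions made for illustration, and the NetClus-based sub-graph selection of Net-MMG is not shown.

```python
def random_walk_with_restart(edges, query, restart=0.15, tol=1e-10, max_iter=1000):
    """Iterate RWR scores for every node, restarting at the query nodes.

    Assumptions (not from the thesis): undirected, unweighted edges and a
    restart vector that is uniform over the query nodes.
    """
    # Build an undirected adjacency list from the edge pairs.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    missing = [n for n in query if n not in neighbors]
    if not query or missing:
        raise ValueError("query must be non-empty and contain only graph nodes")
    nodes = sorted(neighbors)

    # Restart distribution: all probability mass on the query nodes.
    q = {n: 0.0 for n in nodes}
    for n in query:
        q[n] += 1.0 / len(query)

    scores = dict(q)
    for _ in range(max_iter):
        # With probability `restart` jump back to the query nodes ...
        nxt = {n: restart * q[n] for n in nodes}
        # ... otherwise move to a uniformly chosen neighbor.
        for u in nodes:
            share = (1.0 - restart) * scores[u] / len(neighbors[u])
            for v in neighbors[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


def top_k(scores, entity_type, k=3):
    """Return the k highest-scoring nodes of one entity type."""
    ranked = sorted((n for n in scores if n[0] == entity_type),
                    key=lambda n: -scores[n])
    return ranked[:k]


# Hypothetical toy network: nodes are (type, name) pairs.
edges = [
    (("paper", "p1"), ("author", "a1")),
    (("paper", "p1"), ("term", "graph")),
    (("paper", "p1"), ("venue", "KDD")),
    (("paper", "p2"), ("author", "a1")),
    (("paper", "p2"), ("term", "graph")),
    (("paper", "p2"), ("term", "search")),
    (("paper", "p2"), ("venue", "KDD")),
    (("paper", "p3"), ("author", "a2")),
    (("paper", "p3"), ("term", "database")),
    (("paper", "p3"), ("term", "search")),
    (("paper", "p3"), ("venue", "VLDB")),
]

# One query on the term "graph" yields rankings for several entity types.
scores = random_walk_with_restart(edges, query=[("term", "graph")])
print(top_k(scores, "paper"))
print(top_k(scores, "author"))
```

Because every node type lives in the same graph, a single run produces scores for papers, authors, terms and venues at once, which is how one query can return several kinds of results. Restricting `edges` to a topic-specific subset would mimic, very loosely, the idea of running the walk on a smaller sub-graph.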
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079755516
http://hdl.handle.net/11536/45862
Appears in Collections: Thesis