An Exploration of Web Users' Search Interests Using Log-Based Approaches

Title:	An Exploration of Web Users' Search Interests Using Log-Based Approaches (以使用記錄分析探索網路使用者檢索興趣之研究)
Authors:	Hsiao-Tieh Pu (卜小蝶); Chyan Yang (楊千); Institute of Information Management
Keywords:	Query Log Analysis; Network User Studies; Network Information Retrieval; Circulation History Analysis; Association Analysis; User-oriented Classification Scheme
Issue Date:	2001
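Abstract:	As the Internet becomes increasingly widespread, using search engines to look up information has become one of the most important activities on the Web, and understanding the search behavior of Web users has become an important foundation for many lines of research. Usage logs are a valuable source for understanding users: query logs, for example, record the query terms and the search process, and therefore offer important clues for analyzing user needs. The main purpose of this study is to design several log-based methods and, from them, to develop an integrated framework for effectively observing and analyzing Web users' search interests, which can in turn inform the understanding of users' information needs and the improvement of Web retrieval systems. The proposed framework covers three areas: classifying the query terms logged by search engines into predefined subject categories, constructing a classification scheme suited to organizing query terms, and exploring the associations between subject categories in a hierarchical scheme. The experimental data consist of more than five million query log entries collected from three search engines at different periods. First, the study proposes a subject categorization method that combines manual analysis with automatic computer classification and can efficiently classify large numbers of query terms; each subject category represents one type of search interest. Second, the classification scheme is constructed through a systematic method based on popular search needs. Third, because the subject classification is hierarchical, associations between different subject categories (that is, between search interests) are derived with a collaborative method that analyzes users with similar usage behavior. The results fall into three parts. The first concerns the analysis and observation of Web users' search interests: preliminary findings show that users in Taiwan tend to submit short queries, that a core vocabulary of query terms exists, and that a high proportion of queries are proper nouns; the analysis of search interests includes real-time observation of the distribution of popular subject categories and of how that distribution changes across periods. The second concerns the construction of a classification scheme for organizing query terms: the study built a preliminary scheme of 15 major classes and 100 subclasses, collected nearly 20,000 categorized query terms, and analyzed the characteristics of each category's terms, such as important query topics, search behavior patterns, and types of information need. The third attempts to understand, from the users' perspective, the associations between subject categories in a hierarchical classification system: a preliminary analysis of similar book-borrowing behavior within a library classification system uncovered a number of important non-hierarchical associations, whose meanings and types are then examined. The experimental results show that the proposed framework and methods can observe the distribution and change of Web users' search interests effectively and in real time, and can build a subject classification table oriented to search interests in a systematic way. In addition, analyzing similar usage behavior yields many non-hierarchical associations, which allow the subject classification to better track changing user needs and provide the flexibility of multiple links between search interests. Research on Web usage behavior has received considerable attention abroad, and this study is the first in Taiwan to use a large volume of query terms to study the behavior of Web users in Taiwan. Its results have substantial practical value both for understanding Web users' information needs and for improving the retrieval effectiveness of Web search systems, and they can also support further study in related fields such as communication, education, and e-commerce.
The Web is a revolution in information access. Searching is by far the most common user activity on the Web, yet many users experience great frustration while searching. To fulfill the intent of a search, it is crucial to learn more about what users search for on the Web. This proposal therefore presents an integrated framework for studying Web search interests using various log-based approaches. The purpose is to develop effective methods to organize and understand search interests in terms of users' queries on the Web. The framework consists of three main tasks: subject categorization of query terms from search engines, construction of a hierarchical subject taxonomy covering popular search interests, and discovery of associations between search interests in terms of the categories in the taxonomy.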
Using logs containing over 5 million queries from three search engines in Taiwan, the study proposes feasible and systematic methods for studying Web search interests on a larger scale. These methods include an auto-categorization approach for classifying query terms into a predefined taxonomy, a systematic approach to constructing a user-oriented subject taxonomy, and collaborative methods for discovering associated categories in the hierarchical taxonomy. At the current stage of the research, several initial results have been obtained: the frequency distributions of subject categories can be observed systematically and in real time as users' search interests change; a two-level subject taxonomy of 15 major categories and 100 subcategories has been constructed based on grounded analysis of popular queries; and many highly associated categories across different subject hierarchies of the taxonomy have been discovered by analyzing the transaction patterns of similar users. The proposal also describes several ongoing research topics, including evaluation of different feature sets for the auto-categorization approach, design of query-term clustering to assist in constructing the taxonomy, and investigation of the association types of the categories obtained in the hierarchical taxonomy. The experimental results show that the framework can serve as groundwork for related Web studies and proves beneficial to them. Its implications for applications span three main areas of Web information retrieval research: (1) the design of Web information retrieval systems, such as implementing query filters; (2) Web content organization, such as collecting domain-specific vocabularies; and (3) an alternative way to understand users' search behavior, such as facilitating Web user studies.
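The query-log statistics mentioned in the abstract (short queries, a core vocabulary, and the distribution of subject categories per period) can be illustrated with a minimal sketch. It assumes query length is counted in characters, since Chinese queries are usually not space-delimited, and it uses a hypothetical dictionary `term_to_category` as a stand-in for the study's categorization method, which is not specified here. The function name `query_log_summary`, the `top_k` parameter, and the toy log are illustrative only, not the study's data or code.

```python
from collections import Counter


def query_log_summary(queries, term_to_category, top_k=10):
    """Summarize a query log: average length, core-vocabulary coverage, category shares.

    queries: list of query strings as they appear in the log (repeats included)
    term_to_category: hypothetical query -> category lookup standing in for
        the study's (unspecified) categorization method
    """
    total = len(queries)
    # Length counted in characters (an assumption suited to unsegmented Chinese).
    avg_len = sum(len(q) for q in queries) / total
    # Share of all query occurrences covered by the top_k most frequent queries,
    # used here as a rough indicator of a "core vocabulary".
    freq = Counter(queries)
    core_share = sum(n for _, n in freq.most_common(top_k)) / total
    # Category distribution; queries missing from the lookup are "uncategorized".
    cats = Counter(term_to_category.get(q, "uncategorized") for q in queries)
    shares = {c: n / total for c, n in cats.items()}
    return avg_len, core_share, shares


# Toy log for illustration only (not data from the study).
example_queries = ["mp3", "mp3", "天氣", "icq", "mp3", "天氣", "電影"]
example_categories = {"mp3": "music", "天氣": "daily life", "電影": "entertainment"}

if __name__ == "__main__":
    print(query_log_summary(example_queries, example_categories, top_k=2))
```

Calling the function once per collection period and comparing the returned `shares` dictionaries is one simple way to watch how category distributions shift over time.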
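The collaborative discovery of non-hierarchical associations (categories used by the same users but placed under different top-level classes) can be sketched as follows. The abstract does not state which association measure the study used, so this sketch assumes cosine similarity over the sets of users who borrowed from each category. The category-code convention in `top_level` (the first character names the top-level class), the `min_cosine` threshold, and the toy transactions are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def top_level(category: str) -> str:
    # Hypothetical convention: the first character of a code such as "310"
    # identifies its top-level class ("3").
    return category[0]


def cross_hierarchy_associations(transactions, min_cosine=0.5):
    """Score category pairs shared by the same users; keep cross-hierarchy pairs.

    transactions: dict mapping user id -> iterable of category codes
    Returns (category_a, category_b, cosine) tuples sorted by descending score.
    Cosine = users of both / sqrt(users of a * users of b), an assumed measure.
    """
    users_per_cat = Counter()
    pair_counts = Counter()
    for cats in transactions.values():
        unique = sorted(set(cats))  # count each user at most once per category
        users_per_cat.update(unique)
        pair_counts.update(combinations(unique, 2))
    results = []
    for (a, b), both in pair_counts.items():
        if top_level(a) == top_level(b):
            continue  # same branch: already expressed by the hierarchy
        score = both / sqrt(users_per_cat[a] * users_per_cat[b])
        if score >= min_cosine:
            results.append((a, b, round(score, 3)))
    return sorted(results, key=lambda r: (-r[2], r[0], r[1]))


# Toy borrowing records for illustration only (not data from the study).
example_transactions = {
    "u1": ["310", "520", "312"],
    "u2": ["310", "520"],
    "u3": ["312", "310"],
    "u4": ["520", "310", "857"],
    "u5": ["857"],
}

if __name__ == "__main__":
    print(cross_hierarchy_associations(example_transactions))
```

Here "310" and "312" co-occur often but share a top-level class, so they are skipped, while "310" and "520" surface as a cross-hierarchy association. Lowering `min_cosine` trades precision for more candidate links.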
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT900396031 http://hdl.handle.net/11536/68662
Appears in Collections:	Thesis