Design and Study of Semantic Discovery Methods for Extracting Knowledge from Free Text Information

Title:	非結構化文件中語意知識擷取方法之設計與研究 Design and Study of Semantic Discovery Methods for Extracting Knowledge from Free Text Information
Authors:	蒙以亨 I-Heng Meng 楊維邦 Wei-Pang Yang Institute of Computer Science and Engineering
Keywords:	question answering system;text mining;passage retrieval;business agent;ontology;document association;knowledge management
Issue Date:	2002
Abstract:	Traditional information retrieval has proven effective at identifying specific passages within large volumes of data in response to a user query. However, a keyword search returns a large number of document links, and users must open and browse them one by one to confirm whether each matches the query and to locate the specific passages they want. The root problem is that a keyword query does not let the system understand the user's real intention. To overcome this problem, question answering (QA) systems accept questions posed in natural language and try to determine their actual intent: the question is expanded into a query that carries a semantic space, and semantic associations are then used to find the passages that best match the question. This study proposes a Chinese question answering architecture based on HowNet and Autotag (automatic word segmentation) to give the system Chinese semantic processing capability. The experiment collected 1,000 Chinese news items from the ChinaTimes web site (http://www.chinatimes.com) and used Mean Reciprocal Rank (MRR), an international evaluation standard for QA systems, to measure accuracy. Averaged across news categories, the system achieved an MRR of 0.84.

The Internet is an increasingly important channel for online retailing and business transactions in electronic commerce. With nearly five billion web sites on the World Wide Web, an automated, integrated mediator, called a business agent, is needed to match and negotiate between suppliers and buyers. Such an intelligent business broker gives both sides of a trading platform an efficient and effective mechanism for processing and sharing information. This work proposes a new business agent architecture composed of five modules: Business Spy Agent, Supply and Demand Analysis, Supply and Demand Classification, Matching and Negotiation, and Hidden Business Mining. The trading flow of the architecture consists of four steps: supply and demand identification, product brokering, merchant brokering, and negotiation. The architecture uses an ontology as its content classification mechanism to increase the precision of classification and improve the effectiveness of negotiation. Its design can be viewed as a practical business application of semantic knowledge extraction.

To discover associations among documents in a large collection automatically, this study combines document content analysis with user behavior analysis. Content analysis uses N-grams and a semantic index to extract knowledge from documents, while user behavior is described by an explicit model and a tacit model. The two analyses are combined to discover and rank document associations, so as to find the documents most related to a given document.

A further research topic of this work is the application of data mining techniques to knowledge management in order to improve the effectiveness of knowledge sharing. Each of these topics is discussed in detail in this dissertation.
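
The Mean Reciprocal Rank cited in the abstract is conventionally computed as the average, over all test questions, of 1/rank, where rank is the position of the first correct answer in the returned list and a question with no correct answer contributes 0. The following is a minimal Python sketch of that standard definition, not the evaluation code used in the thesis; the function names and the sample ranks are hypothetical and are not the thesis data.

```python
from typing import Optional, Sequence


def reciprocal_rank(rank: Optional[int]) -> float:
    """Return 1/rank for a 1-based rank, or 0.0 if no correct answer was returned."""
    if rank is None:
        return 0.0
    if rank < 1:
        raise ValueError("rank must be a 1-based position (>= 1)")
    return 1.0 / rank


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average the reciprocal ranks over all questions."""
    if not ranks:
        raise ValueError("at least one question is required")
    return sum(reciprocal_rank(r) for r in ranks) / len(ranks)


if __name__ == "__main__":
    # Hypothetical example: position of the first correct passage for four
    # questions; None means no correct passage appeared in the results.
    example_ranks = [1, 2, None, 1]
    print(mean_reciprocal_rank(example_ranks))
```

Under this definition, a score of 1.0 would mean the first returned passage is correct for every question, so values closer to 1.0 indicate that correct passages tend to appear near the top of the ranking.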
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT910394102 http://hdl.handle.net/11536/70269
Appears in Collections:	Thesis