Title: A Personalized Chinese Q&A System: Using a Technology News Website as a Case Study
Authors: Cheng Huan Chung (鍾震寰); Chi-Chun Lo (羅濟群)
Institute of Information Management
Keywords: question answering; lexicon; word segmentation; word similarity; personalization; natural language
Publication date: 2003
Abstract: With the spread of the Internet, the data on websites has effectively become the world's richest database, covering an enormous range of content, while the number of Internet users has grown rapidly. How to find the information one actually wants in such a huge database is a topic well worth studying. A question answering (Q&A) system returns the answer a user wants without requiring the user to inspect every document. Common current approaches either match keywords or use semantic expansion to measure the similarity between the question and each document, and then select a suitable answer.
Because each person's knowledge domain differs, this thesis compares, during question answering, the target role in the user's question with the documents the user usually reads, in order to find the answer the user is more likely to need. Using the technology news pages of UDN.com as the experimental subject, the thesis proposes a Q&A system that incorporates fuzzy logic, then implements and simulates it. In the simulation, recall rose with the number of words and reached 68%, answer accuracy reached 80%, and the mean reciprocal rank (MRR) reached 0.612. Overall, the answers found within the rules defined in this thesis are satisfactory.
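The record does not spell out the thesis's fuzzy rules or similarity measure, so the following Python sketch is only an illustration under stated assumptions, not the author's method. Term sets stand in for segmented Chinese words, Jaccard overlap stands in for the similarity measure, and the membership function `high`, the two rules, and the 0.6 weight are all hypothetical. The sketch shows how a match against the user's reading profile can re-rank candidates that the question alone cannot separate, and how MRR (the mean over questions of 1/rank of the first correct answer, counting 0 when no correct answer is returned) scores the resulting rankings.

```python
"""Hypothetical sketch: personalized answer ranking with simple fuzzy rules,
scored by mean reciprocal rank (MRR). Not the thesis's actual rules."""
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str          # candidate answer string
    doc_terms: set[str]  # segmented terms of the document it came from


def overlap(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two term sets, in [0, 1] (assumed similarity measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def high(x: float) -> float:
    """Hypothetical fuzzy membership for 'similarity is high': ramp from 0.2 to 0.6."""
    return min(1.0, max(0.0, (x - 0.2) / 0.4))


def fuzzy_score(question: set[str], profile: set[str], cand: Candidate) -> float:
    q = high(overlap(question, cand.doc_terms))  # question match degree
    p = high(overlap(profile, cand.doc_terms))   # user-profile match degree
    # Hypothetical rule 1: question match high AND profile match high (AND = min).
    # Hypothetical rule 2: question match high, with an assumed weight of 0.6.
    # Rule strengths are aggregated with OR = max.
    return max(min(q, p), 0.6 * q)


def rank(question: set[str], profile: set[str],
         cands: list[Candidate]) -> list[Candidate]:
    # Python's sort is stable, so tied candidates keep their input order.
    return sorted(cands, key=lambda c: fuzzy_score(question, profile, c),
                  reverse=True)


def mean_reciprocal_rank(ranked_lists: list[list[str]], gold: list[str]) -> float:
    """Average of 1/rank of the first correct answer; 0 if it never appears."""
    total = 0.0
    for ranked, correct in zip(ranked_lists, gold):
        for i, ans in enumerate(ranked, start=1):
            if ans == correct:
                total += 1.0 / i
                break
    return total / len(gold) if gold else 0.0


if __name__ == "__main__":
    cands = [
        Candidate("Intel", {"chip", "foundry", "cpu", "pc"}),
        Candidate("TSMC", {"chip", "foundry", "semiconductor", "wafer"}),
    ]
    question = {"chip", "foundry", "who"}
    profile = {"semiconductor", "chip", "wafer"}  # terms from the user's past reading
    plain = [c.answer for c in rank(question, set(), cands)]
    personal = [c.answer for c in rank(question, profile, cands)]
    print(plain, personal)  # ['Intel', 'TSMC'] ['TSMC', 'Intel']
    print(mean_reciprocal_rank([plain], ["TSMC"]),
          mean_reciprocal_rank([personal], ["TSMC"]))  # 0.5 1.0
```

In the toy data both candidates match the question equally (score 0.3 each), so without a profile the input order decides; the profile lifts the TSMC candidate to 0.5 and moves the correct answer from rank 2 to rank 1, doubling its reciprocal rank.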
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009134504
http://hdl.handle.net/11536/58013
Collection: Theses