Title: 旅遊英語會話相似主題段落發掘之研究
A Study of Discovering Similar Topic Segments in Travel English Conversation
Author: 陳彥廷
YEN-TING CHEN
劉敦仁
Institute of Information Management
Keywords: Natural Language Processing;Information Retrieval
Publication Date: 2006
Abstract: This study aims to build an automated method that helps users learn travel English conversation. When a user reads a continuous travel English conversation, the system partitions it into topically independent segments. For each segment, it retrieves conversation segments on similar topics from an existing corpus and recommends them to the user, so that the user can learn by analogy from related examples. The research focuses on measuring topic similarity between conversation segments. It proposes methods, based on corpus statistics, for setting word-stem importance weights and word-stem correlation weights in order to improve the precision of the segment similarity measure. The experimental results show that each weighting method improves similarity matching, so that the proposed similar-segment discovery method outperforms the traditional similar-segment discovery method based on lexical matching.
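The abstract does not give the thesis's weighting formulas, so the following is only a minimal sketch of the general idea of stem-weighted segment similarity. It assumes smoothed inverse document frequency as a stand-in for the corpus-based stem importance weight, a crude suffix stripper in place of a real stemmer, and cosine similarity as the comparison measure. The stem correlation weights mentioned in the abstract are not modeled. All function names (`stems`, `idf_weights`, `rank_similar`, and so on) are hypothetical.

```python
import math
import re
from collections import Counter


def stems(text):
    """Lowercase, tokenize, and crudely strip a few suffixes.

    Assumption: this is a placeholder for a real stemmer, not the thesis's method.
    """
    out = []
    for tok in re.findall(r"[a-z]+", text.lower()):
        for suffix in ("ing", "ed", "es", "s"):
            # Only strip when at least three characters remain.
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        out.append(tok)
    return out


def idf_weights(corpus_segments):
    """Importance weight per stem from corpus statistics (smoothed IDF, an assumption)."""
    n = len(corpus_segments)
    df = Counter()
    for seg in corpus_segments:
        df.update(set(stems(seg)))  # count each stem once per segment
    return {s: math.log((1 + n) / (1 + c)) + 1.0 for s, c in df.items()}


def weighted_vector(segment, idf, default_idf):
    """Map a segment to {stem: term frequency * importance weight}."""
    tf = Counter(stems(segment))
    return {s: f * idf.get(s, default_idf) for s, f in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[s] for s, w in u.items() if s in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar(query_segment, corpus_segments):
    """Return (score, segment) pairs sorted from most to least similar."""
    idf = idf_weights(corpus_segments)
    # Stems unseen in the corpus get the weight of a stem with document frequency 0.
    default_idf = math.log(1 + len(corpus_segments)) + 1.0
    q = weighted_vector(query_segment, idf, default_idf)
    scored = [
        (cosine(q, weighted_vector(seg, idf, default_idf)), seg)
        for seg in corpus_segments
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank_similar(segment, corpus)` with one topic segment and a list of corpus segments returns the corpus segments ordered by weighted similarity. Stems that are rare in the corpus contribute more to the score than stems that appear in nearly every segment, which is the intuition behind importance weighting.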
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009434505
http://hdl.handle.net/11536/81679
Appears in Collections: Thesis


Files in This Item:

  1. 450501.pdf
