Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | 潘善均 | en_US |
| dc.contributor.author | Shan-Chun Pan | en_US |
| dc.contributor.author | 梁婷 | en_US |
| dc.contributor.author | Tyne Liang | en_US |
| dc.date.accessioned | 2014-12-12T03:09:43Z | - |
| dc.date.available | 2014-12-12T03:09:43Z | - |
| dc.date.issued | 2006 | en_US |
| dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT009455511 | en_US |
| dc.identifier.uri | http://hdl.handle.net/11536/82039 | - |
| dc.description.abstract | 主題詞辨識是文本理解中一項不可缺乏的工作,它可以釐清文本的核心敘述,進而應用在文章的主題偵測與作文的評分上。本論文首先以重心理論為基礎的方式取得小句重心,再以小句重心作為候選詞,依照主題的各個特徵辨識長句的主題詞,此法並不需任何的訓練語料。最後我們將長句主題詞運用至學生作文的離題偵測上,將小句重心運用至連貫性評量上。我們使用11篇平均字數為1500字的報紙社論文章進行主題詞辨識的驗證,針對包含主題各種特徵的實驗模組加以測試,社論文章的主題詞辨識可達86.84%的正確率,召回率為68.51%。我們另外蒐集95篇400字的學生作文進行主題詞辨識、離題偵測、以及連貫性評量的實驗,學生作文的主題詞辨識可達80.86%的正確率,召回率為71.36%。在離題偵測上,離題文章判別的正確率可達到63.36%,召回率為77.77%。本論文嘗試以長句主題詞來作離題偵測,雖可解決以文章全部詞彙來偵測離題的困難,但尚存有無法解決的問題,例如系統無法辨別學生認知概念上的離題,或者引用新穎的例證而造成系統誤判為離題。 | zh_TW |
| dc.description.abstract | Topic recognition is an essential part of document understanding: it clarifies the core statement of a document and can be applied to topic detection and essay scoring. In this thesis, we develop an algorithm that extracts the topic of a Chinese sentence without requiring any training corpus. First, we use a Centering Theory-based algorithm to identify the center of each clause. Second, we take those centers as candidates and use topic features to identify the topic of each sentence. We then use the sentence topics to detect off-topic essays and use the clause centers to evaluate essay coherence. We collected 11 newspaper editorials, each containing around 1,500 words, as our topic recognition corpus. We also collected 95 student essays of about 400 words each for experiments on topic recognition, off-topic detection, and coherence evaluation. In our experiments, topic recognition on the editorials achieves 86.84% precision and 68.51% recall; on the student essays, it achieves 80.86% precision and 71.36% recall. Off-topic detection achieves 63.36% precision and 77.77% recall. Using sentence topics avoids some of the difficulties of detecting off-topic essays with a bag-of-words over the whole essay, but some problems remain unsolved: the system cannot detect essays that are off-topic because of a student's conceptual misunderstanding, and it wrongly flags novel examples in student essays as off-topic. | en_US |
| dc.language.iso | zh_TW | en_US |
| dc.subject | 中文主題辨識 | zh_TW |
| dc.subject | 主題特徵 | zh_TW |
| dc.subject | 離題偵測 | zh_TW |
| dc.subject | 連貫性評量 | zh_TW |
| dc.subject | Chinese topic recognition | en_US |
| dc.subject | topic feature analysis | en_US |
| dc.subject | off-topic detection | en_US |
| dc.subject | coherence evaluation | en_US |
| dc.title | 中文主題詞辨識與其應用 | zh_TW |
| dc.title | Topic Recognition and Its Application to Chinese Texts | en_US |
| dc.type | Thesis | en_US |
| dc.contributor.department | 資訊科學與工程研究所 | zh_TW |
Appears in Collections: Thesis


Files in This Item:

  1. 551101.pdf
