Full metadata record
DC Field | Value | Language
dc.contributor.author | 鄭守益 | en_US
dc.contributor.author | Shou-Yi Cheng | en_US
dc.contributor.author | 梁婷 | en_US
dc.contributor.author | Tyne Liang | en_US
dc.date.accessioned | 2014-12-12T02:56:38Z | -
dc.date.available | 2014-12-12T02:56:38Z | -
dc.date.issued | 2005 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT009323540 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/79065 | -
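dc.description.abstract | Discourse analysis is an indispensable task in text understanding, because it clarifies the propositions and logical structure of a text. This thesis therefore takes a corpus-based approach: it collects and expands the surface features of discourse, formulates corresponding rules, and proposes an effective procedure for automatic Chinese discourse tagging. We use the Sinica Balanced Corpus 3.0 as the mining corpus. It contains 7,265 articles across genres including news reports, biographies and diaries, essays, letters, commentaries, and instruction manuals. For nine discourse relation types (Coordinate, Continue, Forward, Option, Disjunctive, Cause and Effect, Conditions, Elaboration, and Goal), we mine cue terms together with auxiliary features such as consecutive part-of-speech tags and special punctuation marks. In our experiments, performance was evaluated on 100 newspaper editorials averaging 1,500 characters each. For intra-sentence tagging, precision reaches 91%, recall 95%, and filtration precision 98%. For inter-sentence tagging, precision reaches 86%, recall 93%, and filtration precision 95%. We believe this work on discourse tagging can support applications such as question answering systems, automated essay scoring systems, automatic summarization, and automatic slide generation systems. | zh_TW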
dc.description.abstract | Discourse analysis plays an important role in document understanding and is crucial for clarifying the propositions and logical structure of a document. This thesis therefore aims to build an automated Chinese discourse tagging system. It does so by collecting and expanding the coherence features of discourse through a corpus study and by designing the corresponding rules. We used the written documents of the Sinica Balanced Corpus 3.0 as our mining corpus. It includes 7,265 articles covering news, biographies, essays, letters, commentaries and instruction manuals. For nine types of rhetorical relations in Chinese discourse, we separately mine cue terms, consecutive POS tags and special punctuation marks. The nine types are Coordinate, Continue, Option, Forward, Disjunctive, Cause and Effect, Conditions, Elaboration and Goal. In our experiment, the test corpus consisted of 100 news editorials, each containing around 1,500 characters (1,424 to 1,558). For intra-sentence tagging, precision, recall and filtration precision reach 91%, 95% and 98%, respectively. For inter-sentence tagging, they reach 86%, 93% and 95%, respectively. | en_US
dc.language.iso | zh_TW | en_US
dc.subject | Chinese | zh_TW
dc.subject | coherence relation | zh_TW
dc.subject | feature analysis | zh_TW
dc.subject | discourse tagging | zh_TW
dc.subject | lexical mining | zh_TW
dc.subject | Coherence relation | en_US
dc.subject | surface feature analysis | en_US
dc.subject | discourse tag | en_US
dc.subject | cue term mining | en_US
dc.title | Corpus-Based Automatic Tagging of Coherence Relations in Chinese Discourse | zh_TW
dc.title | Corpus-Based Coherence Relation Tagging in Chinese Discourse | en_US
dc.type | Thesis | en_US
dc.contributor.department | Institute of Computer Science and Engineering | zh_TW
Appears in Collections: Theses
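The abstract describes tagging coherence relations by matching mined cue terms and scoring the result with precision and recall. The following is a minimal Python sketch of that idea, not the thesis's actual system. The lexicon `CUE_TERMS` and the functions `tag_relations` and `precision_recall` are hypothetical. The cue words listed are common Chinese connectives chosen for illustration, whereas the thesis mines its cue terms from the Sinica Balanced Corpus. The thesis also uses consecutive POS tags and punctuation as auxiliary features; this sketch omits them and relies on plain string matching only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical, hand-picked cue-term lexicon (illustrative only; the thesis
# mines its own cue terms from the corpus). Keys reuse the thesis's labels.
CUE_TERMS = {
    "Cause and Effect": ["因為", "所以", "因此"],
    "Disjunctive": ["但是", "然而", "可是"],
    "Conditions": ["如果", "只要", "除非"],
    "Forward": ["不但", "而且", "甚至"],
    "Option": ["或者", "還是"],
    "Goal": ["為了", "以便"],
}


@dataclass(frozen=True)
class Tag:
    relation: str  # coherence relation signalled by the cue term
    cue: str       # the matched cue term
    position: int  # character offset of the cue term in the text


def tag_relations(text: str) -> list[Tag]:
    """Return every non-overlapping cue-term match, ordered by position."""
    tags = []
    for relation, cues in CUE_TERMS.items():
        for cue in cues:
            start = text.find(cue)
            while start != -1:
                tags.append(Tag(relation, cue, start))
                start = text.find(cue, start + len(cue))
    return sorted(tags, key=lambda t: t.position)


def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Standard precision and recall over sets of (relation, position) pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    for tag in tag_relations("因為下雨,所以比賽取消。"):
        print(tag)
```

Pure string matching will over-tag ambiguous words such as 還是 ("or" versus "still"). The surface features the thesis mines, such as POS sequences and punctuation, would presumably be used to filter out false matches of this kind.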


Files in This Item:

  1. 354001.pdf
