Full metadata record
DC Field | Value | Language
dc.contributor.author | 江振宇 | en_US
dc.contributor.author | Chen Yu Chiang | en_US
dc.contributor.author | 陳信宏 | en_US
dc.contributor.author | Sin-Horng Chen | en_US
dc.date.accessioned | 2014-12-12T01:43:21Z | -
dc.date.available | 2014-12-12T01:43:21Z | -
dc.date.issued | 2003 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT009113504 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/45868 | -
dc.description.abstract | 在本論文中,我們設計了中文斷詞器的基本架構,並實現了此中文斷詞器,以模組化的設計方法,使得整個斷詞器的架構更加系統化,可以成為一個語音合成系統的軟體開發元件,改善了先前中文斷詞器的架構問題。整個最核心的斷詞單元,採用規則法斷詞,並使用詞典樹增加詞典比對速度。構詞單元,我們採用中研院提供的構詞規則以及自行整理出之規則應用,並使構詞單元之程式處理效率最佳化。對於特殊符號的語音讀法,我們設計了文字正規化單元,解決特殊符號的讀法問題。為了瞭解斷詞器之性能,我們以〈中研院平衡語料庫3.0版〉做為測試語料,測試結果顯示斷詞的召回率達到0.78,精確率達到0.87,而詞類標記的精確率可以達到0.96。最後我們分析本斷詞器之斷詞結果,探討斷詞錯誤之可能更正方法。 | zh_TW
dc.description.abstract | In this thesis, a Chinese word tagger for text-to-speech (TTS) is implemented. It contains four basic modules: a word identification module, a word combination module, a POS (part-of-speech) tagging module, and a text normalization module. In the word identification module, we adopt a word matching algorithm with 6 heuristic rules proposed by the Chinese Knowledge Information Processing (CKIP) group, Academia Sinica, to identify words in the input Chinese character string. The word combination module groups words into compounds using 95 determinative-measure (DM) compound rules and 10 reduplication rules. The POS tagging module assigns POS tags to the words identified by the word identification module. To transform text from written form to spoken form, we design the text normalization module. Finally, the Sinica Corpus published by CKIP is used to evaluate the performance of our system. We achieve a recall of 0.78 and a precision of 0.87 in word identification, and a precision of 0.96 in POS tagging. We also analyze the word identification results to suggest directions for future work. | en_US
dc.language.iso | zh_TW | en_US
dc.subject | 中文斷詞器 | zh_TW
dc.subject | 語音合成 | zh_TW
dc.subject | 斷詞單元 | zh_TW
dc.subject | 構詞單元 | zh_TW
dc.subject | 詞類標記 | zh_TW
dc.subject | 文字正規化 | zh_TW
dc.subject | Chinese word tagger | en_US
dc.subject | Text-to-Speech | en_US
dc.subject | Word identification | en_US
dc.subject | Compound words | en_US
dc.subject | POS tagging | en_US
dc.subject | Text normalization | en_US
dc.title | 中文斷詞器之改進 | zh_TW
dc.title | An Improvement on Chinese Parser | en_US
dc.type | Thesis | en_US
dc.contributor.department | 電信工程研究所 | zh_TW
Appears in Collections: Theses


Files in This Item:

  1. 350401.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.
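
The abstract in this record reports recall and precision for word identification. As a rough illustration of what those two figures measure, here is a minimal sketch that scores a predicted segmentation against a reference one by comparing character spans. It is not the thesis's evaluation code: the function names and the sample segmentations are hypothetical, and the thesis may count matches differently.

```python
def word_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_scores(gold_words, predicted_words):
    """Return (recall, precision) of a predicted segmentation.

    A predicted word counts as correct only if a gold word covers
    exactly the same character span.
    """
    gold = word_spans(gold_words)
    predicted = word_spans(predicted_words)
    correct = len(gold & predicted)
    recall = correct / len(gold) if gold else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return recall, precision


# Hypothetical reference and system output for the same character string.
gold = ["中文", "斷詞器", "之", "改進"]
predicted = ["中文", "斷詞", "器", "之", "改進"]

recall, precision = segmentation_scores(gold, predicted)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this toy case three of the four reference words are recovered and three of the five predicted words are correct, so recall and precision differ. The record's figures (0.78 and 0.87) show the same kind of gap.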