National Yang Ming Chiao Tung University Institutional Repository: An Improvement on Chinese Parser

Full metadata record

DC Field	Value	Language
dc.contributor.author	江振宇	en_US
dc.contributor.author	Chen Yu Chiang	en_US
dc.contributor.author	陳信宏	en_US
dc.contributor.author	Sin-Horng Chen	en_US
dc.date.accessioned	2014-12-12T01:43:21Z	-
dc.date.available	2014-12-12T01:43:21Z	-
dc.date.issued	2003	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#GT009113504	en_US
dc.identifier.uri	http://hdl.handle.net/11536/45868	-
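dc.description.abstract	In this thesis we design and implement the basic architecture of a Chinese word segmenter. A modular design makes the overall architecture more systematic, so that the segmenter can serve as a software component of a speech synthesis system, and it resolves the architectural problems of the previous Chinese word segmenter. The core word identification unit uses rule-based segmentation with a lexicon trie to speed up dictionary matching. The word combination unit applies word-formation rules provided by Academia Sinica together with rules we compiled ourselves, and its processing efficiency is optimized. For the spoken reading of special symbols, we design a text normalization unit. To assess the segmenter's performance, we use the Academia Sinica Balanced Corpus, version 3.0, as test data; the results show a word segmentation recall of 0.78 and precision of 0.87, and a POS tagging precision of 0.96. Finally, we analyze the segmentation output and discuss possible ways to correct the segmentation errors.	zh_TW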
dc.description.abstract	In this thesis, a Chinese word tagger for text-to-speech (TTS) is implemented. It contains four basic modules: a word identification module, a word combination module, a POS (part-of-speech) tagging module, and a text normalization module. In the word identification module, we adopt a word matching algorithm with 6 heuristic rules proposed by the Chinese Knowledge Information Processing group (CKIP), Academia Sinica, to identify words in an input Chinese character string. The word combination module groups words into compounds using 95 determinative-measure (DM) compound rules and 10 reduplication rules. The POS tagging module assigns POS tags to the words identified by the word identification module. To transform text from written form to spoken form, we design the text normalization module. Lastly, the Sinica Corpus published by CKIP is used to evaluate the performance of our system. We achieve a recall of 0.78 and a precision of 0.87 in word identification, and a precision of 0.96 in POS tagging. We also analyze the word identification results to suggest directions for future work.	en_US
dc.language.iso	zh_TW	en_US
dc.subject	中文斷詞器	zh_TW
dc.subject	語音合成	zh_TW
dc.subject	斷詞單元	zh_TW
dc.subject	構詞單元	zh_TW
dc.subject	詞類標記	zh_TW
dc.subject	文字正規化	zh_TW
dc.subject	Chinese word tagger	en_US
dc.subject	Text-to-Speech	en_US
dc.subject	Word identification	en_US
dc.subject	Compound words	en_US
dc.subject	POS tagging	en_US
dc.subject	Text normalization	en_US
dc.title	中文斷詞器之改進	zh_TW
dc.title	An Improvement on Chinese Parser	en_US
dc.type	Thesis	en_US
dc.contributor.department	電信工程研究所	zh_TW
Appears in Collections:	Thesis
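
The abstract mentions rule-based word identification backed by a lexicon trie for fast dictionary matching. The following is a minimal sketch of that idea, not the thesis implementation: it substitutes plain forward maximum matching for the 6 CKIP heuristic rules, and the names `TrieNode`, `build_trie`, `longest_match`, and `segment`, as well as the toy lexicon in the usage note, are hypothetical.

```python
from typing import Dict, Iterable, List


class TrieNode:
    """One node of a character trie built from a word lexicon."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False  # True if the path from the root spells a lexicon word


def build_trie(words: Iterable[str]) -> TrieNode:
    """Insert every lexicon word into a trie, one character per level."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def longest_match(root: TrieNode, text: str, start: int) -> int:
    """Return the length of the longest lexicon word starting at `start` (0 if none)."""
    node = root
    best = 0
    for i in range(start, len(text)):
        node = node.children.get(text[i])
        if node is None:
            break  # no lexicon word continues with this character
        if node.is_word:
            best = i - start + 1
    return best


def segment(root: TrieNode, text: str) -> List[str]:
    """Forward maximum matching (an assumed simplification of the rule set).

    Characters that start no lexicon word are emitted as single-character tokens.
    """
    tokens: List[str] = []
    pos = 0
    while pos < len(text):
        length = longest_match(root, text, pos) or 1
        tokens.append(text[pos:pos + length])
        pos += length
    return tokens
```

With a toy lexicon such as `["中文", "語音", "合成", "語音合成", "系統"]`, calling `segment(build_trie(lexicon), "中文語音合成系統")` prefers the longer entry 語音合成 over 語音 followed by 合成. The trie lets each lookup walk the input once per start position instead of testing every candidate substring against the lexicon.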

Files in This Item:

350401.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.
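
The abstract reports recall and precision for word identification. The record does not say how these were computed, so the following is only a sketch of one common convention, assumed here: a predicted word counts as correct when its character span exactly matches a word in the reference segmentation. The names `to_spans` and `precision_recall` are hypothetical.

```python
from typing import List, Set, Tuple


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a word sequence into a set of (start, end) character offsets."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def precision_recall(predicted: List[str], reference: List[str]) -> Tuple[float, float]:
    """Span-based precision and recall (assumed scoring convention).

    precision = correct predicted words / all predicted words
    recall    = correct predicted words / all reference words
    """
    pred_spans = to_spans(predicted)
    ref_spans = to_spans(reference)
    correct = len(pred_spans & ref_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(ref_spans) if ref_spans else 0.0
    return precision, recall
```

For example, scoring the prediction `["中文", "語音", "合成", "系統"]` against the reference `["中文", "語音合成", "系統"]` gives two correct words out of four predicted and three reference words, so over-segmentation lowers precision more than recall in this case.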