Title: A Further Study on Text Analysis for Mandarin TTS (中文字轉音系統之文句分析的進一步研究)
Authors: 傅明榮
Ming-Zong Fu
Yih-Ru Wang
Keywords: Text Analysis; Break Type (文句分析; 停頓標記)
Issue Date: 2006
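Abstract: In this thesis, we build an affix-based word-formation module into a Chinese word segmenter. This addresses the problem that certain derived words cannot be exhaustively listed in a dictionary and makes the segmenter's architecture more complete. We also package the segmenter as an easy-to-use windowed (GUI) tool. The core of the word-formation unit is the list of "affixes, prefixes/suffixes" provided by the Academia Sinica Chinese word segmentation standard; the list is organized through statistical analysis, and part of speech serves as the basis for the rule-based word formation. In addition, starting from affixes and adding prepositions and conjunctions, we observe the particular effects of these three word classes on break types in spoken speech and select special words from them. These words provide a new parameter, beyond part of speech and word length, for future research on predicting breaks from text. The thesis also pre-processes the polyphonic-character problem in Mandarin speech synthesis systems, providing corrected corpus data for future research. To evaluate the affix word-formation module, we use the Academia Sinica Balanced Corpus, version 3.0 (《中研院平衡語料庫3.0版》) as test data; the results show a word-formation accuracy of about 80%. Finally, we analyze the error rate of each word-formation rule and discuss how to correct word-formation errors.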
This thesis presents further research on text analysis for a Mandarin Text-to-Speech (TTS) system. First, we handle the polyphonic characters and affix characters proposed by the Mandarin Promotion Committee of the Ministry of Education and by the Chinese Knowledge Information Processing (CKIP) group, Academia Sinica. We design a new word combination module, applied after word identification, that handles unknown words using 74 rules. We also observe that some special words can affect prosodic pauses; this observation may improve the prediction of break indices from Chinese text. Finally, the Sinica Corpus published by CKIP is used to evaluate the performance of the new combination module, which achieves a precision of 0.829 in word combination. We also analyze the word combination results to give advice for future word


