Title: An Improvement on Prosodic Break Tag Prediction for Mandarin Speech (中文語音停頓韻律標記預估之改進)
Authors: Lo, Yu-Jiang (羅裕璋)
Chen, Sin-Horng (陳信宏)
Institute of Communications Engineering (電信工程研究所)
Keywords: Mandarin break prediction (中文停頓預估); break prediction
Issue Date: 2014
Abstract: This thesis investigates prosodic break tag prediction for Mandarin Chinese speech synthesis. In a text-to-speech (TTS) system, break tags can be predicted only from the linguistic features produced by the text analyzer. Previous work mainly used word-level features (e.g., the parts of speech of the neighboring words) and simple sentence-level features (e.g., the distance to the preceding and following punctuation marks) to predict the break tag at each inter-syllable juncture. This study adds features describing compound words and two special phrase structures (the "de" (的) phrase and the conjunction phrase) to improve break tag prediction. Experiments on a corpus of 376 short passages read at a normal speaking rate confirm that the compound-word and phrase-structure features help: prediction accuracy over the seven break types rises from 70.34% to 74.05%. The main gains are for the pitch-reset break (B2-1) and the short-pause break (B2-2) at prosodic word boundaries, and for the medium-duration pause break (B3) at prosodic phrase boundaries. The improved predictions should therefore give the synthesized TTS speech a better pause rhythm.
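The kinds of linguistic features the abstract mentions can be illustrated with a minimal sketch. It is not the thesis's implementation: the `Word` structure, the feature names, the part-of-speech labels, and the example segmentation are all hypothetical, and for brevity it covers only junctures between words, whereas the thesis predicts a tag at every inter-syllable juncture. One character is treated as one syllable.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Word:
    text: str                          # surface form; one character = one syllable
    pos: str                           # part-of-speech label (hypothetical tag set)
    compound_id: Optional[int] = None  # words sharing an id form one compound word
    phrase_id: Optional[int] = None    # words sharing an id form one special phrase
    phrase_type: Optional[str] = None  # "de" or "conj" (hypothetical labels)


def juncture_features(words: List[Word]) -> List[Dict[str, object]]:
    """Return one feature dict per juncture between two adjacent non-punctuation words."""
    feats = []
    for i in range(len(words) - 1):
        left, right = words[i], words[i + 1]
        if left.pos == "PUNC" or right.pos == "PUNC":
            continue
        # Sentence-level features: syllable distance to the nearest punctuation
        # mark (or to the text boundary) on each side of the juncture.
        since = 0
        for w in reversed(words[: i + 1]):
            if w.pos == "PUNC":
                break
            since += len(w.text)
        until = 0
        for w in words[i + 1:]:
            if w.pos == "PUNC":
                break
            until += len(w.text)
        same_phrase = left.phrase_id is not None and left.phrase_id == right.phrase_id
        feats.append({
            # Word-level features: POS of the words on either side.
            "left_pos": left.pos,
            "right_pos": right.pos,
            "syl_since_punct": since,
            "syl_until_punct": until,
            # Structure features: is the juncture inside a compound or phrase,
            # and does a special phrase end exactly here?
            "inside_compound": left.compound_id is not None
            and left.compound_id == right.compound_id,
            "inside_phrase": same_phrase,
            "phrase_end": left.phrase_type
            if left.phrase_id is not None and not same_phrase
            else None,
        })
    return feats


# Hypothetical analysis of "語音合成的品質很好。"
demo = [
    Word("語音", "N", compound_id=1, phrase_id=1, phrase_type="de"),
    Word("合成", "N", compound_id=1, phrase_id=1, phrase_type="de"),
    Word("的", "DE", phrase_id=1, phrase_type="de"),
    Word("品質", "N"),
    Word("很", "ADV"),
    Word("好", "ADJ"),
    Word("。", "PUNC"),
]
demo_feats = juncture_features(demo)
```

In this sketch, feature dicts like these would be passed to whatever classifier assigns one of the seven break tags. The structure flags let it treat a juncture inside a compound word differently from one at the end of a "de" phrase, even when the neighboring parts of speech are the same.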
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070160270
http://hdl.handle.net/11536/75584
Appears in Collections: Thesis