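Title: Speaker Adaptation of Speaking Rate-Dependent Hierarchical Prosodic Model For Mandarin TTS
Chinese title: 漢語語速相依韻律模型之語者調適及其在語音合成之應用 (literally, "Speaker Adaptation of a Mandarin Speaking Rate-Dependent Prosodic Model and Its Application to Speech Synthesis")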
Authors: Wang, Po-Chun (王柏鈞); Wang, Yih-Ru
Keywords: Text-to-Speech (TTS); Adaptation; Speaking Rate-Dependent Hierarchical Prosodic Model; Extrapolation; Maximum a Posteriori (MAP); Linear Regression
Issue Date: 2014
Abstract: This thesis proposes a speaker prosody adaptation method. The method adapts the existing speaking rate-dependent hierarchical prosodic model (SR-HPM) of an SR-controlled Mandarin TTS system to a new speaker's data, so that speech can be synthesized in the new speaker's voice. The adaptation data are limited and cover only part of the speaking-rate range, which creates two problems. The first is data sparseness: the adaptation utterances are few and lie only in a narrow range around the normal speaking rate. The second is model parameter extrapolation: there are no adaptation data in the fast or slow speaking-rate ranges. The proposed method follows the idea of SR-HPM training in three steps. It first normalizes the prosodic-acoustic features of the new speaker's speech data. It then trains an HPM with the PLM algorithm. Finally, it refines the HPM into a speaking rate-dependent model. Maximum a posteriori (MAP) adaptation, combined with model parameter extrapolation, is applied to cope with both problems. Experiments used adaptation data from one male speaker. The results confirmed that the resulting adapted SR-HPM has reasonable parameters covering a wide speaking-rate range (0.15-0.3 seconds/syllable). It can therefore be used in the TTS system to generate prosodic-acoustic features for synthesizing the new speaker's voice at any given speaking rate.
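The abstract combines MAP adaptation with model parameter extrapolation but does not state the equations. The sketch below illustrates the general idea under stated assumptions. It is not the thesis's formulation, and the names `LinearSRParam` and `map_adapt_linear` are hypothetical. It assumes a single model parameter that varies linearly with speaking rate. A Gaussian prior centred on the original speaker's regression line lets a few adaptation points near the normal rate shift that line. Meanwhile, the prior keeps the slope usable when extrapolating to fast and slow rates.

```python
"""Hypothetical sketch: MAP adaptation of a speaking-rate-linear prosodic parameter.

Not the thesis's formulation; the names and the linear-in-SR assumption are illustrative.
"""
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LinearSRParam:
    """Hypothetical model parameter assumed to vary linearly with speaking rate.

    sr is the speaking rate in seconds/syllable (the unit used in the abstract).
    """
    intercept: float
    slope: float

    def at(self, sr: float) -> float:
        """Evaluate the parameter at speaking rate sr, inside or outside the data range."""
        return self.intercept + self.slope * sr


def map_adapt_linear(
    prior: LinearSRParam,
    srs: Sequence[float],
    values: Sequence[float],
    tau: float,
) -> LinearSRParam:
    """MAP estimate of (intercept, slope) under a Gaussian prior centred on `prior`.

    Minimises  sum_i (y_i - a - b*sr_i)^2 + tau*((a - a0)^2 + (b - b0)^2),
    i.e. solves (X^T X + tau*I) w = X^T y + tau*w0 with rows X_i = [1, sr_i].
    Larger tau trusts the original speaker's model more; smaller tau trusts
    the few, narrow-range adaptation points more.
    """
    if len(srs) != len(values):
        raise ValueError("srs and values must have the same length")
    if tau <= 0:
        raise ValueError("tau must be positive so the 2x2 system is solvable")

    n = len(srs)
    sum_x = sum(srs)
    sum_xx = sum(x * x for x in srs)
    sum_y = sum(values)
    sum_xy = sum(x * y for x, y in zip(srs, values))

    # Entries of the regularised (symmetric 2x2) normal-equation matrix.
    m11 = n + tau
    m12 = sum_x
    m22 = sum_xx + tau
    # Right-hand side: data term plus the prior's pull toward (a0, b0).
    r1 = sum_y + tau * prior.intercept
    r2 = sum_xy + tau * prior.slope

    # Cramer's rule; det > 0 whenever tau > 0 (Cauchy-Schwarz gives n*sum_xx >= sum_x**2).
    det = m11 * m22 - m12 * m12
    a = (r1 * m22 - m12 * r2) / det
    b = (m11 * r2 - m12 * r1) / det
    return LinearSRParam(intercept=a, slope=b)


if __name__ == "__main__":
    # Hypothetical original-speaker line and three adaptation points near normal SR.
    prior = LinearSRParam(intercept=0.05, slope=1.0)
    srs = [0.19, 0.20, 0.21]
    values = [0.29, 0.30, 0.31]
    adapted = map_adapt_linear(prior, srs, values, tau=1.0)
    # Extrapolate to a fast rate (0.15 s/syl) and a slow rate (0.30 s/syl).
    for sr in (0.15, 0.20, 0.30):
        print(f"sr={sr:.2f}  prior={prior.at(sr):.4f}  adapted={adapted.at(sr):.4f}")
```

The prior strength `tau` is the main design choice. All adaptation points lie within about 0.02 s/syllable of one another, so the data alone barely constrain the slope. The prior term supplies that missing information from the original model.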
