Title: 語速相依韻律模型之語者調適技術與應用 (Speaker Adaptation Techniques and Applications for a Speaking Rate-Dependent Prosodic Model)
Speaker Adaptation of SR-HPM for Speaking Rate-Controlled Mandarin TTS
Authors: 廖宜斌
陳信宏
Liao, I-Bin
Chen, Sin-Horng
Institute of Communications Engineering
Keywords: Speaker Adaptation; Prosodic Model; Speech Synthesis; Maximum a Posteriori Criterion; Hierarchical Prosodic Model; Structural Maximum a Posteriori; Speaking Rate-Controlled TTS
Issue Date: 2016
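Abstract: This thesis proposes a speaker prosody adaptation method. Starting from a speaking rate-dependent hierarchical prosodic model for Mandarin, it obtains a speaking rate-dependent prosodic model for a specific speaker from a small amount of that speaker's data. The work addresses two problems, data sparseness and model-parameter extrapolation, both of which arise because the adaptation corpus is small and covers only part of the speaking-rate range. To solve them, a tree-structured maximum a posteriori (structural maximum a posteriori, SMAP) adaptation method is proposed that accounts for both the insufficient adaptation data and the extrapolation of model parameters. Experimental results for multiple speakers show that the adapted speaking rate-dependent hierarchical prosodic model of a new speaker covers the entire speaking-rate range (0.15-0.3 seconds/syllable). Finally, a personalized TTS system is implemented that uses this model to generate the new speaker's speech at any speaking rate; objective and subjective evaluations indicate fairly good results.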
This thesis presents a structural maximum a posteriori (SMAP) speaker adaptation approach that adjusts the speaking rate (SR)-dependent hierarchical prosodic model (SR-HPM) of an existing SR-controlled Mandarin text-to-speech (SC-MTTS) system to a new speaker’s data in order to produce a new voice. Two main issues are addressed. The first is the narrow SR coverage of the adaptation data, which is handled by using the existing SR-HPM, trained on a speech corpus with wide SR coverage, as an informative prior. The second is data sparseness, which results from the large number of SR-HPM parameters to be adjusted; it is handled by organizing the SR-HPM parameters hierarchically into decision trees so that the SMAP method can adjust them efficiently. The approach is evaluated on speech databases of five new speakers. Both objective and subjective evaluations show that the proposed method not only outperforms the maximum likelihood-based method in the SR range observed in the target speaker’s data, but also performs much better in the unseen SR ranges.
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT079413813
http://hdl.handle.net/11536/139980
Appears in Collections: Theses