Title: | Speaker Adaptation of SR-HPM for Speaking Rate-Controlled Mandarin TTS (语速相依韵律模型之语者调适技术与应用) |
Authors: | Liao, I-Bin (廖宜斌); Chen, Sin-Horng (陈信宏); Institute of Communications Engineering |
Keywords: | Speaker Adaptation; Prosodic Model; Speech Synthesis; Maximum a Posteriori; Hierarchical Prosodic Model; Structural Maximum a Posteriori; Speaking Rate-Controlled TTS |
Issue Date: | 2016 |
Abstract: | This thesis proposes a speaker adaptation method for prosody. Starting from the speaking rate (SR)-dependent hierarchical prosodic model (SR-HPM) of an existing SR-controlled Mandarin text-to-speech (SC-MTTS) system, it derives an SR-HPM for a new speaker from a small amount of that speaker's data, so that the system can produce a new voice. Two issues are addressed, both arising because the adaptation corpus is small and covers only part of the SR range. The first is the narrow SR coverage of the adaptation data, which requires extrapolating model parameters to unseen SRs; it is handled by using the existing SR-HPM, trained on a speech corpus with wide SR coverage, as an informative prior. The second is data sparseness, which results from the large number of SR-HPM parameters to be adjusted; it is handled by organizing the SR-HPM parameters hierarchically into decision trees so that they can be adjusted efficiently by the structural maximum a posteriori (SMAP) method. The approach is evaluated on speech databases of five new speakers. The experimental results show that the adapted SR-HPM of a new speaker can cover the whole SR range (0.15-0.3 seconds/syllable). Both objective and subjective evaluations show that the proposed method outperforms the maximum likelihood-based method in the observed SR range of the target speaker's data and is much better in the unseen SR ranges. Finally, a personalized TTS system was implemented that uses the adapted model to synthesize the new speaker's speech at any SR, with good results in objective and subjective evaluations. |
URI: | http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT079413813 http://hdl.handle.net/11536/139980 |
Appears in Collections: | Thesis |