Speaker Adaptation of SR-HPM for Speaking Rate-Controlled Mandarin TTS

doi:10.1109/TASLP.2016.2598307

Full metadata record

DC Field	Value	Language
dc.contributor.author	Liao, I-Bin	en_US
dc.contributor.author	Chiang, Chen-Yu	en_US
dc.contributor.author	Wang, Yih-Ru	en_US
dc.contributor.author	Chen, Sin-Horng	en_US
dc.date.accessioned	2017-04-21T06:56:12Z	-
dc.date.available	2017-04-21T06:56:12Z	-
dc.date.issued	2016-11	en_US
dc.identifier.issn	2329-9290	en_US
dc.identifier.uri	http://dx.doi.org/10.1109/TASLP.2016.2598307	en_US
dc.identifier.uri	http://hdl.handle.net/11536/134050	-
dc.description.abstract	In this paper, a structural maximum a posteriori (SMAP) speaker adaptation approach is discussed for adjusting the speaking rate (SR)-dependent hierarchical prosodic model (SR-HPM) of an existing SR-controlled Mandarin text-to-speech system to a new speaker's data in order to produce a new voice. Two main issues are addressed. The first is the small SR coverage of the adaptation data. It is solved by using the existing SR-HPM, which was trained on a speech corpus with wide SR coverage, as an informative prior. The second is the data sparseness problem caused by the large number of SR-HPM parameters to be adjusted. It is solved by hierarchically organizing the SR-HPM parameters into decision trees so that they can be adjusted efficiently by the SMAP method. The effectiveness of the proposed approach is evaluated on speech databases of five new speakers. Both objective and subjective evaluations show that the proposed method not only performs better than the maximum likelihood-based method within the observed SR range of the target speaker's data, but is also much better in the unseen SR ranges.	en_US
dc.language.iso	en_US	en_US
dc.subject	Hierarchical prosodic model	en_US
dc.subject	speaker adaptation	en_US
dc.subject	structural maximum a posteriori	en_US
dc.subject	speaking rate-controlled text-to-speech	en_US
dc.subject	speaking rate coverage	en_US
dc.title	Speaker Adaptation of SR-HPM for Speaking Rate-Controlled Mandarin TTS	en_US
dc.identifier.doi	10.1109/TASLP.2016.2598307	en_US
dc.identifier.journal	IEEE/ACM Transactions on Audio, Speech, and Language Processing	en_US
dc.citation.volume	24	en_US
dc.citation.issue	11	en_US
dc.citation.spage	2046	en_US
dc.citation.epage	2058	en_US
dc.contributor.department	Published under the name of National Chiao Tung University	en_US
dc.contributor.department	National Chiao Tung University	en_US
dc.identifier.wosnumber	WOS:000382677800014	en_US
Appears in Collections:	Articles