Modeling of Speaking Rate Influences on Mandarin Speech Prosody and Its Application to Speaking Rate-controlled TTS

doi:10.1109/TASLP.2014.2321482

Full metadata record

DC Field	Value	Language
dc.contributor.author	Chen, Sin-Horng	en_US
dc.contributor.author	Hsieh, Chiao-Hua	en_US
dc.contributor.author	Chiang, Chen-Yu	en_US
dc.contributor.author	Hsiao, Hsi-Chun	en_US
dc.contributor.author	Wang, Yih-Ru	en_US
dc.contributor.author	Liao, Yuan-Fu	en_US
dc.contributor.author	Yu, Hsiu-Min	en_US
dc.date.accessioned	2014-12-08T15:36:17Z	-
dc.date.available	2014-12-08T15:36:17Z	-
dc.date.issued	2014-07-01	en_US
dc.identifier.issn	2329-9290	en_US
dc.identifier.uri	http://dx.doi.org/10.1109/TASLP.2014.2321482	en_US
dc.identifier.uri	http://hdl.handle.net/11536/24622	-
dc.description.abstract	A new data-driven approach is proposed for building a speaking rate-dependent hierarchical prosodic model (SR-HPM) directly from a large prosody-unlabeled speech database containing utterances of various speaking rates, in order to describe the influences of speaking rate on Mandarin speech prosody. It extends the existing HPM, which contains 12 sub-models describing various relationships among the prosodic-acoustic features of the speech signal, the linguistic features of the associated text, and the prosodic tags representing the prosodic structure of speech. Two main modifications are suggested. One is to design proper normalization functions from the statistics of the whole database to compensate for the influences of speaking rate on all prosodic-acoustic features. The other is to modify the HPM training so that its parameters are speaking-rate dependent. Experimental results on a large Mandarin read speech corpus showed that the parameters of the SR-HPM, together with these feature normalization functions, interpreted the effects of speaking rate on Mandarin speech prosody very well. An application of the SR-HPM to the design and implementation of a speaking rate-controlled Mandarin TTS system is demonstrated. The system can generate natural synthetic speech for any given speaking rate in a wide range of 3.4-6.8 syllables/sec. Two subjective tests, an MOS test and a preference test, were conducted to compare the proposed system with the popular HTS system. The MOS scores of the proposed system were in the range of 3.58-3.83 for eight different speaking rates, while those of HTS were in the range of 3.09-3.43. In addition, the proposed system had higher preference scores (49.8%-79.6%) than HTS (9.8%-30.7%). This confirmed the effectiveness of the speaking rate control method of the proposed TTS system.	en_US
dc.language.iso	en_US	en_US
dc.subject	Mandarin prosody modeling	en_US
dc.subject	speaking rate modeling	en_US
dc.subject	speaking rate-controlled TTS	en_US
dc.title	Modeling of Speaking Rate Influences on Mandarin Speech Prosody and Its Application to Speaking Rate-controlled TTS	en_US
dc.type	Article	en_US
dc.identifier.doi	10.1109/TASLP.2014.2321482	en_US
dc.identifier.journal	IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING	en_US
dc.citation.volume	22	en_US
dc.citation.issue	7	en_US
dc.citation.spage	1158	en_US
dc.citation.epage	1171	en_US
dc.contributor.department	電機工程學系	zh_TW
dc.contributor.department	Department of Electrical and Computer Engineering	en_US
dc.identifier.wosnumber	WOS:000338122000005	-
dc.citation.woscount	0	-
Appears in Collections:	Journal Articles

Files in This Item:

000338122000005.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.