A SPEAKING RATE-CONTROLLED MANDARIN TTS SYSTEM

Full metadata record

DC Field	Value	Language
dc.contributor.author	Hsieh, Chiao-Hua	en_US
dc.contributor.author	Wang, Yih-Ru	en_US
dc.contributor.author	Chiang, Chen-Yu	en_US
dc.contributor.author	Chen, Sin-Horng	en_US
dc.date.accessioned	2014-12-08T15:34:23Z	-
dc.date.available	2014-12-08T15:34:23Z	-
dc.date.issued	2013	en_US
dc.identifier.isbn	978-1-4799-0356-6	en_US
dc.identifier.issn	1520-6149	en_US
dc.identifier.uri	http://hdl.handle.net/11536/23536	-
dc.description.abstract	In this paper, a new speaking rate-controlled Mandarin TTS system based on a speaking rate-dependent hierarchical prosodic model (SR-HPM) [6] is proposed. In the training phase, a data-driven approach is employed to automatically build the SR-HPM directly from a large prosody-unlabeled speech database containing utterances of various speaking rates. The SR-HPM comprises 15 sub-models designed to describe various relationships among 3 types of prosodic-acoustic features of speech utterances, two types of prosodic tags specifying a 4-layer prosody hierarchy, linguistic features of various levels of the associated texts, and the speaking rates. In the test phase, the SR-HPM is employed to generate 4 prosodic-acoustic features, including syllable pitch contours, syllable durations, syllable energy levels, and syllable juncture pause durations. Combining these prosodic features with the spectral features generated by the HTS synthesizer, the system can generate natural speech for any speaking rate in the wide range of 0.15-0.3 seconds/syllable. A distinct feature of the system, its ability to control the occurrence frequencies of breaks of various types as well as their pause durations according to the given speaking rate, was demonstrated. A subjective test showed that MOS scores of 3.35, 3.44 and 3.28 were achieved for fast (SR = 0.17 sec/syllable), medium (SR = 0.2 sec/syllable) and slow (SR = 0.25 sec/syllable) synthetic speech, respectively.	en_US
dc.language.iso	en_US	en_US
dc.subject	Speaking rate modeling	en_US
dc.subject	Mandarin prosody modeling	en_US
dc.subject	Speaking rate-controlled TTS	en_US
dc.title	A SPEAKING RATE-CONTROLLED MANDARIN TTS SYSTEM	en_US
dc.type	Proceedings Paper	en_US
dc.identifier.journal	2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)	en_US
dc.citation.spage	6900	en_US
dc.citation.epage	6904	en_US
dc.contributor.department	電子工程學系及電子研究所	zh_TW
dc.contributor.department	Department of Electronics Engineering and Institute of Electronics	en_US
dc.identifier.wosnumber	WOS:000329611507012	-
Appears in Collections:	Conference Papers