Full metadata record
DC Field	Value	Language
dc.contributor.author	Chen, Sin-Horng	en_US
dc.contributor.author	Hsieh, Chiao-Hua	en_US
dc.contributor.author	Chiang, Chen-Yu	en_US
dc.contributor.author	Hsiao, Hsi-Chun	en_US
dc.contributor.author	Wang, Yih-Ru	en_US
dc.contributor.author	Liao, Yuan-Fu	en_US
dc.contributor.author	Yu, Hsiu-Min	en_US
dc.date.accessioned	2014-12-08T15:36:17Z	-
dc.date.available	2014-12-08T15:36:17Z	-
dc.date.issued	2014-07-01	en_US
dc.identifier.issn	2329-9290	en_US
dc.identifier.uri	http://dx.doi.org/10.1109/TASLP.2014.2321482	en_US
dc.identifier.uri	http://hdl.handle.net/11536/24622	-
dc.description.abstract	A new data-driven approach is proposed for building a speaking rate-dependent hierarchical prosodic model (SR-HPM) directly from a large prosody-unlabeled speech database containing utterances at various speaking rates, in order to describe the influences of speaking rate on Mandarin speech prosody. The SR-HPM extends the existing HPM, which contains 12 sub-models describing various relationships among the prosodic-acoustic features of the speech signal, the linguistic features of the associated text, and the prosodic tags representing the prosodic structure of speech. Two main modifications are made. The first is designing proper normalization functions, derived from statistics of the whole database, to compensate for the influences of speaking rate on all prosodic-acoustic features. The second is modifying the HPM training so that its parameters become speaking rate-dependent. Experimental results on a large Mandarin read speech corpus showed that the parameters of the SR-HPM, together with these feature normalization functions, interpreted the effects of speaking rate on Mandarin speech prosody very well. An application of the SR-HPM to the design and implementation of a speaking rate-controlled Mandarin TTS system is demonstrated. The system can generate natural synthetic speech at any given speaking rate within a wide range of 3.4-6.8 syllables/sec. Two subjective tests, a MOS test and a preference test, were conducted to compare the proposed system with the popular HTS system. The MOS scores of the proposed system ranged from 3.58 to 3.83 for eight different speaking rates, versus 3.09 to 3.43 for HTS. In addition, the proposed system had higher preference scores (49.8%-79.6%) than HTS (9.8%-30.7%). This confirmed the effectiveness of the speaking rate control method of the proposed TTS system.	en_US
dc.language.iso	en_US	en_US
dc.subject	Mandarin prosody modeling	en_US
dc.subject	speaking rate modeling	en_US
dc.subject	speaking rate-controlled TTS	en_US
dc.title	Modeling of Speaking Rate Influences on Mandarin Speech Prosody and Its Application to Speaking Rate-controlled TTS	en_US
dc.type	Article	en_US
dc.identifier.doi	10.1109/TASLP.2014.2321482	en_US
dc.identifier.journal	IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING	en_US
dc.citation.volume	22	en_US
dc.citation.issue	7	en_US
dc.citation.spage	1158	en_US
dc.citation.epage	1171	en_US
dc.contributor.department	電機工程學系 (Department of Electrical Engineering)	zh_TW
dc.contributor.department	Department of Electrical and Computer Engineering	en_US
dc.identifier.wosnumber	WOS:000338122000005	-
dc.citation.woscount	0	-
Appears in Collections: Articles


Files in This Item:

  1. 000338122000005.pdf
