Hierarchical prosody modeling for Mandarin spontaneous speech

doi:10.1121/1.5099263

Full metadata record

DC Field	Value	Language
dc.contributor.author	Lin, Cheng-Hsien	en_US
dc.contributor.author	You, Chung-Long	en_US
dc.contributor.author	Chiang, Chen-Yu	en_US
dc.contributor.author	Wang, Yih-Ru	en_US
dc.contributor.author	Chen, Sin-Horng	en_US
dc.date.accessioned	2019-06-03T01:08:34Z	-
dc.date.available	2019-06-03T01:08:34Z	-
dc.date.issued	2019-04-01	en_US
dc.identifier.issn	0001-4966	en_US
dc.identifier.uri	http://dx.doi.org/10.1121/1.5099263	en_US
dc.identifier.uri	http://hdl.handle.net/11536/151945	-
dc.description.abstract	In this paper, a hierarchical prosody model (HPM)-based method for Mandarin spontaneous speech is proposed. First, an HPM is designed for describing relations among acoustic features of utterances, linguistic features of texts, and prosodic tags representing the underlying hierarchical prosodic structures of utterances. Subsequently, a sequential optimization algorithm is employed to train the HPM based on a large conversational speech corpus, the Mandarin Conversational Dialogue Corpus (MCDC), which features orthographic transcriptions and prosodic event annotations. In this unsupervised training method, all utterances of the MCDC are labeled with two types of prosodic tags, namely, break and prosodic states, automatically and simultaneously. After training, the HPM parameters are examined to identify critical prosodic properties of Mandarin spontaneous speech, which are then compared with their counterparts in the read-speech HPM. The prosodic tags on the studied utterances enable mapping of various prosodic events onto the hierarchical prosodic structures of the utterances. Prosodic analyses of some disfluent events are conducted using the prosodic tags affixed to the MCDC. Finally, an application of the HPM to assist in Mandarin spontaneous-speech recognition is discussed. Significant relative error rate reductions of 9.0%, 9.2%, 15.6%, and 7.3% are obtained for base-syllable, character, tone, and word recognition, respectively. © 2019 Acoustical Society of America.	en_US
dc.language.iso	en_US	en_US
dc.title	Hierarchical prosody modeling for Mandarin spontaneous speech	en_US
dc.type	Article	en_US
dc.identifier.doi	10.1121/1.5099263	en_US
dc.identifier.journal	JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA	en_US
dc.citation.volume	145	en_US
dc.citation.issue	4	en_US
dc.citation.spage	2576	en_US
dc.citation.epage	2596	en_US
dc.contributor.department	電機工程學系	zh_TW
dc.contributor.department	Department of Electrical and Computer Engineering	en_US
dc.identifier.wosnumber	WOS:000466779100066	en_US
dc.citation.woscount	0	en_US
Appears in Collections:	Articles