A New Prosody-Assisted Mandarin ASR System

Full metadata record

DC Field	Value	Language
dc.contributor.author	Chen, Sin-Horng	en_US
dc.contributor.author	Yang, Jyh-Her	en_US
dc.contributor.author	Chiang, Chen-Yu	en_US
dc.contributor.author	Liu, Ming-Chieh	en_US
dc.contributor.author	Wang, Yih-Ru	en_US
dc.date.accessioned	2014-12-08T15:22:32Z	-
dc.date.available	2014-12-08T15:22:32Z	-
dc.date.issued	2012-08-01	en_US
dc.identifier.issn	1558-7916	en_US
dc.identifier.uri	http://hdl.handle.net/11536/15935	-
dc.description.abstract	This paper presents a new prosody-assisted automatic speech recognition (ASR) system for Mandarin speech. It differs from the conventional approach of using simple prosodic cues in that it employs a sophisticated prosody modeling approach based on a four-layer prosody-hierarchy structure, automatically generating 12 prosodic models from a large unlabeled speech database with the previously proposed joint prosody labeling and modeling (PLM) algorithm. These 12 prosodic models are incorporated into a two-stage ASR system to rescore the word lattice generated in the first stage by a conventional hidden Markov model (HMM) recognizer, which yields a better recognized word string. In addition, other information can be decoded, including part of speech (POS), punctuation marks (PM), and two types of prosodic tags that can be used to construct the prosody-hierarchy structure of the test speech. Experimental results on the TCC300 database, which consists of long paragraphic utterances, showed that the proposed system significantly outperformed the baseline scheme, an HMM recognizer with a factored language model that models word, POS, and PM. The system achieved word, character, and base-syllable error rates of 20.7%, 14.4%, and 9.6%, respectively. These correspond to absolute error reductions of 3.7%, 3.7%, and 2.4% (15.2%, 20.4%, and 20% relative). An error analysis showed that many word segmentation errors and tone recognition errors were corrected.	en_US
dc.language.iso	en_US	en_US
dc.subject	Prosody modeling	en_US
dc.subject	prosody-assisted automatic speech recognition (ASR)	en_US
dc.subject	prosody-hierarchy structure	en_US
dc.title	A New Prosody-Assisted Mandarin ASR System	en_US
dc.type	Article	en_US
dc.identifier.journal	IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING	en_US
dc.citation.volume	20	en_US
dc.citation.issue	6	en_US
dc.citation.epage	1669	en_US
dc.contributor.department	電機工程學系	zh_TW
dc.contributor.department	Department of Electrical and Computer Engineering	en_US
dc.identifier.wosnumber	WOS:000302532000001	-
dc.citation.woscount	2	-
Appears in Collections:	Articles
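
The absolute and relative error reductions quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the baseline error rates are not stated in the abstract and are inferred here by adding each absolute reduction back to the reported error rate, and the function and variable names are hypothetical.

```python
# Cross-check of the error reductions quoted in the abstract.
# Assumption: baseline error rate = reported error rate + absolute reduction
# (the abstract does not state the baseline rates explicitly).

def baseline_and_relative(error_pct, absolute_reduction_pct):
    """Return (inferred baseline error rate, relative reduction), both in percent."""
    baseline = error_pct + absolute_reduction_pct
    relative = 100.0 * absolute_reduction_pct / baseline
    return baseline, relative

# (reported error rate, absolute reduction) in percent, taken from the abstract.
reported = {
    "word": (20.7, 3.7),
    "character": (14.4, 3.7),
    "base-syllable": (9.6, 2.4),
}

results = {
    unit: baseline_and_relative(error, reduction)
    for unit, (error, reduction) in reported.items()
}

for unit, (baseline, relative) in results.items():
    print(f"{unit}: inferred baseline {baseline:.1f}%, relative reduction {relative:.1f}%")
```

Under this assumption the inferred baselines are 24.4%, 18.1%, and 12.0%, and the relative reductions round to the 15.2%, 20.4%, and 20% given in the abstract.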

Files in This Item:

000302532000001.pdf
