RNN-based prosodic modeling for Mandarin speech and its application to speech-to-text conversion

doi:10.1016/S0167-6393(01)00006-1

Title:	RNN-based prosodic modeling for Mandarin speech and its application to speech-to-text conversion
Authors:	Wang, WJ; Liao, YF; Chen, SH; Institute of Communications Engineering
Keywords:	recurrent neural network; prosodic modeling; speech-to-text conversion; acoustic decoding; linguistic decoding
Issue Date:	1-Mar-2002
Abstract:	In this paper, a recurrent neural network (RNN) based prosodic modeling method for Mandarin speech-to-text conversion is proposed. The prosodic modeling is performed in the post-processing stage of acoustic decoding and aims at detecting word-boundary cues to assist in linguistic decoding. It employs a simple three-layer RNN to learn the relationship between input prosodic features, extracted from the input utterance with syllable boundaries pre-determined by the preceding acoustic decoder, and output word-boundary information of the associated text. Once the RNN prosodic model is properly trained, it can be used to generate word-boundary cues that help the linguistic decoder resolve word-boundary ambiguity. Two schemes for using these word-boundary cues are proposed. Scheme 1 modifies the baseline scheme of the conventional linguistic decoding search by directly taking the RNN outputs as additional scores and adding them to all word-sequence hypotheses to assist in selecting the best recognized word sequence. Scheme 2 extends Scheme 1 by further using the RNN outputs to drive a finite state machine (FSM) that sets path constraints to restrict the linguistic decoding search. Character accuracy rates of 73.6%, 74.6% and 74.7% were obtained for the systems using the baseline scheme, Scheme 1 and Scheme 2, respectively. In addition, Scheme 2 reduced the computational complexity of the linguistic decoding search by 17%. The proposed prosodic modeling method is therefore promising for Mandarin speech recognition. (C) 2002 Elsevier Science B.V. All rights reserved.
URI:	http://dx.doi.org/10.1016/S0167-6393(01)00006-1 http://hdl.handle.net/11536/28961
ISSN:	0167-6393
DOI:	10.1016/S0167-6393(01)00006-1
Journal:	SPEECH COMMUNICATION
Volume:	36
Issue:	3-4
Start Page:	247
End Page:	265
Appears in Collections:	Articles
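
The first scheme described in the abstract adds RNN word-boundary outputs to the scores of word-sequence hypotheses. The Python sketch below illustrates that general idea only; it is not the authors' implementation. It assumes (none of this is stated in the record) that the RNN emits one word-boundary probability per gap between adjacent syllables, that the additional score is a weighted sum of log-probabilities, and that each hypothesis is given as a base score plus a list of word lengths in syllables. All function names and the weight parameter are hypothetical.

```python
import math

def boundary_pattern(word_lengths):
    """Map word lengths (in syllables) to one flag per inter-syllable gap:
    1 if a word boundary falls in that gap, 0 if the gap is word-internal."""
    flags = []
    for i, n in enumerate(word_lengths):
        flags.extend([0] * (n - 1))          # gaps inside the word
        if i < len(word_lengths) - 1:
            flags.append(1)                  # gap between this word and the next
    return flags

def rescore(base_score, word_lengths, boundary_probs, weight=1.0):
    """Add a prosodic term to a hypothesis score.

    Assumption: the prosodic term is the weighted log-likelihood of the
    hypothesis's boundary pattern under the RNN outputs."""
    flags = boundary_pattern(word_lengths)
    if len(flags) != len(boundary_probs):
        raise ValueError("hypothesis and RNN outputs cover different syllable counts")
    prosodic = 0.0
    for flag, p in zip(flags, boundary_probs):
        p = min(max(p, 1e-6), 1.0 - 1e-6)    # keep log() finite
        prosodic += math.log(p if flag else 1.0 - p)
    return base_score + weight * prosodic

def best_hypothesis(hypotheses, boundary_probs, weight=1.0):
    """Pick the (base_score, word_lengths) pair with the highest combined score."""
    return max(hypotheses, key=lambda h: rescore(h[0], h[1], boundary_probs, weight))

# Two competing segmentations of a four-syllable utterance (made-up numbers).
hypotheses = [(-10.0, [2, 2]), (-10.2, [1, 3])]
boundary_probs = [0.9, 0.1, 0.1]             # hypothetical RNN outputs per gap
chosen = best_hypothesis(hypotheses, boundary_probs)
```

With a weight of 0 the selection falls back to the base scores alone, which corresponds to a baseline search without prosodic cues under these assumptions.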

Files in This Item:

000173774700005.pdf