RNN-based prosodic modeling for Mandarin speech and its application to speech-to-text conversion

doi:10.1016/S0167-6393(01)00006-1

Full metadata record

DC Field	Value	Language
dc.contributor.author	Wang, WJ	en_US
dc.contributor.author	Liao, YF	en_US
dc.contributor.author	Chen, SH	en_US
dc.date.accessioned	2014-12-08T15:42:40Z	-
dc.date.available	2014-12-08T15:42:40Z	-
dc.date.issued	2002-03-01	en_US
dc.identifier.issn	0167-6393	en_US
dc.identifier.uri	http://dx.doi.org/10.1016/S0167-6393(01)00006-1	en_US
dc.identifier.uri	http://hdl.handle.net/11536/28961	-
dc.description.abstract	In this paper, a recurrent neural network (RNN) based prosodic modeling method for Mandarin speech-to-text conversion is proposed. The prosodic modeling is performed in the post-processing stage of acoustic decoding and aims at detecting word-boundary cues to assist in linguistic decoding. It employs a simple three-layer RNN to learn the relationship between input prosodic features, extracted from the input utterance with syllable boundaries pre-determined by the preceding acoustic decoder, and output word-boundary information of the associated text. After the RNN prosodic model is properly trained, it can be used to generate word-boundary cues that help the linguistic decoder solve the problem of word-boundary ambiguity. Two schemes for using these word-boundary cues are proposed. Scheme 1 modifies the baseline scheme of the conventional linguistic decoding search by directly taking the RNN outputs as additional scores and adding them to all word-sequence hypotheses to assist in selecting the best recognized word sequence. Scheme 2 extends Scheme 1 by further using the RNN outputs to drive a finite state machine (FSM) that sets path constraints to restrict the linguistic decoding search. Character accuracy rates of 73.6%, 74.6% and 74.7% were obtained for the systems using the baseline scheme, Scheme 1 and Scheme 2, respectively. In addition, a 17% reduction in the computational complexity of the linguistic decoding search was obtained for Scheme 2. The proposed prosodic modeling method is therefore promising for Mandarin speech recognition. © 2002 Elsevier Science B.V. All rights reserved.	en_US
dc.language.iso	en_US	en_US
dc.subject	recurrent neural network	en_US
dc.subject	prosodic modeling	en_US
dc.subject	speech-to-text conversion	en_US
dc.subject	acoustic decoding	en_US
dc.subject	linguistic decoding	en_US
dc.title	RNN-based prosodic modeling for Mandarin speech and its application to speech-to-text conversion	en_US
dc.type	Article	en_US
dc.identifier.doi	10.1016/S0167-6393(01)00006-1	en_US
dc.identifier.journal	SPEECH COMMUNICATION	en_US
dc.citation.volume	36	en_US
dc.citation.issue	3-4	en_US
dc.citation.spage	247	en_US
dc.citation.epage	265	en_US
dc.contributor.department	電信工程研究所	zh_TW
dc.contributor.department	Institute of Communications Engineering	en_US
dc.identifier.wosnumber	WOS:000173774700005	-
dc.citation.woscount	20	-
Appears in Collections:	Journal Articles

Files in This Item:

000173774700005.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.