Full metadata record
DC Field | Value | Language
dc.contributor.author | Chen, SH | en_US
dc.contributor.author | Hwang, SH | en_US
dc.contributor.author | Wang, YR | en_US
dc.date.accessioned | 2014-12-08T15:49:06Z | -
dc.date.available | 2014-12-08T15:49:06Z | -
dc.date.issued | 1998-05-01 | en_US
dc.identifier.issn | 1063-6676 | en_US
dc.identifier.uri | http://dx.doi.org/10.1109/89.668817 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/32634 | -
dc.description.abstract | A new RNN-based prosodic information synthesizer for Mandarin Chinese text-to-speech (TTS) is proposed in this paper. Its four-layer recurrent neural network (RNN) generates prosodic information such as syllable pitch contours, syllable energy levels, syllable initial and final durations, as well as inter-syllable pause durations. The input layer and first hidden layer operate with a word-synchronized clock to represent current-word phonologic states within the prosodic structure of text to be synthesized. The second hidden layer and output layer operate on a syllable-synchronized clock and use outputs from the preceding layers, along with additional syllable-level inputs fed directly to the second hidden layer, to generate desired prosodic parameters. The RNN was trained on a large set of actual utterances accompanied by associated texts, and can automatically learn many human-prosody phonologic rules, including the well-known Sandhi Tone 3 F0-change rule. Experimental results show that all synthesized prosodic parameter sequences matched quite well with their original counterparts, and a pitch-synchronous-overlap-add-based (PSOLA-based) Mandarin TTS system was also used for testing of our approach. While subjective tests are difficult to perform and remain to be done in the future, we have carried out informal listening tests by a significant number of native Chinese speakers and the results confirmed that all synthesized speech sounded quite natural. | en_US
dc.language.iso | en_US | en_US
dc.subject | Mandarin | en_US
dc.subject | pitch contour | en_US
dc.subject | prosodic information synthesizer | en_US
dc.subject | recurrent neural network | en_US
dc.subject | text-to-speech | en_US
dc.title | An RNN-based prosodic information synthesizer for Mandarin text-to-speech | en_US
dc.type | Article | en_US
dc.identifier.doi | 10.1109/89.668817 | en_US
dc.identifier.journal | IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | en_US
dc.citation.volume | 6 | en_US
dc.citation.issue | 3 | en_US
dc.citation.spage | 226 | en_US
dc.citation.epage | 239 | en_US
dc.contributor.department | 電子工程學系及電子研究所 | zh_TW
dc.contributor.department | Department of Electronics Engineering and Institute of Electronics | en_US
dc.identifier.wosnumber | WOS:000073145000003 | -
dc.citation.woscount | 56 | -
Appears in Collections: Articles


Files in This Item:

  1. 000073145000003.pdf
