Title: | A Mandarin Text-to-Speech System Based on Neural Networks and Fuzzy Logic (一套基于类神经网路与模糊逻辑之中文语音合成系统) |
Authors: | Shean-Yih Lin (林显易); Chin-Teng Lin (林进灯); Institute of Electrical and Control Engineering |
Keywords: | Mandarin Text-to-Speech system; Neural networks; Fuzzy logic; Easy-Talk (说易通) |
Issue Date: | 1998 |
Abstract: | This thesis investigates a new technique that uses a recurrent neural network enhanced by a fuzzy-rule inference engine as the prosodic model of a text-to-speech (TTS) system. With the enhanced recurrent-neural-network (RNN)-based prosodic model, the TTS system generates more natural synthetic speech than with a traditional rule-based model. The proposed method handles tone sandhi and other prosodic phenomena that kept traditional TTS systems from generating fluent speech. In other words, the proposed method provides a complete solution to the problem of synthesizing prosodic information. The enhanced RNN-based prosodic model employs a five-layer network and a fuzzy inference engine with twenty-five fuzzy rules to generate the prosodic information, including pitch means, pitch shapes, maximum energy levels, syllable duration and pause duration. Experimental results show that the enhanced RNN-based prosodic model can automatically learn the prosodic rules of human speech. To verify the performance of this prosodic model, we implemented a Mandarin TTS system based on the time-domain pitch-synchronous overlap-add (TD-PSOLA) method, with a Mandarin monosyllable database and the enhanced RNN-based prosodic model. Subjective tests by native laboratory members show that the synthetic speech is more natural than speech synthesized by the traditional rule-based method. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT870591011 http://hdl.handle.net/11536/64938 |
Appears in Collections: | Thesis |