Title: A Mandarin Text-to-Speech System based on Neural Networks and Fuzzy Logic
Authors: Shean-Yih Lin (林显易); Chin-Teng Lin (林进灯)
Institute of Electrical and Control Engineering
Keywords: Mandarin Text-to-Speech system; Neural networks; Fuzzy logic; Easy-Talk
Publication date: 1998
Abstract:
This thesis investigates a new technique that uses a recurrent neural network (RNN) enhanced by a fuzzy-rule inference engine as the
prosodic model of a text-to-speech (TTS) system. With this enhanced RNN-based prosodic model, the TTS system generates more natural
synthetic speech than with the traditional rule-based model. The proposed method lets our TTS system handle tone sandhi and other
prosodic phenomena that have kept traditional TTS systems from generating fluent speech; in other words, it provides a complete
solution to the problem of synthesizing prosodic information. The enhanced RNN-based prosodic model employs a five-layer network and
a fuzzy inference engine with twenty-five fuzzy rules to generate the prosodic information: pitch means, pitch shapes, maximum energy
levels, syllable durations, and pause durations. Experimental results show that the model can automatically learn the
prosodic-phonological rules of human speakers. To verify the performance of this prosodic model, we implemented a Mandarin TTS
system based on the time-domain pitch-synchronous overlap-add (TD-PSOLA) method, using a Mandarin monosyllable database and the
enhanced RNN-based prosodic model. Subjective listening tests by native-speaker laboratory members show that its synthetic speech is
more natural than speech synthesized by the traditional rule-based method.
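The abstract states only that the fuzzy inference engine uses twenty-five rules; it does not give the inputs, the membership functions, or how the engine's output is combined with the RNN. One common way to arrive at exactly 25 rules is two inputs with five linguistic terms each, so the minimal sketch below assumes that layout, triangular membership functions on a normalized range, and zero-order Sugeno (weighted-average) defuzzification. The names `CENTERS`, `RULES`, `memberships`, and `infer`, and the rule table itself, are hypothetical illustrations, not the thesis's design.

```python
"""Hypothetical sketch of a 25-rule fuzzy inference engine (2 inputs x 5 terms).

Assumptions (not from the thesis): both inputs are normalized to [-1, 1],
each is described by five triangular membership functions, and the output
is a zero-order Sugeno weighted average of crisp rule consequents.
"""

# Centers of the five linguistic terms: NB, NS, ZE, PS, PB.
CENTERS = [-1.0, -0.5, 0.0, 0.5, 1.0]
WIDTH = 0.5  # half-width of each triangle; neighbours cross at degree 0.5


def triangular(x: float, center: float, width: float = WIDTH) -> float:
    """Triangular membership degree of x for the term centered at `center`."""
    return max(0.0, 1.0 - abs(x - center) / width)


def memberships(x: float) -> list[float]:
    """Degrees of x in all five terms; out-of-range x saturates at the ends."""
    x = min(1.0, max(-1.0, x))
    return [triangular(x, c) for c in CENTERS]


# Hypothetical 5x5 rule table: RULES[i][j] is the crisp consequent of
# "IF input1 is term i AND input2 is term j THEN output is RULES[i][j]".
RULES = [[(CENTERS[i] + CENTERS[j]) / 2 for j in range(5)] for i in range(5)]


def infer(x1: float, x2: float) -> float:
    """Fire all 25 rules (product AND) and return the weighted average."""
    mu1, mu2 = memberships(x1), memberships(x2)
    num = den = 0.0
    for i in range(5):
        for j in range(5):
            w = mu1[i] * mu2[j]  # firing strength of rule (i, j)
            num += w * RULES[i][j]
            den += w
    return num / den if den > 0 else 0.0
```

With this illustrative table each consequent is the mean of its two term centers, so the engine reduces to averaging its inputs (for example, `infer(0.3, -0.1)` is approximately 0.1), which makes it easy to sanity-check; a trained or hand-tuned table would replace `RULES` in practice.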
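The record names TD-PSOLA as the waveform-generation method but gives no implementation details. The minimal sketch below shows only the core pitch-modification step, under stated assumptions: analysis pitch marks are already known, the segment is fully voiced, the target pitch period is constant, and each synthesis mark reuses the grain from the nearest analysis mark so that the overall duration stays roughly unchanged. The function names `hann` and `td_psola` are hypothetical; a full system would follow a time-varying pitch contour, scale durations, and treat unvoiced segments separately.

```python
"""Minimal TD-PSOLA pitch-modification sketch in pure Python.

Assumptions: strictly increasing analysis pitch marks are given, the signal
is voiced throughout, and the target pitch period is a constant integer.
No amplitude normalization is applied after overlap-add.
"""
import math


def hann(n: int) -> list[float]:
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def td_psola(x: list[float], marks: list[int], new_period: int) -> list[float]:
    """Resynthesize x with pitch period `new_period` (in samples).

    x: input samples
    marks: sorted analysis pitch-mark indices into x (at least two)
    new_period: positive integer target pitch period in samples
    """
    y = [0.0] * len(x)
    t = marks[0]  # first synthesis mark
    while t < marks[-1]:
        # Nearest analysis mark supplies the grain for this synthesis mark.
        k = min(range(len(marks)), key=lambda i: abs(marks[i] - t))
        # Local analysis period, measured to a neighbouring mark.
        if k + 1 < len(marks):
            period = marks[k + 1] - marks[k]
        else:
            period = marks[k] - marks[k - 1]
        win = hann(2 * period + 1)
        # Two-period windowed grain centred on marks[k], overlap-added at t.
        for n, w in enumerate(win):
            src = marks[k] - period + n
            dst = t - period + n
            if 0 <= src < len(x) and 0 <= dst < len(y):
                y[dst] += w * x[src]
        t += new_period
    return y
```

When `new_period` equals the analysis period, the Hann-windowed grains overlap with weights summing to one, so the interior of the signal is reproduced unchanged; a smaller `new_period` packs the grains more tightly and raises the pitch.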
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT870591011
http://hdl.handle.net/11536/64938
Appears in collections: Thesis