Title: 一套基於類神經網路與模糊邏輯之中文語音合成系統
A Mandarin Text-to-Speech System based on Neural Networks and Fuzzy Logic
Authors: 林顯易
Shean-Yih Lin
林進燈
Chin-Teng Lin
Institute of Electrical and Control Engineering
Keywords: Mandarin Text-to-Speech system; Neural networks; Fuzzy logic; Easy-Talk
Issue Date: 1998
Abstract: This thesis proposes a new technique that enhances a recurrent neural network (RNN) with a fuzzy-rule inference engine and applies it to the prosodic model of a text-to-speech (TTS) system. With this enhanced RNN-based prosodic model, the TTS system generates more natural synthetic speech than with a traditional rule-based model. The method handles tone sandhi and other prosodic phenomena that have kept traditional TTS systems from generating fluent speech; in other words, it offers a complete solution to the problem of synthesizing prosodic information. The enhanced RNN-based prosodic model employs a five-layer network and a fuzzy inference engine with twenty-five fuzzy rules to generate the prosodic information, including pitch means, pitch shapes, maximum energy levels, syllable duration, and pause duration. Experimental results show that the model can automatically learn the prosodic rules of human speech. To verify its performance, we implemented a Mandarin TTS system based on the time-domain pitch-synchronous overlap-add (TD-PSOLA) method, using a Mandarin monosyllable database and the enhanced RNN-based prosodic model. Subjective tests by laboratory members who are native speakers show that the synthetic speech is more natural than speech synthesized by the traditional rule-based method.
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT870591011
http://hdl.handle.net/11536/64938
Appears in Collections: Thesis