A Study on Recurrent Fuzzy Neural Prosodic Model in a Mandarin Text-to-Speech System

Title:	中文文句翻語音系統之遞迴式模糊類神經韻律模型研究 (A Study on Recurrent Fuzzy Neural Prosodic Model in a Mandarin Text-to-Speech System)
Authors:	蔡正雄 Tsai Cheng-Hsiung; 林進燈 Lin Chin-Teng; Institute of Electrical and Control Engineering (電控工程研究所)
Keywords:	Text-to-Speech; prosodic information generation; Recurrent Fuzzy Neural Network
Issue Date:	1999
Abstract:	This thesis investigates a new approach to Mandarin text-to-speech (TTS), focusing on prosodic information generation. To simulate the logic of human speech, fuzzy rule construction is applied to the prosodic model. The proposed Recurrent Fuzzy Neural Network (RFNN) combines a Self-cOnstructing Neural Fuzzy Inference Network (SONFIN) with a multilayer recurrent neural network (RNN) and can be functionally partitioned into two parts. The first part adopts the SONFIN as the prosodic model and uses fuzzy inference rules to explore the relationship between high-level linguistic features and prosodic information. The second part is a five-layer recurrent neural network that takes the prosodic fuzzy rules produced by the first part, together with other important syllable features fed in directly, and, after training on a large speech corpus, generates the required prosodic parameters. Applied in a TTS system, the method not only handles tone sandhi and other prosodic phenomena for which traditional methods fail to produce natural prosody, but also yields rules about prosodic phrase structure. The model therefore generates appropriate prosodic parameters, including pitch mean, pitch contour shape, syllable duration, pause duration, and maximum energy level, from which natural, fluent speech can be synthesized. To verify the performance of the prosodic model, we used it to improve a previously developed TTS system based on the time-domain pitch-synchronous overlap-add (TD-PSOLA) method with a Mandarin monosyllable speech database. The resulting Mandarin TTS system is called 「說文解字」. In listening tests, the synthesized speech sounded more natural than that of the previous version.
Table of Contents:
English Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
  1.1 Applications
  1.2 Background and Development
  1.3 Problems Encountered in Mandarin Text-to-Speech
  1.4 Approach Used in This Thesis
  1.5 Chapter Outline
Chapter 2 System Overview
  2.1 System Architecture
  2.2 Text Analysis
    2.2.1 Text Preprocessing
    2.2.2 Natural Language Parsing
  2.3 Selection of Synthesis Units
  2.4 Prosody Generation Mechanism
  2.5 Speech Synthesis
Chapter 3 Text Analysis Module
  3.1 Databases
    3.1.1 Lexicon
    3.1.2 Corpus
  3.2 Markov Language Model
    3.2.1 Model Assumptions
    3.2.2 Part-of-Speech-Based Bigram Language Model
  3.3 Word Segmentation
    3.3.1 Segmentation Principles
    3.3.2 Building the Multi-Level Word Lattice
    3.3.3 Path Search by Dynamic Programming
  3.4 Word Formation
  3.5 Handling of Polyphonic Characters
    3.5.1 Tone Sandhi Rules
    3.5.2 Method for Handling Polyphonic Characters
  3.6 Test Method
  3.7 Experimental Results
Chapter 4 Prosodic Information Generation Module
  4.1 Concepts of Prosodic Information
  4.2 Fuzzy-Inference Prosodic Rule Model
  4.3 Self-Constructing Neural Fuzzy Inference Network
    4.3.1 SONFIN Architecture
    4.3.2 SONFIN Learning Rules
  4.4 Multilayer Recurrent Neural Network
    4.4.1 Multilayer Recurrent Neural Network Architecture
    4.4.2 Multilayer Recurrent Neural Network Learning Rules
  4.5 Input Parameters
  4.6 Output Parameters
  4.7 Analysis of Results
  4.8 Verification of Prosodic Rules
    4.8.1 Tone Concatenation Rules
    4.8.2 Fuzzy-Inference Sentence Rules
Chapter 5 Speech Synthesis Module
  5.1 TD-PSOLA Model
  5.2 TD-PSOLA Synthesis Method
    5.2.1 Pitch-Synchronous Analysis
    5.2.2 Pitch-Synchronous Modification
    5.2.3 Pitch-Synchronous Overlap-Add
    5.2.4 Pitch and Duration Modification
  5.3 Synthesis of Syllable Signals
    5.3.1 Syllable Waveform Sample Database
    5.3.2 Durations of the Unvoiced and Voiced Segments
    5.3.3 Pitch Contour of the Voiced Segment
  5.4 Synthesized Speech Tests
Chapter 6 System Implementation
  6.1 System Requirements
  6.2 System Architecture
  6.3 User Interface
  6.4 Functions
  6.5 System Installation and Use
Chapter 7 Conclusions and Future Work
References
Appendix 1 The 411 Mandarin Monosyllables
Appendix 2 Ordered Partial Derivatives...
Appendix 3 Epochwise Backsweep Learning Rule
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT880591013 http://hdl.handle.net/11536/66243
Appears in Collections:	Thesis