Title: An Implementation and Analysis of Mandarin Speech Synthesis Technologies (中文語音合成技術之實作與分析)
Authors: Hong-Mao Lu (魯弘茂)
Dr. Sin-Horng Chen (陳信宏)
Institute of Communication Engineering (電信工程研究所)
Keywords: speech synthesis (語音合成); text-to-speech system (文句翻語音系統); pitch-synchronous overlap-add (基頻同步疊加); recurrent neural network (遞迴式類神經網路); TTS; Text To Speech; PSOLA; RNN
Issue Date: 2001
Abstract: This thesis presents several improvements to the Mandarin text-to-speech (TTS) system previously developed at the Department of Communication Engineering, National Chiao Tung University. First, the acoustic inventory of 411 base-syllable waveforms contained overly long waveforms that degraded the synthesized speech. We address this by extracting waveforms of moderate duration, free of coarticulation effects, from continuous speech, and by shortening the nasal parts of overly long waveforms; experimental results showed that this greatly improved the quality of the synthesized speech. Second, we implement and compare TD-PSOLA with three energy-compensation schemes (simple overlap-add, least-mean-square overlap-add, and a simplified simple overlap-add) as well as LP-PSOLA, and adopt TD-PSOLA with the simplified simple overlap-add scheme. Third, we propose a neural-network-based method for generating syllable energy contours: syllables are divided into four classes according to their initials, and a separate MLP is used for each class. With properly chosen linguistic input features, very good energy contours are obtained. Finally, we refine the previously proposed RNN-MLP method for generating the prosodic information needed to pronounce, letter by letter, English proper nouns embedded in Chinese text. Experimental results showed better letter durations and better pause durations before and after English words.
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT900435041
http://hdl.handle.net/11536/68916
Appears in Collections: Thesis