Title: 台語文句翻語音系統之製作
An Implementation of Taiwanese Text-to-Speech System
Author: 楊鈺清
Yu-Ching Yang
陳信宏
Dr. Sin-Horng Chen
Institute of Communication Engineering
Keywords: text-to-speech (TTS); Taiwanese; Mandarin; recurrent neural network (RNN); pitch-synchronous overlap-add (PSOLA)
Publication date: 1998
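Abstract: This thesis implements a Taiwanese text-to-speech system. It consists of four main components: a text analyzer, an RNN prosody generator, a database of speech waveform samples, and a PSOLA speech synthesizer. The text analyzer derives linguistic parameters from the input text, and the RNN prosody generator produces the corresponding prosodic parameters from them. The PSOLA synthesizer then uses the synthesis syllable codes to retrieve the appropriate waveform samples from the database, adjusts them according to the prosodic parameters, and outputs the synthesized speech waveform. In this work we also experimented with several different ways of synthesizing speech. First, to synthesize speech at a finer resolution, we use the individual sample, rather than the frame, as the basic unit of synthesis. Second, to bring the envelope of the synthesized speech closer to that of natural speech, we synthesize using energy contours. In addition, because differences in recording conditions or in the speaker's own state make loudness and speaking rate vary from one recorded utterance to another, we normalize the target values before training the recurrent neural network. Finally, we built a demonstration system for the Windows 95/NT platform by combining a single-document-interface (SDI) text editor with the synthesis kernel.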
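The abstract motivates target normalization by differences in loudness and speaking rate between recorded utterances. The following is a minimal sketch of that idea, assuming per-utterance z-score normalization; the abstract does not name the method, and `normalize_utterance` and `denormalize` are hypothetical helpers.

```python
from statistics import fmean, pstdev


def normalize_utterance(values):
    """Z-score one utterance's prosodic targets (e.g. syllable durations in
    samples, or syllable energies) against that utterance's own statistics.

    Assumption: z-scoring stands in for whatever normalization the thesis used.
    Returns (normalized_values, mean, std) so the scaling can be undone.
    """
    mu = fmean(values)
    sigma = pstdev(values, mu)
    if sigma == 0.0:
        sigma = 1.0  # constant utterance: centred values stay at zero
    return [(v - mu) / sigma for v in values], mu, sigma


def denormalize(normalized, mu, sigma):
    """Map normalized values back to physical units with the given statistics."""
    return [z * sigma + mu for z in normalized]


# Two readings of the same three-syllable phrase, the second spoken at half
# the speed: the raw durations differ, but the normalized targets match.
fast = [800.0, 1200.0, 1600.0]
slow = [1600.0, 2400.0, 3200.0]
z_fast, mu_fast, sd_fast = normalize_utterance(fast)
z_slow, _, _ = normalize_utterance(slow)
print([round(z, 3) for z in z_fast])  # [-1.225, 0.0, 1.225]
print([round(z, 3) for z in z_slow])  # [-1.225, 0.0, 1.225]
```

Because z-scoring removes utterance-wide offset and scale, a slow and a fast reading of the same phrase train the RNN toward the same targets. At synthesis time, the mean and spread passed to `denormalize` have to be chosen separately (for example, from a desired speaking rate). The abstract does not say how the thesis handles this.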
In this thesis, a Taiwanese TTS system is implemented. It consists of four main parts: a text analyzer, an RNN prosody generator, a waveform inventory of synthesis units, and a PSOLA synthesizer. The text analyzer first tags the input text into a word sequence. The RNN prosody generator then generates prosodic information from linguistic features extracted from the word sequence. Next, the waveform sequence corresponding to the word sequence is retrieved from the waveform inventory and prosodically adjusted to produce the output speech. The basic implementation follows the Mandarin TTS system previously developed at NCTU, with the following improvements. First, duration information is specified in samples rather than in frames. Second, the syllable energy contour is generated as part of the prosodic information instead of being taken from the static pattern of the corresponding base waveform. Third, both duration and energy features are normalized at the utterance level. Finally, a demo system running on the Windows 95/NT platform was built by combining an SDI (Single Document Interface) text editor with the synthesis kernel. Informal listening tests show that most of the synthesized speech sounds fair.
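The synthesis back end retrieves a waveform unit and adjusts it prosodically, with durations counted in samples and a generated per-syllable energy contour. The following is a minimal pure-Python sketch of that step. It assumes textbook TD-PSOLA for the duration change and assumes pitch marks are already available; the abstract describes neither pitch marking nor pitch modification. `psola_stretch`, `apply_energy_contour`, `hann`, and `local_period` are hypothetical names. The blockwise RMS gain is a simplification and not necessarily how the thesis imposes the contour.

```python
import math


def hann(half):
    """Hann window of length 2*half + 1: zero at both ends, 1 at the centre.
    Copies of the same window spaced `half` samples apart sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(math.pi * n / half) for n in range(2 * half + 1)]


def local_period(marks, k):
    """Pitch period (in samples) around analysis mark k."""
    if k + 1 < len(marks):
        return marks[k + 1] - marks[k]
    return marks[k] - marks[k - 1]


def psola_stretch(x, marks, target_len):
    """Time-scale x to exactly target_len samples with TD-PSOLA.

    x     -- one voiced waveform unit (list of floats)
    marks -- strictly increasing pitch-mark sample indices into x (assumed given)
    Pitch is preserved: each output segment keeps its source pitch period.
    """
    if len(marks) < 2 or any(b <= a for a, b in zip(marks, marks[1:])):
        raise ValueError("need at least two strictly increasing pitch marks")
    alpha = target_len / len(x)      # duration scale factor
    y = [0.0] * target_len
    t_s = round(marks[0] * alpha)    # first synthesis pitch mark
    while t_s < target_len:
        t_a = t_s / alpha            # corresponding time in the source unit
        k = min(range(len(marks)), key=lambda i: abs(marks[i] - t_a))
        period = local_period(marks, k)
        # Overlap-add a two-period windowed segment centred on mark k at t_s.
        for n, w in enumerate(hann(period)):
            src = marks[k] - period + n
            dst = t_s - period + n
            if 0 <= src < len(x) and 0 <= dst < target_len:
                y[dst] += w * x[src]
        t_s += period
    return y


def apply_energy_contour(y, contour):
    """Split y into len(contour) equal blocks; scale each block to its target RMS."""
    out = list(y)
    for b, target_rms in enumerate(contour):
        lo = b * len(y) // len(contour)
        hi = (b + 1) * len(y) // len(contour)
        if hi <= lo:
            continue
        rms = math.sqrt(sum(v * v for v in y[lo:hi]) / (hi - lo))
        if rms > 0.0:
            gain = target_rms / rms
            for i in range(lo, hi):
                out[i] = y[i] * gain
    return out


# A 1600-sample unit with period 80 and pitch marks on the waveform peaks,
# stretched to 2400 samples and given a rising two-point energy contour.
x = [math.sin(2 * math.pi * n / 80) for n in range(1600)]
marks = list(range(20, 1600, 80))
y = apply_energy_contour(psola_stretch(x, marks, 2400), [0.1, 0.5])
print(len(y))  # 2400
```

`target_len` can be any sample count rather than a multiple of a frame length, so durations are met exactly. This matches the finer resolution the abstract cites for sample-based units. Away from the unit's edges, the stretched waveform keeps the original 80-sample period. A practical implementation would interpolate the gain between blocks to avoid steps at block boundaries.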
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT870435018
http://hdl.handle.net/11536/64476
Appears in Collections: Thesis