An Implementation of Mandarin TTS System using Preprocessing based on HMM segmentation

Title:	An Implementation of Mandarin TTS System using Preprocessing based on HMM segmentation (使用語音辨認做前處理之 TTS 系統發展)
Authors:	郭威志 Wei-Chih Kuo; 陳信宏 Sin-Horng Chen; Institute of Communications Engineering
Keywords:	preprocessing; speech recognition; TTS
Issue Date:	1999
Abstract:	This thesis develops an automatic segmentation method that serves as the preprocessing stage of a text-to-speech (TTS) system. The method uses HMM sub-syllable models from the field of speech recognition. First, using the speech databases recorded by NTU, NCTU, and NCKU, we train the models with Gaussian mixture observation densities, raise the dimensionality of the feature parameters, and apply SBR to compensate for speaker and channel effects; the resulting HMM sub-syllable models reach a recognition rate of about 70%. Next, we use these models to segment the speech corpus of a single female speaker, extract the required prosodic information from the segment boundaries, and retrain the RNN prosody generator to obtain the corresponding prosodic parameters. Finally, we use the retrained prosodic parameters to implement a female-voice Mandarin TTS system, adding a more refined method for constructing the syllable energy contour. Following earlier work [1], the system consists of four main parts: a text analyzer, an RNN prosody generator, a waveform inventory of synthesis units, and a PSOLA synthesizer.
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT880435042 http://hdl.handle.net/11536/65877
Appears in Collections:	Thesis