Title: An Improvement of Speech Synthesis for Mandarin TTS System (中文TTS系統語音合成之改進)
Author: Li-Feng Lin (林立峰)
Sin-Horng Chen
Keywords: Mandarin text-to-speech (TTS) system; automatic speech segmentation; pitch contour detection; acoustic inventory
Publication date: 2003
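Abstract: Corpus-based Mandarin text-to-speech (TTS) systems are the mainstream of speech synthesis today, and building one first requires segmenting a large speech corpus. This thesis attempts to establish a standard procedure for processing a large corpus, covering automatic speech segmentation and its correction as well as the extraction and adjustment of pitch contours, to pave the way for a future corpus-based TTS system. Next, we focus on improving the synthesizer and the template-syllable database of the Mandarin TTS system previously developed at the Institute of Communications Engineering (電信研究所), National Chiao Tung University. In the synthesizer, we treat the discontinuities that arise when syllables are concatenated and change the way fricative consonants are synthesized. In the template-syllable database, the original overly long template syllables are replaced with syllables of three different lengths, which reduces the poor sound quality caused by excessively large duration changes in TD-PSOLA. With these improvements, the synthesized speech sounds more natural and fluent.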
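The point of storing three templates per syllable is that the synthesizer can start from the one closest to the required duration, so TD-PSOLA only has to make a small change. The sketch below shows one way such a choice could be made. The `Template` class, the millisecond durations, and the selection cost (the absolute log of the duration ratio, which treats a 2x stretch and a 2x compression as equally harmful) are all illustrative assumptions, not the thesis's actual data structures or criterion.

```python
import math
from dataclasses import dataclass


@dataclass
class Template:
    """A recorded waveform template of one base-syllable (hypothetical structure)."""
    name: str
    duration_ms: float


def modification_factor(template: Template, target_ms: float) -> float:
    """Ratio by which TD-PSOLA would stretch (>1) or compress (<1) the template."""
    return target_ms / template.duration_ms


def select_template(templates: list[Template], target_ms: float) -> Template:
    """Return the template whose duration needs the smallest TD-PSOLA change.

    Assumed cost: |log(target / template)|, so stretching and compressing by
    the same factor are penalized equally.
    """
    if target_ms <= 0:
        raise ValueError("target duration must be positive")
    return min(templates, key=lambda t: abs(math.log(modification_factor(t, target_ms))))


if __name__ == "__main__":
    # Made-up durations for the three templates of one base-syllable.
    inventory = [Template("short", 180.0), Template("medium", 260.0), Template("long", 380.0)]
    for target in (150.0, 240.0, 420.0):
        chosen = select_template(inventory, target)
        factor = modification_factor(chosen, target)
        print(f"target {target:.0f} ms -> {chosen.name}, factor {factor:.2f}")
```

With these invented numbers, targets of 150, 240 and 420 ms map to the short, medium and long templates, with factors of about 0.83, 0.92 and 1.11. A single 380 ms template would instead need a factor of about 0.39 for the 150 ms target.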
In this thesis, a standard pre-processing procedure is established for developing corpus-based Mandarin text-to-speech (TTS) systems. It includes automatic speech segmentation, syllable pitch contour detection, and pronunciation error detection. In addition, several improvements to the existing Mandarin TTS system previously developed at National Chiao Tung University are discussed. First, a new acoustic inventory is constructed. For each base-syllable, three waveform templates with different durations are extracted from a large continuous-speech database. These replace the single waveform template that was recorded in isolation and is too long. This greatly reduces the quality degradation that occurs when the TD-PSOLA synthesizer has to compress or stretch a syllable's duration by a large factor. Second, processing is added to eliminate energy and pitch discontinuities at the points where syllable waveforms are concatenated. Finally, fricative sounds are synthesized by overlap-add instead of re-sampling. Experimental results showed that the quality of the synthesized speech was improved, sounding more natural and fluent.
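Fricatives are noise-like and have no pitch periods. Re-sampling one to change its length therefore scales every frequency by the inverse of the length factor, while overlap-add (OLA) keeps each frame's short-time spectrum. The following is a minimal sketch of plain OLA time-scaling, not the thesis's implementation. The 256-sample frame, 50%-overlap Hann window, 16 kHz rate, and the differenced-noise stand-in for a fricative are all assumptions chosen for illustration.

```python
import numpy as np


def ola_time_stretch(x: np.ndarray, factor: float, frame_len: int = 256) -> np.ndarray:
    """Change the duration of a noise-like segment by plain overlap-add (OLA).

    Hann-windowed frames are read from x every syn_hop / factor samples and
    written to the output every syn_hop samples, so each frame keeps its own
    short-time spectrum. Phase continuity is ignored, which suits noise only.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("segment shorter than one frame")
    syn_hop = frame_len // 2
    ana_hop = syn_hop / factor
    # Periodic Hann window: copies spaced frame_len/2 apart sum to a constant.
    window = np.hanning(frame_len + 1)[:-1]

    out_len = int(round(len(x) * factor))
    n_frames = max(1, int(np.ceil((out_len - frame_len) / syn_hop)) + 1)
    buf_len = (n_frames - 1) * syn_hop + frame_len
    y = np.zeros(buf_len)
    wsum = np.zeros(buf_len)

    for k in range(n_frames):
        # Clip the read position so every analysis frame lies inside x.
        a = min(int(round(k * ana_hop)), len(x) - frame_len)
        s = k * syn_hop
        y[s:s + frame_len] += window * x[a:a + frame_len]
        wsum[s:s + frame_len] += window

    # Normalize by the accumulated window wherever it is non-zero.
    nz = wsum > 1e-8
    y[nz] /= wsum[nz]
    return y[:out_len]


def spectral_centroid(x: np.ndarray, fs: float) -> float:
    """Power-weighted mean frequency in Hz, used to compare spectra."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return float(np.sum(freqs * power) / np.sum(power))


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate
    rng = np.random.default_rng(0)
    # Stand-in for a fricative: first-differenced white noise (high-frequency heavy).
    fric = np.diff(rng.standard_normal(4001))
    stretched = ola_time_stretch(fric, 2.0)
    # Naive re-sampling by linear interpolation to the same length, for comparison.
    t_new = np.linspace(0, len(fric) - 1, len(stretched))
    resampled = np.interp(t_new, np.arange(len(fric)), fric)
    for name, sig in [("original", fric), ("OLA x2", stretched), ("resampled x2", resampled)]:
        print(f"{name:13s} {len(sig):5d} samples, centroid {spectral_centroid(sig, fs):7.1f} Hz")
```

In this demo the OLA output keeps roughly the original spectral centroid. The re-sampled version drops to roughly half, because stretching by re-sampling lowers every frequency. Plain OLA only suits noise-like sounds. On periodic segments its unsynchronized frames would disturb the pitch, which pitch-synchronous methods such as TD-PSOLA avoid.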

