A Syllable Duration Model for Mandarin Speech

Title:	A Syllable Duration Model for Mandarin Speech
Authors:	Chia-Yung Chien (簡家勇), Yih-Ru Wang (王逸如), Institute of Communications Engineering
Keywords:	text-to-speech synthesizer; TTS; speech; Mandarin; prosody; duration; syllable
Release date:	2007
Abstract:	In this thesis, we construct a syllable-based duration model for single-speaker Mandarin read speech. The model considers several factors that control the variation of syllable duration: the speaking style, represented by the global mean duration of each speaker; the low-level syllable factors, namely base-syllable and tone; and the prosodic state, which represents the influences of all high-level linguistic features on duration. These factors are combined in a linear additive model, which is trained by a sequential optimization procedure with the maximum likelihood criterion until convergence. After training, we analyze the model parameters to see whether they conform to our a priori knowledge of Mandarin speech prosody. Besides examining the prosodic states themselves, we explore their relationship with the inter-syllable break types (Break Types) labeled in the corpus by [11], whose authors used pitch and pause information together with several linguistic features. These Break Types are the labels of the hierarchical prosodic structure of Mandarin speech described by Tseng [2]. We also analyze the collocation of the pitch prosodic state that [11] assigned to each syllable with the duration prosodic state derived in this thesis, to examine the relationship between high-level pitch and duration information.
Lastly, we construct a hierarchical duration model for the durations from which the speaker, base-syllable, and tone effects have been removed; these features, combining the prosodic-state affecting factor and the residuals, represent the high-level linguistic duration characteristics. Following the hierarchical prosodic structure of [2], the model is composed of three layers: PW, MIPPH, and MPPH. The affecting patterns of these three layers are length-dependent and are assumed to combine additively. The model is trained by a sequential optimization procedure with the minimum squared error criterion until convergence. After training, we analyze the resulting affecting patterns and present the duration contour of each layer to see whether they conform to our a priori knowledge of Mandarin speech prosody. Experimental results confirmed that they conformed to those explored by Tseng [2].
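The abstract describes the first model only as additive factors trained by sequential optimization under a maximum likelihood criterion. A minimal sketch of that idea follows, assuming Gaussian errors with one shared variance (so each block update of the ML fit reduces to a per-level mean of the partial residual) and treating the prosodic state as a hidden per-syllable label that is re-assigned to its most likely value on every iteration. The function names `fit_additive_duration` and `_level_means`, the quantile-based initialization, and the default number of states are hypothetical illustrations, not the thesis's actual procedure.

```python
import numpy as np


def _level_means(idx, r, old):
    """Per-level mean of the partial residual r; levels with no data keep their old value."""
    sums = np.bincount(idx, weights=r, minlength=len(old))
    counts = np.bincount(idx, minlength=len(old))
    out = old.copy()
    nz = counts > 0
    out[nz] = sums[nz] / counts[nz]
    return out


def fit_additive_duration(dur, spk, base, tone, n_states=8, max_iter=100, tol=1e-6):
    """Sketch: d_n = spk + base + tone + state + e_n, e_n ~ N(0, sigma^2).

    spk, base, tone are non-negative integer label arrays. The prosodic state is
    a hidden label re-assigned each iteration (an assumption of this sketch).
    Returns factor parameters, state effects, final state labels, and the
    log-likelihood after every iteration.
    """
    dur = np.asarray(dur, dtype=float)
    factors = {"spk": np.asarray(spk), "base": np.asarray(base), "tone": np.asarray(tone)}
    params = {k: np.zeros(idx.max() + 1) for k, idx in factors.items()}
    # Hypothetical initialization: states from quantiles of the raw duration.
    edges = np.quantile(dur, np.linspace(0.0, 1.0, n_states + 1)[1:-1])
    state = np.searchsorted(edges, dur)
    p = np.zeros(n_states)

    def predict():
        # Reads the current params, p, and state (closures bind late).
        return sum(params[k][idx] for k, idx in factors.items()) + p[state]

    history = []
    for _ in range(max_iter):
        # Sequential step: update one factor at a time, others held fixed.
        for k, idx in factors.items():
            r = dur - (predict() - params[k][idx])
            params[k] = _level_means(idx, r, params[k])
        # Partial residual without the prosodic-state term.
        r = dur - (predict() - p[state])
        p = _level_means(state, r, p)
        # Re-assign each syllable to the state with the smallest squared error,
        # i.e. the highest Gaussian likelihood under a shared variance.
        state = np.argmin((r[:, None] - p[None, :]) ** 2, axis=1)
        sigma2 = max(np.mean((dur - predict()) ** 2), 1e-12)
        # Log-likelihood with the ML variance estimate plugged in.
        history.append(-0.5 * len(dur) * (np.log(2 * np.pi * sigma2) + 1.0))
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return params, p, state, np.array(history)
```

Usage: `params, p, state, history = fit_additive_duration(dur, spk, base, tone, n_states=4)`, where `dur` holds syllable durations and the other arrays hold integer labels. Every block update and the state re-assignment can only lower the squared error, so `history` is non-decreasing. Because only sums of effects enter the likelihood, the individual factor values are identifiable only up to constant offsets shared between factors.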
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT009313546 http://hdl.handle.net/11536/78362
Appears in Collections:	Thesis

Files in This Item:

354601.pdf