Title: 扁平化文法架構之英文階層式韻律模型與其在語音合成之應用
Flat Structure of English Syntax on English Speech Prosody Modeling and its Application to TTS
Author: 劉于誠
Liu, Yu-Cheng
Chen, Sin-Horng
Keywords: Flat Structure; English prosody model; Hierarchical Prosody Modeling; Text-to-Speech
Issue Date: 2016
Abstract: This thesis proposes a flat grammar structure for English to replace the hierarchical syntactic tree previously obtained from the Stanford Parser. English sentences are segmented into sequences of word chunks, and the relations between consecutive chunks are incorporated into a syllable-based English hierarchical prosodic model (EHPM). The model is trained automatically on an L2 English short-passage speech database with the PLM algorithm, adapted from the version used for Mandarin prosody modeling. Training estimates all model parameters and labels every utterance with two types of prosodic tags: syllable-juncture break type and syllable prosodic state. Experimental results show that the trained EHPM parameters agree with established knowledge of English prosody, and that the break types labeled at chunk boundaries largely match the prosodic behavior expected of the flat grammar structure. The EHPM is then applied to prosody generation for English text-to-speech (ETTS): synthetic speech is produced by combining the prosodic information generated by the EHPM with the spectral features generated by the HTS synthesizer. An informal listening test indicates that the resulting speech sounds more natural than speech synthesized by HTS alone, with more distinct pauses and prosodic variation.
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070160264
Appears in Collections: Thesis