An Implementation of HMM-based English Speech Synthesis

Title:	An Implementation of HMM-based English Speech Synthesis (基於隱藏式馬可夫模型之英文語音合成系統實作)
Authors:	Liu, Kuan-Yi (劉冠驛); Chen, Sin-Horng (陳信宏); Institute of Communications Engineering (電信工程研究所)
Keywords:	English speech synthesis; English; speech; synthesis
Issue Date:	2011
Abstract:	This thesis implements an online English text-to-speech system using a corpus of TOEFL articles read by a female speaker whose native language is Chinese. The corpus is first segmented with a reasonably good triphone model. The CMU dictionary and Stanford-Postagger are then used to add prosodic information to the labels, namely the relative positions within a five-level structure of phoneme, syllable, word, phrase, and sentence. These labels are used to build vocal-tract, fundamental-frequency, and state-duration models, with the aim of making the prosody and rhythm of the synthesized speech more natural. Experimental results show that the generated prosody is still not natural enough. Compared with speech synthesized by other overseas websites, the overall prosodic contour of our system is somewhat more pronounced, but the voice is noticeably blurred and shows odd fine-grained pitch movements. We conjecture that, because the prosodic labels are currently estimated by rule-based methods only, the predicted prosodic information is still not accurate enough; as a result, the synthesized speech has broadly correct prosody but erratic rises and falls in pitch at a finer level.
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT079413548 http://hdl.handle.net/11536/40744
Appears in Collections:	Thesis

Files in This Item:

354801.pdf
