A HSMM-based Mandarin Speech Synthesizer Based on Hierarchical Prosody Model (以階層式韻律模型為基礎之中文半隱藏式馬可夫模型語音合成器)

Full metadata record

DC Field	Value	Language
dc.contributor.author	吳文良	en_US
dc.contributor.author	陳信宏	en_US
dc.date.accessioned	2014-12-12T01:47:06Z	-
dc.date.available	2014-12-12T01:47:06Z	-
dc.date.issued	2011	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#GT079813506	en_US
dc.identifier.uri	http://hdl.handle.net/11536/46989	-
dc.description.abstract	本論文目標為引入階層式韻律模型，進一步提升以馬可夫模型為基礎之合成器表現。首先引入韻律模型相關之韻律標記-音節邊界停頓標記與音節韻律狀態，將其運用到頻譜模型訓練過程，在決策樹分群階段改以韻律標記取代傳統語言資訊，改以介於上層語法資訊與下層音節資訊間的中層韻律資訊供決策樹分群使用，韻律標記除考量語言資訊外，更同時考量了聲學上的資訊，故應比語言資訊與頻譜更加相關，經實驗證實，韻律標記確實可提供勝過語言資訊的分群能力，訓練出更好的頻譜模型。接著進一步考慮合成時韻律模型的運用，因合成階段僅有文字，但欲取得標記需同時具有聲學與語言資訊，故本論文提出以條件式隨機域的方式訓練以文字預估韻律標記的模型，由於其可同時考量全域觀察序列之影響，並且利用前後狀態相關性進行模型學習，對於具時間相關性的參數預估應極有幫助，從實驗結果可發現，預估得到的韻律狀態，大多皆能符合音節邊界停頓對應的轉移特性。最後結合頻譜模型、韻律模型與預估得到之韻律標記，即為一完整合成系統，此系統具韻律變化豐富之優點，但因音節邊界停頓預估仍不夠好，導致部分合成語音的自然度欠佳，此有待未來繼續努力。	zh_TW
dc.description.abstract	In this thesis, we introduce a hierarchical prosody model to further improve the performance of an HMM-based speech synthesis system. First, we apply two types of prosodic tags, syllable-boundary prosodic breaks and syllable prosody states, to spectral model training. In decision-tree clustering, we replace the conventional high-level linguistic features with these middle-level prosodic tags when clustering context-dependent models. Because the prosodic tags are labeled using both linguistic and acoustic features, we expect them to be more closely related to the spectrum than linguistic features alone. Experiments confirm that the prosodic tags cluster better than the conventional linguistic features and yield a better spectral model. Second, only text is available at synthesis time, so the prosody model cannot label the prosodic tags, which requires acoustic features as well. We therefore propose using conditional random fields (CRFs) to predict the two types of prosodic tags from the input text. Because a CRF considers the entire observation sequence and learns the dependencies between neighboring output states, it should be well suited to predicting time-dependent parameters. The experimental results show that the transitions of the predicted prosody states mostly match the characteristics of the corresponding prosodic breaks. Finally, we build a complete synthesis system by combining the trained spectral model, the prosody model, and the predicted prosodic tags; this system has the advantage of rich prosodic variation. However, prosodic break prediction is still not accurate enough, which degrades the naturalness of some of the synthesized speech; improving it is left for future work.	en_US
dc.language.iso	zh_TW	en_US
dc.subject	語音合成	zh_TW
dc.subject	韻律模型	zh_TW
dc.subject	synthesis	en_US
dc.subject	prosody model	en_US
dc.title	以階層式韻律模型為基礎之中文半隱藏式馬可夫模型語音合成器	zh_TW
dc.title	A HSMM-based Mandarin Speech Synthesizer Based on Hierarchical Prosody Model	en_US
dc.type	Thesis	en_US
dc.contributor.department	電信工程研究所	zh_TW
Appears in Collections:	Thesis

Files in This Item:

350601.pdf
