Full metadata record
DC Field | Value | Language
dc.contributor.author | 何鎮仲 | en_US
dc.contributor.author | Chen-Chung Ho | en_US
dc.contributor.author | 陳信宏 | en_US
dc.contributor.author | Sin-Horng Chen | en_US
dc.date.accessioned | 2014-12-12T02:23:30Z | -
dc.date.available | 2014-12-12T02:23:30Z | -
dc.date.issued | 1999 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#NT880435026 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/65862 | -
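dc.description.abstract | This thesis proposes a prosody synthesis method for Min-Nan (Taiwanese) speech that builds statistical models of prosodic parameters on top of a neural network. Taking the syllable as the basic synthesis unit, it builds statistical models for initial duration, final duration, the pause duration before the next syllable, fundamental-frequency contour, and energy level. In RNN training, the statistical models are applied to compensate the prosodic parameters, which are then used to train the RNN prosody synthesizer. In synthesis, the trained RNN first generates prosodic parameters, which are then de-compensated using the statistical models. The advantage of this approach is that the statistical models can correct for and remove some of the factors that affect RNN prosody synthesis. For the statistical models, the initials and finals of Taiwanese syllables are modeled with Gaussian distributions, taking into account three factors that affect durations in real speech: utterance speaking speed, tone, and prosodic variation. The model parameters are estimated with an iterative, sequential procedure derived from the ML criterion, and the three companding factors are obtained from the observed durations. Finally, the synthesis results are applied in a single-document-interface text editor coupled with a speech synthesis kernel. | zh_TW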
dc.description.abstract | This thesis proposes a hybrid approach that incorporates statistical modeling of prosodic features into recurrent neural network (RNN)-based prosody synthesis for Min-Nan (Taiwanese) speech. It takes the syllable as the basic synthesis unit and constructs statistical models for syllable initial duration, syllable final duration, inter-syllable pause duration, syllable pitch contour, and syllable energy level. In training, prosodic features are normalized by these statistical models and then used to train an RNN prosody synthesizer. In synthesis, the trained RNN first generates normalized prosodic features, which are then denormalized with the same statistical models to produce the output prosodic features. The advantage of the approach is that the statistical models relieve the RNN prosody synthesizer of having to account for some of the affecting factors. In this study, both syllable initial and syllable final durations are modeled as normally distributed, subject to three major affecting factors present in the observed durations of real speech: utterance-level speaking speed, lexical tone, and prosody. An iterative procedure derived from the maximum-likelihood (ML) criterion sequentially estimates all model parameters and the companding (compressing-expanding) factors of these three affecting factors from observed data. | en_US
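dc.language.iso | zh_TW | en_US
dc.subject | Neural-network prosody generator | zh_TW
dc.subject | Text-to-speech | zh_TW
dc.subject | RNN prosody generator | en_US
dc.subject | PSOLA | en_US
dc.subject | Text-to-Speech | en_US
dc.title | Taiwanese prosody synthesis combining statistical models and neural networks (混合統計與類神經網路之台語韻律合成) | zh_TW
dc.title | A hybrid statistical/RNN approach to prosody synthesis for Taiwanese TTS | en_US
dc.type | Thesis | en_US
dc.contributor.department | Institute of Communications Engineering (電信工程研究所) | zh_TW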
Appears in Collections: Thesis