An Improvement on G2P and Spectrum Synthesis for Mandarin TTS (漢語語音合成系統之字轉音與聲音頻譜改進)

Title:	漢語語音合成系統之字轉音與聲音頻譜改進 An Improvement on G2P and Spectrum Synthesis for Mandarin TTS
Author:	何佑家 (Ho, Yo-Chia); 陳信宏 (Chen, Sin-Horng); 電信工程研究所 (Institute of Communications Engineering)
Keywords:	G2P; HMM model
Issue Date:	2017
Abstract:	The quality of synthesized speech has improved steadily as new synthesis methods have been developed. Two factors directly affect the listener's impression: whether each character is pronounced with the reading it should have, and the sound quality of each syllable. This study addresses both. The first factor concerns homographs (破音字, characters with more than one reading). When a text contains a homograph and the synthesized speech does not use the correct reading, the result sounds jarring to the listener. The pronunciation of some homographs can be determined by part of speech (POS). We identify the homographs whose readings are determined by POS, add them to the grapheme-to-phoneme (G2P) module, and measure the resulting accuracy. For the second factor, we improve the quality of the synthesized sound through the spectrum: we re-examine the question set used to classify the hidden Markov models (HMMs) and train the models using HMMs with different numbers of states. Experimental results show that both the ability of the original TTS system to pronounce homographs correctly and the quality of its synthesized sound improve significantly, which confirms the effectiveness of the proposed method.
URI:	http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070260236 http://hdl.handle.net/11536/140332
Appears in Collections:	Thesis
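
Illustrative sketch (not taken from the thesis): the POS-based homograph handling described in the abstract can be pictured as a lookup keyed on the character and its POS tag, with a fallback to a default reading, followed by an accuracy count. The table entries, the tag names `ADJ` and `V`, and the function names `g2p_char` and `accuracy` are all assumptions made for this example. A real system would tag whole words rather than single characters, and the thesis's actual homograph list is not reproduced here.

```python
# Hypothetical table: (character, POS tag) -> pinyin with tone number.
# These entries are common textbook examples, not the thesis's data.
POS_READINGS = {
    ("長", "ADJ"): "chang2",  # "long"
    ("長", "V"): "zhang3",    # "to grow"
    ("好", "ADJ"): "hao3",    # "good"
    ("好", "V"): "hao4",      # "to be fond of"
}

# Assumed fallback reading when the POS tag is not listed for a character.
DEFAULT_READINGS = {"長": "chang2", "好": "hao3"}


def g2p_char(char, pos):
    """Return the reading of one character, using POS for listed homographs."""
    reading = POS_READINGS.get((char, pos))
    if reading is not None:
        return reading
    # Unknown characters yield None; a full G2P module would consult a lexicon.
    return DEFAULT_READINGS.get(char)


def accuracy(samples):
    """Fraction of (char, pos, expected_reading) samples read correctly."""
    if not samples:
        return 0.0
    correct = sum(1 for char, pos, expected in samples
                  if g2p_char(char, pos) == expected)
    return correct / len(samples)
```

Keeping the POS-dependent entries in a separate table from the default readings mirrors the idea of adding only POS-decidable homographs to an existing G2P module, while every other character keeps its original behavior.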