A Study on the Prosodic Model of Mandarin Speech

Title:	中文語音韻律模式之研究 A Study on the Prosodic Model of Mandarin Speech
Authors:	蘇思漢 Khans Su; 陳信宏 Sin-Horng Chen; 電信工程研究所 (Institute of Communication Engineering)
Keywords:	prosodic model, recurrent neural network, prosodic state
Issue Date:	1993
Abstract:	This thesis studies models of prosodic variation in Mandarin speech. The underlying idea is that an utterance and its corresponding text are two sides of the same entity and carry the same prosodic information, so the correspondence between the linguistic parameters of the text and the acoustic parameters of the speech can be treated as a pronunciation model that controls prosodic variation, referred to here as a prosodic model. The thesis explores how to build such models using a progressive learning approach. First, a recurrent neural network learns the mapping from acoustic parameters to simple linguistic parameters, and its hidden layer implicitly encodes the states of prosodic variation. Then a second recurrent neural network learns the mapping from linguistic parameters to the outputs of that hidden layer, and its output layer likewise encodes prosodic states. Vector-quantizing these two kinds of prosodic states separately yields two finite state machines (FSMs). Experimental results show that the two FSMs roughly conform to grammatical rules, which indicates that the approach is feasible. The models were also applied to resolving tone 3 sandhi, and the results show that the first prosodic model resolves tone 3 changes well.
In this thesis, two prosodic models of Mandarin speech that simulate the human prosody pronunciation mechanism are studied. The first prosodic model describes the dynamics of the prosody information of Mandarin speech. It is constructed by using a hidden layer recurrent neural network (HLRNN). Prosody information extracted from a given utterance is taken as the input features of the HLRNN. Output targets include some simple linguistic features of the text associated with the utterance. The hidden layer of the HLRNN learns to represent the dynamic states of the input prosody information. The second model tries to explore the hidden states embedded in a given piece of Mandarin text that control the generation of the prosody information of the corresponding utterance. A multi-rate recurrent neural network (MRNN) is used to construct this prosodic model. Input features of the MRNN include linguistic features such as the POSs and lengths of phrases, the location of a syllable in a phrase, etc. Output targets are the outputs of all hidden nodes of the first prosodic model. The output layer is trained to represent the hidden prosodic states of the input linguistic features. The validity of these two prosodic models was examined by simulations using a database containing 655 sentential and paragraphic utterances and their texts. After proper training, we constructed two 8-state finite state machines (FSMs) by vector-quantizing the hidden layer outputs and the output layer outputs of these two models, respectively. By closely examining some state sequences, we found that both FSMs roughly conform to the syntax of the Chinese language. We also confirmed that the first prosodic model can be used to determine whether a tone 3 should be changed into a tone 2 or not. Based on these experimental results, we conclude that these two prosodic models function quite well.
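The step of turning continuous network outputs into an 8-state FSM can be illustrated with a minimal sketch. The abstract does not say which vector quantization algorithm was used, so plain k-means is assumed here. The random 4-dimensional vectors only stand in for hidden layer outputs, and the function names (`train_codebook`, `quantize`, `transition_counts`) are hypothetical. Summarizing the FSM as a table of transition counts is likewise an assumption about how the state sequences might be examined.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to vector v."""
    return min(range(len(codebook)), key=lambda k: squared_distance(codebook[k], v))


def train_codebook(vectors, n_states=8, n_iters=20, seed=0):
    """Plain k-means (an assumed VQ method): returns n_states codewords."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, n_states)]
    for _ in range(n_iters):
        clusters = [[] for _ in range(n_states)]
        for v in vectors:
            clusters[nearest(codebook, v)].append(v)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                dim = len(members[0])
                codebook[k] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook


def quantize(codebook, sequence):
    """Map each vector of one utterance to its discrete state index."""
    return [nearest(codebook, v) for v in sequence]


def transition_counts(state_sequences):
    """Count (state, next_state) pairs over all utterances."""
    counts = Counter()
    for states in state_sequences:
        counts.update(zip(states, states[1:]))
    return counts


# Synthetic stand-in data: 10 "utterances" of 30 syllables each,
# with a hypothetical 4-dimensional hidden output per syllable.
data_rng = random.Random(1)
utterances = [[[data_rng.random() for _ in range(4)] for _ in range(30)]
              for _ in range(10)]
all_vectors = [v for utt in utterances for v in utt]

codebook = train_codebook(all_vectors)
state_sequences = [quantize(codebook, utt) for utt in utterances]
fsm = transition_counts(state_sequences)
```

With real data, the state sequences could then be compared against the phrase structure of each sentence, which is the kind of inspection the abstract describes.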
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT820436034 http://hdl.handle.net/11536/58163
Appears in Collections:	Thesis