Incorporating Prosody Information in Acoustic Modeling for Mandarin Speech Recognition

Title:	Incorporating Prosody Information in Acoustic Modeling for Mandarin Speech Recognition (使用韻律訊息於建立聲學模型之中文語音辨認)
Authors:	邱子軒; 陳信宏; Institute of Communications Engineering
Keywords:	Speech Recognition; Acoustic Model; Prosody
Issue Date:	2012
Abstract:	This thesis studies how to incorporate prosody information into acoustic model (AM) training for Mandarin speech recognition. The idea is to extend the conventional context-dependent (CD) tri-phone HMM so that, at syllable boundaries, each phone model also depends on the prosodic break type of the nearby inter-syllable boundary. Four break types are distinguished, reflecting how tightly adjacent syllables are coupled: major break, minor break, normal non-break, and tightly-coupled non-break. In training, phone models that depend on both phonetic context and prosodic break are built using Classification and Regression Trees (CART). Recognition proceeds in two stages. The first stage uses only the acoustic models to recognize syllables and generates a syllable lattice that carries prosodic break information. The second stage builds a word lattice from the syllable lattice by forming all possible words with a lexicon, aided by the prosodic break information, and then finds the best word sequence by rescoring with a trigram language model (LM). Experiments on the TCC300 corpus show that the proposed method slightly outperforms the conventional tri-phone HMM acoustic models in recognition rate.
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT079913501 http://hdl.handle.net/11536/49287
Appears in Collections:	Thesis

Files in This Item:

350101.pdf
