Title: 使用韻律訊息於建立聲學模型之中文語音辨認
Incorporating Prosody Information in Acoustic Modeling for Mandarin Speech Recognition
Authors: 邱子軒
陳信宏
Institute of Communications Engineering
Keywords: Speech Recognition; Acoustic Model; Prosody
Issue Date: 2012
Abstract: This thesis studies how to incorporate prosody information into acoustic model (AM) construction for Mandarin speech recognition. The idea is to extend conventional context-dependent (CD) tri-phone HMM modeling so that, at syllable boundaries, each phone model also depends on the prosodic break type of the nearby inter-syllable boundary. Four break types are considered, reflecting how tightly adjacent syllables are coupled: major break, minor break, normal non-break, and tightly coupled non-break. In the training phase, phone models that depend on both phonetic context and prosodic break are constructed using the Classification and Regression Trees (CART) algorithm. In the test phase, a two-stage recognition approach is adopted. In the first stage, only the acoustic models are used to recognize syllables, producing a syllable lattice that carries prosodic break information. In the second stage, a word lattice is built from the syllable lattice by forming all possible words with a lexicon, aided by the prosodic break information, and the best word sequence is then found by rescoring with a trigram language model (LM). Experimental results on the TCC300 database show that the proposed method slightly outperforms the conventional method using tri-phone acoustic models.
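The second-stage word construction can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not say how break information constrains word formation, so the rule used here (a word may not span a major or minor break) is an assumption, as are the function name `build_word_arcs`, the break label strings, the toy lexicon, and the simplification from a syllable lattice to a single syllable sequence.

```python
from typing import Dict, List, Tuple

# Labels for the four break types named in the abstract
# (the string codes themselves are hypothetical).
MAJOR, MINOR, NORMAL, TIGHT = "B_major", "B_minor", "NB_normal", "NB_tight"

# Assumption: a word may not span a major or minor break.
WORD_BLOCKING_BREAKS = {MAJOR, MINOR}


def build_word_arcs(
    syllables: List[str],
    breaks: List[str],
    lexicon: Dict[Tuple[str, ...], str],
    max_len: int = 4,
) -> List[Tuple[int, int, str]]:
    """Return (start, end, word) arcs over a syllable sequence.

    breaks[i] is the break type between syllables[i] and syllables[i + 1].
    An arc covers syllables[start:end].
    """
    if len(breaks) != max(len(syllables) - 1, 0):
        raise ValueError("need exactly one break label per syllable boundary")
    arcs = []
    for start in range(len(syllables)):
        last_end = min(start + max_len, len(syllables))
        for end in range(start + 1, last_end + 1):
            # Boundaries strictly inside the span are breaks[start:end - 1].
            if any(b in WORD_BLOCKING_BREAKS for b in breaks[start:end - 1]):
                break  # every longer span from this start crosses the same break
            word = lexicon.get(tuple(syllables[start:end]))
            if word is not None:
                arcs.append((start, end, word))
    return arcs


# Toy example: the minor break after the second syllable blocks "音變".
toy_lexicon = {
    ("yu3", "yin1"): "語音",
    ("yin1", "bian4"): "音變",
    ("bian4", "ren4"): "辨認",
}
toy_arcs = build_word_arcs(
    ["yu3", "yin1", "bian4", "ren4"], [TIGHT, MINOR, TIGHT], toy_lexicon
)
```

In a full system the resulting arcs would carry acoustic scores and be rescored with the language model; that step is omitted here.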
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079913501
http://hdl.handle.net/11536/49287
Appears in Collections: Thesis


Files in This Item:

  1. 350101.pdf
