Title: Enriching Mandarin Speech Recognition by Incorporating a Hierarchical Prosody Model (使用階層式韻律模型於豐富中文語音辨認)
Authors: Chang, Hao-Hsiang (張皓翔)
Chen, Sin-Horng (陳信宏)
Institute of Communications Engineering (電信工程研究所)
Keywords: speech recognition; prosody; discriminative model combination
Publication date: 2010
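Abstract: When people exchange information through speech, the speaker's pitch, intonation, and rhythm are collectively known as prosody. This study proposes a new speech recognition method that integrates this prosodic information into recognition. A hierarchical prosody model is built that uses acoustic and linguistic features to help predict prosodic boundary breaks, and that can annotate the recognition output with linguistic features and prosodic boundary labels, making the output easier to read. The experiments use a two-stage framework. During rescoring, each model that takes part in decoding is given a weight, and the word sequence with the highest recognition rate is selected. The weights are tuned with the discriminative model combination (DMC) method to find the best weight distribution. The experimental corpus is TCC300. Results show that adding the prosody model improves the word recognition rate by 1.67% over the baseline system.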
This thesis presents a probabilistic model that incorporates hierarchical prosody into speech recognition, both to improve word recognition directly and to enrich the recognition output. The model includes higher-level linguistic cues (syllable, word, punctuation mark, and part of speech), an intermediate prosodic break representation, and prosodic-acoustic features correlated with break type and linguistic cues. The recognition system produces not only word sequences but also prosodic labels and linguistic information, enriching the output for downstream natural language processing modules. The approach is implemented in a two-stage rescoring framework, with the discriminative model combination (DMC) method used to weight the models during rescoring. Evaluated on the TCC300 corpus, a read speech task, the system with the prosody model outperforms the baseline, with a 1.67% absolute improvement in word error rate.
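The rescoring step described in the abstract can be illustrated with a minimal sketch. It assumes the first stage yields a list of candidate word sequences, each with a log score from every model, and that the second stage picks the candidate with the highest weighted sum of those scores. The list-of-candidates form, the model names, the scores, and the weights below are all hypothetical and are not taken from the thesis. In the thesis the weights are tuned with DMC, whereas here they are fixed by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical candidate: (word sequence, per-model log scores).
Hypothesis = Tuple[str, Dict[str, float]]


def combined_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Log-linear combination: weighted sum of per-model log scores."""
    return sum(weights[name] * scores[name] for name in weights)


def rescore(candidates: List[Hypothesis], weights: Dict[str, float]) -> str:
    """Return the word sequence with the highest combined score."""
    best_words, _ = max(candidates, key=lambda h: combined_score(h[1], weights))
    return best_words


# Made-up scores for illustration only.
candidates = [
    ("hyp A", {"acoustic": -100.0, "language": -20.0, "prosody": -30.0}),
    ("hyp B", {"acoustic": -101.0, "language": -21.0, "prosody": -10.0}),
]

# Hand-picked weights; a weight of 0.0 switches the prosody model off.
baseline_weights = {"acoustic": 1.0, "language": 1.0, "prosody": 0.0}
prosody_weights = {"acoustic": 1.0, "language": 1.0, "prosody": 0.5}

print(rescore(candidates, baseline_weights))
print(rescore(candidates, prosody_weights))
```

With these made-up numbers, the baseline weights favor the first candidate, while giving the prosody score a nonzero weight changes the selection to the second. This is the kind of effect that weight tuning is meant to exploit.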
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079713570
http://hdl.handle.net/11536/44587
Appears in Collections: Thesis


Files in This Item:

  1. 357001.pdf
