Title: The Study of Language Model Enhancement for Mandarin Speech Recognition (中文語音辨識中語言模型的強化之研究)
Authors: Yi-Ling Lu (呂宜玲)
Prof. Hsin-Chia Fu (傅心家)
Institute of Computer Science and Engineering
Keywords: Speech Recognition; Language Model; Smoothing Method
Issue Date: 2004
Abstract: Data sparseness is a critical problem that language models for speech recognition must overcome. The most widely used smoothing methods are Katz Smoothing and Kneser-Ney Smoothing, the latter comprising Kneser-Ney Backoff Smoothing and Kneser-Ney Interpolation Smoothing. When applied to a bigram Markov model, however, these methods provide no suitable probability estimate for an unseen bigram whose last word does not occur in the training data. To address this, we propose Enhanced Katz Smoothing and Enhanced Kneser-Ney Smoothing. Starting from the smoothed probabilities of unseen bigrams whose last word occurs in the training data (although the bigram itself does not), we discount a further small amount of probability and distribute the discounted mass to unseen bigrams whose last word does not occur in the training data either. We use perplexity as the performance measure. The experimental corpus consists of one year of news collected from the Chinese TV System (CTS) website; for each month, 180 news items are used as test data and the remaining items as training data. In the experiments, Enhanced Katz Smoothing yields a perplexity that is on average 6.65 lower than Katz Smoothing, and Enhanced Kneser-Ney Smoothing yields a perplexity that is on average 4.50 lower than Kneser-Ney Backoff Smoothing, the better-performing of the two original Kneser-Ney variants. We also apply the bigram Markov language model with Enhanced Kneser-Ney Smoothing, the best-performing method in our experiments, to the language model component of a Mandarin speech recognition system. In practical tests, the system reaches a correct rate of 88.62% and an accuracy of 85.52%.
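The redistribution idea in the abstract can be illustrated with a small sketch. This is not the thesis's formulation: it assumes plain absolute discounting in place of the Katz or Kneser-Ney discounts, and the parameters `oov_share` (the fraction of backoff mass moved to bigrams whose last word is unseen) and `n_oov_types` (an assumed number of unseen word types sharing that mass) are hypothetical names introduced only for illustration.

```python
import math
from collections import Counter, defaultdict


class BackoffBigram:
    """Absolute-discount backoff bigram that can reserve mass for unseen last words."""

    def __init__(self, tokens, discount=0.5, oov_share=0.0, n_oov_types=1):
        self.discount = discount
        self.oov_share = oov_share      # hypothetical: share of backoff mass for unseen last words
        self.n_oov_types = n_oov_types  # hypothetical: assumed number of unseen word types
        self.unigram = Counter(tokens)
        self.total = len(tokens)
        self.followers = defaultdict(Counter)
        for prev, word in zip(tokens, tokens[1:]):
            self.followers[prev][word] += 1

    def prob(self, prev, word):
        seen = self.followers.get(prev, {})
        context_count = sum(seen.values())
        if word in seen:
            # Seen bigram: discounted relative frequency.
            return (seen[word] - self.discount) / context_count
        # Mass freed by discounting; a context with no followers backs off completely.
        backoff = self.discount * len(seen) / context_count if context_count else 1.0
        if word not in self.unigram:
            # Last word never occurred in training: it receives the reserved share.
            return self.oov_share * backoff / self.n_oov_types
        # Last word occurred in training: split the remaining backoff mass by unigram frequency.
        unseen_mass = sum(c for w, c in self.unigram.items() if w not in seen) / self.total
        return (1.0 - self.oov_share) * backoff * (self.unigram[word] / self.total) / unseen_mass


def perplexity(model, tokens):
    """Perplexity over the bigrams of `tokens`; infinite if any bigram has zero probability."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = 0.0
    for prev, word in pairs:
        p = model.prob(prev, word)
        if p == 0.0:
            return math.inf
        log_sum += math.log(p)
    return math.exp(-log_sum / len(pairs))


train = "the cat sat on the mat the dog sat on the rug".split()
test = "the cat sat on the sofa".split()  # "sofa" never occurs in the training data
baseline = BackoffBigram(train)
enhanced = BackoffBigram(train, oov_share=0.05)
print(perplexity(baseline, test))
print(perplexity(enhanced, test))
```

With `oov_share=0.0` the bigram ending in the unseen word gets zero probability, so the baseline perplexity is infinite. A small positive share makes it finite while slightly lowering the probability of the other unseen bigrams.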
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009217549
http://hdl.handle.net/11536/73491
Appears in Collections: Thesis


Files in This Item:

  1. 754901.pdf
