Full Metadata Record
DC Field | Value | Language
dc.contributor.author | 呂宜玲 | en_US
dc.contributor.author | Yi-Ling Lu | en_US
dc.contributor.author | 傅心家 | en_US
dc.contributor.author | Prof. Hsin-Chia Fu | en_US
dc.date.accessioned | 2014-12-12T02:38:08Z | -
dc.date.available | 2014-12-12T02:38:08Z | -
dc.date.issued | 2004 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT009217549 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/73491 | -
dc.description.abstract | "Data sparseness" is a problem that language models for speech recognition urgently need to overcome. The commonly used Katz smoothing and Kneser-Ney smoothing (the latter comprising Kneser-Ney backoff smoothing and Kneser-Ney interpolation smoothing), when applied to bigram Markov models, offer no adequate way to estimate the probability of an unseen bigram whose final word never appears in the training corpus. To address this, we propose Enhanced Katz Smoothing and Enhanced Kneser-Ney Smoothing: from the smoothed probabilities of unseen bigrams whose final word does appear in the training corpus, we further discount a small amount of probability and redistribute the discounted mass to unseen bigrams whose final word does not appear in the training corpus, using perplexity as the performance measure. We collected one year of news from the Chinese Television System (CTS) website, taking 180 articles from each month as test data and the remaining articles as training data. Experimental results show that Enhanced Katz Smoothing is on average 6.65 perplexity units lower than the original Katz smoothing, and Enhanced Kneser-Ney Smoothing is on average 4.50 perplexity units lower than Kneser-Ney backoff smoothing, the better-performing of the original Kneser-Ney methods. In addition, we applied the constructed bigram Markov model, with Enhanced Kneser-Ney Smoothing (the best-performing method in our experiments), to the language-model component of a Mandarin speech recognition system; in actual tests the system achieved a correct rate of 88.62% and an accuracy of 85.52%. | zh_TW
dc.description.abstract | In this thesis, we propose smoothing methods that address the "data sparseness" problem of language models in order to improve speech recognition. Katz smoothing and Kneser-Ney smoothing, the latter comprising Kneser-Ney backoff smoothing and Kneser-Ney interpolation smoothing, are the most popular smoothing methods, but when applied to bigram models they do not account for unseen bigrams whose last word never occurs in the training data. We therefore propose enhanced methods that discount a small amount of probability from the smoothed estimates of bigrams that do not occur in the training data but whose last word does, and distribute the discounted probability mass to bigrams for which neither the bigram nor its last word occurs in the training data (a sketch of this backoff scheme follows the metadata table). We use perplexity to measure the performance of our language models. The experimental corpus was collected from daily news on the Chinese Television System (CTS) website, taking 180 news articles from each of 12 months as test data and the remainder as training data. Enhanced Katz Smoothing yields a perplexity 6.65 lower than Katz smoothing, and Enhanced Kneser-Ney Smoothing yields a perplexity 4.50 lower than Kneser-Ney backoff smoothing. Finally, we applied the bigram Markov language model with Enhanced Kneser-Ney Smoothing, the best-performing method in our experiments, to a Mandarin speech recognition system; its correct rate is 88.62% and its accuracy is 85.52%. | en_US
dc.language.iso | zh_TW | en_US
dc.subject | Speech Recognition | zh_TW
dc.subject | Language Model | zh_TW
dc.subject | Smoothing Method | zh_TW
dc.subject | Speech Recognition | en_US
dc.subject | Language Model | en_US
dc.subject | Smoothing Method | en_US
dc.title | The Study of Language Model Enhancement for Mandarin Speech Recognition | zh_TW
dc.title | The Study of Language Model Enhancement for Mandarin Speech Recognition | en_US
dc.type | Thesis | en_US
dc.contributor.department | Institute of Computer Science and Engineering | zh_TW
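
The abstract describes the enhancement only in outline: a small fraction of the backoff mass assigned to unseen bigrams whose last word is seen in training is moved to bigrams whose last word is itself unseen. A minimal sketch of how such a scheme can be written on top of standard Katz backoff, assuming a uniform split over the unseen vocabulary and a single discount fraction ε (both assumptions ours, not taken from the thesis):

```latex
% Hypothetical sketch of the enhanced bigram backoff described in the
% abstract; \varepsilon and the uniform split over V_0 are our assumptions.
\[
P_{\mathrm{enh}}(w_i \mid w_{i-1}) =
\begin{cases}
  d_r \, \dfrac{C(w_{i-1} w_i)}{C(w_{i-1})}
    & C(w_{i-1} w_i) > 0, \\[6pt]
  (1 - \varepsilon)\, \alpha(w_{i-1})\, P(w_i)
    & C(w_{i-1} w_i) = 0,\ C(w_i) > 0, \\[6pt]
  \varepsilon \, \alpha(w_{i-1}) \, \dfrac{1}{\lvert V_0 \rvert}
    & C(w_i) = 0,
\end{cases}
\]
% C(.) are training counts, d_r the Good-Turing discount used by Katz,
% alpha(w_{i-1}) the backoff weight, and V_0 the set of vocabulary words
% unseen in training; alpha must be renormalized so the branches sum to one.
```

The middle and bottom branches carry the idea stated in the abstract: the backoff estimate for unseen bigrams with a seen last word is slightly discounted, and the freed mass keeps the bottom branch nonzero instead of assigning probability zero.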
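Perplexity, the evaluation measure used throughout the abstract, is the exponential of the average negative log-probability the model assigns to the test bigrams. A self-contained illustration in Python, using add-one smoothing purely as a stand-in probability model (the thesis itself uses the enhanced Katz / Kneser-Ney methods, and a year of CTS news rather than the toy corpora below):

```python
import math
from collections import Counter

def train_addone_bigram(tokens, vocab_size):
    """Add-one smoothed bigram model -- a stand-in only; the thesis
    uses enhanced Katz / Kneser-Ney smoothing instead."""
    unigrams = Counter(tokens[:-1])            # counts of history words
    bigrams = Counter(zip(tokens, tokens[1:])) # counts of word pairs
    def prob(prev, word):
        # Add-one smoothing keeps every bigram probability nonzero,
        # so the log below is always defined.
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return prob

def perplexity(test_tokens, prob):
    """exp of the mean negative log-probability over the test bigrams;
    lower values mean the model predicts the test text better."""
    nll = sum(-math.log(prob(p, w))
              for p, w in zip(test_tokens, test_tokens[1:]))
    return math.exp(nll / (len(test_tokens) - 1))

# Toy word-segmented corpora for illustration only.
train = "我們 收集 新聞 語料 訓練 語言 模型".split()
test = "我們 收集 語言 模型".split()
model = train_addone_bigram(train, vocab_size=len(set(train)) + 1)  # +1 for <unk>
print(f"perplexity = {perplexity(test, model):.2f}")
```

Lower perplexity means the model is less "surprised" by held-out text, which is why the 6.65- and 4.50-unit reductions reported in the abstract indicate better language models.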
Appears in Collections: Thesis


Files in This Item:

  1. 754901.pdf

If the file is a zip archive, download and unzip it, then open index.html in the extracted folder with a browser to view the full text.