Title: A Mandarin Speech Recognition System Using Weighted Finite-State Transducer (使用有限狀態轉換器之漢語語音辨認系統)
Authors: Lin, Ang-Hsing (林昂星)
Wang, Yih-Ru (王逸如)
Institute of Communications Engineering (電信工程研究所)
Keywords: finite-state machine; Mandarin speech recognition system; hierarchical language model; WFST
Issue Date: 2011
Abstract: This thesis examines how to build a Mandarin speech recognition system (MSRS) with weighted finite-state transducers (WFSTs). It first introduces the relevant WFST algorithms, then shows how speech models at different levels can be represented as finite-state machine (FSM) graphs and integrated into an MSRS. The recognition experiments show that about 55% of word recognition errors are caused by out-of-vocabulary (OOV) words, and that each OOV word causes 2.4 word errors on average. Statistics further show that personal names account for about 30% of OOV words, of which three-character Chinese names (surname plus given name) account for about 23%. To reduce the word errors that OOV words cause, we introduce a hierarchical language model and train a personal-name model to help lower the word error rate (WER). The experiments use the TCC300 corpus, which contains read-speech long sentences, as test data. Two-stage recognition with HTK gives a WER of 13.76%; the WFST system reaches the same WER at RT = 13, with recognition about 15 times faster than conventional HTK recognition. In addition, placing an OOV personal-name model built at the language-model layer into the recognition system reduces the WER by about 0.12%.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079913551
http://hdl.handle.net/11536/49330
Appears in Collections: Thesis


Files in This Item:

  1. 355102.pdf

If the file is a zip archive, download and unzip it, then open index.html in the extracted folder with a browser to view the full text.