Large Vocabulary Continuous Mandarin Speech Recognition Using Weighted Finite-State Transducer

Title:	以加權有限狀態轉換器實現中文連續語音辨認 Large Vocabulary Continuous Mandarin Speech Recognition Using Weighted Finite-State Transducer
Author:	許昱超 Hsu, Yu-Chao; 陳信宏 Chen, Sin-Horng; Institute of Communications Engineering
Keywords:	finite-state machine; large-vocabulary recognition; hierarchical language model; WFST; LVCSR
Issue Date:	2011
Abstract:	This thesis describes a large-vocabulary continuous Mandarin speech recognition (ASR) system built on weighted finite-state transducers (WFSTs). We first introduce the algorithms used to construct WFST graphs and show how models at different levels are represented in WFST form. We then describe a hierarchical language model in which a person-name model is trained from NER labels, to handle Chinese person names, which are often out-of-vocabulary (OOV) words. The hierarchical language model is incorporated into both a one-stage and a two-stage recognition architecture. For the one-stage system, we implement an on-the-fly replace algorithm that expands the WFST graph as needed and reduces memory use, so that a more complex hierarchical model can be used to score Chinese person names in a single pass. We evaluate the approach on read speech from the TCC-300 corpus, which consists of long, paragraph-length utterances. With the hierarchical language model, one-stage recognition reaches a word error rate of 26.26%, an absolute improvement of 0.38%; two-stage recognition with rescoring lowers the word error rate to as low as 23.73%, an absolute improvement of up to 2.91%.
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT079813543 http://hdl.handle.net/11536/47028
Appears in Collections:	Thesis

Files in This Item:

354302.pdf
