Title: 用於中文語音辨認的圖形搜尋法之研究
The Application of Graph Search Algorithm in Mandarin Speech Recognition
Author: 王志明
Chi-Ming Wang
陳信宏
Dr. Sin-Horng Chen
Institute of Communications Engineering
Keywords: Mandarin Speech Recognition; Graph Search Algorithm; syllable bigram model; syllable graph; POS bigram model; lexicon-tree
Date Issued: 1998
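Abstract: This thesis focuses on the search algorithms used in a continuous Mandarin speech recognition system. From the standpoint of system implementation, we divide speech recognition into two steps: acoustic recognition and linguistic decoding. For acoustic recognition, we adopt conventional HMM models, add information from a syllable bigram model, and design a graph search algorithm that finds all possible syllable combinations. This yields a syllable graph with a syllable recognition rate of 80.8% and an inclusion rate above 91%. For linguistic decoding, we compile a lexicon of 110,000 words into a lexicon tree, add a word unigram model and a part-of-speech (POS) bigram model, and design a Viterbi search algorithm. Taking the syllable graph produced by acoustic recognition as input, the system achieves a character recognition rate of 64%, which is also confirmed to be 2% higher than that of the lattice method.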
In this thesis, we discuss a graph search algorithm for the acoustic decoding stage of large-vocabulary continuous Mandarin speech recognition. The algorithm uses sub-syllable HMM models and a syllable bigram model to calculate syllable-pair likelihood scores and to construct a syllable graph for each test utterance. In lexical decoding, the syllable graph is combined with a language model for speech-to-text conversion. The language model combines a word unigram model with a part-of-speech (POS) bigram model, and a lexicon tree containing about 110,000 words is used. The effectiveness of the proposed method was examined by simulation on a speaker-dependent speech-to-text task using a database provided by Chunghwa Telecommunication Laboratories (TL). The syllable graph search algorithm achieved a syllable recognition rate of 80.84%. The syllable inclusion rate was 91% for the top-300 syllable sequences. The character accuracy rate was 64%, which was 2% higher than that of the lattice-search method on the same test data. Experimental results confirmed that the proposed syllable graph search algorithm outperformed the conventional lattice search algorithm in both recognition performance and computational complexity.
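The lexical decoding step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes a single known syllable sequence as input instead of a syllable graph, scores words with unigram log probabilities only (the POS bigram model is omitted), and uses a toy lexicon whose entries, pinyin-style syllable labels, and probabilities are all hypothetical. The function names `build_lexicon_tree` and `decode` are likewise invented for illustration.

```python
import math

def build_lexicon_tree(lexicon):
    """Build a trie keyed by syllables; a node lists the words ending there."""
    root = {"children": {}, "words": []}
    for word, syllables, prob in lexicon:
        node = root
        for syl in syllables:
            node = node["children"].setdefault(syl, {"children": {}, "words": []})
        node["words"].append((word, math.log(prob)))
    return root

def decode(syllables, root):
    """Viterbi-style dynamic program over word boundaries.

    best[i] holds (log score, word list) for the best segmentation of
    syllables[:i], or None if no segmentation reaches position i.
    """
    n = len(syllables)
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for start in range(n):
        if best[start] is None:
            continue
        node = root
        # Walk the lexicon tree along the syllables that follow `start`.
        for end in range(start, n):
            node = node["children"].get(syllables[end])
            if node is None:
                break
            for word, logp in node["words"]:
                score = best[start][0] + logp
                if best[end + 1] is None or score > best[end + 1][0]:
                    best[end + 1] = (score, best[start][1] + [word])
    return best[n]

# Hypothetical toy lexicon: (word, syllable tuple, unigram probability).
lexicon = [
    ("語音", ("yu3", "yin1"), 0.02),
    ("辨認", ("bian4", "ren4"), 0.01),
    ("語", ("yu3",), 0.005),
    ("音", ("yin1",), 0.005),
]
tree = build_lexicon_tree(lexicon)
result = decode(("yu3", "yin1", "bian4", "ren4"), tree)
```

In this toy example the two-word segmentation scores higher than the segmentation that splits the first word into single characters, because multiplying more small unigram probabilities lowers the total score. Sharing syllable prefixes in the tree means each start position walks the lexicon once rather than testing every word separately.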
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT870435050
http://hdl.handle.net/11536/64509
Appears in Collections: Thesis