Title: 利用連續型隱藏式馬可夫模型建立的小字庫語者調適型辨識系統
A Small-Vocabulary Speaker-Adaptive Speech Recognizer Based on Continuous-Density HMM
Authors: 蔡明況
Tsai Ming-Kuang
劉啟民
Liu Chi-Min
Institute of Computer Science and Engineering
Keywords: Speaker-Adaptive (語者調適); Hidden Markov Model (隱藏式馬可夫模型); Bayesian Adaptation (貝氏調適); Parameter Sharing (參數共享)
Issue Date: 1993
Abstract: A speaker-dependent speech recognition system achieves high recognition rates, but applying it to a new speaker requires a large amount of speaker-specific training speech and training time. A speaker-independent (or multi-speaker) system needs no further training data from a new speaker beyond the data used to build the system, but its recognition rate is generally lower. A speaker-adaptive system uses the knowledge contained in a well-trained reference system, so that a small amount of training data from the new speaker is enough to approach the recognition rate of a speaker-dependent system. In this thesis, we study speaker adaptation techniques on a small isolated-word vocabulary consisting of the 10 Mandarin digits '0'-'9' and the 26 English letters 'A'-'Z'. Because the Mandarin digit '1' and the English letter 'E' are pronounced the same, they are treated as a single word, giving 35 words in total. Using the continuous density hidden Markov model (CDHMM), we build three speaker-dependent systems with an average recognition rate of 93.9% and a multi-speaker system with a recognition rate of 83.2%. Bayesian adaptation has already been applied with good success to speaker-independent and multi-speaker recognizers built on CDHMM. We therefore take the multi-speaker system as the reference system and integrate Bayesian adaptation into the segmental k-means training procedure to train the speaker-adaptive system.
The experimental results show that, with only one training token per word, the adapted system outperforms the multi-speaker system. With three training tokens per word, the recognition rate comes close to that of the speaker-dependent system. In addition, we give special treatment to the confusable words of the E-set, using parameter sharing to reduce the recognition error rate.
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT820392067
http://hdl.handle.net/11536/57875
Appears in Collections: Thesis