Title: 利用半連續型隱藏式馬可夫模型建立的語者調適之中文語音辨識系統
Speaker Adaptation for Mandarin Syllable Recognition Based on Semi-Continuous Density HMM
Authors: 陳健宏
Chien-Hung Chen
劉啟民
Dr. Chi-Min Liu
Institute of Computer Science and Engineering
Keywords: speaker adaptation; Bayesian adaptation; semi-continuous hidden Markov model (SCHMM); codebook; mixture weights
Issue Date: 1994
Abstract: A speaker-dependent speech recognition system achieves a high recognition rate, but applying it to a new speaker requires a large amount of speaker-specific training data and time. A speaker-independent (or multi-speaker) system needs no training data from a new speaker beyond the data used to build it, but its recognition rate is generally low. A speaker-adaptive system exploits the knowledge in a well-trained reference system, so that a small amount of training data from the new speaker is sufficient to approach the performance of a speaker-dependent system. In this thesis, we study speaker adaptation techniques for Mandarin speech. The vocabulary consists of 76 syllables, formed from 4 FINALs and 19 INITIALs (4 x 19 = 76) drawn from the confusable sets of Mandarin syllables. As reference systems, we build speaker-dependent and speaker-independent recognizers based on the semi-continuous density hidden Markov model (SCHMM); the speaker-dependent system has an average recognition rate of 90.46% and the speaker-independent system 58.97%. Because Bayesian adaptation has already been applied successfully to HMM-based speaker-independent and multi-speaker systems, we take the speaker-independent system as the base reference system and train the speaker-adaptive system with Bayesian adaptation combined with the forward-backward procedure. Within the SCHMM, the adaptation adjusts the codebooks, the mixture weights, and the transition probabilities.
Experimental results show that, with only one training token, the adaptation procedure outperforms the speaker-independent system, raising the recognition rate from 58.97% to 76.65%. With 3 training tokens, the recognition rate approaches that of the speaker-dependent system, and with 6 training tokens it exceeds that of the speaker-dependent system.
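The adaptation step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's update equations, so the code below assumes a commonly used MAP-style interpolation between the speaker-independent prior and statistics collected from the new speaker; it is not the thesis's implementation. The function names, the prior weight `tau`, and all numbers are hypothetical. The occupancy counts are assumed to come from a forward-backward pass over the adaptation data, which is not shown here.

```python
def map_adapt_weights(prior_weights, occupancy, tau=10.0):
    """MAP-style update of one state's mixture weights (illustrative assumption).

    prior_weights: speaker-independent mixture weights for the state (sum to 1).
    occupancy:     soft counts per codebook entry from the new speaker's data.
    tau:           hypothetical prior weight; larger values trust the prior more.
    """
    total = tau + sum(occupancy)
    return [(tau * w + n) / total for w, n in zip(prior_weights, occupancy)]


def map_adapt_mean(prior_mean, occupancy, weighted_sum, tau=10.0):
    """MAP-style update of one codebook mean vector (illustrative assumption).

    prior_mean:   speaker-independent mean of the codebook entry.
    occupancy:    total soft count assigned to this entry.
    weighted_sum: sum over frames of (soft count * feature vector).
    """
    return [(tau * m + s) / (tau + occupancy)
            for m, s in zip(prior_mean, weighted_sum)]


# Toy numbers, purely for illustration.
adapted_weights = map_adapt_weights([0.5, 0.3, 0.2], [4.0, 1.0, 5.0])
adapted_mean = map_adapt_mean([0.0, 2.0], 10.0, [10.0, 10.0])
```

With no adaptation data the estimates stay at the speaker-independent values, and as the amount of data grows they move toward the new speaker's own statistics. This matches the qualitative trend reported above, where more training tokens bring the adapted system closer to the speaker-dependent one.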
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT830392064
http://hdl.handle.net/11536/58988
Appears in Collections: Thesis