Title: A Speaker-Adaptive Mandarin Speech Recognition System Built on Semi-Continuous Hidden Markov Models
Speaker Adaptation for Mandarin Syllable Recognition Based on Semi-Continuous Density HMM
Authors: Chien-Hung Chen (陳健宏); Dr. Chi-Min Liu (劉啟民)
Institute of Computer Science and Engineering
Keywords: speaker adaptation; speaker-adaptive; Bayesian adaptation; semi-continuous hidden Markov model (SCHMM); codebook; mixture weights
Issue Date: 1994
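Abstract: Speaker-dependent speech recognition systems achieve high
recognition rates, but applying one to a new speaker requires a
large amount of training speech and time. Speaker-independent or
multi-speaker systems, once built, need no new training speech for
a new speaker, but their recognition rates are generally low. A
speaker-adaptive system uses the knowledge already captured in a
well-trained reference system and, with only a small amount of
training speech from the new speaker, can approach the recognition
rate of a speaker-dependent system. This thesis studies speaker
adaptation on 76 Mandarin syllables: the confusable syllables
formed by combining 4 FINALs with 19 INITIALs (4 x 19 = 76). Using
semi-continuous density hidden Markov models (SCHMM), we build a
speaker-dependent system with a recognition rate of 90.46% and a
speaker-independent system with a recognition rate of 58.97%.
Because Bayesian adaptation has been applied successfully to many
speaker-independent and multi-speaker systems built with HMMs, we
take a speaker-independent system as the base reference system and
train the speaker-adaptive system with a Bayesian adaptation
algorithm based on the forward-backward procedure. Within the
SCHMM, we use it to adapt the codebook, the mixture weights, and
the transition probabilities. The experimental results show that
adapting with only one set of syllable tokens already outperforms
the speaker-independent system, raising the recognition rate from
58.97% to 76.65%. Adapting with 3 sets brings the recognition rate
close to that of the speaker-dependent system, and adapting with 6
sets surpasses it.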
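As a rough illustration of the SCHMM named in the abstract, the Python sketch below computes each state's output probability as a mixture over one Gaussian codebook shared by all states, then runs a forward-backward pass to obtain the state/codeword posteriors and transition posteriors that forward-backward training accumulates. The function names (`gauss_diag`, `forward_backward`), the diagonal covariances, and the lack of scaling are assumptions of this sketch, not details given in the thesis.

```python
import math


def gauss_diag(o, mean, var):
    """Density of a diagonal-covariance Gaussian evaluated at feature vector o."""
    log_p = -0.5 * sum(math.log(2.0 * math.pi * v) + (x - m) ** 2 / v
                       for x, m, v in zip(o, mean, var))
    return math.exp(log_p)


def forward_backward(obs, pi, A, codebook, c):
    """Unscaled forward-backward pass for a semi-continuous HMM (short inputs only).

    obs            -- list of feature vectors o_t
    pi[i], A[i][j] -- initial-state and transition probabilities
    codebook       -- list of (mean, var) Gaussians shared by every state
    c[j][k]        -- mixture weight of codeword k in state j
    Returns (P(O), gamma, xi) where
      gamma[t][j][k] = P(state j and codeword k at time t | O)
      xi[t][i][j]    = P(state i at t, state j at t+1 | O)
    """
    T, N, K = len(obs), len(pi), len(codebook)
    # Codeword densities are computed once per frame and shared by all states.
    f = [[gauss_diag(o, m, v) for (m, v) in codebook] for o in obs]
    # SCHMM output probability: b_j(o_t) = sum_k c_jk * f_k(o_t).
    b = [[sum(c[j][k] * f[t][k] for k in range(K)) for j in range(N)]
         for t in range(T)]

    alpha = [[0.0] * N for _ in range(T)]
    alpha[0] = [pi[j] * b[0][j] for j in range(N)]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * b[t][j]

    beta = [[0.0] * N for _ in range(T)]
    beta[T - 1] = [1.0] * N
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * b[t + 1][j] * beta[t + 1][j] for j in range(N))

    p_obs = sum(alpha[T - 1])

    gamma = [[[0.0] * K for _ in range(N)] for _ in range(T)]
    for t in range(T):
        for j in range(N):
            if b[t][j] == 0.0:
                continue
            state_post = alpha[t][j] * beta[t][j] / p_obs
            for k in range(K):
                # Split the state posterior across codewords in proportion to c_jk * f_k.
                gamma[t][j][k] = state_post * c[j][k] * f[t][k] / b[t][j]

    xi = [[[alpha[t][i] * A[i][j] * b[t + 1][j] * beta[t + 1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    return p_obs, gamma, xi
```

Summing `gamma` over frames gives soft counts per state and codeword, and summing `xi` gives soft transition counts; these are the statistics a codebook, mixture-weight, or transition update would consume. Practical recognizers scale alpha and beta or work in the log domain, because the unscaled products above underflow on long utterances.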
A speaker-dependent speech recognition system achieves a high
recognition rate, but it needs a lot of speaker-specific
training data. A speaker-independent (or multi-speaker) system
needs no training data from new speakers, but it usually cannot
achieve satisfactory performance. A speaker-adaptive system uses
the existing knowledge of a reliably trained reference system,
so that a small amount of training data from a new speaker is
sufficient to reach the performance of a speaker-dependent
system. In this thesis, we apply speaker adaptation techniques
to Mandarin speech. The vocabulary we study has 76 syllables,
formed from 19 INITIALs and 4 FINALs taken from confusable sets
of Mandarin syllables. As reference systems for speaker
adaptation, we build speaker-dependent and speaker-independent
systems based on the semi-continuous density hidden Markov model
(SCHMM). The speaker-dependent system has an average recognition
rate of 90.46% and the speaker-independent system 58.97%. On the
basis of the two reference systems, we study Bayesian adaptation
techniques with the forward-backward training procedure, applying
them to adjust the codebooks, mixture weights, and transition
probabilities of the SCHMM. Experimental results show that with
only one training token the adaptation procedure already
outperforms the speaker-independent system, raising the
recognition rate from 58.97% to 76.65%. With 3 training tokens,
the recognition rate approaches that of the speaker-dependent
system, and with 6 training tokens it exceeds that of the
speaker-dependent system.
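The adaptation step can be sketched as maximum a posteriori (MAP) re-estimation: each speaker-independent parameter serves as a prior and is interpolated with soft counts accumulated from the new speaker's tokens by forward-backward. The Dirichlet prior for weights and transitions, the Gaussian prior for codeword means, the single prior weight `tau`, and the function names `map_weights` and `map_mean` are illustrative assumptions; the abstract does not state which priors or hyperparameters the thesis used.

```python
def map_weights(prior, counts, tau):
    """MAP estimate of a discrete distribution under a Dirichlet prior.

    Usable for one state's mixture weights (counts[k] = sum_t gamma_t(j, k))
    or for one row of the transition matrix (counts[j] = sum_t xi_t(i, j)).
    The Dirichlet hyperparameters are assumed to be tau * prior[k] + 1, so the
    MAP mode is (tau * prior[k] + counts[k]) / (tau + sum(counts)).
    tau > 0 sets how strongly the speaker-independent prior is trusted.
    """
    total = tau + sum(counts)
    return [(tau * p + n) / total for p, n in zip(prior, counts)]


def map_mean(prior_mean, occ, weighted_sum, tau):
    """MAP estimate of one shared codeword mean under a Gaussian prior.

    occ             -- sum over frames of the codeword posterior gamma_t(k)
    weighted_sum[d] -- sum over frames of gamma_t(k) * o_t[d]
    The result interpolates the speaker-independent mean with the new
    speaker's data mean, weighted by tau and occ respectively.
    """
    return [(tau * m + s) / (tau + occ) for m, s in zip(prior_mean, weighted_sum)]


# Example with made-up soft counts for one state's three codeword weights.
adapted = map_weights([0.5, 0.3, 0.2], [1.0, 4.0, 0.0], tau=5.0)
# adapted is approximately [0.35, 0.55, 0.1]
```

With few adaptation frames the estimates stay close to the speaker-independent values, and as the counts grow they approach the maximum-likelihood estimates from the new speaker's data alone. This behavior is what lets a small number of training tokens improve on the reference system without overfitting.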
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT830392064
http://hdl.handle.net/11536/58988
Appears in Collections: Thesis