Title: 高斯混合模型在語者辨識與國語語音辨認之應用
The Applications of GMM in Speaker Identification and Mandarin-Speech Recognition
Authors: 范世明
Shih-Ming Fan
王逸如
Yih-Ru Wang
Institute of Communications Engineering
Keywords: Gaussian mixture model (GMM); speaker identification; speech recognition
Issue Date: 2001
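Abstract: In this thesis, we improve on the conventional speaker model by training a two-state Markov chain as the upper layer of the GMM for speaker identification, and we train the speaker models with minimum-classification-error discriminative training to obtain a more accurate distribution of the speaker feature parameters. The experimental results show that the length of the training data affects model accuracy, and that a speaker identification system built on the initial-final model outperforms the conventional approach when the number of Gaussian mixtures is small or the test utterance is short. Finally, we apply the GMM to Mandarin speech recognition with speaker-normalized HMMs, to reduce the system complexity and computation of the syllable recognizer in the standard VTN approach, and we estimate the best warping factor from test data of different lengths to shorten the estimation time. The results show that the length of the test data affects the syllable recognition rate. Although the recognition rate is slightly lower than that of the standard VTN approach, system complexity, computation, and recognition time are all greatly reduced.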
In this thesis, applications of the Gaussian mixture model (GMM) to both speaker identification and speech recognition are studied. For speaker identification, a conventional GMM-based system was implemented first. A two-state Markov chain was then added as an upper layer above the GMM identifier to model the initial-final structure of Mandarin speech and improve system performance. Finally, generalized probabilistic descent (GPD) was used to retrain the system under the minimum classification error (MCE) criterion. In experiments, the proposed method reduced the recognition error rate by 10-20%. For speech recognition, the GMM was used to find the warping factor for the vocal tract normalization (VTN) method. The experiments show that only a few seconds of speech data are needed to estimate the warping factor. Although the recognition rate of the proposed system is slightly degraded, the complexity of the recognition system is significantly reduced.
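To illustrate the conventional GMM identifier mentioned in the abstract, here is a minimal Python sketch of maximum-likelihood speaker identification. It is not the thesis implementation. The function names, the diagonal-covariance assumption, and the toy two-dimensional models are all hypothetical and chosen only for illustration. Each speaker is represented by one GMM, and the test frames are assigned to the speaker whose model gives the highest total log-likelihood.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    # Log-sum-exp over mixture components for numerical stability.
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def identify_speaker(frames, speaker_gmms):
    """Return (best speaker, per-speaker scores) by total log-likelihood."""
    scores = {
        name: sum(gmm_frame_loglik(x, gmm) for x in frames)
        for name, gmm in speaker_gmms.items()
    }
    return max(scores, key=scores.get), scores

# Toy 2-D speaker models (made-up values, not taken from the thesis).
speaker_gmms = {
    "spk_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "spk_b": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])],
}
test_frames = [[0.2, -0.1], [0.9, 1.2], [0.5, 0.4]]

best, scores = identify_speaker(test_frames, speaker_gmms)
print(best)  # prints "spk_a"
```

Summing frame log-likelihoods treats the frames as independent, which is the usual simplification in a plain GMM identifier. The two-state Markov chain described in the abstract would add structure above this per-frame scoring and is not modeled in the sketch.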
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT900435056
http://hdl.handle.net/11536/68932
Appears in Collections: Thesis