Title: | 國語連續音節辨認系統之電話通道語者效應偏移量移除與分析 / Speaker-Based Telephone Channel Bias Remove Analysis with Continuous Mandarin Speech Recognition Method |
Authors: | 葉人鳳 (Jen-Feng Yeh); 陳信宏 (Dr. Sin-Horng Chen); Institute of Communications Engineering |
Keywords: | channel/speaker bias; Right-Context Dependent initial HMM; Context Dependent HMM; HMM-based Bias Removing |
Issue Date: | 2002 |
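Abstract: | This thesis focuses on improving the HMMs used in the training phase. The goal is to model the speech signal more accurately so that the channel/speaker bias can be estimated and removed, yielding better-normalized feature parameters. Given the HMM segmentation of the speech in the corpus, we apply HMM-based Bias Removing as a preliminary adaptation of the feature parameters. We observe that removing the bias clearly raises the F-ratio of the features, and that the distributions of the trained HMM state models are more compact than before the signal bias is removed. In the experiments, the MAT4500 corpus is split 9:1 into training and test data for outside testing. In the baseline system, each speech model is assigned one Gaussian mixture component per 50 frames, up to a maximum of 32, and the silence model uses 64 mixture components. We train Right-Context Dependent (RCD) initial HMMs and Context Dependent (CD) HMMs, and replace SBR with HMM-based Bias Removing to remove the signal bias. With Signal Model 1, feature normalization by a translation (i.e. a linear transformation), the recognition rate of the intra-syllable RCD-HMM system is 61.23%, roughly 1% higher than the syllable recognition rate of the baseline (SBR) system. With Signal Model 2, feature normalization by an affine transformation (i.e. a linear transformation), the recognition rate of the intra-syllable RCD-HMM system rises from 60.17% to 65.71%, and that of the CD-HMM system rises from 62.56% to 67.96%. We further separate the speech signal into initial, final, and silence parts, estimate a transformation matrix and vector for each of these three classes, and normalize the features class by class. Under Signal Model 2, the recognition rate of the intra-syllable RCD-HMM system then rises from 60.17% to 73.31%, and that of the CD-HMM system from 62.56% to 76.56%. We conclude that, when the HMM segmentation of the speech signal is known, HMM-based Bias Removing improves the performance of the recognition system. In this thesis, methods for improving the robustness and accuracy of features using speaker-based feature normalization are described. A continuous mixture-Gaussian hidden Markov model (HMM)-based Mandarin speech recognition system is constructed using the MAT4500 database. In the HMM training procedure, a bias estimated from the HMM segmentation is used to make each HMM more compact. In addition, a series of comparisons between the SBR and HMM-based biases is carried out. With the HMM segmentation given, the recognition rate obtained by applying speaker-based feature normalization in the system of 100 RCD initial and 40 CI final HMMs is 65.71%, higher than the 60.17% of the typical SBR method. The accuracy of the Context Dependent HMM system likewise rises from 62.56% to 67.96%. Furthermore, features are classified into initial, final, and silence classes; with this classification, an accuracy of 73.31% is achieved in the intra-syllable RCD-HMM system and 76.56% in the CD-HMM system. |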
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT910435062 http://hdl.handle.net/11536/70595 |
Appears in Collections: | Thesis |
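
The translation-based normalization (Signal Model 1) and the per-class normalization described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's estimator, so the sketch assumes the simplest possible one: the bias is the average difference between each frame and the mean of the HMM state it is aligned to. All names here (`estimate_bias`, `remove_bias`, `remove_bias_by_class`, `state_means`, `alignment`, `state_class`) are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def estimate_bias(frames, state_means, alignment):
    """Average residual between each frame and the mean of its aligned state.

    Assumption: a plain unweighted mean, not the thesis's actual estimator.
    """
    dim = len(frames[0])
    total = [0.0] * dim
    for x, s in zip(frames, alignment):
        mu = state_means[s]
        for d in range(dim):
            total[d] += x[d] - mu[d]
    return [t / len(frames) for t in total]


def remove_bias(frames, bias):
    """Translate every frame by subtracting the estimated bias vector."""
    return [[x[d] - bias[d] for d in range(len(bias))] for x in frames]


def remove_bias_by_class(frames, state_means, alignment, state_class):
    """Estimate and remove a separate bias per class (e.g. initial/final/silence)."""
    groups = defaultdict(list)
    for i, s in enumerate(alignment):
        groups[state_class[s]].append(i)
    out = [None] * len(frames)
    for idxs in groups.values():
        sub_frames = [frames[i] for i in idxs]
        sub_align = [alignment[i] for i in idxs]
        bias = estimate_bias(sub_frames, state_means, sub_align)
        for i, y in zip(idxs, remove_bias(sub_frames, bias)):
            out[i] = y
    return out
```

The sketch covers only a translation. The affine case (Signal Model 2) would additionally need a transformation matrix per speaker or per class, and the abstract does not say how that matrix is estimated, so it is left out.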