Speaker-Based Telephone Channel Bias Remove Analysis with Continuous Mandarin Speech Recognition Method

Title:	國語連續音節辨認系統之電話通道語者效應偏移量移除與分析 Speaker-Based Telephone Channel Bias Remove Analysis with Continuous Mandarin Speech Recognition Method
Authors:	葉人鳳 Jen-Feng Yeh; 陳信宏 Dr. Sin-Horng Chen; Institute of Communications Engineering
Keywords:	channel/speaker bias; Right-Context Dependent initial HMM; Context Dependent HMM; HMM-based Bias Removing
Issue Date:	2002
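Abstract:	This thesis focuses on improving the HMMs used in the training phase. The aim is to model the speech signal well enough to estimate the channel/speaker bias and remove it, yielding better-normalized feature parameters. Given the HMM segmentation of the speech in the corpus, we apply HMM-based Bias Removing as a first-pass adaptation of the features. We observe that removing the bias clearly raises the F-ratio, and that the distributions of the trained HMM states are more compact than before bias removal. In the experiments, the MAT4500 corpus is split 9:1 into training and test data for outside testing. In the baseline system, each speech model is assigned one Gaussian mixture component per 50 frames, up to a maximum of 32, and the silence model uses 64 mixture components. We train Right-Context Dependent (RCD) initial HMMs and Context Dependent (CD) HMMs, and replace SBR with HMM-based Bias Removing to remove the signal bias. With Signal Model 1, which normalizes the features by a translation (i.e., a linear transformation), the intra-syllable RCD-HMM system reaches a recognition rate of 61.23%, about 1% higher than the syllable recognition rate of the baseline (SBR) system. With Signal Model 2, which normalizes the features by an affine transformation (i.e., a linear transformation), the recognition rate of the intra-syllable RCD-HMM system rises from 60.17% to 65.71%, and that of the CD-HMM system from 62.56% to 67.96%. Going a step further, we separate the speech signal into initial, final, and silence portions, estimate a transformation matrix and vector for each of these three classes, and normalize the features class by class. Under Signal Model 2, the recognition rate of the intra-syllable RCD-HMM system rises from 60.17% to 73.31%, and that of the CD-HMM system from 62.56% to 76.56%. Therefore, given the HMM segmentation of the speech signal, HMM-based Bias Removing improves the performance of the recognition system. This thesis describes methods for improving the robustness and accuracy of features through speaker-based feature normalization. A continuous Gaussian-mixture hidden Markov model (HMM)-based Mandarin speech recognition system is built on the MAT4500 database. During HMM training, a bias estimated from the HMM segmentation is used to make each HMM more compact. The SBR and HMM-based biases are also compared in a series of studies. With the HMM segmentation available, speaker-based feature normalization achieves a recognition rate of 65.71% in the system of 100 RCD initial HMMs and 40 CI final HMMs, compared with 60.17% for the typical SBR method. In the Context Dependent HMM system, the accuracy rises from 62.56% to 67.96%. Furthermore, when features are classified as initial, final, or silence, the accuracy reaches 73.31% in the intra-syllable RCD-HMM system and 76.56% in the CD-HMM system.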
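The abstract does not give the estimation formulas, so the following is only a minimal sketch of the two ideas it names: the mixture-count rule (one component per 50 frames, capped at 32) and a translation-only bias removal in the spirit of Signal Model 1, given a known frame-to-state alignment. Everything here is an assumption for illustration: the function names, the floor division and the minimum of one component, and the choice to estimate the bias as the plain average of the residuals between each frame and the mean of its aligned state (which ignores state variances). It is not the thesis's actual procedure.

```python
def num_mixtures(n_frames, frames_per_mixture=50, max_mixtures=32):
    """Mixture count for one speech model: one per 50 frames, capped at 32.

    Floor division and the lower bound of 1 are assumptions of this sketch.
    """
    return max(1, min(max_mixtures, n_frames // frames_per_mixture))


def estimate_bias(frames, alignment, state_means):
    """Estimate one bias vector for a speaker (hypothetical helper).

    frames      -- list of feature vectors (lists of floats)
    alignment   -- state label for each frame, from a known segmentation
    state_means -- dict mapping state label to its mean vector
    The bias is the average of (frame - mean of the aligned state).
    """
    dim = len(frames[0])
    total = [0.0] * dim
    for frame, state in zip(frames, alignment):
        mean = state_means[state]
        for d in range(dim):
            total[d] += frame[d] - mean[d]
    return [t / len(frames) for t in total]


def remove_bias(frames, bias):
    """Translate every frame by subtracting the estimated bias."""
    return [[x - b for x, b in zip(frame, bias)] for frame in frames]


# Toy data: frames are the state means shifted by a constant bias.
state_means = {"a": [1.0, 2.0], "b": [3.0, -1.0]}
alignment = ["a", "a", "b", "b"]
true_bias = [0.5, -0.25]
frames = [[m + b for m, b in zip(state_means[s], true_bias)] for s in alignment]

bias = estimate_bias(frames, alignment, state_means)
normalized = remove_bias(frames, bias)
```

On this toy data the estimated bias recovers the constant shift, and the normalized frames coincide with the state means. An affine variant in the spirit of Signal Model 2, or separate transforms for initial, final, and silence frames, would replace the single subtraction with a per-class matrix and vector.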
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT910435062 http://hdl.handle.net/11536/70595
Appears in Collections:	Thesis