Title: 使用特徵參數轉換之語音辨認與語者調適研究
A Study on Feature Transformation for Speaker-Adapted Speech Recognition
Authors: 唐嘉俊
Chia-Chun Tang
王逸如
Yih-Ru Wang
Institute of Communications Engineering
Keywords: Feature transformation; Speaker adaptation; Minimum mean-squared error; Maximum likelihood
Issue Date: 2003
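Abstract: This thesis investigates how feature transformation affects speaker-adapted speech recognition. The transformation formulas are derived under two criteria, minimum mean-squared error and maximum likelihood. The MAT4500 corpus is split 9:1 into training and test data, and the long sentences in the test data serve as adaptation data. The experiments measure the upper bound of the recognition rate, as well as the recognition rate when the adaptation data range from one sentence (4 seconds) to eight sentences (about 37 seconds). Finally, for the case where a separate transformation function is estimated for each class, the thesis analyzes how the upper-bound recognition rate changes and how the amount of adaptation data affects the classes. In summary, feature transformation can effectively remove speaker/channel differences and yield more precise HMM models. The more parameters the transformation function has, the higher the upper-bound recognition rate, but the worse the results with a small amount of adaptation data. This indicates that, when adaptation data are limited, the parameter estimates may lose accuracy and fail to achieve the adaptation effect.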
In this thesis, the effect of feature transformation on speaker-adapted speech recognition is explored. Two criteria, minimum mean-squared error and maximum likelihood, are employed to formulate the feature transformation algorithm. In addition, the approach of using a different transformation for each of three broad speech classes (initial, final, and silence) is studied. The effectiveness of the proposed method was examined by simulations using the MAT4500 telephone speech database, with 9/10 of the data for training and 1/10 for testing. Sentential utterances were used in the speaker adaptation test. The amount of adaptation data ranged from one utterance (4 seconds) to eight utterances (about 37 seconds). Experimental results showed that the proposed feature transformation method can eliminate the speaker/channel effect and thereby make the HMM models more compact. We also found that, as more transformation parameters were used, the upper bound of the recognition rate rose while the adaptation effect became worse for small amounts of adaptation data. This mainly resulted from inaccurate parameter estimation when insufficient adaptation data were used.
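As a rough illustration of the minimum mean-squared error criterion named in the abstract, the following Python sketch fits a per-dimension scale and bias that map adaptation feature frames onto reference vectors. It is not the derivation used in the thesis: the diagonal affine form, the function names fit_diagonal_affine and apply_transform, and the idea of pairing each frame with one reference vector (for example, the mean of the HMM state it is aligned to) are all assumptions made for this example.

```python
def fit_diagonal_affine(source, target):
    """Least-squares fit of target[d] ~= a[d] * source[d] + b[d] per dimension.

    source, target: equal-length lists of feature frames (lists of floats).
    The pairing of frames with reference vectors is assumed, not taken
    from the thesis.
    """
    n = len(source)
    dim = len(source[0])
    scales, biases = [], []
    for d in range(dim):
        xs = [frame[d] for frame in source]
        ys = [frame[d] for frame in target]
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        # Fall back to a pure bias shift when the dimension has no variance.
        a = cov_xy / var_x if var_x > 0 else 1.0
        b = mean_y - a * mean_x
        scales.append(a)
        biases.append(b)
    return scales, biases


def apply_transform(frames, scales, biases):
    """Apply the fitted per-dimension scale and bias to every frame."""
    return [[a * x + b for x, a, b in zip(frame, scales, biases)]
            for frame in frames]


# Toy data: two-dimensional frames and hypothetical reference vectors.
source = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
target = [[1.0, -0.5], [3.0, 0.5], [5.0, 1.5]]
scales, biases = fit_diagonal_affine(source, target)
adapted = apply_transform(source, scales, biases)
```

This form has only two parameters per feature dimension, so a few frames are enough to estimate it. A full-matrix transformation would have many more parameters to estimate from the same data, which is the kind of trade-off the abstract reports for small amounts of adaptation data.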
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009113562
http://hdl.handle.net/11536/46490
Appears in Collections: Theses


Files in This Item:

  1. 356201.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.