A Small-Vocabulary Speaker-Adaptive Speech Recognizer Based on Continuous-Density HMM

Title:	A Small-Vocabulary Speaker-Adaptive Speech Recognizer Based on Continuous-Density HMM (利用連續型隱藏式馬可夫模型建立的小字庫語者調適型辨識系統)
Authors:	Tsai Ming-Kuang (蔡明況); Liu Chi-Min (劉啟民); Institute of Computer Science and Engineering
Keywords:	Speaker-Adaptive; Hidden Markov Model; Bayesian Adaptation; Parameter Sharing
Issue Date:	1993
Abstract:	A speaker-dependent speech recognition system achieves high recognition rates, but applying it to a new speaker requires a large amount of speaker-specific training data and time. A speaker-independent (or multi-speaker) system needs no training data from a new speaker beyond the data used to build the system, but its recognition rate is generally lower. A speaker-adaptive system exploits the knowledge contained in a well-trained reference system, so that a small amount of training data from the new speaker is enough to approach the recognition rate of a speaker-dependent system. In this thesis we study speaker adaptation on a small isolated-word vocabulary consisting of the 10 Mandarin digits '0'-'9' and the 26 English letters 'A'-'Z'. Because the Mandarin digit '1' is pronounced the same as the English letter 'E', the two are treated as one word, giving 35 words in total. Using the continuous-density hidden Markov model (CDHMM), we build three speaker-dependent systems with an average recognition rate of 93.9% and a multi-speaker system with a recognition rate of 83.2%. Bayesian adaptation has already been applied successfully to CDHMM-based speaker-independent and multi-speaker recognizers, so we take a multi-speaker system as the reference system and integrate Bayesian adaptation into the segmental k-means training procedure to train the speaker-adaptive system. The experimental results show that with only one training token per word, the adapted system already outperforms the multi-speaker system, and with three training tokens per word its recognition rate approaches that of the speaker-dependent system. In addition, we treat the easily confused words of the E-set separately, using parameter sharing to reduce the recognition error rate.
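The abstract does not give the adaptation formula, so the following is only a minimal sketch of the general idea behind Bayesian (MAP) adaptation of a Gaussian mean: the reference model's mean acts as a prior and is pulled toward the new speaker's data as more frames become available. The function name `map_adapt_mean`, the prior weight `tau`, and the toy numbers are assumptions for illustration, not values or code from the thesis.

```python
def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP-style update of one Gaussian mean vector.

    prior_mean: mean taken from the reference (e.g. multi-speaker) model
    frames:     feature vectors from the new speaker assigned to this Gaussian
    tau:        hypothetical prior weight; larger values trust the prior more
    """
    n = len(frames)
    dim = len(prior_mean)
    sums = [0.0] * dim
    for frame in frames:
        for d in range(dim):
            sums[d] += frame[d]
    # Weighted combination of the prior mean and the adaptation data.
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


# Toy data (assumed): the new speaker's frames all sit at [1.0, 2.0].
prior = [0.0, 0.0]
few_frames = [[1.0, 2.0]] * 5     # e.g. frames from one adaptation token
more_frames = [[1.0, 2.0]] * 15   # e.g. frames from three adaptation tokens

adapted_few = map_adapt_mean(prior, few_frames)
adapted_more = map_adapt_mean(prior, more_frames)
unadapted = map_adapt_mean(prior, [])
```

With no adaptation frames the result equals the prior mean, and with more frames the estimate moves closer to the new speaker's data, which mirrors the trend reported above between one and three training tokens per word. In a segmental k-means setting, `frames` would be the vectors aligned to a given state in the current segmentation.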
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT820392067 http://hdl.handle.net/11536/57875
Appears in Collections:	Thesis

APA	Tsai, M.-K., & Liu, C.-M. (1993). A small-vocabulary speaker-adaptive speech recognizer based on continuous-density HMM [利用連續型隱藏式馬可夫模型建立的小字庫語者調適型辨識系統]. http://hdl.handle.net/11536/57875
Bibtex	@misc{tsai1993, title={A Small-Vocabulary Speaker-Adaptive Speech Recognizer Based on Continuous-Density HMM}, author={Tsai, Ming-Kuang and Liu, Chi-Min}, year={1993}, howpublished={http://hdl.handle.net/11536/57875}, url={https://ir.lib.nycu.edu.tw/handle/11536/57875}, }