Title: | 經由隱藏式馬可夫模型切割之語音辨識及其語者調適技術 Speaker Adaptation of HMM-Segmentation-Based Speech Recognition |
Authors: | 林維芬 Lin, Wei-Fen; 林進燈 Chin-Teng Lin; Institute of Electrical and Control Engineering |
Keywords: | speech recognition; speaker adaptation |
Issue Date: | 1995 |
Abstract: | In this thesis, we propose a speech recognition algorithm based on hidden Markov models (HMMs) and the Viterbi algorithm. The Viterbi algorithm is used to segment the input speech sequence: because the length of a speech signal is not fixed, the segmentation converts the variable-size sequence of feature vectors into a fixed-size feature vector, which we call the TN vector. We then use a fuzzy perceptron to generate hyperplanes that separate the patterns of each class from those of the other classes. With the idea of "supporting patterns", the proposed algorithm extends easily to speaker adaptation. The supporting patterns are the patterns closest to a hyperplane. When a recognition error occurs, the TN vectors obtained by segmenting the input speech sequence with every trained HMM are added to the supporting patterns. These supporting patterns are then used to tune two hyperplanes: the one that produced the misrecognition and the one that should have produced the correct result. Since only two hyperplanes need to be tuned, the proposed adaptation scheme does not take long to terminate and is suitable for on-line adaptation.
When a large database is used to train a speaker-independent system or a large-vocabulary system, vector quantization (VQ) is used to reduce the number of training patterns. Although the adaptation scheme cannot guarantee that the input speech sequence is recognized correctly after adaptation, the hyperplanes are tuned iteratively in the direction of correct recognition, and the speed of adaptation can be adjusted through a user-set "belief" parameter. Experimental results on several examples show that the proposed recognition algorithm and adaptation scheme improve the recognition rate. |
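The two ideas in the abstract, a fixed-size vector derived from a state segmentation and an adaptation step that tunes only two hyperplanes, can be illustrated with a minimal sketch. The abstract does not give the thesis's formulas, so everything below is an assumption: the function names `tn_vector`, `classify`, and `adapt` are hypothetical, the fixed-size vector is built by averaging the frames assigned to each state, the update is a plain perceptron-style rule rather than the fuzzy perceptron, and `belief` is treated simply as a step size.

```python
import numpy as np


def tn_vector(frames, state_path, n_states):
    """Turn a variable-length frame sequence into a fixed-size vector.

    frames: (T, D) feature vectors; state_path: length-T state indices,
    e.g. taken from a Viterbi alignment. Averaging the frames of each
    state is an assumption of this sketch, not the thesis's definition.
    """
    frames = np.asarray(frames, dtype=float)
    state_path = np.asarray(state_path)
    out = np.zeros((n_states, frames.shape[1]))
    for s in range(n_states):
        mask = state_path == s
        if mask.any():
            out[s] = frames[mask].mean(axis=0)
    return out.ravel()  # length n_states * D, independent of T


def classify(W, b, x):
    """Return the class whose hyperplane gives the largest score."""
    return int(np.argmax(W @ x + b))


def adapt(W, b, x, true_cls, belief=0.5, max_iter=100):
    """Tune only two hyperplanes: the misrecognized one and the correct one.

    Plain perceptron-style update (an assumption); `belief` scales the step.
    Returns adapted copies of W and b and leaves the inputs untouched.
    """
    W = np.array(W, dtype=float)
    b = np.array(b, dtype=float)
    x = np.asarray(x, dtype=float)
    wrong_cls = classify(W, b, x)
    if wrong_cls == true_cls:
        return W, b
    for _ in range(max_iter):
        if W[true_cls] @ x + b[true_cls] > W[wrong_cls] @ x + b[wrong_cls]:
            break
        W[true_cls] += belief * x   # move the correct hyperplane toward x
        b[true_cls] += belief
        W[wrong_cls] -= belief * x  # move the misrecognizing one away
        b[wrong_cls] -= belief
    return W, b
```

In this sketch a larger `belief` reaches the stopping condition in fewer iterations. As the abstract notes for the real scheme, a correct final result is still not guaranteed, because a third, untouched hyperplane may score highest after the update.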
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT840327032 http://hdl.handle.net/11536/60288 |
Appears in Collections: | Thesis |