Title: 經由隱藏式馬可夫模型切割之語音辨識及其語者調適技術
Speaker Adaptation of HMM-Segmentation-Based Speech Recognition
Authors: 林維芬
Lin, Wei-Fen
林進燈
Chin-Teng Lin
電控工程研究所 (Institute of Electrical and Control Engineering)
Keywords: 語音辨識 (speech recognition); 語者調適 (speaker adaptation)
Issue Date: 1995
Abstract: This thesis proposes a speech recognition algorithm that
uses hidden Markov models (HMMs) and the Viterbi algorithm to
segment the input speech sequence. Because utterances vary in
length, the Viterbi segmentation is used to convert the
variable-dimensional sequence of speech feature vectors into a
fixed-dimensional feature vector, called the TN vector. A fuzzy
perceptron then generates hyperplanes that separate the patterns
of each class from those of the other classes. With the idea of
"supporting patterns", the patterns closest to a hyperplane, the
algorithm extends easily to speaker adaptation. When a
recognition error occurs, the TN vectors obtained by segmenting
the input speech sequence with every trained HMM are added to the
supporting patterns. Only two hyperplanes are then tuned: the one
that produced the misrecognition and the one that corresponds to
the correct result. Because only two hyperplanes are tuned, the
adaptation terminates quickly and is suitable for on-line
adaptation. When a large database is used to train a
speaker-independent or large-vocabulary system, vector
quantization (VQ) is used to reduce the number of training
patterns. The adaptation scheme cannot guarantee that the input
speech sequence is recognized correctly after adaptation, but the
hyperplanes are tuned iteratively in the direction of correct
recognition, and the speed of adaptation is set by a
user-specified "belief" parameter. Experimental results show that
the proposed recognition algorithm and adaptation scheme improve
the recognition rate.
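
The two central steps of the abstract can be illustrated with a minimal sketch. It is not the thesis implementation, and it rests on several assumptions the abstract does not confirm: the fixed-dimensional vector is built here by averaging the frames that a Viterbi alignment assigns to each HMM state (the abstract does not define how the TN vector is formed); the hyperplane update is a plain perceptron step scaled by `belief`, not the fuzzy perceptron used in the thesis; and the names `tn_vector`, `score`, and `adapt` are hypothetical.

```python
def tn_vector(frames, alignment, n_states):
    """Turn a variable-length utterance into a fixed-length vector.

    frames:    list of feature vectors, one per frame
    alignment: state index per frame, as a Viterbi pass would produce
    ASSUMPTION: per-state averaging; the thesis may define TN differently.
    """
    dim = len(frames[0])
    sums = [[0.0] * dim for _ in range(n_states)]
    counts = [0] * n_states
    for frame, state in zip(frames, alignment):
        counts[state] += 1
        for d, value in enumerate(frame):
            sums[state][d] += value
    vector = []
    for state in range(n_states):
        # A state with no frames contributes zeros (an arbitrary choice).
        n = counts[state] or 1
        vector.extend(s / n for s in sums[state])
    return vector  # length is always n_states * dim


def score(hyperplane, x):
    """Linear discriminant w . x + b, with the bias b stored last."""
    return sum(w * v for w, v in zip(hyperplane, x)) + hyperplane[-1]


def adapt(hyperplanes, tn_by_class, wrong, correct, belief=0.1):
    """Tune only the two hyperplanes involved in a misrecognition.

    tn_by_class[c] is the vector obtained from class c's HMM segmentation.
    ASSUMPTION: plain perceptron step; `belief` acts as the step size.
    """
    for d, value in enumerate(tn_by_class[correct]):
        hyperplanes[correct][d] += belief * value
    hyperplanes[correct][-1] += belief
    for d, value in enumerate(tn_by_class[wrong]):
        hyperplanes[wrong][d] -= belief * value
    hyperplanes[wrong][-1] -= belief


# Five 2-dimensional frames aligned to three states give a 6-dimensional vector.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
x = tn_vector(frames, [0, 0, 1, 2, 2], 3)
```

Calling `adapt` repeatedly on the same misrecognized utterance moves the two scores further apart on each pass, which mirrors the iterative tuning the abstract describes; a larger `belief` takes larger steps.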
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT840327032
http://hdl.handle.net/11536/60288
Appears in Collections: Thesis