Title: 以結合決策樹與GCVHMM為基礎之不特定語者中文連續數字語音辨識 (Speaker-Independent Recognition of Continuous Mandarin Digits Based on Combining Decision Trees with GCVHMM)
A Technique for Speaker Independent Automatic Speech Recognition Based on Decision Tree State Tying with GCVHMM
Author: 周佑霖
Yu-Lin Chou
林進燈
周志成
Chin-Teng Lin
Chi-Cheng Jou
Institute of Electrical and Control Engineering (電控工程研究所)
Keywords: Markov Model; HMM; Speech Recognition
Date of Issue: 2001
Abstract: The main purpose of this thesis is the study of continuous Mandarin speech recognition. Many algorithms have been proposed to solve the continuous-speech recognition problem; one of them is known as the one-state algorithm. This thesis focuses on remedying two major problems of the one-state algorithm. The first is that, during recognition, the quality of the reference models themselves strongly affects the recognition rate of the one-state algorithm. Since good reference models effectively improve continuous-speech recognition, this thesis proposes a principal-axis-space hidden Markov model (GCVHMM) to improve the recognition performance of the reference models. The second concerns continuous-speech recognition itself. Because the model produced for a sound differs slightly when its preceding and following sounds differ, we arrive at the concept of context dependence. To achieve continuous-speech recognition, a separate model would therefore have to be built for every possible context of each sound; not only would the resulting number of models be impractically large, but the collected training data could hardly cover every context, leaving the models for unseen contexts untrained. A decision-tree method is therefore used to reduce the amount of computation and to cope with the shortage of training data, so that an appropriate trade-off is reached between model complexity and the amount of speech training data that can be collected. Finally, all of the methods proposed in this thesis are applied to the recognition of continuous Mandarin digits; experimental results show an increase in recognition rate of up to 26.039% over the original recognition system.
This thesis proposes a new technique for speaker-independent recognition of continuously spoken Mandarin digits. One popular tool for this problem is the HMM-based one-state algorithm, a connected-word pattern-matching method. However, two problems with this conventional method prevent its practical use on our target task. One is the lack of a proper mechanism for selecting acoustic models that are robust for speaker-independent recognition. The other is that the acoustic models do not capture inter-syllable co-articulatory effects. First, a generalized common-vector (GCV) approach is developed, based on eigenanalysis of the covariance matrix, to extract features that are invariant across speakers and insensitive to acoustic-environment effects and to phase or temporal differences. The GCV scheme is then integrated into the conventional HMM to form a new GCV-based HMM, called the GCVHMM, which is well suited to speaker-independent recognition. For the second problem, context-dependent modeling is adopted to account for the co-articulatory effects of neighboring phones; this is important because co-articulation is significantly stronger in continuous speech than in isolated utterances. However, modeling these variations in sounds and pronunciations generates a large number of context-dependent models, and if the parameters of all these models were kept distinct, the total number of model parameters would be enormous. To solve these problems, the decision-tree state-tying technique is used to reduce the number of parameters and hence the computational complexity. In our experiments on speaker-independent recognition of continuous-speech sentences, the proposed scheme increases the average recognition rate of the conventional HMM-based one-state algorithm by up to 26.039% without using any grammar or lexical information.
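As a rough illustration of the eigenanalysis behind the GCV features, the following is a minimal sketch in the style of standard common-vector approaches; the symbols ($\Phi$, $U_0$, $x_{\mathrm{com}}$) and the exact projection shown here are illustrative assumptions rather than the precise GCVHMM formulation of the thesis. Given feature vectors $x_1, \dots, x_m \in \mathbb{R}^n$ of one acoustic class pooled over different speakers, form the within-class scatter matrix
$$\Phi = \sum_{i=1}^{m} (x_i - \bar{x})(x_i - \bar{x})^{T}, \qquad \bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i,$$
eigen-decompose $\Phi = U \Lambda U^{T}$, and let $U_0$ collect the eigenvectors belonging to the smallest (ideally zero) eigenvalues. Projecting any training vector onto this subspace,
$$x_{\mathrm{com}} = U_0 U_0^{T} x_i,$$
yields a vector that is approximately the same for every $i$, i.e. invariant to the speaker as well as to environment and phase or temporal differences; it is this invariant component that the GCV-based HMM states are meant to model.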
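The decision-tree state-tying step can be sketched in the same spirit. The short Python sketch below greedily splits a pool of context-dependent states using phonetic questions about the left and right context and ties all states that fall into the same leaf, so that contexts unseen in training still map to some leaf; the question set, the min_gain threshold, and the per-state Gaussian statistics are assumptions made for illustration, not the actual questions or thresholds used in the thesis.

import numpy as np

class CDState:
    """Single-Gaussian statistics of one context-dependent HMM state (illustrative)."""
    def __init__(self, left_ctx, right_ctx, mean, var, count):
        self.left_ctx, self.right_ctx = left_ctx, right_ctx
        self.mean = np.asarray(mean, dtype=float)
        self.var = np.asarray(var, dtype=float)
        self.count = count

def pooled_loglike(states):
    """Approximate log-likelihood of all frames in `states` under one shared diagonal Gaussian."""
    total = sum(s.count for s in states)
    mean = sum(s.count * s.mean for s in states) / total
    ex2 = sum(s.count * (s.var + s.mean ** 2) for s in states) / total  # E[x^2]
    var = np.maximum(ex2 - mean ** 2, 1e-6)
    return -0.5 * total * float(np.sum(np.log(2.0 * np.pi * var) + 1.0))

# Phonetic questions about the left/right context (illustrative question set).
QUESTIONS = [
    ("Left-Nasal",  lambda s: s.left_ctx in {"m", "n", "ng"}),
    ("Right-Nasal", lambda s: s.right_ctx in {"m", "n", "ng"}),
    ("Left-Vowel",  lambda s: s.left_ctx in {"a", "e", "i", "o", "u"}),
    ("Right-Vowel", lambda s: s.right_ctx in {"a", "e", "i", "o", "u"}),
]

def grow_tree(states, min_gain=50.0):
    """Greedily split the state pool; every leaf becomes one tied (shared) state."""
    base = pooled_loglike(states)
    best = None
    for name, ask in QUESTIONS:
        yes = [s for s in states if ask(s)]
        no = [s for s in states if not ask(s)]
        if not yes or not no:
            continue
        gain = pooled_loglike(yes) + pooled_loglike(no) - base
        if best is None or gain > best[0]:
            best = (gain, name, yes, no)
    if best is None or best[0] < min_gain:
        return {"tied_states": states}  # all states in this leaf share one set of parameters
    gain, name, yes, no = best
    return {"question": name,
            "yes": grow_tree(yes, min_gain),
            "no": grow_tree(no, min_gain)}

Splitting stops when the best question no longer improves the pooled log-likelihood by at least min_gain, which is how such a tree balances model complexity against the amount of training data that can actually be collected.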
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT900591078
http://hdl.handle.net/11536/69448
Appears in Collections: Thesis