Title: 國語連續音節辨認系統之改進與分析
An Improvement on the HMM-based Continuous Mandarin Speech Recognition Method
Authors: 呂儲仰 (Chu-Yang Lu)
陳信宏 (Sin-Horng Chen)
Institute of Communications Engineering (電信工程研究所)
Keywords: syllable recognition; right-context-dependent initial model; environmental mismatch; recurrent neural network (RNN); speech segmentation; coarticulation acoustic model; speaking rate
Issue Date: 2001
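Abstract: In this thesis, we systematically analyze and improve the right-context-dependent initial model. The first topic is environmental mismatch, for which we apply three different methods of preliminary adaptation, all of which give good results. In addition, we use the output parameters of a recurrent neural network (RNN) to assist the HMM in segmenting the training speech and to mark the positions where syllables are coupled, and from these we build coarticulation acoustic models to aid syllable recognition. The experimental results show that small adjustments to the segmentation of the training corpus yield more accurate syllable boundaries, and that the coarticulation acoustic models also give better recognition results on long sentences. Finally, we use the maximum likelihood criterion to find the relation between the feature parameters and the speaker's speaking rate, and use it to adapt the speech models so as to reduce the effect of differing speaking rates on the recognition system. The experimental results show that this improves recognition performance when the speaking rate is high.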
This thesis discusses improvements to the HMM-based continuous Mandarin speech recognition method previously developed at NCTU, in the following respects. Firstly, three schemes for compensating for environmental mismatch are discussed: the first uses the database mean difference directly, the second is the CMN method, and the third is a mismatch prediction method. Secondly, information from RNN speech segmentation is used to restrict the recognition search in both the training and testing phases. Experimental results showed that this is effective in refining the HMM models as well as in speeding up the recognition process. Thirdly, new recognition units are constructed to model serious inter-syllable coarticulation. Lastly, a new method of speaking rate normalization is discussed. The model explores the relation between speaking rate and dynamic spectral features. The recognition performance was improved for utterances with a high speaking rate.
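The abstract names CMN but does not describe how the thesis applies it. As a rough illustration only, the sketch below shows cepstral mean normalization in its common textbook form: the per-utterance mean of each cepstral coefficient is subtracted from every frame, which cancels a constant channel offset. The function name, the list-of-frames layout, and the sample values are assumptions made for this example and are not taken from the thesis.

```python
from typing import List


def cepstral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-utterance mean of each coefficient from every frame.

    `frames` is a hypothetical layout: one inner list of cepstral
    coefficients per analysis frame, all of the same length.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Mean of each coefficient over all frames of the utterance.
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Remove that mean from every frame.
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


# Made-up 3-frame, 2-coefficient utterance (illustrative values only).
utterance = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
normalized = cepstral_mean_normalize(utterance)

# The same utterance with a constant offset added to every frame,
# standing in for a stationary channel mismatch.
shifted = [[c + 5.0 for c in frame] for frame in utterance]
normalized_shifted = cepstral_mean_normalize(shifted)
```

In this sketch `normalized` and `normalized_shifted` come out identical, which is the property that makes mean subtraction a candidate for compensating a fixed mismatch between training and test environments.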
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT900435027
http://hdl.handle.net/11536/68902
Appears in Collections: Thesis