Title: 以聲韻母為基礎之國語連續音辨認之改進
An improvement of initial-final based Mandarin continuous speech recognition
Authors: 蔣松茂
S. M. Chiang
陳信宏
S. H. Chen
Institute of Communication Engineering (電信工程研究所)
Keywords: Syllable insertion penalty; Bounded state duration HMM (BSD-HMM); Finite state machine (FSM)
Issue Date: 1994
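Abstract: In this thesis, we study continuous Mandarin speech
recognition based on initial and final models. The work is divided
into two parts. In the first part, recognition uses 100 initial
models that depend on the following final, together with 39 final
models. To address the standard hidden Markov model's weakness in
modeling the temporal structure of speech signals, we adopt the
bounded state duration hidden Markov model. For speaker-independent
recognition, we use information about inter-speaker differences to
classify speakers, which helps raise recognition accuracy. In the
second part, we first use a recurrent neural network to pre-segment
continuous speech into initials, finals, and silence, then use the
segmentation results to construct a finite state machine and
integrate it into the continuous speech recognition framework.
Experimental results show that the finite state machine saves about
half of the computation in the recognition process.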
In this thesis, several techniques to improve the initial-final
based HMM method for continuous Mandarin speech recognition are
proposed. The baseline system uses 100 right-context-dependent
initial HMMs and 39 context-independent final HMMs.
First, the technique of bounded state duration is employed to
model the temporal structure of speech signals and incorporated
into the recognition process. A syllable insertion penalty is
then used to reduce the high number of insertion errors.
We then employ the technique of signal normalization to improve
the system. The performance of the recognizer is then further
improved by using gender-dependent HMM models. The effectiveness
of these techniques was confirmed by simulations on a
speaker-independent task: recognition of continuous Mandarin
speech over a telephone channel. The syllable recognition
rate rose from 30.86% to 42.14%. Finally, an RNN-based
finite state machine is proposed to pre-segment the input
signal into four states: initial, final, silence, and
transient. State-dependent constraints are then set to
restrict the search for the optimal path, reducing the
computational load of the one-stage recognition process.
Experimental results showed that about half of the computation
can be saved with only a very minor loss in recognition rate.
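
The syllable insertion penalty named in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which a fixed cost is subtracted from a path's log-likelihood for every hypothesized syllable; the record does not give the thesis's exact formulation. The function names, the toy syllables, and all scores below are hypothetical and are not taken from the thesis.

```python
def penalized_score(log_likelihood, num_syllables, insertion_penalty):
    """Path score after charging a fixed cost per hypothesized syllable.

    Assumption: the penalty is subtracted in the log domain.
    """
    return log_likelihood - insertion_penalty * num_syllables


def pick_hypothesis(hypotheses, insertion_penalty):
    """Return the syllable sequence with the highest penalized score.

    hypotheses: list of (syllables, log_likelihood) pairs (hypothetical data).
    """
    best = max(
        hypotheses,
        key=lambda h: penalized_score(h[1], len(h[0]), insertion_penalty),
    )
    return best[0]


# Made-up scores: the longer hypothesis fits slightly better acoustically
# but contains one extra (inserted) syllable.
toy_hypotheses = [
    (["ni", "hao"], -100.0),
    (["ni", "a", "hao"], -98.0),
]

without_penalty = pick_hypothesis(toy_hypotheses, insertion_penalty=0.0)
with_penalty = pick_hypothesis(toy_hypotheses, insertion_penalty=5.0)
```

In this toy case the unpenalized search prefers the three-syllable hypothesis, while a penalty of 5.0 per syllable tips the decision to the two-syllable one. In practice the penalty value would be tuned on held-out data to balance insertion and deletion errors.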
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT830436031
http://hdl.handle.net/11536/59386
Appears in Collections: Thesis