Title: 使用遞迴式類神經網路之語音辨認前處理
RNN-based Preprocessing for Speech Recognition
Authors: 游山銳
Shan-Ruei You
陳信宏
Sin-Horng Chen
Institute of Communications Engineering
Keywords: recurrent neural network (RNN); RNN-based preprocessing; speech recognition; tone recognition; continuous-word tone recognition; pre-classification
Issue Date: 1999
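Abstract: This thesis studies a speech recognition preprocessing system built on a recurrent neural network (RNN) and makes a first attempt at dynamically updating the noise model to improve the system's robustness to noise. Experimental results show that this approach works well. In addition, a finite-state scheme pre-classifies the speech signal into initials, finals, silence, and breath noise. Combining this pre-classification with the back-end recognizer not only speeds up recognition but also corrects errors the back-end recognizer might otherwise make, which improves the recognition rate. For tone recognition, we combine the preprocessing system with the back-end recognizer and use the reliable segmentation positions to extract the tone feature parameters, which increases the reliability of tone recognition. During pitch contour search, in addition to the pitch range prediction constraint within each syllable, we use the pitch values of past syllables to predict the pitch range of the frame at which the next syllable's contour search starts. This corrects sudden pitch-doubling errors in continuous speech and makes tone recognition more accurate.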
In this thesis we study RNN-based preprocessing for speech recognition and use a dynamically updated noise model to reduce interference from noise. Experimental results show that the method is effective. The RNN-based preprocessor also detects the endpoints of the input speech and pre-classifies input frames into four broad classes: I (initial), F (final), S (silence), and T (transition). The purpose of pre-classification is to speed up the subsequent recognition process by restricting the search space for the three stable classes I, F, and S. In continuous-word tone recognition, the main feature is the pitch contour detected by SIFT (simplified inverse filter tracking). During pitch tracking, the pitch range is restricted based on past syllables. This restriction corrects pitch-doubling errors and raises the recognition accuracy.
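The pitch-range restriction described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the record does not give the prediction rule, so the sketch assumes the allowed range is the mean F0 of past syllables plus or minus a fixed fraction. The function names, the `tolerance` value, and the octave-shift correction are all hypothetical, and SIFT itself is not implemented here.

```python
from statistics import mean


def predicted_range(past_f0_hz, tolerance=0.35):
    """Predict an allowed F0 range (Hz) from past syllables' F0 values.

    Assumption: the range is centered on the mean of the past values,
    widened by a fixed fraction (hypothetical rule, not from the thesis).
    """
    center = mean(past_f0_hz)
    return center * (1.0 - tolerance), center * (1.0 + tolerance)


def constrain_start_pitch(raw_f0_hz, past_f0_hz, tolerance=0.35):
    """Return an F0 value for the first frame of the next syllable.

    If the raw estimate lies outside the predicted range, try the value
    one octave down (doubling error) and one octave up (halving error).
    """
    low, high = predicted_range(past_f0_hz, tolerance)
    if low <= raw_f0_hz <= high:
        return raw_f0_hz
    for candidate in (raw_f0_hz / 2.0, raw_f0_hz * 2.0):
        if low <= candidate <= high:
            return candidate
    # No octave-shifted value fits the range; leave the estimate unchanged.
    return raw_f0_hz


past = [120.0, 125.0, 118.0]              # F0 of preceding syllables, in Hz
print(constrain_start_pitch(240.0, past))  # doubled estimate is halved
print(constrain_start_pitch(130.0, past))  # in-range estimate is kept
```

With these example values the predicted range is roughly 79 to 163 Hz, so a 240 Hz estimate is treated as a doubling error and mapped back to 120 Hz, while 130 Hz passes through unchanged.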
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT880435031
http://hdl.handle.net/11536/65866
Appears in Collections: Thesis