Title: 使用遞迴式類神經網路之語音段切割 (Speech Segmentation Using Recurrent Neural Networks)
RNN-based Segmentation for Speech Recognition
Authors: 林威成
Wei-Cheng Lin
王逸如
Yih-Ru Wang
Institute of Communications Engineering
Keywords: recurrent neural network; segmentation; multilayer neural network (MLP); finite state machine (FSM); hidden Markov model (HMM)
Issue Date: 2001
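Abstract: This thesis studies and analyzes a pre-segmentation system for continuous speech. We propose a basic architecture that combines a recurrent neural network with a finite state machine to perform coarse and fine classification of continuous speech, for use by downstream processors with different purposes. For coarse classification, continuous speech is divided into silence and speech, and the experimental results show that correct silence/speech boundaries can be obtained. For fine classification, speech is divided into initials, finals, final nasals, silence, and the initial-final transition state. During implementation, we found that pre-segmentation cannot effectively handle the points where adjacent syllables are coupled. We therefore collected statistics on and analyzed the cases in which syllables run together, and built a model of this coupling so that the downstream syllable recognition system can use the information to improve its recognition rate. Finally, for the detection of prosodic phrase boundaries, we propose two methods, a Gaussian mixture model and a multilayer perceptron neural network, both of which also give good recognition results.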
In this thesis, a recurrent neural network (RNN) and finite state machines (FSMs) are used to construct a pre-segmentation unit for a speech processing system. An RNN pre-segmentation network classifies the input speech into silence, initial, final, and nasal. Two speech databases, MAT-2000 and TCC-300, were used to examine the effectiveness of the RNN pre-segmentation network. In the second stage, FSMs constrain the segmentation result according to the phonetic structure of Mandarin speech. One FSM classifies the input signal into silence and speech; another segments the signal into the sequence silence, initial, initial/final transition, final, nasal, silence. The performance of these two RNN-FSM segmentation schemes was carefully examined by experiments. Finally, besides the sentence and syllable boundaries, the prosodic boundaries of speech were also detected using a statistical method and an MLP neural network.
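The second-stage idea, an FSM constraining frame-level classifier outputs, can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the FSM's transition table or decoding rule, so the `ALLOWED` table, the state abbreviations, the Viterbi-style search, and the toy posteriors in `frames` are all assumptions made for illustration. The per-frame posteriors stand in for the outputs of an RNN.

```python
import math

# Hypothetical state labels: silence, initial, initial/final transition,
# final, nasal (the five classes named in the abstract).
STATES = ["sil", "ini", "tra", "fin", "nas"]

# Assumed transition table (NOT taken from the thesis): each state may
# repeat, and may move on only to the listed successors.
ALLOWED = {
    "sil": {"sil", "ini", "fin"},
    "ini": {"ini", "tra"},
    "tra": {"tra", "fin"},
    "fin": {"fin", "nas", "sil", "ini"},
    "nas": {"nas", "sil", "ini"},
}


def fsm_decode(posteriors):
    """Return the most probable label sequence that obeys ALLOWED.

    posteriors: list of dicts mapping state -> probability, one per frame.
    """
    if not posteriors:
        return []

    def logp(frame, state):
        p = frame.get(state, 0.0)
        return math.log(p) if p > 0 else float("-inf")

    score = {s: logp(posteriors[0], s) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at frame t + 1
    for frame in posteriors[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            preds = [p for p in STATES if s in ALLOWED[p]]
            best = max(preds, key=lambda p: score[p])
            new_score[s] = score[best] + logp(frame, s)
            pointers[s] = best
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def segments(labels):
    """Collapse a label sequence into (label, start_frame, end_frame) runs."""
    out = []
    for i, lab in enumerate(labels):
        if out and out[-1][0] == lab:
            out[-1] = (lab, out[-1][1], i + 1)
        else:
            out.append((lab, i, i + 1))
    return out


# Toy posteriors: frame 2 favours "nas", which cannot follow "ini"
# under the assumed table, so the decoder picks "tra" instead.
frames = [
    {"sil": 0.90, "ini": 0.04, "tra": 0.02, "fin": 0.02, "nas": 0.02},
    {"sil": 0.05, "ini": 0.80, "tra": 0.05, "fin": 0.05, "nas": 0.05},
    {"sil": 0.03, "ini": 0.03, "tra": 0.40, "fin": 0.04, "nas": 0.50},
    {"sil": 0.05, "ini": 0.05, "tra": 0.05, "fin": 0.80, "nas": 0.05},
    {"sil": 0.90, "ini": 0.02, "tra": 0.02, "fin": 0.04, "nas": 0.02},
]
labels = fsm_decode(frames)
```

On the toy input, a frame-by-frame argmax would label frame 2 as nasal directly after an initial; the constrained search returns a sequence consistent with the assumed table, and `segments(labels)` turns it into boundary positions.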
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT900435060
http://hdl.handle.net/11536/68937
Appears in Collections: Thesis