Title: | RNN-based Segmentation for Speech Recognition |
Authors: | 林威成 Wei-Cheng Lin 王逸如 Yih-Ru Wang Institute of Communications Engineering |
Keywords: | recurrent neural network (RNN);segmentation;multilayer perceptron (MLP);finite state machine (FSM);hidden Markov model (HMM) |
Issue Date: | 2001 |
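Abstract: | This thesis studies and analyzes a pre-segmentation system for continuous speech. It proposes a basic architecture that combines a recurrent neural network with a finite state machine to perform coarse and fine classification of continuous speech, for use by back-end processors with different purposes. For coarse classification, continuous speech is divided into silence and speech; the experimental results show that correct silence/speech boundaries can be obtained. For fine classification, speech is divided into initial, final, syllable-final nasal, silence, and the initial-final transition state. During implementation, pre-segmentation was found unable to handle coarticulated syllable junctures effectively. The cases in which coarticulation occurs were therefore analyzed statistically and a coarticulation model was built, so that a back-end syllable recognizer can use this information to improve its recognition rate. Finally, for detecting prosodic phrase boundaries, two methods are proposed, a Gaussian mixture model and a multilayer perceptron neural network, and both give reasonably good results. In this thesis, a recurrent neural network (RNN) and finite state machines (FSMs) were used to construct a pre-segmentation unit for a speech processing system. An RNN pre-segmentation network was used to classify the input speech into silence, initial, final, and nasal. Two speech databases, MAT-2000 and TCC-300, were used to examine the effectiveness of the RNN pre-segmentation network. FSMs were then used in a second stage to constrain the segmentation result according to the phonetic structure of Mandarin speech. First, an FSM was used to classify the input signal into silence and speech. Another FSM was used to segment the signal into silence, initial, initial/final transition, final, nasal, and silence. The performance of these two RNN-FSM segmentation schemes was examined experimentally. Finally, besides sentence and syllable boundaries, the prosodic boundaries of speech were also detected using a statistical method and an MLP neural network. |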
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT900435060 http://hdl.handle.net/11536/68937 |
Appears in Collections: | Thesis |
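
The second stage described in the abstract, where an FSM constrains frame-level classifier outputs to the phonetic structure of Mandarin, can be illustrated with a small sketch. This is not the thesis's implementation: the record does not give the FSM topology or the decoding rule, so the transition table `ALLOWED`, the Viterbi-style search in `fsm_decode`, and the helpers `peaked` and `to_segments` are all assumptions made for illustration. The posteriors are toy values standing in for RNN outputs.

```python
import math

# State inventory taken from the fine classification named in the abstract.
STATES = ["sil", "initial", "transition", "final", "nasal"]

# ASSUMPTION: a hypothetical transition table (self-loops included).
# The thesis's actual FSM topology is not given in this record.
ALLOWED = {
    "sil": {"sil", "initial", "final"},
    "initial": {"initial", "transition"},
    "transition": {"transition", "final"},
    "final": {"final", "nasal", "sil", "initial"},
    "nasal": {"nasal", "sil", "initial"},
}


def _log(p):
    # Floor the probability so that log(0) never occurs.
    return math.log(max(p, 1e-12))


def fsm_decode(posteriors, start="sil"):
    """Viterbi-style search over per-frame posteriors (one list of
    len(STATES) probabilities per frame), following only ALLOWED moves."""
    n = len(STATES)
    score = [float("-inf")] * n
    s0 = STATES.index(start)
    score[s0] = _log(posteriors[0][s0])
    backptrs = []
    for frame in posteriors[1:]:
        new_score, ptr = [], []
        for j, target in enumerate(STATES):
            # Predecessors that the FSM lets move into `target`.
            preds = [i for i, src in enumerate(STATES) if target in ALLOWED[src]]
            best = max(preds, key=lambda i: score[i])
            ptr.append(best)
            new_score.append(score[best] + _log(frame[j]))
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [STATES[i] for i in path]


def to_segments(labels):
    """Collapse frame labels into (label, first_frame, last_frame) runs."""
    segs = []
    for t, lab in enumerate(labels):
        if segs and segs[-1][0] == lab:
            segs[-1] = (lab, segs[-1][1], t)
        else:
            segs.append((lab, t, t))
    return segs


def peaked(label, p=0.9):
    """Toy posterior vector with mass p on `label` (stand-in for RNN output)."""
    rest = (1.0 - p) / (len(STATES) - 1)
    return [p if s == label else rest for s in STATES]


if __name__ == "__main__":
    frames = [peaked(s) for s in ["sil", "initial", "transition", "final", "sil"]]
    print(to_segments(fsm_decode(frames)))
```

Under these assumptions, a single frame whose highest posterior is an unreachable state (for example `nasal` directly after `sil`) is relabelled by the search, because the path must follow `ALLOWED`. A plain per-frame argmax would keep the isolated label.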