RNN-based Segmentation for Speech Recognition

Title:	RNN-based Segmentation for Speech Recognition (使用遞迴式類神經網路之語音段切割)
Authors:	Wei-Cheng Lin (林威成); Yih-Ru Wang (王逸如); Institute of Communications Engineering (電信工程研究所)
Keywords:	recurrent neural network (RNN); segmentation; multilayer perceptron (MLP); finite state machine (FSM); hidden Markov model (HMM)
Issue Date:	2001
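Abstract:	This thesis studies and analyzes pre-segmentation systems for continuous speech. We propose a basic architecture that combines a recurrent neural network with a finite state machine to perform coarse and fine classification of continuous speech, for use by downstream processors with different purposes. For coarse classification, continuous speech is divided into silence and speech; the experimental results show that correct silence/speech boundaries can be obtained. For fine classification, speech is divided into initials, finals, syllable-final nasals, silence, and the initial-final transition state. During implementation, we found that pre-segmentation cannot effectively handle coarticulation at syllable junctures. We therefore collected statistics on the cases in which coarticulation occurs, analyzed them, and built a coarticulation model, so that the downstream syllable recognition system can use this information to improve its recognition rate. Finally, for detecting prosodic phrase boundaries, we propose two methods, a Gaussian mixture model and a multilayer perceptron neural network, both of which also give good recognition results. In this thesis, a recurrent neural network (RNN) and finite state machines (FSMs) are used to construct a pre-segmentation unit for a speech processing system. An RNN pre-segmentation network classifies the input speech into silence, initial, final, and nasal. Two speech databases, MAT-2000 and TCC-300, were used to examine the effectiveness of the RNN pre-segmentation network. FSMs are used in a second stage to constrain the segmentation result according to the phonetic structure of Mandarin speech. First, one FSM classifies the input signal into silence and speech. Another FSM then segments the signal into silence, initial, initial/final transition, final, nasal, and silence. The performance of these two RNN-FSM segmentation schemes was carefully examined in experiments. Finally, besides sentence and syllable boundaries, prosodic boundaries of speech were also detected using a statistical method and an MLP neural network.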
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT900435060 http://hdl.handle.net/11536/68937
Appears in Collections:	Thesis
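
The abstract describes a second stage in which an FSM constrains the RNN's frame-level output to the phonetic structure of Mandarin. The record does not say how that constraint is applied, so the following is only a minimal sketch under stated assumptions: the transition table `ALLOWED`, the function name `fsm_decode`, and the use of Viterbi decoding over per-frame class probabilities are all hypothetical illustrations, not the thesis's implementation.

```python
import math

# Segment classes named in the abstract.
STATES = ["sil", "initial", "transition", "final", "nasal"]

# Hypothetical transition table (an assumption, not taken from the thesis):
# each state may repeat, and may move on only to the listed successors.
ALLOWED = {
    "sil": {"sil", "initial", "final"},
    "initial": {"initial", "transition"},
    "transition": {"transition", "final"},
    "final": {"final", "nasal", "sil", "initial"},
    "nasal": {"nasal", "sil", "initial"},
}


def _log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def fsm_decode(posteriors, start="sil"):
    """Return the most probable state path that obeys ALLOWED.

    posteriors: one dict per frame mapping each state to a probability,
    standing in for the per-frame output of a classifier such as an RNN.
    The path is forced to begin in `start`.
    """
    # score[s] is the best log-probability of any legal path ending in s.
    score = {
        s: _log(posteriors[0][s]) if s == start else float("-inf")
        for s in STATES
    }
    backpointers = []
    for frame in posteriors[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor among states allowed to transition into s.
            prev = max(
                (p for p in STATES if s in ALLOWED[p]),
                key=lambda p: score[p],
            )
            new_score[s] = score[prev] + _log(frame[s])
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

In this sketch, a frame-by-frame argmax could jump directly from an initial to a final, whereas the constrained decoder has to pass through the transition state (or take another legal route) because the table forbids the direct jump.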