Title: 兩階段消除迴響及環境噪音演算法
A Two-stage Algorithm for De-reverberation and De-noise
Authors: 黃群
冀泰石
Huang, Chun
Chi, Tai-Shih
Department of Electrical Engineering
Keywords: De-reverberation; De-noise; Room impulse response; Modulation spectrum; Deep neural network; Complex ideal mask
Publication Date: 2017
Abstract: De-reverberation and de-noising have long been important topics in speech signal processing. In this thesis, we first analyze the reverberant signal through a series of approximations and simplifications, and use deep learning to learn de-reverberation through mapping and masking methods. By adding a reference modulation magnitude, derived from a different sentence, to the network input and applying a shuffling method during training, the neural network also de-reverberates well in unseen reverberant environments. Next, to remove reverberation and environmental noise together, as real-world conditions require, we propose a two-stage process that de-reverberates in the modulation domain and then de-noises in the magnitude spectrogram domain. The artificial additive noise produced by the first (de-reverberation) stage is canceled in the second stage along with the environmental additive noise, with the aim of improving speech reconstruction through multi-stage learning. For human hearing applications, we treat both speech intelligibility and speech quality as important evaluation criteria. Accordingly, we analyze the advantages and disadvantages of each network structure by comparing intelligibility and quality scores against other algorithms on two speech corpora.
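The keywords list a complex ideal mask, but this record does not define it. As a rough illustration only, the sketch below uses the commonly cited complex ideal ratio mask: the per-bin complex ratio of the clean spectrum to the degraded spectrum. The function names, the `eps` regularizer, and the random toy spectra are assumptions made for this example and are not taken from the thesis. In practice a network would estimate such a mask from the degraded signal; here it is computed as an oracle from a known clean signal.

```python
import numpy as np

def complex_ideal_ratio_mask(clean, mixture, eps=1e-8):
    """Oracle complex mask M with M * mixture ~= clean in each time-frequency bin."""
    # Complex division clean / mixture, written with the conjugate so that
    # eps (an assumed regularizer) keeps the denominator away from zero.
    return clean * np.conj(mixture) / (np.abs(mixture) ** 2 + eps)

def apply_mask(mask, mixture):
    """Element-wise complex multiplication of the mask with the degraded spectrum."""
    return mask * mixture

# Toy complex "spectra" standing in for STFTs, shape (frequency bins, frames).
rng = np.random.default_rng(0)
shape = (4, 5)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
mixture = clean + noise

mask = complex_ideal_ratio_mask(clean, mixture)
estimate = apply_mask(mask, mixture)
```

Because the mask is complex, it corrects phase as well as magnitude, so applying the oracle mask recovers the clean spectrum up to the small bias introduced by `eps`.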
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070450730
http://hdl.handle.net/11536/141942
Appears in Collections: Thesis