Title: 基於複數頻譜圖時域變化之消除迴響演算法
Dereverberation based on temporal variations of complex spectrogram
Authors: 陳祖昊
冀泰石
Chen, Tzu-Hao
Chi, Tai-Shih
Institute of Communications Engineering
Keywords: reverberation; dereverberation; deep neural network; complex spectrogram
Issue date: 2016
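Abstract: In recent years, machine learning has played a pivotal role in speech signal processing. Dereverberation has long been an important topic in this field, yet achieving it with conventional deconvolution algorithms involves a long chain of complicated steps. In this thesis, we handle reverberant signals with machine learning: we simplify the time-domain convolution through an approximation and, drawing on how the human brain processes sound, use a deep neural network to learn the dereverberation process. Speech intelligibility and speech quality are both important criteria for evaluating speech signal processing, and we consider both with the aim of improving them simultaneously. Unlike earlier algorithms whose final goal is machine hearing, we take human hearing as the final goal and develop algorithms that consider both the magnitude spectrogram and the phase spectrogram. Using the temporal variations of the complex spectrogram as features, we propose two algorithms, feature mapping with reference phase (FMwRP) and the complex-valued ideal ratio mask (cIRM), and finally compare their dereverberation performance with that of an algorithm designed for machine hearing.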
In recent years, machine learning has played a significant role in audio signal processing. Dereverberation, which cancels the effect of reverberation, has always been an important task in speech processing; however, time-domain deconvolution algorithms often require a series of complicated processes and do not yield good results. In this thesis, we propose two dereverberation algorithms in the modulation-variation domain using a machine learning technique. Inspired by human auditory processing, we first transform the time-domain convolution operation to the modulation-variation domain and then use a deep neural network (DNN) to learn how to dereverberate speech signals in that domain. For human hearing applications, enhancing speech intelligibility and speech quality is more critical than enhancing spectral profiles, which are important to machine hearing applications. Our dereverberation algorithms simultaneously consider the magnitude and phase responses of the temporal variations of the complex spectrogram, so that they can improve the speech intelligibility and speech quality scores of the processed speech signals. Performance comparisons between the two proposed algorithms, one using feature mapping with reference phase (FMwRP) and the other forming a complex-valued ideal ratio mask (cIRM), and the baseline dereverberation system show that our modulation-variation-domain algorithms are more suitable for human hearing applications.
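The cIRM named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of a complex ratio mask, in which each clean coefficient is divided by the corresponding reverberant coefficient, and it works on plain complex numbers. The thesis applies its mask to temporal variations of the complex spectrogram and estimates it with a DNN; neither step is reproduced here. The function names and sample values are hypothetical.

```python
def complex_ratio_mask(clean, reverberant, eps=1e-12):
    """Return a per-bin complex mask M with M * reverberant ~= clean.

    Assumed definition: M = S * conj(Y) / |Y|^2, where S is the clean
    coefficient and Y the reverberant one. eps guards against division
    by zero in silent bins.
    """
    mask = []
    for s, y in zip(clean, reverberant):
        power = y.real * y.real + y.imag * y.imag + eps
        m_real = (y.real * s.real + y.imag * s.imag) / power
        m_imag = (y.real * s.imag - y.imag * s.real) / power
        mask.append(complex(m_real, m_imag))
    return mask


def apply_mask(mask, reverberant):
    """Complex multiplication corrects magnitude and phase together."""
    return [m * y for m, y in zip(mask, reverberant)]


# Hypothetical coefficients for three time-frequency bins.
clean = [complex(1.0, 0.5), complex(-0.3, 0.8), complex(0.0, -1.2)]
reverberant = [complex(0.7, 0.9), complex(-0.5, 0.2), complex(0.4, -1.0)]

mask = complex_ratio_mask(clean, reverberant)
estimate = apply_mask(mask, reverberant)
```

With an ideal mask the estimate matches the clean coefficients up to the small `eps` term. In practice the mask would be predicted from reverberant input alone, so the recovery is only approximate.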
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070360218
http://hdl.handle.net/11536/139259
Appears in Collections: Thesis