Title: A Two-stage Algorithm for De-reverberation and De-noise (两阶段消除回响及环境噪音演算法)
Authors: Huang, Chun (黄群)
Chi, Tai-Shih (冀泰石)
Department of Electrical Engineering
Keywords: De-reverberation; De-noise; Room impulse response; Modulation spectrum; Deep neural network; Complex ideal mask
Publication date: 2017
Abstract: De-reverberation and de-noising have long been important topics in speech signal processing. In this thesis, we first analyze the reverberant signal through a series of approximations and simplifications, and use deep learning to implement mapping and masking methods that learn the de-reverberation process. By adding a reference modulation magnitude, derived from a different sentence, to the network input during training and applying a shuffling method, the network de-reverberates well even in unseen reverberant environments. Next, to remove reverberation and environmental noise together, we propose a two-stage process that de-reverberates in the modulation domain and then de-noises on the magnitude spectrogram. The artificial additive noise produced by the first (de-reverberation) stage is removed in the second stage along with the environmental additive noise, with the aim of improving speech reconstruction through multi-stage learning. For human hearing applications, we treat both speech intelligibility and speech quality as important evaluation criteria. We therefore analyze the advantages and disadvantages of each network structure by comparing intelligibility and quality scores against other algorithms on two speech corpora.
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070450730
http://hdl.handle.net/11536/141942
Appears in collections: Thesis