Title: Rate-Scale Ideal Binary Mask Estimation Using Deep Learning to Preserve Speech Intelligibility (深度学习估计时频调变域之理想二元遮罩以保留语音理解度)
Authors: Liu, Chao-Lun (刘兆伦)
Chi, Tai-Shih (冀泰石)
Department of Electrical Engineering
Keywords: speech intelligibility; deep learning; ideal binary mask; rate-scale
Date issued: 2016
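Abstract: To verify how the ideal binary mask can benefit hearing aids and cochlear implants, we developed a new ideal binary mask in the spectro-temporal modulation (rate-scale) domain that resolves the sparsity problem of the ideal binary mask. We first conducted listening tests with normal-hearing subjects to demonstrate that the rate-scale ideal binary mask preserves speech intelligibility as well as the ideal binary mask does, and to find the best parameter settings. We then used deep learning to estimate the masked speech, comparing the objective speech intelligibility obtained by deep neural networks and deep recurrent neural networks with different numbers of frames and different architectures. The experiments showed that a deep recurrent neural network handles reverberation better than a deep neural network of the same architecture. Finally, we propose a method for handling reverberation with small network architectures. Because cochlear implants can only use binary masks and are constrained by their hardware, we provide scores across a range of computational loads, so that if new algorithms are developed for cochlear implants in the future, algorithms with different computational budgets can select a suitable ideal binary mask.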
We propose a new binary mask, the rate-scale ideal binary mask (RS_IBM), as a variation of the ideal binary mask (IBM) to address possible problems caused by the sparsity of the IBM. The RS_IBM is built on the spectro-temporal modulation energy instead of the total energy. In this thesis, we first conducted subjective listening tests with normal-hearing listeners to verify that the RS_IBM can preserve speech intelligibility as well as the ideal reverberation mask (IRM) does in reverberant environments. After verifying the efficacy of the RS_IBM, we used deep learning techniques to learn its masking effect for dereverberation. To learn this effect, we used a deep neural network (DNN) and a deep recurrent neural network (RNN) with various settings. Under comparable network structures, simulation results showed that the RNN achieves a stronger dereverberation effect than the DNN. Finally, we used an RNN to directly estimate the RS_IBM for cochlear implants and provided a performance table for different RNN settings. Developers of cochlear implants can then choose a particular RNN, based on the computational limits of the device, to best estimate the RS_IBM.
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070350726
http://hdl.handle.net/11536/139085
Appears in Collections: Thesis