Title: Bayesian Learning for Speech Dereverberation (贝氏学习法于语音回响消除之研究)
Authors: Chang, You-Cheng (张友诚)
Chien, Jen-Tzung (简仁宗)
Institute of Communications Engineering
Keywords: speech dereverberation; online learning; Bayesian modeling; variational Bayesian; nonnegative matrix factorization
Publication date: 2016
Abstract: Speech signals recorded in a room are commonly degraded by reverberation, and the reverberant condition is generally nonstationary when speakers move. This study presents an online speech dereverberation approach that enhances the spectrum of time-varying reverberant speech. The dereverberation model combines a nonnegative convolutive transfer function (N-CTF) with nonnegative matrix factorization (NMF). N-CTF characterizes the magnitude spectra of the speech signal and the room impulse response, while NMF represents the fine structure of the speech spectra. Importantly, the model is learned through a Bayesian approach in which the reverberant speech is represented by a Poisson distribution and the latent variables (clean speech, reverberation kernel, and additive noise) are modeled by exponential distributions. In NMF, the basis matrix is pre-trained on clean training speech, while the weight matrix is characterized by a gamma prior. A variational Bayesian expectation-maximization (VB-EM) algorithm is developed to obtain efficient closed-form solutions for the variational parameters as well as the model parameters. An online learning mechanism is further developed under this Bayesian model so that the dereverberation model can be adaptively learned to match varying reverberant conditions. The method is entirely data-driven and requires no prior knowledge of the room configuration or speaker characteristics. Notably, the model can be simplified and related to existing methods. In the experiments, the proposed method is evaluated on both simulated data and real recordings from the 2014 REVERB Challenge. In future work, the method will also be assessed under nonstationary reverberation conditions.
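As a rough illustration of the two building blocks named in the abstract, the sketch below composes an NMF reconstruction of the clean magnitude spectrogram with a per-frequency convolution along time, which is one common reading of an N-CTF. It covers only a deterministic forward model. The Poisson, exponential, and gamma distributions, the VB-EM updates, and the online learning step are not shown, since the abstract does not give their equations. The function names, matrix shapes, and toy values are all assumptions made for this example and are not taken from the thesis.

```python
# Sketch of an N-CTF + NMF forward model (deterministic part only).
# All names, shapes, and values are illustrative assumptions.

def nmf_reconstruct(W, V):
    """Clean magnitude spectrogram S[f][t] = sum_k W[f][k] * V[k][t]."""
    F, K, T = len(W), len(V), len(V[0])
    return [[sum(W[f][k] * V[k][t] for k in range(K)) for t in range(T)]
            for f in range(F)]


def nctf_forward(S, H, noise=None):
    """Reverberant magnitude X[f][t] = sum_l H[f][l] * S[f][t-l] + noise[f][t].

    The convolution runs along time, independently in each frequency bin.
    Frames before t = 0 are treated as silence.
    """
    F, T = len(S), len(S[0])
    X = [[0.0] * T for _ in range(F)]
    for f in range(F):
        for t in range(T):
            acc = sum(H[f][l] * S[f][t - l]
                      for l in range(len(H[f])) if t - l >= 0)
            X[f][t] = acc + (noise[f][t] if noise is not None else 0.0)
    return X


# Toy example: 2 frequency bins, 2 basis vectors, 4 frames, 2 kernel taps.
W = [[1.0, 0.0], [0.0, 2.0]]                       # basis matrix (assumed values)
V = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0]]   # weight matrix (assumed values)
H = [[1.0, 0.5], [1.0, 0.25]]                      # reverberation kernel per bin

S = nmf_reconstruct(W, V)   # clean spectrogram
X = nctf_forward(S, H)      # reverberant spectrogram, no additive noise
print(S)
print(X)
```

In this toy setup each kernel tap smears energy from one frame into the next, so a single active frame in `S` appears in `X` as the frame itself plus a scaled echo one frame later. Dereverberation in this framing amounts to estimating `S` (through `V`) and `H` from an observed `X`.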
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070260252
http://hdl.handle.net/11536/142539
Appears in collections: Thesis