Title: Bayesian Learning for Speech Dereverberation
Authors: Chang, You-Cheng (張友誠)
Chien, Jen-Tzung (簡仁宗)
Institute of Communications Engineering
Keywords: speech dereverberation; online learning; Bayesian modeling; variational Bayesian; nonnegative matrix factorization
Issue Date: 2016
Abstract: Speech signals recorded in a room are commonly degraded by reverberation, and the reverberant condition is generally nonstationary when speakers move. This study presents an online speech dereverberation approach that enhances the spectrum of time-varying reverberant speech. The dereverberation model combines a nonnegative convolutive transfer function (N-CTF) with nonnegative matrix factorization (NMF). N-CTF characterizes the magnitude spectra of the speech signal and the room impulse response, while NMF represents the fine structure of the speech spectra. Importantly, the model is learned through a Bayesian approach in which the reverberant speech is represented by a Poisson distribution and the latent variables, namely clean speech, reverberation kernel and additive noise, are modeled by exponential distributions. In NMF, the basis matrix is pre-trained on clean training speech, while the weight matrix is given a gamma prior. A variational Bayesian expectation-maximization (VB-EM) algorithm is developed to obtain efficient closed-form solutions for the variational parameters as well as the model parameters. An online learning mechanism is further developed under this Bayesian model so that the dereverberation model can be adaptively learned to match various reverberant conditions. The method is entirely data-driven and requires no prior knowledge of the room configuration or speaker characteristics. Notably, the model can be simplified and related to existing methods. In the experiments, the proposed method is evaluated on both simulated data and real recordings from the 2014 REVERB Challenge. We will also assess the method under nonstationary reverberation conditions.
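The forward structure described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes the common N-CTF form in which each frequency bin of the reverberant magnitude spectrogram is a time-domain convolution of the clean spectrogram with a nonnegative kernel, with the clean spectrogram given by an NMF product and the observation drawn from a Poisson distribution. The function names (`nctf_forward`, `simulate_reverberant`), the dimensions, the decaying kernel shape and the gamma hyperparameters are all hypothetical choices for illustration, and VB-EM inference and online updating are not shown.

```python
import numpy as np


def nctf_forward(X, H):
    """Per-frequency convolution along time (assumed N-CTF form):
    Y[k, t] = sum_tau H[k, tau] * X[k, t - tau]."""
    K, T = X.shape
    L = H.shape[1]
    Y = np.zeros((K, T))
    for tau in range(min(L, T)):
        # Shift the clean spectrogram by tau frames and scale by the kernel tap.
        Y[:, tau:] += H[:, tau:tau + 1] * X[:, :T - tau]
    return Y


def simulate_reverberant(basis, weights, H, noise_mean, rng):
    """Draw a reverberant spectrogram from the simplified generative sketch."""
    X = basis @ weights                      # NMF: clean magnitude spectrogram
    rate = nctf_forward(X, H) + noise_mean   # reverberation plus additive noise level
    Y = rng.poisson(rate)                    # Poisson observation model
    return X, rate, Y


# Illustrative sizes only: K frequency bins, R bases, T frames, L kernel taps.
rng = np.random.default_rng(0)
K, R, T, L = 6, 3, 20, 4
basis = rng.random((K, R))                               # stands in for a pre-trained basis
weights = rng.gamma(shape=2.0, scale=1.0, size=(R, T))   # gamma prior on NMF weights
H = np.exp(-0.8 * np.arange(L))[None, :] * np.ones((K, 1))  # assumed decaying kernel
X, rate, Y = simulate_reverberant(basis, weights, H, noise_mean=0.1, rng=rng)
```

Dereverberation then amounts to inferring `X` (through `weights`) and `H` from an observed `Y`, which the thesis addresses with VB-EM rather than with the forward simulation sketched here.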
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070260252
http://hdl.handle.net/11536/142539
Appears in Collections: Thesis