Title: | 變概率貝氏推論之非負矩陣拆解應用於聲音聲源分離 Variational Bayesian Inference Nonnegative Matrix Factorization with Application to Auditory Streaming |
Authors: | 蔡鈺群 (Tsai, Yu-Chun); 冀泰石 (Chi, Tai-Shih); College of Engineering, Master Program of Sound and Music Innovative Technologies |
Keywords: | Nonnegative matrix factorization; audio source separation; Bayesian statistics; variational distribution; variational Bayesian approach |
Issue Date: | 2013 |
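Abstract: | With the rapid development of computing technology in recent years, sound source separation has become an important topic. Nonnegative matrix factorization (NMF) is a multivariate analysis algorithm that is well suited to music source separation. This thesis therefore adopts NMF as the core algorithm for audio source separation and combines it with the variational Bayesian approach and the concept of hyperparameters, casting NMF in a Bayesian statistical form that yields more accurate separation results than earlier statistical formulations of NMF. In addition, the proposed method analyzes the FFT spectrum using equivalent rectangular bandwidth (ERB) bands, which approximate the frequency bands of human hearing, and uses the result of this analysis as the observed variable. This effectively reduces the computation time of the NMF algorithm and also emphasizes the main resonances (formants) of the music signal. The four hyperparameters used have 54 possible combinations, some of which are not suitable for separating music spectra. This thesis therefore proposes first selecting the usable hyperparameter combinations according to whether the lower-bound function diverges, and then choosing the best combination with the Perceptual Evaluation methods for Audio Source Separation (PEASS) scoring tool.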
For the experiments, this thesis uses the audio provided by the Signal Separation Evaluation Campaign (SiSEC 2013) and uses PEASS as the measure of audio quality after separation. Short music excerpts are first used as learning samples to find the best combined setting of the four hyperparameters, and this setting is then used to separate vocals and various instruments from a range of recordings. The results show that, with the best hyperparameter setting, the proposed method successfully separates the main source information and performs well after only a few iterations. In auditory streaming, also called audio source separation, the goal is to decompose a music recording into sound streams from individual instruments. One of the most effective classes of separation methods stems from nonnegative matrix factorization (NMF). This thesis presents a variational Bayesian (VB) treatment of NMF based on the Itakura-Saito (IS) divergence and the concept of hyperparameters, and derives a lower bound on the marginal likelihood to approximate the posterior density of the NMF factors. An efficient iterative algorithm is proposed that outperforms previously derived statistical NMF methods such as Expectation-Maximization IS-NMF (EM-IS-NMF). The proposed algorithm works in the equivalent rectangular bandwidth (ERB) domain, where the main resonances of the music signal are emphasized. In addition, the hyperparameters are optimized for the case of an inverse-Gamma prior. Simulations scored with the PEASS tool show that the proposed factorization improves separation results over EM-IS-NMF. A comparative study of the VB-IS-NMF and EM-IS-NMF algorithms is presented, applied to the ERB spectrogram of a short vocal and bass sequence recorded in real conditions. Simulations also show that the proposed VB-IS-NMF can successfully stream music clips from SiSEC 2013. Finally, on the audio signals provided by SiSEC, the proposed algorithm also outperforms other methods that do not require explicit training data. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT070051901 http://hdl.handle.net/11536/73773 |
Appears in Collections: | Thesis |
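For readers unfamiliar with the Itakura-Saito NMF that the abstract builds on, the following is a minimal sketch of the plain (non-Bayesian) factorization only. It is not the thesis's VB-IS-NMF algorithm, and it includes no hyperparameters, ERB analysis, or PEASS scoring. It assumes a strictly positive power spectrogram `V` and uses the majorization-minimization form of the multiplicative updates (square-root exponent). The function names `is_divergence` and `is_nmf`, the synthetic data, and all parameter values are illustrative assumptions.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence summed over all entries: sum(V/V_hat - log(V/V_hat) - 1)."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(V, n_components, n_iter=100, seed=0, eps=1e-12):
    """Factor a positive matrix V (freq x time) as W @ H under the IS divergence.

    Uses multiplicative updates with a square-root exponent (the
    majorization-minimization variant for the IS case). Illustrative only.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_components)) + eps    # spectral templates
    H = rng.random((n_components, n_frames)) + eps  # time activations
    history = []
    for _ in range(n_iter):
        V_hat = W @ H + eps
        H *= np.sqrt((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat)))
        V_hat = W @ H + eps
        W *= np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T))
        history.append(is_divergence(V, W @ H + eps))
    return W, H, history


# Synthetic stand-in for a power spectrogram (hypothetical data, not SiSEC audio).
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 40)) + 1e-3
W, H, history = is_nmf(V, n_components=3, n_iter=200)
print(f"IS divergence: first iteration {history[0]:.3f}, last iteration {history[-1]:.3f}")
```

In a separation setting, each column of `W` would play the role of a spectral template and each row of `H` its activation over time. Grouping components into sources and resynthesizing audio are separate steps not shown here.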