Title: 使用鑑別式動態非負矩陣分解之單聲道聲源分離
Discriminative and Dynamic Nonnegative Matrix Factorization on Monaural Audio Source Separation
Authors: 黃奕鈞
冀泰石
Huang, Yi-Chun
Chi, Tai-Shih
College of Engineering, Master Program of Sound and Music Innovative Technologies
Keywords: Monaural audio source separation; Local preserving; Singing voice separation; Speech denoising; Discriminative learning; Nonnegative matrix factorization
Publication date: 2016
Abstract: Nonnegative matrix factorization (NMF) is a widely used tool for audio source separation: it learns dictionaries from source spectra and uses the learned dictionaries to decompose the mixture in the test phase. However, standard NMF does not consider the temporal properties of the signals when learning dictionaries. Standard NMF also behaves like a generative model, so a dictionary that represents a source well is not guaranteed to separate it well. In addition, the learned dictionaries should be partitioned into subgroups to account for sources with different spectro-temporal properties, such as speech signals from different speakers or music signals from different instruments. We therefore propose a method that combines several extensions of NMF to address these problems, and apply it to speech denoising and to separating singing voice from background music. For temporal modeling, the method adopts a post-filtering technique that derives a source-specific vector autoregressive (VAR) model to smooth the NMF coefficients in the test phase. For partitioning, it uses the mixture of local dictionaries (MLD) technique, which divides dictionaries into subgroups by considering intra- and inter-group distances. We also introduce a modified discriminative learning procedure to address the gap between representation quality and separation performance. In summary, our extended NMF method accounts for the temporal properties of each subgroup and for the discrimination between sources.
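The baseline that the abstract builds on (learn one dictionary per source, then decompose the mixture with the concatenated dictionaries held fixed) can be sketched roughly as follows. This is an illustrative sketch only, not the thesis's implementation: the function names `nmf` and `separate`, the KL-divergence multiplicative updates, the soft-mask reconstruction, and the synthetic "spectrograms" are all assumptions made for the example, and none of the VAR, MLD, or discriminative extensions are included.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def nmf(V, rank=None, W=None, n_iter=200, seed=0):
    """Approximate V ~= W @ H using KL-divergence multiplicative updates.

    If W is supplied it is held fixed and only the activations H are
    estimated (test phase); otherwise W and H are both learned (training).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    update_W = W is None
    if update_W:
        W = rng.random((n_freq, rank)) + EPS
    H = rng.random((W.shape[1], n_frames)) + EPS
    for _ in range(n_iter):
        WH = W @ H + EPS
        H *= (W.T @ (V / WH)) / (W.T.sum(axis=1, keepdims=True) + EPS)
        if update_W:
            WH = W @ H + EPS
            W *= ((V / WH) @ H.T) / (H.sum(axis=1, keepdims=True).T + EPS)
    return W, H


def separate(V_mix, W_a, W_b, n_iter=200):
    """Split a mixture magnitude spectrogram using two fixed dictionaries."""
    W = np.hstack([W_a, W_b])
    _, H = nmf(V_mix, W=W, n_iter=n_iter)
    k = W_a.shape[1]
    est_a = W_a @ H[:k]          # source A reconstruction
    est_b = W_b @ H[k:]          # source B reconstruction
    mask_a = est_a / (est_a + est_b + EPS)  # soft (Wiener-like) mask
    return mask_a * V_mix, (1.0 - mask_a) * V_mix


# Synthetic demo (hypothetical data): source A occupies the low
# frequency bins, source B the high bins.
rng = np.random.default_rng(1)
basis_a = np.zeros((8, 2))
basis_a[:4] = rng.random((4, 2)) + 0.1
basis_b = np.zeros((8, 2))
basis_b[4:] = rng.random((4, 2)) + 0.1

# Training phase: learn one dictionary per source from clean data.
W_a, _ = nmf(basis_a @ rng.random((2, 50)), rank=2)
W_b, _ = nmf(basis_b @ rng.random((2, 50)), rank=2)

# Test phase: decompose an unseen mixture with the dictionaries fixed.
true_a = basis_a @ rng.random((2, 30))
true_b = basis_b @ rng.random((2, 30))
V_mix = true_a + true_b
sep_a, sep_b = separate(V_mix, W_a, W_b)
```

Because the two estimates come from complementary masks, they always sum back to the mixture. The limitations named in the abstract show up here directly: each frame is decomposed independently of its neighbours, and the dictionaries are trained only to represent their own source, not to tell the sources apart.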
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070351901
http://hdl.handle.net/11536/139110
Appears in Collections: Thesis