Title: | Singing Voice Separation using Spectro-Temporal Modulations |
Authors: | Yen, Zhi-Shuan; Chi, Tai-Shih | Master's Program of Sound and Music Innovative Technologies, College of Engineering |
Keywords: | singing voice; separation; modulation; auditory model |
Issue Date: | 2014 |
Abstract: | Over the past decade, the task of singing voice separation has gained much attention due to advances in digital audio technology. In the research field of music information retrieval (MIR), separated vocal signals or accompaniment signals are useful in many applications, such as singer identification, pitch extraction, and music genre classification. In most cases, however, the singing voice is mixed with the accompaniment, which makes a clean vocal signal difficult to obtain. Separating the singing voice from the accompaniment has therefore become an important task. In this thesis, two singing voice separation methods are proposed. Spectro-temporal modulations are extracted from a two-stage auditory model and used as modulation feature sets in one-stage and two-stage unsupervised clustering systems based on the EM algorithm. The proposed systems are tested on the MIR-1K database under different signal-to-noise ratio (SNR) conditions. The experimental results are compared with those of several state-of-the-art unsupervised singing voice separation algorithms, and the proposed methods show the best separation performance in low-SNR conditions. |
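The abstract's clustering step applies the EM algorithm to modulation features. As a minimal, hypothetical sketch (not the thesis implementation, and on 1-D toy data rather than real spectro-temporal modulation features), the following pure-Python EM fits a two-component Gaussian mixture, standing in for softly assigning feature points to a "voice" cluster and an "accompaniment" cluster:

```python
import math
import random

def em_two_gaussians(x, iters=50):
    """Fit a 2-component 1-D Gaussian mixture with EM.

    Returns (means, variances, weights, responsibilities).
    Illustrative stand-in for unsupervised two-way clustering
    of modulation features; not the thesis's actual system.
    """
    # Initialize the two means at the data extremes.
    mu = [min(x), max(x)]
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    resp = [[0.5, 0.5] for _ in x]
    for _ in range(iters):
        # E-step: posterior responsibility of each component per sample.
        for i, xi in enumerate(x):
            p = [w[k] * math.exp(-(xi - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = p[0] + p[1]
            resp[i] = [p[0] / s, p[1] / s]
        # M-step: re-estimate parameters from the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            var[k] = max(sum(r[k] * (xi - mu[k]) ** 2
                             for r, xi in zip(resp, x)) / nk, 1e-6)
            w[k] = nk / len(x)
    return mu, var, w, resp
```

In a separation system the per-sample responsibilities would play the role of a soft time-frequency mask; here they simply indicate which of the two toy clusters each point belongs to.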
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT070151906 http://hdl.handle.net/11536/75688 |
Appears in Collections: | Thesis |