Title: Time-Frequency Masking Based on Spatial Cues for Binaural Reverberant Source Separation and Noise Reduction
Authors: Hsiao Fang-Chi (蕭方濟)
Chi Tai-Shih (冀泰石)
Institute of Communications Engineering
Keywords: source separation; binaural cues; EM algorithm; statistical modeling
Issue Date: 2014
Abstract: In this thesis, we extract spatial cues, such as interaural level differences (ILDs) and interaural time differences (ITDs), from the mixture spectrograms, and reconstruct the spectrogram of the target source by classifying the time-frequency (T-F) units of the mixture and assigning them to the target. Because the frequency of a sound affects how well ITDs and ILDs discriminate its location, we select the more discriminative cue within each frequency range, following auditory perception. The spatial cues used in this thesis are the source angles derived from ITDs, the ILDs, and the mixing vectors formed from the interaural spectra. The interaural coherence (IC) and the ratio of the noisy-speech power spectral density to the estimated noise power spectral density are used to decide whether a T-F unit is reliable. The spatial cues of the reliable T-F units are then classified with the expectation-maximization (EM) algorithm to construct the mask of the target source, and the mask values of unreliable T-F units are set to a constant. We then apply a gammatone filterbank to the derived mask to smooth it. Finally, we evaluate the separated target source with the signal-to-distortion ratio (SDR) and the overall perceptual score (OPS). Both subjective tests and objective scores show that the proposed method outperforms the state-of-the-art method [29] in separating sources.
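The masking pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation, and it makes several simplifying assumptions: only the ILD is clustered, using a two-component one-dimensional Gaussian mixture; a magnitude threshold stands in for the IC and power-ratio reliability test; the gammatone smoothing step is omitted; and the input is synthetic. The function names `interaural_cues` and `em_soft_mask`, the `fill` constant, and all numeric values are hypothetical.

```python
import numpy as np

def interaural_cues(left, right, eps=1e-12):
    """Return per-unit ILD (dB) and interaural phase difference (radians).

    left, right: complex STFT arrays of identical shape (freq, time).
    """
    ild = 20.0 * np.log10((np.abs(left) + eps) / (np.abs(right) + eps))
    ipd = np.angle(left * np.conj(right))
    return ild, ipd

def em_soft_mask(cue, reliable, n_iter=50, fill=0.5):
    """Fit a two-component 1-D Gaussian mixture to the reliable cues with EM.

    The posterior of the higher-mean component is used as a soft mask;
    unreliable units receive the constant `fill` (an assumed value).
    """
    x = cue[reliable]
    mu = np.array([x.min(), x.max()], dtype=float)
    var = np.full(2, x.var() + 1e-6)
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each unit
        log_p = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = resp.sum(axis=0) + 1e-12
        w = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    mask = np.full(cue.shape, fill, dtype=float)
    mask[reliable] = resp[:, np.argmax(mu)]
    return mask

# Synthetic binaural "spectrograms": the target dominates about half the units
# and is louder in the left channel (assumed +6 dB versus -6 dB, plus jitter).
rng = np.random.default_rng(0)
shape = (64, 40)
target = rng.random(shape) < 0.5
mag = rng.uniform(0.5, 1.5, shape)
phase = rng.uniform(-np.pi, np.pi, shape)
gain_db = np.where(target, 6.0, -6.0) + rng.normal(0.0, 1.0, shape)
left = mag * 10 ** (gain_db / 40) * np.exp(1j * phase)
right = mag * 10 ** (-gain_db / 40) * np.exp(1j * (phase - 0.3))

ild, ipd = interaural_cues(left, right)
reliable = mag > 0.6  # placeholder for the IC / power-ratio reliability test
mask = em_soft_mask(ild, reliable)
accuracy = ((mask[reliable] > 0.5) == target[reliable]).mean()
print(f"fraction of reliable units assigned correctly: {accuracy:.3f}")
```

Multiplying `mask` element-wise with a mixture spectrogram would give a rough estimate of the target spectrogram. A fuller version would presumably combine several cues per frequency range and smooth the mask across frequency, as the abstract describes.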
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070160235
http://hdl.handle.net/11536/75950
Appears in Collections: Thesis