Title: | 以聲道差與回授作語音分離之研究 Source Separation Based On Binaural Cues And Feedback |
Authors: | 劉俊麟 謝世福 Institute of Communications Engineering |
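Keywords: | Binaural Source Separation; interaural level difference (ILD); interaural phase difference (IPD); left and right channels (microphone intensity); joint Gaussian mixture model (GMM); pre-clustering (pre-Cluster); Feedback |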
Issue Date: | 2014 |
Abstract: | Binaural source separation isolates individual sound sources from a mixture by statistically clustering interaural level difference (ILD) and interaural phase difference (IPD) data. This thesis applies time-frequency analysis and probabilistic modeling to signals received by two microphones and adds a feedback mechanism. After time-frequency analysis, two planes are extracted: the left- and right-channel intensities, and the interaural level and phase differences. The data on the two planes are modeled with a joint Gaussian mixture model (GMM) that treats features as correlated within each plane and uncorrelated across planes, which amounts to a pre-clustering algorithm that combines the two planes independently. The GMM parameters are estimated with the Expectation-Maximization (EM) algorithm, and probabilistic masking then separates the sources from the mixture. In the feedback stage, the separated sources are fed back to refine the GMM parameter estimates and improve separation further. Computer simulations show that the proposed method improves the signal-to-distortion ratio (SDR) by at least 2 dB over the other algorithms compared, and tests on recordings made with a digital voice recorder show an SDR improvement of at least 1 dB, which demonstrates the practicality of the method. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT070060267 http://hdl.handle.net/11536/73843 |
Appears in Collections: | Thesis |
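The cue extraction and probabilistic masking steps named in the abstract can be illustrated with a minimal sketch for a single time-frequency bin. This is not the thesis implementation: the function names, the dictionary keys, and the example parameter values are all hypothetical, and the model is simplified to independent one-dimensional Gaussians on ILD and IPD (the thesis describes a joint GMM with correlations, fitted by EM and refined by feedback, none of which is reproduced here).

```python
import cmath
import math


def binaural_cues(left, right, eps=1e-12):
    """Return (ILD in dB, IPD in radians) for one time-frequency bin.

    `left` and `right` are the complex STFT coefficients of the two channels.
    `eps` only guards against division by zero and log of zero.
    """
    ild_db = 20.0 * math.log10((abs(left) + eps) / (abs(right) + eps))
    ipd = cmath.phase(left * right.conjugate())  # wrapped to [-pi, pi]
    return ild_db, ipd


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def wrap_phase(angle):
    """Wrap an angle to [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def soft_masks(ild_db, ipd, sources):
    """Posterior probability of each source for one bin.

    ASSUMPTION: each source is an independent Gaussian on ILD and on the
    wrapped IPD deviation; the keys below are hypothetical, not from the thesis.
    """
    likelihoods = [
        s["weight"]
        * gaussian_pdf(ild_db, s["ild_mean"], s["ild_var"])
        * gaussian_pdf(wrap_phase(ipd - s["ipd_mean"]), 0.0, s["ipd_var"])
        for s in sources
    ]
    total = sum(likelihoods)
    if total == 0.0:  # all likelihoods underflowed: fall back to a uniform mask
        return [1.0 / len(sources)] * len(sources)
    return [p / total for p in likelihoods]


# Illustrative values only: one bin and two made-up source models.
left, right = 2.0 + 0.0j, 0.0 + 1.0j
ild, ipd = binaural_cues(left, right)
sources = [
    {"weight": 0.5, "ild_mean": 6.0, "ild_var": 4.0,
     "ipd_mean": -math.pi / 2, "ipd_var": 1.0},
    {"weight": 0.5, "ild_mean": -6.0, "ild_var": 4.0,
     "ipd_mean": math.pi / 2, "ipd_var": 1.0},
]
masks = soft_masks(ild, ipd, sources)
```

Multiplying each mask value by the mixture coefficient of the same bin, then inverting the time-frequency transform, would give one estimated source per mask. In an EM setting these same posteriors would serve as the E-step responsibilities.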