Title: Virtual Listening Point 3D Audio Synthesis using Sound Separation and Source Localization Techniques
Authors: Luo, Chen-Yuan (羅偵源); Hang, Hsueh-Ming (杭學鳴); Lu, Yan-Chen (盧延禎)
Institute of Electronics
Keywords: microphone array; selective listening point system; blind source separation; direction of arrival; triangulation; Head-Related Transfer Function; bilinear interpolation method; 3D acoustic signal synthesis
Issue Date: 2011
Abstract: This thesis develops a selective listening point (SLP) system built from three signal processing engines: a blind source separator, a direction-of-arrival (DOA) estimator, and a spatial sound synthesizer based on head-related transfer functions (HRTF). The application scenario is a space containing multiple sound sources recorded by microphone arrays. The SLP system separates the mixed signals picked up by the arrays into individual source signals and recovers their spatial properties, namely the angle and the distance from each source to the arrays; the separated sources and the spatial information are then used to re-synthesize 3D audio at an arbitrarily chosen listening point. The process consists of three steps: (1) separating the mixed signals with a blind source separation (BSS) technique, (2) estimating the DOA of each source and applying triangulation to estimate the distance between the sources and the microphone arrays, and (3) synthesizing 3D audio with HRTFs.

Two microphone arrays are used in the proposed system. The fast fixed-point independent vector analysis algorithm processes the signals captured by the arrays to obtain the demixing matrix, which is then used to separate the mixed signals; separation quality is evaluated with the signal-to-interference ratio (SIR) criterion. The pseudo-inverse of the demixing matrix gives the steering vector matrix, whose column vectors are used to estimate the incoming angles of the sound sources. Analyzing only the lower frequency bands of the mixed signal improves the accuracy of this angle estimation, and the estimated DOA is in turn used to correct the demixing matrix and thereby improve the separation performance. The correlation coefficient between waveforms is used to pair the sources separated by the two arrays, so that triangulation yields the distance between each source and the arrays. Once the spatial information of every source is recovered, the auditory scene can be reconstructed with the listening point altered. HRTFs preserve the naturalness of the synthesized sound, and the bilinear interpolation method over the HRTF measurement grid is adopted to render moving sounds.
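The abstract evaluates separation quality with the signal-to-interference ratio (SIR). A minimal sketch of that criterion is given below, assuming the target and interference components of each separated output are available (e.g., in a controlled mixing experiment); the function name is illustrative and not taken from the thesis.

```python
import numpy as np

def sir_db(target, interference):
    """Signal-to-interference ratio (dB) of one separated output, given its
    target and interference components (known in a controlled experiment)."""
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(interference ** 2))
```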
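The DOA step derives steering vectors from the pseudo-inverse of the per-bin demixing matrices and keeps only the low-frequency estimates. The sketch below illustrates that idea under several assumptions not stated in the thesis: a two-element array with known spacing, a far-field plane-wave model with a fixed sign convention, and an illustrative 1 kHz cut-off for the low band.

```python
import numpy as np

def estimate_doa(W, freqs, mic_spacing, c=343.0, f_max=1000.0):
    """Estimate source angles from per-bin demixing matrices.

    W: array of shape (n_bins, n_src, n_mic), frequency-domain demixing matrices.
    freqs: bin centre frequencies in Hz, shape (n_bins,).
    mic_spacing: spacing of the (assumed) two-element array in metres.
    Returns one angle in degrees per source, averaged over low-frequency bins
    only, which the thesis reports as more reliable for DOA estimation.
    """
    n_bins, n_src, n_mic = W.shape
    angles = [[] for _ in range(n_src)]
    for k in range(n_bins):
        f = freqs[k]
        if f <= 0 or f > f_max:                 # keep only the low band
            continue
        A = np.linalg.pinv(W[k])                # steering vectors, shape (n_mic, n_src)
        for s in range(n_src):
            # far-field model (assumed): a2/a1 = exp(-j*2*pi*f*d*sin(theta)/c)
            phase = np.angle(A[1, s] / A[0, s])
            arg = -phase * c / (2 * np.pi * f * mic_spacing)
            if abs(arg) <= 1.0:
                angles[s].append(np.degrees(np.arcsin(arg)))
    return [float(np.mean(a)) if a else float('nan') for a in angles]
```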
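Pairing the sources separated by the two arrays relies on the waveform correlation coefficient, after which triangulation gives the source positions. The following sketch uses a greedy pairing and a simple 2-D law-of-sines intersection, with both arrays placed on a common baseline and the DOA angles measured from that baseline; this geometry is an assumption for illustration, not the thesis setup.

```python
import numpy as np

def pair_and_triangulate(srcs_a, srcs_b, doa_a, doa_b, baseline):
    """Pair the sources separated by arrays A and B, then triangulate them.

    srcs_a, srcs_b: separated time-domain signals from array A and array B.
    doa_a, doa_b:   DOA estimates in degrees, measured from the A-to-B baseline.
    baseline:       distance between the two arrays in metres.
    """
    # Pairing by the absolute correlation coefficient between waveforms (greedy).
    n = len(srcs_a)
    corr = np.zeros((n, n))
    for i, sa in enumerate(srcs_a):
        for j, sb in enumerate(srcs_b):
            L = min(len(sa), len(sb))
            corr[i, j] = abs(np.corrcoef(sa[:L], sb[:L])[0, 1])
    pairs, used = [], set()
    for i in range(n):
        scores = [corr[i, j] if j not in used else -1.0 for j in range(n)]
        j = int(np.argmax(scores))
        used.add(j)
        pairs.append((i, j))

    # Triangulation: array A at (0, 0), array B at (baseline, 0).
    positions = []
    for i, j in pairs:
        t1, t2 = np.radians(doa_a[i]), np.radians(doa_b[j])
        # Law of sines in the triangle (array A, source, array B).
        r1 = baseline * np.sin(t2) / np.sin(t1 + t2)   # distance from array A
        positions.append((r1 * np.cos(t1), r1 * np.sin(t1)))
    return pairs, positions
```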
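Rendering at the selected listening point uses HRTFs with bilinear interpolation over the measurement grid. Below is a minimal sketch of that interpolation plus a binaural rendering helper, assuming head-related impulse responses (HRIRs) measured on a regular azimuth/elevation grid and a simple 1/r distance gain; the gain model and all names are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def interp_hrir(hrir_grid, az_grid, el_grid, az, el):
    """Bilinear interpolation of HRIRs.

    hrir_grid: array of shape (n_az, n_el, 2, taps) with measured left/right HRIRs.
    az_grid, el_grid: sorted 1-D arrays of measured azimuths / elevations (degrees).
    Returns the interpolated (left, right) HRIR pair for direction (az, el).
    """
    ia = np.clip(np.searchsorted(az_grid, az) - 1, 0, len(az_grid) - 2)
    ie = np.clip(np.searchsorted(el_grid, el) - 1, 0, len(el_grid) - 2)
    ca = (az - az_grid[ia]) / (az_grid[ia + 1] - az_grid[ia])   # fractional azimuth
    ce = (el - el_grid[ie]) / (el_grid[ie + 1] - el_grid[ie])   # fractional elevation
    h = ((1 - ca) * (1 - ce) * hrir_grid[ia, ie]
         + ca * (1 - ce) * hrir_grid[ia + 1, ie]
         + (1 - ca) * ce * hrir_grid[ia, ie + 1]
         + ca * ce * hrir_grid[ia + 1, ie + 1])
    return h[0], h[1]

def render_binaural(src, hrir_l, hrir_r, distance, ref_dist=1.0):
    """Convolve a separated source with the interpolated HRIR pair and apply
    a simple 1/r distance attenuation (an assumed model, not from the thesis)."""
    gain = ref_dist / max(distance, ref_dist)
    return gain * np.convolve(src, hrir_l), gain * np.convolve(src, hrir_r)
```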
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079811645
http://hdl.handle.net/11536/46811
Appears in Collections: Thesis