Title: | 以參考訊號架構為基礎之穩健語者定位與語音純化法 Robust Reference-signal-based Speaker’s Location Detection and Speech Purification |
Authors: | 鄭价呈 Chieh-Cheng Cheng; 胡竹生 Jwu-Sheng Hu; Institute of Electrical and Control Engineering |
Keywords: | Microphone Array; Speech Purification; Speaker Localization |
Issue Date: | 2005 |
Abstract: | Using a microphone array to improve speech acquisition quality and to detect a speaker's location is important in speech-interface research. The objective of this work is to locate speakers of interest and to improve speech recognition accuracy using a linear microphone array together with a reference signal. All of the proposed methods build on a reference-signal-based architecture, which indirectly addresses a practical issue: mismatch among microphones. The proposed speaker's location detection method uses a Gaussian mixture model (GMM) to model the phase-difference distribution that characterizes each speaker location. The method is robust to background noise and reverberation, and it provides accurate detection even in near-field and non-line-of-sight environments.
To reduce computational complexity, the two proposed beamformers, the soft penalty frequency-domain block beamformer (SPFDBB) and the frequency-domain adjustable block beamformer (FDABB), are designed in the frequency domain. A convolution between the speech source and the channel in the time domain corresponds to a multiplication in the frequency domain, but this relation does not hold when the time-domain filter order exceeds the window length used for the transform. The proposed beamformers therefore group several frames into a block and process them jointly to approximate the relation as closely as possible. In addition, a parameter named the soft penalty lets the user weight channel compensation against noise suppression; the beamformer with these features is the SPFDBB. In a frequently changing environment, however, grouping too many frames into one block is unsuitable. The SPFDBB is therefore extended to the FDABB with a measurement index, named CBVI, which lets the FDABB adjust the number of frames automatically. Modeling error is a further concern: an H∞ adaptation criterion is investigated and applied to both beamformers to improve robustness to it. Finally, results from simulations and from experiments in real environments demonstrate the feasibility of the proposed approaches. |
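The abstract's remark that a time-domain convolution cannot be modeled exactly as a per-frame frequency-domain multiplication can be illustrated numerically. The sketch below is not the thesis's beamformer; it only measures how far Y(k) is from H(k)·S(k) when a signal is cut into frames. The 16-tap decaying channel, the white-noise source, the frame lengths, and all function names are assumptions chosen for illustration.

```python
import cmath
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex list."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def convolve(s, h):
    """Full linear convolution of two lists."""
    y = [0.0] * (len(s) + len(h) - 1)
    for i, si in enumerate(s):
        for j, hj in enumerate(h):
            y[i + j] += si * hj
    return y


def multiplicative_model_error(s, h, win):
    """Relative error of the per-frame model Y(k) = H(k) * S(k).

    The signal is cut into non-overlapping frames of length `win`.
    The channel is zero-padded to `win`, or truncated when it is
    longer than the frame (the case the abstract warns about).
    """
    y = convolve(s, h)
    H = dft((list(h) + [0.0] * win)[:win])
    err = ref = 0.0
    for start in range(0, len(s) - win + 1, win):
        S = dft(s[start:start + win])
        Y = dft(y[start:start + win])
        err += sum(abs(Y[k] - H[k] * S[k]) ** 2 for k in range(win))
        ref += sum(abs(Y[k]) ** 2 for k in range(win))
    return (err / ref) ** 0.5


def demo():
    """Compare a frame shorter than the channel with a much longer one."""
    rng = random.Random(0)                       # fixed seed, hypothetical data
    s = [rng.gauss(0.0, 1.0) for _ in range(512)]
    h = [0.8 ** j for j in range(16)]            # assumed 16-tap channel
    return (multiplicative_model_error(s, h, 8),
            multiplicative_model_error(s, h, 128))


if __name__ == "__main__":
    short_err, long_err = demo()
    print("frame of 8 samples  :", short_err)
    print("frame of 128 samples:", long_err)
```

With a frame shorter than the channel, most of each output frame depends on input samples outside that frame, so the multiplicative model fits poorly. Lengthening the frame reduces the mismatch but does not remove it, which is the trade-off that motivates processing several frames jointly.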
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT008912564 http://hdl.handle.net/11536/77057 |
Appears in Collections: | Thesis |