Title: 以信心值量測為基礎之麥克風陣列強健性語音辨識技術
Confidence Measure Based Dual-Microphone Robust Speech Recognition
Author: 莊孟魁
Chuang, Meng-Kuei
林寶樹
張森嘉
Lin, Bao-Shuh
Chang, Sen-Chia
Institute of Multimedia Engineering
Keywords: Interaural Time Difference; Microphone Array; Speech Enhancement; Speech Recognition
Publication date: 2010
Abstract: Dual-microphone noise suppression has drawn increasing attention in recent years because of its significant speech enhancement performance. This thesis develops a fast, automatic procedure, based on a confidence measure (CM), for choosing the interaural time difference (ITD) range used in dual-microphone robust speech recognition, with the aim of improving recognition accuracy. We first verify the relation between the maximum confidence measure (MCM) and the recognition result. Next, to reduce the computational load, we cluster the speech models with the K-means algorithm, which reduces the number of model classes. Finally, we evaluate the method on a toy remote-control car voice command corpus from the Industrial Technology Research Institute (ITRI). With the noise source located at 30 or 60 degrees and an input signal-to-noise ratio (SNR) of 0 dB, recognition accuracy improves from 10% to about 90%.
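The selection idea in the abstract (try several ITD settings and keep the one with the maximum confidence measure) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the thesis's actual implementation: the CM is taken to be the margin between the best and second-best command log-likelihoods, the ITD setting is reduced to a single hypothetical threshold value, and `toy_recognize` stands in for the real dual-microphone front end and recognizer.

```python
from typing import Callable, Dict, Sequence, Tuple


def confidence_measure(log_likelihoods: Dict[str, float]) -> float:
    """Hypothetical CM: margin between the best and second-best scores."""
    ranked = sorted(log_likelihoods.values(), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two candidate commands")
    return ranked[0] - ranked[1]


def select_itd_threshold(
    candidates: Sequence[float],
    recognize: Callable[[float], Dict[str, float]],
) -> Tuple[float, float, str]:
    """Return (threshold, CM, command) for the candidate with the largest CM.

    `recognize` is assumed to enhance the utterance with the given ITD
    threshold and return one log-likelihood per command.
    """
    if not candidates:
        raise ValueError("no candidate thresholds given")
    best = None
    for threshold in candidates:
        scores = recognize(threshold)
        cm = confidence_measure(scores)
        command = max(scores, key=scores.get)
        if best is None or cm > best[1]:
            best = (threshold, cm, command)
    return best


def toy_recognize(threshold: float) -> Dict[str, float]:
    """Stand-in recognizer with made-up scores, for illustration only."""
    table = {
        0.1: {"go": -10.0, "stop": -10.5},
        0.3: {"go": -8.0, "stop": -12.0},
        0.5: {"go": -11.0, "stop": -10.0},
    }
    return table[threshold]


if __name__ == "__main__":
    print(select_itd_threshold([0.1, 0.3, 0.5], toy_recognize))
```

In this toy setup the threshold 0.3 gives the widest score margin, so it is the one selected. A real system would replace `toy_recognize` with the enhancement and decoding pipeline, and could use a different CM definition.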
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079857534
http://hdl.handle.net/11536/48456
Appears in Collections: Thesis