Title: Confidence Measure Based Dual-Microphone Robust Speech Recognition
Authors: Chuang, Meng-Kuei (庄孟魁)
Lin, Bao-Shuh (林宝树)
Chang, Sen-Chia (张森嘉)
Institute of Multimedia Engineering
Keywords: Interaural Time Difference; Microphone Array; Speech Enhancement; Speech Recognition
Issue Date: 2010
Abstract: Dual-microphone noise suppression has attracted growing attention in recent years because of its significant speech enhancement performance. This thesis develops a fast, automatic procedure, based on a confidence measure (CM), for choosing the interaural time difference (ITD) range (threshold) used in dual-microphone robust speech recognition, with the aim of improving recognition accuracy. We first verify the relationship between the maximum confidence measure (MCM) and the recognition result. Next, to reduce the computational load of the system, we cluster the speech models with the K-means algorithm, which simplifies the models and reduces the number of model classes. Finally, the method is tested on a toy remote-control car voice command corpus from the Industrial Technology Research Institute (ITRI). With noise located at 30 or 60 degrees and an input signal-to-noise ratio (SNR) of 0 dB, recognition accuracy improves from 10% to about 90%.
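The selection idea summarized in the abstract, picking the ITD setting whose recognition result has the maximum confidence measure, can be illustrated with a minimal sketch. This is not the thesis implementation: the function name select_itd_threshold and the enhance and recognize callables are hypothetical placeholders for an ITD-based enhancement front end and a recognizer that returns a hypothesis together with a confidence score.

```python
from typing import Callable, Iterable, Sequence, Tuple

Signal = Sequence[float]


def select_itd_threshold(
    left: Signal,
    right: Signal,
    candidates: Iterable[float],
    enhance: Callable[[Signal, Signal, float], Signal],
    recognize: Callable[[Signal], Tuple[str, float]],
) -> Tuple[float, str, float]:
    """Return (threshold, hypothesis, confidence) with the highest confidence.

    `enhance` and `recognize` are assumed, caller-supplied components:
    - enhance(left, right, threshold) -> enhanced single-channel signal
    - recognize(signal) -> (hypothesis, confidence score)
    """
    best = None
    for threshold in candidates:
        # Enhance the two-channel input with this candidate ITD setting.
        enhanced = enhance(left, right, threshold)
        # Recognize the enhanced signal and read off its confidence.
        hypothesis, confidence = recognize(enhanced)
        # Keep the candidate with the maximum confidence seen so far.
        if best is None or confidence > best[2]:
            best = (threshold, hypothesis, confidence)
    if best is None:
        raise ValueError("candidates must not be empty")
    return best
```

A caller would pass the two microphone signals, a small set of candidate ITD settings, and its own enhancement and recognition functions. An exhaustive loop like this runs the recognizer once per candidate, which is the kind of cost the abstract's model clustering step is meant to reduce.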
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079857534
http://hdl.handle.net/11536/48456
Appears in Collections: Thesis