Title: 深度學習估計時頻調變域之理想二元遮罩以保留語音理解度
Rate-Scale Ideal Binary Mask Estimation Using Deep Learning to Preserve Speech Intelligibility
Authors: 劉兆倫
冀泰石
Liu, Chao-Lun
Chi, Tai-Shih
Department of Electrical Engineering
Keywords: speech intelligibility; deep learning; ideal binary mask; rate-scale
Publication Date: 2016
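Abstract: To verify the benefit of the ideal binary mask for hearing aids and cochlear implants, we developed a new ideal binary mask in the spectro-temporal modulation domain that resolves the sparsity problem of the ideal binary mask. We first conducted listening experiments with normal-hearing subjects to show that the spectro-temporal modulation-domain ideal binary mask preserves speech intelligibility as well as the ideal binary mask does, and to find the best parameter settings. We then used deep learning to estimate the masked speech, comparing the objective speech intelligibility achieved by deep neural networks and deep recurrent neural networks with different numbers of frames and different architectures. The experiments showed that a deep recurrent neural network handles reverberation better than a deep neural network of the same architecture. Finally, we proposed a method for handling reverberation with a small architecture. Because cochlear implants can only use binary masks and are constrained by their hardware, we provide scoring criteria at several levels of computational cost, so that algorithms developed for cochlear implants in the future can select a suitable ideal binary mask for their computational budget.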
We propose a new binary mask, the rate-scale ideal binary mask (RS_IBM), as a variation of the ideal binary mask (IBM) that addresses possible problems caused by the sparsity of the IBM. The RS_IBM is built on the spectro-temporal modulation energy instead of the total energy. In this thesis, we first conducted subjective listening tests with normal-hearing listeners to verify that the RS_IBM preserves speech intelligibility as well as the ideal reverberation mask (IRM) does in reverberant environments. After verifying the efficacy of the RS_IBM, we used deep learning techniques to learn its masking effect for dereverberation. To learn this effect, we used a deep neural network (DNN) and a deep recurrent neural network (RNN) with various settings. Under comparable network structures, simulation results showed that the RNN dereverberates more effectively than the DNN. Finally, we used an RNN to estimate the RS_IBM directly for cochlear implants and provided a performance table for different RNN settings. Developers of cochlear implants can then choose a particular RNN, based on the computational limits of the device, to estimate the RS_IBM as well as those limits allow.
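For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of the conventional ideal binary mask, not the RS_IBM itself. It assumes the common textbook definition: a time-frequency unit is kept (1) when its local target-to-interference ratio in dB exceeds a local criterion, and discarded (0) otherwise. The function names, the `lc_db` parameter, and the toy energy values are hypothetical and do not come from the thesis. In a reverberant setting, the "interference" is often taken to be the late reverberation, which is also an assumption here. According to the abstract, the RS_IBM makes this decision on spectro-temporal modulation energy rather than total energy; that step is not specified in this record and is not implemented below.

```python
import math


def ideal_binary_mask(target_energy, interference_energy, lc_db=0.0):
    """Conventional IBM sketch (assumed definition, not the thesis's RS_IBM).

    Both inputs are 2-D lists of non-negative energies indexed as
    [frame][channel]. A unit is set to 1 when its local ratio in dB is
    strictly greater than lc_db, else 0.
    """
    eps = 1e-12  # avoids log of zero and division by zero
    mask = []
    for t_row, i_row in zip(target_energy, interference_energy):
        row = []
        for t, i in zip(t_row, i_row):
            ratio_db = 10.0 * math.log10((t + eps) / (i + eps))
            row.append(1 if ratio_db > lc_db else 0)
        mask.append(row)
    return mask


def apply_mask(mixture_energy, mask):
    """Zero out the units of the mixture that the mask rejects."""
    return [
        [value * keep for value, keep in zip(v_row, m_row)]
        for v_row, m_row in zip(mixture_energy, mask)
    ]


# Toy example with made-up energies (2 frames x 2 channels).
target = [[4.0, 0.1], [1.0, 2.0]]
interference = [[1.0, 1.0], [1.0, 0.5]]
mixture = [[5.0, 1.1], [2.0, 2.5]]

mask = ideal_binary_mask(target, interference)
masked = apply_mask(mixture, mask)
```

In this toy case only the units where the target clearly dominates survive, which illustrates the sparsity that the abstract identifies as a possible problem of the IBM.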
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070350726
http://hdl.handle.net/11536/139085
Appears in Collections: Thesis