Noise Reduction and Feedback Cancellation Algorithm for Hearing Aids

Title:	適合助聽器的噪音降低與回授消除演算法 Noise Reduction and Feedback Cancellation Algorithm for Hearing Aids
Authors:	范姜毅 Fan, Chiang-Yi; 周世傑 Jou, Shyh-Jye; Department of Electronics Engineering and Institute of Electronics
Keywords:	noise reduction; feedback cancellation; hearing aids
Issue Date:	2014
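Abstract:	This dissertation proposes voice activity detection (VAD), noise reduction (NR), dynamic range compression (DRC), and feedback cancellation (FC) algorithms for hearing aids. VAD distinguishes speech periods from noise periods to assist the other hearing aid functions. NR improves speech intelligibility and listening comfort. FC removes the feedback sound so that high-gain hearing aids remain stable. DRC maps the sound received by the hearing aid into the residual hearing range of the hearing-impaired patient. These blocks are all very important digital signal processing units of a hearing aid, so the goal of this dissertation is to propose low-complexity, high-performance algorithms that implement them.

For noise reduction, this dissertation proposes a neuromorphic NR algorithm with low computational complexity, suited to hearing aid systems for monosyllabic languages, such as Mandarin hearing aids. We propose a pitch-based VAD and a neuromorphic NR algorithm to enhance speech and reduce noise. The VAD algorithm is developed for an ANSI S1.11 filter bank and uses monosyllable characteristics and a nonlinear energy operator to improve detection accuracy. The neuromorphic NR algorithm reduces background noise by exploiting characteristics of the human auditory system and of monosyllabic languages. Simulation results show that at 0 dB SNR the algorithm achieves about 80% VAD accuracy and a 4 dB SNR improvement, so it can meet the needs of patients with mild hearing loss.

To serve patients with moderate or moderately severe hearing loss, the algorithm must provide an SNR improvement of 6 dB or 8 dB. In addition, implementing DRC requires considering the interaction between NR and DRC. This dissertation proposes an onset-based NR algorithm with two DRC curves to achieve greater SNR and speech quality improvement and to resolve the conflict between NR and DRC. The algorithm is designed for a 10 ms quasi-ANSI filter bank. We propose using two NR gain curves and six NR gain levels to improve NR performance. To improve it further, the thresholds in the mid-frequency range are optimized to increase the continuity of the signal between the low-frequency and high-frequency bands. On the other hand, when NR and DRC are cascaded, the SNR and speech quality obtained from the NR algorithm both degrade. This dissertation therefore proposes two DRC curves to solve the problem: one for speech periods and the other for noise periods. Simulation results show that this NR algorithm provides an 8 dB SNR improvement. Compared with a method that uses a single DRC curve, two DRC curves effectively preserve the NR performance. In terms of complexity, the proposed neuromorphic NR and onset NR algorithms require about 50% and 35% fewer multiplications, respectively, than algorithms in the literature, and the onset NR algorithm provides the highest SNR improvement.

For VAD in binaural hearing aids, this dissertation proposes an algorithm that fuses two VAD methods: pitch-based VAD and binaural cross-correlation VAD. Because monosyllabic speech applications are targeted, the characteristics of monosyllabic speech can be exploited. Pitch-based VAD is highly accurate in white noise and car noise but performs poorly in babble noise. In contrast, binaural cross-correlation VAD achieves good accuracy across noise types because it exploits spatial relationships and is therefore largely independent of the noise type; its drawback is that binaural transmission consumes a great deal of energy. We therefore propose a babble noise detector that decides whether the energy distribution of the noise resembles babble noise; if so, binaural cross-correlation VAD is used. Simulation results show that the fusion VAD algorithm achieves about 90% accuracy in white noise and car noise and about 81% in factory noise and babble noise. They also show that NR performance is roughly proportional to VAD accuracy, so the proposed VAD algorithm leads to higher SNR improvement. Compared with VAD methods in the literature, the proposed fusion VAD algorithm achieves the best accuracy across the different noise environments.

For feedback cancellation, this dissertation proposes an algorithm based on formant estimation that approximates the Prediction Error Method. Compared with the Prediction Error Method, it reduces the computational complexity by about four orders of magnitude. The forward-path processing implements the pitch-based formant estimator, that is, the update of the decorrelation filter coefficients, together with the VAD algorithm. These help the feedback cancellation filter in the backward path limit the damage to speech quality. From the viewpoint of the hearing aid system, the proposed algorithm not only has low computational complexity but can also easily share computation resources with other processing units in the system. In addition, its regular structure makes it well suited to hardware implementation. Simulation results show that the proposed method and the Prediction Error Method achieve similar speech quality and maximum stable gain. Compared with the conventional feedback cancellation method (without a decorrelation filter), the proposed algorithm is far better in both speech quality and maximum stable gain.

This dissertation proposes algorithms for digital hearing aids (HAs).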
The algorithms include voice activity detection (VAD), noise reduction (NR), feedback cancellation (FC), and dynamic range compression (DRC). VAD indicates the speech periods and noise periods to assist NR, DRC, and FC processing. NR reduces noise to improve speech intelligibility and listening comfort. FC cancels the feedback sound to keep high-gain HAs stable. DRC maps the incoming sound into the residual hearing range of patients with hearing loss. These functions are very important for digital HAs. Thus, this dissertation aims to propose high-performance, low-complexity algorithms for these functions.

This dissertation presents a low-complexity, hardware-oriented, neuromorphic pitch-based noise reduction algorithm for monosyllable HA applications. The proposed NR design consists of a pitch-based VAD for speech detection and a neuromorphic noise reduction for speech enhancement. The pitch-based VAD is developed on an ANSI S1.11 based filter bank architecture and employs the characteristics of monosyllables and a nonlinear energy operator to improve VAD accuracy. The neuromorphic noise reduction reduces the background noise by using the characteristics of the human hearing system and the cues of speech. Simulations show that the proposed algorithm provides about 80% VAD accuracy and a 4 dB signal-to-noise ratio (SNR) improvement at 0 dB SNR, which satisfies the requirement of patients with mild hearing loss.

To meet the requirements of patients with moderate or moderately severe hearing loss and to eliminate the conflict between NR and DRC, an onset-based noise reduction (ONR) with two dynamic range compression curves (T-DRC) is proposed for HA systems. The ONR is proposed to achieve higher SNR and perceptual evaluation of speech quality (PESQ) scores than the neuromorphic pitch-based noise reduction algorithm. The ONR is implemented with a 10 ms quasi-ANSI S1.11 1/3-octave based filter bank.
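As a rough illustration of the 1/3-octave band layout that such a filter bank follows, the sketch below computes band edges and center frequencies from the band-number formula in ANSI S1.11 (center frequency equal to 1000 Hz times G^((x-30)/3), with G = 2 or G = 10^0.3). The function name `third_octave_band` and the choice of bands 22 to 39 are assumptions made for illustration; the abstract does not state which bands the filter bank covers or how the quasi-ANSI design relaxes the standard.

```python
def third_octave_band(x: int, base2: bool = True) -> tuple[float, float, float]:
    """Return (lower edge, center, upper edge) in Hz for 1/3-octave band number x."""
    # Octave ratio G: exactly 2 in the base-2 system, 10^(3/10) in the base-10 system.
    G = 2.0 if base2 else 10.0 ** 0.3
    f_ref = 1000.0                       # reference frequency, band number 30
    center = f_ref * G ** ((x - 30) / 3)
    half_band = G ** (1 / 6)             # half of a 1/3-octave step
    return center / half_band, center, center * half_band


# Assumed band range for illustration: bands 22..39 (about 160 Hz to 8 kHz).
bands = {x: third_octave_band(x) for x in range(22, 40)}

for x, (lo, fc, hi) in bands.items():
    print(f"band {x}: {lo:8.1f} Hz  {fc:8.1f} Hz  {hi:8.1f} Hz")
```

Each band is a constant fraction of an octave wide, so the low bands are only tens of hertz wide while the high bands span more than a kilohertz.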
The ONR uses two noise reduction gain curves for different levels of speech energy and six gain levels for different onset energy levels. To further improve the ONR performance, the gain-level thresholds of the middle-frequency subbands are refined to give a smoother transition between the low-frequency and high-frequency subbands. When the ONR and DRC are cascaded in series, the SNR and PESQ enhancement obtained from the ONR can be degraded. Thus, the T-DRC uses one DRC with normal compression for speech periods and another DRC with higher compression for noise periods, selected according to the applied gain calculated by the ONR. Compared to commonly used methods for HAs, the ONR achieves higher SNR and comparable PESQ while using only 60% to 65% of the multiplication operations. Also, simulation results show that the ONR with T-DRC achieves better SNR and PESQ enhancement than the ONR without T-DRC.

For binaural HA applications, the dissertation proposes a fusion of two VAD algorithms, namely the pitch-based VAD (PBVAD) and the binaural cross-correlation based VAD (BCRVAD), with the aim of increasing the overall VAD accuracy across different noise types. The proposed algorithm has low complexity and is thus suitable for practical binaural HA applications. Furthermore, monosyllable speech applications are targeted, so specific speech characteristics can be exploited. The pitch-based VAD algorithm achieves high accuracy in white and car noise by incorporating known properties of the human hearing system and monosyllable speech characteristics. On the other hand, the computationally more expensive binaural cross-correlation based VAD achieves excellent accuracy in babble and factory noise by exploiting spatial cues and monosyllable speech characteristics.
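The spatial cue behind a binaural cross-correlation VAD can be illustrated with a normalized cross-correlation between left-ear and right-ear frames: a single talker tends to produce strongly correlated frames at some small lag, while uncorrelated noise at the two ears does not. The following is a minimal sketch of that generic measure only, not the dissertation's BCRVAD; the function name `normalized_xcorr_peak`, the 256-sample frame, and the lag range are assumptions.

```python
import math
import random


def normalized_xcorr_peak(left, right, max_lag):
    """Return (peak value, lag) of the normalized cross-correlation of two equal-length frames."""
    n = len(left)
    energy = math.sqrt(sum(v * v for v in left) * sum(v * v for v in right))
    if energy == 0.0:
        return 0.0, 0                    # silent frame: no usable correlation
    best, best_lag = -1.0, 0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag                  # positive lag: right ear is delayed
            if 0 <= j < n:
                acc += left[i] * right[j]
        c = acc / energy
        if c > best:
            best, best_lag = c, lag
    return best, best_lag


rng = random.Random(0)

# Hypothetical single source reaching the right ear 3 samples after the left ear.
source = [rng.gauss(0.0, 1.0) for _ in range(256)]
left, right = source, [0.0] * 3 + source[:-3]
coherent_peak, coherent_lag = normalized_xcorr_peak(left, right, max_lag=8)

# Independent noise at each ear as a stand-in for a non-directional noise field.
noise_l = [rng.gauss(0.0, 1.0) for _ in range(256)]
noise_r = [rng.gauss(0.0, 1.0) for _ in range(256)]
diffuse_peak, _ = normalized_xcorr_peak(noise_l, noise_r, max_lag=8)

print(coherent_peak, coherent_lag, diffuse_peak)
```

A detector built on this measure would compare the peak against a threshold per frame. Computing it requires both ear signals in one place, which is the source of the extra cost attributed to the binaural method.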
With the aim of achieving high accuracy in different noise types at low computational complexity, a babble noise detector is introduced to activate the binaural cross-correlation based VAD algorithm only during babble noise and factory noise periods. The resulting fusion VAD algorithm achieves about 90% VAD accuracy for white and car noise and about 81% for babble and factory noise. Simulations also show that the SNR improvement from noise reduction is in general proportional to VAD performance, so the fusion VAD can lead to higher SNR improvement. Comparisons with previous methods for binaural HAs show that the proposed algorithm achieves superior VAD accuracy in all noise types.

For feedback cancellation, this dissertation proposes a novel algorithm and architecture for adaptive feedback cancellation (AFC) based on pitch and formant information for HA applications. The proposed method, named Pitch-based Formant Estimation AFC (PFE-AFC), has significantly lower complexity than the Prediction Error Method AFC (PEM-AFC). The proposed PFE-AFC consists of forward-path and backward-path processing. The forward-path processing includes a low-complexity pitch-based formant estimator that updates the decorrelation filter coefficients and a pitch-based VAD for speech detection, which together help the feedback cancellation filter in the backward path reduce the feedback component while maintaining speech quality. From a system point of view, the PFE-AFC has low complexity overhead because it can easily share computation resources with other components in the HA system, such as VAD and NR. In addition, the PFE-AFC is suitable for hardware implementation owing to its regular structure. Complexity evaluations show that the PFE-AFC has four orders of magnitude lower complexity than the PEM-AFC. Simulation results show that the PFE-AFC and the PEM-AFC achieve similar PESQ and added stable gain.
Moreover, the proposed PFE-AFC can outperform the conventional AFC in PESQ and added stable gain.
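For context, a conventional AFC without a decorrelation filter is often built around a normalized LMS (NLMS) adaptive filter that estimates the feedback path from the loudspeaker signal to the microphone. The sketch below is a generic NLMS identifier under idealized assumptions (white-noise loudspeaker signal, no incoming speech, and a hypothetical four-tap feedback path); it is not the PFE-AFC or the PEM-AFC of the dissertation, and the names `nlms_identify` and `true_path` are illustrative.

```python
import random


def nlms_identify(u, y, taps, mu=0.5, eps=1e-8):
    """Estimate an FIR feedback path from loudspeaker signal u and microphone signal y."""
    w = [0.0] * taps                     # current feedback path estimate
    buf = [0.0] * taps                   # recent loudspeaker samples, newest first
    for u_n, y_n in zip(u, y):
        buf = [u_n] + buf[:-1]
        y_hat = sum(wi * bi for wi, bi in zip(w, buf))   # predicted feedback
        e = y_n - y_hat                                  # feedback-compensated signal
        norm = sum(b * b for b in buf) + eps             # input power normalization
        step = mu * e / norm
        w = [wi + step * bi for wi, bi in zip(w, buf)]
    return w


rng = random.Random(1)
true_path = [0.0, 0.3, -0.2, 0.1]        # hypothetical feedback path (assumption)
u = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Microphone picks up only the fed-back loudspeaker signal in this idealized setup.
y = [sum(true_path[k] * u[n - k] for k in range(len(true_path)) if n - k >= 0)
     for n in range(len(u))]

w = nlms_identify(u, y, taps=4)
print([round(v, 4) for v in w])
```

With real speech, the loudspeaker signal is correlated with the incoming signal, which biases this kind of estimate; decorrelation (prewhitening) filters such as those used in PEM-AFC are the usual remedy.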
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT079711829 http://hdl.handle.net/11536/75781
Appears in Collections:	Thesis