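Design of Pitch Based Noise Reduction Adopting Low Latency Quasi ANSI S1.11 1/3 Octave Filter Bank and VAD-based Wide Dynamic Range Compression for Mandarin Digital Hearing Aid System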

Title:	適用於華語數位助聽器之低延遲且類ANSI S1.11 1/3-octave規範濾波器組的音高式噪音消除與語音偵測輔助之廣泛動態範圍壓縮技術設計 Design of Pitch Based Noise Reduction Adopting Low Latency Quasi ANSI S1.11 1/3 Octave Filter Bank and VAD-based Wide Dynamic Range Compression for Mandarin Digital Hearing Aid System
Authors:	黃義政 Huang, Yi-Cheng 周世傑 Jou, Shyh-Jye Department of Electronics Engineering and Institute of Electronics
Keywords:	hearing aids; noise reduction; dynamic range compression; voice activity detection; Mandarin; pitch; non-stationary background environment
Issue Date:	2012
Abstract:	In this thesis, we propose a pitch-based noise reduction (NR) system and a VAD-based wide dynamic range compression (WDRC) scheme, both built on a quasi-ANSI 1/3-octave filter bank with low group delay for practical implementation in hearing aid (HA) systems. The proposed pitch-based NR consists of a pitch-based voice activity detector (VAD) and onset-dependent noise attenuation (ONA). It exploits speech characteristics such as pitch and its corresponding harmonics, onsets, and the duration of monosyllabic words. Because the quasi-ANSI filter bank has low resolution, the proposed pitch-based VAD combines the pitch and onset features with flexible harmonics detection to improve detection accuracy, and the proposed ONA is designed to compensate for the filter bank's poor resolution. In addition, an update mechanism for the long-term average magnitude is employed to improve the detection of the onset feature.
The simulation results show that the proposed pitch-based NR performs well both in stationary background noise (the user is still) and in highly dynamic background noise (the user is moving). The accuracy of the proposed pitch-based VAD is comparable to that of a pitch-based VAD using the high-resolution ANSI filter bank: the average accuracy is about 83.70% in stationary noise and 85.70% in dynamic noise. The proposed ONA improves the segmental signal-to-noise ratio (SNRseg) and the signal-to-noise ratio (SNR) by an average of 5.95 dB and 9.12 dB in stationary noise, and by 6.49 dB and 9.47 dB in dynamic noise. Moreover, the average improvement in speech quality (PESQ) is 0.19 and 0.22 in stationary and dynamic noise, respectively. The proposed VAD-based WDRC increases the energy difference between speech and noise. Because WDRC algorithms are usually developed for clean-speech scenarios without considering background noise, WDRC may attenuate high-energy speech more than low-energy background noise. This causes an undesired interaction when NR and WDRC are cascaded: the WDRC block can degrade the performance of NR, shrinking the energy difference between speech and noise and reducing speech intelligibility. Using the VAD decision from the NR block, WDRC can process speech regions and noise regions differently and thereby increase speech intelligibility. The simulation results show that the proposed VAD-based WDRC helps reduce the undesired interaction between NR and WDRC. The computational complexity of the proposed pitch-based NR and VAD-based WDRC is low, and the modifications deliver a substantial performance gain at only a slight additional cost.
Finally, the total latency of the proposed algorithms, including the quasi-ANSI filter bank, is only 11.3 ms, which meets the latency requirement of HA systems and makes the design suitable for HA applications.
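The NR/WDRC interaction described in the abstract can be illustrated with a small numeric sketch. It is not the thesis's algorithm: the static compression curve, the knee, ratio, and gain values, and the idea of gating by a fixed extra attenuation on noise frames (`noise_offset_db`) are all assumptions chosen only to show how a VAD decision can keep a compressor from shrinking the speech-to-noise level gap.

```python
def wdrc_gain_db(level_db, knee_db=45.0, ratio=2.0, linear_gain_db=20.0):
    """Static WDRC curve (hypothetical parameters).

    Below the knee the gain is constant; above it the output rises only
    1/ratio dB per input dB, so the gain falls as the level grows.
    """
    if level_db <= knee_db:
        return linear_gain_db
    return linear_gain_db - (level_db - knee_db) * (1.0 - 1.0 / ratio)


def vad_gated_gain_db(level_db, is_speech, noise_offset_db=6.0):
    """Assumed VAD gating: noise frames receive extra attenuation."""
    gain = wdrc_gain_db(level_db)
    return gain if is_speech else gain - noise_offset_db


def level_gap_db(speech_db, noise_db, use_vad):
    """Output level difference between a speech frame and a noise frame."""
    if use_vad:
        speech_out = speech_db + vad_gated_gain_db(speech_db, True)
        noise_out = noise_db + vad_gated_gain_db(noise_db, False)
    else:
        speech_out = speech_db + wdrc_gain_db(speech_db)
        noise_out = noise_db + wdrc_gain_db(noise_db)
    return speech_out - noise_out


if __name__ == "__main__":
    # Input gap is 65 - 50 = 15 dB.
    print(level_gap_db(65.0, 50.0, use_vad=False))  # plain WDRC
    print(level_gap_db(65.0, 50.0, use_vad=True))   # VAD-gated WDRC
```

With these illustrative numbers, plain compression narrows a 15 dB input gap to 7.5 dB because the louder speech frame is attenuated more than the quieter noise frame, while the gated version keeps 13.5 dB of separation.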
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT070050240 http://hdl.handle.net/11536/72888
Appears in Collections:	Thesis