Title: | A Study of Frequency Band and Wavelet Analysis for Robust Voice Activity Detection (以频带及小波分析为基础的强健性语音侦测系统之研究) |
Authors: | Kun-Ching Wang (王坤卿); Bing-Fei Wu (吴炳飞); Institute of Electrical and Control Engineering |
Keywords: | voice activity detection; wavelet analysis; adaptive noise estimator; subband decomposition |
Issue Date: | 2005 |
Abstract: | This dissertation addresses the failure of voice activity detection (VAD) under poor signal-to-noise ratio (SNR) and strongly time-varying background noise. So far, commonly used VAD algorithms assume that the background noise level is stationary. Because the features of conventional algorithms depend closely on estimates of the energy level, their performance degrades easily when the actual noise level varies. In a car, for example, abrupt changes in noise level occur frequently because of vehicle movement, engine running, speed changes, braking, and door slams.
To solve this problem, two VAD algorithms based on robust feature parameters are proposed in turn. The first approach builds on the observation that formant frequencies produce banded lines on the voice spectrogram, and that these banded lines give an efficient, compact representation of the time-varying characteristics of speech. Using frequency band analysis, an entropy-based VAD is presented. First, the input signal is decomposed into 32 uniform subbands to locate the formant frequency bands. An entropy measure defined in the subband domain, the banded spectrum entropy (BSE), is then proposed to exploit the inherent nature of the banded lines on the voice spectrogram. Because some of the decomposed subbands can be contaminated by noise, a subband self-extraction (SSE) strategy based on an adaptive threshold is presented to extract the useful subbands over time and thereby make the BSE robust against noise. In practice, the banded lines on the voice spectrogram are suitable only for characterizing voiced speech. To cover unvoiced speech, the ratio of low-band energy to full-band energy (RLF) is used to discriminate unvoiced sounds from background noise. Compared with other VAD approaches, experimental results show that the BSE and RLF parameters successfully capture the characteristics of the speech signal and are largely insensitive to variations in noise level. VAD also plays an essential role in noise spectrum estimation, where it is frequently employed as an indicator of when to update the noise spectrum. The proposed noise spectrum estimator uses the entropy-based VAD described above as that indicator.
In addition, a recursive averaging formula and an adaptive smoothing factor are used so that the estimator adapts quickly to a varying noise level. In the second VAD method, wavelet analysis is used to extract speech by exploiting its transient components and non-stationary property. First, the input signal is divided into four non-uniform subbands by the discrete wavelet transform (DWT). A nonlinear Teager energy operator (TEO) is then applied to each subband signal. The TEO is shown to significantly decrease the influence of noise on the subbands, and it also benefits the subsequent subband auto-correlation function (SACF). To quantify the amount of periodicity, a Mean-Delta (MD) operator is applied to the SACF of each subband. Summing the MDSACF values from all decomposed subbands yields a robust wavelet-based feature parameter. Finally, an adaptive threshold method is adopted as the VAD decision to form a complete VAD. Simulation results show that, compared with other methods, the wavelet-based VAD is robust against changing noise levels and is efficient and simple to implement. |
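Two building blocks named in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the dissertation's implementation: `teager_energy` uses the standard discrete form of the Teager energy operator, and `band_entropy` is a generic Shannon entropy over normalized band energies, which is assumed here as a stand-in because the abstract does not give the exact BSE definition. Both function names are hypothetical.

```python
import math


def teager_energy(x):
    """Standard discrete Teager energy operator:
    psi[n] = x[n]^2 - x[n-1] * x[n+1], defined for interior samples only.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def band_entropy(band_energies):
    """Shannon entropy (in nats) of band energies normalized to sum to 1.

    Assumption: a generic spectral-entropy measure, not the thesis's
    exact BSE formula. Bands with zero energy contribute nothing.
    """
    total = sum(band_energies)
    if total <= 0:
        return 0.0
    probs = [e / total for e in band_energies if e > 0]
    return -sum(p * math.log(p) for p in probs)
```

For a pure sinusoid A·cos(Ωn), the operator returns the constant A²·sin²(Ω), so its output reflects both amplitude and frequency. The entropy is highest when energy is spread evenly over all bands, as with broadband noise, and lower when energy concentrates in a few bands, as formant structure would produce.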
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT008912803 http://hdl.handle.net/11536/77090 |
Appears in Collections: | Thesis |
Files in This Item:
If it is a zip file, please download the file and unzip it, then open index.html in a browser to view the full text content.