Title: 以多頻帶為基礎的低位元率語音編碼技術
Multiband-based Low Bit Rate Speech Coding Technique
Authors: 羅述武
Lo, Shu-Wu
Lin Chin-Teng
Keywords: LPC vocoder;vibrated sinusoidal wave generator;envelope shape;multiband voiced/unvoiced decision
Issue Date: 1996
Abstract:
As telecommunication networks spread rapidly through modern
society, low-bit-rate, high-quality coding techniques have
become an important research field. Among the various types of
information, speech is the most fundamental, so this research
focuses on speech coding. The
traditional linear predictive coding (LPC) vocoder can produce
intelligible speech at 2.4 kbps, but it often generates
unnatural sounds such as buzzes, thumps, and tonal noises.
In noisy environments especially, the quality of the synthetic
speech is unacceptable. These problems arise from representing
voiced frames with a periodic pulse train, making the voicing
decision with only one bit, and estimating the gain inaccurately
for each frame, among other causes. This dissertation describes
a speech coding technique that achieves a low bit rate and high
quality, even in noisy environments. We present a
multiband-based LPC vocoder that combines the traditional LPC
vocoder structure with the ideas of the multiband excitation
(MBE) coder. This coder uses a vibrated sinusoidal wave
generator, gain estimation based on envelope shape, and a
multiband voiced/unvoiced decision to produce more natural
synthetic speech. The vibrated sinusoidal wave generator can
reduce sharp spectral peaks in the LPC spectrum, which may
otherwise result in unnatural sounds. The gain is estimated with
a closed-loop analysis-by-synthesis technique in which the
envelope shapes of the original and synthetic speech signals are
matched to obtain a cleaner and smoother speech output. The
multiband voiced/unvoiced decision scheme transforms the speech
signal into the frequency domain, divides the full spectrum
into several bands, and then decides the voiced/unvoiced status
of each band. Although the performance of the improved LPC
vocoder at 2.4 kbps is acceptable, its output is still
perceived as rough or noisy. Hence, we use an adaptive
postfilter to improve the perceptual quality of the synthetic
speech. Moreover, we use the lattice vector quantization (LVQ)
technique and the 2-D DLSPQ (2-Dimensional differential LSP
Quantization) technique to quantize the line spectrum pair (LSP)
parameters. Both techniques quantize the LSP parameters with as
few bits as possible without reducing speech quality, and both
require little memory and low computational complexity. Finally,
we compare the multiband-based LPC vocoder with several other
popular speech coding techniques.
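
The multiband voiced/unvoiced decision described in the abstract (transform a frame to the frequency domain, split the spectrum into bands, decide each band separately) can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: the abstract does not state the decision criterion, so the sketch assumes a simple harmonic-energy ratio, a known pitch, four bands, and a threshold of 0.6. The function names `magnitude_spectrum` and `band_voicing` are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of a Hann-windowed frame.

    A direct O(N^2) DFT keeps the sketch dependency-free.
    """
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def band_voicing(frame, pitch_bin, num_bands=4, threshold=0.6):
    """Return one voiced (True) / unvoiced (False) flag per band.

    Assumed criterion (not taken from the source): a band is voiced
    when the share of its energy lying within one bin of a pitch
    harmonic exceeds `threshold`. `pitch_bin` is the pitch frequency
    expressed in DFT bins and is assumed to be known already.
    """
    power = [m * m for m in magnitude_spectrum(frame)]
    bins = len(power) - 1  # bin 0 (DC) is excluded from every band
    edges = [1 + b * bins // num_bands for b in range(num_bands + 1)]
    flags = []
    for lo, hi in zip(edges, edges[1:]):
        total = sum(power[lo:hi])
        harmonic = 0.0
        for k, p in enumerate(power[lo:hi], start=lo):
            nearest = max(1, round(k / pitch_bin))  # nearest harmonic number
            if abs(k - pitch_bin * nearest) <= 1:
                harmonic += p
        # A silent band (total == 0) is reported as unvoiced.
        flags.append(total > 0 and harmonic / total > threshold)
    return flags
```

For a 256-sample frame whose lower half-spectrum holds harmonics of the pitch and whose upper half holds only inharmonic components, this sketch is expected to mark the two lower bands voiced and the two upper bands unvoiced. A conventional LPC vocoder would instead spend a single bit on the whole frame.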
Appears in Collections: Thesis