Multiband-based Low Bit Rate Speech Coding Technique

Title:	以多頻帶為基礎的低位元率語音編碼技術 Multiband-based Low Bit Rate Speech Coding Technique
Authors:	羅述武 Lo, Shu-Wu 林進燈 Lin Chin-Teng Institute of Electrical and Control Engineering
Keywords:	LPC vocoder; vibrated sinusoidal wave generator; envelope shape; multiband voiced/unvoiced decision
Issue Date:	1996
Abstract:	As telecommunication networks spread rapidly through modern society, low bit rate, high quality coding has become an important research field. Among the many kinds of information carried over networks, speech is the most fundamental signal, so this research focuses on speech coding. The traditional linear predictive coding (LPC) vocoder can produce intelligible speech at 2.4 kbps, but it often generates unnatural sounds such as buzzes, thumps, and tonal noises. In noisy environments in particular, the quality of the synthetic speech is unacceptable. These problems arise because each voiced frame is represented by a periodic pulse train, the voiced/unvoiced decision is carried by a single bit, and the gain estimate for each frame is inaccurate, among other causes. This dissertation describes a low bit rate speech coding technique that retains high quality even in noisy environments. We present a multiband-based LPC vocoder that combines the structure of the traditional LPC vocoder with ideas from the MBE coder. 
In this coder, we use a vibrated sinusoidal wave generator, gain estimation based on envelope shape, and a multiband voiced/unvoiced decision to produce more natural synthetic speech. The vibrated sinusoidal wave generator reduces the sharp spectral peaks in the LPC spectrum that can cause unnatural sounds. The gain estimation uses a closed-loop analysis-by-synthesis technique in which the envelope shape of the synthetic speech signal is matched to that of the original, giving a cleaner and smoother speech output. The multiband voiced/unvoiced decision scheme transforms the speech signal into the frequency domain, divides the spectrum into several bands, and then decides the voiced/unvoiced status of each band separately. Although the performance of the improved LPC vocoder at 2.4 kbps is acceptable, its output is still perceived as rough or noisy. Hence, we use an adaptive postfilter to improve the perceptual quality of the synthetic speech. Moreover, we use the lattice vector quantization (LVQ) technique and the 2-D DLSPQ (2-Dimensional differential LSP Quantization) technique to quantize the Line Spectrum Pair (LSP) parameters. Both techniques use as few bits as possible to quantize the LSP parameters without reducing speech quality, and both need little memory and low computational complexity. Finally, we compare the multiband-based LPC vocoder with other popular speech coding techniques.
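The multiband voiced/unvoiced decision described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: the record does not give the decision criterion, so the sketch assumes a simple one (a band counts as voiced when most of its energy lies close to harmonics of the pitch). The function name `multiband_vuv`, the parameters `n_bands` and `threshold`, the quarter-pitch window around each harmonic, and the premise that the pitch `f0` is already known from a separate estimator are all hypothetical choices made for illustration.

```python
import numpy as np


def multiband_vuv(frame, fs, f0, n_bands=4, threshold=0.6):
    """Return one voiced (True) / unvoiced (False) flag per frequency band.

    Assumed criterion (not taken from the thesis): a band is voiced when at
    least `threshold` of its energy lies within 0.25 * f0 of a harmonic of f0.
    """
    n = len(frame)
    # Power spectrum of the Hann-windowed frame.
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(n))) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    # Distance in Hz from each bin to the nearest harmonic k * f0, k >= 1.
    harmonic_index = np.maximum(np.round(freqs / f0), 1)
    distance = np.abs(freqs - harmonic_index * f0)
    near_harmonic = distance <= 0.25 * f0

    # Equal-width bands covering 0 .. fs/2.
    edges = np.linspace(0.0, fs / 2.0, n_bands + 1)
    edges[-1] += 1.0  # so the Nyquist bin falls inside the last band

    decisions = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_band = (freqs >= lo) & (freqs < hi)
        total = spectrum[in_band].sum()
        harmonic = spectrum[in_band & near_harmonic].sum()
        decisions.append(bool(total > 0 and harmonic / total >= threshold))
    return decisions
```

With one flag per band instead of a single bit per frame, a synthesizer could mix periodic excitation in the voiced bands with noise excitation in the unvoiced bands, which is the motivation the abstract gives for replacing the single-bit decision.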
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT850327046 http://hdl.handle.net/11536/61703
Appears in Collections:	Thesis