Title: 2.4 Kbps 位元率語音編碼技術
A 2.4 Kbps Bit Rate Speech Coding Technique
Authors: 林信安
Lin, Hsin-An
林進燈
Lin Chin-Teng
Institute of Electrical and Control Engineering
Keywords: Improved Linear Predictive Coding Vocoder; Aperiodic Pulse; Quarter Voiced/Unvoiced Decision; Envelope Shape; Lattice Vector Quantization
Issue Date: 1995
Abstract: The well-known F.S. 1016 CELP technique at 4.8 Kbps not only achieves low bit-rate speech compression but also maintains high-quality synthetic speech. However, because communication channel capacity and storage are limited, it is important to represent speech signals at lower bit rates (below 4.8 Kbps). Traditional linear predictive coding (LPC) vocoders can produce intelligible speech at 2.4 Kbps, but they often generate unnatural sounds such as buzzes, thumps, and tonal noises. These problems arise from the periodic pulse train used in each frame, a voiced/unvoiced decision made with only one bit, and inaccurate gain estimation. This dissertation presents an improved LPC vocoder based on the traditional LPC vocoder structure. To produce more natural synthetic speech, the coder uses an aperiodic pulse scheme, a quarter voiced/unvoiced decision, and gain estimation based on envelope shape. Aperiodic pulses reduce the sharp peaks in the LPC spectrum that can cause unnatural sounds.
The quarter voiced/unvoiced decision scheme divides each speech frame into four subframes and classifies each subframe as either voiced or unvoiced. Gain estimation is performed with a closed-loop analysis-by-synthesis technique that matches the envelope shape of the synthetic speech to that of the original speech, yielding cleaner and smoother output. Although the performance of the improved LPC vocoder at 2.4 Kbps is acceptable, its output is still perceived as rough or noisy. Hence, we use an adaptive postfilter to improve the perceptual quality of the synthetic speech. Moreover, we use lattice vector quantization (LVQ) to quantize the Line Spectrum Pair (LSP) parameters with as few bits as possible without reducing speech quality. LVQ requires little memory and has low computational complexity. Mean opinion score (MOS) results show that the improved LPC vocoder achieves quality superior to that of existing LPC-10 versions.
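The quarter voiced/unvoiced decision described in the abstract can be illustrated with a minimal sketch. The abstract does not state how each subframe is classified, so the energy and zero-crossing-rate test below is an assumption, as are the function name `quarter_vuv`, both thresholds, and the 180-sample frame at 8 kHz used in the example. Only the split of one frame into four subframes with one decision each comes from the abstract.

```python
import math


def quarter_vuv(frame, energy_thresh=0.01, zcr_thresh=0.25):
    """Return four voiced/unvoiced flags, one per quarter of the frame.

    Hypothetical classifier: a subframe is called voiced when its mean
    energy is high and its zero-crossing rate is low. Both thresholds
    are illustrative assumptions, not values from the thesis.
    """
    if len(frame) < 8 or len(frame) % 4 != 0:
        raise ValueError("frame length must be a multiple of 4, at least 8")
    sub_len = len(frame) // 4
    flags = []
    for k in range(4):
        sub = frame[k * sub_len:(k + 1) * sub_len]
        # Mean energy of the subframe.
        energy = sum(x * x for x in sub) / sub_len
        # Fraction of neighbouring sample pairs whose signs differ.
        crossings = sum(1 for a, b in zip(sub, sub[1:]) if (a >= 0) != (b >= 0))
        zcr = crossings / (sub_len - 1)
        flags.append(energy > energy_thresh and zcr < zcr_thresh)
    return flags


# Assumed test frame: 180 samples at 8 kHz. The first half is a strong
# 100 Hz tone (voiced-like); the second half is a weak signal that
# changes sign on every sample (unvoiced-like).
fs = 8000
voiced = [0.5 * math.sin(2 * math.pi * 100 * n / fs) for n in range(90)]
unvoiced = [0.05 * (-1) ** n for n in range(90)]
flags = quarter_vuv(voiced + unvoiced)
print(flags)
```

With four flags per frame, a frame that changes from voiced to unvoiced partway through can be marked as such, which a single one-bit decision per frame cannot express.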
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT840327068
http://hdl.handle.net/11536/60328
Appears in Collections: Thesis