Title: 低位元率語音編碼之研究及其即時實現
An Investigation on Low Bit-Rate Speech Coding and Its Real-Time Implementation
Authors: 楊政翰
Cheng Han Yan
杭學鳴
Hsueh Ming Hang
Institute of Electronics
Keywords: Speech Coding; ITU-T G.723.1; MPEG-4 HVXC
Issue Date: 1999
Abstract: Speech compression technology is an essential element of wireless multimedia communication systems. The goal of this thesis is to investigate the channel robustness of standard speech coding algorithms and their real-time implementation. Two low bit-rate speech coding schemes are investigated. One is the ITU-T G.723.1 dual-rate speech coder, proposed by the International Telecommunication Union (ITU) and based on Code-Excited Linear Predictive Coding (CELP). The other is Harmonic Vector eXcitation Coding (HVXC), which is part of the MPEG-4 specifications. In this thesis, we first evaluate the quality of these two schemes using several speech quality measures. Experiments show that the objective quality of G.723.1 at 6.3 kbit/s is better than that of MPEG-4 HVXC at 4.0 kbit/s, but the subjective quality of the two schemes is comparable. Because wireless channels are generally very noisy, we then consider the effect of channel noise on speech quality. We compare the two algorithms using two channel error models: the additive white Gaussian noise (AWGN) channel and the Markov (Gilbert) channel. Our simulations indicate that both the objective and the subjective speech quality of both schemes degrade rapidly in a highly noisy environment. To improve the speech quality, an error concealment technique is proposed for the G.723.1 scheme. The results show that both the objective and the subjective quality of the speech are significantly improved after this concealment post-processing. Moreover, we implement the G.723.1 speech coding scheme in real time on the Texas Instruments (TI) TMS320C6201 fixed-point digital signal processor (DSP). To speed up the DSP implementation, the C source programs must be modified to take advantage of the TI DSP features. After proper tuning and optimization of the source programs, the total computation for one encoding and decoding pass is about 0.9 million instruction cycles, which is 1.7% of the computation required by the original, untuned C source.
Table of Contents
List of Figures
List of Tables
CHAPTER 1 INTRODUCTION
CHAPTER 2 ITU-T G.723.1 SPEECH CODING
  2.1 ENCODER PRINCIPLES
    2.1.1 General description
    2.1.2 Pre-process of Speech Signal
    2.1.3 LPC analysis
    2.1.4 LSP quantizer
    2.1.5 LSP decoder and interpolator
    2.1.6 Formant perceptual weighting filter
    2.1.7 Pitch estimation
    2.1.8 Harmonic noise shaping
    2.1.9 Weighted synthesis filter
    2.1.10 Pitch predictor
      2.1.10.1 Closed loop pitch criterion
      2.1.10.2 Pitch predictor gains criterion
      2.1.10.3 Coefficients determination
    2.1.11 High rate excitation coding (MP-MLQ)
    2.1.12 Low rate excitation coding (ACELP)
    2.1.13 Bit allocation
  2.2 DECODER PRINCIPLES
    2.2.1 General description
    2.2.2 LSP decoder and interpolator
    2.2.3 Pitch decoder
    2.2.4 Excitation decoder
    2.2.5 Pitch postfilter
    2.2.6 LPC synthesis filter
    2.2.7 Formant postfilter
    2.2.8 Gain scaling unit
    2.2.9 Frame interpolation handling
      2.2.9.1 Residual Interpolation
      2.2.9.2 LSP interpolation
CHAPTER 3 MPEG-4 SPEECH CODING: HVXC
  3.1 ENCODER PRINCIPLES (INFORMATIVE)
    3.1.1 General description
    3.1.2 LPC analysis
    3.1.3 LSP quantization
    3.1.4 LPC inverse filter
    3.1.5 Pitch estimation
    3.1.6 Harmonic magnitudes extraction
      3.1.6.1 Estimation of the spectral envelope
      3.1.6.2 Fine pitch search
    3.1.7 Filter definition
    3.1.8 Harmonic VQ encoder
      3.1.8.1 Dimension conversion
      3.1.8.2 VQ of spectral envelope vector
    3.1.9 Voiced/Unvoiced (V/UV) decision
    3.1.10 Time domain encoder
  3.2 DECODER PRINCIPLES (NORMATIVE)
    3.2.1 General description
    3.2.2 LSP decoder
      3.2.2.1 Decoding process for the base layer
      3.2.2.2 Decoding process for the enhancement layer
    3.2.3 Harmonic VQ decoder
    3.2.4 Time domain decoder
    3.2.5 Parameter interpolation for speed control
    3.2.6 Voiced component synthesizer
      3.2.6.1 Harmonic magnitudes modification
      3.2.6.2 Harmonic excitation synthesis
        3.2.6.2.1 Generation of a waveform over one pitch period
        3.2.6.2.2 Cyclic extension and re-sampling of waveform
      3.2.6.3 Noise component generation
      3.2.6.4 LPC synthesis
    3.2.7 Unvoiced component synthesis
CHAPTER 4 SPEECH QUALITY ASSESSMENT
  4.1 SQA METHODOLOGY
  4.2 CHANNEL ERROR MODEL AND CHANNEL CODING
  4.3 ERROR CONCEALMENT
CHAPTER 5 REAL-TIME IMPLEMENTATION OF SPEECH CODING SCHEME
  5.1 INTRODUCTION TO TI TMS320C62XX
  5.2 REAL-TIME IMPLEMENTATION
    5.2.1 Software development for DSP
    5.2.2 Code Composer
    5.2.3 C source file modification
    5.2.4 Code analysis
    5.2.5 Code optimization
      5.2.5.1 C compiler optimization
      5.2.5.2 C source file refinement
    5.2.6 Optimization result
  5.3 I/O MODULE (PMC)
    5.3.1 Hardware overview
    5.3.2 Software overview
    5.3.3 User application code
CHAPTER 6 CONCLUSIONS AND FUTURE WORK
  6.1 CONCLUSIONS
  6.2 FUTURE WORK
BIBLIOGRAPHY
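Illustrative note: the two channel error models named in the abstract (AWGN and Markov/Gilbert) can be pictured with a minimal sketch. This is not the simulator used in the thesis. The function names, the BPSK mapping assumed for the AWGN case, the error-free good state of the Gilbert model, and all parameter values are assumptions made only for illustration.

```python
import math
import random


def gilbert_error_flags(n_bits, p_good_to_bad, p_bad_to_good, ber_bad, seed=0):
    """Two-state Markov (Gilbert) channel sketch.

    Assumption: the good state is error-free and the bad state flips each
    bit with probability `ber_bad`. Returns a list of 0/1 error flags.
    """
    rng = random.Random(seed)
    in_bad_state = False  # assumed to start in the good state
    flags = []
    for _ in range(n_bits):
        # Markov state transition before each bit
        if in_bad_state:
            if rng.random() < p_bad_to_good:
                in_bad_state = False
        elif rng.random() < p_good_to_bad:
            in_bad_state = True
        # Errors occur only in the bad state
        flags.append(1 if in_bad_state and rng.random() < ber_bad else 0)
    return flags


def awgn_error_flags(n_bits, ebn0_db, seed=0):
    """AWGN channel sketch, assuming unit-energy BPSK with hard decisions.

    A bit is in error when the Gaussian noise pushes the received symbol
    across the decision threshold. Returns a list of 0/1 error flags.
    """
    rng = random.Random(seed)
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    sigma = math.sqrt(1.0 / (2.0 * ebn0))  # noise std dev for Eb = 1
    return [1 if 1.0 + rng.gauss(0.0, sigma) < 0.0 else 0 for _ in range(n_bits)]


def apply_errors(bits, flags):
    """Flip every bit whose error flag is 1."""
    return [b ^ f for b, f in zip(bits, flags)]


if __name__ == "__main__":
    n = 100_000
    burst = gilbert_error_flags(n, p_good_to_bad=0.01, p_bad_to_good=0.1, ber_bad=0.5)
    flat = awgn_error_flags(n, ebn0_db=4.0)
    print("Gilbert bit error rate:", sum(burst) / n)
    print("AWGN bit error rate:", sum(flat) / n)
```

The sketch shows the qualitative difference between the two models: the Gilbert channel produces errors in bursts while it stays in the bad state, whereas the AWGN channel produces independent, evenly spread errors. The error flags could be applied with `apply_errors` to a coded bitstream before decoding, which is the kind of experiment the abstract describes.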
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT880428016
http://hdl.handle.net/11536/65648
Appears in Collections: Thesis