A New Controlled Adaptive Prediction Delta Modulation Speech Coder

Title:	新型控制適應性預估步階語音編碼器 A New Controlled Adaptive Prediction Delta Modulation Speech Coder
Authors:	劉家宏 Chia-Horng Liu; 黃家齊 Chia-Chi Huang; 電信工程研究所 (Institute of Communications Engineering)
Keywords:	speech coding; personal communication system; syllabic companding; instantaneous companding; adaptive prediction; pitch period; adaptive post filter; voice activity detection
Date issued:	2000
Abstract:	This dissertation develops and presents a new speech coding algorithm that operates at a medium data rate with good speech coding quality, robustness, and low complexity. It is called the controlled adaptive prediction delta modulation (CAPDM) coder. It is derived from the simple delta modulation (DM) algorithm and combines one-step look-ahead decision, syllabic companding, instantaneous companding, and adaptive prediction. We first present the basic structure and operating mechanism of the CAPDM algorithm. A basic CAPDM codec consists of a decision logic unit, a step-size estimation unit, and an adaptive prediction unit. Simulation results show that CAPDM has good subjective and objective speech coding quality, and that its quality is better than that of a conventional 16 kb/s CVSD coder. Next, we improve the basic CAPDM in three respects: objective quality, subjective quality, and robustness in a noisy channel. For objective quality, a pitch detector is investigated and implemented to find the periodicity (long-term information) in the speech signal and remove it, which yields better prediction and better speech coding quality. For subjective quality, an adaptive post-filter (APF) is designed and implemented to take advantage of the masking effect of the human ear, so that less prediction noise is perceptible. For robustness in a noisy channel, waveform substitution and packet isolation techniques are investigated and implemented to combat the effect of packet loss and to recover lost packets. Simulation results reveal that higher SEGSNR and MOS are obtained than with the basic CAPDM algorithm, and that the performance of CAPDM with pitch detection and APF degrades smoothly even when the packet loss rate approaches 10%. The data rate of CAPDM is then further reduced from 16 kb/s to 9 kb/s by pitch synchronous substitution. Because the waveform of a speech vowel has a quasi-periodic structure, the speech samples can be encoded for only every other pitch period while the adjacent packets are discarded. Since nearly half of the packets are discarded, the reduction from 16 kb/s to 9 kb/s causes some impairment of speech coding quality; however, the 9 kb/s CAPDM still shows fair speech coding quality. Finally, a voice activity detection (VAD) algorithm dedicated to the CAPDM structure is developed. This VAD algorithm is based on two parameters: prediction gain and prediction indicator. Simulation results indicate that the proposed algorithm is robust. Thus, by using the VAD algorithm, the effective data rate of the 16 kb/s CAPDM is reduced to below 8 kb/s.
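Note:	The abstract does not give the CAPDM equations, so the following is only a minimal sketch of the generic idea it builds on: one-bit delta modulation with a slowly adapted (syllabic) step size, in the style of CVSD. It is not the CAPDM algorithm. All constants, the three-bit run rule, and the function names are illustrative assumptions, as is the 16 kHz sampling rate (chosen so that one bit per sample gives 16 kb/s).

```python
import math

# Illustrative constants (assumptions, not values from the dissertation).
MIN_STEP, MAX_STEP = 0.002, 0.2
BOOST, DECAY = 0.01, 0.9
RUN_LENGTH = 3  # identical bits in a row that are taken to signal slope overload


def _update(estimate, step, history, bit):
    """Advance the state shared by encoder and decoder by one transmitted bit."""
    history = (history + [bit])[-RUN_LENGTH:]
    overloaded = len(history) == RUN_LENGTH and len(set(history)) == 1
    # Grow the step while the estimate lags the signal, otherwise let it decay.
    step = step + BOOST if overloaded else step * DECAY
    step = min(max(step, MIN_STEP), MAX_STEP)
    estimate += step if bit else -step
    return estimate, step, history


def encode(samples):
    """Emit one bit per sample: 1 if the sample is at or above the running estimate."""
    estimate, step, history, bits = 0.0, MIN_STEP, [], []
    for x in samples:
        bit = 1 if x >= estimate else 0
        bits.append(bit)
        estimate, step, history = _update(estimate, step, history, bit)
    return bits


def decode(bits):
    """Rebuild the waveform by replaying the same state updates from the bits alone."""
    estimate, step, history, out = 0.0, MIN_STEP, [], []
    for bit in bits:
        estimate, step, history = _update(estimate, step, history, bit)
        out.append(estimate)
    return out


def snr_db(reference, decoded):
    """Overall signal-to-noise ratio in dB (not the segmental SEGSNR of the abstract)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    return 10.0 * math.log10(signal / noise)


SAMPLE_RATE = 16000  # assumed; 1 bit per sample then corresponds to 16 kb/s
tone = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(1600)]
bits = encode(tone)
print(len(bits), "bits, SNR", round(snr_db(tone, decode(bits)), 1), "dB")
```

Because the step size is driven only by the transmitted bits, the decoder can track it without side information, which is what keeps this family of coders simple. The techniques listed in the abstract (adaptive prediction, pitch detection, post-filtering, VAD) would be layered on top of such a loop and are not modeled here.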
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT890435001 http://hdl.handle.net/11536/67281
Appears in Collections:	Thesis