Title: MPEG-4時頻轉換編碼之模組分析與改良
On Improvement of Modules in MPEG-4 T/F coding
Authors: 姚錦樹
Chin-Shu Yao
劉啟民
Chi-Min Liu
Institute of Computer Science and Engineering
Keywords: MPEG-4 T/F coding; MPEG-4 audio coding; Long Term Prediction; Perceptual Noise Substitution; Noise detection; Bit-Sliced Arithmetic Coding; Transform-domain weighted interleave Vector Quantization; Scalable audio coding
Issue Date: 1999
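Abstract: MPEG-4 T/F coding builds on MPEG-2 Advanced Audio Coding (AAC) and adds several new modules, including Long-term Prediction (LTP), Perceptual Noise Substitution (PNS), Twin-VQ, and Bit-Sliced Arithmetic Coding (BSAC), to refine the overall coding structure. It also introduces a new concept, scalability, which organizes the codec in layers: the base layer uses speech coding to preserve the most essential part of the signal, and the layers above it use T/F coding to raise the audio quality to a level nearly indistinguishable from the original. Besides a higher compression ratio and better quality, this structure is well suited to network transmission and multimedia applications: during transmission or at the decoder, only the data that is needed or wanted has to be extracted and decoded, without receiving and reconstructing the whole bitstream. This thesis concentrates on the analysis and improvement of these four modules. We examine the structure of each module in turn, analyze its issues and difficulties as well as when it applies and its strengths and weaknesses, and propose improvements and recommendations. For LTP, we discuss the difficulty of computing the lag and propose two methods to improve it; we also give a detailed analysis and evaluation of the decision on whether to use LTP. For PNS, we use the characteristics of noise and audio coding techniques to devise three methods for detecting noise-like signals, and verify our ideas with experiments on five ways of using PNS. For Twin-VQ and BSAC, we focus on the theory and operating procedure, study the purpose and implementation space of each module, evaluate them through a comprehensive comparison with the existing modules, and finally present our recommendations.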
MPEG-4 T/F coding is an important audio processing technique that combines speech coding and audio coding to support multimedia applications such as transmission, telecommunication, and broadcasting. T/F coding is based on MPEG-2 Advanced Audio Coding (AAC) and adds new modules to achieve higher coding efficiency. These modules are Long Term Prediction (LTP), Perceptual Noise Substitution (PNS), Bit-Sliced Arithmetic Coding (BSAC), and Transform-domain weighted interleave Vector Quantization (TwinVQ). In addition, T/F coding provides a scalable structure consisting of a base layer and one or more enhancement layers. Scalability enables partial decoding and suits a variety of transmission channels and applications. This thesis focuses on the analysis and improvement of the four new modules. We study the structure of each module and address its coding issues. We then present our proposals for improving the modules and examine the motivation, advantages, and coding space of each. In the LTP analysis, we discuss the problem of lag searching and suggest two methods for speeding it up. In the PNS analysis, we exploit the characteristics of noise and audio coding techniques to propose three methods for noise detection, and we verify these methods with five different ways of implementing PNS. Finally, for the BSAC and TwinVQ modules, we explore the principle and framework in detail, compare them with the existing Huffman coding and AAC quantization and coding, and present the coding space for each of them.
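To make the LTP lag-searching problem mentioned in the abstract concrete, the following is a minimal sketch of an exhaustive lag search. It is only a baseline illustration, not either of the two speed-up methods proposed in the thesis, which this record does not describe. The function name `ltp_lag_search`, its parameters, and the simplifying assumption that every candidate lag is at least one frame long are hypothetical choices made for this example.

```python
def ltp_lag_search(history, frame, min_lag, max_lag):
    """Return (lag, gain) maximizing corr^2 / energy over [min_lag, max_lag].

    Assumption (for this sketch only): min_lag >= len(frame), so each
    candidate segment lies entirely inside `history`.
    """
    n = len(frame)
    if min_lag < n or max_lag > len(history):
        raise ValueError("need len(frame) <= min_lag and max_lag <= len(history)")
    best_lag, best_gain, best_score = None, 0.0, 0.0
    for lag in range(min_lag, max_lag + 1):
        start = len(history) - lag
        segment = history[start:start + n]  # past samples delayed by `lag`
        corr = sum(s * x for s, x in zip(segment, frame))
        energy = sum(s * s for s in segment)
        if energy == 0.0 or corr <= 0.0:
            continue  # silent or negatively correlated segment: no prediction
        score = corr * corr / energy  # normalized correlation measure
        if score > best_score:
            best_lag, best_gain, best_score = lag, corr / energy, score
    return best_lag, best_gain


# Toy input with period 50: the search should recover that period as the lag.
signal = [float((i * 7) % 50) for i in range(432)]
lag, gain = ltp_lag_search(signal[:400], signal[400:], 32, 200)
print(lag, gain)
```

Every candidate lag costs one correlation over the whole frame, so the work grows with the product of the lag range and the frame length, which is why speeding up the search is a natural target.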
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT880392010
http://hdl.handle.net/11536/65405
Appears in Collections: Thesis