Efficient Coding Strategies for Advanced Audio Coding

Title:	Efficient Coding Strategies for Advanced Audio Coding
Authors:	Cheng-Han Yang (楊政翰), Hsueh-Ming Hang (杭學鳴); Institute of Electronics
Keywords:	Advanced Audio Coding (AAC); Rate-Distortion Control Algorithm; Bit Allocation Algorithm; Inter-Channel Prediction
Publication date:	2004
Abstract:	Advanced Audio Coding (AAC) is a recent, high-performance, and sophisticated audio coding standard specified by the ISO/IEC MPEG standards committee. Because encoder design is non-normative in the AAC standard, coding performance depends greatly on how the coding modules (tools) of an AAC encoder are designed. One critical element of a good AAC encoder is a properly designed rate-distortion (R-D) control algorithm; this algorithm and its related issues are the focus of this dissertation. A well-known R-D control algorithm for AAC is the trellis-based algorithm, which performs a trellis search over the entire frame to find proper coding parameters. It achieves excellent performance, but its computational complexity is extremely high.
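To make the idea of a trellis search over a frame concrete, here is a minimal Viterbi-style sketch. It is not the dissertation's trellis-based algorithm. It assumes that each band picks one of several hypothetical quantizer (scalefactor) candidates with precomputed distortion and bit cost, and that a hypothetical `sf_bits` function charges side-information bits for the difference between consecutive bands' indices, loosely mimicking AAC's differential scalefactor coding. Because every band considers every predecessor candidate, the work grows with the square of the number of candidates, which shows why trellis-based R-D control is expensive.

```python
from typing import List, Tuple


def sf_bits(delta: int) -> int:
    """Hypothetical side-information cost of coding a scalefactor difference."""
    return 1 + abs(delta)


def trellis_search(dist: List[List[float]], bits: List[List[int]],
                   lam: float) -> Tuple[List[int], float]:
    """Viterbi search for one candidate index per band minimizing D + lam * R.

    dist[b][q], bits[b][q]: distortion and coefficient bits of band b at candidate q.
    The transition cost depends on the index difference between adjacent bands,
    which couples the bands and is why a trellis (not per-band) search is used.
    """
    n_bands, n_cand = len(dist), len(dist[0])
    # cost[q]: best cumulative cost of any path ending in candidate q at the current band
    cost = [dist[0][q] + lam * bits[0][q] for q in range(n_cand)]
    back: List[List[int]] = []  # back[b - 1][q]: best predecessor of candidate q at band b
    for b in range(1, n_bands):
        new_cost, ptr = [], []
        for q in range(n_cand):
            best_p = min(range(n_cand),
                         key=lambda p: cost[p] + lam * sf_bits(q - p))
            new_cost.append(cost[best_p] + lam * sf_bits(q - best_p)
                            + dist[b][q] + lam * bits[b][q])
            ptr.append(best_p)
        cost = new_cost
        back.append(ptr)
    # Pick the cheapest end state, then trace the stored predecessors backwards.
    q = min(range(n_cand), key=lambda k: cost[k])
    total = cost[q]
    path = [q]
    for ptr in reversed(back):
        q = ptr[q]
        path.append(q)
    return path[::-1], total
```

For example, `trellis_search([[1.0, 4.0], [9.0, 0.0]], [[5, 1], [0, 6]], 1.0)` returns `([1, 1], 12.0)`. Sweeping `lam` trades distortion against bits until a frame's bit budget is met.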
The first contribution of this dissertation is the design of two types of low-complexity, high-performance R-D control algorithms: the Cascaded Trellis-Based (CTB) algorithm and the Enhanced BFOS (EBFOS) algorithm. CTB reduces the computational burden of the trellis-based approach by splitting its most computation-intensive stage into two consecutive steps that require much less computation; the complexity is reduced further by significantly decreasing the number of candidates in the trellis search. EBFOS, instead of performing a trellis search over the entire frame, allocates bits step by step to the band that needs them most. This approach takes into account both the bit-use efficiency at the band level and the inter-band dependency of the AAC coding process. Simulation results show that the coding performance of both proposed algorithms is significantly better than that of the MPEG-4 AAC Verification Model and close to that of the original high-cost trellis-based algorithms, while requiring less than 1/140 of their computational complexity.
Despite the success of current audio coding techniques, little effort has been made to reduce the inter-channel redundancy inherent in multichannel audio. The second contribution of this dissertation is an efficient algorithm for removing inter-channel redundancy in perceptual audio coding. In our approach, perceptually weighted inter-channel prediction is applied to the Modified Discrete Cosine Transform (MDCT) coefficients. Based on this structure, two types of inter-channel predictors are proposed: a time-signal based predictor and a spectral-coefficient based predictor.
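The step-by-step allocation described for EBFOS resembles classic greedy marginal-return bit allocation. The sketch below shows only that generic idea, under stated assumptions: each band has hypothetical precomputed (bits, distortion) operating points, and each increment goes to the band with the largest distortion reduction per extra bit that still fits the budget. It does not model the inter-band dependency that EBFOS accounts for, so treat it as an illustration rather than the EBFOS algorithm itself.

```python
from typing import List, Tuple


def greedy_allocate(points: List[List[Tuple[int, float]]], budget: int) -> List[int]:
    """Step-by-step bit allocation driven by bit-use efficiency.

    points[b]: hypothetical (bits, distortion) operating points of band b,
    sorted by increasing bits. Returns the chosen point index for each band.
    """
    choice = [0] * len(points)
    used = sum(p[0][0] for p in points)  # every band starts at its cheapest point
    while True:
        best_band, best_eff = None, 0.0
        for b, pts in enumerate(points):
            k = choice[b]
            if k + 1 >= len(pts):
                continue  # band is already at its finest operating point
            extra = pts[k + 1][0] - pts[k][0]
            gain = pts[k][1] - pts[k + 1][1]
            if extra <= 0 or used + extra > budget:
                continue  # no extra bits needed or the step does not fit the budget
            eff = gain / extra  # distortion reduction per extra bit
            if eff > best_eff:
                best_band, best_eff = b, eff
        if best_band is None:
            return choice  # no affordable step reduces distortion any further
        k = choice[best_band]
        used += points[best_band][k + 1][0] - points[best_band][k][0]
        choice[best_band] += 1
```

With `points = [[(0, 100.0), (10, 40.0), (20, 30.0)], [(0, 50.0), (10, 20.0), (20, 5.0)]]`, a budget of 20 bits yields `[1, 1]`: the first 10 bits go to band 0 (6.0 per bit) and the next 10 to band 1 (3.0 per bit).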
Like the existing INT-DCT based approach, our approach needs no extra perceptual masking control and introduces no audio quality degradation. For most typical audio sequences, its bit-rate reduction exceeds that of the INT-DCT based approach by about 10% or more.
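A spectral-coefficient based inter-channel predictor can be illustrated with a per-band weighted least-squares gain. This minimal sketch assumes one prediction gain per band, perceptual weights supplied by the caller (for example, the inverse of a masking threshold), and hypothetical band edges. It omits the time-signal based predictor, quantization, and the side information needed to transmit the gains. Minimizing the weighted error sum_k w_k (t_k - g r_k)^2 gives g = sum_k w_k r_k t_k / sum_k w_k r_k^2, and the residual is coded in place of the target channel.

```python
from typing import List, Sequence, Tuple


def predict_band(ref: Sequence[float], tgt: Sequence[float],
                 weights: Sequence[float]) -> Tuple[float, List[float]]:
    """Weighted least-squares gain g minimizing sum_k w_k * (tgt_k - g * ref_k)^2."""
    num = sum(w * r * t for w, r, t in zip(weights, ref, tgt))
    den = sum(w * r * r for w, r in zip(weights, ref))
    g = num / den if den > 0.0 else 0.0  # silent reference band: no prediction
    residual = [t - g * r for r, t in zip(ref, tgt)]
    return g, residual


def inter_channel_predict(ref: Sequence[float], tgt: Sequence[float],
                          weights: Sequence[float],
                          band_edges: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Predict the target channel's MDCT coefficients band by band from the reference.

    band_edges: hypothetical coefficient boundaries, e.g. [0, 4, 8, ...].
    Returns one gain per band and the concatenated residual spectrum.
    """
    gains: List[float] = []
    residual: List[float] = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        g, res = predict_band(ref[lo:hi], tgt[lo:hi], weights[lo:hi])
        gains.append(g)
        residual.extend(res)
    return gains, residual


def weighted_energy(x: Sequence[float], weights: Sequence[float]) -> float:
    """Perceptually weighted energy, used to compare residual and target."""
    return sum(w * v * v for w, v in zip(weights, x))
```

Because the gain is the weighted least-squares optimum, the weighted residual energy of each band never exceeds that of the target band. In a real codec the decoder would form the prediction from the quantized reference channel, so an encoder would typically compute the residual against that quantized reference as well.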
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT008911831 http://hdl.handle.net/11536/76935
Appears in Collections:	Thesis