Title: H.264/AVC 適應性算術編碼與解碼器的設計
Design of Context Adaptive Arithmetic Encoder and Decoder for H.264/AVC Video Coding
Authors: 陳建隆
Jian-Long Chen
張天烜
Tian-Sheuan Chang
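College of Electrical and Computer Engineering, Program in Electronics and Electro-Optical Engineering (電機學院電子與光電學程)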
Keywords: adaptive arithmetic coding; CABAC
Issue Date: 2004
Abstract: H.264/AVC is the latest video compression standard. Studies show that, compared with MPEG-2 and MPEG-4, H.264/AVC substantially improves both compression ratio and video quality, which makes it well suited to multimedia streaming and mobile TV applications. This thesis presents a fast, small-area CABAC encoder and decoder for H.264/AVC. CABAC consists of three main functional blocks: Binarization, Context Model, and Arithmetic Coding. Processed conventionally, one bin takes 13 steps to complete its update; by rearranging the steps (with parallel processing and pipelining), we reduce this to 4 steps.

For Binarization, the fixed codes (Unary) are implemented in combinational logic, and UEGk coding uses table partitioning to avoid extra encoding circuitry. A software implementation of this function would require additional microprocessor interrupts to service it, so we offload the work to hardware at a cost of only 5k gates. For the Context Model, dual-port memory is used so that reads and writes can proceed simultaneously, which streamlines the flow.

For Arithmetic Coding, renormalization is entered when range falls below one quarter of the full coding interval. Renormalization contains two loops, which are conventionally processed one bit at a time. We first apply the one-slipping method, using a leading-zero detector (LZD) circuit to determine how many times the first loop executes, and then use bit-parallel processing with a mask-generation circuit to implement the second loop. Compared with the conventional approach, mask generation reduces arithmetic-coding (AC) execution time by about 10%, and a bin takes 1.8 cycles on average. The three Arithmetic Coding modes are merged into a single circuit to maximize hardware sharing. Whereas [3] handles renormalization with a prefix adder, we reduce area by using a shifter to remove the MSB directly and an FSM to store the state, which eliminates two 10-bit adders and saves about 50% of the circuitry relative to [3].

Overall, the encoder operates at 333 MHz with a gate count of 13.3k, and its processing time is about 90% of the conventional method's. The decoder also reaches 333 MHz with a total gate count of 16.7k. The design therefore achieves low cost and high throughput.
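The idea of replacing the first renormalization loop with a leading-zero detector can be illustrated in software. The sketch below is not the thesis's circuit: the function names are hypothetical, and it assumes the 9-bit range register and the 0x100 threshold used by H.264 CABAC. It only shows that the number of iterations of the bit-serial loop equals the number of leading zeros in the range register, so the shift amount can be found in a single step.

```python
RANGE_BITS = 9          # assumption: 9-bit CABAC range register
RENORM_THRESHOLD = 0x100  # assumption: renormalize while range < 256


def renorm_shift_serial(rng: int) -> int:
    """Count iterations of the conventional bit-serial loop."""
    shifts = 0
    while rng < RENORM_THRESHOLD:
        rng <<= 1       # one left shift per loop iteration
        shifts += 1
    return shifts


def renorm_shift_lzd(rng: int) -> int:
    """Compute the same count at once, as a leading-zero detector would."""
    # Leading zeros of a nonzero value in a RANGE_BITS-wide register.
    return RANGE_BITS - rng.bit_length()


if __name__ == "__main__":
    # The two methods agree for every nonzero 9-bit range value.
    for rng in range(1, 1 << RANGE_BITS):
        assert renorm_shift_serial(rng) == renorm_shift_lzd(rng)
    print("shift for range=6:", renorm_shift_lzd(6))
```

In hardware the second function corresponds to a small priority encoder feeding a shifter, which is why the loop count no longer depends on the number of bits to shift.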
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009267509
http://hdl.handle.net/11536/77709
Appears in Collections: Theses