Title: | Design and Implementation of H.264/AVC Scalable High Profile Decoder |
Authors: | 陳宥辰 Chen, Yu-Chen; 張添烜 Chang, Tian-Sheuan; Institute of Electronics |
Keywords: | video compression; scalable video decoder; scalable video coding; inter-layer prediction; video coding |
Publication date: | 2010 |
Abstract: | As video standards advance, video devices are finding ever wider application. Among these standards, Scalable Video Coding (SVC) not only supports high-definition video but also provides temporal, quality, and spatial scalability. In chip design, however, these scalabilities add overhead in decoding time, memory bandwidth, and gate count. This thesis therefore presents an H.264/AVC Scalable High Profile decoder optimized at three levels: decoding flow, architecture design, and module implementation.
For the decoding flow, this thesis adopts the previously proposed frame-based flow for spatial-layer decoding and proposes a one-pass, macroblock-based flow for quality-layer decoding that reduces external memory bandwidth by 71% and macroblock processing cycles by 66%. For texture padding in inter-layer intra prediction, we propose a base-layer-level (BL-level) padding flow that saves 26% of the decoding time for IntraBL-coded macroblocks.
With these flows, the decoder adopts a four-stage pipeline architecture to increase decoding throughput. The first stage consists of the entropy decoder and the syntax parser, which process three quality layers in parallel. The second stage consists of the residual reconstruction path, the inter-layer predictor, and the reference pixel fetch unit. This thesis specifically optimizes the residual reconstruction path to cope with the additional complexity introduced by scalability: the proposed parallel-pipeline architecture with temporary-result reuse saves 54% of the gate count compared with the traditional serial-pipeline architecture. For the inter-layer predictor, we propose the centralized accumulation-based CCSP concept, a simplified poly-phase interpolator, and an efficient motion vector (MV) upsampler to reduce area cost and decoding time. The third stage consists of the motion compensation unit and the intra predictor. The fourth stage consists of the deblocking filter and the texture padder. To access external memory efficiently, the design uses a memory request protocol customized for SVC decoding.
Finally, the proposed Scalable High Profile decoder is implemented in UMC 90 nm CMOS technology and costs 565.12k gates and 39.66 Kbytes of on-chip memory. Running at 135 MHz, it decodes 60 frames per second with CIF, SD480p, and HD1080p spatial layers and three quality layers. Compared with previous designs, the proposed decoder achieves better decoding efficiency while supporting multiple scalabilities. |
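As a rough illustration of what the throughput figures in the abstract imply, the following sketch estimates the average clock-cycle budget per macroblock. It is not taken from the thesis: the layer resolutions (352x288, 720x480, 1920x1080), the 16x16 macroblock size with dimensions rounded up to whole macroblocks, and the assumption that all three spatial layers are decoded for every output frame are assumptions made here, and the names `macroblocks` and `cycle_budget` are hypothetical.

```python
import math

MB_SIZE = 16  # H.264/AVC macroblocks cover 16x16 luma samples

# Assumed layer resolutions (not stated in the abstract).
LAYERS = {"CIF": (352, 288), "SD480p": (720, 480), "HD1080p": (1920, 1080)}

CLOCK_HZ = 135_000_000  # operating frequency quoted in the abstract
FPS = 60                # frame rate quoted in the abstract


def macroblocks(width, height):
    """Number of macroblocks in a frame, rounding each dimension up."""
    return math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)


def cycle_budget(layers=LAYERS, clock_hz=CLOCK_HZ, fps=FPS):
    """Return (macroblocks per frame over all layers, average cycles per macroblock).

    Assumes every spatial layer is decoded once per output frame.
    """
    mbs_per_frame = sum(macroblocks(w, h) for w, h in layers.values())
    return mbs_per_frame, clock_hz / (mbs_per_frame * fps)


if __name__ == "__main__":
    mbs, cycles = cycle_budget()
    print(f"{mbs} macroblocks per frame, about {cycles:.0f} cycles per macroblock")
```

Under these assumptions the three layers total 9906 macroblocks per frame, which leaves roughly 227 cycles per macroblock on average. Because the three quality layers must be handled within that same budget, the estimate gives a sense of why a one-pass quality-layer flow and a pipelined architecture matter.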
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079711660 http://hdl.handle.net/11536/44361 |
Appears in collections: | Thesis |