Title: ARM-based Platform Design for H.264/MPEG-4 AVC Decoder and Accelerator for Deblocking Filter
Authors: Shihchien Chang (张世骞)
Tihao Chiang (蒋迪豪)
Institute of Electronics
Keywords: decoder; platform-based design; deblocking filter; system-on-chip; macroblock-level pipelining architecture; deblocking filter accelerator; H.264 decoder; MPEG4 AVC; Loop Filter; ARM; Platform
Issue date: 2004
Abstract: In this thesis, we present a baseline H.264/MPEG-4 AVC decoder built with an optimized platform-based design methodology. We use an ARM microprocessor as the CPU core for its high performance, low cost, and wide range of applications, and we integrate the Advanced Microcontroller Bus Architecture (AMBA) as the on-chip bus to improve transfer performance and flexibility. To speed up the decoder, we jointly optimize the software and the hardware. We also propose a macroblock-level pipelining architecture that lets the software and the dedicated hardware co-processors work concurrently. In hardware, we implement three dedicated accelerators for the most computationally intensive modules: the deblocking filter, motion compensation, and the inverse transform. For the deblocking filter, we propose an adaptive transfer scheme and a platform-based bus-interleaved architecture. Because the deblocking filter requires high bus bandwidth, we classify the filtering into 8 modes and use the adaptive transfer scheme to avoid redundant data transfers, so that the bus bandwidth is used efficiently. To reduce processing latency, the bus-interleaved architecture performs data transfer and filtering in parallel. Compared with previous deblocking filter hardware designs, our scheme offers up to 7x performance improvement. In overall decoding performance, our design is 9 to 16 times faster than the H.264 reference software JM6.0. Finally, the proposed platform can be integrated quickly into a system-on-chip design, and the proposed hardware architectures suit low-cost, real-time applications.
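As a rough illustration of why macroblock-level pipelining helps, the sketch below models a generic two-stage pipeline in which software prepares one macroblock while a hardware accelerator processes the previous one. It is not the thesis's implementation: the function names and the per-macroblock cycle counts are hypothetical, chosen only to show the arithmetic.

```python
def sequential_cycles(n_mb, sw_cycles, hw_cycles):
    """Software and hardware take turns, so every macroblock pays for both stages."""
    return n_mb * (sw_cycles + hw_cycles)


def pipelined_cycles(n_mb, sw_cycles, hw_cycles):
    """Two-stage pipeline: hardware handles macroblock i while software
    prepares macroblock i + 1, so the slower stage sets the steady-state rate."""
    if n_mb == 0:
        return 0
    return sw_cycles + (n_mb - 1) * max(sw_cycles, hw_cycles) + hw_cycles


# Hypothetical cycle counts, not taken from the thesis.
# A QCIF frame (176x144) contains 11 x 9 = 99 macroblocks of 16x16 pixels.
N_MB = 99
SW_CYCLES, HW_CYCLES = 1000, 800

print("sequential:", sequential_cycles(N_MB, SW_CYCLES, HW_CYCLES))
print("pipelined: ", pipelined_cycles(N_MB, SW_CYCLES, HW_CYCLES))
```

Under these assumed numbers the pipelined total approaches the cost of the slower stage alone, which is the general motivation for overlapping software and hardware work at macroblock granularity.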
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009211618
http://hdl.handle.net/11536/66935
Appears in collections: Thesis


Files in this item:

  1. 161801.pdf
