Title: Chip Design and System Integration of JPEG 2000 (JPEG 2000之晶片设计与系统整合)
Authors: Chung-Fu Lin (林重甫)
Bing-Fei Wu (吴炳飞)
Institute of Electrical and Control Engineering
Keywords: JPEG 2000; discrete wavelet transform (DWT); EBCOT; bit-plane coding; tile size; pipeline architecture; code-block
Issue date: 2005
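Abstract: JPEG 2000 is a new still-image compression standard developed by the ISO and IEC committees. Compared with the traditional JPEG method, JPEG 2000 delivers better compression quality and supports a wider range of application requirements, such as lossless compression, fixed bit-rate lossy compression, region-of-interest (ROI) coding, and progressive image transmission (scalability). These features suit many imaging applications, including image transmission over wired and wireless networks, high-end digital cameras, remote surveillance, and medical imaging.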
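The main contribution of this thesis is a high-speed, low-memory wavelet transform architecture that reaches high processing speed with fewer pipeline stages. We further analyze the data dependencies in the two-dimensional wavelet transform flow so that high-speed operation is retained under a low-memory architecture, yielding a two-dimensional wavelet transform that is both memory-efficient and fast. For the JPEG 2000 coprocessor, conventional architectures must wait until an entire tile image has passed through the wavelet transform before the subsequent bit-plane coding and arithmetic coding can begin, so a large internal memory (512K bytes for a 512x512 tile image) is needed to integrate the processing units. In our JPEG 2000 hardware architecture, we therefore modify the output order of the discrete wavelet transform coefficients to better match the input timing of the subsequent bit-plane coder. This reduces the internal memory by at least three quarters (to 27.16K bytes for a 512x512 tile image). The same processing flow also increases the parallelism between the discrete wavelet transform and bit-plane coding, which reduces the number of data transfers to and from internal memory.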
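The buffer figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: the 2 bytes per coefficient is an assumption (it is the value that makes a 512x512 tile come to 512K bytes), and the function name is hypothetical rather than taken from the thesis.

```python
def tile_buffer_kbytes(width, height, bytes_per_coeff=2):
    """Size in K bytes of a buffer holding every coefficient of one tile.

    bytes_per_coeff=2 is an assumption chosen to match the quoted 512K bytes.
    """
    return width * height * bytes_per_coeff / 1024


# Conventional flow: the whole 512x512 tile is buffered between DWT and BPC.
full_tile_kbytes = tile_buffer_kbytes(512, 512)

# Figure quoted in the abstract for the modified output order.
proposed_kbytes = 27.16

# Fraction of internal memory saved by the proposed flow.
reduction = 1 - proposed_kbytes / full_tile_kbytes

print(f"full-tile buffer: {full_tile_kbytes:.0f}K bytes")
print(f"reduction: {reduction:.1%}")
```

Under that assumption the saving comes to roughly 95%, which is consistent with the stated bound of at least three quarters.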
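This thesis is organized into six chapters. Chapter 1 gives a brief introduction to JPEG 2000 image compression. Chapter 2 analyzes each processing unit of JPEG 2000 in more detail and reviews prior work on JPEG 2000 hardware architectures.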
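Chapter 3 examines the wavelet transform in JPEG 2000. Most lifting-based wavelet transform architectures are pipelined to raise the operating speed, which means adding pipeline registers to shorten the logic propagation delay between register stages. In a two-dimensional transform, however, such an architecture requires a large internal memory. In this chapter we modify the computation flow of the original one-dimensional wavelet transform so that a high-speed design is possible with fewer pipeline registers, and we analyze the data dependencies in the two-dimensional flow to reduce the internal memory required. The result is a two-dimensional wavelet transform hardware architecture that combines high-speed processing with low memory use.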
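Chapter 4 discusses the hardware architecture of the JPEG 2000 coprocessor. Most current JPEG 2000 coprocessor designs first optimize each processing unit individually and then integrate the units through memory or FIFOs. Because the discrete wavelet transform and the bit-plane coder differ in the timing order of their data input and output, a large internal memory is needed to integrate them. In this chapter we modify the output order of the discrete wavelet transform coefficients to better match the input timing of the subsequent bit-plane coder. This reduces the internal memory by at least three quarters, increases the processing parallelism between the discrete wavelet transform and bit-plane coding, and reduces the number of data transfers to and from internal memory.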
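Chapter 5 implements the proposed high-speed, low-memory wavelet transform and the low-memory JPEG 2000 coprocessor architecture on an ARM-based platform. We enclose the JPEG 2000 coprocessor in an ARM wrapper so that it can communicate with the ARM processor and the two can complete the image compression flow together. The JPEG 2000 coprocessor is implemented both on an FPGA and in a UMC 0.18 um process to complete hardware integration and verification, and the ARM processor controls the overall compression flow. Finally, Chapter 6 summarizes the main contributions of this thesis and outlines future work.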
JPEG 2000 is a new image coding system that delivers superior compression performance and provides many advanced features in scalability, flexibility, and system functionality. Its two key technologies are the Discrete Wavelet Transform (DWT) and Embedded Block Coding with Optimized Truncation (EBCOT). The DWT decomposes a signal into sub-bands that carry both time and frequency information, which helps achieve a high compression ratio. EBCOT codes each code block independently and supports rate-distortion optimization and scalable coding. These features suit many imaging applications, such as the Internet, wireless communication, security, and digital cinema. This thesis focuses on several design challenges in JPEG 2000 implementation, namely the memory requirements, processing speed, and throughput of the DWT design and the JPEG 2000 coprocessor.
First, a two-dimensional DWT that operates on a whole image uses internal memory to store the intermediate column-processed data, and the size of that memory is proportional to the image dimension. In addition, the pipeline stages in the DWT data path lengthen the data dependencies and increase the memory size. For a high-speed, memory-efficient design, we therefore explore the tradeoff between the critical path and the internal memory size for the lossless 5/3 and lossy 9/7 filters of JPEG 2000. To ease the tradeoff between the number of pipeline stages in the 1-D architecture and the memory requirement of the 2-D implementation, we propose a modified algorithm for the 1-D and 2-D pipeline architectures. Based on the modified data path of the lifting-based DWT, the proposed architecture can reach a high operating frequency by inserting more pipeline stages without increasing the internal memory size (see Chapter 3).
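For readers unfamiliar with lifting, the lossless 5/3 filter mentioned above can be sketched in software as one predict step followed by one update step. The sketch below follows the standard reversible 5/3 lifting equations with whole-sample symmetric boundary extension; it is not the modified data path proposed in the thesis, and the function names are illustrative only.

```python
def _reflect(i, n):
    """Map index i into [0, n-1] by whole-sample symmetric extension."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i


def dwt53_forward(x):
    """One level of the 1-D reversible 5/3 lifting transform on integers."""
    n = len(x)
    if n == 1:
        return [x[0]], []
    # Predict step: each odd sample minus the floor-average of its even neighbors.
    high = [x[i] - (x[_reflect(i - 1, n)] + x[_reflect(i + 1, n)]) // 2
            for i in range(1, n, 2)]

    def d(k):
        # High-pass coefficient stored for odd position k (reflected if needed).
        return high[_reflect(k, n) // 2]

    # Update step: each even sample plus a rounded quarter of neighboring details.
    low = [x[i] + (d(i - 1) + d(i + 1) + 2) // 4 for i in range(0, n, 2)]
    return low, high


def dwt53_inverse(low, high):
    """Undo dwt53_forward exactly by reversing the two lifting steps."""
    if not high:
        return list(low)
    n = len(low) + len(high)
    x = [0] * n

    def d(k):
        return high[_reflect(k, n) // 2]

    for j, s in enumerate(low):          # undo the update step (even samples)
        i = 2 * j
        x[i] = s - (d(i - 1) + d(i + 1) + 2) // 4
    for j, h in enumerate(high):         # undo the predict step (odd samples)
        i = 2 * j + 1
        x[i] = h + (x[_reflect(i - 1, n)] + x[_reflect(i + 1, n)]) // 2
    return x
```

The update step depends on the outputs of the predict step for neighboring samples. In hardware, this is the kind of dependency that pipeline registers lengthen, which is why the number of pipeline stages and the 2-D buffer size trade off against each other.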
As for integrating the DWT, bit-plane coder (BPC), and arithmetic coder (AC) components into a JPEG 2000 coprocessor, the overall encoding system may suffer performance degradation and need more hardware resources, because the components require different I/O bandwidths and buffers. To decrease the internal memory size and increase the overall throughput, we propose a quad code-block (QCB)-based DWT engine. By changing the output timing of the DWT process, three code blocks are generated iteratively in every fixed execution time slice, so the DWT and BPC processes reach higher parallelism than with the traditional DWT method. Moreover, the integrated system preserves the high performance of the individual components, and the internal memory size is reduced (see Chapter 4).
To verify the proposed architectures for the DWT and the JPEG 2000 coprocessor, we implement the DWT design and a JPEG 2000-based image system on an ARM-based platform (the Integrator system). To make the JPEG 2000 coprocessor more widely applicable, we wrap the design with an AHB (Advanced High-performance Bus) interface. Over this bus, the ARM processor can communicate with the JPEG 2000 coprocessor and complete the JPEG 2000 compression process. The overall system is realized by integrating the ARM processor and the JPEG 2000 coprocessor (see Chapter 5). Finally, Chapter 6 gives a brief conclusion and discusses future work.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009012819
http://hdl.handle.net/11536/81014
Appears in collections: Thesis