Title: Chip Design and System Integration of JPEG 2000
Authors: Chung-Fu Lin (林重甫); Bing-Fei Wu (吳炳飛)
Institute of Electrical and Control Engineering
Keywords: JPEG 2000; DWT (discrete wavelet transform); EBCOT; bit-plane coding; tile image; tile size; pipeline architecture; code-block
Publication date: 2005
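Abstract: JPEG 2000 is a new still-image compression standard developed by the ISO and IEC committees. Compared with the conventional JPEG method, JPEG 2000 offers better compression quality and supports a wide range of application requirements, such as lossless compression, lossy compression at a fixed bit rate, region-of-interest (ROI) coding, and progressive transmission (scalability). These features make it suitable for many imaging applications, including image transmission over wired and wireless networks, high-end digital cameras, remote surveillance, and medical imaging.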
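The main contribution of this thesis is a high-speed, low-memory wavelet transform architecture that reaches high processing speed with fewer pipeline stages. The thesis further analyzes the data dependencies in the 2-D wavelet transform flow so that high-speed operation is retained in a low-memory architecture, yielding a 2-D wavelet transform that is both fast and memory-efficient. For the JPEG 2000 coprocessor, conventional architectures must wait until an entire tile image has been wavelet-transformed before starting bit-plane coding and arithmetic coding, so a large internal memory (512 KB for a 512×512 tile image) is needed to integrate the processing units. In our JPEG 2000 hardware architecture, we therefore modify the output order of the DWT coefficients to better match the input timing of the subsequent bit-plane coder. This reduces the internal memory by more than 3/4 (to 27.16 KB for a 512×512 tile image). The modified flow also increases the parallelism between the DWT and bit-plane coding, which in turn reduces the number of data transfers through internal memory.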
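As a quick consistency check on the memory figures, assume 2 bytes (16 bits) per stored coefficient. This is inferred from the 512 KB figure for a 512×512 tile, not stated in the abstract:

```latex
\begin{aligned}
512 \times 512 \times 2\,\text{bytes} &= 524{,}288\,\text{bytes} = 512\,\text{KB} \\
1 - \frac{27.16\,\text{KB}}{512\,\text{KB}} &\approx 1 - 0.053 = 0.947
\end{aligned}
```

Under that assumption the reported reduction is about 94.7%, comfortably above the stated 3/4.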
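The thesis is organized into six chapters. Chapter 1 gives a brief introduction to JPEG 2000 image compression. Chapter 2 analyzes and discusses each processing unit of JPEG 2000 and reviews previous work on JPEG 2000 hardware architectures.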
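Chapter 3 examines the wavelet transform of JPEG 2000 in more depth. Most lifting-based wavelet transform architectures rely on pipelining to increase speed, inserting many pipeline registers to shorten the propagation delay of the logic between registers. In a 2-D transform, however, such architectures require a large internal memory. In this chapter, we modify the computation flow of the original 1-D wavelet transform so that high speed is achieved with fewer pipeline registers, and we further analyze the data dependencies in the 2-D wavelet transform flow to reduce the required internal memory. The result is a 2-D wavelet transform hardware architecture that combines high-speed processing with low memory.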
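For reference, the reversible 5/3 lifting steps and the row-then-column flow of a one-level 2-D transform can be sketched in Python. This minimal sketch uses the standard JPEG 2000 predict/update equations with whole-sample symmetric extension and keeps the whole image in memory. It is not the modified data path or the memory-saving schedule proposed in the thesis, and `lift53`, `_columns` and `dwt2_53` are hypothetical names.

```python
def lift53(x):
    """One level of reversible 5/3 lifting on a 1-D integer list (len >= 2)."""
    n = len(x)

    def m(i):  # whole-sample symmetric extension of an index
        if i < 0:
            return -i
        if i >= n:
            return 2 * (n - 1) - i
        return i

    y = list(x)
    for i in range(1, n, 2):  # predict: odd samples become high-pass
        y[i] -= (x[m(i - 1)] + x[m(i + 1)]) // 2
    for i in range(0, n, 2):  # update: even samples become low-pass
        y[i] += (y[m(i - 1)] + y[m(i + 1)] + 2) // 4
    return y[0::2], y[1::2]  # (low-pass, high-pass); // is floor division


def _columns(mat):
    """Apply lift53 to every column of mat; return (low rows, high rows)."""
    res = [lift53(list(col)) for col in zip(*mat)]
    low = [list(r) for r in zip(*(lo for lo, _ in res))]
    high = [list(r) for r in zip(*(hi for _, hi in res))]
    return low, high


def dwt2_53(img):
    """One-level 2-D DWT: rows first, then columns. Returns LL, HL, LH, HH."""
    rows = [lift53(r) for r in img]
    L = [lo for lo, _ in rows]  # horizontally low-pass half
    H = [hi for _, hi in rows]  # horizontally high-pass half
    LL, LH = _columns(L)  # vertical low / high of the horizontal low band
    HL, HH = _columns(H)  # vertical low / high of the horizontal high band
    return LL, HL, LH, HH
```

In hardware the column pass cannot start until enough row outputs are available, and the rows held in the meantime are the internal memory that extra pipeline stages tend to enlarge.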
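Chapter 4 discusses the hardware architecture of the JPEG 2000 coprocessor. Most current JPEG 2000 coprocessor designs first optimize each processing unit separately and then integrate the units through memories or FIFOs. Because the DWT and bit-plane coding units differ in the timing order of their data input and output, a large internal memory is required to integrate them. In this chapter, we modify the output order of the DWT coefficients to better match the input timing of the subsequent bit-plane coder. This reduces the internal memory by more than 3/4, increases the processing parallelism between the DWT and bit-plane coding, and reduces the number of data transfers through internal memory.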
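Chapter 5 implements the proposed high-speed, low-memory wavelet transform and the low-memory JPEG 2000 coprocessor architecture on an ARM-based platform. The JPEG 2000 coprocessor is wrapped with an ARM wrapper so that it can communicate with the ARM processor and jointly carry out the image compression flow. The coprocessor is implemented both on an FPGA and in UMC 0.18 µm technology for hardware integration and verification, and the ARM processor controls the overall compression flow. Finally, Chapter 6 summarizes the main contributions of the thesis and outlines future work.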
JPEG 2000 is a new image coding system that delivers superior compression performance and provides many advanced features in scalability, flexibility, and system functionality. Its two key technologies are the Discrete Wavelet Transform (DWT) and Embedded Block Coding with Optimized Truncation (EBCOT). The DWT decomposes a signal into sub-bands that carry both time and frequency information, which helps achieve high compression ratios. EBCOT codes each code-block independently and enables rate-distortion optimization and scalable coding. These features make JPEG 2000 attractive for many imaging applications, such as the Internet, wireless communication, security, and digital cinema. This thesis focuses on several design challenges in implementing JPEG 2000, namely memory requirements, processing speed, and the throughput of the DWT design and the JPEG 2000 coprocessor.
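EBCOT's block coder scans each code-block one magnitude bit-plane at a time, from the most significant plane down, and handles sign information separately from the magnitude bits. The decomposition it starts from can be sketched as follows. This covers only the sign-magnitude bit-plane split, not the three coding passes or the context modeling, and `bit_planes` is a hypothetical name.

```python
def bit_planes(block):
    """Split integer coefficients into sign flags and magnitude bit-planes.

    Returns (signs, planes) where planes[0] is the most significant plane.
    """
    mags = [[abs(v) for v in row] for row in block]
    signs = [[v < 0 for v in row] for row in block]
    top = max((m for row in mags for m in row), default=0)
    planes = []
    for p in range(top.bit_length() - 1, -1, -1):  # MSB plane first
        planes.append([[(m >> p) & 1 for m in row] for row in mags])
    return signs, planes
```

Truncating the list of planes after any index gives a coarser approximation of the magnitudes, which is the property EBCOT's embedded bitstream exploits.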
First, a two-dimensional DWT operating on a whole image uses internal memory to store the intermediate column-processed data, and the size of this memory is proportional to the image dimension. In addition, the pipeline stages of the DWT data path lengthen the data dependency and further increase the memory size. For a high-speed, memory-efficient design, we therefore explore the trade-off between the critical path and the internal memory size for the lossless 5/3 and lossy 9/7 filters of JPEG 2000. To ease the trade-off between the number of pipeline stages in the 1-D architecture and the memory requirement of the 2-D implementation, we propose a modified algorithm for both the 1-D and 2-D pipeline architectures. Based on the modified data path of the lifting-based DWT, the proposed architecture can reach a high operating frequency by inserting more pipeline stages without increasing the internal memory size (details in Chapter 3).
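The lossy 9/7 filter uses four lifting steps plus a scaling step, so its data path is deeper than that of the 5/3 filter. A sketch of the forward and inverse 1-D transform follows. It assumes the lifting constants of the JPEG 2000 irreversible 9/7 filter, whole-sample symmetric extension, and a normalization in which the low-pass output is divided by K and the high-pass output multiplied by K (implementations differ on this scaling). `fwd_97` and `inv_97` are hypothetical names. Each lifting pass reads neighboring outputs of the previous pass, which is the dependency chain that pipeline registers break up.

```python
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
K = 1.230174104914001


def _m(i, n):
    """Whole-sample symmetric extension of index i for length n."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def _lift(y, start, coeff):
    """In place: y[i] += coeff * (y[i-1] + y[i+1]) for i = start, start+2, ..."""
    n = len(y)
    for i in range(start, n, 2):
        y[i] += coeff * (y[_m(i - 1, n)] + y[_m(i + 1, n)])


def fwd_97(x):
    """Forward 9/7 lifting on a 1-D sequence (len >= 2); returns (low, high)."""
    y = [float(v) for v in x]
    _lift(y, 1, ALPHA)  # predict 1 (odd samples)
    _lift(y, 0, BETA)   # update 1 (even samples)
    _lift(y, 1, GAMMA)  # predict 2
    _lift(y, 0, DELTA)  # update 2
    return [v / K for v in y[0::2]], [v * K for v in y[1::2]]


def inv_97(low, high):
    """Inverse of fwd_97: undo the scaling, then the lifting steps in reverse."""
    n = len(low) + len(high)
    y = [0.0] * n
    y[0::2] = [v * K for v in low]
    y[1::2] = [v / K for v in high]
    _lift(y, 0, -DELTA)
    _lift(y, 1, -GAMMA)
    _lift(y, 0, -BETA)
    _lift(y, 1, -ALPHA)
    return y
```

Because every lifting pass updates one parity using only the other parity, each pass is exactly invertible, so reconstruction is limited only by floating-point rounding.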
For the JPEG 2000 coprocessor, which integrates the DWT, BPC (bit-plane coder), and AC (arithmetic coder) components, the overall encoding system may suffer performance degradation and require more hardware resources, because the components have different I/O bandwidth and buffering requirements. To reduce the internal memory size and increase overall throughput, we propose a quad-code-block (QCB) based DWT engine that eases this integration penalty. By changing the output timing of the DWT, three code-blocks are generated iteratively in every fixed execution time slice, so the DWT and BPC processes achieve higher parallelism than with the traditional DWT method. As a result, the integrated system preserves the high performance of the individual components while also reducing the internal memory size (details in Chapter 4).
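The idea of emitting a fixed group of code-blocks per time slice can be illustrated with a hypothetical scan order. The sketch assumes a dyadic decomposition in which processing a 2B×2B region of the previous LL band yields one B×B code-block in each of HL, LH and HH plus a B×B LL block, and four such LL blocks form the next level's input region. It ignores the extra boundary samples that the filter support requires from neighboring regions. `qcb_order` is a hypothetical name, and the ordering used in the thesis may differ.

```python
def qcb_order(tile, cb, levels):
    """Hypothetical QCB-style emission order as (level, band, row, col) tuples.

    row/col index code-blocks within the sub-band at that level.
    """
    unit = cb * 2 ** levels
    if tile % unit != 0:
        raise ValueError("tile must be a multiple of cb * 2**levels")
    side = tile // unit  # code-blocks per side at the coarsest level
    out = []

    def visit(level, r, c):
        # The 2cb x 2cb input region at this level is the LL output of four
        # finer-level regions, so those are processed first (depth-first).
        if level > 1:
            for dr in (0, 1):
                for dc in (0, 1):
                    visit(level - 1, 2 * r + dr, 2 * c + dc)
        # One time slice: three high-pass code-blocks become ready together.
        for band in ("HL", "LH", "HH"):
            out.append((level, band, r, c))

    for r in range(side):
        for c in range(side):
            visit(levels, r, c)
            out.append((levels, "LL", r, c))  # residual low-pass code-block
    return out
```

For a 512×512 tile with 64×64 code-blocks and three decomposition levels, the order has 64 entries, matching the 512²/64² code-blocks that cover the tile's coefficients, and the bit-plane coder can start on the first three as soon as the first 128×128 image region has been transformed.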
To verify the proposed DWT and JPEG 2000 coprocessor architectures, we implement the DWT design and the JPEG 2000-based image system on an ARM-based platform (the Integrator System). To make the JPEG 2000 coprocessor more broadly applicable, we wrap the design with an AHB (Advanced High-performance Bus) interface. Through this bus, the JPEG 2000 coprocessor communicates with the ARM processor to complete the JPEG 2000 compression process. The overall system integrates the ARM processor and the JPEG 2000 coprocessor (details in Chapter 5). Finally, Chapter 6 gives a brief conclusion and outlines future work.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009012819
http://hdl.handle.net/11536/81014
Appears in collections: Theses