Title: Reconfigurable IP of DCT and DWT for image transform coding
Authors: 陳毓宏
單智君
Jyh-Jiun Shann
IC Design Industry Program, College of Electrical and Computer Engineering
Keywords: reconfigurable; RHIP
Issue Date: 2008
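Abstract: This thesis designs a reconfigurable architecture that combines the discrete cosine transform (DCT) and the discrete wavelet transform (DWT), implementing both transforms with as little hardware as possible. The DCT appears in image-processing standards such as JPEG, H.26X, and MPEG-1, -2, and -4. The DWT offers multiresolution analysis and separates the frequency bands of a signal effectively, which makes it well suited to image compression; it is currently used mainly in JPEG2000. In merging the two architectures, we modify a butterfly-structured DCT algorithm to reduce the hardware it needs and increase its parallelism, while raising the DCT speed so that the IP can meet the requirements of video processing. For the DWT, we target the 9/7 filter of the JPEG2000 standard. So that it can be combined with the DCT architecture, we forgo the lifting scheme, which gives the smallest hardware design, in favor of a convolution-based algorithm. Because the multiplication coefficients of both the DCT and the DWT are constants, we replace conventional multipliers with canonic signed digit (CSD) encoded multipliers, and we exploit the relationships between the DCT and DWT coefficients to design a variable constant multiplier (VCM) with the aim of reducing area; the CSD encoded multiplier design also helps raise the speed of the IC. For the hardware implementation, we describe the architecture in Verilog Hardware Description Language and synthesize it with Synopsys synthesis software and the 0.13 µm process library provided by TSMC. The proposed circuit reaches a maximum clock frequency of 105 MHz with an area of 11,940 gate counts. Compared with a recently published ASIC DCT-plus-DWT architecture, our design saves about 33% of the area, and its speed meets the MPEG-2 specification.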
This thesis presents the design and development of a reconfigurable architecture for image transform coding. The DCT (Discrete Cosine Transform) is commonly used in JPEG, MPEG, and H.26X systems, and the DWT (Discrete Wavelet Transform) is used in JPEG2000 systems. Both are among the most computationally demanding parts of these systems, so we integrate the DCT and DWT architectures on the same hardware to reduce the hardware these systems require. To merge the DWT into the DCT, we use a butterfly structure for the DCT and a 9/7 convolution-based structure for the DWT, which not only shortens the DCT computation time but also increases the hardware utilization rate for the DWT. To reduce hardware area and speed up the DCT, we exploit subexpressions to reduce the hardware requirement and use canonic signed digit (CSD) encoded multipliers for the multiplications, which effectively reduces multiplier area and also speeds up the chip enough to meet the MPEG specification. The architecture executes the DCT and DWT with up to 90% higher throughput while occupying as little as 20% of the area, compared with commercial digital signal processor and other application-specific integrated circuit implementations, while maintaining precision. A comprehensive comparative analysis is also provided. The proposed architecture is implemented in 0.13-µm CMOS technology and operates with a 100-MHz clock.
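To illustrate the CSD idea mentioned in the abstract, the following is a minimal Python sketch of how a fixed integer coefficient can be recoded into digits from {-1, 0, +1} with no two adjacent nonzero digits, and how a constant multiplication then reduces to shifts, additions, and subtractions. It is not the thesis's Verilog design or its VCM; the function names `to_csd` and `csd_multiply`, and the choice of quantizing cos(π/4) to 8 fractional bits, are assumptions made only for this example.

```python
import math
from typing import List


def to_csd(n: int) -> List[int]:
    """Recode a non-negative integer into CSD digits (least significant first).

    Each digit is -1, 0, or +1, and no two adjacent digits are both nonzero.
    """
    digits = []
    while n != 0:
        if n & 1:
            # n mod 4 == 1 -> digit +1; n mod 4 == 3 -> digit -1
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x: int, digits: List[int]) -> int:
    """Multiply x by the constant encoded in `digits` using only shift/add/sub."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


# Hypothetical fixed-point coefficient: cos(pi/4) scaled by 2**8 (an assumption).
FRACTION_BITS = 8
coeff = round(math.cos(math.pi / 4) * (1 << FRACTION_BITS))
coeff_csd = to_csd(coeff)

sample = 100
product = csd_multiply(sample, coeff_csd)   # equals sample * coeff
scaled = product >> FRACTION_BITS           # drop the fractional bits

print(to_csd(7))  # [-1, 0, 0, 1], i.e. 7 = 8 - 1
```

In hardware, each nonzero CSD digit corresponds to one adder or subtractor fed by a wired shift, so a coefficient with fewer nonzero digits costs fewer adders than a general multiplier would.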
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009495528
http://hdl.handle.net/11536/38006
Appears in Collections: Thesis