Title: | Acceleration and Implementation of JPEG2000 Encoder on TI DSP Platform |
Authors: | 劉建志 杭學鳴 IC Design Industry Program, College of Electrical and Computer Engineering |
Keywords: | JPEG2000;TI DSP;DSP platform acceleration;EBCOT |
Issue Date: | 2006 |
Abstract: | As digital imagery becomes increasingly widespread, a new still-image coding standard, JPEG2000, was developed to provide higher compression efficiency and a richer feature set. It delivers good subjective quality even at low bit rates, and it offers fine-grained scalability in both compression efficiency and bitstream transmission. However, JPEG2000 is computationally demanding. In this thesis, we implement a JPEG2000 encoder on the TI DSP platform. We propose two speed-up methods for the Tier-1 module, the most complex part of JPEG2000, and combine them with the TI DSP optimization tools.
Our reference software is OpenJPEG ver. 1.0. Because its DWT module is already accelerated with a 1-D lifting scheme, we focus on the Tier-1 module, which accounts for about 90% of the encoder's total computation. We first study previously proposed speed-up methods and test their effectiveness on our DSP platform. We then propose two improved methods: VGOSS (Variable Group Of Sample Skip) and a modified VGOSS. We eliminate unnecessary checking cycles by recording the NBC (Need-to-Be-Coded) samples on a list, so that no time is spent examining samples that do not need to be coded. In addition, the sample index is reordered to allow faster execution. For lossless encoding, the DSP implementation combines the proposed methods with code acceleration techniques, DSP compiler-level optimization, and cache reallocation that reduces memory access time.
The experimental results on the DSP platform show that, with all of these techniques applied, the encoder runs up to 32 times faster than the original program without any optimization. When the original program is compiled with the same DSP optimization settings and cache configuration, our fast algorithm still reduces the computation by 45%. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009395506 http://hdl.handle.net/11536/80343 |
Appears in Collections: | Thesis |
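The abstract's idea of recording samples that need coding on a list, rather than re-checking every sample in every pass, can be illustrated with a small sketch. This is not the thesis's VGOSS method, whose details the record does not give. It is a simplified stand-in that assumes a magnitude-refinement-style pass over bit-planes, and the function names (`msb_plane`, `refine_full_scan`, `refine_with_list`) are hypothetical. The sketch only counts how many samples each approach has to visit.

```python
def msb_plane(value):
    """Index of the highest set magnitude bit, or -1 for a zero coefficient."""
    return abs(value).bit_length() - 1


def refine_full_scan(coeffs, num_planes):
    """Baseline: every refinement pass tests every sample in the block."""
    checks = refined = 0
    for plane in range(num_planes - 1, -1, -1):
        for c in coeffs:
            checks += 1                   # membership test paid for every sample
            if msb_plane(c) > plane:      # became significant in a higher plane
                refined += 1
    return checks, refined


def refine_with_list(coeffs, num_planes):
    """List-driven variant: visit only samples already recorded as significant."""
    checks = refined = 0
    significant = []                      # indices, each recorded exactly once
    for plane in range(num_planes - 1, -1, -1):
        for _ in significant:
            checks += 1                   # only recorded samples are visited
            refined += 1
        # Simplification (assumption): in a real coder, newly significant
        # samples are discovered by the significance and cleanup passes,
        # which this sketch neither models nor counts.
        significant.extend(
            i for i, c in enumerate(coeffs) if msb_plane(c) == plane
        )
    return checks, refined


coeffs = [0, 5, -3, 0, 1, 0, 0, 12]      # made-up coefficients, 4 bit-planes
print(refine_full_scan(coeffs, 4))        # (32, 6)
print(refine_with_list(coeffs, 4))        # (6, 6)
```

Both variants refine the same six sample/bit-plane pairs, but the list-driven one visits 6 samples instead of 32. The gap grows with the share of coefficients that are zero or become significant late, which is typical of wavelet subbands.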