Analysis and Design of Real-time Integral Histogram Based Joint Bilateral Filtering

Title:	Analysis and Design of Real-time Integral Histogram Based Joint Bilateral Filtering (即時的積分直方圖基準之聯合雙邊濾波演算法分析與設計)
Authors:	Hsu, Po-Hsiung (許博雄); Chang, Tian-Sheuan (張添烜); Institute of Electronics
Keywords:	integral histogram (積分直方圖); bilateral filter (雙邊濾波)
Issue Date:	2010
Abstract:	Bilateral filtering and joint bilateral filtering are widely used in image processing, for example in de-noising, tone management, and even 3-D applications and the MPEG standard. They can be accelerated by an associated fast algorithm, the integral histogram, but still suffer from high computational complexity and heavy memory use, especially in real-time applications. VLSI implementation is therefore a necessary solution. In this thesis, we design an efficient hardware architecture for integral histogram based (joint) bilateral filtering, consisting of three proposed memory reduction methods and highly parallel computational units. The memory reduction methods are the runtime updating method (RUM), the stripe-based method (SBM), and the sliding origin method (SOM). At runtime, the RUM exploits the progressive raster-scan order of the computation to discard data that are no longer needed. The SBM further divides each frame into vertical stripes and processes them one by one, using the stripe as the unit of raster scanning. Because each stripe is much narrower than a frame, the raster scan traverses shorter rows, so the amount of buffered data drops sharply and frame-wide memory is no longer required.
Finally, the SOM uses a progressively sliding integration origin to reduce the dependence of the original histogram integration on stored data, so that the memory requirement shrinks from frame scale to line scale. The three methods combine easily and together reduce the memory cost to 0.003% of the original requirement. On the hardware side, the proposed architecture addresses the high bandwidth demand and the large number of table lookups in integral histogram computation with a delay-buffer data-reuse method and a table selector, respectively, and it partitions memory into banks to raise internal memory bandwidth. It also applies parallelism in range space (intensity, for images) to process many histogram bins simultaneously and achieve high throughput. Moreover, the function block layout of the architecture is independent of parameter selection, so it does not need to be redesigned for applications with different parameter requirements. The final design, implemented in UMC 90 nm CMOS technology, processes HD1080p (1920x1080) images at 60 frames per second with a 200 MHz clock. The chip uses 355 K gates and 23 KB of on-chip memory.
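For readers unfamiliar with the underlying technique, the following is a minimal software sketch of integral histogram based bilateral filtering; it is not taken from the thesis. It assumes a box spatial window, a Gaussian range kernel, and an 8-bit grayscale image stored as a list of rows. All function names and parameters (`integral_histogram`, `window_histogram`, `bilateral_filter`, `radius`, `sigma_r`, `bins`, `levels`) are hypothetical choices for illustration. The joint variant and the RUM, SBM, and SOM memory reductions are not shown.

```python
import math

def integral_histogram(img, bins=256, levels=256):
    """H[y][x][b] counts pixels of img[0:y][0:x] that fall in bin b.

    H carries one extra row and column of zeros so that window queries
    need no special cases at the image border.
    """
    h, w = len(img), len(img[0])
    H = [[[0] * bins for _ in range(w + 1)] for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            up, left, diag = H[y][x + 1], H[y + 1][x], H[y][x]
            cur = H[y + 1][x + 1]
            for k in range(bins):
                cur[k] = up[k] + left[k] - diag[k]
            cur[img[y][x] * bins // levels] += 1  # add the current pixel
    return H

def window_histogram(H, y0, x0, y1, x1):
    """Histogram of img[y0:y1][x0:x1] from four corner lookups."""
    a, b, c, d = H[y1][x1], H[y0][x1], H[y1][x0], H[y0][x0]
    return [a[k] - b[k] - c[k] + d[k] for k in range(len(a))]

def filter_from_histogram(hist, center, sigma_r, levels=256):
    """Range-weighted mean of the bin centres, weighted by bin counts."""
    bins = len(hist)
    num = den = 0.0
    for k, count in enumerate(hist):
        if count == 0:
            continue
        value = (k + 0.5) * levels / bins - 0.5  # centre of bin k
        wgt = count * math.exp(-((value - center) ** 2) / (2.0 * sigma_r ** 2))
        num += wgt * value
        den += wgt
    return num / den

def bilateral_filter(img, radius, sigma_r, bins=256, levels=256):
    """Filter every pixel using a (2*radius+1)^2 window clipped at borders."""
    h, w = len(img), len(img[0])
    H = integral_histogram(img, bins, levels)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            hist = window_histogram(H, y0, x0, y1, x1)
            out[y][x] = filter_from_histogram(hist, img[y][x], sigma_r, levels)
    return out
```

In this sketch the per-pixel cost depends on the number of bins rather than on the window size, but the table `H` holds one full histogram per pixel, so its size grows with width × height × bins. That storage cost is the kind of problem the memory reduction methods in the abstract target.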
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT079711591 http://hdl.handle.net/11536/44294
Appears in Collections:	Thesis

Files in This Item:

159101.pdf
