Title: Fast Algorithm and Design for HEVC Screen Content Coding
Authors: 陳彥合
Department of Electronics Engineering & Institute of Electronics
Keywords: Screen Content Coding; palette mode; intra block copy; VLSI
Issue Date: 2016
Abstract: Screen content coding has become a significant topic in video coding because of its wide use in web conferencing, desktop sharing, and online collaboration. Palette mode and intra block copy are two important coding tools in screen content coding.
This thesis presents a hardware-efficient palette mode algorithm and its corresponding design for real-time processing. In the algorithm part, we propose a simplified palette derivation to reduce computation. We then set an upper bound on the palette size to ease hardware design, and propose a method to improve coding efficiency under this constraint. Next, we simplify the RD-cost function of the palette entry to eliminate dividers. Finally, we add a pre-selection step to the choice of palette predictor colors that uses SAD instead of SSE for lower complexity. These hardware-friendly modifications reduce the required hardware cost while keeping the quality loss low: the BD-rate increases by only 0.86% on average.
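The palette steps above can be illustrated with a small Python sketch. It is not the thesis's algorithm: the frequency-ordered derivation, the SAD merge threshold, and the names `derive_palette` and `preselect_predictor` are hypothetical. It shows a palette built under a hard size cap, and a cheap SAD pass that shrinks the predictor candidate list before any costlier SSE-based decision.

```python
from collections import Counter
from typing import List, Tuple

Color = Tuple[int, int, int]  # one (Y, Cb, Cr) sample triple


def sad(a: Color, b: Color) -> int:
    """Sum of absolute differences over the three color components."""
    return sum(abs(x - y) for x, y in zip(a, b))


def derive_palette(pixels: List[Color], max_size: int, threshold: int) -> List[Color]:
    """Hypothetical simplified palette derivation under a hard size cap.

    Colors are visited from most to least frequent (ties broken by color value).
    A color becomes a new entry only if it is farther than `threshold` (SAD)
    from every existing entry; derivation stops once `max_size` entries exist.
    Colors left out would be coded as escape pixels.
    """
    palette: List[Color] = []
    ordered = sorted(Counter(pixels).items(), key=lambda kv: (-kv[1], kv[0]))
    for color, _count in ordered:
        if len(palette) == max_size:
            break
        if all(sad(color, entry) > threshold for entry in palette):
            palette.append(color)
    return palette


def preselect_predictor(predictor: List[Color], block_colors: List[Color], k: int) -> List[Color]:
    """Keep the k predictor entries whose SAD to the nearest block color is smallest.

    Only these survivors would go on to a costlier SSE/RD-based reuse decision.
    Python's sort is stable, so ties keep the original predictor order.
    """
    def score(entry: Color) -> int:
        return min(sad(entry, c) for c in block_colors)

    return sorted(predictor, key=score)[:k]


if __name__ == "__main__":
    pixels = [(10, 10, 10)] * 5 + [(11, 10, 10)] * 2 + [(200, 50, 50)] * 3 + [(90, 90, 90)]
    palette = derive_palette(pixels, max_size=3, threshold=4)
    print(palette)  # [(10, 10, 10), (200, 50, 50), (90, 90, 90)]
    predictor = [(0, 0, 0), (12, 10, 10), (255, 255, 255), (198, 50, 50)]
    print(preselect_predictor(predictor, palette, k=2))  # [(12, 10, 10), (198, 50, 50)]
```

In the demo, `(11, 10, 10)` never gets its own entry because it lies within SAD 1 of `(10, 10, 10)`, so the capped palette spends its slots on genuinely distinct colors.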
The corresponding hardware adopts a four-stage pipeline with a depth-parallel design. Hardware cost is further reduced by letting most of the processing units share the minimum selector, the most costly component. Synthesized in a TSMC 90nm CMOS process, the final implementation requires about 415K gates and 9.16KB of on-chip SRAM, and achieves real-time 4K×2K@30fps encoding at a 270MHz clock frequency.
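One plausible reading of the shared minimum selector, assumed here rather than taken from the thesis, is a comparator tree of n − 1 comparators that several units needing an argmin use in turn. The Python sketch below models such a tree and adds the cycle-budget arithmetic behind the throughput figure. The function names are hypothetical, and reading 4K×2K as 4096×2160 is an assumption.

```python
from typing import List, Tuple


def tree_min_select(costs: List[int]) -> Tuple[int, int]:
    """Return (min_cost, index) through a pairwise comparator tree.

    Each loop iteration models one hardware comparator stage: candidates are
    compared in pairs and the smaller one (lower index on ties) moves on.
    """
    if not costs:
        raise ValueError("need at least one cost")
    level = [(c, i) for i, c in enumerate(costs)]
    while len(level) > 1:
        nxt = [level[j] if level[j][0] <= level[j + 1][0] else level[j + 1]
               for j in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd candidate passes straight through
        level = nxt
    return level[0]


def comparator_stages(n: int) -> int:
    """Depth of the tree for n >= 1 candidates: ceil(log2(n)) comparator stages."""
    return (n - 1).bit_length()


def cycles_per_pixel(clock_hz: float, width: int, height: int, fps: int) -> float:
    """Clock cycles available per pixel for real-time encoding."""
    return clock_hz / (width * height * fps)


if __name__ == "__main__":
    print(tree_min_select([7, 3, 9, 3, 5]))  # (3, 1)
    print(comparator_stages(5))  # 3
    # Assumes 4K×2K means 4096×2160 luma samples per frame
    print(round(cycles_per_pixel(270e6, 4096, 2160, 30), 3))  # 1.017
```

Under that assumption, 270 MHz leaves roughly one clock cycle per pixel, so the datapath must on average finish the palette work for about one pixel every cycle.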
For intra block copy, we propose an all-binary search strategy that searches on binarized images, saving over 80% of data access with only a 0.47% BD-rate increase.
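To make searching on binarized images concrete, here is a hedged Python sketch of block matching on 1-bit data. Samples are thresholded into packed row bitmasks, and each candidate block vector is scored by Hamming distance (XOR plus population count) instead of SAD on full samples. The global threshold, the function names, and the candidate list are assumptions, not the thesis's search strategy. Assuming 8-bit samples, storing one bit per pixel shrinks the raw sample volume by a factor of eight, which gives some intuition for the reported data-access saving.

```python
from typing import List, Optional, Sequence, Tuple

Image = List[List[int]]  # 8-bit luma samples, row-major


def binarize(img: Image, threshold: int) -> List[int]:
    """Pack each row into an int bitmask: bit x is set when sample >= threshold."""
    rows = []
    for row in img:
        bits = 0
        for x, sample in enumerate(row):
            if sample >= threshold:
                bits |= 1 << x
        rows.append(bits)
    return rows


def hamming_cost(ref_rows: List[int], cur_rows: List[int], x: int, y: int, width: int) -> int:
    """Count mismatching bits between the current block and the reference block at (x, y)."""
    mask = (1 << width) - 1
    cost = 0
    for dy, cur in enumerate(cur_rows):
        ref = (ref_rows[y + dy] >> x) & mask  # `width` bits starting at column x
        cost += bin(ref ^ cur).count("1")
    return cost


def binary_ibc_search(ref_img: Image, cur_block: Image,
                      candidates: Sequence[Tuple[int, int]], threshold: int
                      ) -> Optional[Tuple[int, Tuple[int, int]]]:
    """Return (cost, (x, y)) of the best candidate, or None if there are none.

    Candidates are assumed to lie inside the already-reconstructed area of the
    current picture, as IBC requires; the first candidate wins ties.
    """
    ref_rows = binarize(ref_img, threshold)
    cur_rows = binarize(cur_block, threshold)
    width = len(cur_block[0])
    best: Optional[Tuple[int, Tuple[int, int]]] = None
    for x, y in candidates:
        cost = hamming_cost(ref_rows, cur_rows, x, y, width)
        if best is None or cost < best[0]:
            best = (cost, (x, y))
    return best


if __name__ == "__main__":
    ref = [
        [0, 0, 255, 255, 0, 0, 0, 0],
        [0, 0, 255, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 255, 255, 0],
        [0, 0, 0, 0, 0, 255, 0, 0],
    ]
    cur = [[250, 240], [230, 10]]
    print(binary_ibc_search(ref, cur, [(0, 0), (2, 0), (5, 2), (4, 2)], threshold=128))
    # (0, (2, 0))
```

Both `(2, 0)` and `(5, 2)` match exactly in the binary domain; the sketch keeps the first, whereas a real encoder would need its own tie-breaking rule.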
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070350298
Appears in Collections: Thesis