Title: | 適用於即時超高畫質HEVC之快速框內預測演算法與設計 Fast Intra Prediction Algorithm and Design for Real Time QFHD High Efficiency Video Coding |
Authors: | 丁奕晴 Ting, Yi-Ching; 張添烜 Chang, Tian-Sheuan; Department of Electronics Engineering, Institute of Electronics |
Keywords: | Image Compression; Intra Coding; Video Coding; Intra prediction; HEVC |
Issue Date: | 2013 |
Abstract: | Intra prediction in the latest High Efficiency Video Coding (HEVC) standard adopts a recursive coding structure, larger prediction unit (PU) sizes, and more directional prediction modes, which makes its computational demand a major challenge for real-time video encoders. To meet real-time requirements, this thesis proposes a hardware-friendly fast intra prediction algorithm that combines fast block size selection with rate-distortion optimization (RDO) candidate mode selection, together with the corresponding hardware architecture.
The fast block size selection applies simple Sobel operators to obtain the gradients of small blocks and then builds the gradients of larger blocks in a bottom-up structure, reusing the small-block results to save gradient computation. As a result, only the two most likely block sizes need to be evaluated further for their best prediction modes. The fast RDO candidate mode selection, based on the statistics of the candidate probability distribution, reduces the RDO candidates to only the best mode from the rough mode decision (RMD) and the most probable modes (MPMs) of the neighboring blocks. Simulation results show that, compared with the default all-intra main configuration of HM-9.0rc1, the proposed algorithm saves 71.5% of the encoding time on average with a BD-rate increase of only about 3.8%. The computation for PU size selection and for RDO candidate modes is reduced by about 60% and 55%, respectively.
The resulting hardware design adopts a unified prediction module, built on a 4×4-based bottom-up structure and an adaptive sample fetch controller, so that different PU sizes and prediction modes share the same computation units. In addition, the RMD and RDO processes are divided into two separate stages. The RMD stage uses original pixels as the reference to narrow down the RDO candidates quickly, and the RDO stage uses an interleaved coding schedule to reduce the computation time wasted on data dependencies. Synthesized with the TSMC 90 nm CMOS process, the design costs 871.9K logic gates and 10.46 Kbytes of on-chip memory and can process 4K×2K video at 30 fps at an operating frequency of 270 MHz. |
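The two algorithmic ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the 4×4 base block, the use of |Gx| + |Gy| summed over a block as its gradient, and the plain 2×2 summation between levels are all assumptions made for illustration. The abstract does not say how the two candidate block sizes are chosen from the gradients, so that step is not modeled.

```python
# Illustrative sketch only. Function names, the 4x4 base block, and the
# |Gx| + |Gy| sum used as the block gradient are assumptions, not details
# taken from the thesis.

def sobel_magnitude(img, x, y):
    """Return |Gx| + |Gy| at interior pixel (x, y) using 3x3 Sobel operators."""
    gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
          - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
    gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
          - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
    return abs(gx) + abs(gy)

def base_gradients(img, base=4):
    """Sum Sobel magnitudes inside each base x base block (image border skipped)."""
    rows, cols = len(img) // base, len(img[0]) // base
    grid = [[0] * cols for _ in range(rows)]
    for y in range(1, rows * base - 1):
        for x in range(1, cols * base - 1):
            grid[y // base][x // base] += sobel_magnitude(img, x, y)
    return grid

def merge_level(grid):
    """Build the next larger block size by summing each 2x2 group of child blocks."""
    return [[grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
             + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]
             for c in range(len(grid[0]) // 2)]
            for r in range(len(grid) // 2)]

def gradient_pyramid(img, levels=4):
    """Gradients for 4x4, 8x8, ... blocks; each level reuses the one below it."""
    pyramid = [base_gradients(img)]
    for _ in range(levels - 1):
        pyramid.append(merge_level(pyramid[-1]))
    return pyramid

def rdo_candidates(rmd_best_mode, mpms):
    """RMD winner followed by the MPMs, with duplicates removed and order kept."""
    candidates = []
    for mode in [rmd_best_mode, *mpms]:
        if mode not in candidates:
            candidates.append(mode)
    return candidates

# Usage: an 8x8 image with a vertical edge between columns 3 and 4.
img = [[0] * 4 + [100] * 4 for _ in range(8)]
pyramid = gradient_pyramid(img, levels=2)
print(pyramid[0])                        # [[1200, 1200], [1200, 1200]]
print(pyramid[1])                        # [[4800]]
print(rdo_candidates(26, [26, 10, 0]))   # [26, 10, 0]
```

Only the 4×4 level touches pixels; every larger level costs three additions per block, which is the reuse the abstract describes. In the usage line, 26, 10, and 0 are the HEVC vertical, horizontal, and planar intra mode indices.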
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT070050189 http://hdl.handle.net/11536/72998 |
Appears in Collections: | Thesis |
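As a rough sanity check on the throughput figure quoted in the abstract, the cycle budget per luma sample can be estimated from the clock rate, frame size, and frame rate. The sketch below is only arithmetic on the stated numbers. The abstract gives the frame size as "4K×2K" without exact dimensions, so both 3840×2160 (QFHD, as in the title) and 4096×2160 are tried as assumptions, and chroma samples are ignored.

```python
# Back-of-the-envelope estimate; the frame dimensions are assumptions,
# since the abstract only says "4K x 2K".

def cycles_per_pixel(clock_hz, width, height, fps):
    """Clock cycles available per luma sample at the given frame size and rate."""
    return clock_hz / (width * height * fps)

qfhd = cycles_per_pixel(270e6, 3840, 2160, 30)
dci4k = cycles_per_pixel(270e6, 4096, 2160, 30)
print(f"{qfhd:.3f}")   # 1.085
print(f"{dci4k:.3f}")  # 1.017
```

Under either assumption the design has only about one clock cycle per luma sample, which is consistent with the abstract's emphasis on shared computation units and on avoiding stalls caused by data dependencies.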