Fast Sub-pixel Motion Estimation Algorithm and Design for Real Time High Efficiency Video Coding

Title:	Fast Sub-pixel Motion Estimation Algorithm and Design for Real Time High Efficiency Video Coding (適用於即時高畫質HEVC編碼之分像素移動估測快速演算法與設計)
Authors:	Chen, I-Wen (陳怡雯); Chang, Tian-Sheuan (張添烜); Department of Electronics Engineering and Institute of Electronics
Keywords:	High Efficiency Video Coding; HEVC; inter prediction; fractional (sub-pixel) motion estimation; FME
Issue Date:	2014
Abstract:	Motion estimation (ME) is a critical and time-consuming technique for removing temporal redundancy between frames and improving coding performance. In HEVC, ME is performed in two stages at different levels of accuracy: integer-pixel motion estimation (IME) followed by fractional-pixel motion estimation (FME). To accelerate ME, many fast search algorithms have been adopted for IME, greatly reducing the number of search points and the computational cost. As a result, FME now accounts for most of the ME encoding time. Fast sub-pixel ME algorithms are therefore essential to cope with the high computational and hardware complexity caused by pixel interpolation and by the larger prediction units (PUs) used in HEVC. In this thesis, we propose a hardware-efficient fast FME algorithm based on an interpolation-free approach and a uni-modal assumption, together with a hardware architecture that differs from previous designs in order to resolve the high data dependency in HEVC. The algorithm uses information from more integer-pixel points and takes as the matching error in its rate-distortion optimized cost the sum of absolute transformed differences (SATD) of the prediction residual after a 4x4 Hadamard transform. The minimum of a quadratic function fitted to this matching error directly predicts the quantized position of the most probable search point.
A refinement search is then applied around this point at 1/4-pixel accuracy, where two additional search positions are chosen so that data can be reused. The final 1/4-pixel MV is the one among these 3 candidates that gives the highest prediction accuracy. The proposed algorithm reduces the number of fractional search points from 16 to 3, an 81.25% reduction in search point evaluations relative to the HEVC reference software. Compared with the reference software HM 13.0, simulation results show an average BD-rate increase of 1.8%, 1.7%, and 1.5% for the Y, U, and V components, respectively, under the Main profile Low Delay-B configuration. For the hardware design, we combine the FME algorithm with the fast IME algorithm from our previous work to build a complete architecture that handles both the high computational complexity arising from the wide range of PU sizes and the high data dependency in HEVC. We propose an interleaved pipeline with enhanced parallelism: the parallel ME hardware processes different PU levels and partition modes, and an interleaved scheduling based on PU partitions further improves hardware utilization. With the two fast ME algorithms combined, the average BD-rate increase relative to HM 13.0 is 3.3%, 4.0%, and 4.0% for Y, U, and V, respectively, under the same configuration. Synthesized in a TSMC 90nm CMOS process, the proposed architecture costs 751.6K logic gates and 21.2 Kbytes of on-chip memory. Operating at a 270MHz clock frequency, it supports real-time encoding of 4Kx2K video at 30 fps.
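The interpolation-free idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's actual algorithm: the abstract does not give the exact quadratic model, so the sketch assumes a common separable form in which a parabola is fitted, per axis, through the SATD costs at three neighbouring integer positions. The function names (`hadamard4`, `satd4x4`, `predict_quarter_pel`) are hypothetical, and the SATD is left unnormalized.

```python
def hadamard4(v):
    """Unnormalized 1-D 4-point Hadamard transform (butterfly form)."""
    a, b, c, d = v
    s0, s1, d0, d1 = a + d, b + c, a - d, b - c
    return [s0 + s1, d0 + d1, s0 - s1, d0 - d1]


def satd4x4(cur, ref):
    """Sum of absolute Hadamard-transformed differences of two 4x4 blocks."""
    diff = [[c - r for c, r in zip(crow, rrow)] for crow, rrow in zip(cur, ref)]
    rows = [hadamard4(row) for row in diff]           # transform each row
    cols = [hadamard4(col) for col in zip(*rows)]     # then each column
    return sum(abs(x) for col in cols for x in col)


def predict_quarter_pel(cost_minus, cost_zero, cost_plus):
    """Assumed model: fit a parabola through the costs at integer offsets
    -1, 0, +1 along one axis and return its minimum, quantized to
    quarter-pel units. No interpolated samples are needed."""
    den = cost_minus - 2 * cost_zero + cost_plus
    if den <= 0:                                      # not uni-modal: stay put
        return 0
    num = 2 * (cost_minus - cost_plus)                # 4 * vertex = num / den
    q = (2 * num + den) // (2 * den)                  # round half up
    return max(-2, min(2, q))                         # vertex lies in [-1/2, 1/2]


# Search-point reduction quoted in the abstract: 16 candidates down to 3.
reduction = 1 - 3 / 16
```

For example, with costs 10, 4, and 6 at offsets -1, 0, and +1, the fitted parabola has its minimum at +0.25 pixel, so `predict_quarter_pel(10, 4, 6)` returns 1 quarter-pel step toward the cheaper side. Working in integer arithmetic avoids floating-point rounding, which is also the more natural choice for a hardware-oriented design.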
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT070150197 http://hdl.handle.net/11536/76514
Appears in Collections:	Thesis