Fast Encoding Algorithm Design for H.264/MPEG-4 AVC Scalable Video Coding Standard

Title:	Fast Encoding Algorithm Design for H.264/MPEG-4 AVC Scalable Video Coding Standard
Authors:	Lin, Hung-Chih (林鴻志); Hang, Hsueh-Ming (杭學鳴); Institute of Electronics
Keywords:	Scalable Video Coding; Encoder Optimization; Fast Mode Decision Algorithm; Fast Temporal Prediction Selection Algorithm
Publication date:	2009
Abstract:	To enable robust video transmission over heterogeneous networks, the H.264/MPEG-4 AVC (H.264/AVC) standard has been extended with a scalable video coding scheme (H.264/SVC). H.264/SVC provides three main types of scalability: temporal, spatial, and quality.
H.264/SVC compresses a video signal once, yet allows the encoded bit-stream to be partially decoded at a lower frame rate or spatial resolution, depending on storage and transmission requirements. To achieve temporal scalability, H.264/SVC uses a hierarchical temporal prediction structure that offers two uni-directional predictions (list 0 and list 1) and one bi-directional (BI) prediction. Spatial and quality scalability are realized with a layered coding approach, in which hierarchical temporal prediction forms the basic coding structure of each layer. To maximize coding efficiency, the H.264/SVC encoder exhaustively evaluates all combinations of encoding parameters, including the conventional H.264/AVC coding tools and the new inter-layer prediction mechanism. This exhaustive parameter selection, however, incurs a very high computational cost: experiments show that mode decision, together with its associated motion estimation, accounts for most of the encoding time. Fast encoding algorithms are therefore needed to reduce the computational load of H.264/SVC. First, we propose a fast algorithm that efficiently selects the temporal prediction type for the dyadic hierarchical-B prediction structure used for temporal scalability in H.264/SVC. Using the best temporal prediction type chosen for the 16x16 partition, we exploit the strong inheritance of prediction types across partition sizes to eliminate unnecessary BI computations in the 16x8, 8x16, and 8x8 partitions. In addition, we examine how the motion-rate costs and distortions of the BI prediction relate to those of the two uni-directional prediction types, and from this relationship we construct a set of adaptive thresholds that remove unnecessary BI calculations.
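The BI-skipping logic described above can be illustrated with a minimal Python sketch. It is not the thesis's actual rule: the threshold form, the factor `alpha`, the `UniResult` type, and the function name `should_evaluate_bi` are hypothetical. The sketch assumes a Lagrangian cost J = D + λR, approximates the BI motion rate by the sum of the two uni-directional motion rates (BI codes two motion vectors), and assumes BI reduces the smaller uni-directional distortion by a factor `alpha`.

```python
from dataclasses import dataclass

# Partitions whose BI search is gated by the 16x16 decision.
INHERITING_PARTITIONS = ("16x8", "8x16", "8x8")


@dataclass
class UniResult:
    """Outcome of one uni-directional (list 0 or list 1) motion search."""
    distortion: float   # e.g. SAD of the best match
    motion_rate: float  # bits for the motion vector difference and reference index


def should_evaluate_bi(partition: str, best_type_16x16: str,
                       l0: UniResult, l1: UniResult,
                       lam: float, alpha: float = 0.8) -> bool:
    """Decide whether the BI search is worth running for one partition.

    Hypothetical rule: the threshold form and alpha are illustrative
    assumptions, not the thesis's tuned adaptive thresholds.
    """
    # Inheritance: 16x8/8x16/8x8 test BI only if BI won at 16x16.
    if partition in INHERITING_PARTITIONS and best_type_16x16 != "BI":
        return False

    # Lagrangian costs J = D + lambda * R of the two uni-directional results.
    best_uni_cost = min(l0.distortion + lam * l0.motion_rate,
                        l1.distortion + lam * l1.motion_rate)

    # BI sends two motion vectors, so its motion rate is roughly R_L0 + R_L1;
    # its distortion is assumed to be alpha times the smaller uni distortion.
    estimated_bi_cost = (alpha * min(l0.distortion, l1.distortion)
                         + lam * (l0.motion_rate + l1.motion_rate))

    # Run the real BI search only if the estimate can beat the best uni cost.
    return estimated_bi_cost < best_uni_cost
```

For example, with `lam=5`, L0 = (D 1000, R 10) and L1 = (D 1100, R 12), the best uni-directional cost is 1050 and the BI estimate is 910, so a 16x16 block runs the BI search. With L0 = (100, 40) and L1 = (110, 40), the estimate is 480 against 300, so BI is skipped because its doubled motion rate dominates.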
Our analysis also shows that the BI prediction brings little coding gain for small partitions. For partitions smaller than 8x8 (8x4, 4x8, and 4x4), one of the two uni-directional temporal predictions is skipped based on the information of the enclosing 8x8 partition. Together, these analyses efficiently reduce the heavy computational burden of the BI prediction. Second, for intra-only scalable video coding, we use the log-linear rate-distortion relationship between inter-dependent layers to predict which of the Intra4x4 and Intra8x8 prediction types will perform better at the enhancement layers. Based on the prediction type chosen at the base layer, we further reduce the number of candidate modes. To keep the best trade-off between complexity and coding efficiency, the Intra16x16 prediction is retained, but enabled only for high-resolution videos with smooth image content. Finally, we propose a layer-adaptive intra/inter mode decision algorithm and a motion search scheme for hierarchical B-frames in H.264/SVC with combined coarse-grain quality scalability (CGS) and temporal scalability. We examine the rate-distortion performance contributed by each coding mode at the enhancement layers, together with the conditional probabilities of the modes at different temporal layers. For intra prediction in inter frames, the number of Intra4x4/Intra8x8 prediction modes tested is reduced by 50% or more using the intra prediction directions of the reference/base layer. For enhancement-layer inter prediction, look-up tables of candidate inter modes are built from the macroblock coding-mode dependence between layers and are selected according to the difference between the enhancement-layer and reference/base-layer quantization parameters (Qp). In addition, to avoid checking every reference frame during motion estimation, the base-layer reference frame index is selectively reused.
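The enhancement-layer candidate pruning described above can be sketched as follows for CGS, where the base and enhancement layers share one resolution, so a base-layer macroblock maps directly onto the co-located enhancement-layer macroblock. The table contents, the Qp-gap threshold of 6, the reuse condition (reuse the base-layer reference index only when the candidate mode equals the base-layer mode), and all names are hypothetical illustrations, not the look-up tables derived in the thesis.

```python
from typing import Dict, List, Tuple

ALL_INTER_MODES = ["SKIP", "16x16", "16x8", "8x16", "8x8"]

# Hypothetical table: (base-layer mode, Qp-gap class) -> enhancement candidates.
# A larger Qp gap means a much finer enhancement layer, so more partitions are kept.
CANDIDATE_TABLE: Dict[Tuple[str, str], List[str]] = {
    ("SKIP", "small"): ["SKIP", "16x16"],
    ("SKIP", "large"): ["SKIP", "16x16", "16x8", "8x16"],
    ("16x16", "small"): ["SKIP", "16x16", "16x8", "8x16"],
    ("16x16", "large"): ALL_INTER_MODES,
}


def qp_gap_class(qp_base: int, qp_enh: int, threshold: int = 6) -> str:
    """Classify the Qp difference between the base and enhancement layers."""
    return "large" if qp_base - qp_enh >= threshold else "small"


def enhancement_search_plan(base_mode: str, base_ref_idx: int,
                            qp_base: int, qp_enh: int,
                            num_refs: int) -> List[Tuple[str, List[int]]]:
    """List (mode, reference indices to search) pairs for one enhancement MB."""
    key = (base_mode, qp_gap_class(qp_base, qp_enh))
    # Base modes missing from the table fall back to the full mode set.
    modes = CANDIDATE_TABLE.get(key, ALL_INTER_MODES)

    plan: List[Tuple[str, List[int]]] = []
    for mode in modes:
        if mode == "SKIP":
            plan.append((mode, []))                     # motion is derived, no search
        elif mode == base_mode:
            plan.append((mode, [base_ref_idx]))         # reuse the base-layer reference
        else:
            plan.append((mode, list(range(num_refs))))  # search every reference
    return plan
```

For a base-layer 16x16 macroblock coded at Qp 34 with reference index 2, and an enhancement layer at Qp 30 with four reference frames, the Qp gap of 4 selects the "small" row: only SKIP, 16x16, 16x8, and 8x16 are tested, and the 16x16 search is restricted to reference index 2 instead of all four references.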
Depending on the enhancement-layer macroblock partition, the base-layer motion vector can also serve as the initial search point for the enhancement-layer motion search. In summary, by analyzing the strong correlation between layers, the proposed algorithms efficiently eliminate unlikely combinations of coding options. Experiments show that they reduce encoding time by 65% to 85% with similar coded quality and only a slight loss in coding efficiency, compared with the H.264/SVC reference software.
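Starting the enhancement-layer motion search from the base-layer motion vector can be sketched as a small integer-pel full search centred on a chosen start point. The start-point rule (use the base-layer vector only when the enhancement-layer partition matches the base-layer partition, otherwise the usual motion vector predictor), the search radius, and the helper names are hypothetical. Frames are plain lists of luma rows, and the CGS layers are assumed to share one resolution, so no vector scaling is needed.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]
MV = Tuple[int, int]  # (dx, dy) in integer pixels


def choose_start(enh_partition: str, base_partition: str,
                 base_mv: MV, predicted_mv: MV) -> MV:
    """Hypothetical rule: reuse the base-layer MV when partitions match."""
    return base_mv if enh_partition == base_partition else predicted_mv


def block_sad(cur: Frame, ref: Frame, x: int, y: int,
              w: int, h: int, mv: MV) -> Optional[int]:
    """SAD between a w x h block at (x, y) and its displaced reference block.

    Returns None when the displaced block falls outside the reference frame.
    """
    dx, dy = mv
    if not (0 <= x + dx and x + dx + w <= len(ref[0])
            and 0 <= y + dy and y + dy + h <= len(ref)):
        return None
    return sum(abs(cur[y + j][x + i] - ref[y + j + dy][x + i + dx])
               for j in range(h) for i in range(w))


def refine_search(cur: Frame, ref: Frame, x: int, y: int, w: int, h: int,
                  start: MV, radius: int = 2) -> Tuple[MV, float]:
    """Full search in a (2*radius+1)^2 window centred on the start vector."""
    best_mv, best_cost = start, float("inf")
    for dy in range(start[1] - radius, start[1] + radius + 1):
        for dx in range(start[0] - radius, start[0] + radius + 1):
            cost = block_sad(cur, ref, x, y, w, h, (dx, dy))
            if cost is not None and cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

With a 4x4 block whose true displacement is (2, 1) and a base-layer start vector of (1, 1), a radius-2 window (25 SAD evaluations) already contains the optimum. The benefit of this design rests on the base-layer vector usually lying close to the enhancement-layer optimum, which is what allows the window to stay small.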
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT079311605 http://hdl.handle.net/11536/40486
Appears in Collections:	Thesis

Files in This Item:

160501.pdf
