Title: | Motion Estimation Engine for MPEG-4 Video (MPEG-4視訊系統之移動估計設計與實現) |
Authors: | 秦浩雲 Hao-Yun Chin; 張添烜 Tian-Sheuan Chang; Institute of Electronics (電子研究所) |
Keywords: | Motion Estimation;Binary Motion Estimation;Shape Coding;Block Matching;MPEG-4 |
Issue Date: | 2003 |
Abstract: | This thesis proposes motion estimation algorithms, and the associated hardware architectures, for MPEG-4 shape coding and texture coding.
For binary motion estimation (BME), the motion estimation used in shape coding, we first propose a penalty-free skipping algorithm that avoids redundant SAD (sum of absolute differences) computations; even for worst-case input it is no less efficient than a design without skipping. According to simulation results, 11%–29% of the SAD calculations are redundant and can be skipped at very little hardware cost. We also propose an optimal data storage scheme for the alpha frame memory. First, a distributed tile-based memory organization efficiently supports the varied access patterns that arise because the size of the VOP alpha plane changes over time. Second, a compression scheme reduces both the number of memory accesses and the size of the alpha frame memory: because the MPEG-4 standard limits boundary BABs to at most 50% of all BABs, adding a small thumbnail buffer (2.73%–5.08% of the original frame memory size) halves the alpha frame memory. All of the proposed BME techniques are lossless, so the picture quality of the encoded video is maintained. A reference implementation of BME supports CP@Lv2 (11,880 BAB/s) at 12.6 MHz with a gate count of 58,748.
For texture coding, this thesis presents a new pixel-decimation-based search algorithm. The proposed quartet-pel motion estimation (QME) algorithm reduces the number of pixels that enter the cost function of the matching criterion at each search point, and it also reduces the number of memory reads needed during motion estimation; it therefore lowers both the computational complexity and the required memory bandwidth. The algorithm also has a regular data flow, which leads to hardware-efficient implementations. A configurable parameter sets the number of candidate motion vectors and can be adjusted to trade coding performance against processing time. The synthesized QME circuit supports ASP@Lv5 (48,600 macroblocks/s, 720x576) at 19.9 MHz and can run at up to 83.3 MHz. Synthesized under a 20 MHz clock constraint, the gate count is 115,268. The average power consumption, estimated with Synopsys PrimePower using randomly generated frames as input, is 47.64 mW, and the energy needed to compute one motion vector (MV) is 0.871 μJ. Simulation results show that QME achieves almost the same coding performance as the full-search block matching algorithm (full-search BMA): the PSNR drops by at most 0.0011 dB and the encoded bit stream grows by at most 1.03%. |
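As an illustration of the two ideas the abstract relies on, the following is a minimal Python sketch of SAD-based block matching with pixel decimation and an early exit. It is not the thesis's algorithm: the abstract does not give the QME sampling pattern or the BME skipping rule, so the regular subsampling grid (`step`), the row-wise early exit (`limit`), and the function names `sad` and `full_search` are all assumptions made for this example. Frames are plain lists of rows of integer pixel values.

```python
def sad(cur, ref, bx, by, dx, dy, size=16, step=1, limit=None):
    """Sum of absolute differences between the block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy).

    step > 1 keeps only every step-th pixel in each direction
    (a generic pixel decimation, assumed for illustration).
    limit, if given, stops accumulating once the partial sum reaches it.
    """
    total = 0
    for y in range(0, size, step):
        cur_row = cur[by + y]
        ref_row = ref[by + dy + y]
        for x in range(0, size, step):
            total += abs(cur_row[bx + x] - ref_row[bx + dx + x])
        if limit is not None and total >= limit:
            # This candidate can no longer beat the current best,
            # so the remaining rows are skipped.
            return total
    return total


def full_search(cur, ref, bx, by, size=16, search=4, step=1):
    """Return (best_mv, best_cost) over all displacements (dx, dy)
    with -search <= dx, dy <= search that stay inside the frame."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block leaves the frame.
            if not (0 <= by + dy and by + dy + size <= height
                    and 0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size, step, best_cost)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

With `step=2` each candidate reads a quarter of the block's pixels, which is the kind of saving in computation and memory reads that pixel decimation aims for. The early exit never changes which candidate wins, because a candidate is abandoned only after its partial sum already equals or exceeds the best cost found so far.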
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009111589 http://hdl.handle.net/11536/43535 |
Appears in Collections: | Thesis |