Title: | Design of Motion Estimation for MPEG-4 and H.264 Video Coding |
Authors: | Esam A. Al_Qaralleh; Tian-Sheuan Chang; Chein-Wei Jen; Institute of Electronics |
Keywords: | video coding; motion estimation; data reuse; binary motion estimation for shape coding; variable block size |
Publication date: | 2005 |
Abstract: | Motion estimation is a key part of modern video standards such as MPEG-4 and H.264, where it removes the temporal redundancy between video frames. However, it is also computationally intensive and memory intensive. This dissertation therefore proposes two designs, binary motion estimation and variable block size motion estimation, that reduce the computational load, and one vertical data reuse scheme that minimizes memory access. The first work supports the binary motion estimation for shape coding adopted by MPEG-4. Binary motion estimation operates at the bit level and is therefore poorly suited to general-purpose processors, which process data at the word level. We thus propose a fast algorithm and an architecture that take advantage of this bit-level (binary) processing. Using the count of bits in a block, the proposed algorithm classifies and tests every candidate search position, then skips those unlikely to be a match. The algorithm can adaptively overlap the matching between different classes to obtain either a more accurate motion vector or a higher skipping ratio. It reduces computational complexity by 96.69% to 99.71%, at the expense of a 0.7% to 12.8% increase in encoded shape bits. Because the algorithm is simple and regular, the proposed hardware is also regular and requires a gate count of only 11,582. The second work supports variable block size motion estimation. Variable block sizes limit the effectiveness of early termination, but the proposed algorithm still performs well under this constraint. The design uses an early termination scheme whose threshold adapts to the variable block size to achieve early skipping. Several parameters of the algorithm can be tuned to trade off a high skipping ratio against motion vector accuracy. The proposed algorithm outperforms other similar algorithms, reducing complexity by 78% for MPEG-4 and 51% for H.264.
The hardware implementation of the algorithm evaluates one candidate macroblock (MB) in 16 clock cycles, so it completes a 16x16 search window (256 candidate positions at 16 cycles each) in 4096 clock cycles without any termination and in an average of 1032 clock cycles with termination. The hardware uses only 16 registers, 31 adders, and a gate count of 16k. Finally, the third work reduces the large volume of memory access by processing vertically adjacent current macroblocks together. Vertical processing achieves the same speedup as horizontal processing but with lower memory access, especially for large search windows. A design is presented that demonstrates the efficiency of vertical processing compared with horizontal processing using the same number of processing elements (PEs). This simple, regular design extends easily to any number of PEs without extra control-circuit cost or any change in the data flow. With four PEs and a gate count of 61k, it reduces the required data bandwidth by 60.9% compared with previous designs. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT008911638 http://hdl.handle.net/11536/76802 |
Appears in Collections: | Thesis |
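The bit-count skipping idea in the first work can be illustrated in software. The following is a minimal Python sketch under stated assumptions, not the thesis's algorithm: the binary alpha plane is a list of 0/1 rows, the matching cost is the number of mismatched pixels, and the helper names (pack_block, bit_count, binary_sad, binary_me) are hypothetical. It relies on the fact that the mismatch count between two binary blocks is at least the difference of their bit counts, so a candidate whose bit count differs from the current block's by at least the best cost found so far can be skipped without changing the result. This lossless rule is a simpler, swapped-in criterion; the thesis's classification skips more aggressively, at the cost of the small increase in shape bits reported in the abstract.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # binary shape plane: rows of 0/1 pixels (assumed layout)


def pack_block(frame: Frame, x: int, y: int, n: int) -> List[int]:
    """Pack the n x n block with top-left corner (x, y) into one bitmask per row."""
    rows = []
    for r in range(y, y + n):
        mask = 0
        for c in range(x, x + n):
            mask = (mask << 1) | frame[r][c]
        rows.append(mask)
    return rows


def bit_count(rows: List[int]) -> int:
    """Number of 1 pixels (opaque shape pixels) in a packed block."""
    return sum(bin(m).count("1") for m in rows)


def binary_sad(a: List[int], b: List[int]) -> int:
    """Number of mismatched pixels: popcount of the row-wise XOR."""
    return sum(bin(p ^ q).count("1") for p, q in zip(a, b))


def binary_me(cur: Frame, ref: Frame, bx: int, by: int, n: int,
              rng: int) -> Tuple[Optional[Tuple[int, int]], Optional[int], int]:
    """Full search over [-rng, rng] in x and y with bit-count skipping.

    Returns (best motion vector, best cost, number of skipped candidates).
    """
    cur_blk = pack_block(cur, bx, by, n)
    cur_cnt = bit_count(cur_blk)
    height, width = len(ref), len(ref[0])
    best_mv: Optional[Tuple[int, int]] = None
    best_cost: Optional[int] = None
    skipped = 0
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > width or y + n > height:
                continue  # candidate lies partly outside the reference plane
            cand = pack_block(ref, x, y, n)
            # Lower bound: binary_sad(cur, cand) >= |bit_count(cur) - bit_count(cand)|.
            # If the bound already reaches the best cost, this candidate cannot win.
            if best_cost is not None and abs(cur_cnt - bit_count(cand)) >= best_cost:
                skipped += 1
                continue
            cost = binary_sad(cur_blk, cand)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost, skipped
```

For example, when a 4x4 opaque square is displaced by one pixel diagonally between the reference and current planes, binary_me finds the vector (1, 1) with zero cost, and every candidate scanned after that exact match is skipped because no lower bound can fall below zero.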
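The tension between early termination and variable block sizes described for the second work can also be sketched. The Python below is an assumption-laden illustration, not the thesis's adaptive-threshold scheme: a 16x16 macroblock and its four 8x8 sub-blocks are searched together, each block size keeps its own best SAD, and a candidate's SAD is accumulated row by row. The candidate is abandoned only when every partial SAD has already reached the best value for its block, which is safe because partial SADs never decrease. The names vbs_search, sub_index, and SIZES are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Block = List[List[int]]   # 16 x 16 luma samples (assumed layout)
MV = Tuple[int, int]      # motion vector (dx, dy)
SIZES = ["16x16", "8x8_0", "8x8_1", "8x8_2", "8x8_3"]


def sub_index(r: int, c: int) -> int:
    """Raster index (0..3) of the 8x8 sub-block containing pixel (r, c)."""
    return (r // 8) * 2 + (c // 8)


def vbs_search(cur: Block, candidates: List[Tuple[MV, Block]]
               ) -> Tuple[Dict[str, Tuple[Optional[MV], float]], int]:
    """Best vector and SAD for the 16x16 block and each 8x8 sub-block.

    Returns (best per block size, number of early-terminated candidates).
    """
    best: Dict[str, Tuple[Optional[MV], float]] = {
        s: (None, float("inf")) for s in SIZES}
    terminated = 0
    for mv, ref in candidates:
        part = [0, 0, 0, 0]  # partial SAD of each 8x8 sub-block
        stopped = False
        for r in range(16):
            for c in range(16):
                part[sub_index(r, c)] += abs(cur[r][c] - ref[r][c])
            # Partial SADs never decrease, so once every block size has
            # reached its best SAD, none can improve: stop this candidate.
            if sum(part) >= best["16x16"][1] and all(
                    part[k] >= best[f"8x8_{k}"][1] for k in range(4)):
                stopped = True
                break
        if stopped:
            terminated += 1
            continue
        if sum(part) < best["16x16"][1]:
            best["16x16"] = (mv, sum(part))
        for k in range(4):
            if part[k] < best[f"8x8_{k}"][1]:
                best[f"8x8_{k}"] = (mv, part[k])
    return best, terminated
```

Because the lower two 8x8 sub-blocks receive no contribution until row 8, this rule can stop a candidate in the top half only when their best SADs are already zero. That is one concrete way variable block sizes weaken early termination, and it motivates a threshold that adapts to the block size, as the abstract describes.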