Title: | Algorithm and Architecture Design of Motion Estimation for Power Constrained Video Coding Systems (適用於功率受限視訊編碼系統之運動估測演算法與積體電路架構設計) |
Authors: | Shih-Hao Wang (王士豪); Tihao Chiang (蔣迪豪); Institute of Electronics (電子研究所) |
Keywords: | Motion estimation; Low power design; Power adaptive design; MPEG-4; H.264/AVC |
Issue Date: | 2007 |
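Abstract: | Because portable mobile devices have limited battery capacity, the design of power constrained video coding systems has drawn increasing attention. Within this area, low power design and power adaptive design are the most active research topics. This thesis focuses on these two topics and, building on binary search techniques, develops two motion estimation techniques for low power and power adaptive video coding systems. Both techniques cover algorithm and integrated circuit architecture design.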
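The first part of this thesis proposes a Low Power-All Binary Motion Estimation (LP-ABME) integrated circuit design with low power consumption and low bandwidth requirements, the two key design factors for mobile video coding applications. To meet both requirements, the design builds on the all binary motion estimation (ABME) algorithm, which performs motion estimation on binarized images and carries out the binary search within a pyramid search structure. This greatly reduces the computational complexity of motion estimation, and the binarized images also reduce the I/O bandwidth needed for data access. To realize ABME as an integrated circuit, we propose a new LP-ABME algorithm and hardware architecture derived from the original ABME. The design has four key features: (1) a pre-processor based on a macroblock (MB) pipelined design, (2) a binary search architecture with high hardware computational efficiency, (3) a parallel 8x8 and 16x16 search architecture, and (4) a search architecture that processes bi-directional prediction in parallel. The first feature reduces the I/O access bandwidth requirement, and the other three reduce computational complexity and computation power. The resulting architecture performs well in I/O access bandwidth, speed, and power consumption. Power consumption is 763 uW for IPPPP CIF 30fps and 896 uW for IPBPB CIF 30fps, and I/O access bandwidth is reduced by 54.3% to 67.1%.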
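The second part of this thesis proposes a power-aware Power Adaptive Iterative Binary Search (PA-IBS) technique that targets three issues: (1) power adaptation ability, (2) high hardware idling, and (3) power-distortion performance. Prior power adaptive motion estimation designs achieve power adaptivity through hardware masking, which introduces problems such as redundant I/O access bandwidth, redundant memory bandwidth, and high hardware idling. These problems reduce the power adaptation ability and degrade power-distortion performance. To solve them, this thesis extends the binary search technique into a power adaptive algorithm and integrated circuit architecture. The algorithm, PA-IBS, consists of (1) an iterative binary search technique and (2) a content-aware iteration loop controller. The iterative binary search runs up to eight binary search iterations, and varying the number of iterations yields different levels of prediction quality and computational complexity. The content-aware iteration loop controller uses motion vectors to detect the motion complexity of the video and adjusts the number of iterations so that the best trade-off between prediction quality and computational complexity is reached with the fewest iterations. On the integrated circuit side, frequency scaling links the number of iterations to power consumption, so adjusting the number of iterations controls the power consumption and the power adaptation ability and also resolves the high hardware idling problem. Experimental results show that, compared with existing power adaptive designs, PA-IBS improves power adaptation ability by 19% to 125%, reduces the I/O access bandwidth requirement by up to 87.5%, and has a better power-distortion curve.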
In summary, this thesis proposes two motion estimation techniques for low power and power adaptive video coding systems. The first achieves power consumption below 1 mW and I/O access bandwidth savings above 50%. The second improves on existing power adaptive designs in power adaptation ability, high hardware idling, and power-distortion performance. Together they offer significant improvements and a wider range of applications for power constrained video coding systems.
The design of power constrained video coding systems has drawn attention in mobile devices and portable terminals due to their limited battery energy. Among power constrained video coding applications, low power and power adaptive designs are two of the most attractive design topics. Inside the video coding system, motion estimation (ME) consumes most of the computation power and becomes the design bottleneck of low power and power adaptive video coding systems. This thesis contains two major parts that address the design issues of low power and power adaptive motion estimation. The first part proposes a new Low Power-All Binary Motion Estimation (LP-ABME) hardware design that achieves low power and bus bandwidth efficiency. Low power and high bus bandwidth efficiency are the two key issues for portable video applications. To address them, we first study an efficient algorithm called all binary motion estimation (ABME) and analyze its architecture issues in operational flow and bus access. We then propose a hardware architecture for ABME with four new features: (1) macroblock level pre-processing, (2) an efficient binary pyramid search structure, (3) parallel processing of 8x8 and 16x16 block searches, and (4) parallel processing of bi-directional search. This architecture leads to superior performance in bus access, speed, and power. The experiments show that the power consumption is as low as 763 uW for IPPPP CIF 30fps and 896 uW for IPBPB CIF 30fps. The bus bandwidth savings are 54.3% for P-frame search and 67.1% for B-frame search. The second part proposes a new Power Adaptive Iterative Binary Search (PA-IBS) design for motion estimation to improve the power adaptation performance.
Prior power adaptive ME designs that use the hardware masking approach carry design overheads such as redundant bus access, unnecessary on-chip memory access, and poor hardware utilization, which lead to poor power adaptation performance. Our proposed power adaptive solution addresses these issues with a new ME algorithm called Iterative Binary Search (IBS) and the associated hardware architecture called PA-IBS. The IBS uses up to eight binary searches, where each search can be either an independent search or one of the eight joint searches. Hence, redundant bus and on-chip memory accesses are eliminated. A Content Adaptive Mechanism (CAM) dynamically selects the number of iterations on a macroblock basis. The PA-IBS uses the frequency scaling technique to link the number of iterations to the power consumption level, which reduces hardware idling and enhances hardware utilization. Experiments show that the PA-IBS delivers lower peak power consumption, better power adaptation performance, and a lower bus bandwidth requirement than prior hardware masking based designs such as sub-sampling or least significant bits truncation methods. Compared with those approaches, the power adaptation performance is improved by 19-125% and bus bandwidth is reduced by up to 87.5%. In conclusion, we have presented two algorithm and architecture designs of motion estimation for different power constrained video coding applications and shown their advantages in low power consumption and bus bandwidth requirements compared with prior works. The proposed power adaptive design is also shown to have better power adaptation ability and better power-distortion performance. Moreover, the proposed low power and power adaptive ME designs can be applied to the upcoming Scalable Video Coding (SVC) standard for further complexity and power reduction. |
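The binary matching idea behind the abstract above can be illustrated with a minimal sketch. It is not the thesis's ABME or LP-ABME design: the 3x3 mean-filter binarization, the single-level full search (no pyramid), and the names `binarize` and `binary_block_search` are all assumptions made for illustration. It only shows why one-bit frames make matching cheap: the block cost becomes a count of differing bits (XOR plus bit count) instead of a sum of absolute pixel differences.

```python
import numpy as np


def binarize(frame: np.ndarray) -> np.ndarray:
    """Map each pixel to 1 if it is >= the mean of its 3x3 neighbourhood.

    The 3x3 mean filter is an assumed choice, not taken from the thesis.
    """
    padded = np.pad(frame.astype(np.float64), 1, mode="edge")
    h, w = frame.shape
    local_sum = sum(padded[dy:dy + h, dx:dx + w]
                    for dy in range(3) for dx in range(3))
    return (frame >= local_sum / 9.0).astype(np.uint8)


def binary_block_search(cur_bits, ref_bits, top, left, block=16, radius=4):
    """Full search over +/-radius on 1-bit frames.

    Returns (cost, dy, dx), where cost is the number of differing bits
    between the current block and the displaced reference block.
    """
    target = cur_bits[top:top + block, left:left + block]
    h, w = ref_bits.shape
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue  # candidate block falls outside the reference frame
            candidate = ref_bits[y:y + block, x:x + block]
            cost = int(np.count_nonzero(target ^ candidate))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


# Hypothetical usage: a random 1-bit reference frame and a shifted copy of it.
rng = np.random.default_rng(0)
ref = rng.integers(0, 2, size=(64, 64), dtype=np.uint8)
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))
print(binary_block_search(cur, ref, top=24, left=24))
```

In this toy setup the current frame is the reference shifted down by 2 rows and left by 3 columns, so the search should find a zero-cost match at (dy, dx) = (-2, 3). A hardware design would presumably pack the bits into words and evaluate many candidates in parallel; the sketch keeps one byte per bit for readability.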
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009011843 http://hdl.handle.net/11536/80792 |
Appears in Collections: | Thesis |