Title: | 針對高畫質視訊之H.264/MPEG-4 AVC視訊編碼器設計 Design of H.264/MPEG-4 AVC Video Encoder for High Definition Video |
Authors: | 林佑昆 Yu-Kun Lin; 張添烜 Tian-Sheuan Chang; 任建葳 Chein-Wei Jen; Institute of Electronics |
Keywords: | H.264 Encoder; High Profile; High Definition Video |
Issue Date: | 2007 |
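Abstract: | H.264 has become the most widely adopted video compression standard because of its high compression ratio and high video quality. Its main drawback is an extremely high computational load. This is especially true at 1920x1080 (1080p) high-definition resolution, where the amount of data to be processed in real time is more than four times that of 1280x720 (720p) and more coding tools must be supported, which makes real-time encoding in software very difficult. Single-chip H.264 encoder designs have therefore been widely adopted in both industry and academia. Real-time H.264 encoding in hardware, however, still incurs a very high cost in silicon area, memory size, and memory bandwidth. In addition, the heavy computation required by H.264 leads to low throughput and a high operating frequency. Taken together, these factors make large power consumption unavoidable. This dissertation therefore proposes the first single chip in academia that encodes 1080p video in real time and supports the H.264 High Profile. The chip applies a range of algorithmic and architectural optimizations to minimize hardware cost and power consumption with almost no impact on video quality or compression ratio.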
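This dissertation consists of three parts. The first part discusses and analyzes motion estimation, the module that consumes the most hardware resources and computation in an H.264 encoder. For the variable block size motion estimation specific to H.264, we propose a mode filtering technique that selects only the best two of all possible block size combinations for refinement, saving 73.2% of the computation. For integer motion estimation, to reach the best balance between video quality and hardware cost, this dissertation adopts a multi-level parallel motion estimation technique, which reduces computation by 91.7% and hardware area by 30%. This dissertation also uses Level C data reuse to cut memory accesses, reducing internal memory by 88% and memory bandwidth by 46%. For fractional motion estimation, this dissertation adopts a single-iteration technique that doubles the throughput of all previous designs based on two iterations while saving 68% of the hardware. Combining these techniques, this dissertation proposes an H.264 motion estimator that supports 1080p resolution with a search range of up to ±128. Compared with previous work, our design reduces hardware area by 60% and internal memory by 68.9%.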
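The mode filtering step described above can be illustrated with a minimal sketch. It assumes that integer motion estimation has already produced one cost per block-size mode and that only the two lowest-cost modes are passed on to fractional refinement. The mode names are the seven H.264 partition sizes; the function name `filter_modes`, the cost values, and the use of a plain cost ranking are illustrative assumptions, not the dissertation's actual selection rule.

```python
import heapq

# Hypothetical integer-search costs (e.g. SAD plus a rate term) per partition mode.
# The seven block sizes are those defined by H.264; the numbers are made up.
integer_costs = {
    "16x16": 1520, "16x8": 1490, "8x16": 1610, "8x8": 1475,
    "8x4": 1700, "4x8": 1725, "4x4": 1810,
}

def filter_modes(costs, keep=2):
    """Return the `keep` lowest-cost modes, best first."""
    return heapq.nsmallest(keep, costs, key=costs.get)

survivors = filter_modes(integer_costs)
# Share of modes that skip fractional refinement in this toy setup.
skipped = 1 - len(survivors) / len(integer_costs)
print(survivors, f"{skipped:.1%} of modes skip refinement")
```

The five-of-seven ratio here is a property of the toy setup only and is not meant to reproduce the 73.2% saving reported above.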
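The second part of the dissertation is the architecture design of an H.264 intra encoder. Intra coding in the H.264 specification provides a higher compression ratio than earlier image compression techniques such as JPEG2000, yet does not require the enormous computation and system resources of motion estimation, making it a new option for image processing and low-power video compression. The main drawback for hardware design is low throughput, caused by the large number of selectable prediction modes. This dissertation therefore proposes a high-throughput, small-area H.264 intra encoder. First, it adopts a modified three-step fast algorithm that shortens computation time while ensuring that video quality does not degrade. Second, the encoder follows a variable-parallelism design concept: parts with a heavy computational load use a higher degree of parallelism, while non-bottleneck parts use a lower degree of parallelism to reduce hardware requirements. This design can likewise process 1080p video in real time and reduces hardware area by 23.5%. Because the operating frequency can be reduced by 48% and several low-power techniques are applied, it also achieves low power consumption.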
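The final part of this dissertation is a complete H.264 High Profile encoder. Because many high-definition applications use the High Profile of the H.264 standard, we integrate the motion estimator and intra encoder proposed in the earlier parts with the new High Profile tools into a complete H.264 High Profile encoder that supports 1080p resolution. Compared with a Baseline Profile encoder, a High Profile encoder poses greater design challenges in throughput, hardware resources, and power consumption. In addition, within the three-stage pipelined system architecture, the reconstruction modules of the motion estimation module and the intra coding module conflict in timing. At the system level, we therefore propose a cross-stage hardware sharing technique that removes this timing conflict and eliminates duplicated hardware. We also adopt full eight-pixel parallel processing to further raise throughput, so that the newly added High Profile tools do not become the system bottleneck. In motion estimation, the new bi-directional motion estimation shares a single set of hardware to reduce area, and the integer and fractional motion estimation hardware share internal memory to reduce memory area and required bandwidth. In summary, this first High Profile encoder published in academia supports 1080p resolution at only 145 MHz. In a 0.13 μm process its area is 3.17 x 3.17 mm², only 54% of that of previous comparable designs. Power consumption is only 242 mW at 1080p, and at 720p it is only 46.3% of that of previous comparable designs. This small-area, low-power, high-throughput design demonstrates that the results of this dissertation are well suited to high-definition video processing.
The H.264 video standard has been widely adopted in high-definition video applications because of its high compression efficiency and video quality. However, the major bottlenecks of H.264 implementations are high computational load and large memory bandwidth, especially when encoding 1920x1080 (1080p) high-definition video in real time. This dissertation therefore proposes the first chip in academia that both supports the H.264 High Profile and encodes 1080p video in real time. The dissertation contains three parts. First, we discuss and analyze the inter prediction modules, which account for the most memory bandwidth and hardware cost in an H.264 encoder. To overcome these problems, we present a low-complexity, hardware-efficient motion estimation design that combines several design techniques. The first low-complexity technique, mode filtering, selects the best two candidates among all possible block size combinations for refinement and reduces the computation of fractional refinement by 73.2%. To further reduce complexity and hardware cost, we propose a multi-level parallel processing technique for the integer motion estimation stage, which reduces complexity by 91.7% and gate count by 30%. Furthermore, the Level C data reuse technique reduces local memory size by 88% and external memory bandwidth by 46%.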
Finally, our proposed single-iteration technique removes 68% of the gate count and doubles the throughput of the fractional motion estimation stage, which is a bottleneck in the inter prediction modules. In summary, the proposed H.264 inter prediction engine supports 1080p resolution and a ±128 search range while using 60% less hardware and 68.9% less internal SRAM than previous work. The second part of the dissertation is the architecture design of an H.264 intra encoder. The intra encoder in the H.264 standard provides coding efficiency comparable to that of the JPEG 2000 standard. To achieve high throughput and low area cost, we apply a modified three-step fast intra prediction algorithm that reduces the cycle count while keeping quality close to that of full search. We further adopt variable pixel parallelism to speed up the critical intra prediction part while keeping the area cost of the other parts low. The resulting design supports 1080p video encoding and reduces gate count by 23.5% compared with the previous design. In addition, the design achieves low power consumption by reducing the operating frequency by 48% and applying several low-power techniques. The final part of this dissertation is a complete H.264 High Profile encoder. Because several high-definition applications use the H.264 High Profile, we integrate our motion estimation engine, our intra encoder, and the new coding tools of the H.264 High Profile into a complete H.264 High Profile encoder supporting 1080p video. These 1080p High Profile applications present a series of new design challenges in throughput, cost, and power. Furthermore, at the system level, a timing conflict arises in the reconstruction stage of inter and intra prediction because of the three-stage pipelined architecture. We therefore propose a cross-stage hardware sharing technique to remove the conflict and the duplicated hardware. To meet the high throughput demands and resolve structural hazards, the design adopts full eight-pixel parallelism.
In the motion estimation part, the bi-directional motion estimation modules share hardware, and the integer and fractional motion estimation modules share the local SRAM to reduce internal memory size and bandwidth. In summary, we propose the first H.264 High Profile encoder in academia, which supports 1080p resolution at only 145 MHz. The core area is 3.17 x 3.17 mm² in a 0.13 μm process, only 54% of that of previous work. The power consumption is 242 mW at 1080p resolution and only 46.3% of that of previous work at 720p resolution. The small-area, low-power, high-throughput design is therefore suitable for high-definition video applications. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009111832 http://hdl.handle.net/11536/44468 |
Appears in Collections: | Thesis |
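As a rough sanity check on the figures in the abstract, the sketch below works out the per-macroblock cycle budget implied by real-time 1080p encoding at 145 MHz, and the die area implied by the 3.17 x 3.17 mm core. The frame rate is not stated in the abstract, so the 30 frames per second used here is an assumption, as is rounding the frame height up to 1088 so that it divides into 16x16 macroblocks. The function and variable names are illustrative.

```python
import math

MB_SIZE = 16  # H.264 macroblocks are 16x16 luma samples

def macroblocks_per_frame(width, height):
    """Count macroblocks, rounding each dimension up to a multiple of 16."""
    return math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)

def cycles_per_macroblock(clock_hz, width, height, fps):
    """Average clock cycles available per macroblock for real-time encoding."""
    return clock_hz / (macroblocks_per_frame(width, height) * fps)

CLOCK_HZ = 145e6   # from the abstract
ASSUMED_FPS = 30   # assumption: the abstract does not state a frame rate

mbs_1080p = macroblocks_per_frame(1920, 1080)
mbs_720p = macroblocks_per_frame(1280, 720)
budget = cycles_per_macroblock(CLOCK_HZ, 1920, 1080, ASSUMED_FPS)
core_area_mm2 = 3.17 * 3.17  # core dimensions from the abstract

print(f"1080p: {mbs_1080p} MBs/frame, 720p: {mbs_720p} MBs/frame")
print(f"cycle budget at {ASSUMED_FPS} fps: {budget:.0f} cycles/MB")
print(f"core area: {core_area_mm2:.2f} mm^2")
```

Under these assumptions the encoder has on the order of 592 cycles per macroblock, which gives a sense of why the pipelined and parallel structures described in the abstract matter. A different frame rate scales the budget inversely.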