Title: H.264/AVC視訊編碼器於NVIDIA CUDA之平行演算與實現
H.264/AVC Encoder Parallelized Realization on NVIDIA CUDA
Authors: 陳威年
Wei-Nien Chen
杭學鳴
Hsueh-Ming Hang
Institute of Electronics
Keywords: GPU; CUDA; H.264/AVC; intra prediction; motion estimation
Issue Date: 2007
Abstract: With the rapid growth of graphics processing unit (GPU) capability, using the GPU as a coprocessor that assists the central processing unit (CPU) in non-graphics, data-intensive computation, a technique commonly known as GPGPU, has become increasingly important. In 2007 NVIDIA announced a new GPU architecture called Compute Unified Device Architecture (CUDA), which provides a flexible, massively parallel computing platform and greatly improves the programming flexibility of general-purpose GPU computing. In this thesis, we build an H.264/AVC encoding system on this architecture, implementing on CUDA the two most computationally demanding parts of the encoder: motion estimation and intra-prediction mode selection. We propose a highly parallel intra mode selection scheme and a full-search motion estimation scheme with fractional-pixel refinement, both optimized for CUDA. To achieve block-level parallelism in intra mode selection, the best intra-prediction mode is decided from the original pixel values rather than the coded (reconstructed) pixels, so a block does not have to wait for its neighbors to be encoded. To fully utilize the computing power of CUDA, thread allocation and memory access patterns are carefully tuned. Following these parallel-processing optimization rules, we design a five-stage motion estimation algorithm that processes as much data as possible so as to fully use the GPU's computing power. The proposed algorithms are evaluated on an NVIDIA GeForce 8800GTX GPU. Each of the two modules runs about 12 times faster, and overall H.264/AVC encoding is about 5 times faster, than the PC-only counterpart.
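The two ideas named in the abstract can be illustrated with a minimal CPU-side sketch in Python. This is not the thesis's CUDA code, and several things are assumptions made only for illustration: the function names (`best_intra_mode`, `full_search`), the use of SAD as the cost, and the restriction to three of the 4x4 intra modes (vertical, horizontal, DC). Fractional-pixel refinement and the five-stage structure are not modeled. The point is that each call reads only original-frame pixels, so every block can be evaluated independently, which is what would allow one block per GPU thread group.

```python
# Hypothetical reference sketch; frames are lists of rows of integer luma values.

INTRA_MODES = ("vertical", "horizontal", "dc")  # assumed subset of the 4x4 modes


def predict_4x4(frame, bx, by, mode):
    """Predict the 4x4 block at (bx, by) from ORIGINAL neighbor pixels.

    Assumes an interior block (bx >= 1 and by >= 1) so both neighbors exist.
    """
    top = [frame[by - 1][bx + i] for i in range(4)]
    left = [frame[by + j][bx - 1] for j in range(4)]
    if mode == "vertical":
        return [top[:] for _ in range(4)]
    if mode == "horizontal":
        return [[left[j]] * 4 for j in range(4)]
    dc = (sum(top) + sum(left) + 4) >> 3  # rounded mean of the 8 neighbors
    return [[dc] * 4 for _ in range(4)]


def best_intra_mode(frame, bx, by):
    """Pick the mode with the lowest SAD; no dependency on coded neighbors."""
    def cost(mode):
        pred = predict_4x4(frame, bx, by, mode)
        return sum(abs(frame[by + j][bx + i] - pred[j][i])
                   for j in range(4) for i in range(4))
    return min(INTRA_MODES, key=cost)


def sad(cur, ref, bx, by, rx, ry, n):
    """SAD between the n x n block of cur at (bx, by) and of ref at (rx, ry)."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, bx, by, n, search_range):
    """Integer-pel full search; returns (best_sad, (dx, dy))."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate block falls outside the reference frame
            c = sad(cur, ref, bx, by, rx, ry, n)
            if best is None or c < best[0]:
                best = (c, (dx, dy))
    return best
```

Because neither function writes shared state, a loop over all blocks of a frame could call them in any order or concurrently. With reconstructed neighbors, as in a standard encoder, the intra loop would instead have to follow the coding order.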
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009511607
http://hdl.handle.net/11536/38134
Appears in Collections: Thesis