Title: | A QFHD 30fps HEVC Decoder Design |
Authors: | Chiang, Pai-Tse (江沛澤); Chang, Tian-Sheuan (張添烜); Department of Electronics Engineering, Institute of Electronics |
Keywords: | High Efficiency Video Coding; HEVC; Decoder; Motion compensation; Inverse Transform |
Publication date: | 2013 |
Abstract: | The latest video coding standard, High Efficiency Video Coding (HEVC), introduces many new coding methods, such as a recursive coding structure, more varied prediction modes, and larger prediction and transform units. These methods greatly improve coding efficiency, but they also make real-time encoding and decoding harder: they increase memory bandwidth, make the processing units irregular, and add area cost. To meet the real-time decoding requirement while addressing these problems, this thesis analyzes the decoder and proposes an HEVC decoder design with a suitable pipeline architecture and optimized module implementations.
For the pipeline, the proposed decoder adopts a four-stage, mixed-block-size structure with variable-size processing units, chosen according to the characteristics of the different functional units in the decoder. Compared with a pipeline that uses the LCU as its processing unit, it saves about 90% of the buffer size between pipeline stages. For the module design, the motion compensation part requests reference data in 16x16 blocks and combines data sharing within partitioned blocks, precision-based data access, and a smart buffer mechanism (registers with cache-like behavior) to reduce the data bandwidth by about 88%. On the computation side, the design uses the corner positions of the prediction unit to obtain the correct motion vector prediction (MVP) candidates, and an optimized reconfigurable interpolation unit handles the irregular MV computational size and the different FIR filter types. The inverse transform exploits the numerical properties of the transform coefficient matrix to build a reconfigurable architecture that supports transform sizes from 4x4 to 32x32.
Implemented in a TSMC 90nm CMOS process, the proposed decoder costs about 399K logic gates and 17.5 KBytes of on-chip memory. At an operating frequency of 270MHz, it can support 4Kx2K video at 30 fps. |
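To put the abstract's closing figures (4Kx2K, 30 fps, 270MHz) in perspective, the following is a minimal sketch of the cycle budget they imply. It is not taken from the thesis. It assumes a QFHD frame of 3840x2160 pixels (as the title suggests), 4:2:0 chroma sampling, and a 64x64 LCU; the function and variable names are hypothetical.

```python
# Rough cycle budget for a real-time decoder. Illustrative sketch only.
# Assumptions NOT stated in the abstract: QFHD = 3840x2160 pixels,
# 4:2:0 chroma sampling (1.5 samples per pixel), and a 64x64 LCU.

WIDTH, HEIGHT = 3840, 2160      # assumed QFHD frame size
FPS = 30                        # from the abstract
CLOCK_HZ = 270_000_000          # from the abstract (270MHz)
SAMPLES_PER_PIXEL = 1.5         # assumed 4:2:0 sampling
LCU_SIZE = 64                   # assumed largest coding unit size


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return (a + b - 1) // b


def cycle_budget(width: int, height: int, fps: int, clock_hz: int,
                 lcu: int = LCU_SIZE) -> dict:
    """Return the average number of clock cycles available per pixel,
    per sample, and per LCU at the given frame size and frame rate."""
    pixels_per_second = width * height * fps
    # Partial LCUs at the frame edge still occupy a full LCU slot.
    lcus_per_frame = ceil_div(width, lcu) * ceil_div(height, lcu)
    cycles_per_pixel = clock_hz / pixels_per_second
    return {
        "pixels_per_second": pixels_per_second,
        "lcus_per_frame": lcus_per_frame,
        "cycles_per_pixel": cycles_per_pixel,
        "cycles_per_sample": cycles_per_pixel / SAMPLES_PER_PIXEL,
        "cycles_per_lcu": clock_hz / (lcus_per_frame * fps),
    }


budget = cycle_budget(WIDTH, HEIGHT, FPS, CLOCK_HZ)
for name, value in budget.items():
    print(f"{name}: {value:,.3f}")
```

Under these assumptions the decoder has only slightly more than one clock cycle per pixel on average, which suggests why every pipeline stage has to sustain close to one pixel per cycle and why reductions in buffer size and memory bandwidth matter at this resolution.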
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT070050188 http://hdl.handle.net/11536/73338 |
Appears in Collections: | Thesis |