Title: | 具率失真最佳化之高效移動估測設計 Efficient ME Design with Rate Distortion Optimization |
Authors: | 楊雲翔 (Yang, Yun-Xiang); 張添烜 (Chang, Tian-Sheuan); Institute of Electronics |
Keywords: | rate distortion optimization; motion estimation; hardware design |
Issue Date: | 2017 |
Abstract: | Conventional motion estimation (ME) designs suffer from low hardware utilization, high hardware cost, and long computation time because of the high data dependency in ME algorithms. This thesis proposes a real-time hardware accelerator for inter prediction with rate-distortion optimization to address these problems. For integer-pel motion estimation (IME), we propose a fully interleaved schedule for PU16x16, PU16x8 (PU8x16), and PU8x8 blocks that trades a slight BD-rate increase for concurrent, interleaved processing of prediction blocks with no data dependency, raising average hardware utilization by 8.73% and cutting hardware cost by 38.63%. For fractional-pel motion estimation (FME), where interpolating the many fractional sample positions makes the hardware expensive, we propose a hardware-resource-oriented schedule for the interpolation filters that reduces the required number of hardware sets by 46.67% under the same specification constraints while keeping the same execution time. For the integer DCT-like transform, we adopt diagonal addressing for the transpose memory, which simplifies memory control and completes reads and writes in the shortest time. For rate estimation, the preceding and following hardware stages scan data in inconsistent orders, which creates a data dependency; we propose a corresponding address translation scheme to remove it. Experimental results show that, compared with the HEVC reference software HM 13.0, BD-rate performance drops by 4.9%, 7.8%, and 8.1% for the Y, U, and V components, respectively. Synthesized with TSMC 40 nm CMOS technology, the design requires 622.05K logic gates and 30.6875 Kbytes of on-chip memory. At an operating frequency of 400 MHz, it supports bi-prediction encoding of 4Kx2K video at 30 frames per second. |
URI: | http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070450284 http://hdl.handle.net/11536/142160 |
Appears in Collections: | Thesis |