Title: A Highly Parallel Design for Irregular LDPC Decoding on GPGPUs
Authors: Chiu, Tsou-Han (邱奏翰)
Lai, Bo-Cheng (賴伯承)
Department of Electronics Engineering and Institute of Electronics
Keywords: low-density parity-check code; graphics processing unit; LDPC; GPU
Publication date: 2013
Abstract: LDPC decoding involves a complex decoding scheme and dynamic execution behavior, so attaining high decoding performance requires a powerful yet flexible computation platform. GPGPUs are many-core throughput processors capable of massive parallel computation and substantial performance gains; however, their performance is often limited by memory bandwidth that is insufficient to serve requests from the large number of processing cores. This thesis focuses on designing high-performance LDPC decoding on modern GPGPUs. For the conventional node-based LDPC decoding scheme, it proposes a novel data management method that improves decoding efficiency. It further introduces an edge-based decoding scheme, in which data can be laid out more simply while achieving memory-access efficiency comparable to that of the node-based scheme. Through extensive analysis and measurement, the thesis examines the design considerations and trade-offs of the two parallelization schemes, and presents a complete design flow for each together with a comprehensive comparison of the resulting performance gains. Experiments show that the parallel decoder running on a Tesla C2050 GPGPU achieves up to a 126.47x speedup over a single-core decoder running on a high-end CPU, with a maximum decoding throughput of 111.43 Mbps.
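The abstract does not spell out how the two parallelization schemes assign work. As an illustration only, the following Python sketch assumes one plausible reading: "node-based" means one work item (GPU thread) per check node, and "edge-based" means one work item per Tanner-graph edge. Both use the standard min-sum check-node update. The matrix `H`, the LLR values, and every function and variable name are hypothetical; this is not the thesis's implementation, and plain loops stand in for GPU threads.

```python
"""Toy comparison of node-based vs edge-based work decomposition for the
min-sum check-node update of an LDPC decoder (hypothetical sketch)."""
from typing import List, Tuple

# Hypothetical small irregular parity-check matrix H (rows = check nodes,
# columns = variable nodes); check degrees differ (3, 4, 3), hence irregular.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [1, 0, 1, 0, 0, 1],
]

# Flatten the Tanner graph into an edge list: edge e connects (check, var).
EDGES: List[Tuple[int, int]] = [
    (c, v) for c, row in enumerate(H) for v, bit in enumerate(row) if bit
]
# Per-check list of edge indices (what a node-based work item walks over).
CHECK_EDGES: List[List[int]] = [
    [e for e, (c, _) in enumerate(EDGES) if c == chk] for chk in range(len(H))
]


def min_sum_excluding(q: List[float], edges: List[int], skip: int) -> float:
    """Min-sum message on edge `skip`: product of signs times the minimum
    magnitude over all *other* edges of the same check node."""
    sign, mag = 1.0, float("inf")
    for e in edges:
        if e == skip:
            continue
        if q[e] < 0:
            sign = -sign
        mag = min(mag, abs(q[e]))
    return sign * mag


def check_update_node_based(q: List[float]) -> List[float]:
    """One work item per check node: it writes every outgoing message of
    that check, so its output positions depend on the check's degree."""
    r = [0.0] * len(EDGES)
    for edges in CHECK_EDGES:  # each iteration stands in for one thread
        for e in edges:
            r[e] = min_sum_excluding(q, edges, e)
    return r


def check_update_edge_based(q: List[float]) -> List[float]:
    """One work item per edge: it reads its own check's edges and writes
    exactly one message, at index e."""
    r = [0.0] * len(EDGES)
    for e, (c, _) in enumerate(EDGES):  # each iteration stands in for one thread
        r[e] = min_sum_excluding(q, CHECK_EDGES[c], e)
    return r


if __name__ == "__main__":
    # Hypothetical variable-to-check LLR messages, one per edge (10 edges).
    q = [0.8, -1.2, 2.0, -0.5, 1.5, -0.3, 0.9, 1.1, -2.2, 0.4]
    print(check_update_node_based(q))
    print(check_update_edge_based(q))
```

Both decompositions compute identical messages. The difference the sketch is meant to highlight is that in the edge-based version each work item maps to exactly one output slot, which keeps the data layout uniform even when check degrees vary, at the cost of some recomputation per edge.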
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079911623
http://hdl.handle.net/11536/73479
Appears in Collections: Thesis