標題: 低密度奇偶校驗碼解碼演算法之創新設計以及高速部份平行解碼器架構之設計
Novel LDPC Decoding Algorithms and Designs of High-Throughput Partial-Parallel Decoder Architecture
作者: 洪瑞徽
Jui-Hui Hung
陳紹基
Sau-Gee Chen
電子研究所
關鍵字: 低密度奇偶校驗碼;解碼器;通道編碼;硬體架構;超大型積體電路;LDPC;decoder;channel coding;architecture;VLSI
公開日期: 2006
摘要: 低密度奇偶校驗碼(Low-Density Parity-Check Code)最早是由R.G. Gallager博士在1962於其博士論文發表,此種編碼方式有著非常優越的性能,且其更正能力更遠超其他的編碼技術,使得其解碼效能十分的逼近向農極限 (Shannon limit)。本論文分別針對LDPC的解碼演算法以及硬體的架構上做討論與改進。 在解碼演算法方面,有鑑於在低量化位元數的fixed-point設計中,MP (Message Passing) algorithm family 的效能很差。本篇論文提出了一個新的演算法稱為Fixed-window Ranging and Voting (FRV)演算法,可以在MP演算法解碼過程之後,進一步的找出錯誤的位元並加以更正,而不需要做病徵(Syndrome)測試,使得效能得以增進約0.5dB並且延後了error floor 的出現,且只需要增加非常少量的硬體。和直接提高量化精確度的方法比較,FRV algorithm是更有效率的選擇。 另外,本論文又探討了BF (Bit-flipping) 演算法的效能與潛力,並且提出了一個全新的演算法,稱為Low-Correlation Multi-Bit Flipping (LCMBF) 演算法。此方法在相同的iteration number之下可以大幅的提升BF family演算法的效能達1~2dB以上,而且由於BF family 演算法並非訊息傳遞類型的演算法,並不需要儲存解碼過程中的訊息,也就是可以大大減少記憶體的需求,因此在考慮硬體實現的因素,LCMBF演算法有很大的可能性取代MP 演算法成為LDPC decoder的新主流。 在硬體架構上,由於現今的CNU (Check Node Unit)架構只針對特定的輸入數目並且是以手工的方式做最佳化,而並無一個有系統化的架構設計。因此本論文提出了三個演算法,並可依據演算法的結果建構出相應的CNU架構。這些演算法適用於不同的設計要求,但皆有著非常短的運算時間與非常低的複雜度。除此之外,每個演算法皆是適用於任何輸入數目的CNU,並且可以輕易的程式化。 最後,本論文針對802.16e的標準,利用更有效率的記憶體架構,設計了一個新的partial-parallel、high-throughput且 area-efficient的LDPC decoders,並且採用本論文提出的系統化CNU架構。在802.16e的標準之下,採用(576, 288)的LDPC code 可以達到1.45 Gbps 的data throughput。採用UMC 0.18 製程,其chip gate count為884k,功率消耗為565 mW。本論文也實作了FRV演算法的硬體設計及其比較,結果顯示使用FRV演算法的設計只會微量的增加約3k gate count及12mW的功率消耗,即可獲得超過0.5dB的效能增益。相較於直接提高量化精確度的方法,採用FRV演算法 可以節省約127k chip gate count和 77mW的功率消耗。
Low-Density parity check (LDPC) code was first introduced by Gallager, which can achieve performance close to Shannon bound. This thesis focuses on the improvement of decoding algorithms and efficient designs of decoder hardware architectures. In improving decoding algorithms, in order to the better performances, LDPC decoding processes often use the family of Message Passing (MP) algorithms. However, this thesis finds out that performances of MP algorithms would degrade seriously in fixed-point designs with short word lengths. To alleviate this problem, this thesis proposes a novel algorithm, called Fixed-window Ranging and Voting (FRV) Algorithm which can further identify error bits after each decoding iteration process in fixed-point design when using MP algorithms. Based on the algorithm, fixed-point decoding performances are much closer to floating-point results than conventional MP algorithms, only at the cost of very little hardware area. In [7:5] fixed-point quantized configuration, this algorithm can improve the performance by 0.5dB and delay the onset point of the error floor occurrence over the conventional algorithms. Furthermore, in order to totally avoid the fixed-point implementation problem, this thesis rediscovers the potential of Bit-flipping (BF) algorithm family and presents a novel multi-bit flipping algorithm, called Low-Correlation Multi-Bit Flipping (LCMBF) algorithm. This algorithm can improve performance of BF algorithms by 1 to 2 dB. Since BF algorithms are not based on message passing, it does not need to storage all messages in the iteration processes. It means LCMBF algorithm has very high potential to replace MP algorithm family as the new main decoding method particularly in hardware realization. In hardware implementation, since existing CNU design techniques only do customized optimization case by case for a specific CNU’s comparator architecture with a prescribed input number, but do not propose a systematic and generalized design approach for arbitrary input numbers. For cases with long code lengths, it will be impractical to optimize the designs by hand. Thus, this thesis presents three CNU comparison algorithms to solve this problem for the ease of the hardware design. These algorithms are based on different design concepts and all have very short delay time and very low complexity. Besides, each proposed algorithm suits for CNU with arbitrary input number which is easy to program. Finally, this thesis presents a LDPC decoder design for 802.16e standard which is a more efficient LDPC decoder architecture (based on a proposed message reordering scheme) than existing designs. The decoder, which adopts the mentioned proposed CNU comparison algorithm, is partial parallel, high throughput and area efficient. The design for (576,288) LDPC codes can achieve a throughput of 1.45Gbps. It adopts UMC 0.18 process; its gate count is 884k and power consumption is 556mW. This thesis also realizes the FRV algorithm and does some comparison. By applying FRV algorithm one can gain 1dB performance improvement over the conventional algorithms, at the cost of only 3k gate count and power consumption of 12mW. Compared with the design of directly increasing quantized wordlength, the design due to FRV algorithm can reduce about 127k gates and power consumption of 72mW.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009411672
http://hdl.handle.net/11536/80586
顯示於類別:畢業論文