Design and Implementation of High-Throughput LDPC-BC/CC Decoders

Title:	Design and Implementation of High-Throughput LDPC-BC/CC Decoders (高速低密度同位元检查区块/回旋码解码器之设计与实现)
Authors:	Chen, Chih-Lung (陈志龙); Chang, Hsie-Chia (张锡嘉); Lee, Chen-Yi (李镇宜); Institute of Electronics (电子研究所)
Keywords:	low-density parity-check codes; high-speed low-density parity-check convolutional codes; error-correcting code decoder; LDPC; LDPC-CC; ECC Decoder; High-throughput implementation
Publication date:	2012
Abstract:	The channel coding module, with its high computational load, plays a key role in wireless communication systems. A competitive design must not only meet the system's high-throughput requirements but also reduce the accompanying power consumption. Over the past decade, LDPC block codes (LDPC-BCs) have been widely adopted in communication standards for their excellent error-correcting performance and high throughput. However, state-of-the-art LDPC-BC decoders have difficulty providing flexible code rates and variable codeword lengths. 
In contrast, LDPC convolutional codes (LDPC-CCs) combine error-correcting performance close to that of LDPC block codes with the variable frame size of convolutional codes. Their drawbacks, however, include long decoding latency, low parallelism, and low-to-medium decoding throughput. Achieving Gb/s-class throughput while reducing power consumption remains a major challenge in LDPC-CC decoder design. Accordingly, this dissertation investigates decoder designs for both LDPC-BCs and LDPC-CCs to reach higher throughput and better energy efficiency. For LDPC-BCs, a (2048, 1920) irregular LDPC code is constructed with the proposed CP-PEG algorithm and performs better than other PEG-based codes; however, the large check node degrees that come with the high code rate of 15/16 become the implementation bottleneck. To design such a high-rate LDPC decoder, our approach features variable-node-centric sequential scheduling to reduce the number of iterations, a single pipelined decoder architecture to reduce the message storage memory, and an optimized check node unit to further reduce the number of registers. Overall, 73% of the message storage memory is saved compared with the traditional architecture. Fabricated in 90 nm 1P9M CMOS technology, the LDPC-BC decoder test chip achieves a maximum throughput of 11.5 Gb/s at a 1.4 V supply voltage with a core area of 2.7 × 1.4 mm^2. With the supply lowered to 0.8 V, it still delivers the 5.77 Gb/s required by IEEE 802.15.3c, at an energy efficiency of 0.033 nJ/bit. For LDPC-CCs, a (491,3,6) time-varying LDPC-CC decoder chip is implemented. The proposed design combines algorithm-level, node-level, and bit-level optimizations to achieve over 2 Gb/s throughput with acceptable hardware cost and power. 
The algorithm-level optimization is an on-demand variable node activation schedule that conceals the channel values within other messages; it converges twice as fast as the log-belief propagation (log-BP) algorithm and also reduces the message storage capacity by 17%. The node-level optimization duplicates the check node units and variable node units and unfolds the message storage FIFOs, so that the throughput becomes twelve times the clock frequency. Meanwhile, the bit-level optimization retimes the critical path so that a higher clock frequency can be achieved and the message storage size is slightly reduced. Furthermore, a novel hybrid-partitioned FIFO, which splits the storage across multiple dual-port memories, provides sufficient memory bandwidth to the processing units and reduces power consumption. With these schemes, a test chip of the proposed LDPC-CC decoder has been fabricated in 90 nm CMOS technology with a core area of 2.37 × 1.14 mm^2. A maximum throughput of 2.37 Gb/s is measured at a 1.2 V supply with an energy efficiency of 0.024 nJ/bit/proc. In low-power mode at a 0.8 V supply, power can be scaled down to 90.2 mW while maintaining 1.58 Gb/s. Together, these two works cover a throughput range from hundreds of Mb/s to several Gb/s and offer flexible code rates, adjustable frame size, excellent decoding performance, and good hardware and power efficiency. The proposed methodologies would make LDPC codes more competitive with other error-control codes.
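The figures quoted in the abstract can be cross-checked with a little arithmetic. The following is a minimal Python sketch and is not part of the thesis: the function names are hypothetical, the power and clock values it derives are implied by the quoted numbers rather than reported by the authors, and the clock estimate assumes that the abstract's statement about throughput being twelve times the clock frequency means 12 decoded bits per clock cycle.

```python
# Figures quoted in the abstract (all other values below are derived, not reported).
N, K = 2048, 1920                     # LDPC-BC codeword length and information bits
BC_ENERGY_NJ_PER_BIT = 0.033          # LDPC-BC chip at 0.8 V
BC_THROUGHPUT_GBPS = 5.77             # IEEE 802.15.3c operating point
CC_MAX_THROUGHPUT_GBPS = 2.37         # LDPC-CC chip at 1.2 V
CC_LOW_POWER_MW = 90.2                # LDPC-CC chip at 0.8 V
CC_LOW_POWER_THROUGHPUT_GBPS = 1.58
CC_BITS_PER_CYCLE = 12                # assumption: throughput = 12 x clock frequency


def code_rate(n, k):
    """Code rate of an (n, k) block code."""
    return k / n


def power_mw(energy_nj_per_bit, throughput_gbps):
    """Implied power: (nJ/bit) x (Gb/s) gives watts; scale to milliwatts."""
    return energy_nj_per_bit * throughput_gbps * 1000.0


def energy_nj_per_bit(power_in_mw, throughput_gbps):
    """Implied total energy per bit: mW / (Gb/s) gives pJ/bit; scale to nJ/bit."""
    return power_in_mw / throughput_gbps / 1000.0


def clock_mhz(throughput_gbps, bits_per_cycle):
    """Implied clock frequency if a fixed number of bits is produced per cycle."""
    return throughput_gbps * 1000.0 / bits_per_cycle


if __name__ == "__main__":
    print("LDPC-BC code rate:", code_rate(N, K))
    print("LDPC-BC implied power at 0.8 V (mW):",
          power_mw(BC_ENERGY_NJ_PER_BIT, BC_THROUGHPUT_GBPS))
    print("LDPC-CC implied energy at 0.8 V (nJ/bit):",
          energy_nj_per_bit(CC_LOW_POWER_MW, CC_LOW_POWER_THROUGHPUT_GBPS))
    print("LDPC-CC implied clock at 2.37 Gb/s (MHz):",
          clock_mhz(CC_MAX_THROUGHPUT_GBPS, CC_BITS_PER_CYCLE))
```

The 0.8 V LDPC-CC value computed here is total energy per bit, whereas the 0.024 nJ/bit/proc in the abstract is normalized per processor and measured at 1.2 V, so the two numbers should not be compared directly.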
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT079511823 http://hdl.handle.net/11536/41060
Appears in collections:	Thesis