Title: Design and Implementation of Multi-Mode Channel Decoders for Wireless Communication Systems
Author: 严绍维 (Yen, Shao-Wei)
周世杰 (Jou, Shyh-Jye)
张锡嘉 (Chang, Hsie-Chia)
Institute of Electronics
Keywords: LDPC Codes; Turbo Codes; Multi-Mode
Publication date: 2012
Abstract: This dissertation investigates LDPC and turbo decoders for digital communication systems. To handle varying channel conditions and multiple users, the channel decoders are designed to support multiple modes, consisting of multiple code rates and code lengths. Based on the specifications in the standards, we propose appropriate decoding methods and architectures that achieve better hardware and energy efficiency while meeting the data rates the specifications require.
An LDPC decoder chip fully compliant with IEEE 802.16e is presented. Since the parity check matrix is decomposed into several block rows of cyclically shifted sub-matrices, a phase-overlapping message passing scheme updates messages immediately, which increases decoding throughput. Using only one shifter-based permutation structure, a self-routing switch network is proposed that merges the 19 sub-matrix sizes defined in IEEE 802.16e and routes messages in parallel without congestion. Fabricated in a 90nm 1P9M CMOS process, the chip achieves 105Mb/s at 20 iterations when decoding the rate-5/6, 2304-bit code at an operating frequency of 150MHz. To meet the maximum data rate of IEEE 802.16e, the chip operates at 109MHz and dissipates 186mW from a 1.0V supply.
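In IEEE 802.16e the parity check matrix is expanded from a base matrix into z×z circularly shifted identity sub-matrices, with 19 expansion factors z = 24, 28, ..., 96. The Python sketch below models only the message permutation a shifter-based network must perform for one such sub-matrix, plus a floor rule that scales a base shift defined at z = 96 down to size z. It is an illustration under stated assumptions, not the thesis's self-routing switch network: the function names are hypothetical, and the floor scaling rule is assumed here (the standard specifies a different rule for at least one code rate, so the shift table should be checked against the standard).

```python
# Sketch: message permutation for one circulant sub-matrix of an IEEE 802.16e LDPC code.
Z0 = 96                                     # largest expansion factor; base shifts are defined for it
EXPANSION_FACTORS = list(range(24, 97, 4))  # z = 24, 28, ..., 96 -> 19 sub-matrix sizes


def scaled_shift(p96: int, z: int) -> int:
    """Scale a base-matrix shift to expansion factor z using the floor rule (assumed here).

    A negative entry marks an all-zero sub-matrix and is passed through unchanged.
    """
    if p96 < 0:
        return -1
    return (p96 * z) // Z0


def cyclic_permute(msgs: list, shift: int) -> list:
    """Read path: row r of an identity matrix circularly shifted right by `shift`
    connects to column (r + shift) mod z, so output r takes input (r + shift) mod z."""
    z = len(msgs)
    return [msgs[(r + shift) % z] for r in range(z)]


def inverse_permute(msgs: list, shift: int) -> list:
    """Write-back path: undo cyclic_permute so messages return to column order."""
    z = len(msgs)
    out = [None] * z
    for r in range(z):
        out[(r + shift) % z] = msgs[r]
    return out
```

For z = 24, a hypothetical base shift of 95 scales to 23, and `cyclic_permute(list(range(24)), 23)` starts with `[23, 0, 1]`. Because every size reduces to the same "rotate by s modulo z" operation, one shifter that accepts z as a parameter can in principle serve all 19 sizes, which is the property the thesis's single permutation structure relies on.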
A single-MAP-based convolutional turbo code (CTC) decoder fully compliant with IEEE 802.16e is proposed. The normalized Max-Log-MAP decoding algorithm is used to lower computational complexity while maintaining BER performance. The architecture uses a reversed sliding window, a division-free reconfigurable interleaver, and a two-phase extrinsic memory to reduce memory usage and hardware complexity. Fabricated in a 90nm 1P9M CMOS process, the CTC decoder chip has a gate count of 303K, achieves 30Mb/s, and consumes only 23.4mW at 0.9V, for an energy efficiency of 0.78nJ/bit (23.4mW / 30Mb/s). To meet the high-throughput requirement of IEEE 802.16m, the advanced version of IEEE 802.16e, a multiple-MAP-based CTC decoder supporting 39 block sizes is presented. Using a hybrid parallel methodology, this parallel CTC decoder achieves higher throughput across different block sizes. Based on an analysis of the almost-regular-permutation (ARP) interleaver, a contention-free 4-stage barrel shift network is proposed to reduce routing complexity and critical path delay. Circular permutation compensation and two-phase memory accessing are proposed to reduce multiplexer and memory usage, cutting the number of multiplexers by 50% and memory area by 21%. Implemented in a 90-nm CMOS process, the proposed parallel CTC decoder achieves a maximum throughput of 515Mbps with a hardware efficiency of 0.936Mbps/K-gate and an energy efficiency of 0.44nJ/bit.
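The idea behind a division-free interleaver can be illustrated with the two-step ARP interleaver used for the duo-binary CTC in IEEE 802.16e, in the form step 1 swaps the bits of every odd-indexed couple and step 2 reads couples at P(j) = (P0·j + 1 + Q(j mod 4)) mod N, with Q = {0, N/2 + P1, P2, N/2 + P3}. The Python sketch below is a hedged model of that formulation, not the thesis's reconfigurable circuit; the function names are hypothetical, and P0 to P3 must come from the standard's per-block-size table (any values used in examples are placeholders). It shows how the modulo can be replaced by additions and one conditional subtraction per step.

```python
# Sketch: two-step ARP interleaver for duo-binary CTC, with a division-free address generator.
# P0..P3 are per-block-size constants taken from the standard's parameter table.


def arp_addresses_direct(n: int, p0: int, p1: int, p2: int, p3: int) -> list:
    """Reference formula: P(j) = (P0*j + 1 + Q(j mod 4)) mod N, computed with a modulo."""
    q = (0, n // 2 + p1, p2, n // 2 + p3)
    return [(p0 * j + 1 + q[j % 4]) % n for j in range(n)]


def arp_addresses_division_free(n: int, p0: int, p1: int, p2: int, p3: int) -> list:
    """Same addresses using only additions and conditional subtractions.

    Assumes 0 <= p0 < n. The four offsets are reduced modulo n once per block
    size (in hardware they would be stored constants), so every sum below is
    smaller than 2n and one conditional subtraction replaces the modulo.
    """
    q = [x % n for x in (0, n // 2 + p1, p2, n // 2 + p3)]
    base = 1 % n                 # running value of (P0*j + 1) mod N
    out = []
    for j in range(n):
        addr = base + q[j % 4]
        if addr >= n:
            addr -= n
        out.append(addr)
        base += p0               # advance to (P0*(j+1) + 1) mod N
        if base >= n:
            base -= n
    return out


def ctc_interleave(couples: list, p0: int, p1: int, p2: int, p3: int) -> list:
    """Step 1 swaps (A, B) in every odd-indexed couple; step 2 reads couples at P(j)."""
    n = len(couples)
    swapped = [(b, a) if i % 2 else (a, b) for i, (a, b) in enumerate(couples)]
    return [swapped[i] for i in arp_addresses_division_free(n, p0, p1, p2, p3)]
```

For example, with placeholder parameters, `ctc_interleave([(k, k + 100) for k in range(24)], 5, 0, 0, 0)` returns 24 couples whose first element is `(101, 1)`: P(0) = 1, and couple 1 was swapped in step 1. Since the address sequence advances by a fixed step P0, a hardware generator needs only an adder and a comparator per address, which is why no divider is required.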
An LDPC codec chip supporting the four code rates of IEEE 802.15.3c is presented. With row-based layered scheduling, the normalized min-sum (NMS) algorithm halves the required number of iterations while maintaining similar performance. Exploiting the particular structure of the parity check matrix, a reconfigurable 8/16/32-input sorter is designed to handle the LDPC codes of all four code rates. Sorter input reallocation and a pre-coded routing switch are proposed to alleviate routing complexity, reducing the number of multiplexer inputs by 64%. In addition, an adder-accumulator-shift register (AASR) circuit is proposed for the LDPC encoder to reduce hardware complexity. Implemented in a 65-nm 1P10M CMOS process, the proposed LDPC decoder chip achieves a maximum throughput of 5.79Gbps with a hardware efficiency of 3.7Gbps/mm² and an energy efficiency of 62.4pJ/bit.
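Row-based layered scheduling with the normalized min-sum rule can be sketched in Python as follows. Each layer updates the a-posteriori LLRs before the next layer runs, so later layers in the same iteration already see improved values, which is why layered decoding tends to converge in fewer iterations than flooding. The check-node update needs only the two smallest input magnitudes, the index of the smallest, and the sign product, which is the information a min-finding sorter supplies. Assumptions: each layer is given as a list of variable-node indices (a hypothetical input format), the normalization factor `alpha = 0.75` is an assumed value rather than one taken from the thesis, and this is a floating-point software model, not the thesis's fixed-point codec.

```python
# Sketch: row-layered normalized min-sum (NMS) LDPC decoding (floating-point model).


def nms_check_update(q: list, alpha: float = 0.75) -> list:
    """Normalized min-sum check-node update for one row.

    Uses only min1, min2, the index of min1 and the overall sign product.
    """
    mags = [abs(x) for x in q]
    idx1 = min(range(len(q)), key=mags.__getitem__)
    min1 = mags[idx1]
    min2 = min((m for k, m in enumerate(mags) if k != idx1), default=min1)
    sign_all = 1
    for x in q:
        if x < 0:
            sign_all = -sign_all
    out = []
    for k, x in enumerate(q):
        sign_others = -sign_all if x < 0 else sign_all  # sign product excluding q[k]
        mag = min2 if k == idx1 else min1               # smallest magnitude excluding q[k]
        out.append(alpha * sign_others * mag)
    return out


def hard_decision(llr: list) -> list:
    """Negative LLR -> bit 1, otherwise bit 0."""
    return [1 if x < 0 else 0 for x in llr]


def layered_nms_decode(llr: list, layers: list, alpha: float = 0.75, max_iter: int = 10) -> list:
    """Each layer (row) updates the a-posteriori LLRs before the next layer runs."""
    post = list(llr)
    r = [[0.0] * len(row) for row in layers]  # stored check-to-variable messages per layer
    bits = hard_decision(post)
    for _ in range(max_iter):
        for li, row in enumerate(layers):
            q = [post[v] - r[li][k] for k, v in enumerate(row)]  # variable-to-check messages
            r[li] = nms_check_update(q, alpha)
            for k, v in enumerate(row):
                post[v] = q[k] + r[li][k]
        bits = hard_decision(post)
        if all(sum(bits[v] for v in row) % 2 == 0 for row in layers):
            break  # early termination: all parity checks satisfied
    return bits
```

For a single parity check over three bits, `layered_nms_decode([2.0, 3.0, -0.5], [[0, 1, 2]])` returns `[0, 0, 0]`: after one layer update the third a-posteriori LLR becomes 1.0, so the weakly negative input is overturned. Rows of different weights only change how many inputs the min-finding step must scan, which is where a reconfigurable 8/16/32-input sorter fits.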
These three works provide multi-mode LDPC and CTC decoders that address different requirements of digital communication systems, such as low power and high throughput. They also achieve good hardware and power efficiency, so the proposed architectures can be applied to other standards with similar channel coding schemes.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079511828
http://hdl.handle.net/11536/41062
Appears in Collections: Thesis