Title: Design and Implementation of Multi-Mode Channel Decoders for Wireless Communication Systems
Author: Yen, Shao-Wei (嚴紹維)
Jou, Shyh-Jye
Chang, Hsie-Chia
Keywords: LDPC Codes; Turbo Codes; Multi-Mode
Publication date: 2012
Abstract:
This dissertation investigates LDPC and turbo decoders for digital communication systems. To cope with varying channel conditions and multiple-user requirements, the channel decoders are designed to support multiple modes comprising multiple code rates and code lengths. Based on the specifications in the standards, we propose appropriate decoding methodologies and architectures that achieve better hardware and energy efficiency while satisfying the required data rates.

An LDPC decoder chip fully compliant with IEEE 802.16e is presented. Since the parity-check matrix can be decomposed into several block rows of cyclically shifted sub-matrices, a phase-overlapping message-passing scheme is applied to update messages immediately, which increases decoding throughput. With only one shifter-based permutation structure, a self-routing switch network is proposed to merge the 19 sub-matrix sizes defined in IEEE 802.16e and to route messages in parallel without congestion. Fabricated in a 90nm 1P9M CMOS process, the chip achieves 105Mb/s at 20 iterations when decoding the rate-5/6, 2304-bit code at an operating frequency of 150MHz. To meet the maximum data rate of IEEE 802.16e, the chip operates at 109MHz and dissipates 186mW from a 1.0V supply.

A single-MAP-based convolutional turbo code (CTC) decoder fully compliant with IEEE 802.16e is proposed. The normalized Max-Log-MAP decoding algorithm is used to lower computational complexity while maintaining BER performance. The architecture uses a reversed sliding window, a division-free reconfigurable interleaver, and a two-phase extrinsic memory to reduce memory usage and hardware complexity. Fabricated in a 90nm 1P9M CMOS process, the 303K-gate CTC decoder chip achieves 30Mb/s and consumes only 23.4mW at 0.9V, an energy efficiency of 0.78nJ/bit.
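The 19 sub-matrix sizes of the IEEE 802.16e LDPC decoder follow from the standard's code lengths: n ranges from 576 to 2304 bits in steps of 96, and the expansion factor is z = n/24, giving z = 24, 28, ..., 96. The Python sketch below is written from our reading of the standard rather than taken from the thesis. It lists these sizes, scales a base-matrix shift value to a given z, and models what one cyclically shifted identity sub-matrix does to a vector of z messages, which is the operation a shifter-based permutation network performs. The scaling rule (floor(p·z/96), or p mod z for the rate-2/3A code) and the shift-direction convention are assumptions to be checked against the standard's text; the function names are hypothetical.

```python
# Sketch (assumptions noted): 802.16e expansion factors, shift scaling, and the
# cyclic shift a shifter-based permutation network applies to z messages.

Z0 = 96  # expansion factor of the 2304-bit base code (n / 24)


def expansion_factors():
    """Expansion factors z = n / 24 for code lengths n = 576, 672, ..., 2304."""
    return [n // 24 for n in range(576, 2304 + 1, 96)]


def scaled_shift(p, z, rate_2_3a=False):
    """Scale a base-matrix shift p (given for z = 96) to expansion factor z.

    Assumed rule: -1 marks an all-zero sub-matrix and 0 the unshifted identity;
    positive shifts become floor(p * z / 96), or p mod z for the rate-2/3A code.
    """
    if p <= 0:
        return p
    if rate_2_3a:
        return p % z
    return (p * z) // Z0


def cyclic_shift(messages, s):
    """Apply one z x z identity cyclically shifted by s to a message vector.

    Assumed convention: row i has its 1 in column (i + s) mod z, so check
    node i receives the message from variable node (i + s) mod z.
    """
    z = len(messages)
    return [messages[(i + s) % z] for i in range(z)]


if __name__ == "__main__":
    sizes = expansion_factors()
    print(len(sizes), sizes[0], sizes[-1])  # 19 24 96
    print(scaled_shift(94, 24))             # 23
    print(cyclic_shift(list(range(6)), 2))  # [2, 3, 4, 5, 0, 1]
```

Because every sub-matrix size reduces to a rotation of a length-z vector, one rotator that handles the largest z (96) can, with suitable control, serve all 19 sizes; how the thesis's self-routing network realizes this is not described here.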
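The normalized Max-Log-MAP algorithm mentioned for the single-MAP CTC decoder replaces the Jacobian logarithm used by Log-MAP, max*(a, b) = ln(e^a + e^b) = max(a, b) + ln(1 + e^(-|a-b|)), with a plain max. The resulting extrinsic information tends to be overly optimistic, so it is scaled by a normalization factor below one. A minimal Python sketch of this arithmetic follows; the factor 0.75 and the function names are hypothetical, not values or interfaces reported in the thesis, and the sketch models only the arithmetic, not the decoder architecture.

```python
import math
from functools import reduce


def max_star(a, b):
    """Jacobian logarithm used by Log-MAP: ln(e^a + e^b), computed stably."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a, b):
    """Max-Log-MAP approximation: keep the max, drop the correction term."""
    return max(a, b)


def combine(values, exact=True):
    """Fold max* (Log-MAP) or max (Max-Log-MAP) over several branch metrics."""
    return reduce(max_star if exact else max_log, values)


def normalize_extrinsic(extrinsic, alpha=0.75):
    """Scale extrinsic LLRs by alpha < 1 (0.75 is a hypothetical value)."""
    return [alpha * x for x in extrinsic]


if __name__ == "__main__":
    metrics = [1.0, 3.0, 2.5]
    print(combine(metrics))                  # exact: ln(e^1 + e^3 + e^2.5)
    print(combine(metrics, exact=False))     # 3.0
    print(normalize_extrinsic([4.0, -2.0]))  # [3.0, -1.5]
```

Dropping the log1p correction removes a lookup table or exponential from every add-compare-select step, which is where the complexity saving comes from; the single multiplication by alpha on the extrinsic output is comparatively cheap.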
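As a consistency check on the reported single-MAP CTC figures, energy efficiency is power divided by throughput:

```latex
E_b = \frac{P}{T}
    = \frac{23.4\,\text{mW}}{30\,\text{Mb/s}}
    = \frac{23.4\times 10^{-3}\,\text{J/s}}{30\times 10^{6}\,\text{bit/s}}
    = 0.78\times 10^{-9}\,\text{J/bit}
    = 0.78\,\text{nJ/bit}
```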
To meet the high-throughput requirement of IEEE 802.16m, the advanced version of IEEE 802.16e, a multiple-MAP-based CTC decoder supporting 39 block sizes is presented. Using a hybrid parallel methodology, the parallel CTC decoder achieves higher throughput across different block sizes. Based on an analysis of the almost-regular-permutation (ARP) interleaver, a contention-free 4-stage barrel shift network is proposed to reduce routing complexity and critical-path delay. Circular permutation compensation and two-phase memory access are proposed to reduce multiplexer and memory usage, yielding a 50% reduction in multiplexers and a 21% reduction in memory area. Implemented in a 90nm CMOS process, the proposed parallel CTC decoder achieves a maximum throughput of 515Mb/s, with a hardware efficiency of 0.936Mb/s per K-gate and an energy efficiency of 0.44nJ/bit.

An LDPC codec chip supporting the four code rates of IEEE 802.15.3c is presented. With row-based layered scheduling, the normalized min-sum (NMS) algorithm halves the required number of iterations while maintaining similar performance. Exploiting the unique structure of the parity-check matrix, a reconfigurable 8/16/32-input sorter is designed to handle LDPC codes of the four code rates. Sorter input reallocation and a pre-coded routing switch are proposed to alleviate routing complexity, reducing the number of multiplexer inputs by 64%. In addition, an adder-accumulator-shift register (AASR) circuit is proposed for the LDPC encoder to reduce hardware complexity. Implemented in a 65nm 1P10M CMOS process, the proposed LDPC decoder chip achieves a maximum throughput of 5.79Gb/s, with a hardware efficiency of 3.7Gb/s/mm² and an energy efficiency of 62.4pJ/bit.
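Both CTC decoders described above depend on the ARP interleaver. For a block of N couples, our understanding of the second step of the IEEE 802.16e CTC interleaver is P(j) = (P0·j + 1 + Q(j mod 4)) mod N, with Q = [0, N/2 + P1, P2, N/2 + P3], where P0 to P3 come from a per-block-size parameter table in the standard. One way to make address generation division-free, sketched below in Python as an assumption rather than as the thesis's circuit, is to keep P0·j mod N as a running sum that needs one addition and at most one conditional subtraction per couple. The standard's first step, which swaps the two bits of alternate couples, is omitted. The example parameters (N = 24, P0 = 5, P1 = P2 = P3 = 0) should be checked against the standard's table, and the function names are hypothetical.

```python
def reduce_mod(x, n):
    """Bring a non-negative x into [0, n) by repeated subtraction (no divider)."""
    while x >= n:
        x -= n
    return x


def arp_addresses(n, p0, p1, p2, p3):
    """Division-free ARP address sequence P(j) for j = 0..n-1 (couple level).

    P(j) = (p0*j + 1 + Q[j mod 4]) mod n with Q = [0, n/2 + p1, p2, n/2 + p3].
    """
    half = n >> 1  # n is even; a right shift, not a division
    offsets = [1, 1 + half + p1, 1 + p2, 1 + half + p3]  # "+1" folded in
    step = reduce_mod(p0, n)
    acc = 0  # invariant: acc == (p0 * j) mod n at the start of iteration j
    addresses = []
    for j in range(n):
        addresses.append(reduce_mod(acc + offsets[j & 3], n))  # j & 3 == j mod 4
        acc = reduce_mod(acc + step, n)  # acc + step < 2n: one subtraction at most
    return addresses


if __name__ == "__main__":
    perm = arp_addresses(24, 5, 0, 0, 0)
    print(perm[:8])                       # [1, 18, 11, 4, 21, 14, 7, 0]
    print(sorted(perm) == list(range(24)))  # True: every address used once
```

In hardware the same recurrence maps to an adder and a comparator-driven subtractor per address generator, and because the four offsets repeat with period 4, the per-block-size configuration reduces to loading N, P0, and the offsets.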
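With row-based layered scheduling, each row (layer) of the parity-check matrix is processed in turn and the posterior LLRs are updated immediately, so later layers in the same iteration already use the new values; this is the usual explanation for layered decoding converging in roughly half the iterations of flooding. The normalized min-sum check-node update needs only the two smallest input magnitudes, the position of the smallest, and the product of the signs, which is the information a min-finding sorter provides. The Python sketch below shows one layer update using these textbook definitions; the normalization factor 0.75 and the function names are hypothetical, and the thesis's fixed-point hardware will differ.

```python
def nms_check_node(q, alpha=0.75):
    """Normalized min-sum check-node update for one parity check.

    q holds the variable-to-check LLRs on the row's edges. Each output is
    alpha * (product of the other edges' signs) * (min of the other magnitudes),
    so only min1, min2, the index of min1, and the total sign are needed.
    """
    min1 = min2 = float("inf")
    idx1 = -1
    sign_prod = 1
    for i, v in enumerate(q):
        mag = abs(v)
        if v < 0:
            sign_prod = -sign_prod
        if mag < min1:
            min1, min2, idx1 = mag, min1, i
        elif mag < min2:
            min2 = mag
    out = []
    for i, v in enumerate(q):
        sign = sign_prod * (-1 if v < 0 else 1)  # divide out this edge's own sign
        mag = min2 if i == idx1 else min1        # exclude this edge's own magnitude
        out.append(alpha * sign * mag)
    return out


def layered_row_update(L, r_old, cols, alpha=0.75):
    """Process one layer: update posteriors L in place, return new check messages."""
    q = [L[c] - r for c, r in zip(cols, r_old)]  # remove this row's old contribution
    r_new = nms_check_node(q, alpha)
    for c, qi, ri in zip(cols, q, r_new):
        L[c] = qi + ri  # updated posterior is visible to the next layer at once
    return r_new


if __name__ == "__main__":
    L = [2.0, -1.0, 4.0, 0.5]
    r = layered_row_update(L, [0.0, 0.0, 0.0], cols=[0, 1, 2], alpha=1.0)
    print(r)  # [-1.0, 2.0, -1.0]
    print(L)  # [1.0, 1.0, 3.0, 0.5]
```

Since the per-row output depends only on (min1, min2, idx1, sign product), a sorter sized to the row weight can be shared across code rates by reconfiguring how many inputs it compares, which is consistent with the reconfigurable multi-input sorter described in the abstract.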
These three works provide multi-mode LDPC and CTC decoders that address different requirements of digital communication systems, such as low power and high throughput. Moreover, they demonstrate good hardware and energy efficiency, so the proposed architectures can also be applied to other standards with similar channel coding schemes.
Appears in Collections: Thesis