Title: | 應用於NAND型快閃記憶體系統之BCH編解碼器之研究 Research on BCH Codec for NAND Flash Memory Systems |
Authors: | 楊其衡 Yang, Chi-Heng 張錫嘉 Chang, Hsie-Chia Department of Electronics Engineering and Institute of Electronics |
Keywords: | Error-correcting code;Flash memory;ECC;BCH code |
Issue Date: | 2014 |
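Abstract: | In NAND flash memory systems, continual advances in process technology and breakthroughs in multi-level cell technology have greatly increased storage density and, as a result, greatly reduced unit cost. These advances have also significantly affected the reliability and lifetime of NAND flash memory. Owing to the device characteristics, the error probability rises gradually over the life cycle of a NAND flash memory. Supporting multiple error-correcting capabilities is therefore regarded as a key requirement in BCH codec design: a suitable correcting capability can be provided for each error-rate condition, which avoids unnecessary waste of time and power. Supporting multiple error-correcting capabilities, however, often comes at a large hardware cost. To address this need, this dissertation uses the minimal polynomials of the finite field as the basic unit of the encoding and decoding operations and proposes a series of multi-error-correcting-capability hardware architectures for the encoder, the syndrome calculator, and the Chien search. According to implementation results and measurements in the UMC 65 nm CMOS process, the proposed BCH codec chip supporting 1 to 24-bit error correction and the proposed BCH codec supporting 60 to 84-bit error correction require only 73.0K and 168.6K gates of hardware complexity and achieve 1.33 Gb/s and 1.60 Gb/s, respectively.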
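In most cases, BCH codecs for NAND flash memory systems are designed as three-stage pipelined architectures to raise decoder throughput. In this setting, the pipeline latency is usually determined by the syndrome calculator and the Chien search. A major issue in BCH decoder hardware design is therefore how to make full use of the limited latency budget so that the key equation solver, regarded as the most computationally complex part of a BCH decoder, completes its computation with the fewest hardware resources. In contrast to the inversion-less BM (iBM) algorithm customary in the existing literature and in industry, this dissertation proposes using a low-complexity composite-field divider together with the original BM algorithm, which greatly reduces the number of multiplications to be processed and thus the hardware resources required. According to implementation results in the UMC 90 nm CMOS process, the proposed single-mode (9200, 8192, 72) BCH codec achieves a throughput of 3.08 Gb/s at a hardware complexity of 147.8K gates. By further simplifying the control logic, the proposed 24/48/60/72-bit error-correcting BCH codec chip requires only 124.7K gates.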
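For the very high throughput demanded by solid-state drives, this dissertation proposes the Truncated Simplified Inversion-less Berlekamp-Massey (TSiBM) algorithm for low-latency key equation solvers. It reduces hardware complexity by up to 40% compared with previous related work and can also be applied effectively to the multi-channel BCH codec proposed in this dissertation. According to implementation results in the UMC 90 nm CMOS process, the TSiBM key equation solver for the (18244, 16384, 124) BCH code requires only 243.3K gates. Applied in an eight-channel BCH codec, the average BCH codec complexity per channel is only 264.3K gates, a hardware reduction of 20.5% compared with the commonly used three-stage pipelined architecture.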
The implementation results show that the BCH hardware architectures and algorithms proposed in this dissertation for NAND flash memory applications offer highly competitive hardware complexity as well as strong error-correcting capability.
This dissertation investigates BCH codes from algorithms to architecture designs and VLSI circuit implementations for various design targets in NAND Flash memory applications. To meet varying requirements on error-correcting capability, support for multiple error-correcting modes is crucial for a BCH codec in NAND Flash memory. By exploiting the properties of minimal polynomials, the proposed minimal-polynomial-based architectures for the encoder, syndrome calculator, and Chien search logic of BCH codes not only support multiple error-correcting capabilities but also preserve high area efficiency. Of our MPCN-based BCH codec designs with arbitrary error-correcting capability, the test chip supports t = 1 to 24 bits, while the other design supports t = 60 to 84 bits. Implemented in 65 nm CMOS technology, these designs achieve 1.33 Gb/s and 1.60 Gb/s at a cost of 73.0K and 168.6K gates, respectively.
For most applications of NAND Flash memory, the BCH decoder is designed as a 3-stage pipelined structure. By using a composite field divider with the BM algorithm and dynamically assigning the clock cycles of each iteration, the proposed area-efficient key equation solver (KES) with echelon scheduling reduces the use of hardware components without performance degradation. According to the post-APR simulation results, the single-mode (9200, 8192, 72) design provides 3.08 Gb/s throughput with a gate count of 147.8K. Based on the single-mode design, the revised test chip supports multi-mode error-correcting capability of t = 24, 48, 60, and 72 with a reduced gate count of 124.7K, according to implementation results in 90 nm CMOS technology. Moreover, all of the presented BCH codec designs are shown to meet the performance targets of the industry standard.
For applications that demand extremely high throughput, such as solid-state drives (SSDs), the proposed Truncated Simplified Inversion-less Berlekamp-Massey (TSiBM) algorithm for a low-latency key equation solver achieves a significant hardware reduction compared with previous works and can also efficiently serve the computation within the proposed multi-channel BCH decoder. The proposed KES design based on the TSiBM algorithm requires 243.3K gates, a 41.4% reduction compared with the SiBM algorithm. In the proposed 8-channel BCH decoder scenario, the average gate count of the BCH decoder per channel is 264.3K, a 20.5% reduction compared with the traditional 3-stage pipelined structure. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079711620 http://hdl.handle.net/11536/76513 |
Appears in Collections: | Thesis |
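
The abstract's central idea, using minimal polynomials as the basic unit for encoding and syndrome calculation, can be illustrated in software. The sketch below is not the thesis's hardware architecture; it only shows the underlying algebra on a toy field. The field GF(2^4), the primitive polynomial x^4 + x + 1, and all function names are assumptions chosen for illustration (the codes in the thesis are far longer). Binary polynomials are stored as Python integers, with bit k holding the coefficient of x^k.

```python
M = 4                  # toy field GF(2^M); an assumption for illustration
PRIM = 0b10011         # x^4 + x + 1, an assumed primitive polynomial
N = (1 << M) - 1       # code length 2^M - 1 = 15

# Exponent and logarithm tables for GF(2^M) in terms of the primitive element alpha.
EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]

def gf_mul(a, b):
    """Multiply two field elements using the log/exp tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def minimal_poly(i):
    """Minimal polynomial of alpha^i over GF(2), as a bitmask integer."""
    coset, j = [], i % N
    while j not in coset:            # conjugates alpha^i, alpha^2i, alpha^4i, ...
        coset.append(j)
        j = (2 * j) % N
    poly = [1]                       # field coefficients, lowest degree first
    for j in coset:                  # multiply by (x + alpha^j)
        nxt = [0] * (len(poly) + 1)
        for k, c in enumerate(poly):
            nxt[k + 1] ^= c
            nxt[k] ^= gf_mul(c, EXP[j])
        poly = nxt
    return sum(c << k for k, c in enumerate(poly))   # coefficients end up 0 or 1

def poly_mul2(a, b):
    """Multiply two binary polynomials (carry-less multiplication)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def poly_mod2(a, b):
    """Remainder of binary polynomial a divided by b."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)
    return a

def generator_poly(t):
    """Product of the distinct minimal polynomials of alpha^1, alpha^3, ..., alpha^(2t-1)."""
    g, seen = 1, set()
    for i in range(1, 2 * t, 2):
        m = minimal_poly(i)
        if m not in seen:
            seen.add(m)
            g = poly_mul2(g, m)
    return g

def encode(msg, t):
    """Systematic encoding: message bits followed by the remainder modulo g(x)."""
    g = generator_poly(t)
    shifted = msg << (g.bit_length() - 1)
    return shifted | poly_mod2(shifted, g)

def eval_at(poly_bits, i):
    """Evaluate a binary polynomial at alpha^i."""
    acc, k = 0, 0
    while poly_bits:
        if poly_bits & 1:
            acc ^= EXP[(i * k) % N]
        poly_bits >>= 1
        k += 1
    return acc

def syndrome(received, i):
    """S_i = r(alpha^i), computed from the short remainder r(x) mod m_i(x)."""
    return eval_at(poly_mod2(received, minimal_poly(i)), i)
```

Because every generator polynomial here is a product of the same minimal polynomials, changing t only changes how many factors are used, which is the property the abstract exploits to support several error-correcting capabilities. In this toy field, `generator_poly(2)` yields x^8 + x^7 + x^6 + x^4 + 1, the generator of the (15, 7) BCH code.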