## National Chiao Tung University
## Department of Electronics Engineering & Institute of Electronics

Master's Thesis

Low VDD<sub>MIN</sub> 512Kb 8T SRAM Design in 40nm CMOS Process

Student: Chien-Hen Chen (陳建亨)
Advisor: Prof. Wei Hwang (黃威)

September 2011

# Low VDD<sub>MIN</sub> 512Kb 8T SRAM Design in 40nm CMOS Process

Student: Chien-Hen Chen
Advisor: Prof. Wei Hwang

Submitted to the Department of Electronics Engineering and Institute of Electronics, College of Electrical Engineering and Computer Engineering, National Chiao Tung University, in partial fulfillment of the requirements for the degree of Master of Science in **Electronics Engineering**

Sep. 2011, Hsinchu, Taiwan, Republic of China

#### Department of Electronics Engineering & Institute of Electronics, National Chiao Tung University
#### Abstract (Chinese)

Portable electronic products such as PDAs, notebook computers and mobile phones are used more and more widely. As a result, reducing the energy consumption of the entire SoC chip has become an important issue. In advanced SoC designs, SRAM usually occupies the largest area and therefore dominates performance and total energy consumption. Lowering the operating voltage is one of the most effective ways to reduce total energy consumption.

The conventional 6T SRAM suffers from read disturb and write half-select disturb, so it is not suitable for low-voltage operation. Process and temperature variations also severely degrade its stability.

This thesis proposes a 512Kb SRAM array that can operate at low voltage. The array uses a read-disturb-free 8T SRAM cell with data-aware write assist. Its cross-point structure eliminates write half-select disturb and allows a bit-interleaving structure. Three circuits improve read stability and write-ability:

- an adaptive read/write time tracing replica circuit,
- a ripple bit-line read scheme, and
- a local bit-line keeper circuit.

The 8T SRAM cell with data-aware write assist, together with these write/read assist circuits, lets the memory array operate at low voltage.

A 512Kb test chip is built in a UMC 40nm process. Post-layout simulations show operation at 502.5 MHz at 1.1 V and at 28.42 MHz at 0.6 V. At 1.1 V, the write/read energy consumption is 13.5 µW/MHz and 6.87 µW/MHz, respectively. The minimum operating voltage reaches 0.45 V.

#### **ABSTRACT**

With the increasingly widespread use of portable electronic devices such as PDAs, notebooks and cell phones, reducing the power consumption of the whole SoC chip has become one of the most important design topics. In advanced SoC designs, SRAM usually occupies the largest area, so it dominates the performance and total power consumption of the SoC.
One of the most effective ways to reduce total power consumption is to scale down the operating voltage. The conventional 6T SRAM is not suitable for the low-voltage region because of read disturb and half-select disturb. Process and temperature variations further degrade its stability.

This thesis presents a 512Kb low-VDD<sub>MIN</sub> SRAM design with a disturb-free, data-aware write-assist (DAWA) 8T bit-cell. The cross-point structure of this 8T cell eliminates half-select disturb and supports a bit-interleaving structure. Three techniques enhance read stability and write-ability: an adaptive read/write time tracing replica circuit, a ripple bit-line read scheme and a local bit-line keeper design. With the DAWA 8T bit-cell and these read/write assist schemes, the SRAM array can operate at a low supply voltage.

A 512Kb test chip is fabricated in the UMC 40nm low-power (LP) CMOS process. Post-layout simulation results show an operating frequency of 502.5 MHz at 1.1V and 28.42 MHz at 0.6V. At 1.1V, the power consumption of read and write operations is 13.5 µW/MHz and 6.87 µW/MHz, respectively. The VDD<sub>MIN</sub> of the proposed 512Kb 8T SRAM array is 0.45V.
#### Acknowledgements

Many people deserve my thanks for the completion of this thesis. First, I thank my advisor, Prof. Wei Hwang. He provided the research environment and resources that let me focus on my research without worry. He also taught me the correct attitude and methods for doing research, and guided my research direction. In my second year, I joined the Ministry of Economic Affairs technology project on advanced-process SRAM. I especially thank its principal investigator, Prof. 莊景德 of the Digital VLSI Lab, for his guidance on the research content.

Next, I thank my senior labmate 楊皓義. Throughout this research he constantly offered viewpoints and directions for me to learn from, and he patiently helped me through every difficulty. I also thank the three Ph.D. seniors 黃柏蒼, 張銘宏 and 謝維致 for their help and discussions. I thank my fellow master's classmates for their mutual support and encouragement, which were a great help during my graduate studies:

- 楊博任, 杜威宏 and 林上圓 of the LPMD Lab,
- 張琦昕 and 林耕慶 of the Digital VLSI Lab, and
- 王紹丞 of the MSCS Lab.

Finally, I thank my dear parents and younger brother for their encouragement and support along the way. Without my family, this thesis would not exist. I especially thank the group of good friends I met during my two years of study in Hsinchu, who call themselves the "黃黃人". They always supported me and were my most important emotional pillar. Words cannot fully express my gratitude; in short, thank you all.

## **Contents**

- Chapter 1 Introduction
- 1.1 Background
- 1.2 Motivation
- 1.3 Thesis Organization
- Chapter 2 Overview of Recent Low-Voltage SRAM
- 2.1 Introduction
- 2.2 Conventional SRAM Design
- 2.2.1 Typical SRAM Array Structure
- 2.2.2 SRAM Column Circuitry
- 2.2.3 Conventional 6T SRAM Bit-cell
- 2.3 SRAM Bit-cell Stability and Write-ability
- 2.3.1 Static Noise Margin
- 2.3.2 Write Trip Point
- 2.3.3 The Disadvantage of 6T SRAM Bit-cell
- 2.4 Power Dissipation
- 2.4.1 Dynamic Power Dissipation
- 2.4.2 Leakage Power Dissipation
- 2.4.3 Short-circuit Power Dissipation
- 2.4.4 Total Power Dissipation
- 2.5 Recent Low-voltage SRAM Bit-cell Design
- 2.5.1 Single-ended 8T SRAM Bit-cell
- 2.5.2 Differential Data-aware Power-supplied 8T Cell
- 2.5.3 A Large σVTH/VDD Tolerant Zigzag 8T SRAM (Z8T)
- 2.5.4 A Novel Column-Decoupled 8T Cell (CDC-8T)
- 2.5.5 Schmitt-Trigger-Based SRAM Design (ST Cell)
- 2.5.6 Column Line Assist 10T SRAM Cell (CLA-10T)
- 2.6 Recent Read/Write Assist Circuit Techniques in Low-Voltage SRAM Design
- 2.6.1 Single-ended Sensing Innovation
- 2.6.2 Negative-biased Read/Write Assist Circuit
- 2.6.3 Voltage-modulation Read/Write Assist Circuit
- 2.7 Summary
- Chapter 3 A Low VDD<sub>MIN</sub> Data-aware Write-Assist 8T SRAM with Adaptive Write-time Tracing Replica Circuit
- 3.1 Introduction
- 3.2 Cell Structure and Basic Operation of Cell
- 3.2.1 Pre-charged / Stand-by Mode
- 3.2.2 Read Mode
- 3.2.3 Data-aware Write-assist Write Mode
- 3.3 Cell Stability
- 3.3.1 Cell Stability in Hold / Read Operation
- 3.3.2 Write-ability (WTP)
- 3.3.3 Cell Stability of Column Half-select Bit-cell
- 3.4 Adaptive Write-time Tracing Replica Circuit
- 3.5 Adaptive VVSS Driver and WWL Driver
- 3.6 Simulation Result
- 3.7 Summary
- Chapter 4 Ripple Bit-line Read Scheme with Local Bit-line Keeper Design
- 4.1 Introduction
- 4.2 Prior Art - Cascaded Bit-line Read Scheme
- 4.3 Ripple Bit-line Read Scheme
- 4.3.1 Local Evaluation Circuit
- 4.3.2 Multiplexer with Leakage Current Replica Keeper
- 4.3.3 Simulation Results
- 4.5 Proposed Local Bit-line Keeper Design
- 4.5.1 Basic Concept of Proposed Local Bit-line Keeper
- 4.5.2 The Schematic of Proposed Local Bit-line Keeper
- 4.5.3 Simulation Result
- 4.6 Summary
- Chapter 5 Low VDD<sub>MIN</sub> 512Kb 8T SRAM Design in 40nm CMOS Process
- 5.1 Introduction
- 5.2 Architecture of Proposed Low VDD<sub>MIN</sub> 512Kb 8T SRAM
- 5.3 Peripheral Circuit
- 5.3.1 Power-gating Word-line Driver
- 5.3.2 Finite-state Machine and WL Pulse-width Controller
- 5.3.3 I/O Buffer
- 5.3.4 Local Bank Selection Circuit
- 5.3.5 XP and YP Decoder
- 5.4 Design Implementation & Test Flow of Proposed Low VDD<sub>MIN</sub> 512Kb 8T SRAM
- 5.5 Post-layout Simulation Result
- 5.5.1 Performance
- 5.5.2 Power Consumption
- 5.6 Summary
- Chapter 6 Conclusions & Future Work
- 6.1 Conclusions
- 6.2 Future Work
- Reference
  - Chapter 1
  - Chapter 2
  - Chapter 3
  - Chapter 4
  - Chapter 5
- Vita

## **List of Figures**

- Fig. 1.1 Energy and delay versus VDD of SRAM and logic circuit [1.6]
- Fig. 1.2 (a) Conventional 6T SRAM cell (b) Alternative 8T SRAM cell [1.7]
- Fig. 1.3 Hold/Read SNM and write margin at different VDD under V<sub>T</sub> variation [1.1]
- Fig. 1.4 256 6T cells per BL, the ratio of I<sub>READ</sub> / I<sub>LEAK,TOT</sub> [1.1]
- Fig. 2.1 SRAM array structure [2.1]
- Fig. 2.2 SRAM column circuitry
- Fig. 2.3 Schematic of conventional 6T SRAM bit-cell
- Fig. 2.4 Layout view of conventional 6T SRAM bit-cell [2.2]
- Fig. 2.5 Read operation of conventional 6T SRAM bit-cell
- Fig. 2.6 Write operation of conventional 6T SRAM bit-cell
- Fig. 2.7 The standard setup for measuring the Hold SNM
- Fig. 2.8 Butterfly curve of Hold SNM (conventional 6T SRAM cell) [2.3]
- Fig. 2.9 The standard setup for measuring the Read SNM
- Fig. 2.10 Hold SNM vs. Read SNM (conventional 6T SRAM cell) [2.5]
- Fig. 2.11 The standard setup for measuring the WTP
- Fig. 2.12 Write trip point (WTP) of conventional 6T SRAM bit-cell [2.7]
- Fig. 2.13 The read disturb of 6T SRAM in different processes [2.8]
- Fig. 2.14 Conflicting requirements between hold/read/write operations [2.9]
- Fig. 2.15 Circuit diagram of inverter
- Fig. 2.16 Leakage current in NMOS transistor [2.13]
- Fig. 2.17 Components of tunneling current [2.13]
- Fig. 2.18 Gate leakage current vs. gate oxide thickness [2.14]
- Fig. 2.19 Gate leakage current vs. gate voltage [2.14]
- Fig. 2.20 Leakage current in conventional 6T SRAM bit-cell [2.12]
- Fig. 2.21 Single-ended 8T SRAM bit-cell [2.6]
- Fig. 2.22 Read SNM of conventional 6T SRAM & single-ended 8T SRAM [2.6]
- Fig. 2.23 (a) The schematic of D2AP 8T cell (b) Waveform of write operation (c) Waveform of read operation [2.17]
- Fig. 2.24 (a) Schematic of Z8T SRAM cell (b) Layout view of Z8T SRAM cell [2.18]
- Fig. 2.25 Selected and half-selected cell of CDC-8T [2.19]
- Fig. 2.26 ST SRAM bit-cell schematics [2.2] [2.20]
- Fig. 2.27 CLA-10T SRAM (a) Schematic (b) Layout [2.21]
- Fig. 2.28 (a) Sense-amplifier redundancy [2.23] (b) Re-configurable sensing scheme [2.24] (c) VGND replica scheme [2.25] (d) AC-coupled sense amplifier [2.26]
- Fig. 2.29 (a) Cross-point 8T SRAM (b) Read/write negative-biased circuit (c) Waveform [2.27]
- Fig. 2.30 (a) Negative write bias scheme (b) Read boost scheme [2.28]
- Fig. 2.31 (a) Write driver with boost control (b) Waveform of write cycle [2.29]
- Fig. 2.32 (a) Constant-negative-level write buffer (b) Negative BL level [2.31]
- Fig. 2.33 Level-programmable word-line driver (LPWD) and dynamic array supply control (DASC) [2.32]
- Fig. 2.34 (a) Boost RWL and WWL in read/write cycle (b) Write-back scheme [2.33]
- Fig. 2.35 (a) Word-line under-drive circuit (b) Write assist circuit [2.34]
- Fig. 2.36 (a) Modified word-line under-drive (WLUD) circuit (b) Rise time of WLUD circuit [2.31]
- Fig. 2.37 The adaptive dynamic word-line under-drive circuit (ADWLUD) [2.35]
- Fig. 2.38 Multi-step word-line control technology for word-line driver [2.36]
- Fig. 3.1 Minimum area comparison between 6T and 8T cells [3.1]
- Fig. 3.2 Schematic of DAWA 8T cell
- Fig. 3.3 Layout view of DAWA 8T SRAM bit-cell (excluding M1 and M2)
- Fig. 3.4 Pre-charged / stand-by mode of DAWA 8T bit-cell
- Fig. 3.5 Read mode of DAWA 8T bit-cell
- Fig. 3.6 Column half-select disturb in write operation
- Fig. 3.7 Write "1" mode and write "0" mode
- Fig. 3.8 Control signal waveforms
- Fig. 3.9 Cell array structure
- Fig. 3.10 RSNM in different PVT conditions (a) VDD=1.1V (b) VDD=0.6V
- Fig. 3.11 Comparison of SNM between conventional 6T and DAWA 8T
- Fig. 3.12 WTP in different PVT conditions (a) VDD=1.1V (b) VDD=0.6V
- Fig. 3.13 Comparison of WTP between conventional 6T and DAWA 8T
- Fig. 3.14 The worst case of local VT shift (a) Column HS-cell (b) Write
- Fig. 3.15 VSM in different PVT conditions (a) SS corner (b) TT corner (c) FF corner (d) PFNS corner (e) PSNF corner
- Fig. 3.16 Adaptive write-time tracing replica circuit
- Fig. 3.17 Adaptive VVSS driver and WWL driver
- Fig. 3.18 Waveform of adaptive write-time tracing circuit, adaptive VVSS driver and WWL driver
- Fig. 3.19 Pulse width of WWL_EN in different PVT conditions (a) VDD=1.1V (b) VDD=0.6V
- Fig. 3.20 WWL, WWLB and VVSS pulse width generated by adaptive write-time tracing circuit in different PVT conditions (a) VDD=1.1V (b) VDD=0.6V
- Fig. 4.1 (a) Hierarchical bit-line scheme (b) Cascaded bit-line scheme [4.1]
- Fig. 4.2 (a) Schematic of cascaded bit-line read scheme (b) Waveform [4.1]
- Fig. 4.3 (a) Local evaluation circuit (b) Ripple bit-line read scheme
- Fig. 4.4 Access time and area overhead vs. LBL length
- Fig. 4.5 Bit-interleaving multiplexer with leakage current replica keeper
- Fig. 4.6 Basic concept of leakage current replica (LCR) keeper [4.3]
- Fig. 4.7 Voltage level of V_KPR in different PVT conditions
- Fig. 4.8 Access time and area overhead depending on # of bit-cells in a column per multiplexer (a) VDD=1.1V (b) VDD=0.6V
- Fig. 4.9 Waveform of the ripple BL read scheme and multiplexer
- Fig. 4.10 VGND scheme in read buffer [4.8]
- Fig. 4.11 Domino local bit-line keeper in single-ended 8T SRAM [4.11]
- Fig. 4.12 Local BL keeper controlled by programmable inverter chain [4.12]
- Fig. 4.13 Positive feedback sensing keeper [4.13]
- Fig. 4.14 Marginal bit-line leakage compensation (MBLC) scheme [4.14]
- Fig. 4.15 Basic concept of proposed local bit-line keeper design
- Fig. 4.16 Delay signal generated by replica bit-line and discharged path
- Fig. 4.17 Time of generating delay signal KPR_SIG in different PVT conditions (a) VDD=1.1V (b) VDD=0.6V
- Fig. 4.18 Worst case of leakage current problem in local bit-line
- Fig. 4.19 Read time in different PVT conditions (a) VDD=1.1V (b) VDD=0.6V
- Fig. 4.20 Waveform of proposed local bit-line keeper design
- Fig. 5.1 The floorplan of low VDD<sub>MIN</sub> 512Kb 8T SRAM
- Fig. 5.2 Pin count and pin definition of proposed low VDD<sub>MIN</sub> 512Kb 8T SRAM
- Fig. 5.3 Power-gating word-line driver
- Fig. 5.4 Finite-state machine
- Fig. 5.5 WL pulse-width controller
- Fig. 5.6 Waveform of finite-state machine
- Fig. 5.7 Waveform of WL pulse-width controller
- Fig. 5.8 I/O buffer
- Fig. 5.9 Global word-line decoder
- Fig. 5.10 Local bank selection circuit
- Fig. 5.11 Waveform of local bank selection circuit
- Fig. 5.12 (a) XP decoder (b) YP decoder
- Fig. 5.13 Low VDD<sub>MIN</sub> 512Kb 8T SRAM design on test chip
- Fig. 5.14 Layout view of low VDD<sub>MIN</sub> 512Kb 8T SRAM
- Fig. 5.15 Test flow of the proposed low VDD<sub>MIN</sub> 512Kb 8T SRAM
- Fig. 5.16 Post-layout simulation result: frequency vs. VDD
- Fig. 5.17 Power-delay product of read/write operation
- Fig. 6.1 Pipeline scheme of SRAM design

## **List of Tables**

- Table 3.1 Basic operation of low VDD<sub>MIN</sub> DAWA 8T
- Table 4.1 Sensing time in SS corner
- Table 4.2 Misreading time vs. sensing time in FF & PSNF corner
- Table 4.3 Leakage time in FF corner & 125°C, transient time = 100ns
- Table 5.2 Leakage current and slew rate in power-gating driver
- Table 5.4 Post-simulation result (access time and write time), VDD=1.1V
- Table 5.5 Specification compared to recent low-power SRAM design
- Table 5.6 Power consumption in R/W operation and STBY mode, VDD=1.1V
- Table 5.7 Power consumption in R/W operation and STBY mode, VDD=0.7V

## Chapter 1
### Introduction
#### 1.1 Background

According to Moore's law, the number of transistors integrated on a chip doubles roughly every 18 months. Over the same period, the area of a single transistor therefore shrinks to about half. As a result, the performance and capacity of chips can improve at exponential rates.

In modern ICs, dynamic power consumption dominates the total power consumption. In advanced processes, leakage power is also becoming more and more critical. Circuit techniques that cope with leakage current are therefore very important in modern IC design.

In modern system-on-chip (SoC) designs, SRAM is the most common choice for embedded memory. In recent designs, SRAM macros can occupy the largest share of the chip area, about 90%. The area, performance and power consumption of the SRAM therefore largely determine the area, performance and total power consumption of the whole chip.

#### 1.2 Motivation

SRAM dominates the total power consumption of the chip. Reducing SRAM power is therefore one of the most effective ways to reduce the total power of the whole chip. Power consumption can be expressed as

$$P_{total} = P_{dynamic} + P_{leakage} + P_{sc} \quad (1.1)$$

where:

- $P_{dynamic} = \alpha \times f \times C \times V_{DD}^2$ is the dynamic power,
- $P_{leakage} = V_{DD} \times I_{leakage}$ is the leakage power, and
- $P_{sc} = V_{DD} \times I_{mean}$ is the short-circuit power.

According to these equations, dynamic power is proportional to the square of the supply voltage. For a given current, leakage power and short-circuit power are proportional to the supply voltage. Voltage scaling of the whole chip is therefore one of the most effective ways to reduce its total power consumption. Voltage scaling of SRAM has become one of the most important topics in low-power design.
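Equation (1.1) can be evaluated numerically to see why voltage scaling is so effective. The sketch below is a minimal illustration built on several assumptions:

- The values of α, f, C, I_leakage and I_mean are hypothetical placeholders, not parameters of this design.
- The currents and the frequency are held constant when VDD changes. In a real circuit, leakage current and achievable frequency both also drop with VDD.
- The class name `PowerModel` is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PowerModel:
    """Hypothetical operating point for equation (1.1); values are illustrative only."""
    alpha: float     # switching activity factor (0..1)
    freq_hz: float   # operating frequency f in Hz
    cap_f: float     # switched capacitance C in farads
    i_leak_a: float  # total leakage current I_leakage in amperes (held fixed here)
    i_sc_a: float    # mean short-circuit current I_mean in amperes (held fixed here)

    def dynamic(self, vdd: float) -> float:
        # P_dynamic = alpha * f * C * VDD^2
        return self.alpha * self.freq_hz * self.cap_f * vdd ** 2

    def leakage(self, vdd: float) -> float:
        # P_leakage = VDD * I_leakage
        return vdd * self.i_leak_a

    def short_circuit(self, vdd: float) -> float:
        # P_sc = VDD * I_mean
        return vdd * self.i_sc_a

    def total(self, vdd: float) -> float:
        # P_total = P_dynamic + P_leakage + P_sc, equation (1.1)
        return self.dynamic(vdd) + self.leakage(vdd) + self.short_circuit(vdd)


if __name__ == "__main__":
    # Hypothetical numbers, not simulated or measured values from this thesis.
    m = PowerModel(alpha=0.1, freq_hz=100e6, cap_f=100e-12, i_leak_a=10e-6, i_sc_a=5e-6)
    for vdd in (1.1, 0.6):
        print(f"VDD={vdd:.1f} V  P_dyn={m.dynamic(vdd) * 1e3:.3f} mW  "
              f"P_total={m.total(vdd) * 1e3:.3f} mW")
    # At a fixed frequency, dynamic power alone scales with (0.6/1.1)^2, roughly 0.3x.
    print(f"dynamic ratio 0.6V/1.1V = {m.dynamic(0.6) / m.dynamic(1.1):.3f}")
```

Keeping the three terms as separate methods makes it easy to see which one a given VDD change affects most. In this simplified model, only the dynamic term falls quadratically.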
However, reducing the operating voltage of an SRAM degrades cell stability and write-ability because of threshold-voltage variation. This variation is caused by random dopant fluctuation (RDF), line-edge roughness (LER) and local oxide-thickness variation. SRAM is therefore a bottleneck in modern low-voltage IC design.

Fig. 1.1 shows how energy and delay change with VDD:

- When VDD is reduced to 0.5V~0.7V, the energy per operation drops dramatically while the delay degrades only moderately.
- When VDD is decreased further, the delay increases dramatically. The energy per operation also increases, because leakage energy grows rapidly as the delay increases exponentially in the sub-threshold region.

Unlike a typical combinational digital circuit, the SRAM circuit has its minimum energy-delay product at VDD = 0.5V~0.7V.

Fig. 1.1 Energy and delay versus VDD of SRAM and logic circuit [1.6]

The conventional 6T SRAM cell (Fig. 1.2(a)) is the most common SRAM bit-cell. An SRAM bit-cell must be stable during stand-by, read and write operations. Each transistor of the 6T cell must therefore be sized carefully to ensure both cell stability and write-ability. However, the two goals call for opposite sizing:

- For cell stability during stand-by or read, the cross-coupled inverters should be strengthened and the pass-gates weakened.
- To improve write-ability, the cross-coupled inverters should be weakened and the pass-gates strengthened.

Because of threshold-voltage variation in advanced processes, both cell stability and write-ability degrade severely in low-voltage operation. In addition, the conventional 6T SRAM cell suffers from read disturb and half-select disturb, both of which degrade cell stability. In summary, the conventional 6T SRAM cell is not suitable for low-voltage operation.

The alternative 8T cell [1.6] is shown in Fig. 1.2(b). Its storage nodes are decoupled from the read buffer.
As a result, the read SNM (RSNM) of the 8T cell is similar to the hold SNM of the conventional 6T cell. However, the alternative 8T cell also suffers from half-select disturb. It also cannot be used in a bit-interleaving structure, which is used to mitigate soft errors in advanced processes. A more suitable SRAM cell for low-voltage operation is therefore needed. Fig. 1.3 shows the cell stability in read/hold mode and the write-ability at VDD = 0.2V~1.0V under V<sub>T</sub> variation.

Fig. 1.2 (a) Conventional 6T SRAM cell (b) Alternative 8T SRAM cell [1.7]

Fig. 1.3 Hold/Read SNM and write margin at different VDD under V<sub>T</sub> variation [1.1]

As mentioned, leakage current cannot be ignored in modern IC design. It causes an I<sub>on</sub>/I<sub>off</sub> ratio problem in low-voltage SRAM design. During a read, the total leakage current of the unaccessed cells on a bit-line may exceed the read current of the accessed cell, depending on the data stored in the array. This causes a sensing error. Fig. 1.4 illustrates the problem.

Fig. 1.4 256 6T cells per BL, the ratio of I<sub>READ</sub> / I<sub>LEAK,TOT</sub> [1.1]

Designing an SRAM array for the low-voltage region therefore requires three things:

- a suitable SRAM cell with sufficient stability and write-ability in that region,
- peripheral circuits that enhance read stability and write-ability at low voltage, and
- a short local bit-line structure that reduces BL leakage, or a technique that compensates for it, so that reads remain correct at low voltage.

#### 1.3 Thesis Organization

The main contents of this thesis are as follows. Chapter 2 discusses recent low-voltage SRAM designs, including several cell topologies and read/write assist circuits for the low-voltage region.
Chapter 2 also introduces basic SRAM operation and the definitions of cell stability and write-ability.

Chapter 3 introduces an 8T SRAM bit-cell with a data-aware write-assist (DAWA) scheme. It covers the cell's read/write operation and simulation results for cell stability and write-ability under different PVT conditions. It also introduces the adaptive write-time tracing circuit.

Chapter 4 introduces a ripple bit-line read scheme with a hierarchical global bit-line and discusses leakage current in SRAM design. It also presents a local bit-line keeper design for the DAWA 8T cell.

In Chapter 5, a low-VDD<sub>MIN</sub> 512Kb 8T SRAM is designed in a 40nm CMOS process using the circuits described in Chapters 3 and 4, and its performance and power are discussed.

Chapter 6 concludes this thesis.

## Chapter 2
## **Overview of Recent Low-Voltage SRAM**
### 2.1 Introduction

This chapter studies basic SRAM operation, the basic concept of SRAM bit-cell stability, power dissipation in CMOS technology and recent low-voltage SRAM designs. It is organized as follows:

- Section 2.2 presents a typical SRAM array structure and the schematic and operation of the conventional 6T SRAM cell.
- Section 2.3 presents the basic concept of bit-cell stability and how stability and write-ability are measured.
- Section 2.4 introduces power dissipation, which consists of dynamic, leakage and short-circuit dissipation.
- Section 2.5 presents some recent low-voltage SRAM bit-cells.
- Section 2.6 presents recent read-assist and write-assist circuits for low-voltage SRAM design.
- Section 2.7 summarizes the chapter.

### 2.2 Conventional SRAM design
#### 2.2.1 Typical SRAM Array Structure

Fig. 2.1 shows a typical SRAM array structure with four pages, each an array of N rows by M bits.
The array includes the SRAM cells, row decoder, column decoder, sense amplifiers, write drivers and timing block. The blocks work as follows:

- The timing block generates all timing signals in the SRAM macro.
- The row decoder is gated by a timing signal from the timing block. It decodes the X-address and turns on one of the word-lines.
- The Z-decoder decodes the Z-address to select the page.
- The column decoder decodes the Y-address and lets multiple columns share a single sense amplifier.

In a word-oriented SRAM, each address points to a word of n bits (common values are 4, 8, 16, 32 or 64 bits).

#### 2.2.2 SRAM Column Circuitry

Fig. 2.2 SRAM column circuitry

Fig. 2.2 shows the column circuitry of a typical single-port SRAM. Its main parts are:

- Pre-charge circuit: two pre-charge PMOS transistors and one equalizer PMOS. The pre-charge PMOS transistors pre-charge both bit-lines to VDD in stand-by mode. The equalizer PMOS equalizes both bit-lines to the same voltage before a read/write operation to remove any voltage offset.
- Write driver: pulls one of the bit-lines down to "0" according to the input data.
- Sense amplifier: a typical differential sense amplifier built from a cross-coupled, inverter-type latch. Once activated, it senses the voltage difference between the bit-line pair and latches the read data through regenerative feedback.

#### 2.2.3 Conventional 6T SRAM Bit-cell

Fig. 2.3 Schematic of conventional 6T SRAM bit-cell

Fig. 2.3 shows the schematic of the conventional 6T SRAM bit-cell. The cell consists of:

- two cross-coupled inverters (PL, PR, NL and NR), which store the binary data;
- two pass-gate transistors (AXL and AXR), which provide read/write access to the cell;
- one word-line; and
- two complementary bit-lines.

The word-line connects the bit-line pair to the cross-coupled inverters by turning on AXL and AXR. Fig. 2.4 shows the layout view of this 6T SRAM bit-cell.

Fig. 2.4 Layout view of conventional 6T SRAM bit-cell [2.2]

Fig. 2.5 shows the read operation of the conventional 6T SRAM bit-cell:

1. Both BLC and BLT are first pre-charged to VDD.
2. The WL driver turns on the WL selected by the X-address, connecting the cell nodes to the bit-lines.
3. For each bit-cell in the word, one of the two bit-lines is discharged depending on the stored data.
4. The resulting differential signal is sent to the sense amplifier. The sense amplifier converts it into a full-swing signal and latches the data at the read output.

Fig. 2.5 Read operation of conventional 6T SRAM bit-cell

Fig. 2.6 shows the write operation of the conventional 6T SRAM bit-cell:

1. Both BLC and BLT are first pre-charged to VDD.
2. The write driver pulls one of the two bit-lines down according to the input data.
3. The WL selected by the X-address is turned on, connecting the cell nodes to the bit-lines.
4. The cell flips and the cross-coupled inverters latch the new value, so the input data is written into the cell.

Fig. 2.6 Write operation of conventional 6T SRAM bit-cell

### 2.3 SRAM Bit-cell Stability and Write-ability
#### 2.3.1 Static noise margin

During stand-by mode, the WL of the 6T cell is low, so the pass-gate transistors are off. The cross-coupled inverters must maintain a bi-stable operating point to hold the data. The most common measure of the stability of the cross-coupled inverters is the static noise margin (SNM) [2.3].

The hold SNM is the maximum DC noise voltage the SRAM cell can tolerate while still retaining its stored data. This noise is inserted between the output Q/QB of each inverter and the gate of the other inverter. Fig. 2.7 shows the setup for measuring the hold SNM. VN is the DC noise source inserted between the gates and Q/QB. VN is increased until the cell flips, and the largest VN for which the data is retained is the hold SNM.

Fig. 2.7 The standard setup for measuring the Hold SNM

Fig. 2.8 shows the butterfly curve, the most common graphical representation of the SNM. The curve consists of the voltage transfer characteristic (VTC) of one cross-coupled inverter and the inverse VTC of the other. The SNM is the side length of the largest square that fits inside the eyes of the butterfly curve.

Fig. 2.8 Butterfly curve of Hold SNM (Conventional 6T SRAM cell) [2.3]

During a read, the WL of the 6T cell is high, so the pass-gate transistors are on. The cross-coupled inverters must still maintain a bi-stable operating point to hold the data. The most common measure of read stability is the read static noise margin (RSNM) [2.3], which is defined in the same way as the SNM above. Fig. 2.9 shows the setup for measuring the RSNM. WL is on for read access, and BLC and BLT are both set to VDD to represent the initial condition of a read access.

Fig. 2.9 The standard setup for measuring the Read SNM

In the conventional 6T cell, the read SNM is worse than the hold SNM:

1. During a read, WL is turned on and one of the two bit-lines is discharged to a lower voltage.
2. The "0" storage node rises slightly because of the voltage-divider effect between the pass-gate and pull-down transistors.
3. If this disturb voltage rises close to the trip point of the opposite inverter, the data flips.

Fig. 2.10 compares the butterfly curves of the read SNM and hold SNM of the conventional 6T SRAM bit-cell and shows that the read SNM is smaller.

Fig. 2.10 Hold SNM vs. Read SNM (Conventional 6T SRAM cell) [2.5]

#### 2.3.2 Write trip point

There are many ways to measure the write-ability of an SRAM bit-cell. Finding the write trip point (WTP) is the most common and simplest one. The WTP is the maximum bit-line voltage that can flip the data in the cell. Fig. 2.11 shows the setup for measuring the WTP, and Fig. 2.12 shows the result.

The measurement holds one bit-line high and sweeps the other from VDD to GND in an attempt to flip the data in the cell. Once the bit-line is lowered to a certain level, the data flips, indicating a successful write. A larger WTP means the bit-line needs to be lowered less below VDD for a successful write. A negative WTP means the data cannot be written even when the bit-line is lowered to GND. Writing is then impossible unless the bit-line can be driven to a negative level. A higher WTP therefore represents better write-ability.

Fig. 2.11 The standard setup for measuring the WTP

Fig. 2.12 Write trip point (WTP) of conventional 6T SRAM bit-cell [2.7]

#### 2.3.3 The disadvantage of 6T SRAM Bit-cell

In 0.35µm, 0.18µm and 0.13µm CMOS processes, the 6T SRAM bit-cell was the main structure for embedded memory. However, several disadvantages make it unsuitable for processes below 90nm and for low-voltage operation.

The first disadvantage is read and half-select disturb. The cause of read disturb was introduced in the previous section. In advanced processes and at low voltage, threshold-voltage variation may push the disturb voltage above the trip voltage of the other inverter, and the cell loses its original data. In addition, interleaved SRAM structures suffer from half-select disturb.
When a read/write operation, one of word-lines is turn on, the half-select cells in the same row are also doing pseudo read operation, where read-disturb also occurs. Fig. 2.13 shows the read-disturb of 6T SRAM bit-cell under different process. Cell-switch point voltage and read-down level voltage may overlap under 90nm process. Fig.2. 13 The read-disturb of 6T SRAM in different process [2.8] The second is the conflicting requirements between different operations. During stand-by mode, if we want to improve the cell stability, we can higher the trip point of inverters by making the pull-down transistors weaker and pull-up transistors stronger. We define this ratio as $\beta 1$ ratio. To improve read SNM and minimize read-disturb, we can make the pull-down transistors stronger and the pass-gate transistors weaker. We define this ratio as $\beta 2$ ratio. To improve the write-ability of SRAM cell, we can make the pull-up transistors weaker and the pass-gate transistors stronger. We define this ratio as $\beta 3$ ratio. We can find that one of three $\beta$ ratios is conflict to each other $\beta$ ratios, as Fig. 2.14 shows. As mentioned, conventional 6T SRAM cell is susceptible of large PVT variation and local $V_T$ mismatch in advanced process. Enlarge $\beta 2$ and $\beta 3$ ratio can stabilize the 6T SRAM bit-cell but increased much more area and consume much more power. Fig.2. 14 Conflicting requirements between hold/read/write operations [2.9] Consequently, the tradition 6T bit-cell stability (Hold SNM and Read SNM) and write-ability will degrade dramatically in low-voltage due to the PVT variation and local $V_T$ mismatch. Fig. 1.3 shows this result. In addition, as mentioned in chap 1.2, $I_{on}/I_{off}$ ratio decrease dramatically when operating voltage is scaled down. In summary, the VDD<sub>min</sub> of 6T SRAM bit-cell is limited to high voltage (e.g. 
above 0.8V at 65nm).

#### 2.4 Power Dissipation

As given in equation (1.1), power dissipation in a CMOS circuit consists of three main components: dynamic power dissipation, leakage power dissipation, and short-circuit power dissipation. Each is introduced below.

#### 2.4.1 Dynamic Power Dissipation

Fig. 2.15 shows a CMOS inverter with load capacitance $C_L$. The primary component of dynamic dissipation is the charging and discharging of the load capacitance. Suppose the inverter operates at frequency f and its input Vin is a square wave with period T = 1/f; the load capacitance $C_L$ is then charged and discharged once per period, i.e., f times per second. In one complete charge/discharge cycle, a total charge of $Q = C_L V_{DD}$ is moved onto and off $C_L$. The average dynamic power of the inverter, i.e., the sum of the average NMOS and PMOS dissipation, is

$$P_{D} = \frac{1}{T} \int_{0}^{T/2} i_{N}(t)\, V_{out}\, dt + \frac{1}{T} \int_{T/2}^{T} i_{P}(t)\, (V_{DD} - V_{out})\, dt = \frac{V_{DD}}{T} \int_{0}^{T} i_{DD}(t)\, dt \quad (2.1)$$

Since $\int_{0}^{T} i_{DD}(t)\,dt$ is the total charge delivered by the supply to the load capacitance in one period, $C_L V_{DD}$, and $1/T = f$, equation (2.1) simplifies to

$$P_D = C_L V_{DD}^2 f \quad (2.2)$$

Because gates usually do not switch every cycle, the switching probability must be taken into account by adding a switching factor $\alpha$ to equation (2.2):

$$P_D = \alpha C_L V_{DD}^2 f \quad (2.3)$$

From equation (2.3), the dynamic power of a logic gate is proportional to the switching factor, the load capacitance and the operating frequency, and to the square of the supply voltage.

Fig. 2.15 Circuit diagram of inverter

#### 2.4.2 Leakage Power Dissipation

As shown in Fig.
2.16, I<sub>1</sub> is the reverse-bias PN-junction current, I<sub>2</sub> the sub-threshold current, I<sub>3</sub> the gate-oxide tunneling current, I<sub>4</sub> the gate hot-carrier injection current, I<sub>5</sub> the gate-induced drain leakage (GIDL) current, and I<sub>6</sub> the channel punch-through current. Together, these six currents make up the leakage current of a CMOS transistor. Sub-threshold current, GIDL and punch-through current are off-state leakage mechanisms, whereas reverse-bias PN-junction current and gate-oxide tunneling current flow in both the on and off states. Gate hot-carrier injection current can occur in the off state or while the transistor bias is in transition. Each leakage source is described below.

Fig. 2.16 Leakage current in NMOS transistor [2.13]

#### **Reverse-bias PN-junction Current**

The drain-to-well and source-to-well junctions are normally reverse-biased, which causes PN-junction leakage. A reverse-bias PN-junction current has two main components: minority-carrier diffusion/drift near the edge of the depletion region, and electron-hole pair generation inside the depletion region of the reverse-biased junction. In nano-scale MOSFETs, the heavy junction doping causes significant junction band-to-band tunneling (BTBT) when the drain is at VDD and the substrate at ground, and the junction BTBT increases exponentially with the drain-to-substrate bias. We model the reverse-bias PN-junction current as

$$I_{jn} = I_{jn0} \exp\left(-\beta_{JN} \left(V_{DD} - |V_{db}|\right)\right) \quad (2.4)$$

where $I_{jn0}$ is the junction leakage at $|V_{db}| = V_{DD}$ and $\beta_{JN}$ is a doping-dependent factor. The reverse-bias PN-junction current also depends on the drain diffusion area and on the leakage current density, which is determined by the doping concentration.
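Equation (2.4) can be evaluated directly to see how strongly the junction BTBT leakage depends on the drain-to-body bias. The following is a minimal Python sketch, not a calibrated device model: the function name `junction_leakage` and the numeric values of $I_{jn0}$ and $\beta_{JN}$ are hypothetical placeholders chosen only for illustration, not UMC 40nm parameters.

```python
import math


def junction_leakage(i_jn0: float, beta_jn: float, v_dd: float, v_db: float) -> float:
    """Reverse-bias PN-junction leakage following Eq. (2.4).

    i_jn0   -- junction leakage when |V_db| = V_DD (A)
    beta_jn -- doping-dependent fitting factor (1/V)
    v_dd    -- supply voltage (V)
    v_db    -- drain-to-body voltage (V); only its magnitude enters the model
    """
    return i_jn0 * math.exp(-beta_jn * (v_dd - abs(v_db)))


if __name__ == "__main__":
    # Hypothetical fitting parameters, for illustration only.
    I_JN0 = 1e-12   # A
    BETA_JN = 5.0   # 1/V
    VDD = 1.1       # V
    for v_db in (1.1, 0.8, 0.5):
        i_jn = junction_leakage(I_JN0, BETA_JN, VDD, v_db)
        print(f"|Vdb| = {v_db:.1f} V -> Ijn = {i_jn:.3e} A")
```

Because the exponent is zero at $|V_{db}| = V_{DD}$, the function returns exactly $I_{jn0}$ there, and in this model every volt of reduction in $|V_{db}|$ scales the leakage down by a factor of $e^{\beta_{JN}}$.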
#### **Sub-threshold Current**

Sub-threshold (weak-inversion) conduction current flows from drain to source when $V_{gs}$ is below the threshold voltage (off state). In weak inversion, the minority-carrier concentration is small, but not zero. For an NMOS transistor, even if $V_{gs}$ = 0V, a current path still exists in the channel because $V_{DS} = V_{DD}$. Unlike the strong-inversion region, in which drift current dominates, sub-threshold current is dominated by diffusion current. Because of short-channel effects, sub-threshold current increases with increasing drain bias (drain-induced barrier lowering, DIBL) and with decreasing channel length ($V_{TH}$ roll-off). Because of the body effect, sub-threshold current decreases when a reverse body bias is applied. We model the sub-threshold current as

$$I_{sub} = I_{sub0} \exp\left(\frac{V_{gs} - \eta_{DIBL}(V_{DD} - V_{ds}) + \lambda_{body}V_{bs}}{mkT/q}\right) \quad (2.5)$$

where $I_{sub0}$ is the sub-threshold current of a transistor at $V_{gs} = 0V$, $V_{ds} = V_{DD}$ and $V_{bs} = 0$, $\eta_{DIBL}$ is the DIBL coefficient, $\lambda_{body}$ is the body-effect coefficient, and m is the sub-threshold swing factor. Sub-threshold current increases roughly fivefold with each new technology generation, owing to threshold-voltage scaling and short-channel effects caused by gate-length reduction. As a result, sub-threshold current has become the largest source of leakage in modern transistors.

#### **Gate oxide Tunneling Current**

Gate oxide tunneling current in transistors with ultra-thin gate oxide is due to the direct tunneling of electrons (or holes) through the gate dielectric. Oxide tunneling current increases exponentially as the oxide thickness decreases and as the electric field across the oxide increases. Fig. 2.17 shows the components of oxide tunneling current in a scaled NMOS transistor.

Fig.2.
17 Components of tunneling current [2.13]

Gate oxide tunneling current is composed of three elements:

1. The gate-to-source/drain overlap-region currents ($I_{gso}$ and $I_{gdo}$), which are the major components.
2. The gate-to-channel current ($I_{gc}$), which goes to the source ($I_{gcs}$) or to the drain ($I_{gcd}$).
3. The gate-to-substrate current ($I_{gb}$).

Therefore, the gate oxide tunneling current can be divided into the following components:

1. Gate-to-source: $I_{gs} = I_{gso} + I_{gcs}$
2. Gate-to-drain: $I_{gd} = I_{gdo} + I_{gcd}$
3. Gate-to-substrate: $I_{gb}$

The overlap current dominates the gate oxide tunneling current in the "OFF" state, whereas the gate-to-channel current dominates it in the "ON" state. We model the gate oxide tunneling current as

$$I_{gOFF} = I_{gOFF0}\, e^{-\alpha_{gOFF}(V_{DD} - |V_{gd}|)}$$

$$I_{gON} = I_{gON0} \left[ e^{-\alpha_{gON}(V_{DD} - |V_{gd}|)} + e^{-\alpha_{gON}(V_{DD} - |V_{gs}|)} \right] \quad (2.6)$$

where $I_{gOFF0}$ is the OFF-state overlap tunneling leakage at $|V_{gd}| = V_{DD}$ and $I_{gON0}$ is the ON-state gate-to-drain leakage at $|V_{gs}| = V_{DD}$. The magnitude of the gate leakage current increases exponentially as the gate oxide thickness $T_{ox}$ decreases and as $V_{gs}$ increases, as shown in Fig. 2.18 and Fig. 2.19, respectively [2.14].

Fig. 2.18 Gate leakage current vs. gate oxide thickness [2.14]

Fig. 2.19 Gate leakage current vs. gate voltage [2.14]

#### **Gate hot-carrier Injection Current**

In short-channel transistors, the high electric field near the $Si-SiO_2$ interface lets electrons or holes gain enough energy to cross the interface potential barrier and enter the oxide layer. This is known as the gate hot-carrier injection current.

#### **Gate-induced Drain Current**

This current, which flows from the drain to the bulk, is caused by the high electric field in the gate-drain overlap region.
GIDL occurs at large $V_{DB}$, where carriers are generated into the substrate and drain by surface traps or band-to-band tunneling. A thinner oxide and a higher $V_{DD}$ enhance the electric field and therefore increase GIDL. At low drain doping, the electric field is not strong enough to cause tunneling; at very high doping, the depletion width, and hence the tunneling volume, is restricted, which also reduces GIDL. GIDL is therefore worst for moderate drain doping.

#### **Channel Punch-through Current**

In short-channel devices, because the drain and source are close together, the depletion regions at the drain-substrate and source-substrate junctions extend into the channel. As the channel length is reduced at constant doping, the separation between the depletion-region boundaries decreases. An increase in the reverse bias across the junctions (i.e., an increase in V<sub>DS</sub>) also pushes the junctions closer to each other. When the combination of channel length and reverse bias causes the depletion regions to merge, channel punch-through current occurs.

#### Leakage current in conventional 6T SRAM bit-cell

Fig. 2.20 Leakage current in conventional 6T SRAM bit-cell [2.12]

Fig. 2.20 shows all kinds of leakage current in the conventional 6T SRAM bit-cell, including sub-threshold leakage, gate leakage and junction leakage. Since most bit-cells in an SRAM array are in stand-by mode at any given time, leakage power dominates the total power consumption of the SRAM.
Considering the different leakage components of all transistors, the total leakage of the cell can be counted as

$$I_{sub} = I_{subAXR} + I_{subNL} + I_{subPR}$$

$$I_{jn} = 2I_{jnAXL} + I_{jnAXR} + I_{jnNL} + I_{jnPR}$$

$$I_{gate} = 2I_{gOFF\_AXL} + I_{gOFF\_AXR} + I_{gON\_PR} + I_{gON\_NR} + 2I_{gOFF\_PL} + I_{gOFF\_NL}$$

$$I_{leak} = I_{sub} + I_{jn} + I_{gate} \quad (2.7)$$

#### 2.4.3 Short-circuit Power Dissipation

Short-circuit power dissipation occurs when both the pull-up and pull-down networks are partially ON while the input switches, creating a direct current path from the power supply to ground. It increases as edge rates become slower, because both networks stay ON for longer. It decreases as the load capacitance increases, because with a large load the output switches only slightly during the input transition, leaving a small $V_{DS}$ across one of the transistors. Short-circuit power dissipation can be expressed as

$$P_{SC} = I_{mean} V_{DD} \quad (2.8)$$

where $I_{mean}$ is the mean value of the short-circuit current, modeled as [2.15]:

$$I_{mean} = \frac{1}{12} \frac{\beta}{V_{DD}} (V_{DD} - 2V_T)^3 \frac{\tau}{T} \quad (2.9)$$

where $\beta$ is the gain factor of a transistor, $\tau$ is the input rise/fall time, and T is the signal period. Although this is a simplified model, it shows that the short-circuit current is affected by the operating voltage, the rise/fall time of the input signal, the threshold voltage and the operating frequency. In summary, lowering the operating voltage, shortening the input rise/fall time and raising the threshold voltage all reduce short-circuit power dissipation.

#### 2.4.4 Total Power Dissipation

Combining the previous sections, the total power is

$$P_{total} = P_{leakage} + P_{dynamic} + P_{short-circuit} \quad (2.10)$$

All of these components depend on the operating voltage $V_{DD}$.
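To make the $V_{DD}$ dependence of equation (2.10) concrete, the sketch below evaluates equations (2.3), (2.8) and (2.9) in Python. It is an illustration under simplifying assumptions, not a model of the proposed SRAM: the leakage current is treated as a fixed input (in reality it also shrinks with $V_{DD}$ through DIBL, as equation (2.5) shows), a single symmetric threshold voltage $V_T$ is assumed for NMOS and PMOS, the period T in equation (2.9) is taken as $1/f$, and all function names and parameter values are hypothetical.

```python
def dynamic_power(alpha: float, c_load: float, v_dd: float, freq: float) -> float:
    """Eq. (2.3): P_D = alpha * C_L * V_DD^2 * f, in watts."""
    return alpha * c_load * v_dd ** 2 * freq


def short_circuit_power(beta: float, v_dd: float, v_t: float,
                        tau: float, period: float) -> float:
    """Eqs. (2.8)-(2.9): P_SC = I_mean * V_DD with
    I_mean = (1/12) * (beta / V_DD) * (V_DD - 2*V_T)^3 * tau / T.

    If V_DD <= 2*V_T, the pull-up and pull-down networks are never on
    at the same time, so no short-circuit current flows (the cubic term
    would otherwise turn negative)."""
    if v_dd <= 2 * v_t:
        return 0.0
    i_mean = (beta / (12 * v_dd)) * (v_dd - 2 * v_t) ** 3 * tau / period
    return i_mean * v_dd


def total_power(i_leak: float, v_dd: float, alpha: float, c_load: float,
                freq: float, beta: float, v_t: float, tau: float) -> float:
    """Eq. (2.10): P_total = P_leakage + P_dynamic + P_short-circuit.

    i_leak is held constant here, which is a simplifying assumption."""
    p_leak = i_leak * v_dd
    p_dyn = dynamic_power(alpha, c_load, v_dd, freq)
    p_sc = short_circuit_power(beta, v_dd, v_t, tau, 1.0 / freq)
    return p_leak + p_dyn + p_sc


if __name__ == "__main__":
    # Hypothetical parameters for a single switching node, illustration only.
    params = dict(i_leak=1e-6, alpha=0.1, c_load=10e-15, freq=100e6,
                  beta=1e-3, v_t=0.35, tau=50e-12)
    for vdd in (1.1, 0.6):
        p = total_power(v_dd=vdd, **params)
        print(f"VDD = {vdd:.1f} V -> P_total = {p:.3e} W")
```

With these placeholder values the short-circuit term vanishes at 0.6V because $V_{DD} < 2V_T$, the dynamic term falls with the square of $V_{DD}$, and even the fixed-current leakage term falls linearly with $V_{DD}$.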
Because $V_{DD}$ dominates the total power consumption, lowering the operating voltage $V_{DD}$ is the most effective way to reduce it.

### 2.5 Recent Low-voltage SRAM Bit-cell Design

### 2.5.1 Single-ended 8T SRAM Bit-cell

Fig. 2.21 Single-ended 8T SRAM bit-cell [2.6]

Fig. 2.21 shows the single-ended 8T SRAM bit-cell. This cell adds two extra transistors as a read buffer, which decouples the cell nodes from the read bit-line (RBL). Consequently, the cell is free of read disturb and its read SNM is much better than that of the conventional 6T cell. Because the read and write ports are separated, as in register files, read-stability and write-ability can be improved without conflicting requirements. Using a single-ended read port and a hierarchical BL scheme, this cell was designed into a high-performance 32kb sub-array in a 65nm PD-SOI CMOS process, operating at 5.3GHz at 1.2V and 295MHz at 0.41V. Fig. 2.22 shows the read SNM improvement of this 8T cell over the conventional 6T cell.

Fig. 2.22 Read SNM of conventional 6T-SRAM & single-ended 8T-SRAM [2.6]

An ISSCC 2010 paper by AMD on core implementation [2.16] states that the single-ended 8T SRAM is commonly used in recent single-VCC microprocessor cores for performance-critical low-level caches and multi-ported register-file arrays. In a write operation, however, once a WWL is raised, all pass-gate transistors in the same row are turned on. Because WBL and WBLB are pre-charged to full VDD in stand-by mode, the storage nodes of the unselected cells in that row are affected by the bit-lines once the WWL is raised; this is called pseudo-read or half-select disturb. In summary, this 8T cell eliminates read disturb and improves the read SNM, but it still suffers from half-select disturb.

#### 2.5.2 Differential Data-aware Power-supplied 8T Cell

Fig. 2.23 (a) shows the differential data-aware power-supplied (D2AP) 8T cell.
Unlike a conventional SRAM cell, the cross-coupled inverters of this cell are supplied by the bit-line pair instead of by shared power lines. Fig. 2.23 (b) and (c) show the waveforms of the write and read operations, respectively.

Fig. 2.23 (a) The schematic of D2AP 8T cell (b) waveform of write operation (c) waveform of read operation [2.17]

In stand-by mode, ZWL = 0, both BL and BLB are pre-charged to VDD, and VDDL and VDDR are pre-charged to VDD through PSWL and PSWR. In a write-0 operation, ZWL = 0, WL = 1 and BL is pulled down to 0; VDDL is then reduced through PSWL, which improves the write-ability. Conversely, in a write-1 operation, ZWL = 0, WL = 1 and BLB is pulled down to 0, so VDDR is reduced through PSWR. In a read operation, both ZWL and WL are at VDD, and there is an additional discharge path through the PSW and PU transistors for both read-1 and read-0. Thanks to this additional discharge path and the differential read scheme, the noise immunity and read access time are better than those of the single-ended 8T cell. In half-selected cells, self-negative feedback reduces VDDL or VDDR, lowering the trigger point of the inverter and providing better cell stability. The disadvantage of this cell is a floating "1" on unselected rows because ZWL = 1. A 39Kb sub-array was designed in a 45nm process; its VDD<sub>min</sub> is 540mV, 200 to 240mV lower than that of a single-ended 8T SRAM with the same array structure.

#### 2.5.3 A Large σVTH/VDD tolerant zigzag 8T SRAM (Z8T)

Fig. 2.24 (a) Schematic of Z8T SRAM cell (b) Layout view of Z8T SRAM cell [2.18]

Fig. 2.24 shows the schematic and layout of the $\sigma VTH/VDD$-tolerant zigzag 8T SRAM. In stand-by mode, WWL = 0 and RWL = 1, so both BL and BLB are clamped to VDD through the unselected NR0 and NR1, reducing the BL leakage current; this allows the Z8T cell to use a long-BL structure. In a read operation, WWL = 0 and RWL = 0, and RBL and RBLB are discharged according to the data in the selected cell. Because the cell nodes are decoupled from the RBL, the read SNM of the Z8T cell is improved.
The write operation is similar to that of the conventional 6T cell. A 32Kb sub-array with this Z8T cell was designed in a 65nm process using a hierarchical WL structure and differential read and write-back sense amplifiers. The VDDmin of this sub-array is 440mV [2.18].

#### 2.5.4 A Novel Column-Decoupled 8T Cell (CDC-8T)

Fig. 2.25 Selected and half-selected cell of CDC-8T [2.19]

Fig. 2.25 shows the selected and half-selected cells of the CDC-8T, which eliminates the half-select condition. In the selected cell, GWLE is 0 and BDT0 is 1, so LWLE0 is 1 during a read or write operation. In the half-selected cell, GWLE is 0 but BDT1 is 0, so LWLE1 is 0. With this column-decoupled scheme, read half-select disturb is eliminated. The CDC-8T cell can also be bit-interleaved, so the soft-error-rate problem can be handled with a simple ECC. The half-select-free design enables further voltage scaling: in a 1.6Kb sub-array in a 90nm PD-SOI process, the VDDmin of this cell is 150mV lower than that of the conventional 6T cell [2.19].

#### 2.5.5 Schmitt-Trigger-Based SRAM Design (ST cell)

Fig. 2.26 shows a Schmitt-trigger-based SRAM bit-cell (ST cell). In hold operation, the stacked pull-down transistors give a better hold SNM than the conventional 6T cell. The input-dependent transfer characteristic of the Schmitt trigger improves both read-stability and write-ability. Furthermore, because WWL is off during a read, the storage node is isolated from BL/BR, which also improves read stability. In a write operation, there are two discharge paths, through AXL1/AXR1 and AXL2/AXR2, which further improve write-ability. Compared with an iso-area conventional 6T bit-cell in a 130-nm CMOS process, the ST cell achieves 1.6X read-stability, 2X write-ability and a 120mV lower read VDDmin [2.2][2.20].

Fig. 2.26 ST SRAM bit-cell schematics [2.2] [2.20]

#### 2.5.6 Column Line Assist 10T SRAM cell (CLA-10T)

Fig. 2.27 CLA-10T SRAM (a) schematic (b) layout [2.21]

Fig. 2.27 shows the schematic and layout of the CLA-10T cell.
Its read SNM is worse than that of the previous bit-interleaving 10T SRAM [2.22], because during a read both the outer and inner pass-gate transistors are turned on (their gates are pulled up to VDD) and the cell node is not decoupled from the BL. In a read operation, WL is pulled up to VDD, BL and BL/ are pre-charged to VDD in advance, and CL and CL/ are pulled down to GND. The read current is larger than that of the prior 10T SRAM because of the additional discharge path from BL to CL. In a write operation, WL is at VDD and one line of the BL pair is discharged to GND, as is one line of the CL pair. The additional path from CL to the cell node improves the write-ability of the CLA-10T cell. A 128Kb CLA-10T SRAM array was designed in a 45nm process; its VDDmin is 0.56V.

### 2.6 Recent Read/Write Assist Circuit Techniques in Low-Voltage SRAM Design

In low-voltage SRAM design, a novel bit-cell alone is sometimes not sufficient. Read or write assist circuit techniques are also needed, such as improved single-ended sensing circuits, negative bias on the BL or on the VSS of the bit-cells, and suppressing or under-driving either the VDD of the bit-cells or the word-line voltage. Recent R/W assist circuit techniques are discussed in the following sections.

#### 2.6.1 Single-ended Sensing Innovation

Some alternative SRAM bit-cells, such as the single-ended 8T SRAM [2.6], use a single-ended sensing scheme. Compared with differential sensing, single-ended sensing is slower because it requires a full VDD swing, and, as mentioned, it is more susceptible to read failures in low-voltage SRAM design. Fig. 2.28 shows several single-ended sensing innovations. Fig. 2.28 (a) shows a sense-amplifier redundancy scheme that selects a backup sense amplifier if the original one does not work [2.23]. Fig. 2.28 (b) shows a re-configurable sensing scheme for DVS, including an NMOS-input sense amplifier and a PMOS-input sense amplifier.
At high voltage, the NMOS-input SA is chosen because inputs with a higher common-mode voltage resolve the outputs faster. Conversely, at low voltage the PMOS-input SA is chosen because it resolves faster when the common-mode input is close to GND [2.24]. Fig. 2.28 (c) shows the VGND replica scheme, which generates the virtual GND voltage of the sensing inverter in the read buffer; the trip point of the sensing inverter is automatically adjusted to the midpoint between the high and low voltages of the BLs [2.25]. Fig. 2.28 (d) shows the AC-coupled sense amplifier, which distinguishes a true "1" from a false "1" by its sensing time at low-voltage operation [2.26].

Fig. 2.28 (a) Sense-amplifier redundancy [2.23] (b) Re-configurable sensing scheme [2.24] (c) VGND replica scheme [2.25] (d) AC-coupled sense amplifier [2.26]

#### 2.6.2 Negative-biased Read/Write Assist Circuit

The basic concept of negative-biased read/write assist circuits is to pull the BL down to a negative voltage during a write operation [2.30] and to pull the VSS node of the bit-cell down to a negative voltage during a read operation [2.27]. Fig. 2.29 shows the negative-biased read/write assist circuit (schematic and waveforms) of a cross-point 8T SRAM. In a read operation, VSM is pulled down to a negative voltage; in a write operation, one line of the BL pair is pulled down to a negative voltage, as shown in Fig. 2.29 (b). A 1Mb SRAM array with the cross-point 8T cell and this negative-biased read/write assist circuit was designed in a 45nm bulk LSTP CMOS process and reaches a VDDmin of 0.6V [2.27]. A similar concept used to enhance the read/write ability of a dual-port SRAM is shown in Fig. 2.30 [2.28]; in a 1Mb SRAM array designed in a 45nm process, it improves VDDmin by 120mV.

Fig. 2.29 (a) Cross-point 8T SRAM (b) Read/write negative-biased circuit (c) waveform [2.27]

Fig. 2.30 (a) Negative write bias scheme (b) Read boost scheme [2.28]

Fig.
2.31 shows another way of negatively biasing the BL to improve write-ability. The boost node Nboost is connected to 8 BL pairs and is pre-charged to GND at the end of the write cycle. The capacitor Cboost is charged to VDD while WS1n is at VDD before the write cycle. In a write cycle, WS1n and WS0n are discharged to GND. At high voltage, a high negative boosting efficiency is not needed, so WS1n is pulled down to GND first and the boosting efficiency is low. At low voltage, a high negative boosting efficiency is needed, so WS0n is pulled down to GND first and the boosting efficiency is high. Fig. 2.31 (b) shows the result. A 64Mb SRAM built from 128 × 512Kb macros in a 32nm high-k metal-gate SOI process operates down to 0.7V [2.29].

Fig. 2.31 (a) Write driver with boost control (b) Waveform of write cycle [2.29]

Fig. 2.32 (a) Constant-negative-level write-buffer (b) negative BL level [2.31]

Fig. 2.32 shows the constant-negative-level write buffer, in which the charge in C\_boost is proportional to the BL capacitance, i.e., to the number of cells per bit-line. It therefore generates a constant negative level on the BL during a write operation. The target bias level is -0.15V ± 0.05V: negative enough to write the data into the selected cell, but not so negative that the unselected cells lose their data. Fig. 2.32 (b) shows the automatically optimized constant negative BL level for 4 cells per bit-line and for 512 cells per bit-line [2.31].

#### 2.6.3 Voltage modulation Read/Write Assist Circuit

The basic concept of voltage-modulation R/W assist circuits is that boosting the WL voltage or suppressing the VDD of the cell array enhances write-ability, whereas suppressing the WL voltage or boosting the VDD of the cell array enhances read-stability. Fig.
2.33 shows a level-programmable word-line driver (LPWD) and a dynamic array supply control, which tune the WL voltage and the array supply voltage in read and write cycles, respectively [2.32]. In the single-ended 8T SRAM, because the read and write ports are separated, RWL can be boosted in the read cycle and WWL in the write cycle, as shown in Fig. 2.34 (a). A write-back scheme is also used to solve the half-select read-disturb problem. A 64Kb 8T SRAM designed in a 90nm process reaches a VDDmin of 0.42V with an area overhead of 8.5% [2.33].

Fig. 2.33 Level-programmable word-line driver (LPWD) and dynamic array supply

Fig. 2.34 (a) Boost RWL and WWL in read/write cycle (b) Write back scheme [2.33]

Fig. 2.35 shows read and write assist circuits. Fig. 2.35 (a) shows a WL under-drive circuit that improves read stability, and Fig. 2.35 (b) shows a write assist circuit that lowers the SRAM array supply voltage during the write cycle. With these circuits, the VDDmin of a 512Kb conventional 6T SRAM array in a 45nm bulk process improves from 1.13V to 0.96V [2.34]. Because the VDD source of the WL driver is suppressed, the WL rise time in Fig. 2.35 (a) is slow. Fig. 2.36 (a) shows a modified WL under-drive circuit whose rise time is 60% faster than that of Fig. 2.35 (a) [2.31].

Fig. 2.35 (a) Word-line under-drive circuit (b) Write assist circuit [2.34]

Fig. 2.36 (a) Modified Word-line under-drive (WLUD) circuit (b) Rise time of WLUD circuit [2.31]

Fig. 2.37 The adaptive dynamic word-line under-drive circuit (ADWLUD) [2.35]

Fig. 2.37 shows another word-line under-drive circuit, the adaptive dynamic word-line under-drive (ADWLUD) circuit, which consists of a WLUD module, a controller and a 6T-SRAM-bit-cell-based sensor. It adaptively tunes the strength of the WLUD circuit across PVT conditions: the conventional-6T-cell-based Vtp/Vtn sensor tracks the PVT condition and generates Vsensor, which is compared with Vref1 and Vref2.
If Vsensor < Vref1, the small PMOS P3 is turned on to apply a strong WLUD; if Vsensor < Vref2, the large PMOS P4 is turned on to apply a weak WLUD. The reference voltages are generated by a resistive divider with a multiplexer for controller calibration. A 3.4Mb SRAM macro designed in a 32nm high-k metal-gate process improves VDDmin by 130mV [2.35].

Fig. 2.38 shows the schematic and waveforms of the multi-step word-line control technique, which improves not only the SNM but also the write-ability. First, CLK\_WL is asserted so that P1 and P2 pull the WL up, gradually charging the WL capacitance C toward VDD. The slow WL rise improves the SNM of half-selected cells by suppressing the read-disturb voltage. After the WL voltage reaches 1.0V, CLK\_PU is activated and the WL capacitance C is boosted to 1.1V through the pumping capacitor Cp, which speeds up the write-back of half-selected cells. The PMOS resistance R and the pumping capacitance Cp must be chosen carefully. If R is too small, the WL voltage reaches 1.0V before the BL voltage has discharged to a level at which the SNM is preserved, so the worst-case half-selected cell cannot retain its data. If R is too large, the SRAM operates too slowly. Cp must be large enough to generate a sufficiently high voltage for stable write operations [2.36].

Fig. 2.38 Multi-step word-line control technology for word-line driver [2.36]

### 2.7 Summary

This chapter first introduced the structure and basic operation of the conventional 6T SRAM, followed by the basic concepts and measurement of stability and write-ability in SRAM bit-cells. Power dissipation, including dynamic, leakage and short-circuit power dissipation, was then discussed; leakage power dissipation has become a critical issue in SRAM design in advanced processes. Several alternative 8T and 10T SRAM cells for low-voltage operation were also introduced.
Finally, read/write assist circuits that improve cell stability, read/write speed and write-ability in low-voltage SRAM design were introduced. These R/W assist circuits are usually based on negative biasing or on modulating the BL/WL voltages or the VDD or VSS of the cell array. Clearly, the conventional 6T SRAM cannot satisfy the demands of the low-voltage region. The following chapter presents an 8T SRAM bit-cell with a data-aware write-assist topology, a ripple BL sensing scheme and a local BL keeper design.

## Chapter 3

# A Low VDD<sub>MIN</sub> Data-aware Write-Assist 8T SRAM with Adaptive Write-time Tracing Replica Circuit

#### 3.1 Introduction

As discussed in Chapters 1 and 2, the conventional 6T cell is not suitable for nano-scale processes because of severe process and temperature variation. The authors of [3.1] point out that in advanced processes a 6T bit-cell becomes larger than an 8T bit-cell, because large beta and gamma ratios are needed to cope with process and temperature variation, as shown in Fig. 3.1. The conventional 6T cell is also unsuitable for the low-voltage region because of read disturb, half-select disturb and insufficient SNM, which make it difficult to lower its VCC<sub>MIN</sub>. It is therefore necessary to find an alternative SRAM bit-cell topology that works in advanced processes and at low voltage.

Fig. 3.1 Minimum area comparison between 6T and 8T cells [3.1]

In this chapter, an 8T SRAM bit-cell with data-aware write-assist (DAWA) is presented. The cell structure and basic operation are introduced first, followed by a detailed discussion of the cell stability and write-ability of the DAWA 8T cell and the architecture of the 8T cell array. Finally, the adaptive write-time tracing replica circuit, the adaptive VVSS driver and the WWL driver used with this 8T cell are presented. The following simulations and analyses are based on the UMC 40nm LP process.
This project was discussed with and supported by Professor Ching-Te Chuang of the Digital VLSI Lab, Hao-I Yang of the LPMD Lab, and the IPD department of *Faraday Technology Corporation*.

#### 3.2 Cell Structure and Basic Operation of Cell

Fig. 3.2 Schematic of DAWA 8T cell

Fig. 3.2 shows the schematic of the 8T cell with the data-aware write-assist (DAWA) scheme. The cell has an outer-layer pass-gate *MR1* and inner-layer pass-gates *MS1/MS2*. The outer pass-gate *MR1* is controlled by the row-based WL, while the inner pass-gates *MS1* and *MS2* are controlled by the column-based signals WWL and WWLB, respectively. During a write operation, WWL, WWLB and VVSS are determined by the input data. M1 and M2 are the power switches of the 8T cell and are shared by several cells in each column. The 8T cell has only one BL, so it is a single-port structure, which reduces power consumption. In addition to the data-aware write-assist scheme, the cell uses high-$V_T$ PMOS transistors to weaken the PMOS transistors of the cross-coupled inverters, improving write-ability; the high-$V_T$ PMOS transistors also reduce the leakage current of the bit-cell. In summary, this 8T bit-cell is a dual-$V_T$, single-port bit-cell. Fig. 3.3 shows the layout of this 8T cell in the UMC 40nm LP process. The column-based signals, such as WWL, VVSS, VVDD1, VVDD2, BL and WWLB, are routed in metal 2, and the row-based signals, such as WL, GND and the internal node N1, are routed in metal 3. The size of one 8T bit-cell in the UMC 40nm LP process is $1.44\mu m \times 0.59\mu m \approx 0.85\mu m^2$.

Fig. 3.3 Layout view of DAWA 8T SRAM bit-cell (excluding M1 and M2)

#### 3.2.1 Pre-charged / Stand-by Mode

In pre-charge or stand-by mode, *WL*, *WWL* and *WWLB* are off, so the inner pass-gates *MS1/MS2* and the outer pass-gate *MR1* are turned off, improving the stability of the bit-cell, as shown in Fig. 3.4. Since *WWL* and *WWLB* are logic "0", both PMOS power switches *M1/M2* are turned on, keeping *VVDD1* and *VVDD2* at full VDD.
During pre-charge or stand-by, *BL* is held at VDD and *VVSS* at GND.

Fig. 3.4 Pre-charged / stand-by mode of DAWA 8T bit-cell

#### 3.2.2 Read Mode

In a read operation, *BL* is pre-charged to VDD in advance, *VVSS* is at GND, and the column-based signals *WWL* and *WWLB* are at GND. The row-based signal *WL* is pulled up to VDD, turning on the outer pass-gate *MR1*. *BL* stays at VDD if *QB* is logic "0" and is discharged to GND through *MR1* and *MR2* if *QB* is logic "1". The read operation of the DAWA 8T bit-cell is the same as that of a conventional 8T bit-cell, and read disturb is eliminated. Both PMOS power switches *M1/M2* are on, keeping *VVDD1* and *VVDD2* at full VDD. Fig. 3.5 shows the read operation.

Fig. 3.5 Read mode of DAWA 8T bit-cell

#### 3.2.3 Data-aware Write-assist Write Mode

In a write operation, *BL* is first discharged to GND. The row-based signal *WL* is pulled up to logic "1", turning on the outer pass-gate *MR1*. Either *WWL* or *WWLB* is pulled up to VDD, depending on the data to be written into the selected cell. To write logic "1" into the cell node, *WWL* is pulled up to VDD. With *WWL* at logic "1", the power switch *M1* is turned off and node *VVDD1* floats. The floating *VVDD1* weakens PMOS transistor *MP2*, making it easier to write logic "1" into the selected cell. Conversely, to write logic "0" into the cell node, *WWLB* is pulled up to VDD. With *WWLB* at logic "1", the power switch *M2* is turned off and node *VVDD2* floats. The floating *VVDD2* weakens PMOS transistor *MP1*, making it easier to write logic "0" into the selected cell. This is the data-aware write-assist write operation.

Next, consider the column-based signal *VVSS*. In a write-"1" operation, if *VVSS* is logic "0", a disturb occurs on the column half-selected cells in the same column whose node *QB* stores "1".
The discharge path runs from *QB* through *MS2* and *MR2* to *VVSS*, so the logic "1" at *QB* of the column half-selected cells may be discharged to logic "0". Conversely, in a write "0" operation, if *VVSS* is logic "1", a disturb occurs on the column half-selected cells in the same column whose node *Q* stores "0": a charging path runs from *VVSS* through *MR2* and *MS1* to *Q*, so the logic "0" at *Q* may be charged to logic "1". Both cases are shown in Fig. 3.6. Therefore *VVSS* must be set to logic "1" during a write "1" operation and to logic "0" during a write "0" operation. Fig. 3.7 shows the write "1" and write "0" operations with *VVSS* connected to logic "1" and logic "0", respectively. Of the two, write "1" is the more critical.

Fig. 3.6 Column half-select disturb in write operation

Fig. 3.7 Write "1" mode and write "0" mode

Fig. 3.8 shows the control-signal waveforms for read, write and stand-by, and Table 3.1 lists the level of each signal in each operation. Because the read buffer decouples the cell nodes from *BL*, read disturb is eliminated; because of the cross-point cell topology, half-select disturb on the same row is eliminated as well. In the cross-point topology, data can be written only into the selected cell whose row-based *WL* and column-based *WWL/WWLB* are turned on simultaneously. A bit-interleaving structure can therefore be applied, which helps mitigate soft errors. In summary, the DAWA 8T cell can be used in the low-voltage region, lowering VDD<sub>MIN</sub>; the cell array structure is shown in Fig. 3.9. However, because of the data-aware write-assist scheme, the SNM of the column half-selected cells is degraded by the difference between *VVDD1* and *VVDD2*, as shown in Fig. 3.7. This issue is discussed in Chap. 3.3.3 and Chap. 3.4.

Fig. 3.8 Control signal waveforms

Fig. 3.9 Cell array structure

Table 3.1 Basic operation of low VDD<sub>MIN</sub> DAWA 8T

| | STBY | Read | Write "0" | Write "1" |
|-------|------|----------|-----------|-----------|
| WL | 0 | 1 | 1 | 1 |
| WWL | 0 | 0 | 0 | 1 |
| WWLB | 0 | 0 | 1 | 0 |
| RBL | 1 | floating | 0 | 0 |
| VVSS | 0 | 0 | 0 | 1 |
| VVDD1 | VDD | VDD | VDD | < VDD |
| VVDD2 | VDD | VDD | < VDD | VDD |

#### 3.3 Cell Stability

#### 3.3.1 Cell-stability in Hold / Read Operation

In this section, the static noise margin (SNM) is used as the measure of cell stability and the write-trip-point (WTP) as the measure of write-ability. Their definitions and measurement are discussed in Chap. 2.3. Fig. 3.10 shows the Read SNM of the DAWA 8T cell under different PVT conditions: Fig. 3.10 (a) for the high-voltage region (VDD=1.1V) and Fig. 3.10 (b) for the low-voltage region (VDD=0.6V). Because of the read buffer, the DAWA 8T cell has no read disturb and its Read SNM equals its Hold SNM; unlike a conventional 6T bit-cell, its Read SNM does not degrade when *WL* is turned on. Fig. 3.10 shows that the worst process corner and temperature for RSNM is PSNF & 125°C.

Fig. 3.11 compares the Hold SNM and Read SNM of a conventional 6T bit-cell and the DAWA 8T bit-cell. *Conv. 6T* is the UMC 0.303μm² 6T cell, and *DAWA 8T* is the bit-cell proposed in Chap. 3.2 with size tuning. According to Fig. 3.11, the conventional 6T bit-cell has about 1.2X the Hold SNM of the DAWA 8T bit-cell, but the DAWA 8T bit-cell has about 2.6X the Read SNM of the conventional 6T bit-cell at PFNS & 125°C, which is the worst case of Read SNM.

Fig. 3.10 RSNM in different PVT conditions (a) VDD=1.1V (b) VDD=0.6V

Fig. 3.11 Comparison of SNM between conventional 6T and DAWA 8T

#### 3.3.2 Write-ability (WTP)

Fig. 3.12 shows the WTP of the DAWA 8T cell under different PVT conditions with *VVDD1/VVDD2* at full VDD. The write "1" condition is considered because write "1" is more critical than write "0", as shown in Fig. 3.7. Fig. 3.12 (a) shows the WTP in the high-voltage region (VDD=1.1V) and Fig. 3.12 (b) in the low-voltage region (VDD=0.6V). The worst process corner and temperature for WTP is PFNS & -40°C, where the WTP becomes negative below VDD=0.8V.

Fig. 3.12 WTP in different PVT conditions (a) VDD=1.1V (b) VDD=0.6V

Fig. 3.13 Comparison of WTP between conventional 6T and DAWA 8T

Because of its two layers of pass-gates, the DAWA 8T bit-cell has a worse WTP than the conventional 6T bit-cell. Fig. 3.13 compares the WTP of the conventional 6T bit-cell (UMC 0.303μm²) with that of the DAWA 8T bit-cell under two conditions: *VVDD1* = *VVDD2* = VDD, and *VVDD1* lowered to 90%, 85% and 80% of *VVDD2* (= VDD). The conventional 6T bit-cell has about 2~3X the WTP of the DAWA 8T bit-cell with *VVDD1* = *VVDD2* = VDD, while the data-aware write-assist scheme significantly improves write-ability, especially in the low-voltage region.

#### 3.3.3 Cell-stability of Column Half-select Bit-cell

Fig. 3.14 Worst case of local V<sub>T</sub> shift (a) Column half-select cell (b) Write

In the data-aware write-assist scheme, the column half-selected cells suffer degraded stability because of the difference between *VVDD1* and *VVDD2*. *VVDD1* should be as low as possible to improve write-ability, but as high as possible to preserve the stability of the column half-selected cells in the same column. Whether a voltage level exists that both guarantees a successful write and preserves the data of the column half-selected bit-cells is therefore an important question. In advanced processes, local $V_T$ shift must also be considered. Fig. 3.14 (a) shows the worst case of cell stability under local $V_T$ mismatch, and Fig. 3.14 (b) shows the worst case of write-ability (F: fast, S: slow). The simulations assume $3\sigma_{V_{TH}} \approx 90$ mV.
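The trade-off above amounts to searching for a usable *VVDD1* window. The Python sketch below is an illustration only, not the simulation flow used in this work: it finds, by bisection, the highest *VVDD1* that still writes ($VDD_W$) and the lowest *VVDD1* at which half-selected cells still retain data ($VDD_R$), then returns their difference, the VSM defined in the next paragraph. The predicates passed in are hypothetical stand-ins for worst-case 3σ transient simulations, and the thresholds in the example are invented.

```python
from typing import Callable

Predicate = Callable[[float], bool]


def highest_passing(pred: Predicate, lo: float, hi: float, tol: float = 1e-4) -> float:
    """Largest v in [lo, hi] with pred(v) True.

    Assumes pred is True at lo and stays True up to a threshold, False above it
    (a write succeeds only while the floating VVDD1 is low enough).
    """
    if pred(hi):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pred(mid):
            lo = mid
        else:
            hi = mid
    return lo


def lowest_passing(pred: Predicate, lo: float, hi: float, tol: float = 1e-4) -> float:
    """Smallest v in [lo, hi] with pred(v) True.

    Assumes pred is False below a threshold and True above it, including at hi
    (half-selected cells retain data only while VVDD1 stays high enough).
    """
    if pred(lo):
        return lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pred(mid):
            hi = mid
        else:
            lo = mid
    return hi


def compute_vsm(write_ok: Predicate, hold_ok: Predicate, vdd: float) -> float:
    """VSM = VDD_W - VDD_R; a positive value means a usable VVDD1 window exists."""
    vdd_w = highest_passing(write_ok, 0.0, vdd)
    vdd_r = lowest_passing(hold_ok, 0.0, vdd)
    return vdd_w - vdd_r


# Hypothetical pass/fail thresholds for one PVT point (illustrative numbers only).
vsm = compute_vsm(lambda v: v <= 0.82, lambda v: v >= 0.70, vdd=1.1)
print(f"VSM = {vsm:.3f} V")
```

Bisection is used because each real evaluation is an expensive simulation; it relies on the monotonic pass/fail behavior described above.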
First, define $VDD_W$ as the maximum *VVDD1* at which a write still succeeds in Fig. 3.14 (b), under the worst-case $3\sigma$ local $V_T$ mismatch. Next, define $VDD_R$ as the minimum *VVDD1* at which the column half-selected bit-cells still retain their data in Fig. 3.14 (a), under the same worst-case mismatch. Finally, define $VSM = VDD_W - VDD_R$. If $VSM > 0$, a voltage level exists that both guarantees a successful write and preserves the data of the column half-selected bit-cells, even under the worst-case $3\sigma$ local $V_T$ mismatch. Fig. 3.15 shows the VSM under different PVT conditions. Considering the worst-case $3\sigma$ local $V_T$ mismatch for both write and hold, VSM remains positive over a wide VDD range (1.2V ~ 0.6V) at every process corner and temperature except the PFNS corner.

Fig. 3.15 VSM in different PVT conditions (a) SS corner (b) TT corner (c) FF corner (d)

#### 3.4 Adaptive Write-time Tracing Replica Circuit

In advanced-process SRAM design, replica circuits are often used to trace the read or write time and set the word-line pulse width, which avoids misreads or reduces power consumption [3.5] [3.6]. Because the proposed 8T cell uses the data-aware write-assist scheme, *VVDD1* floats while *WWL* is on and *VVDD2* floats while *WWLB* is on. If the pulse of *WWL* or *WWLB* is too wide, the difference between *VVDD1* and *VVDD2* becomes too large for the column half-selected bit-cells to retain their data. This scheme therefore traces the write time under different PVT conditions to control the pulse width of the column-based signals *WWL*, *WWLB* and *VVSS*.

Fig. 3.16 Adaptive write-time tracing replica circuit

Fig. 3.16 shows the adaptive write-time tracing replica circuit, which operates in write cycles.
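The benefit of tracing the write time instead of using a fixed pulse can be illustrated with a toy timing model. In the Python sketch below every number is invented: the write delay of each cell scales with a per-corner factor, the tracing cell sees the same factor plus a slowdown set by the [W1, W0] word-line under-drive described next, and the *WWL*/*WWLB*/*VVSS* pulse ends when the tracing cell flips. A fixed pulse sized at the TT corner is included for comparison.

```python
# Illustrative timing model only: every number below is hypothetical.
CORNER_FACTOR = {"FF": 0.6, "TT": 1.0, "SS": 1.8}  # write-delay scaling vs. TT
# Extra slowdown of the tracing cell for each [W1, W0] under-drive setting.
UNDERDRIVE_FACTOR = {(0, 0): 1.0, (0, 1): 1.1, (1, 0): 1.2, (1, 1): 1.3}
T_WRITE_TT_NS = 1.0        # nominal cell write delay at TT
MISMATCH_SLOWDOWN = 1.15   # slowest array cell vs. nominal, due to local VT mismatch
FIXED_PULSE_NS = 1.2       # fixed pulse with 20% margin at TT, for comparison


def traced_pulse_ns(corner: str, w1w0: tuple) -> float:
    """Pulse width set by the tracing cell: the pulse ends when its QB flips."""
    return T_WRITE_TT_NS * CORNER_FACTOR[corner] * UNDERDRIVE_FACTOR[w1w0]


def needed_pulse_ns(corner: str) -> float:
    """Write delay of the slowest array cell at this corner."""
    return T_WRITE_TT_NS * CORNER_FACTOR[corner] * MISMATCH_SLOWDOWN


for corner in CORNER_FACTOR:
    need = needed_pulse_ns(corner)
    traced = traced_pulse_ns(corner, (1, 1))
    print(f"{corner}: need {need:.2f} ns, traced {traced:.2f} ns (ok={traced >= need}), "
          f"fixed {FIXED_PULSE_NS:.2f} ns (ok={FIXED_PULSE_NS >= need})")
```

In this toy model the traced pulse scales with the corner: it stays long enough at SS, where the fixed pulse fails, and is shorter than the fixed pulse at FF, which also limits how long *VVDD1* or *VVDD2* floats. With [W1, W0] = [0,0] the tracing cell is no slower than a nominal cell, so mismatch-slowed cells are not covered.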
The scheme traces the write "1" time because write "1" is more critical than write "0". In stand-by mode, node *Q* of the tracing cell is initialized to 0 in preparation for tracing the write "1" operation. *WLE* is triggered by the rising edge of the external CLK: when CLK goes to logic "1", *WLE* also goes to logic "1". *WWL\_EN* is logic "1" in a write cycle and logic "0" in a read cycle, so *WLE* and *WWL\_EN* both at logic "1" indicate a write cycle. After *WLE* and *WWL\_EN* are asserted, *WWL\_Enable* is pulled up to logic "1", enabling the WWL driver to generate the column-based signals *WWL*, *WWLB* and *VVSS* according to the *DI* and *col\_EN* signals. The write "1" time of the tracing cell in the replica column is traced; the replica column contains the same number of bit-cells as one local bit-line. When *QB* of the tracing cell flips to logic "0", a signal is sent to disable the WWL driver, turning off *WWL*, *WWLB* and *VVSS*. The pulse width of *WWL*, *WWLB* and *VVSS* is thus adapted to the PVT condition.

Transistors *M1*~*M6* form a word-line under-drive circuit controlled by the external signals *W0* and *W1*. They set the voltage level of *Dummy\_WL* and hence the write-ability of the tracing cell. With [W1,W0] = [0,0], the tracing cell has the best write-ability and the shortest write time, giving the shortest pulses on *WWL*, *WWLB* and *VVSS*. With [W1,W0] = [1,1], the tracing cell has the worst write-ability and the longest write time, giving the longest pulses. These settings add timing margin to cope with local $V_T$ mismatch.

#### 3.5 Adaptive VVSS Driver and WWL Driver

Fig. 3.17 Adaptive VVSS driver and WWL driver

Fig. 3.17 shows the adaptive VVSS driver and WWL driver.
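The ordering rule this driver has to enforce follows from Chap. 3.2.3: during a write "1", *VVSS* must already be high whenever *WWL* is high, otherwise the column half-selected cells see the disturb path of Fig. 3.6. The Python sketch below checks that rule on a list of edge events. The delay value in `write1_events` is hypothetical, and the model captures only edge order, not analog levels.

```python
from typing import List, Tuple

Event = Tuple[float, str, int]  # (time in ns, signal name, new logic level)


def vvss_covers_wwl(events: List[Event]) -> bool:
    """True if VVSS is high at every instant WWL is high.

    Events are applied in time order; ties keep their listed order (stable sort).
    """
    level = {"WWL": 0, "VVSS": 0}
    for _, signal, value in sorted(events, key=lambda e: e[0]):
        level[signal] = value
        if level["WWL"] == 1 and level["VVSS"] == 0:
            return False
    return True


def write1_events(en_rise: float, en_fall: float, d: float = 0.05) -> List[Event]:
    """Hypothetical write-"1" edges: VVSS rises after WWL_Enable, then WWL;
    after WWL_Enable falls, WWL drops first and VVSS afterwards."""
    return [
        (en_rise + d, "VVSS", 1),
        (en_rise + 2 * d, "WWL", 1),
        (en_fall + d, "WWL", 0),
        (en_fall + 2 * d, "VVSS", 0),
    ]


good = write1_events(0.0, 1.0)
# A faulty driver that raises WWL before VVSS would expose half-selected cells.
bad = [(0.05, "WWL", 1), (0.10, "VVSS", 1), (1.05, "WWL", 0), (1.10, "VVSS", 0)]
print(vvss_covers_wwl(good), vvss_covers_wwl(bad))
```

The same check applies to the falling edge: if *VVSS* dropped before *WWL*, the function would also return `False`.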
In a write "0" operation, *D\_inb* is logic "1"; once *WWL\_Enable* is pulled up to VDD, *WWLB* is pulled up to VDD. *WWL\_Enable* is pulled down to GND once *QB* of the tracing cell flips to logic "0", after which *WWLB* is pulled down to GND. In a write "1" operation, *D\_in* is logic "1"; once *WWL\_Enable* is pulled up to VDD, *VVSS* is pulled up to VDD, and *WWL* rises after *VVSS*. *WWL\_Enable* is again pulled down to GND once *QB* of the tracing cell flips to logic "0"; *WWL* is then pulled down to GND first, followed by *VVSS*. In a write "1" operation, the *VVSS* pulse is therefore wider than, and encloses, the *WWL* pulse, which eliminates the column half-select disturb shown in Fig. 3.6. Fig. 3.18 shows the waveforms of the adaptive write-time tracing replica circuit and the adaptive VVSS and WWL drivers.

Fig. 3.18 Waveform of adaptive write-time tracing circuit, adaptive VVSS driver and WWL driver

#### 3.6 Simulation Result

Fig. 3.19 shows the pulse width of the *WWL\_EN* signal sent to the adaptive VVSS and WWL drivers, as generated by the adaptive write-time tracing circuit, under different PVT conditions and [W1,W0] settings. With [W1,W0] = [0,0] the *WWL\_EN* pulse is the narrowest, and with [W1,W0] = [1,1] it is the widest. The external DC signals *W0* and *W1* therefore tune the pulse width of *WWL\_EN*.

Fig. 3.19 Pulse width of *WWL\_EN* in different PVT conditions (a) VDD=1.1V (b) VDD=0.6V

Fig. 3.20 shows the pulse widths of *WWL*, *WWLB* and *VVSS* generated by the adaptive write-time tracing circuit under different PVT conditions.
The pulses in the SS and PFNS corners are wider than in the other process corners, especially in the low-voltage region, because the SS corner has the slowest timing and the PFNS corner has the most critical write-ability (Chap. 3.3.2).

Fig. 3.20 *WWL*, *WWLB* and *VVSS* pulse width generated by adaptive write-time tracing circuit in different PVT conditions (a) VDD=1.1V (b) VDD=0.6V

#### 3.7 Summary

This chapter first introduced the data-aware write-assist (DAWA) 8T SRAM bit-cell, including its topology and basic operation, and then discussed its cell stability and write-ability. Because of the read buffer and the two layers of pass-gates controlled by the row-based *WL* and the column-based *WWL/WWLB*, the DAWA 8T bit-cell eliminates read disturb and row half-select disturb and supports a bit-interleaving structure. The Read SNM of the DAWA 8T bit-cell equals its Hold SNM. However, because of the two layers of pass-gates, its write-ability is worse than that of a conventional 6T bit-cell when both cross-coupled inverters share the same supply level; the data-aware write-assist scheme significantly improves write-ability. The stability of the column half-selected cells was also analyzed under local V<sub>T</sub> mismatch. Chap. 3.4 presented an adaptive write-time tracing replica circuit that controls the pulse width of *WWL*, *WWLB* and *VVSS*, Chap. 3.5 presented the adaptive VVSS and WWL drivers that eliminate the column half-select write disturb, and Chap. 3.6 presented the simulation results. In summary, the DAWA 8T cell can operate over a wide supply range (VDD = 1.2V to 0.45V).

# **Chapter 4**

# Ripple Bit-line Read Scheme with Local Bit-line Keeper Design

#### 4.1 Introduction

This chapter introduces the read scheme of the proposed DAWA 8T SRAM.
Chap. 4.2 first reviews a prior-art cascaded bit-line scheme that does not need an extra metal layer for global bit-lines. Chap. 4.3 proposes a ripple bit-line scheme with a hierarchical global bit-line, which reduces the parasitic capacitance and leakage current of the global bit-line; a leakage current replica keeper is used in the multiplexer of this data-sensing scheme. Because bit-line leakage has become one of the most important issues of SRAM design in advanced processes, Chap. 4.4 reviews bit-line leakage reduction through recently modified cell topologies and peripheral circuits, together with recent bit-line keeper designs that compensate the leakage current. Chap. 4.5 proposes a local bit-line keeper design. The following simulations and analyses are based on the UMC 40nm LP process. This project was discussed with, and supported by, Professor Ching-Te Chuang of the Digital VLSI Lab, Hao-I Yang of the LPMD Lab, and the IPD department of Faraday Technology Corporation. The design was taped out in June 2011 with the support of Faraday Technology Corporation.

#### 4.2 Prior Art - Cascaded Bit-line Read Scheme

Fig. 4.1 (a) Hierarchical bit-line scheme (b) Cascaded bit-line scheme [4.1]

Fig. 4.1 shows two bit-line read schemes: (a) the conventional hierarchical bit-line read scheme and (b) the cascaded bit-line read scheme. In the hierarchical scheme, sensing inverters transfer the sensed data from the local bit-line to the global bit-line. In ultra-high-density cell designs, the global bit-lines need an additional metal layer because there is no room to route them in the same layer as the local bit-lines. In the cascaded scheme of Fig. 4.1 (b), the bit-lines of the sub-arrays are connected through *capacitance separators*.
Data are transferred through the cascaded bit-lines in both read and write operations. Since no global bit-lines are required, no additional metal layer is needed for them in ultra-high-density cell designs. Fig. 4.2 shows the schematic and waveform of the cascaded bit-line read scheme. In a read cycle, the cell data are latched in the nearest sense amplifier and then transferred to the adjacent local bit-line, and so on until they reach the I/O buffer. In a write cycle, data are transferred from the I/O buffer to the local bit-lines, and the sense amplifiers act as buffers connecting adjacent local bit-lines [4.1].

Fig. 4.2 (a) Schematic of cascaded bit-line read scheme (b) Waveform [4.1]

#### 4.3 Ripple Bit-line Read Scheme Multiplexer

#### 4.3.1 Local Evaluation Circuit

Fig. 4.3 (a) Local evaluation circuit (b) Ripple bit-line read scheme (32 bits per LBL; LEV\_UP/LEV\_DN stages and ripple buffers feeding a multiplexer that converts data to GBL, selected by Col\_en[0]~Col\_en[15], toward the I/O buffer)

Fig. 4.3 (a) shows the schematic of the ripple buffer and local evaluation (LEV) circuit of the ripple bit-line read scheme. It consists of a pre-charge PMOS transistor, a push-pull inverter in the ripple buffer that passes the previous stage's data to the next-stage local bit-line, and two cascaded NMOS transistors for write operation. Two types of pre-charge circuit (*LEV\_UP* and *LEV\_DN* in Fig. 4.3) are needed because the pre-charge signal of each bit-line segment must refer to the data output of the previous stage.
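The segment-to-segment hand-off can be checked at the logic level. The Python sketch below is a simplification that ignores timing, leakage and the *LReset* sequencing. It assumes, for illustration only, a single chain of segments indexed so that the highest index is nearest the multiplexer, ripple buffers that can only pull the next segment low, and a last-stage inverter whose output is the *LBL\_out* seen by the multiplexer. It shows why the multiplexer receives the complement of the stored data, as described in the next paragraph.

```python
def ripple_read(n_segments: int, selected: int, cell_data: int) -> int:
    """Return LBL_out (0/1) delivered to the multiplexer for one column.

    Simplified logic model: all segments start pre-charged high; the selected LBL
    stays high for a stored "1" and is discharged for a stored "0".
    """
    if not 0 <= selected < n_segments:
        raise ValueError("selected segment out of range")
    lbl = [1] * n_segments        # every local bit-line pre-charged to VDD
    lbl[selected] = cell_data     # evaluation of the selected local bit-line
    # Ripple toward the multiplexer: a low LBL turns on the next segment's pull-down;
    # a high LBL leaves the next segment at its pre-charged level.
    for i in range(selected, n_segments - 1):
        if lbl[i] == 0:
            lbl[i + 1] = 0
    return 1 - lbl[-1]            # inverter output of the last stage = LBL_out


# Four 32-cell segments per chain are assumed here purely for illustration.
for seg in range(4):
    for data in (0, 1):
        assert ripple_read(4, seg, data) == 1 - data
```

Because each buffer only discharges, a stored "1" simply stops the ripple, which matches the cut-off behavior of the push-pull inverter described in the text.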
In stand-by mode, all *LReset* signals are logic "0" and every local bit-line segment is pre-charged to VDD. In each read cycle, one local bit-line is selected and its *LReset* signal is pulled up to logic "1", leaving that local bit-line floating. *LBL* stays at VDD when the selected cell stores logic "1" and is discharged to GND when it stores logic "0". When logic "1" is read, the push-pull inverter in the ripple buffer does not pull down the next segment, so the ripple stops and logic "0" is delivered to the multiplexer. When logic "0" is read, the ripple buffer discharges the local bit-line of the next adjacent segment to GND, and logic "1" is finally delivered to the multiplexer.

In a write operation, the structure of the DAWA 8T cell requires *LBL* to be discharged to GND whether logic "1" or logic "0" is written. In this local evaluation circuit, when the column is selected for writing, both *LReset\_w* and *col\_en* are pulled up to VDD and the selected local bit-line is discharged to GND. A bit-interleaving structure can be adopted in this scheme; the bit-interleaving multiplexer is described in Chap. 4.3.2.

Fig. 4.4 shows the area overhead and access time of the ripple bit-line read scheme at the SS corner & -40°C, the worst case for access time. The goal is a local bit-line length that balances area overhead against access time. The optimum is 32 bit-cells per local bit-line, so the local bit-line length in this scheme is set to 32.

Fig. 4.4 Access time and area overhead vs. LBL length

#### 4.3.2 Multiplexer with Leakage Current Replica Keeper

Fig. 4.5 shows the multiplexer with leakage current replica keeper.
The multiplexer supports a bit-interleaving structure. As shown in Fig. 4.5, it is a bit-interleaving-16 structure: every 16 columns deliver one bit to the I/O buffer. The signals *col\_en[0]*~*col\_en[15]* select which of *LBL\_out[0]*~*LBL\_out[15]* is transferred to *GBL*. In stand-by mode, *PCGBL* is logic "0" and *mux\_out* is pre-charged to VDD; *GBL* is also pre-charged to VDD in stand-by mode (its pre-charge PMOS transistor is not shown). In a read operation, *WE* is logic "0". If both *col\_en[n]* and *LBL\_out[n]* are high, *mux\_out* is discharged to GND and *M1* then discharges *GBL* to GND, transferring logic "0" to the I/O buffer. If *col\_en[n]* is high but *LBL\_out[n]* is at GND, *GBL* stays at VDD, transferring logic "1" to the I/O buffer.

Fig. 4.5 Bit-interleaving multiplexer with leakage current replica keeper

Next, consider leakage. If the leakage current on *mux\_out* is too large, it may incorrectly discharge *GBL* to GND and transfer logic "0" to the I/O buffer during a read-1 operation. To address this, transistors *MK3* and *MK4* form a keeper that holds *mux\_out* at VDD during read-1. The gate of PMOS *MK4* is driven by feedback from *mux\_out*, and the gate voltage of PMOS *MK3* is set by the leakage current replica circuit. Two further cascaded PMOS transistors, *MK1* and *MK2*, form the keeper of *GBL*: similarly, the gate of *MK2* is driven by feedback from *GBL*, and the gate voltage of *MK1* is set by the leakage current replica circuit. The basic concept of the leakage current replica (LCR) circuit is shown in Fig. 4.6 [4.3].
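Ignoring leakage and the keepers, the read path of the bit-interleaving multiplexer reduces to a one-hot select feeding a pre-charged node. The Python sketch below is a logic-level illustration under that simplification; the function name `mux_read` is hypothetical. The last lines chain it with a complemented local output (*LBL\_out* is the complement of the stored data, per Chap. 4.3.1) to confirm that *GBL* returns the stored bit.

```python
from typing import Sequence

INTERLEAVE = 16  # 16 columns share one multiplexer (bit-interleaving-16)


def mux_read(col_en: Sequence[int], lbl_out: Sequence[int]) -> int:
    """Return the GBL level (0/1) after one read; leakage and keepers are ignored."""
    if len(col_en) != INTERLEAVE or len(lbl_out) != INTERLEAVE:
        raise ValueError("expected 16 column enables and 16 local outputs")
    if sum(col_en) != 1:
        raise ValueError("col_en must be one-hot during a read")
    mux_out = 1  # pre-charged to VDD while PCGBL is low
    for en, out in zip(col_en, lbl_out):
        if en and out:
            mux_out = 0  # the selected series NMOS path discharges mux_out
    gbl = mux_out  # M1 discharges the pre-charged GBL only when mux_out falls
    return gbl


# Stored data in the 16 interleaved columns; column 5 is the one being read.
stored = [1, 0] * 8
col_en = [1 if c == 5 else 0 for c in range(INTERLEAVE)]
lbl_out = [1 - d for d in stored]  # the local stage delivers the complement
print(mux_read(col_en, lbl_out), stored[5])
```

Non-selected columns have no effect on *GBL* because their enable is low; with leakage included, those paths are exactly what the LCR keeper must counteract.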
The LCR keeper must source enough current to compensate the leakage current at the N-fast corner in a read-1 operation, yet be weak enough that *mux\_out* can be correctly discharged to GND at the N-slow corner in a read-0 operation. Fig. 4.7 shows the voltage level of *V\_KPR* under different PVT conditions.

Fig. 4.6 Basic concept of leakage current replica (LCR) keeper [4.3]

Fig. 4.7 Voltage level of *V\_KPR* in different PVT conditions

Next, consider the number of multiplexers in the SRAM array. Fig. 4.8 shows the access time and area overhead versus the number of bit-cells in a column per multiplexer at the SS corner & -40°C, the worst case for access time, with the local bit-line length fixed at 32 bit-cells. One multiplexer per 64 bit-cells in a column is the optimum, and one per 128 bit-cells is close to it. To reduce area overhead, one multiplexer per 128 bit-cells in the same column is used, as shown in Fig. 4.3 (b).

Fig. 4.8 Access time and area overhead depending on # of bit-cells in a column per multiplexer (a) VDD=1.1V (b) VDD=0.6V

#### 4.3.3 Simulation results

Tables 4.1 and 4.2 list the **sensing time** and **misreading time** of the multiplexer with LCR keeper described above. The **sensing time** is the time from *PCGBL* being pulled up to VDD until *GBL* is discharged to GND when both *col\_en[n]* and *LBL\_out[n]* are logic "1" (read-0 operation). The **misreading time** is the time from *PCGBL* being pulled up to VDD until *GBL* is incorrectly discharged to GND when *col\_en[n]* is logic "1" but *LBL\_out[n]* is logic "0" (read-1 operation). Both consider the worst case of 3σ local V<sub>T</sub> mismatch. Table 4.2 shows that the sensing time and misreading time remain clearly separated even at the FF (PFNF) and PSNF corners, with the misreading time at least about 5X the sensing time, so correct read operation is ensured. Fig. 4.9 shows the waveforms of the ripple bit-line read scheme and the multiplexer with LCR keeper.

Table 4.1 Sensing time in SS corner

| Operating voltage (V) | Sensing time, 125°C (ns) | Sensing time, -40°C (ns) |
|-----------------------|--------------------------|--------------------------|
| 1.2 | 0.43 | 0.41 |
| 1.1 | 0.48 | 0.49 |
| 1.0 | 0.57 | 0.70 |
| 0.9 | 0.79 | 1.36 |
| 0.8 | 1.37 | 4.16 |
| 0.7 | 3.30 | 20.5 |
| 0.6 | 11.2 | 198 |

Table 4.2 Misreading time vs. sensing time in FF & PSNF corner

| Operating voltage (V) | Sensing time, PFNF 125°C (ns) | Misreading time, PFNF 125°C (ns) | Sensing time, PSNF 125°C (ns) | Misreading time, PSNF 125°C (ns) |
|-----------------------|-------------------------------|----------------------------------|-------------------------------|----------------------------------|
| 1.2 | 0.36 | 11.5 | 0.38 | 16.5 |
| 1.1 | 0.38 | 11.3 | 0.40 | 16.6 |
| 1.0 | 0.42 | 11.0 | 0.44 | 16.6 |
| 0.9 | 0.47 | 10.7 | 0.51 | 16.3 |
| 0.8 | 0.59 | 10.2 | 0.66 | 15.9 |
| 0.7 | 0.88 | 9.62 | 1.03 | 15.4 |
| 0.6 | 1.76 | 9.04 | 2.20 | 14.9 |

Fig. 4.9 Waveform of the ripple BL read scheme and multiplexer

#### 4.4 Bit-line Leakage Current in SRAM Design

As discussed in Chap. 2, leakage current becomes increasingly critical in nano-scale SRAM design. Moreover, in the low-voltage region the $I_{on}/I_{off}$ ratio of transistors decreases exponentially, so the total bit-line leakage current may exceed the read current and cause incorrect read operations. In an SRAM array, leakage power becomes the major part of the total power consumption because inactive bit-cells far outnumber active ones. Recently, several bit-cells with modified read buffers have been proposed, such as a stacked structure that reduces the leakage current [4.4] and an additional PMOS in the read buffer that creates a data-independent leakage path [4.5].
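The read-current versus leakage condition stated at the start of this section can be turned into a quick bound on how many cells may share one bit-line. The Python sketch below assumes the worst data pattern, in which all N - 1 unselected cells leak onto the bit-line, and requires the read current to be at least a chosen margin times that total leakage. The currents and the margin in the example are hypothetical, not UMC 40nm data.

```python
import math


def max_cells_per_bitline(i_read: float, i_off: float, margin: float = 2.0) -> int:
    """Largest N with i_read >= margin * (N - 1) * i_off.

    i_read: worst-case read current of the selected cell (A)
    i_off:  worst-case leakage of one unselected cell (A)
    """
    return math.floor(i_read / (margin * i_off)) + 1


# Hypothetical operating points: Ion/Ioff shrinks when VDD is lowered.
high_vdd = max_cells_per_bitline(i_read=10e-6, i_off=30e-9)
low_vdd = max_cells_per_bitline(i_read=1e-6, i_off=15e-9)
print(high_vdd, low_vdd)
```

With these invented numbers the allowed bit-line length drops by roughly 5X at the lower supply, which illustrates why short local bit-lines, such as the 32-cell segments used in this design, are attractive at low VDD.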
Another modified SRAM bit-cell separates the read and write ports and uses a super cut-off scheme to reduce the leakage current [4.6]. A VGND scheme in the read buffer is also an effective way to reduce bit-line leakage current [4.7] [4.8], as shown in Fig. 4.10.

Fig. 4.10 VGND scheme in read buffer [4.8]

Another way to cope with bit-line leakage current is to add a keeper circuit to the local bit-line. Fig. 4.11 shows a typical domino keeper used with single-ended 8T bit-cells: a PMOS keeper transistor whose gate is driven by feedback from the local bit-line. Some keeper circuits use two cascaded PMOS transistors, one driven by feedback from the local bit-line and the other by a programmable delay signal [4.11]. The delay signal can be generated by a programmable inverter chain or by other means, as shown in Fig. 4.12 [4.12].

Fig. 4.11 Domino local bit-line keeper in single-ended 8T SRAM [4.11]

Fig. 4.12 Local BL keeper controlled by programmable inverter-chain [4.12]

Fig. 4.13 shows a positive-feedback sensing keeper for a single-ended read scheme. In stand-by mode, *SA\_EN* is logic "1", so *M2* is on and *M3* is off, which cuts the positive feedback loop. In a read operation, *SA\_EN* is logic "0" and *M1* turns on to sense the bit-line. If the bit-line is pulled down to GND, *M3* stays off and the positive feedback loop remains cut; if the bit-line stays at VDD, *M3* turns on and enables the positive feedback loop. *SA\_EN* is pulled back up to VDD at the end of every cycle.

Fig. 4.13 Positive feedback sensing keeper [4.13]

Fig. 4.14 Marginal bit-line leakage compensation (MBLC) scheme [4.14]

Fig. 4.14 shows the marginal bit-line leakage compensation (MBLC) scheme, which generates a compensation current *Icmp*. *Icmp* must be large enough to keep *RBL* at VDD under the worst-case data-dependent leakage current.
At the same time, *Icmp* must not be so large that *RBL* can no longer be discharged to logic "0" when the data-dependent leakage current is smallest. *Icmp* is derived from a replica bit-line, which sets the control signals *cmp<0:3>*.

#### 4.5 Proposed Local Bit-line Keeper Design

#### 4.5.1 Basic concept of Proposed Local Bit-line Keeper

Fig. 4.15 Basic concept of proposed local bit-line keeper design

This section introduces the proposed local bit-line keeper used in the DAWA 8T SRAM. Fig. 4.15 shows the upper local evaluation circuit (*LEV\_UP*) and the lower local evaluation circuit (*LEV\_DN*). The keeper consists of a sensing inverter whose input is the local bit-line and two cascaded PMOS transistors, *MP1* and *MP2*. The channel length of *MP1* and *MP2* is 2X the minimum length for two reasons: a keeper that is too strong would prevent the data of the selected cell from being sensed correctly, and a longer channel reduces the impact of local process variation on these two transistors. The gate of *MP1* is driven by the feedback signal of the local bit-line, and the gate of *MP2* is driven by the timing-delay signal *KPR\_SIG*.

#### 4.5.2 The Schematic of Proposed Local Bit-line Keeper

Fig. 4.16 Timing-delay signals generated by replica bit-line and discharged path

Fig. 4.16 shows how the timing-delay signal *KPR\_SIG* is generated from replica bit-lines and discharge paths. The external DC option bits [K1, K0] adjust how quickly *KPR\_SIG* is generated. Each *Dummy\_LBL* is connected to 32 replica DAWA 8T bit-cells. For these replica cells, the row-based *WL* is logic "0", the column-based *WWL*, *WWLB* and *VVSS* are logic "0", and *VVDD1* and *VVDD2* are at full VDD.
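The timing of *KPR\_SIG* can be estimated with a constant-current discharge approximation, $t = C\,\Delta V / I$: *Dummy\_LBL* starts at VDD, and *KPR\_SIG* falls once the line has dropped by $\Delta V$ to the trip point of the gate that produces it. The discharge current is the leakage of the 32 replica cells plus whatever additional paths the option bits [K1, K0] enable (described in the next paragraph). In the Python sketch below, the capacitance, trip fraction, per-cell leakage and per-path current are hypothetical, and so is the assumption that [K1, K0] is decoded as a binary count of identical paths.

```python
def kpr_sig_delay(c_dummy: float, vdd: float, trip_frac: float,
                  i_leak_cell: float, n_cells: int = 32,
                  n_paths: int = 0, i_path: float = 0.0) -> float:
    """Seconds until Dummy_LBL falls from VDD to trip_frac * VDD.

    Constant-current approximation: t = C * (VDD - V_trip) / I_total.
    """
    dv = vdd * (1.0 - trip_frac)
    i_total = n_cells * i_leak_cell + n_paths * i_path
    if i_total <= 0:
        return float("inf")  # no discharge current: KPR_SIG never falls
    return c_dummy * dv / i_total


def paths_enabled(k1: int, k0: int) -> int:
    """Hypothetical decode: [K1, K0] read as a binary count of enabled paths (0..3)."""
    return 2 * k1 + k0


# Hypothetical numbers: 10 fF dummy line, trip at VDD/2, 1 nA leakage per replica cell.
for k1, k0 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    t = kpr_sig_delay(10e-15, 1.1, 0.5, 1e-9,
                      n_paths=paths_enabled(k1, k0), i_path=20e-9)
    print(f"[K1,K0]=[{k1},{k0}]: {t * 1e9:.1f} ns")
```

The model reproduces the qualitative trends reported for this circuit: more enabled paths or higher per-cell leakage (as at the FF corner) make *KPR\_SIG* arrive sooner.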
The replica bit-cells are used to monitor the bit-line leakage current of unselected bit-cells. In stand-by mode, *LReset* = 0, each *Dummy\_LBL* is pre-charged to VDD, and *KPR\_SIG* is logic "1"; the cascaded PMOS transistors *MP1* and *MP2* are off, so no compensation current flows into the local bit-line. In a read operation, *LReset* is pulled up to logic "1". The external DC option bits [K1, K0] generate the signals *KP2*, *KP1* and *KP0*, which enable additional discharge paths of stacked NMOS transistors on *Dummy\_LBL*. With [K1, K0] = [0,0], no additional discharge path is enabled, and *KPR\_SIG* goes to logic "0" only when *Dummy\_LBL* is discharged to GND by the leakage current of the replica bit-cells; *KPR\_SIG* is then generated most slowly, at a rate set only by the leakage on *Dummy\_LBL*. With [K1, K0] = [1,1], *KPR\_SIG* is generated fastest. The option bits [K1, K0] thus tune the delay, and the circuit adaptively traces a suitable time to turn on *MP1* and *MP2* and supply compensation current to the local bit-line.

#### 4.5.3 Simulation Result

Fig. 4.17 shows how quickly the delay signal *KPR\_SIG* is generated for different [K1, K0] settings and PVT conditions. *KPR\_SIG* is generated fastest at the FF corner, where leakage is most severe, and slowest at the SS corner, where leakage is smallest.

Fig. 4.17 Time of generating timing-delay signals in different PVT conditions (a) VDD=1.1V (b) VDD=0.6V

Next, the leakage current of the DAWA 8T bit-cells is examined before and after adding the proposed local bit-line keeper, with a local bit-line length of 32 bits.
The worst case of bit-line leakage current occurs when the QB nodes of all unselected bit-cells are logic "1"; the local bit-line then suffers from the most critical leakage current. We define the "leakage time" as the time for BL to discharge from 100% VDD to 80% VDD during a read-1 operation in the FF corner at 125°C. We also consider the worst case of 3-sigma local $V_T$ mismatch in transistors MR1 and MR2. Simulation results are listed in Table 4.3. With the keeper circuit connected to the local bit-line and the external DC option bits set to [K1, K0] = [0,1], the leakage time exceeds 100ns, even under worst-case 3-sigma local $V_T$ mismatch in MR1 and MR2 in the FF corner at 125°C.

Fig. 4.18 Worst case of leakage current problem in local bit-line

Table 4.3 Leakage time in FF corner & 125°C, transient time = 100ns

| Operating voltage (V) | Without keeper (ns) | Without keeper, worst-case local VT mismatch (ns) | With keeper, [K1,K0]=[0,1] (ns) | With keeper, [K1,K0]=[0,1], worst-case local VT mismatch (ns) |
|---|---|---|---|---|
| 1.2 | 13.4 | 1.96 | >100 | >100 |
| 1.1 | 13.3 | 1.90 | >100 | >100 |
| 1.0 | 13.2 | 1.86 | >100 | >100 |
| 0.9 | 13.1 | 1.8144 | >100 | >100 |
| 0.8 | 12.9 | 1.74 | >100 | >100 |
| 0.7 | 12.6 | 1.67 | >100 | >100 |
| 0.6 | 12.1 | 1.57 | >100 | >100 |

Finally, we discuss the speed of the read-0 operation (discharging the local bit-line) after the keeper circuit is added. Table 4.3 shows that, with [K1, K0] = [0,1], the keeper effectively holds the local bit-line at VDD during a read-1 operation. We define the "**read time**" as the time from WL being pulled up to VDD until BL is discharged to GND in a read-0 operation. Fig. 4.19 shows the **read time** for [K1, K0] = [0,0] and [K1, K0] = [0,1]; the **read time** does not degrade when [K1, K0] = [0,1]. In summary, the local bit-line keeper effectively holds the local bit-line at VDD during read-1 operations while having little impact on the discharge speed during read-0 operations, which ensures correct read operation. Fig. 4.20 shows the basic waveform of the proposed local bit-line keeper.

Fig. 4.19 Read time in different PVT conditions (a) VDD=1.1V (b) VDD=0.6V

Fig. 4.20 Waveform of proposed local bit-line keeper design

#### 4.6 Summary

In this chapter, we first introduced the previous cascaded bit-line read scheme and the proposed ripple bit-line read scheme in Chap. 4.2 and Chap. 4.3, respectively. The ripple bit-line read scheme is easily adapted to single-ended large-signal sensing, in which simple inverters serve as local sense amplifiers, enhancing leakage and variation tolerance. It also reduces the parasitic capacitance of the global bit-line and improves area efficiency. We also showed that the optimal local bit-line length for the proposed DAWA 8T bit-cell is 32 bit-cells per local bit-line. A multiplexer that supports the bit-interleaving structure and transfers data from the local bit-line to the global bit-line and I/O buffer was also proposed; a leakage current replica keeper in this multiplexer copes with the leakage current problem and ensures correct read operation. Next, in Chap. 4.4, we introduced several previous local bit-line keeper designs used in single-ended sensing schemes. Finally, in Chap. 4.5, we introduced the basic concept and detailed schematic of the proposed local bit-line keeper, including the keeper PMOS transistors and the circuit that generates the programmable delay signal. Simulation results show that the proposed local bit-line keeper ensures correct read operation.
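The leakage-time criterion of Section 4.5.3 (BL drooping from 100% to 80% of VDD during a read-1 operation, with a pass threshold of 100ns) follows from t = C·ΔV/I. The sketch below is an illustrative assumption, not the simulation setup behind Table 4.3: it lumps the local bit-line into one capacitance, gives each of the 31 unselected cells on a 32-cell local bit-line the same constant leakage, and models the keeper as a constant current that is already on (in the actual design the keeper turns on only after the programmable $KPR\_SIG$ delay). The capacitance and current values are hypothetical.

```python
import math

# First-order model of local bit-line droop during a read-1 operation.
# Assumptions (hypothetical, for illustration): a lumped bit-line capacitance,
# a constant per-cell leakage from each unselected cell, and a keeper that
# supplies a constant current once it is on.


def leakage_time_ns(vdd: float, c_lbl_ff: float, n_unselected: int,
                    i_cell_na: float, i_keeper_na: float = 0.0,
                    droop: float = 0.2) -> float:
    """Time (ns) for the local bit-line to fall from VDD to (1 - droop) * VDD."""
    i_net_na = n_unselected * i_cell_na - i_keeper_na
    if i_net_na <= 0:
        return math.inf                      # keeper covers the leakage: BL holds
    # t = C * dV / I ; fF * V / nA = 1e-6 s = 1000 ns
    return c_lbl_ff * droop * vdd / i_net_na * 1e3


def meets_spec(t_ns: float, spec_ns: float = 100.0) -> bool:
    """Pass criterion used in Table 4.3: leakage time longer than 100 ns."""
    return t_ns > spec_ns


if __name__ == "__main__":
    # 32 cells per local bit-line -> 31 unselected cells during a read
    no_keeper = leakage_time_ns(1.1, c_lbl_ff=10.0, n_unselected=31, i_cell_na=5.0)
    with_keeper = leakage_time_ns(1.1, c_lbl_ff=10.0, n_unselected=31,
                                  i_cell_na=5.0, i_keeper_na=200.0)
    print(no_keeper, meets_spec(no_keeper))
    print(with_keeper, meets_spec(with_keeper))
```

Raising `i_cell_na` mimics the effect of worst-case local $V_T$ mismatch in MR1 and MR2: the leakage time shrinks in proportion, which is the trend visible in the mismatch columns of Table 4.3.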
## Chapter 5

# **Low VDD<sub>MIN</sub> 512Kb 8T SRAM Design in 40nm CMOS process**

### 5.1 Introduction

In this chapter, we design a low V<sub>DDMIN</sub> 512Kb 8T SRAM array. In Chap. 5.2, the 512Kb 8T SRAM array is designed using the read/write assist techniques discussed in Chap. 3, Chap. 4 and Chap. 5.3, and its floorplan, pin count, pin definitions and specification are introduced. Chap. 5.3 introduces the peripheral circuitry, such as the WL pulse-width control circuitry, power-gating WL driver, finite-state machine and I/O buffer. Chap. 5.4 presents the design implementation and test flow of this array, and Chap. 5.5 presents the post-layout simulation results, including power consumption and a comparison with recent low VDD<sub>MIN</sub> SRAM designs. The following simulations and analysis are based on the UMC 40nm LP process. This project was carried out with the support of Professor Ching-Te Chuang of the Digital VLSI Lab, Hao-I Yang of the LPMD Lab, and the IPD department of *Faraday Technology Corporation*. The design was taped out in June 2011 with the support of *Faraday Technology Corporation*.

### 5.2 Architecture of Proposed Low VDD<sub>MIN</sub> 512Kb 8T SRAM

Fig. 5.1 shows the floorplan of the proposed low VDD<sub>MIN</sub> 512Kb 8T SRAM. In advanced nano-scale processes, metal lines become much thinner, which makes parasitic capacitance and resistance increasingly significant and raises the difficulty of SRAM array design. We place the I/O buffer circuit (DIDO) at the bottom of the array to reduce the difficulty and complexity of metal-line routing. Short local bit-lines with the ripple bit-line read scheme are used to cope with the bit-line leakage current problem. Every 32 bit-cells in the same column share one local evaluation and pre-charge circuit.
Every 64 bit-cells in the same column share one adaptive VVSS/WWL driver, and every 128 bit-cells in the same column share a Column Buffer and a Multiplexer. We use a hierarchical WL structure, as shown in Fig. 5.1. The Global WL Decoder and the Local Bank Selection circuit (LBS) select the local bank to be read or written. Each Local-WL decoder (LDEC) contains 32 power-gating WL drivers that generate the row-based WL signals. Each global bit-line spans 512 bit-cells and each global word-line spans 1024 bit-cells, so the capacity of the SRAM array is 512Kb. The data width is 64 bits.

Fig. 5.1 The floorplan of Low VDD<sub>MIN</sub> 512Kb 8T SRAM

As shown in Fig. 5.1, the input latch is placed at the bottom center. Input signals such as the address signals A[12:0] and the WEB signal are latched in the input latch, while the data-in signals DI[63:0] are latched in the I/O buffer (DIDO). The SRAM array is enabled by the CSB signal. Address signals A[7:4] are decoded to select the local bank (blue lines in Fig. 5.1). Address signals A[12:8] are decoded into XP[31:0] to select the word-line in the selected local bank (green lines in Fig. 5.1). Address signals A[3:0] are decoded into YP[15:0] to select the column, enabling column-based signals such as WWL (purple lines in Fig. 5.1). The XP and YP signals are also gated by the timing signal WLE, which the local finite-state machine (LFSM) generates from the CLK and CSB signals. WLE enables the row-based signal WL and the column-based signal WWL in the DAWA 8T bit-cells, and its pulse width is controlled by the peripheral circuitry introduced in Chap. 5.3. The red area holds the replica columns of the local bit-line keeper, the blue area holds the replica column of the WL pulse-width controller, and the green area holds the replica column of the adaptive write-time tracing circuit. Fig.
5.2 shows the pin count and pin definitions of the proposed low VDD<sub>MIN</sub> 512Kb 8T SRAM, and its specification is listed in Table 5.1.

Fig. 5.2 Pin count and pin definition of proposed Low VDD<sub>MIN</sub> 512Kb 8T SRAM

Table 5.1 The specification of proposed low $VDD_{MIN}$ 512Kb 8T SRAM

| Parameter | Value |
|---|---|
| Macro Size | 512K bits (8192*64*1) |
| Process Technology | UMC 40nm Low-power CMOS process |
| Data-width | 64-bit |
| Address | 13-bit |
| Interleaving | 16-bit |
| Local BL length | 32-bit |
| Local WL width | 128-bit |
| Cell size | 1.44μm x 0.59μm = 0.85μm<sup>2</sup> (Logic Rule) |
| Chip Size | 0.9932mm<sup>2</sup> |
| Access time @ 1.1V TT 25°C | 1.69ns (post-sim) |
| Cycle time @ 1.1V TT 25°C | 1.99ns (post-sim) |
| Write power @ 1.1V TT 25°C | 13.5μW/MHz |
| Read power @ 1.1V TT 25°C | 6.87μW/MHz |
| VDDmin (post-sim) | 0.45V @ TT 25°C (post-sim) |

### 5.3 Peripheral Circuit

#### 5.3.1 Power-gating Word-line Driver

Fig. 5.3 Power-gating word-line driver

Fig. 5.3 shows the power-gating word-line driver in *XDEC* of Fig. 5.1. If *XP[n]* and the bank-select signal *SELE* are both logic "1", the word-line signal *WL[n]* is pulled up to logic "1"; when *XP[n]* is pulled down to logic "0", *WL[n]* is pulled down to logic "0". Because the last-stage inverter of the word-line driver must be large, it would draw a large leakage current in stand-by mode without power gating. We therefore apply power gating to this large last-stage inverter: every 32 inverters share one power-gating PMOS transistor *MP1*. Table 5.2 lists the stand-by leakage current and the active-mode slew rate with and without power gating in the FF corner at 125°C.
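The choice of device for MP1 can be checked with simple arithmetic on the per-driver values in Table 5.2 (FF corner, 125°C, VDD=1.1V). The sketch computes the leakage reduction factor and the relative slew-rate penalty of each power-gating option. The driver count passed to `standby_leak_ua` is a hypothetical parameter; the text does not state the total number of WL drivers in the macro.

```python
# Per-WL-driver values at FF corner, 125 C, VDD = 1.1 V, copied from Table 5.2.
TABLE_5_2 = {
    "no_power_gating": {"leak_na": 427.0, "slew_ps": 106.0},
    "pg_mp1_rvt": {"leak_na": 7.75, "slew_ps": 114.0},
    "pg_mp1_hvt": {"leak_na": 1.94, "slew_ps": 130.0},
}


def tradeoff(option: str, baseline: str = "no_power_gating") -> tuple:
    """Return (leakage reduction factor, WL slew degradation in %) vs. baseline."""
    base, opt = TABLE_5_2[baseline], TABLE_5_2[option]
    reduction = base["leak_na"] / opt["leak_na"]
    slew_penalty_pct = 100.0 * (opt["slew_ps"] - base["slew_ps"]) / base["slew_ps"]
    return reduction, slew_penalty_pct


def standby_leak_ua(option: str, n_drivers: int) -> float:
    """Total WL-driver stand-by leakage (uA) for a hypothetical driver count."""
    return TABLE_5_2[option]["leak_na"] * n_drivers / 1e3


if __name__ == "__main__":
    for opt in ("pg_mp1_rvt", "pg_mp1_hvt"):
        r, s = tradeoff(opt)
        print(f"{opt}: {r:.1f}x less leakage, {s:.1f}% slower WL slew")
```

With these values the RVT option gives roughly a 55X leakage reduction for about a 7.5% slower WL transition, whereas the HVT option gives about 220X for about 23%, which is the trade-off behind the RVT choice.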
In summary, we use an RVT PMOS transistor for MP1: it reduces the leakage current by about 55X while degrading the WL slew rate by only about 7.5% (106ps to 114ps).

Table 5.2 Leakage current and slew rate in power-gating driver (FF corner, 125°C, VDD=1.1V)

| Metric | Without power-gating | With power-gating, MP1 is RVT | With power-gating, MP1 is HVT |
|---|---|---|---|
| Leakage current per WL driver | 427nA | 7.75nA | 1.94nA |
| Slew rate of WL signal | 106ps | 114ps | 130ps |

#### 5.3.2 Finite-state Machine and WL pulse-width Controller

Fig. 5.4 shows the finite-state machine (*LFSM* in Fig. 5.1), which generates the timing signal *WLE* that enables the row-based signal *WL* and the column-based signal *WWL*. When *CSB* is logic "0", the SRAM array is enabled. A rising edge of *CLK* generates a pulse at node *CK\_2*, which discharges node *CC* to GND; once *CC* is at GND, *WLE* is pulled up to logic "1". *CK\_2* returns to logic "0" when *CKB*, delayed through the inverter chain, rises to VDD. Once *BLT* falls to logic "0", *CC* is pre-charged to VDD and *WLE* is pulled down to GND, disabling the row-based signal *WL* and the column-based signal *WWL*. The signal *BLT* is generated by the WL pulse-width controller shown in Fig. 5.5.

Fig. 5.5 shows the WL pulse-width controller. The signals $Dummy\_WLE$, $Dummy\_GBL\_PC$ and $Dummy\_XP$ are driven synchronously by the WLE signal. In stand-by mode, WLE is logic "0", so $Dummy\_WLE$, $Dummy\_GBL\_PC$ and $Dummy\_XP$ are also logic "0", and $Dummy\_LBL$ and BLT are pre-charged to VDD. Once WLE is pulled up to VDD, $Dummy\_WLE$, $Dummy\_GBL\_PC$ and $Dummy\_XP$ are pulled up to VDD together. Once $Dummy\_WL$ is pulled up to logic "1", $Dummy\_LBL$ is discharged to GND through $Dummy\_BLOCK$, and BLT is then discharged to GND.
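The self-timed loop formed by the LFSM and the WL pulse-width controller can be summarised as follows: the WLE pulse starts a fixed delay after the CLK edge and ends a replica-dependent delay later, when BLT falls and CC is pre-charged. The behavioural sketch below models only that event ordering. All stage delays, and the replica discharge time assigned to each [C1,C0] setting, are hypothetical numbers chosen to respect the ordering stated in the text ([0,0] slowest, [1,1] fastest).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class WleTiming:
    """Hypothetical stage delays (ps) of the LFSM / WL pulse-width loop."""
    clk_to_wle: float = 80.0    # CLK edge -> CK_2 pulse -> CC discharged -> WLE high
    wle_to_dummy: float = 40.0  # WLE -> Dummy_WLE / Dummy_XP -> Dummy_WL high
    reset: float = 60.0         # BLT low -> CC pre-charged -> WLE low
    # Dummy_LBL/BLT discharge time, indexed by 2*C1 + C0 ([0,0] slowest)
    replica: Tuple[float, float, float, float] = (400.0, 320.0, 250.0, 200.0)


def wle_pulse(c1: int, c0: int, csb: int = 0, t_clk: float = 0.0,
              timing: WleTiming = WleTiming()) -> Optional[Tuple[float, float]]:
    """Return (rise, fall) times of WLE for one cycle, or None if CSB = 1."""
    if csb:
        return None                            # array disabled: no WLE pulse
    rise = t_clk + timing.clk_to_wle
    blt_fall = rise + timing.wle_to_dummy + timing.replica[2 * c1 + c0]
    fall = blt_fall + timing.reset             # BLT going low ends the pulse
    return rise, fall


if __name__ == "__main__":
    for c1, c0 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        rise, fall = wle_pulse(c1, c0)
        print(f"[C1,C0]=[{c1},{c0}]: WLE width {fall - rise:.0f} ps")
```

For example, `wle_pulse(0, 0)` returns `(80.0, 580.0)`, a 500ps pulse, while `wle_pulse(1, 1)` gives a 300ps pulse. Because the end of the pulse is set by a replica bit-line rather than a fixed inverter chain, the real circuit's pulse width is intended to track PVT in the same direction as the array's bit-lines.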
The external DC option bits [C1,C0] tune the discharge speed of $Dummy\_LBL$ and BLT: [C1,C0]=[0,0] gives the slowest discharge and [C1,C0]=[1,1] the fastest. Fig. 5.6 shows the waveform of the finite-state machine and Fig. 5.7 shows the waveform of the WL pulse-width controller.

Fig. 5.5 WL pulse-width controller

Fig. 5.6 Waveform of finite-state machine

Fig. 5.7 Waveform of WL pulse-width controller

Fig. 5.8 I/O buffer

Fig. 5.8 shows the I/O buffer (*DIDO* in Fig. 5.1). In a read operation, it receives the data transferred through the local and global bit-lines; in a write operation, it drives the *DI\_in* data. In stand-by mode, *WLE* is logic "0" and *GBL* is pre-charged to VDD. In a read operation, *WLE* is logic "1" and *WE\_3* is logic "0", so NMOS transistors *MN1* and *MN2* are turned off, and the data on *GBL* is transferred to a cross-coupled NAND-based SR latch that performs dynamic-to-static conversion, improving stability. In a write operation, *WLE* is logic "1" and *WE\_3* is logic "1", so NMOS transistor *MN1* is turned on. Whether *MN2* is turned on depends on *DI\_through*, which is the *DI\_in* signal captured by a positive-edge D flip-flop. In a write cycle, when *DI\_in* is logic "0", *MN2* is turned on and discharges *GBL* through *MN1* and *MN2*, so *DO\_out* becomes logic "0". Conversely, when *DI\_in* is logic "1", *MN2* stays off, *GBL* is kept at VDD, and *DO\_out* is logic "1".

Fig. 5.9 Global word-line decoder

Fig. 5.9 shows the *Global WL Decoder* in Fig. 5.1, which generates the local-bank-select signals *SELE\_L* and *SELE\_R*. If *IN\_A* and *IN\_B* are both logic "1", $SELE\_L$ and $SELE\_R$ are pulled up to logic "1".
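The address split described in Chap. 5.2 (A[7:4] selects one of 16 local banks, A[12:8] drives XP[31:0], A[3:0] drives YP[15:0]) and the AND-style bank selection of the Global WL Decoder can be checked with a small behavioural model. This is a sketch under simplifying assumptions: it collapses SELE\_L/SELE\_R into one select line per bank, assumes a plain one-hot predecode of A[7:6] and A[5:4], and applies WLE gating only to XP and YP, as described for the XP/YP decoders.

```python
from typing import List, Tuple

N_BANKS, ROWS_PER_BANK, MUX = 16, 32, 16       # A[7:4], A[12:8], A[3:0]


def one_hot(value: int, width: int) -> List[int]:
    return [1 if i == value else 0 for i in range(width)]


def decode_address(addr: int, wle: int = 1) -> Tuple[List[int], List[int], List[int]]:
    """Split A[12:0] into bank select, XP[31:0] and YP[15:0] (behavioural sketch)."""
    if not 0 <= addr < 1 << 13:
        raise ValueError("address must fit in 13 bits")
    col = addr & 0xF                # A[3:0]  -> YP[15:0]
    bank = (addr >> 4) & 0xF        # A[7:4]  -> local bank
    row = (addr >> 8) & 0x1F        # A[12:8] -> XP[31:0]
    e = one_hot(bank >> 2, 4)       # predecoded from A[7:6] (e00..e11) -> IN_A
    d = one_hot(bank & 0x3, 4)      # predecoded from A[5:4] (d00..d11) -> IN_B
    # bank k is selected only when both of its predecoded inputs are high
    sele = [e[k >> 2] & d[k & 0x3] for k in range(N_BANKS)]
    xp = [wle & bit for bit in one_hot(row, ROWS_PER_BANK)]   # gated by WLE
    yp = [wle & bit for bit in one_hot(col, MUX)]             # gated by WLE
    return sele, xp, yp


if __name__ == "__main__":
    addr = (22 << 8) | (10 << 4) | 3
    sele, xp, yp = decode_address(addr)
    print(sele.index(1), xp.index(1), yp.index(1))   # 10 22 3
    # 16 banks x 32 rows x 1024 cells per row = 512Kb; 1024 / 16 = 64-bit words
    print(N_BANKS * ROWS_PER_BANK * 1024, 1024 // MUX)
```

The second print confirms the organisation arithmetic: 16 × 32 rows of 1024 cells give 524288 bits (512Kb), and a 16-way column mux over 1024 cells yields the 64-bit data width.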
Signal $IN\_A$ is determined by $\{e11, e10, e01, e00\}$, which are decoded from address signals A[7] and A[6], and signal $IN\_B$ is determined by $\{d11, d10, d01, d00\}$, which are decoded from address signals A[5] and A[4].

Fig. 5.10 Local bank selection circuit

Fig. 5.10 shows the local bank selection circuit (*LBS* in Fig. 5.1). In either a read or a write operation, the timing signal *WLE* and the bank-select signal *SELE* are both logic "1", so *LReset* is pulled up to VDD, turning off the pre-charge PMOS transistors connected to the local bit-lines. In a write operation, *LWE* is also logic "1", so *LReset\_w* is pulled up to VDD, turning on the stacked NMOS transistors that discharge the local bit-line to GND. When *WLE* is pulled down to logic "0", both *LReset* and *LReset\_w* are pulled down to logic "0", which turns on the pre-charge PMOS transistors and turns off the discharge NMOS transistors connected to the local bit-line in the local evaluation circuit. Fig. 5.11 shows the waveform of the local bank selection circuit.

Fig. 5.11 Waveform of local bank selection circuit

#### 5.3.5 XP and YP Decoder

Fig. 5.12 (a) XP decoder (b) YP decoder

Fig. 5.12 (a) shows the XP decoder in Fig. 5.1 and Fig. 5.12 (b) shows the YP decoder. Signals $TP_x[31:0]$ are decoded from address signals A[12:8], and signals $TP_y[15:0]$ are decoded from address signals A[3:0]. In either a read or a write operation, XP[n] is pulled up to logic "1" if both WLE and $TP_x[n]$ are logic "1"; similarly, YP[n] is pulled up to logic "1" if both WLE and $TP_y[n]$ are logic "1". Once WLE is pulled down to logic "0", all XP and YP signals are logic "0".

### 5.4 Design Implementation & Test-flow of Proposed Low VDD<sub>MIN</sub> 512Kb 8T SRAM

Fig. 5.13 shows the test chip and Fig. 5.14 shows the layout view of the proposed low $V_{DDMIN}$ 512Kb 8T SRAM macro.
The proposed low $V_{DDMIN}$ 512Kb 8T SRAM array is fabricated in the UMC 40nm LP process. The bit-cell area is 1.44$\mu$m x 0.59$\mu$m = 0.85$\mu$m<sup>2</sup> (standard logic rule) and the chip size is 1807.7$\mu$m x 549.52$\mu$m = 0.9932mm<sup>2</sup>. The techniques used in this design are listed below:

1. Power-gating word-line driver
2. Data-aware write-assist (DAWA) scheme
3. Ripple bit-line read scheme
4. Adaptive write-time tracing circuit
5. Bit-interleaving (16-bits)
6. Adaptive word-line pulse width controller
7. Hierarchical word-line structure
8. Leakage current replica keeper in multiplexer
9. Local bit-line keeper
10. Adaptive VVSS driver and WWL driver (the VVSS pulse encloses the WWL pulse)

Fig. 5.13 Low V<sub>DDMIN</sub> 512Kb 8T SRAM Design on Test Chip

Fig. 5.14 Layout view of low V<sub>DDMIN</sub> 512Kb 8T SRAM

Chap. 3.4, Chap. 4.5 and Chap. 5.3 introduced the external DC option control pins [W1,W0], [K1,K0] and [C1,C0]. The pulse width of the row-based WL signal, the pulse width of the column-based signal, and the timing of the local bit-line keeper's delay signal all affect the correct function of this SRAM array. Fig. 5.15 shows the test flow of the proposed low $V_{DDMIN}$ 512Kb 8T SRAM.

Fig. 5.15 Test flow of the proposed low $V_{DDMIN}$ 512Kb 8T SRAM

### 5.5 Post-layout Simulation Result

#### 5.5.1 Performance

Based on post-layout simulation, the proposed 512Kb 8T SRAM array can operate at 502.5MHz at VDD=1.1V (TT corner, 25°C) and at 28.41MHz at VDD=0.6V (TT corner, 25°C). Table 5.3 shows the access time and write time in the high-voltage region (VDD=1.1V) across process corners and temperatures, and Table 5.4 shows the same quantities in the low-voltage region (VDD=0.6V). Fig. 5.16 shows the post-layout simulation result of maximum operating frequency versus operating voltage.

Table 5.3 Post-simulation result (access time and write time), VDD=1.1V

| Process corner | Write time @125°C (ns) | Access time @125°C (ns) | Write time @25°C (ns) | Access time @25°C (ns) | Write time @-40°C (ns) | Access time @-40°C (ns) |
|---|---|---|---|---|---|---|
| PSNS | 1.37 | 2.05 | 1.41 | 2.12 | 1.43 | 2.13 |
| PTNT | 1.07 | 1.62 | 1.09 | 1.65 | 1.08 | 1.63 |
| PFNF | 0.87 | 1.33 | 0.86 | 1.31 | 0.85 | 1.28 |
| PFNS | 1.15 | 1.72 | 1.15 | 1.74 | 1.14 | 1.73 |
| PSNF | 1.04 | 1.55 | (illegible) | 1.58 | 1.08 | 1.57 |

Table 5.4 Post-simulation result (access time and write time), VDD=0.6V

| Process corner | Write time @125°C (ns) | Access time @125°C (ns) | Write time @25°C (ns) | Access time @25°C (ns) | Write time @-40°C (ns) | Access time @-40°C (ns) |
|---|---|---|---|---|---|---|
| PSNS | 17.73 | 28.47 | 54.36 | 82.93 | 183.7 | 268.3 |
| PTNT | 7.83 | 8.52 | 18.48 | 28.51 | 45.51 | 72.5 |
| PFNF | 4.07 | 6.35 | 7.74 | 12.73 | 16.17 | 26.85 |
| PFNS | 11.39 | 17.74 | 28.24 | 46.27 | 80.6 | 129.63 |
| PSNF | 7.87 | 10.36 | 20.59 | 26.48 | 58.22 | 77.24 |

Fig. 5.16 Post-layout simulation result: Frequency vs. VDD

Table 5.5 compares the specification of the proposed 512Kb SRAM with recent low-power SRAM designs.

Table 5.5 Specification compared to recent low-power SRAM design

| Company | Toshiba | Toshiba | Matsushita | NEC |
|---|---|---|---|---|
| Reference | JSSC 2009 [5.3] | ISSCC 2009 [5.4] | JSSC 2008 [5.5] | JSSC 2011 [5.6] |
| Technology | 65nm CMOS | 40nm CMOS | 45nm LSTP CMOS | 40nm CMOS |
| Operating Voltage | 0.7V | 0.8V~1.0V | 0.75V~1.6V | 1.0V~1.2V |
| SRAM cell | 6T | 6T | Conv. 8T | 6T |
| Capacity | 256Kb | 512Kb | 64Kb | 2.0Mb |
| Circuit Techniques | Cascaded BL, Self-write-back SA | Dual-Supply, Dynamic R/W Supply, Level programmable WL driver | Divided Read BL with shared local SA, Read end detecting replica circuit | Hierarchical cell architecture, Multi-step WL control |
| SRAM Speed | 28ns Cycle time @ 0.7V | 2.4ns Access Time @ 1.0V | 1.9ns Access Time @ 1.0V | 4.0ns Access time @ 1.0V, SS Corner 0°C |

| Company | Renesas | TI+MIT | Proposed SRAM Design |
|---|---|---|---|
| Reference | VLSI 2008 [5.7] | ISSCC 2011 [5.8] | (post-layout simulation result) |
| Technology | 45nm LSTP CMOS | 28nm HKMG | 40nm LP CMOS |
| Operating Voltage | 0.7V~1.3V | 0.6V~1.0V | 0.6V~1.2V |
| SRAM cell | 6T | 6T | Proposed DAWA 8T |
| Capacity | 4.5Mb | 128Kb | 512Kb |
| Circuit Techniques | Variation tolerant Suppressed WL and NBL | Large-signal sensing, CC-PMOS, hierarchical bit-line, boosting circuit | DAWA, Ripple BL scheme, LBL Keeper, Adaptive read/write time tracing circuit, power-gating WL driver |
| SRAM Speed | 3.2ns Access Time @1.1V | 2.5ns Cycle time @ 1.0V, 50ns Cycle time @ 0.6V | 2.0ns Cycle time @ 1.0V, 35.2ns Cycle time @ 0.6V, 2.11ns Access Time @1.0V |

#### 5.5.2 Power Consumption

Table 5.6 shows the power consumption of read/write operation and stand-by mode across process corners in the high-voltage region (VDD=1.1V), and Table 5.7 shows the same quantities in the low-voltage region (VDD=0.7V).

Table 5.6 Power consumption in R/W operation and STBY mode, VDD=1.1V

| VDD=1.1V, T=125°C | PSNS | PTNT | PFNF | PFNS | PSNF |
|---|---|---|---|---|---|
| Read operation | 4.23mW | 4.66mW | 5.22mW | 4.66mW | 4.69mW |
| Write operation | 12.6mW | 13.2mW | 14.4mW | 13.2mW | 13.3mW |
| STBY mode | 0.53mW | 0.69mW | 0.80mW | 0.71mW | 0.61mW |

Table 5.7 Power consumption in R/W operation and STBY mode, VDD=0.7V

| VDD=0.7V, T=125°C | PSNS | PTNT | PFNF | PFNS | PSNF |
|---|---|---|---|---|---|
| Read operation | 1.73mW | 1.69mW | 1.79mW | 1.74mW | 1.70mW |
| Write operation | 4.60mW | 4.98mW | 5.20mW | 4.86mW | 5.06mW |
| STBY mode | 0.32mW | 0.22mW | 0.29mW | 0.28mW | 0.21mW |

Fig. 5.17 shows the power-delay product of read/write operation and stand-by mode for VDD=1.2V~0.7V in the FF corner at 125°C.

Fig. 5.17 Power-delay-product of read/write operation

### 5.6 Summary

A 512Kb 8T SRAM array design is presented in this chapter. It uses the data-aware write-assist 8T SRAM bit-cell introduced in Chap. 3, together with the adaptive write-time tracing replica circuit and the adaptive VVSS and WWL driver presented in Chap. 3, and the ripple bit-line read scheme and local bit-line keeper presented in Chap. 4. In this chapter, we first introduced the floorplan and specification of the 512Kb SRAM array. We also presented an adaptive WL pulse-width controller that adaptively controls the WL pulse width.
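The headline figures of Chap. 5.5 are related by two unit conversions: the maximum frequency is the reciprocal of the cycle time, and a power figure in μW/MHz is numerically an energy per access in pJ. The short sketch below applies them to the cycle times quoted in Table 5.1 (1.99ns at 1.1V) and Table 5.5 (35.2ns at 0.6V) and to the per-access power figures of Table 5.1.

```python
def freq_mhz(cycle_time_ns: float) -> float:
    """Maximum operating frequency from cycle time: f = 1 / t_cycle."""
    return 1e3 / cycle_time_ns


def energy_pj(power_uw_per_mhz: float) -> float:
    """1 uW/MHz = 1e-6 W / 1e6 Hz = 1e-12 J, i.e. 1 pJ per access."""
    return power_uw_per_mhz * 1e-6 / 1e6 / 1e-12


if __name__ == "__main__":
    print(f"{freq_mhz(1.99):.1f} MHz at 1.1 V")    # 502.5 MHz
    print(f"{freq_mhz(35.2):.2f} MHz at 0.6 V")    # 28.41 MHz
    print(f"write {energy_pj(13.5):.2f} pJ, read {energy_pj(6.87):.2f} pJ per access")
```

Both conversions reproduce the quoted operating frequencies, and they show that a write costs roughly twice the energy of a read at 1.1V in this design.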
Power-gating WL drivers are also used to reduce the leakage current. The design implementation and test flow of the 512Kb SRAM array were introduced in Chap. 5.4. Post-layout simulation shows that the 512Kb 8T SRAM operates over a wide supply range (VDD=1.2V~0.6V) across all simulated process corners and temperatures. It operates at 502.5MHz at VDD=1.1V (TT corner, 25°C) and at 28.41MHz at VDD=0.6V (TT corner, 25°C). At VDD=1.1V (TT corner, 25°C), the power consumption of the read and write operations is 6.87$\mu$W/MHz and 13.5$\mu$W/MHz per operation, respectively. The VDD<sub>MIN</sub> is 0.45V in the TT corner at 25°C.

## Chapter 6

# **Conclusions & Future Work**

### 6.1 Conclusions

In modern IC design, low power is becoming increasingly significant, especially with the widespread use of portable devices such as PDAs, notebooks and cell phones. Equation (1.1) shows that scaling down the operating voltage is one of the most effective ways to reduce the total power consumption of a chip. Because SRAM occupies the largest area of a modern SoC, it dominates the performance and power consumption of the whole SoC. Voltage scaling in SRAM circuit design is therefore one of the most important topics in low-power IC design.

The conventional 6T SRAM bit-cell has been the most popular choice in previous SRAM designs. With advanced process scaling, however, its cell stability and write-ability are degraded by global and local process variation. Furthermore, the conventional 6T bit-cell is not suitable for the low-voltage region because of the read-disturb and half-select disturb problems. Consequently, an alternative bit-cell that works correctly over a wide operating-voltage range (especially in the low-voltage region) is needed, and this thesis also presents read/write assist circuitry that improves stability and write-ability.
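Equation (1.1) is given in Chapter 1 and is not reproduced here. Assuming it has the usual CMOS dynamic-power form $P_{dyn} = \alpha C V_{DD}^2 f$, the benefit of voltage scaling can be estimated as below. The sketch keeps the activity factor and switched capacitance fixed and ignores leakage power, so it indicates only the trend: energy per access falls quadratically with VDD, and power falls further when the clock slows down.

```python
def relative_dynamic_energy(vdd: float, vdd_ref: float = 1.1) -> float:
    """E_dyn per access ~ alpha * C * VDD**2; ratio to the reference supply."""
    return (vdd / vdd_ref) ** 2


def relative_dynamic_power(vdd: float, f_mhz: float,
                           vdd_ref: float = 1.1, f_ref_mhz: float = 502.5) -> float:
    """P_dyn ~ alpha * C * VDD**2 * f; ratio to the reference operating point."""
    return relative_dynamic_energy(vdd, vdd_ref) * f_mhz / f_ref_mhz


if __name__ == "__main__":
    for v in (1.1, 0.6, 0.45):
        print(f"VDD={v:.2f} V: energy/access x{relative_dynamic_energy(v):.3f}")
    # 0.6 V operating point from Chap. 5.5: 28.41 MHz
    print(f"power at 0.6 V / 28.41 MHz: x{relative_dynamic_power(0.6, 28.41):.4f}")
```

Under these assumptions, moving from 1.1V to 0.6V cuts dynamic energy per access to about 30%, and to about 17% at the 0.45V VDD<sub>MIN</sub>.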
First, an 8T SRAM bit-cell is presented that eliminates both the read-disturb problem and the write half-select disturb through two layers of pass-gates. One pass-gate is controlled by the row-based WL signal and the other by the column-based WWL signal, which eliminates write half-select disturb and supports a bit-interleaving structure. Bit-interleaving helps reduce the soft-error rate (SER) seen by each data word, which is increasingly critical in advanced processes. However, the two layers of pass-gates degrade write-ability, so we use a data-aware write-assist scheme in our 8T SRAM bit-cell. The adaptive write-time tracing technique and the adaptive VVSS/WWL driver are used to improve the stability of write operations.

Next, we introduced the ripple bit-line read scheme, which enhances tolerance to leakage current and process variation. This read scheme is easily adapted to single-ended large-signal sensing, in which a simple inverter serves as the local sense amplifier to reduce area overhead. The ripple bit-line read scheme also reduces the parasitic capacitance and leakage current of the global bit-line. We also identified an optimal local bit-line length of 32 bit-cells per local bit-line. Coping with bit-line leakage current is very important for read stability, so a local bit-line keeper with a delay-signal generation circuit is also introduced. Simulation results show that read operation remains correct even under the worst case of bit-line leakage current.

Finally, a 512Kb 8T SRAM array fabricated in the UMC 40nm low-power (LP) CMOS process is presented. An adaptive WL pulse-width controller is also used in this SRAM array, and power-gating WL drivers reduce the leakage current and hence the stand-by power consumption. The 512Kb 8T SRAM operates over a wide supply range (VDD=1.2V~0.6V). It operates at 502.5MHz at VDD=1.1V (TT corner, 25°C) and at 28.41MHz at VDD=0.6V (TT corner, 25°C).
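The soft-error benefit of bit-interleaving mentioned above comes from spreading the bits of one word across the row: with 16-way interleaving, a particle strike that upsets up to 16 physically adjacent cells on a word-line flips at most one bit in any single word, which a single-error-correcting code, if one were added, could then repair. The column placement used in the sketch (bit b of the word selected by YP[y] sits at column 16·b + y) is a hypothetical layout consistent with the 16-way interleaving and 64-bit data width in Table 5.1.

```python
from collections import Counter

INTERLEAVE = 16                     # 16-way bit-interleaving (Table 5.1)
WORD_BITS = 64                      # 64-bit data width
ROW_CELLS = INTERLEAVE * WORD_BITS  # 1024 cells per word-line


def physical_column(bit: int, yp: int) -> int:
    """Hypothetical placement: bit `bit` of the word selected by YP[yp]."""
    return bit * INTERLEAVE + yp


def word_of_column(col: int) -> int:
    """Which YP-selected word a physical column belongs to."""
    return col % INTERLEAVE


def max_bits_per_word(first_col: int, span: int) -> int:
    """Largest number of flipped bits in one word for an upset of `span` adjacent cells."""
    cols = range(first_col, min(first_col + span, ROW_CELLS))
    hits = Counter(word_of_column(c) for c in cols)
    return max(hits.values(), default=0)


if __name__ == "__main__":
    print(max_bits_per_word(100, 16))   # each word sees at most one flipped bit
    print(max_bits_per_word(100, 17))   # a 17-cell upset hits one word twice
```

Without interleaving, the same 16-cell upset would land entirely in one word and corrupt 16 of its bits.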
At VDD=1.1V, the power consumption of the read and write operations is 6.87μW/MHz and 13.5μW/MHz per operation, respectively. The VDD<sub>MIN</sub> is 0.45V in the TT corner at 25°C.

### 6.2 Future Work

A boosted WL scheme could be considered: boosting the row-based WL signal strengthens the outer pass-gate *MR1*, improving read stability and write-ability simultaneously. A voltage detector could also be used to sense the operating voltage, so that WL is boosted in the low-VDD region but not in the high-VDD region.

A pipeline scheme could also be used in this 512Kb 8T SRAM design to improve the operating frequency. This would require inserting master-slave latches between pipeline stages to latch the data. Fig. 6.1 shows the pipeline scheme of an SRAM design.

Fig. 6.1 Pipeline scheme of SRAM Design

### Reference

#### Chapter 1

- [1.1] K. Zhang (ed.), *Embedded Memories for Nano-Scale VLSI*, Series on Integrated Circuits and Systems, Springer Science+Business Media, LLC, 2009.
- [1.2] M. Qazi, M.E. Sinangil, A.P. Chandrakasan, "Challenges and Directions for Low-Voltage SRAM," *IEEE Design & Test of Computers*, vol.28, no.1, pp.32-43, Jan.-Feb. 2011.
- [1.3] F. Hamzaoglu, Y. Wang, P. Kolar, W. Liqiong, Y.G. Ng, U. Bhattacharya and K. Zhang, "Bit Cell Optimizations and Circuit Techniques for Nanoscale SRAM Design," *IEEE Design & Test of Computers*, vol.28, no.1, pp.22-31, Jan.-Feb. 2011.
- [1.4] D. Markovic, C.-C. Wang, L.P. Alarcon, T.-T. Liu and J.M. Rabaey, "Ultralow-Power Design in Near-Threshold Region," *IEEE Proceedings*, vol.98, no.2, pp.237-252, Feb. 2010.
- [1.5] R.G. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester and T. Mudge, "Near-Threshold Computing: Reclaiming Moore's Law through Energy Efficient Integrated Circuits," *IEEE Proceedings*, vol.98, no.2, pp.253-266, Feb. 2010.
- [1.6] G. Chen, D. Sylvester, D. Blaauw and T. Mudge, "Yield-Driven Near-Threshold SRAM Design," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol.18, no.11, pp.1590-1598, Nov. 2010.
- [1.7] L. Chang, R.K. Montoye, Y. Nakamura, K.A. Batson, R.J. Eickemeyer, R.H. Dennard, W. Haensch, D. Jamsek, "An 8T-SRAM for Variability Tolerance and Low-Voltage Operation in High-Performance Caches," *IEEE Journal of Solid-State Circuits*, vol.43, no.4, pp.956-963, April 2008.

#### Chapter 2

- [2.1] A. Pavlov and M. Sachdev, "CMOS SRAM Circuit Design and Parametric Test in Nano-Scaled Technologies," *Springer Verlag*, June 2008.
- [2.2] J. P. Kulkarni, K. Roy, "Ultralow-Voltage Process-Variation-Tolerant Schmitt-Trigger-Based SRAM Design," *IEEE Transactions on Very Large Scale Integration Systems*, 2011.
- [2.3] E. Seevinck, F.J. List and J. Lohstroh, "Static-Noise Margin Analysis of MOS SRAM Cells," *IEEE Journal of Solid-State Circuits*, vol.22, no.5, pp.748-754, 1987.
- [2.4] K. Takeda, H. Ikeda, Y. Hagihara, M. Nomura, H. Kobatake, "Redefinition of Write Margin for Next-Generation SRAM and Write-Margin Monitoring Circuit," *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp.2602-2611, 6-9 Feb. 2006.
- [2.5] A.-T. Do, J.Y.S. Low, J.Y.L. Low, Z.-H. Kong, X. Tan, K.-S. Yeo, "An 8T Differential SRAM with Improved Noise Margin for Bit-Interleaving in 65 nm CMOS," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol.58, no.6, pp.1252-1263, June 2011.
- [2.6] L. Chang, R.K. Montoye, Y. Nakamura, K.A. Batson, R.J. Eickemeyer, R.H. Dennard, W. Haensch, D. Jamsek, "An 8T-SRAM for Variability Tolerance and Low-Voltage Operation in High-Performance Caches," *IEEE Journal of Solid-State Circuits*, vol.43, no.4, pp.956-963, April 2008.
- [2.7] E. Grossar, M. Stucchi, K. Maex, and W. Dehaene, "Read Stability and Write-Ability Analysis of SRAM Cells for Nanometer Technologies," *IEEE Journal of Solid-State Circuits*, vol.41, no.11, pp.2577-2588, Nov. 2006.
- [2.8] S. Natarajan, A. Shubat, "SE5 SRAM design in the nanoscale era," *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp.366-367, 10 Feb. 2005.
- [2.9] C.-T. Chuang, S. Mukhopadhyay, J.-J. Kim, K. Kim, and R. Rao, "High-Performance SRAM in Nanoscale CMOS: Design Challenges and Techniques," *IEEE International Workshop on Memory Technology, Design, and Testing*, pp.4-12, 3-5 Dec. 2007.
- [2.10] C.-T. Chuang, "Tutorial 12: Challenges and Opportunities of Digital Design in Nanoscale CMOS," *IEEE International Symposium on Circuits and Systems*, 27-30 May 2007.
- [2.11] S. Leomant, A. Turier, L.B. Ammar, A. Amara, "SRAM dedicated PCMs for leakage characterization in nanometer CMOS technologies," *International Conference on Design and Test of Integrated Systems in Nanoscale Technology*, pp.316-321, 5-7 Sept. 2006.
- [2.12] Q. Chen, S. Mukhopadhyay, A. Bansal, K. Roy, "Circuit-aware Device Design Methodology for Nanometer Technologies: A Case Study for Low Power SRAM Design," *Proceedings, Design, Automation and Test in Europe*, vol.1, pp.1-6, 6-10 March 2006.
- [2.13] K. Roy, S. Mukhopadhyay, H. Mahmoodi-Meimand, "Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits," *IEEE Proceedings*, vol.91, no.2, pp.305-327, Feb. 2003.
- [2.14] K. Nii, Y. Tsukamoto, T. Yoshizawa, S. Imaoka, Y. Yamagami, T. Suzuki, A. Shibayama, H. Makino, S. Iwade, "A 90-nm low-power 32-kB embedded SRAM with gate leakage suppression circuit for mobile applications," *IEEE Journal of Solid-State Circuits*, vol.39, no.4, pp.684-693, April 2004.
- [2.15] H.J.M. Veendrick, "Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits," *IEEE Journal of Solid-State Circuits*, vol.19, no.4, pp.468-473, Aug. 1984.
- [2.16] R. Jotwani, S. Sundaram, S. Kosonocky, A. Schaefer, V. Andrade, G. Constant, A. Novak and S. Naffziger, "An x86-64 core implemented in 32nm SOI CMOS," *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp.106-107, 7-11 Feb. 2010.
- [2.17] M.-F. Chang, J.-J. Wu, K.-T. Chen, Y.-C. Chen, Y.-H. Chen, R. Lee, H.-J. Liao and H. Yamauchi, "A Differential Data-Aware Power-Supplied (D<sup>2</sup>AP) 8T SRAM Cell With Expanded Write/Read Stabilities for Lower VDDmin Applications," *IEEE Journal of Solid-State Circuits*, vol.45, no.6, pp.1234-1245, June 2010.
- [2.18] J.-J. Wu, Y.-H. Chen, M.-F. Chang, P.-W. Chou, C.-Y. Chen, H.-J. Liao, M.-B. Chen, Y.-H. Chu, W.-C. Wu, H. Yamauchi, "A Large σV<sub>TH</sub>/VDD Tolerant Zigzag 8T SRAM with Area-Efficient Decoupled Differential Sensing and Fast Write-Back Scheme," *IEEE Journal of Solid-State Circuits*, vol.46, no.4, pp.815-827, April 2011.
- [2.19] R.V. Joshi, K. Rouwaida, V. Ramadurai, "A Novel Column-Decoupled 8T Cell for Low-Power Differential and Domino-Based SRAM Design," *IEEE Transactions on Very Large Scale Integration Systems*, vol.19, no.5, pp.869-882, May 2011.
- [2.20] J.P. Kulkarni, K. Kim, S.P. Park and K. Roy, "Process Variation Tolerant SRAM Array for Ultra Low Voltage Applications," *Design Automation Conference*, pp.108-113, 8-13 June 2008.
- [2.21] S. Okumura, Y. Iguchi, S. Yoshimoto, H. Fujiwara, H. Noguchi, K. Nii, H. Kawaguchi, M. Yoshimoto, "A 0.56-V 128kb 10T SRAM using column line assist (CLA) scheme," *IEEE Quality of Electronic Design*, pp.659-663, 16-18 March 2009.
- [2.22] I. J. Chang, J.-J. Kim, S. P. Park, and K. Roy, "A 32 kb 10T sub-threshold SRAM array with bit-interleaving and differential read-scheme in 90 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol.44, no.2, pp.650-658, Feb. 2009.
- [2.23] N. Verma, and A. P. Chandrakasan, "A 256 kb 65 nm 8T sub-threshold SRAM employing Sense-amplifier Redundancy," *IEEE Journal of Solid-State Circuits*, vol.43, no.1, pp.141-14, Jan. 2008.
- [2.24] M. E. Sinangil, N. Verma, and A. P. Chandrakasan, "A Reconfigurable 8T Ultra-Dynamic Voltage Scalable (U-DVS) SRAM in 65nm CMOS," *IEEE Journal of Solid-State Circuits*, vol.44, no.11, pp.3163-317, Nov. 2009.
- [2.25] T.-H. Kim, J. Liu, J. Kean, and C. H. Kim, "A 0.2 V, 480 kb sub-threshold SRAM with 1 k cells per bit-line for ultra-low-voltage computing," *IEEE Journal of Solid-State Circuits*, vol.43, no.2, pp.518-529, Feb. 2008.
- [2.26] M. Qazi, K. Stawiasz, L. Chang, A.P. Chandrakasan, "A 512kb 8T SRAM Macro Operating Down to 0.57 V With an AC-Coupled Sense Amplifier and Embedded Data-Retention-Voltage Sensor in 45 nm SOI CMOS," *IEEE Journal of Solid-State Circuits*, vol.46, no.1, pp.85-96, Jan. 2011.
- [2.27] M. Yabuuchi, K. Nii, Y. Tsukamoto, S. Ohbayashi, Y. Nakase, H. Shinohara, "A 45nm 0.6V cross-point 8T SRAM with negative biased read/write assist," *IEEE Symposium on VLSI Circuits*, pp.158-159, 16-18 June 2009.
- [2.28] D.P. Wang, H.J. Liao, H. Yamauchi, Y.H. Chen, Y.L. Lin, S.H. Lin, D.C. Liu, H.C. Chang, W. Hwang, "A 45nm dual-port SRAM with write and read capability enhancement at low voltage," *IEEE International SOC Conference*, pp.211-214, 26-29 Sept. 2007.
- [2.29] H. Pilo, I. Arsovski, K. Batson, G. Braceras, J. Gabric, R. Houle, S. Lamphier, F. Pavlik, A. Seferagic, L.-Y. Chen, S.-B. Ko, C. Radens, "A 64Mb SRAM in 32nm High-k metal-gate SOI technology with 0.7V operation enabled by stability, write-ability and read-ability enhancements," *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp.254-256, 20-24 Feb. 2011.
- [2.30] S. Mukhopadhyay, R.M. Rao, J.J. Kim, C.T. Chuang, "SRAM Write-Ability Improvement With Transient Negative Bit-Line Voltage," *IEEE Transactions on Very Large Scale Integration Systems*, vol.19, no.1, pp.24-32, Jan. 2011.
- [2.31] Y. Fujimura, O. Hirabayashi, T. Sasaki, A. Suzuki, A. Kawasumi, Y. Takeyama, K. Kushida, G. Fukano, A. Katayama, Y. Niki, T. Yabe, "A configurable SRAM with constant-negative-level write buffer for low-voltage operation with 0.149μm² cell in 32nm high-k metal-gate CMOS," *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp.348-349, 7-11 Feb. 2010.
- [2.32] O. Hirabayashi, A. Kawasumi, A. Suzuki, Y. Takeyama, K. Kushida, T. Sasaki, A. Katayama, G. Fukano, Y. Fujimura, T. Nakazato, Y. Shizuki, N. Kushiyama, T. Yabe, "A process-variation-tolerant dual-power-supply SRAM with 0.179μm² Cell in 40nm CMOS using level-programmable word-line driver," *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp.458-459, 459a, 8-12 Feb. 2009.
- [2.33] Y. Morita, H. Fujiwara, H. Noguchi, Y. Iguchi, K. Nii, H. Kawaguchi, M. Yoshimoto, "An Area-Conscious Low-Voltage-Oriented 8T-SRAM Design under DVS Environment," *IEEE Symposium on VLSI Circuits*, pp.256-257, 14-16 June 2007.
- [2.34] K. Nii, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, S. Imaoka, H. Makino, Y. Yamagami, S. Ishikura, T. Terano, T. Oashi, K. Hashimoto, A. Sebe, S. Okazaki, K. Satomi, H. Akamatsu, H. Shinohara, "A 45-nm Bulk CMOS Embedded SRAM With Improved Immunity Against Process and Temperature Variations," *IEEE Journal of Solid-State Circuits*, vol.43, no.1, pp.180-191, Jan. 2008.
- [2.35] P. Kolar, E. Karl, U. Bhattacharya, F. Hamzaoglu, H. Nho, Y.-G. Ng, Y. Wang, K. Zhang, "A 32 nm High-K Metal Gate SRAM With Adaptive Dynamic Stability Enhancement for Low-Voltage Operation," *IEEE Journal of Solid-State Circuits*, vol.46, no.1, pp.76-84, Jan. 2011.
- [2.36] K. Takeda, T. Saito, S. Asayama, Y. Aimoto, H. Kobatake, S. Ito, T. Takahashi, M. Nomura, K. Takeuchi, Y. Hayashi, "Multi-Step Word-Line Control Technology in Hierarchical Cell Architecture for Scaled-Down High-Density SRAMs," *IEEE Journal of Solid-State Circuits*, vol.46, no.4, pp.806-814,

#### Chapter 3

- [3.1] Y. Morita, H. Fujiwara, H. Noguchi, Y. Iguchi, K. Nii, H. Kawaguchi, M.
Yoshimoto, "An Area-Conscious Low-Voltage-Oriented 8T-SRAM Design under DVS Environment," *IEEE Symposium on VLSI Circuits*, pp.256-257, 14-16 June 2007. - [3.2] R.E. Aly, M.A. Bayoumi, "Low-Power Cache Design Using 7T SRAM Cell," *IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS*, vol. 54, no. 4, pp.318 322, Apr. 2007. - [3.3] M.-T. Chang, W. Hwang, "A fully-differential sub-threshold SRAM cell with auto-compensation," *IEEE Asia Pacific Conference on Circuits and Systems*, pp.1771-1774, Nov. 30 2008-Dec. 3 2008. - [3.4] A. Sil, S. Ghosh, M. Bayoumi, "A novel 8T SRAM cell with improved read-SNM," *IEEE Northeast Workshop on Circuits and Systems*, pp.1289-1292, 5-8 Aug. 2007. - [3.5] M. Yamaoka, N. Maeda, Y. Shinozaki, Y. Shimazaki, K. Nii, S. Shimada, K. Yanagisawa, and T. Kawahara, "90-nm Process-Variation Adaptive Embedded SRAM Modules With Power-Line-Floating Write Technique," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 3, pp.706–711, March. 2006. - [3.6] S. Ishikura, M. Kurumada, T. Terano, Y. Yamagami, N. Kotani, K. Satomi, K. Nii, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, T. Oashi, H. Makino, H. Shinohara and H. Akamatsu, "A 45 nm 2-port 8T-SRAM Using Hierarchical Replica Bit-line Technique With Immunity From Simultaneous R/W Access Issues," *IEEE Journal of Solid-State Circuits*, vol.43, no.4, pp.938-945, April 2008. ### **Chapter 4** [4.1] K. Kushida, A. Suzuki, G. Fukano, A. Kawasumi, O. Hirabayashi, Y. Takeyama, T. Sasaki, A. Katayama, Y. Fujimura, T. Yabe, "A 0.7 V Single-Supply SRAM With 0.495 μm² Cell in 65 nm Technology Utilizing Self-Write-Back Sense Amplifier and Cascaded Bit Line Scheme," *IEEE* - Journal of Solid-State Circuits, vol.44, no.4, pp.1192-1198, April 2009. - [4.2] S. Ishikura, M. Kurumada, T. Terano, Y. Yamagami, N. Kotani, K. Satomi, K.Nii, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, T. Oashi, H. Makino, H. Shinohara and H. 
Akamatsu, "A 45 nm 2-port 8T-SRAM Using Hierarchical Replica Bit-line Technique With Immunity From Simultaneous R/W Access Issues," *IEEE Journal of Solid-State Circuits*, vol.43, no.4, pp.938-945, April 2008 - [4.3] Y. Lih, N. Tzartzanis, W.W. Walker, "A Leakage Current Replica Keeper for Dynamic Circuits," *IEEE Journal of Solid-State Circuits*, vol.42, no.1, pp.48-55, Jan. 2007. - [4.4] B. Calhoun and A. Chandrakasan, "A 256 kb sub-threshold SRAM in 65 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 42, no. 3, pp. 680–688, Mar. 2007. - [4.5] T.-H. Kim, J. Liu, J. Kean, and C.H. Kim, "A 0.2 V, 480 kb sub-threshold SRAM with 1 k cells per bit-line for ultra-low-voltage computing," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 2, pp. 518–529, Feb. 2008. - [4.6] Z. Liu, Kursun, V., "High Read Stability and Low Leakage SRAM Cell Based on Data/Bit-line Decoupling," *IEEE International SOC Conference*, pp.115-116, 24-27 Sept. 2006. - [4.7] C.-H. Lo, S.-Y. Huang, "P-P-N Based 10T SRAM Cell for Low-Leakage and Resilient Sub-threshold Operation," *IEEE Journal of Solid-State Circuits*, vol.46, no.3, pp.695-704, March 2011. - [4.8] N. Verma, and A. P. Chandrakasan, "A 256 kb 65 nm 8T sub-threshold SRAM employing Sense-amplifier Redundancy," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 1, pp. 141–14, Jan. 2008. - [4.9] A. Alvandpour, R.K. Krishnamurthy, K. Soumyanath, S.Y. Borkar, "A sub-130-nm conditional keeper technique," *IEEE Journal of Solid-State Circuits*, vol.37, no.5, pp.633-638, May 2002. - [4.10] A. Raychowdhury, B. Geuskens, J. Kulkarni, J. Tschanz, K. Bowman, T. Karnik, S-L. Lu, V. De, M.M Khellah, "PVT-and-aging adaptive word-line boosting for 8T SRAM power reduction," *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp.352-353, 7-11 Feb. 2010. - [4.11] R. Jotwani, S. Sundaram, S. Kosonocky, A. Schaefer, V. Andrade, G. - Constant, A. Novak and S. 
Naffziger "An x86-64 core implemented in 32nm SOI CMOS," *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp.106-107, 7-11 Feb. 2010. - [4.12] A. Agarwal, S. Hsu, S. Mathew, M. Anders, H. Kaul, F. Sheikh, R. Krishnamurthy, "A 32nm 8.3GHz 64-entry × 32b variation tolerant near-threshold voltage register file", *IEEE Symposium on VLSI Circuits*, pp.105-106, 16-18 June 2010. - [4.13] M.-H. Tu, J.-Y. Lin, M.-C. Tsai; S.-J. Jou, C.-T. Chuang, "Single-Ended Sub-threshold SRAM With Asymmetrical Write/Read-Assist," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol.57, no.12, pp.3039-3047, Dec. 2010 - [4.14] T.-H. Kim, J. Liu, and C. H. Kim, "A Voltage Scalable 0.26 V, 64 kb 8T SRAM With V<sub>min</sub> Lowering Techniques and Deep Sleep Mode," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 6, pp. 1785–1795, June. 2009. ## Chapter 5 - [5.1] M.-T. Chang, P.-T. Huang, W. Hwang, "A robust ultra-low power asynchronous FIFO memory with self-adaptive power control," 2008 IEEE International SOC Conference, pp.175-178, 17-20 Sept. 2008. - [5.2] J. Pille, C. Adams, T. Christensen, S.R. Cottier,, S. Ehrenreich, F. Kono, D. Nelson, O, Takahashi, S. Tokito, O. Torreiter, O. Wagner, D. Wendel, "Implementation of the Cell Broadband Engine™ in 65 nm SOI Technology Featuring Dual Power Supply SRAM Arrays Supporting 6 GHz at 1.3 V," *IEEE Journal of Solid-State Circuits*, vol.43, no.1, pp.163-171, Jan. 2008. - [5.3] K. Kushida, A. Suzuki, G. Fukano, A. Kawasumi, O. Hirabayashi, Y. Takeyama, T. Sasaki, A. Katayama, Y. Fujimura, T. Yabe, "A 0.7 V Single-Supply SRAM With 0.495 μm² Cell in 65 nm Technology Utilizing Self-Write-Back Sense Amplifier and Cascaded Bit Line Scheme," *IEEE Journal of Solid-State Circuits*, vol.44, no.4, pp.1192-1198, April 2009. - [5.4] O, Hirabayashi, A. Kawasumi, A. Suzuki, Y. Takeyama, K. Kushida, T. Sasaki, A. Katayama, G. Fukano, Y. Fujimura, T. Nakazato, Y. Shizuki, N. Kushiyama, T. 
Yabe, "A process-variation-tolerant dual-power-supply SRAM with 0.179μm² Cell in 40nm CMOS using level-programmable word-line driver," *IEEE International Solid-State Circuits Conference* - - Digest of Technical Papers, pp.458-459,459a, 8-12 Feb. 2009. - [5.5] S. Ishikura, M. Kurumada, T. Terano, Y. Yamagami, N. Kotani, K. Satomi, K. Nii, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, T. Oashi, H. Makino, H. Shinohara and H. Akamatsu, "A 45 nm 2-port 8T-SRAM Using Hierarchical Replica Bit-line Technique With Immunity From Simultaneous R/W Access Issues," *IEEE Journal of Solid-State Circuits*, vol.43, no.4, pp.938-945, April 2008. - [5.6] K. Takeda, T. Saito, S. Asayama, Y. Aimoto H. Kobatake, S. Ito, T. Takahashi, M. Nomura, K. Takeuchi, Y. Hayashi, "Multi-Step Word-Line Control Technology in Hierarchical Cell Architecture for Scaled-Down High-Density SRAMs," *IEEE Journal of Solid-State Circuits*, vol.46, no.4, pp.806-814, April 2011. - [5.7] K. Nii, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, Y. Oda, K. Usui, T. Kawamura, N. Tsuboi, T. Iwasaki, K. Hashimoto, H. Makino, H. Shinohara, "A 45-nm single-port and dual-port SRAM family with robust read/write stabilizing circuitry under DVFS environment," *IEEE Symposium on VLSI Circuits*, pp.212-213, 18-20 June 2008. - [5.8] M.E. Sinangil, H. Mair, A.P Chandrakasan, "A 28nm high-density 6T SRAM with optimized peripheral-assist circuits for operation down to 0.6V," IEEE International Solid-State Circuits Conference Digest of Technical Papers pp.260-262, 20-24 Feb. 2011. ### Vita #### Chien-Hen Chen 陳建亨 #### PERSONAL INFORMATION Birth Date: Aug. 16, 1987 Birth Place: Kaohsiung, TAIWAN Email: kcnevo4@gmail.com Address: Department of Electronics Engineering National Chiao Tung University 1001 Ta-Hsueh Road Hsin-Chu, Taiwan 30010, R.O.C #### **EDUCATION** 08/2009 – 09/2011 M.S. in Electronics Engineering, National Chiao Tung University Thesis: Low VDD<sub>MIN</sub> 512Kb 8T SRAM Design in 40nm Process 1896 09/2005 – 06/2009 B.S. 
in Electrics Engineering, National Cheng Kung University #### **PUBLICATIONS** C.-H. Chen, H.-I. Yang, C.-T. Chuang, W. Hwang, "A Data-Aware Write-Assist 8T SRAM Design in Low VDD<sub>MIN</sub> Region with Ripple Bit-line Read Scheme", (Will be submitted) #### **PATENTS** C.-T. Chuang, H.-I. Yang, C.-Y. Lu, C.-H. Chen, C.-S. Chang, M.-H. Tu, W. Hwang, S.-J. Jou, "Ripple Bit-Line Schemes for Improving Leakage/Variation Tolerance and Density/Performance of Nanoscale SRAM", Joint NCTU/Faraday Patent, to be filed through Faraday