# National Chiao Tung University

# Department of Electronics Engineering and Institute of Electronics

## Master's Thesis

# Design and Implementation of Near-/Sub-threshold SRAM-based First-In-First-Out (FIFO) Memory for WBAN Application

Student: Wei-Hung Du

Advisor: Prof. Wei Hwang

September 2011

Student: Wei-Hung Du

Advisor: Prof. Wei Hwang

National Chiao Tung University, Department of Electronics Engineering and Institute of Electronics

Master's Thesis

Submitted to the Department of Electronics Engineering & Institute of Electronics, College of Electrical Engineering and Computer Engineering, National Chiao Tung University, in partial fulfillment of the requirements for the degree of

Master in

**Electronics Engineering** 

June 2011

Hsinchu, Taiwan

September 2011

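Design and Implementation of Near-/Sub-threshold SRAM-based First-In-First-Out (FIFO) Memory for WBAN Application

Student: Wei-Hung Du

Advisor: Prof. Wei Hwang

### Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University

### Abstract (Chinese)

Because power consumption decreases quadratically as the supply voltage is scaled down, ultra-low-voltage circuit design has become an important research topic. In the low-voltage regime, however, greatly increased sensitivity to circuit parameters limits SRAM operation, so stability is the primary concern at extremely low voltages. This thesis first proposes a 10T near-/sub-threshold SRAM bit-cell. Compared with a conventional SRAM cell, it offers 1.9X read static noise margin, 3.2X write margin, and better variation immunity, while reducing data-dependent bit-line leakage and improving tolerance to temperature variation. Results from 100,000 Monte Carlo simulations in 90nm technology show that this 10T cell achieves the lowest Vmin among iso-area dual-port SRAM bit-cells. Next, an ultra-low-power 16kb SRAM-based FIFO memory for wireless body area networks is implemented in UMC 90nm technology. Its high variation immunity allows operation in the ultra-low-voltage regime, and it uses an adaptive power control and power switch system, counter-based pointers, and a smart replica read/write control unit to achieve ultra-low power. At 0.5V, the design consumes an average of only 1.646µW per read/write operation. In addition, a 2kb built-in row-controlled dynamic voltage scaling (DVS) 9T SRAM-based FIFO memory is implemented in TSMC 65nm technology. A built-in row-controlled adaptive power switch control circuit switches between two operating voltages, 0.5V and 0.3V, to serve both high-performance and low-power applications. The proposed built-in row-controlled DVS FIFO memory consumes 0.529µW average power and saves 49.3% and 18.5% of the power in the receiving and transmitting periods, respectively, compared with a design that does not use this technique.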

# Design and Implementation of Near-/Sub-threshold SRAM-based First-In-First-Out (FIFO) Memory for WBAN Application

Student: Wei-Hung Du

Advisor: Prof. Wei Hwang

Department of Electronics Engineering & Institute of Electronics National Chiao-Tung University

### **ABSTRACT**

Power consumption falls quadratically as the supply voltage is scaled down. In the low-voltage regime, however, environmental variations affect MOSFET characteristics severely enough to degrade the robustness of SRAM operation. Reliability is therefore the major concern at ultra-low voltage. In this thesis, a 10T sub/near-threshold SRAM bit-cell is first proposed. Compared with a conventional SRAM cell, it achieves 1.9X read SNM and 3.2X write margin, and it reduces data-dependent bit-line leakage while tolerating temperature variation better. Results from 100,000 Monte Carlo simulations in UMC 90nm CMOS technology show that the 10T bit-cell gives the lowest V<sub>min</sub> among the iso-area dual-port bit-cells compared. Second, an ultra-low-power 16Kb SRAM-based first-in first-out (FIFO) memory is proposed for wireless body area networks (WBANs). With high variation immunity, this FIFO memory can operate in the ultra-low-voltage regime; it features an adaptive power control circuit, counter-based pointers, and a smart replica read/write control unit. At a 0.5V supply voltage, the proposed design consumes 1.646µW on average per read/write operation. Third, a 2Kb built-in row-controlled DVS 9T SRAM-based FIFO memory is implemented in TSMC 65nm technology. A row-based adaptive power switch control system switches between two supply voltages, 0.5V and 0.3V, for energy-constrained applications. The proposed built-in row-controlled DVS FIFO consumes 0.529µW average power, which gives 49.3% power savings in receiving time and 18.5% power savings in transmitting time compared with a design without it.

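#### Acknowledgments

Only after walking out of Lab 311 did I realize that I had truly graduated. Over these two years, I thank Prof. **Wei Hwang** for his care and guidance, which gave me the confidence to keep going. In the end I met his expectations and achieved my own goal.

The senior members of the lab also took good care of me. 張銘宏 came up with many solutions whenever I needed help and shared his advice on dealing with people, so that in my two years as a master's student I gained not only academic knowledge but also ways of getting along with others. Next is 黃柏蒼; I thank him for his advice and help with my studies, and for sharing his views on love, which gave me a new perspective on it. 謝維致 and 楊皓義 also helped me greatly with my learning and coursework. To my labmate 陳建亨, thank you for the mutual encouragement and discussions in our studies, which let me complete my research smoothly. To 楊博任, thank you for your help and company, which carried me through my low points. To 林上圓, thank you. I also thank my junior labmates 賴淑琳, 林弘璋, 陳美維, and 江客霆 for their encouragement and for accompanying me through the hard second year of my master's program.

My roommates 林佳緯 and 王耀駿 gave me timely encouragement whenever I came home and comforted me when I was heartbroken; thank you both. To 蔡依萍, the person who has accompanied me most over these six years: thank you for being by my side and for being on the other end of the phone when I needed you most. You have given me more than you imagine, and I am truly grateful for these two years.

I thank my father, Mr. 杜坤柱, and my mother, Ms. 邱淑英, for working hard to support me through graduate school. Coming home, I often saw how tired they were after work, and it pained me. Thank you for raising me for more than twenty years. To my older brother 杜威廷, thank you for calling often to check on me, which let me feel the warmth of family.

Finally, I thank everyone who has helped me. With your help I was able to complete this thesis. Thank you.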


# Contents

| Section |
|---------|
| Chapter 1 Introduction |
| 1.1 Background |
| 1.2 Challenges |
| 1.3 Motivation |
| 1.4 Thesis Organization |
| Chapter 2 Previous Low-Power SRAM Designs |
| 2.1 Introduction |
| 2.2 Overview of SRAM Operation |
| 2.2.1 SRAM Column Circuitry |
| 2.2.2 Conventional Symmetric 6T SRAM Bit-Cell |
| 2.3 Power Dissipation |
| 2.3.1 Dynamic Power |
| 2.3.2 Leakage Power |
| 2.3.3 Short-Circuit Power |
| 2.3.4 Low power SRAM design technology |
| 2.4 SRAM Bit-Cell Stability |
| 2.4.1 Static Noise Margin (SNM) |
| 2.4.2 Write Margin (WM) |
| 2.4.3 Impact of Variation on SRAM in Low Voltage |
| 2.5 Previous Low Voltage SRAM design |
| 2.5.1 Previous Read/Write Assist Peripheral Circuit |
| 2.6 Previous Low Power Single-Port SRAM |
| 2.7 Previous Low Voltage Dual-Port SRAM |
| 2.8 Summary |
| Chapter 3 A Near-/Sub-threshold 10T SRAM Cell Design |
| 3.1 Introduction |
| 3.1.1 Conventional Dual-Port SRAM Bit-Cell |
| 3.1.2 Conventional Dual-Port SRAM limitation |
| 3.2 A Near-/Sub-threshold 10T SRAM Bit-Cell |
| 3.2.1 Layout Consideration |
| 3.3 Dual-Port SRAM Bit-cell Analysis |
| 3.3.1 Read-ability Improvement |
| 3.3.2 Write-ability Improvement |
| 3.3.3 Bit-line Leakage Reduction |
| 3.4 Dual-Port SRAM Bit-Cell V<sub>MIN</sub> Analysis |
| 3.4.1 Iso-Area Bit-cells |
| 3.4.2 Read-Failure Probability |
| 3.4.3 Hold-Failure Probability |
| 3.4.4 Write-Failure Probability |
| 3.4.5 Iso-Area V<sub>min</sub> Comparison |
| 3.5 Power Consumption |
| 3.5.1 Read Power Consumption |
| 3.5.2 Write Power Consumption |
| 3.5.3 Leakage Power Consumption |
| 3.6 Summary |
| Chapter 4 A 16Kb Near-threshold SRAM-Based FIFO in 90nm CMOS for WBANs |
| 4.1 Introduction |
| 4.1.1 Wireless Sensor Node for WBANs |
| 4.2 Ultra-Low Power FIFO Memory |
| 4.2.1 10T SRAM Storage Element |
| 4.2.2 Adaptive Power Control |
| 4.2.3 Counter-based Pointer Structure |
| 4.2.4 Adaptive Replica Read/Write Control Unit |
| 4.3 A 16kb 0.5V SRAM-Based FIFO in 90nm CMOS |
| 4.3.1 Power Consumption Analysis |
| 4.3.2 Post-Layout Simulation Result |
| 4.4 Summary |
| Chapter 5 A 2kb Built-in Row-Control Dynamic Voltage Scaling Near-/Sub-threshold FIFO memory in 65nm CMOS for WBANs |
| 5.1 Introduction |
| 5.2 Built-in Row-Control DVS FIFO Memory |
| 5.3 Storage Element |
| 5.3.1 A 9T Sub/Near-threshold SRAM Bit-cell |
| 5.3.2 Layout Consideration |
| 5.3.3 Operation Analysis |
| 5.4 Adaptive Replica Control Circuits |
| 5.4.1 Write Pulse Control Circuit |
| 5.4.2 Read Pulse Control Circuit |
| 5.5 Adaptive Power Switch Control System |
| 5.5.1 Adaptive Power Switch Control Circuit |
| 5.5.2 Sub-block Size Design Consideration |
| 5.6 A 2kb Built-in Row-Controlled DVS Near-/Sub-threshold FIFO memory in 65nm CMOS |
| 5.6.1 Post-Layout Simulation Result |
| 5.6.2 Design Issue of DVS FIFO |
| 5.6.3 Analysis of DVS FIFO Energy Consumption |
| 5.7 Summary |
| Chapter 6 Conclusions |
| 6.1 Conclusions |
| 6.2 Future Work |
| References |
| Vita |



# **List of Figures**

| Figure | Caption |
|--------|---------|
| Fig. 1.1 | Trend in minimum energy point of a 32b adder with process scaling using predictive models [1.3] |
| Fig. 2.1 | Three low-power applications culminating in SRAM consuming 69% of the chip power [2.1] |
| Fig. 2.2 | SRAM organization [2.2] |
| Fig. 2.3 | A single-port SRAM column configuration example |
| Fig. 2.4 | Conventional symmetric 6T SRAM bit-cell circuit |
| Fig. 2.5 | Conventional symmetric 6T SRAM bit-cell layout |
| Fig. 2.6 | (a) Thincell and (b) straight line layout (c) SNM comparison [2.3] |
| Fig. 2.7 | Read example of 6T SRAM cell |
| Fig. 2.8 | Write example of 6T SRAM cell |
| Fig. 2.9 | Leakage current of deep-submicron transistors |
| Fig. 2.10 | Gate direct tunneling leakage [2.7] |
| Fig. 2.11 | Standard setup for finding Hold SNM |
| Fig. 2.12 | Standard setup for finding Read SNM |
| Fig. 2.13 | Write margin of a SRAM bit-cell |
| Fig. 2.14 | The β ratio of 6T SRAM bit-cell |
| Fig. 2.15 | 6T SRAM SNM loss at low voltages [2.39] |
| Fig. 2.16 | 6T SRAM write margin [2.39] |
| Fig. 2.17 | Read-current distribution [2.39] |
| Fig. 2.18 | I<sub>READ</sub> is less than I<sub>leakage</sub> from un-accessed cells at low voltage [2.40] |
| Fig. 2.19 | Half-select disturb during a write operation |
| Fig. 2.20 | The read/write assist circuit in [2.41] |
| Fig. 2.21 | The read/write assist circuit in [2.43] |
| Fig. 2.22 | The read/write assist circuit in [2.44] |
| Fig. 2.23 | The write assist circuit in [2.45] |
| Fig. 2.24 | The single-port SRAM cells view (a) ST 10T bit-cell [2.26]. (b) P-P-N 10T bit-cell [2.27]. (c) single-ended 9T bit-cell [2.28]. (d) differential 10T bit-cell [2.29] |
| Fig. 2.25 | (a) Read SNM comparison (b) Write mode comparisons [2.26] |
| Fig. 2.26 | (a) Read SNM comparison (b) Write mode comparisons [2.27] |
| Fig. 2.27 | Bit leakage reduction scheme [2.27] |
| Fig. 2.28 | (a) Write mode comparisons (b) Read SNM comparison [2.28] |
| Fig. 2.29 | Write-half-select disturbance reduction [2.28] |
| Fig. 2.30 | Timing diagram [2.29] |
| Fig. 2.31 | Differential structure and dynamic DCVSL technique [2.29] |
| Fig. 2.32 | The dual-port SRAM cells view (a) single-ended 8T bit-cell [2.34]. (b) single-ended 8T bit-cell [2.35]. (c) single-ended 9T bit-cell [2.36]. (d) single-ended 10T bit-cell [2.37] |
| Fig. 2.33 | 8T SRAM bit-cell circuit and Read noise margin of 8T SRAM cell and 6T SRAM cell [2.31] |
| Fig. 2.34 | Separation optimization of the read and write paths [2.32] |
| Fig. 2.35 | (a) 8T SRAM cell utilizing RSCE (b) layout [2.33] |
| Fig. 2.36 | Circuitry to eliminate leakage from un-accessed read-buffers [2.34] |
| Fig. 2.37 | Write operation of 8T SRAM [2.35] |
| Fig. 2.38 | Floating VVDD during write allows robust write operation [2.36] |
| Fig. 2.39 | Effect of data-independent bit-line leakage [2.37] |
| Fig. 2.40 | VGND replica scheme for ideal bit-line sensing margin [2.37] |
| Fig. 3.1 | Conventional dual-port SRAM bit-cell |
| Fig. 3.2 | The proposed 8T subthreshold SRAM bit-cell |
| Fig. 3.3 | Schematic of the 10T SRAM bit-cell (UMC 90nm tech.) |
| Fig. 3.4 | Layout of the 10T SRAM bit-cell (UMC 90nm tech.) |
| Fig. 3.5 | Published dual-port SRAM bit-cell configurations |
| Fig. 3.6 | The read operation of the proposed 10T |
| Fig. 3.7 | Butterfly curve in read mode |
| Fig. 3.8 | Distributions of read SNM in MC Simulation |
| Fig. 3.9 | The write operation of the proposed 10T |
| Fig. 3.10 | Distributions of write margin in MC Simulation |
| Fig. 3.11 | The hold operation and hold SNM of the proposed 10T |
| Fig. 3.12 | Simplified bit-line schematic |
| Fig. 3.13 | Sensing margin comparisons under worst case column pattern |
| Fig. 3.14 | Thin-cell layout (a) conventional DP 8T mincell (b) SE 8T mincell |
| Fig. 3.15 | Thin-cell layout (a) Iso-area conventional DP 8T (b) Iso-area SE 8T |
| Fig. 3.16 | Read-failure occurrence |
| Fig. 3.17 | Read-failure probability comparison |
| Fig. 3.18 | Hold-failure occurrence |
| Fig. 3.19 | Hold-failure probability comparison |
| Fig. 3.20 | Write-failure occurrence |
| Fig. 3.21 | Write-failure probability comparison |
| Fig. 3.22 | Read power comparison |
| Fig. 3.23 | Write power comparison |
| Fig. 3.24 | Leakage power comparison |
| Fig. 4.1 | Block diagram of the wireless body network (WBAN) system wireless sensor node (WSN) |
| Fig. 4.2 | WBAS of Intelligent Sensors for Ambulatory Health Monitoring |
| Fig. 4.3 | A FIFO memory and the power proportion of FIFO Memory |
| Fig. 4.4 | Block diagram of proposed FIFO memory |
| Fig. 4.5 | The proposed 9T SRAM bit-cell |
| Fig. 4.6 | A FIFO operation example [4.3] |
| Fig. 4.7 | (a) The adaptive power control system (b) (i)<sub>th</sub> word of storage element |
| Fig. 4.8 | The block diagram of the counter-based pointer |
| Fig. 4.9 | The synchronous counter |
| Fig. 4.10 | The read pointer |
| Fig. 4.11 | Power consumption comparisons |
| Fig. 4.12 | The write pointer |
| Fig. 4.13 | Write delay in different process corner and temperature |
| Fig. 4.14 | The smart replica read/write control unit |
| Fig. 4.15 | Write delay versus BL length |
| Fig. 4.16 | Block diagram of WSN and CPN for WBAN |
| Fig. 4.17 | The contributions of power reduction from energy-efficient techniques |
| Fig. 4.18 | Layout of 32kb FIFO |
| Fig. 5.1 | System block diagram of DVS FIFO and behavior time line of sub-block |
| Fig. 5.2 | The proposed 9T sub/near-threshold SRAM bit-cell |
| Fig. 5.3 | Ion/Ioff ratio, and delay versus channel length |
| Fig. 5.4 | Schematic of the 9T SRAM bit-cell |
| Fig. 5.5 | Layout of the 9T SRAM bit-cell (TSMC 65 tech.) |
| Fig. 5.6 | Body bias of pMOS of the 9T bit-cell |
| Fig. 5.7 | Write margin and hold SNM in different body bias |
| Fig. 5.8 | Power consumption in different body bias |
| Fig. 5.9 | Hold mode of the 9T SRAM cell |
| Fig. 5.10 | Hold SNM comparison |
| Fig. 5.11 | Read mode of the 9T SRAM cell |
| Fig. 5.12 | Read SNM comparison |
| Fig. 5.13 | Disturbance of read SNM in MC Simulation |
| Fig. 5.14 | Write mode of the 9T SRAM cell |
| Fig. 5.15 | Distributions of write margin in MC Simulation |
| Fig. 5.16 | The replica column for write window control circuit |
| Fig. 5.17 | The replica column for read window control circuit |
| Fig. 5.18 | Sensing margin comparisons under worst case column pattern |
| Fig. 5.19 | The conventional DVS FIFO and current waveform |
| Fig. 5.20 | The built-in row-control DVS FIFO and current waveform |
| Fig. 5.21 | A built-in row control DVS FIFO operation example |
| Fig. 5.22 | (a) Adaptive power switch control circuitry (b) waveform of the signal |
| Fig. 5.23 | FSM of operation mode |
| Fig. 5.24 | The area overhead and power dissipation in different sub-block size |
| Fig. 5.25 | Layout of 2kb proposed FIFO |
| Fig. 5.26 | Power reduction by the proposed DVS FIFO |
| Fig. 5.27 | The receiving time and transmitting time of FIFO |
| Fig. 5.28 | Highlighting parameters critical for determining energy of conventional FIFO without DVS |
| Fig. 5.29 | Supply voltage of words versus access time for conventional FIFO without DVS |
| Fig. 5.30 | Highlighting parameters critical for determining energy of DVS-BR FIFO |
| Fig. 5.31 | Supply voltage of words versus access time for BR-DVS FIFO |
| Fig. 6.1 | PVT sensors with built-in row-controlled DVS FIFO in 3D integration |



# **List of Tables**

| Table 3.1 Subarray area analysis of DP 8T/SE 8T/10T                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | .51 |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| Table 3.2 Device sizing for various bit-cell topology                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | .52 |
| Table 3.3 V <sub>min</sub> comparison of various bit-cell topology                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | .58 |
| Table 4.1 Pin descriptions                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | .66 |
| Table 4.2 Summary of the 16kb FIFO memory and comparison                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | .78 |
| Table 5.1 Comparison of various DP SRAM bit-cells                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | .87 |
| Table 5.2 Operation truth table of the 9T SRAM                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | .89 |
| Table 5.3 Summary of the 1kb FIFO memory and comparison | 104 |



# Chapter 1 Introduction

## **1.1 Background**

The rapidly growing demand for mobile and highly energy-constrained applications that maximize battery lifetime is driving the need for ultra-low-power circuit design. Various device-, circuit-, and architecture-level techniques have been implemented to achieve ultra-low power consumption [1.1]. The total energy for an operation is shown in equation (1.1). Dynamic dissipation is data dependent and is proportional to the load capacitance ( $C_L$ ) and the square of the supply voltage. Because energy depends quadratically on the supply voltage, power dissipation can be reduced significantly by scaling down the supply voltage.



Fig. 1.1 Trend in minimum energy point of a 32b adder with process scaling using predictive models [1.2]

However, as the supply voltage approaches the sub-threshold region, longer propagation delays lead to a rise in leakage energy per operation, because the leakage power must be integrated over a clock period that is lengthened by the slower speed. These opposing trends in active and leakage energy place the optimal supply voltage at the minimum energy point, which occurs below the threshold voltage of the device [1.2], as shown in Fig. 1.1. Aggressive scaling of transistor dimensions has increased integration density and improved performance with each technology generation. Nevertheless, in deeply scaled technologies the increased leakage power forms a significant portion of the total power dissipation, as shown in Fig. 1.1.
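The opposing trends described above can be illustrated with a small numerical sketch. It is not the model used in [1.2]: the smooth drive-current expression, the threshold voltage of 0.4 V, the slope factor, the logic depth, and the activity factor are all assumed values chosen only to show how a minimum energy point arises, and all quantities are normalized.

```python
import math

def drive_current(v, vt=0.4, n=1.5, v_thermal=0.026):
    """Normalized on-current from a smooth weak-to-strong inversion
    interpolation (an assumed model, not taken from the thesis)."""
    return math.log1p(math.exp((v - vt) / (2 * n * v_thermal))) ** 2

def energy_per_op(vdd, depth=20, activity=0.1):
    """Normalized energy per operation: dynamic part plus leakage
    integrated over one (voltage-dependent) cycle time."""
    e_dyn = activity * vdd ** 2                  # alpha * C * VDD^2 with C = 1
    i_off = drive_current(0.0)                   # leakage at zero gate drive
    t_cycle = depth * vdd / drive_current(vdd)   # ~ depth * C * VDD / I_on
    e_leak = i_off * vdd * t_cycle               # I_off * VDD * T_cycle
    return e_dyn + e_leak

def minimum_energy_point(v_lo=0.15, v_hi=1.0, steps=171):
    """Sweep VDD on a uniform grid and return the lowest-energy voltage."""
    grid = [v_lo + i * (v_hi - v_lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=energy_per_op)

v_opt = minimum_energy_point()
print(f"minimum energy point at about {v_opt:.3f} V")
```

With these assumed parameters the minimum falls below the assumed threshold voltage: lowering the supply further makes the cycle time grow faster than the dynamic energy shrinks, so the leakage term takes over.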

## **1.2 Challenges**

As the supply voltage is reduced, the degraded $I_{ON}/I_{OFF}$ ratio and process variation become the prominent challenges, particularly in deeply scaled technologies. They limit circuit operation in the ultra-low-voltage regime, especially for SRAM cells, where minimum-sized transistors are often used [1.4]. At ultra-low voltage, the drive current of the on devices ( $I_{ON}$ ) is several orders of magnitude lower than in strong inversion. Correspondingly, the ratio of active to idle leakage current ( $I_{ON}/I_{OFF}$ ) is much reduced. In digital logic, this implies that the idle leakage of the off devices may oppose the on devices, so that the on devices may not pull the output of a logic gate fully to the intended logic level. Moreover, minimum-geometry transistors are vulnerable to inter-die as well as intra-die process variations. Intra-die process variation includes random dopant fluctuation (RDF) and line edge roughness (LER), which may lead to threshold voltage mismatch between adjacent transistors in a memory cell and hence to asymmetrical characteristics [1.5]. Under the combined effect of reduced supply voltage and increased process variation, a sub-threshold SRAM may suffer various memory failures, such as read failure, hold failure, access time failure, and write failure [1.6].

## **1.3 Motivation**

In certain emerging applications, such as wireless sensor nodes, energy efficiency concerns supersede the traditional emphasis on speed. To prolong battery lifetime and provide long-term stability for a WBAN system, robust and ultra-low-power designs are indispensable [1.7]. Voltage scaling is a popular method of reducing energy in digital circuits because of the quadratic saving in the $CV_{DD}^2$ energy. However, the challenges described in the previous section become critical in the ultra-low-voltage region. Addressing them requires a high static noise margin, a high write margin, and a high sensing margin. Thus, the conventional 6T SRAM bit-cell is no longer suitable for ultra-low-voltage SRAM design.

Besides scaling down the supply voltage, reducing the active and leakage power of the peripheral circuitry in the SRAM array also lowers power. Because this system can operate at a much reduced performance level, power consumption is a greater concern than speed. Thus, energy-efficient peripheral circuits are used in place of conventional ones for further energy savings.

In addition to ultra-low power and reliability, high speed and scalability are further primary circuit concerns in implanted biomedical and remote wireless sensing applications. Power management has evolved from static custom-hardware optimization to highly dynamic run-time monitoring, assessing, and adapting of hardware performance and energy, with precise awareness of the instantaneous application demands. These require ultra-dynamic voltage scaling (UDVS) [1.8], in which energy savings and performance can be balanced.

## **1.4 Thesis Organization**

The remainder of this thesis is organized as follows. Chapter 2 presents low-power SRAM from basic introductions to detailed circuit design methodologies, including SRAM architecture, power reduction, SRAM stability, the limitations of conventional low-voltage SRAM, and well-known previous low-power SRAM designs. Chapter 3 proposes a robust 10T near-/sub-threshold SRAM bit-cell with improved SNM, improved write-ability, and a low minimum operating voltage ( $V_{min}$ ). Chapter 4 presents an energy-efficient low-power SRAM-based FIFO memory for WBAN applications. In Chapter 5, a built-in row-controlled dynamic voltage scaling FIFO memory is implemented for WBAN applications. Chapter 6 concludes this thesis.



# Chapter 2 Previous Low-Power SRAM Designs

## **2.1 Introduction**

Compared with generic logic, SRAM is simultaneously constrained by the need for very high density, low leakage, high performance, and long-term data retention. Its power consumption is predicted to make SRAM a key concern in severely energy-constrained applications such as wireless implantable biomedical devices. As shown in Fig. 2.1, the embedded SRAM consumes 69% of the total processor power.



Fig. 2.1 Three low-power applications culminating in SRAM consuming 69% of the chip power [2.1]

This chapter begins with an overview of SRAM operation in section 2.2. Section 2.3 analyzes the power dissipation of SRAM circuits and techniques for leakage reduction. Section 2.4 defines the stability issues of the SRAM cell, including hold stability, read stability, and write ability, and presents the impact of variation on SRAM at low voltage. Sections 2.5 and 2.6 describe previous SRAM cell designs and peripheral circuit techniques.

## 2.2 Overview of SRAM Operation

Fig. 2.2 shows a typical SRAM organization. The storage element is constructed from Z blocks, each an N-row by M-bit array. The row decoder decodes the X address bits and selects the appropriate word-line. The Y address bits are decoded by the column decoder, which chooses the appropriate column. Sense amplifiers amplify the bit-line swing for data sensing. The read/write circuitry controls the read/write timing.



Fig. 2.2 SRAM organization [2.2]
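The address partitioning of Fig. 2.2 can be sketched in a few lines. The bit ordering (column bits lowest, then row bits, then block-select bits) and the function names `split_address` and `one_hot` are assumptions made for illustration; the original figure does not fix them.

```python
def split_address(addr, row_bits, col_bits):
    """Split a flat address into (block, row, column) fields.

    Assumed layout, from least significant bit upward:
    col_bits of Y (column) address, row_bits of X (row) address,
    and the remaining upper bits as the block select.
    """
    col = addr & ((1 << col_bits) - 1)
    row = (addr >> col_bits) & ((1 << row_bits) - 1)
    block = addr >> (row_bits + col_bits)
    return block, row, col

def one_hot(index, width):
    """Model a decoder output: exactly one line out of `width` is asserted."""
    return [1 if i == index else 0 for i in range(width)]

block, row, col = split_address(0b10110101, row_bits=4, col_bits=3)
word_lines = one_hot(row, 1 << 4)      # row decoder output, 16 word-lines
column_select = one_hot(col, 1 << 3)   # column decoder output, 8 columns
print(block, row, col)
```

The one-hot lists stand in for the word-line and column-select signals that the row and column decoders drive.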

### 2.2.1 SRAM Column Circuitry



Fig. 2.3 A single-port SRAM column configuration example

Fig. 2.3 shows a single-port SRAM column configuration. The precharge circuit is composed of two precharge pMOS transistors and one equalizer. It precharges the bit-line pair high and equalizes the pair before a read/write operation. Each column contains a write driver for writing input data and a sense amplifier for detecting the sensed data. The write driver applies complementary voltage levels to the bit-line pair during a write operation. The sense amplifier, a common latch-type design, amplifies the differential signal on the bit-line pair. As soon as the sense amplifier is activated, the cross-coupled inverter pair latches the read data through regenerative feedback.

### 2.2.2 Conventional Symmetric 6T SRAM Bit-Cell

Fig. 2.4 shows the schematic of the conventional symmetric 6-transistor (6T) SRAM bit-cell. The bit-cell consists of two cross-coupled inverters (*PL*, *NL*, *PR* and *NR*) and two pass transistors, *AXL* and *AXR*, which provide read/write access to the bit-cell. A single word-line (*WL*) controls the connection between the bit-lines (*BL* and *BLb*) and the cross-coupled inverters by turning *AXL* and *AXR* on or off.



Fig. 2.4 Conventional symmetric 6T SRAM bit-cell circuit

Fig. 2.5 shows the Thincell layout of the conventional symmetric 6T SRAM bit-cell. The bit-cell layout can be optimized to minimize variability by converting the Thincell active pattern to a straight-line layout, which eliminates jogs and corners at the cost of increased reliance on metal interconnect, as shown in Fig. 2.6. It facilitates lithography and reduces sensitivity to overlay errors, which improves mismatch and critical dimension control and thus increases the stability of the SRAM bit-cell.



Fig. 2.5 Conventional symmetric 6T SRAM bit-cell layout



Fig. 2.6 (a) Thincell and (b) straight line layout (c) SNM comparison [2.3]

For a read operation, *BL* and *BLb* are precharged high. Then *WL* is activated, and one of the bit-lines is pulled down by the cell. For example, in Fig. 2.7, with *VL*=0 and *VR*=1, *BL* is pulled down through transistors *AXL* and *NL*, while *BLb* stays high. A differential signal develops on the bit-line pair, and the sense amplifier at the read output detects this small signal and converts it into a full-swing voltage.



Fig. 2.7 Read example of 6T SRAM cell.

For a write operation, one bit-line is driven high and the other low. Then *WL* is turned on, and the data on the bit-lines overpower the cell content with the new value. For example, in Fig. 2.8, with *VL*=0, *VR*=1, *BL*=1, and *BLb*=0, *VL* rises high and *VR* is forced low.



Fig. 2.8 Write example of 6T SRAM cell.

## **2.3 Power Dissipation**

This section begins with an analysis of the power dissipation of CMOS circuits and then reviews circuit techniques for reducing it. Power dissipation consists of dynamic power ( $P_{dynamic}$ ), leakage power ( $P_{leakage}$ ), and short-circuit power ( $P_{short-circuit}$ ). The total power can be expressed as

$$P_{total} = P_{dynamic} + P_{leakage} + P_{short-circuit} \qquad (2.1)$$

where $P_{dynamic} = \alpha C_L V_{DD}^2 f$ , $P_{leakage} = V_{DD} I_{leakage}$ , and $P_{short-circuit} = I_{mean} V_{DD}$ .

According to the equation above, dynamic power dissipation is proportional to the square of the supply voltage, while leakage power and short-circuit power are both proportional to the supply voltage.
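As a quick numerical check of these proportionalities, the sketch below evaluates the three terms of Eq. (2.1). The function name and the example values are illustrative assumptions, and the leakage and mean short-circuit currents are held fixed while the supply changes, which real devices do not strictly obey.

```python
def power_components(vdd, freq, c_load, activity, i_leak, i_sc_mean):
    """Return (dynamic, leakage, short-circuit) power in watts, per Eq. (2.1)."""
    p_dynamic = activity * c_load * vdd ** 2 * freq   # alpha * C_L * VDD^2 * f
    p_leakage = vdd * i_leak                          # VDD * I_leakage
    p_short = i_sc_mean * vdd                         # I_mean * VDD
    return p_dynamic, p_leakage, p_short

# Example values are placeholders, not measurements from this work.
params = dict(freq=1e6, c_load=1e-12, activity=0.1, i_leak=1e-9, i_sc_mean=1e-10)
full = power_components(1.0, **params)
half = power_components(0.5, **params)
print("dynamic ratio:", full[0] / half[0])   # quadratic dependence on VDD
print("leakage ratio:", full[1] / half[1])   # linear dependence on VDD
```

Halving the supply cuts the dynamic term to a quarter but the other two terms only to a half, which is why leakage gains relative weight as the supply is scaled down.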

### 2.3.1 Dynamic Power

Dynamic power is caused by logic transitions in CMOS circuits, which charge or discharge the load and parasitic capacitance ( $C_L$ ). As can be seen in (2.1), the dynamic power dissipation is directly proportional to the switching activity factor ( $\alpha$ ), the load capacitance ( $C_L$ ), the square of the supply voltage ( $V_{DD}^2$ ), and the operating frequency ( $f$ ).

### 2.3.2 Leakage Power



Fig. 2.9 Leakage current of deep-submicron transistors

Leakage power is a significant portion of the total power consumption in modern ICs. Integrated circuits contain vast numbers of logic gates that, even when not actively switching, still consume power because of leakage currents. Fig. 2.9 shows reverse-biased junction leakage, subthreshold leakage, gate direct-tunneling leakage, injection of hot carriers from substrate to gate oxide, gate-induced drain leakage, and punchthrough leakage in a deeply scaled transistor [2.4] [2.5].

#### **Junction Leakage**

Leakage in reverse-biased transistors includes the effects of carrier generation, which depends on the residual damage density and its location relative to the junction boundary, as well as structure- and bias-dependent effects of gate oxide leakage and band-to-band tunneling at the drain junction. Junction leakage current depends on the area of the drain diffusion and the leakage current density. In low-power CMOS technology, high channel and halo doping greatly increase junction leakage.

Gate-induced drain leakage (GIDL) arises from the high electric field between the drain and gate terminals. Thinner oxide, higher supply voltage, and lightly doped drain structures increase the GIDL effect.

#### Subthreshold Leakage

When the gate voltage is below the threshold voltage, sub-threshold leakage, or weak inversion current, flows between source and drain. For example, in an off-state inverter, although the $V_{gs}$ of the NMOS is 0 V, a small leakage current still flows from drain to source because $V_{ds}$ equals $V_{DD}$ .

Sub-threshold behavior can be modeled physically as follows [2.6]:

$$I_{ds} = \mu \frac{W}{L} \left(\frac{kT}{q}\right)^2 C_{sth} e^{\frac{V_g - V_T + \eta V_{ds}}{mkT/q}} \left(1 - e^{-\frac{V_{ds}}{kT/q}}\right), \quad m = 1 + \frac{C_{sth}}{C_{ox}} \qquad (2.2)$$

where *W* and *L* denote the transistor width and length, $\mu$ denotes the carrier mobility, $C_{sth} = C_{dep} + C_{it}$ denotes the sum of the depletion region capacitance and the interface trap capacitance, both per unit area of the MOS gate, $\eta$ is the drain-induced barrier lowering (DIBL) coefficient, and $C_{ox}$ denotes the gate input capacitance per unit area of the MOS gate.

Sub-threshold leakage increases exponentially as the threshold voltage is reduced, and DIBL lowers the threshold voltage, making the leakage even worse. Conversely, sub-threshold leakage can be reduced by increasing the threshold voltage. In low-power technologies, high- $V_{th}$ transistors can be used to reduce off-state sub-threshold leakage.
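The exponential dependence on threshold voltage can be made concrete with a direct evaluation of Eq. (2.2). In this sketch the product of mobility, W/L, and $C_{sth}$ is lumped into a single `prefactor`, and the values chosen for the slope factor `m`, the DIBL coefficient `eta`, and the bias points are illustrative assumptions rather than parameters of any particular process.

```python
import math

K_B_OVER_Q = 8.617333e-5   # Boltzmann constant divided by electron charge, V/K

def subthreshold_current(vg, vds, vt, temp=300.0, m=1.3, eta=0.08, prefactor=1.0):
    """Evaluate Eq. (2.2); `prefactor` lumps mu * (W/L) * C_sth (assumed 1)."""
    v_thermal = K_B_OVER_Q * temp                       # kT/q
    gate_term = math.exp((vg - vt + eta * vds) / (m * v_thermal))
    drain_term = 1.0 - math.exp(-vds / v_thermal)
    return prefactor * v_thermal ** 2 * gate_term * drain_term

def subthreshold_swing(temp=300.0, m=1.3):
    """Gate (or threshold) voltage shift that changes the current by 10x."""
    return m * K_B_OVER_Q * temp * math.log(10)

swing = subthreshold_swing()
i_low_vt = subthreshold_current(vg=0.0, vds=0.5, vt=0.30)
i_high_vt = subthreshold_current(vg=0.0, vds=0.5, vt=0.30 + swing)
print(f"swing = {swing * 1e3:.1f} mV/decade, ratio = {i_low_vt / i_high_vt:.2f}")
```

Raising the threshold voltage by one subthreshold swing reduces the off-state current by a factor of ten, which is the quantitative basis for using high-threshold devices to cut leakage.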

#### **Gate Direct Tunneling Leakage**

Ultra-thin gate oxide is used for effective gate control in deeply scaled CMOS technology. However, the high electric field across the thin gate oxide results in direct tunneling of electrons from substrate to gate and from gate to substrate through the gate oxide [2.7]. As seen in Fig. 2.10, the tunneling current can be classified into three components: edge direct tunneling leakage, gate-to-channel leakage, and gate-to-substrate leakage.



Fig. 2.10 Gate direct tunneling leakage [2.7]

New device structures and materials, such as high-k metal gates [2.8], double-gate devices [2.9], and FinFETs [2.10], alleviate gate tunneling leakage by as much as an order of magnitude.

#### **Punchthrough Leakage**

Finally, in short-channel devices, due to the proximity of the drain and the source, the depletion regions at the drain-substrate and source-substrate junctions extend into the channel. As the channel length is reduced, if the doping is kept constant, the separation between the depletion region boundaries decreases. An increase in the reverse bias across the junctions (with increase in  $V_{DS}$ ) also pushes the junctions nearer to each other. As the combination of channel length and reverse bias leads to the merging of the depletion regions, punchthrough leakage occurs.

### 2.3.3 Short-Circuit Power

Short-circuit power ( $P_{short-circuit}=I_{mean}V_{DD}$ ) arises from the nonzero rise and fall times of input waveforms, during which a direct-path current flows from the power supply to ground while a static CMOS gate switches. $I_{mean}$ is the mean value of the short-circuit current. Assuming a symmetrical inverter and the simple MOS formula, short-circuit power is modeled as [2.11]

$$P_{short-circuit} = \frac{\beta}{12} (V_{DD} - 2V_{T})^3 f\tau \qquad (2.3)$$

where $\beta$ denotes the gain factor of a transistor, $V_T$ denotes the threshold voltage, *f* denotes the operating frequency, and $\tau$ is the input rise/fall time.
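A minimal sketch of Eq. (2.3) follows, reading the voltage subtracted from the supply as twice the threshold voltage. The clamp to zero is an added assumption: within this simple square-law model the nMOS and pMOS are never on together once the supply drops to twice the threshold voltage or below, so the formula is not applied there. Weak-inversion conduction, which the model ignores, still exists in that region.

```python
def short_circuit_power(beta, vdd, vt, freq, tau):
    """Short-circuit power of a symmetrical inverter, Eq. (2.3).

    beta: transistor gain factor (A/V^2), vt: threshold voltage (V),
    freq: operating frequency (Hz), tau: input rise/fall time (s).
    """
    overdrive = vdd - 2.0 * vt
    if overdrive <= 0.0:
        # Assumed clamp: no direct path in the square-law model.
        return 0.0
    return beta / 12.0 * overdrive ** 3 * freq * tau

# Placeholder values for illustration only.
print(short_circuit_power(beta=1e-4, vdd=1.2, vt=0.3, freq=1e6, tau=1e-9))
print(short_circuit_power(beta=1e-4, vdd=0.5, vt=0.3, freq=1e6, tau=1e-9))
```

Because of the cubic dependence on the overdrive, this term shrinks much faster than the dynamic term as the supply approaches the near-threshold region.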

### 2.3.4 Low power SRAM design technology

For low-power systems, the power-delay trade-off alone is not sufficient to achieve the desired power consumption. Such systems generally do not require high performance, so other methods are used to reduce power dissipation. These low-power techniques aim to reduce the three components above as far as possible.

#### **Dynamic Power Reduction**

Several SRAM design strategies reduce dynamic power consumption. Firstly, [2.12] [2.13] use hierarchical bit-lines, with short local bit-lines to reduce the bit-line load ( $C_L$ ) and with global bit-lines that typically reset to logic low to reduce the switching activity factor ( $\alpha$ ). Secondly, in a thin-cell layout (Fig. 2.5), the vertical dimension is determined by the poly pitch, while the lateral dimension is determined by the device sizing. In general, the SRAM bit-cell area is dominated by the contact and diffusion spacing. Various industrial minimum-sized 6T bit-cell layouts reveal that only 30%-35% of the lateral dimension is used for the contact and diffusion spacing [2.14] [2.15]. Because the vertical dimension along the bit-line is unchanged when the bit-cell is upsized, the bit-line capacitance ( $C_L$ ) stays at its minimum (2 poly pitches). Thirdly, in some SRAM designs the cell supply ( $V_{CS}$ ) of the SRAM bit-cells and critical peripheral circuits is higher than the supply of the other peripheral circuits, so that the dynamic power of the non-critical peripheral circuits is reduced. Fourthly, [2.13] presents a single-ended write-bit-line structure, which reduces the switching activity factor ( $\alpha$ ) to less than 0.5 and so diminishes the dynamic write power, since most bits in caches are logic low. Fifthly, [2.16] presents a low-power SRAM that uses bit-line charge recycling (CR-SRAM) for the write operation. The differential voltage swing of a bit-line is obtained from charge recycled from the adjacent bit-line capacitance instead of from the power line. Assuming that there are N CR bit-line pairs, that all bit-lines have the same capacitance, and that i = 1, 2, ..., N, the voltage in the bit-line pair becomes $\left[\frac{2N-2i+1}{2N}\right]V_{DD}$ . The N bit-line pairs consume $\left(\frac{1}{N}\right) C_{BL} V_{DD}^2$ per clock cycle instead of $N C_{BL} V_{DD}^2$ , which significantly reduces the write power.

#### Leakage Power Reduction

Although reducing the threshold voltage gives higher drive current and hence better speed, the cost is significantly increased stand-by power. Hence, to suppress the power consumption of low-voltage circuits, the leakage power must be reduced.

Transistor stacking is effective in leakage reduction [2.17], so [2.18] uses a power-gating structure (transistor stacking) to reduce the leakage current of sleeping or shut-off SRAM cells. In [2.19], the gates of the word-line driver transistors are left floating and are discharged by junction leakage in standby, which reduces gate leakage. In [2.20], the read/write bit-lines are left floating in sleep mode without being precharged, and a bit-line leakage compensation scheme supplies a compensating pull-up current to the read bit-line during read operation, which minimizes the impact of leakage current on the bit-line. The 7T SRAM cell in [2.21] uses a multiple- $V_t$ structure in which high- $V_t$ devices reduce the leakage current. In addition, [2.22] employs a dynamic- $V_t$ technique to increase the pull-down current while reducing leakage during standby.

Among the low-power strategies mentioned above, supply voltage scaling is the most effective way to reduce the total power consumption. [2.23] [2.24] present low-power SRAMs that maintain high performance at supply voltages below 1 V (0.7 V/0.5 V). The energy-efficient SRAM in [2.25] lowers the supply voltage to the data retention voltage (DRV) in stand-by mode to maximize the leakage power savings.

Sub-threshold operation has been shown to minimize energy per operation for logic. Therefore, ultra-low-voltage SRAM operating in the subthreshold region is attractive. However, variation and the current ratio become more critical at ultra-low voltage. Different topologies and peripheral assist circuits [2.26] - [2.37] are used to address these challenges; details are discussed in section 2.6. Although these sub-threshold SRAMs achieve very low power consumption, they sacrifice operating frequency to do so. An SRAM designed to operate in both the sub-threshold and super-threshold regions is presented in [2.38]. Reconfigurable circuit assists are used to address the ultra-low-voltage challenges and to minimize their adverse effect on high-voltage operation.

## 2.4 SRAM Bit-Cell Stability

Reliability has always been a major concern for the SRAM bit-cell. As technology scales down, process, voltage, and temperature (PVT) variations can no longer be ignored, particularly at ultra-low supply voltage. Therefore, accurate estimation of SRAM data storage stability at the pre-silicon design stage and verification of SRAM stability at the post-silicon testing stage are important steps in SRAM design and test flows. The rest of this section states the most widely adopted definitions of SRAM cell stability.

### 2.4.1 Static Noise Margin (SNM)

The most common method to measure the stability of SRAM cells is hold/read static noise margin (SNM). Hold static noise margin is defined as the maximum value of static DC voltage noise which can be tolerated by the SRAM bit-cell without flipping the storage node when word-line turns off. Fig. 2.11 shows the standard setup for modeling hold SNM. DC noise sources  $V_N$  are introduced at each of the internal nodes in the bit-cell.

On the other hand, Fig. 2.12 shows the standard setup for modeling read SNM. The word-line (*WL*) is turned on for read access, and the bit-line (*BL*) and bit-line-bar (*BLb*) are set to $V_{DD}$ to represent bit-lines that have been precharged high. A higher read SNM implies a higher noise tolerance of the SRAM bit-cell during read operation. Fig. 2.11 and Fig. 2.12 also show examples of butterfly curves during hold and read, revealing the degradation in SNM during read.



Fig. 2.11 Standard setup for finding Hold SNM



Fig. 2.12 Standard setup for finding Read SNM
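The hold SNM definition, the largest DC noise that the cell tolerates without flipping, can be turned into a small numerical experiment. The sketch below does not simulate the transistor-level setups of Fig. 2.11 or Fig. 2.12: it assumes an idealized tanh-shaped inverter transfer curve with a chosen peak gain, inserts equal noise sources of adverse polarity at both inverter inputs, and bisects on the noise amplitude. All names and values are illustrative.

```python
import math

def inverter_vtc(v_in, vdd, gain):
    """Idealized inverter transfer curve (assumed shape): switches at
    VDD/2 with peak small-signal gain of magnitude `gain`."""
    return 0.5 * vdd * (1.0 - math.tanh(2.0 * gain * (v_in - 0.5 * vdd) / vdd))

def keeps_state(v_noise, vdd, gain, iterations=5000):
    """Start with VL low and VR high, apply the noise, and relax the loop.
    Returns True if the cell still stores the original value."""
    v_l, v_r = 0.0, vdd
    for _ in range(iterations):
        v_r = inverter_vtc(v_l + v_noise, vdd, gain)   # noise lifts the low node
        v_l = inverter_vtc(v_r - v_noise, vdd, gain)   # noise lowers the high node
    return v_l < v_r

def hold_snm(vdd, gain, tol=1e-5):
    """Bisect for the largest noise amplitude that does not flip the cell."""
    lo, hi = 0.0, 0.5 * vdd
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if keeps_state(mid, vdd, gain):
            lo = mid
        else:
            hi = mid
    return lo

for g in (3, 20):
    print(f"gain {g}: hold SNM of about {hold_snm(0.5, g) * 1e3:.0f} mV")
```

A lower inverter gain, as occurs when the supply is reduced toward the sub-threshold region, yields a smaller margin in this model. A read access would shrink it further because the access transistor lifts the low storage node, an effect this behavioral sketch leaves out.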

### 2.4.2 Write Margin (WM)

Write margin is defined as $V_{DD} - MIN[V(WWL)]$ , where MIN[V(WWL)] is the minimum write-word-line voltage required to flip the bit-cell. The higher the write margin, the more easily data is written into the bit-cell. Fig. 2.13 shows an example of finding the write margin: it is the value of $V_{DD}$ - VWL at the point where VR and VL flip. The write margin value and its variation are functions of the cell design, SRAM array size, and process variation. A cell is considered not writeable if the worst-case write margin falls below the ground potential.



Fig. 2.13 Write margin of a SRAM bit-cell

### 2.4.3 Impact of Variation on SRAM in Low Voltage

#### **Differential 6T SRAM**

The 6T bit-cell fails to operate at ultra-low voltages because of reduced signal levels and increased sensitivity to random dopant fluctuation. In this configuration, read and write accesses impose opposing requirements, making it highly difficult to overcome the severe effects of variation and manufacturing defects. Fig. 2.14 shows the $\beta$ ratios of the 6T SRAM bit-cell; the conflict among them is described below.



Fig. 2.14 The  $\beta$  ratio of 6T SRAM bit-cell.

During read access the cell must remain bi-stable so that both data values can be held and read without being upset by the read disturb that occurs at the internal nodes. To facilitate reading and minimize read disturb, the $\beta_2$ ratio should be small enough, which calls for a strong *PD* nMOS and a weak *AX* nMOS. During write access the cell should be made mono-stable so that the desired data can be written. To improve writability, the $\beta_3$ ratio must be large, which calls for a strong *AX* nMOS and a weak *PUP* pMOS.

To improve writability and minimize read disturb simultaneously, the transistors can be sized as PD > AX > PUP. However, this degrades the $\beta_1$ ratio and hence $V_{TRIP}$ , resulting in poor read SNM. The three $\beta$ ratios therefore conflict with one another, and sizing alone cannot solve the 6T SRAM failures.

#### Hold and Read Failure

A hold failure occurs if the cell content is destroyed in standby mode at a low supply voltage. A higher trip point of the back-to-back inverters makes the cell easier to flip, thereby increasing the hold failure probability. As shown in Fig. 2.15, the hold SNM is preserved down to very low voltages, which forms the basis for several of the ultra-low-voltage bit-cell designs described in sections 2.5 and 2.6.



Fig. 2.15 6T SRAM SNM loss at low voltages [2.39]

If the data stored in an SRAM cell flips during reading, a read failure occurs. If the voltage at the node storing "0" rises above the trip point of the back-to-back inverters, the data stored in the cell flips. Fig. 2.15 shows that the 6T SRAM bit-cell fails to operate at low voltages because of reduced signal levels and increased variation. At low voltages, the read SNM is negative, indicating loss of stability.

#### Write Failure

If the data stored in an SRAM cell cannot be flipped during writing, a write failure occurs. When writing "0" to a node storing "1", the voltage at the node needs to be discharged below the trip point of the back-to-back inverters. As shown in Fig. 2.16, the write margin is also lost at low voltage, where a positive value, in this case, indicates a write failure.



Fig. 2.16 6T SRAM write margin [2.39]

#### **Access Failure**

If the voltage difference between the two bit-lines (dual-end) or the voltage drop of the single bit-line (single-end) can't be sensed by the sense amplifier during the access time, there is an access failure. The cause of access failure can be ascribed to read-current degradation and data-dependent bit-line leakage.

The cell read-current, $I_{READ}$ , is the current sunk from the precharged bit-lines during a read access when the access devices are enabled. At ultra-low voltages, a significantly reduced read-current is expected because of the lower gate-drive voltage. In addition, the increased effect of threshold voltage variation degrades the already weak cell read-current even further. Fig. 2.17 normalizes the read-current distribution by the mean read-current to highlight just the further degradation due to variation.



Fig. 2.18 I<sub>READ</sub> is less than I<sub>leakage</sub> from un-accessed cells at low voltage. [2.40]

An implied consequence of the reduced read-current is that the aggregate leakage current from the unaccessed cells on the same bit-lines can make conventional data sensing impossible. Because of the reduced $I_{ON}$ -to- $I_{OFF}$ ratio and the severe degradation from read-current variation, this aggregate leakage can exceed the actual read-current of the accessed cell. Fig. 2.18 shows that the $I_{READ}$ / $I_{LEAK,TOT}$ ratio of a 256-row SRAM array indicates loss of functionality at low voltages. At ultra-low voltage the bit-line leakage exceeds the read signal, making the accessed data indecipherable.
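A first-order estimate shows why a 256-row column stops working at low supply. The sketch assumes that both the read current and the leakage follow an ideal sub-threshold exponential, so the on/off ratio is $e^{V_{DD}/(n kT/q)}$, that all 255 unaccessed cells leak in the direction that opposes the read, and that variation, DIBL, and stacking effects are ignored. The slope factor and the function names are illustrative, and the numbers are not the data of Fig. 2.18.

```python
import math

def on_off_ratio(vdd, n=1.5, v_thermal=0.026):
    """Ideal sub-threshold I_ON / I_OFF for a gate drive of VDD (assumed model)."""
    return math.exp(vdd / (n * v_thermal))

def read_to_leakage_ratio(vdd, rows=256, n=1.5, v_thermal=0.026):
    """I_READ of the accessed cell over the summed worst-case leakage
    of the other (rows - 1) cells on the same bit-line."""
    return on_off_ratio(vdd, n, v_thermal) / (rows - 1)

def max_rows(vdd, margin=1.0, n=1.5, v_thermal=0.026):
    """Largest column height for which I_READ >= margin * total leakage."""
    return int(on_off_ratio(vdd, n, v_thermal) / margin) + 1

for vdd in (0.2, 0.3, 0.4):
    print(f"VDD = {vdd:.1f} V: I_READ / I_LEAK,TOT = {read_to_leakage_ratio(vdd):.2f}")
```

Under these assumptions the ratio falls below one at a supply of 0.2 V, so the leakage of the unaccessed cells can mask the read signal, while at 0.3 V the read current still dominates. Shortening the bit-line or suppressing data-dependent leakage shifts this limit to lower voltage.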

#### **Half-Select Disturb**

In a bit-interleaving architecture, in which a word-line holds more than one word, the half-selected 8T cells on the same word-line experience the equivalent of a 6T SRAM read operation during a write operation, as shown in Fig. 2.19. The half-select disturb can usually be eliminated by a gated write word-line signal (byte write).



Fig. 2.19 Half-select disturb during a write operation

## 2.5 Previous Low Voltage SRAM design

Because of process variation and low-voltage operation, innovative techniques that improve cell robustness by minimizing read and write failures are vital for deeply scaled SRAM designs. This section introduces read/write assist using peripheral circuits and new cell topologies that recover the read/write margin.

### 2.5.1 Previous Read/Write Assist Peripheral Circuit

In [2.41] [2.42], as shown in Fig. 2.20, each *WL* is connected to several normally-on replica access transistors, which lower the *WL* level and hence weaken the access transistors. Lowering the *WL* level, however, makes the write operation more difficult. A capacitive write-assist circuit is therefore proposed to lower $V_{CS}$ and hence the trip point of the cross-coupled inverters, which improves the write margin. Note that $V_{CS}$ must not drop too far, or data retention failure results.



Fig. 2.20 The read/write assist circuit in [2.41]

As shown in Fig. 2.21, [2.43] also lowers the *WL* voltage to cut down the read disturb. Compared with [2.42], a resistance *R1* is added between *NA* and *MR*, which suppresses the sensitive dependence of *NA* on $\Delta Rs$ . For write margin improvement, increasing the gate-source bias of the access NMOS is more effective than decreasing the drain-source bias of the load PMOS [2.41] [2.42]. Therefore, [2.43] applies a negative bias to the pulled-down bit-line during write operation, which significantly reduces write failures. Note that $\Delta V_{BL}$ should be controlled properly, because too large a $\Delta V_{BL}$ causes data retention failure.



Fig. 2.21 The read/write assist circuit in [2.43]

In [2.44], during read operation, only the unit column-based *VSM* in an activated column is forced to a negative bias; the *VSM* is connected to the source of the pull-down device in the cross-point 8T SRAM, as shown in Fig. 2.22. The negative $V_{SS}$ reduces the read disturb and the read delay, which leads to a power reduction.



Fig. 2.22 The read/write assist circuit in [2.44]

In [2.45], a transient negative bit-line scheme is presented to solve write failure, as shown in Fig. 2.23. Capacitive coupling generates a transient negative voltage on the low-going bit-line during write operation without any on-chip or off-chip negative voltage source. This technique shows a $10^3$X reduction in write failures and marginal reductions in access and disturb failures. Note that the *BIT\_EN* signal must not arrive too early: while $V_L$ is still high, the negative change in $V_{BL}$ is not sufficient to pull $V_L$ down to $V_{TRIP}$ .
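The size of the negative excursion produced by capacitive coupling can be estimated from charge conservation. The sketch assumes the simplest arrangement, which may differ from the actual circuit in [2.45]: the bit-line is first driven to 0 V and then left floating, and a boost capacitor connected to it has its other plate switched from $V_{DD}$ to 0 V. The names `c_boost` and `c_bitline` are hypothetical.

```python
def negative_bitline_level(vdd, c_boost, c_bitline):
    """Floating bit-line voltage after the boost capacitor's far plate
    falls from VDD to 0 V (ideal capacitive divider, no leakage)."""
    return -vdd * c_boost / (c_boost + c_bitline)

def boost_cap_for_target(vdd, c_bitline, delta_v):
    """Boost capacitance needed to push the bit-line to -delta_v."""
    if not 0.0 < delta_v < vdd:
        raise ValueError("delta_v must lie between 0 and vdd")
    return c_bitline * delta_v / (vdd - delta_v)

# Placeholder capacitances for illustration only.
v_bl = negative_bitline_level(vdd=1.0, c_boost=10e-15, c_bitline=90e-15)
print(f"bit-line level: {v_bl * 1e3:.0f} mV")
print(f"boost cap for -100 mV: {boost_cap_for_target(1.0, 90e-15, 0.1) * 1e15:.1f} fF")
```

The divider makes the trade-off visible: a longer bit-line with more capacitance needs a proportionally larger boost capacitor for the same undershoot, and the undershoot itself must stay small enough not to disturb unselected cells.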



The survey above shows various read/write assist circuits that improve the read/write margin of 6T SRAM at super-threshold voltage. At sub-threshold voltage, however, a more robust SRAM bit-cell is needed to replace the conventional 6T SRAM bit-cell. The next section presents well-known SRAM bit-cell topologies for ultra-low-voltage operation.

## 2.6 Previous Low Power Single-Port SRAM

As described in section 2.4, the conventional symmetric 6T SRAM has degraded read SNM and write margin at ultra-low voltage because of reduced signal levels and increased variation. Bit-cell designs with reliable stability and performance at ultra-low voltage are therefore essential. [2.26] [2.30] improve the read stability and write-ability at ultra-low supply voltage in specific ways.



Fig. 2.24 shows four single-port SRAM cells designed for ultra-low voltage.

Fig. 2.24 The single-port SRAM cells view (a) ST 10T bit-cell [2.26]. (b) P-P-N 10T bit-cell [2.27]. (c) single-ended 9T bit-cell [2.28]. (d) differential 10T bit-cell [2.29].

#### Schmitt Trigger based subthreshold SRAM [2.26]

Fig. 2.24(a) shows the Schmitt Trigger-2 (ST-2) based SRAM bit-cell [2.26]. In the ST-2 bit-cell, feedback is provided by an additional *WL* signal, whereas in ST-1 [2.30] the feedback is provided by the internal storage node. Because fewer transistors are connected in series in the read path, the read-disturb voltage (node $V_L$ ) in ST-2 is lower than in ST-1. Moreover, ST-2 employs a separate *WL* signal for stronger feedback, resulting in better read stability than ST-1 (Fig. 2.25(a)). The combined effect of an additional pull-down path for the node storing '1' and an increased voltage at the node storing '0' results in a higher write-trip-point in ST-2 than in ST-1 (Fig. 2.25(b)).



Fig. 2.25 (a) Read SNM comparison (b) Write mode comparisons [2.26]

#### P-P-N Based 10T Subthreshold SRAM [2.27]

The nodes between the two cascaded P-MOS transistors in Fig. 2.24(b) are called *pseudo storage nodes*; because they are separate from the true storage nodes, they address the read disturbance problem. As shown in Fig. 2.26(a), the read SNM is improved by 1.5X compared with [2.30]. Exploiting the reverse short channel effect (RSCE), the channel lengths of transistors *PGL* and *PGR* are 3 times the minimum channel length, and those of *PUL* and *PUR* are also 3 times the minimum channel length, as shown in Fig. 2.24(b). Overall, the WM is significantly improved.



Fig. 2.26 (a) Read SNM comparison (b) Write mode comparisons [2.27]

This 10T cell naturally supports a bit-interleaving structure using only a single word-line. As shown in Fig. 2.27, the *VGND* line of each unaccessed cell is kept at *VDD*, which turns off the potential leakage path from the bit-line to *VGND*.



#### Single-Ended 9T Subthreshold SRAM [2.28]

Fig. 2.24(c) shows the single-ended 9T bit-cell. As shown in Fig. 2.28(a), it enlarges the WM by cutting off the positive feedback loop of the inverter pair with no peripheral assist circuit. The MTCMOS design in the bit-cell reduces leakage and increases the WM/SNM, as shown in Fig. 2.28(b). RSCE is utilized in the access and buffer transistors to lessen subthreshold voltage variation and improve the on-off current ratio, leading to higher performance.



Fig. 2.28 (a) Write mode comparisons (b) Read SNM comparison [2.28]

In addition, the bit-interleaving scheme of the proposed 9T SRAM addresses the issues of write-half-select disturbance and achieves soft error tolerance as shown in Fig. 2.29.



Fig. 2.29 write-half-select disturbance reduction [2.28]

#### Differential 10T Subthreshold SRAM [2.29]

A 10T subthreshold SRAM with efficient bit-interleaving for soft-error immunity and a fully differential read for better stability is shown in Fig. 2.24(d). During read operation, the disabled *WWL* isolates the storage nodes from the bit-lines, which eliminates read-disturb noise. During write operation, both the column-based *WL* and the row-based *WWL* are boosted to compensate for weak write-ability (Fig. 2.30). Only the selected bit-cells turn on all access transistors; therefore, the half-select disturb is eliminated.



Instead of single-end read structure, the fully differential read scheme improves the bit-line noise immunity significantly during read operation. To further increase leakage tolerance the dynamic DCVSL technique is employed (Fig. 2.31). The discharge of *BL* turns on keeper M2 and bit-line leakage current in *BLB* is compensated by the drive current of M2.



Fig. 2.31 Differential structure and dynamic DCVSL technique [2.29]



# 2.7 Previous Low Voltage Dual-Port SRAM

Fig. 2.32 The dual-port SRAM cells view (a) single-ended 8T bit-cell [2.34]. (b) single-ended 8T bit-cell [2.35]. (c) single-ended 9T bit-cell [2.36]. (d) single-ended 10T bit-cell [2.37].

An embedded dual-port SRAM can increase internal memory access speed. With a single-port SRAM (SP-SRAM), two functional units must access the memory in series because only one port is available, so two clock cycles are required. With a dual-port SRAM (DP-SRAM), both functional units can access the block simultaneously within one cycle. In the one-read/one-write (1R1W) type DP-SRAM cell, only one of the two ports is allowed for read operation [2.46]. This 1R1W memory cell has a stable read operation, although its single-ended read-bit-line structure may degrade access time because of the large-amplitude RBL swing. Fig. 2.32 shows dual-port cells for read-disturb elimination, leakage reduction, and write-margin improvement at ultra-low voltage [2.31]-[2.36].

#### Single-ended 8T SRAM [2.31]-[2.34]

To resolve the critical read SNM problem of 6T SRAM bit-cells, the 8T SRAM bit-cell in Fig. 2.33 adds two transistors (a read-buffer), *RA* and *RD*, to a conventional 6T SRAM bit-cell. The read buffer provides a read mechanism that does not disturb the storage nodes of the bit-cell, thereby eliminating the worst-case stability condition. The original word-line (*WLw*) of the 6T bit-cell is used exclusively for write operation, while a second read-word-line (*WLr*) is tied to the read-buffer.



Fig. 2.33 8T SRAM bit-cell circuit and Read noise margin of 8T SRAM cell and 6T SRAM cell [2.31]

Without read-disturb concerns, there is no limit on the $\beta$ ratio in an 8T SRAM bit-cell. Therefore, 8T cells can simultaneously improve both stability and write-ability yields. These techniques may, however, increase cell leakage and cell size. To diminish the leakage, the write-bit-lines (*BLLw* and *BLRw*) can be allowed to float, potentially reducing the leakage current that flows through the access transistors (*AXL* and *AXR*). The layout area overhead of the 8T cell is approximately 20%~30% over that of the 6T cell. As shown in Fig. 2.34, the dual-port nature of the 8T SRAM bit-cell enables separate optimization of the read path (e.g., 8 bits per read bit-line) and the write path (512 bits per write bit-line), along with simple local evaluation circuitry [2.32].



Fig. 2.34 Separation optimization of the read and write paths [2.32]

Since read stability and write-ability of the 8T SRAM bit-cell could both be improved, yield at low operating voltages can be enhanced. This improved variability tolerance can also be translated into improved performance and power.



Fig. 2.35 (a) 8T SRAM cell utilizing RSCE (b) layout [2.33]

As shown in Fig. 2.35, [2.33] presents a technique for improving the write margin and read performance of an 8T subthreshold SRAM by using long-channel devices to exploit the pronounced reverse short channel effect (RSCE). The longer channel length of the write access transistors (M3 and M6) increases the write-path current drivability because of the reduced threshold voltage and the resulting exponential increase in drive current. The read current and its process tolerance are also improved by using longer-channel transistors (M7 and M8) for the read-buffer.

In Fig. 2.36, [2.34] utilizes peripheral circuits to enhance the robustness and stability of 8T subthreshold SRAM. Peripheral footer circuitry eliminates bit-line leakage; peripheral charge-pumps ensure buffer-foot drivers do not limit  $I_{READ}$ ; peripheral write drivers and storage-cell supply drivers interact to reduce the cell supply voltage during write operations; and sense-amp redundancy provides a favorable trade-off between offset and area.



Fig. 2.36 Circuitry to eliminate leakage from un-accessed read-buffers. [2.34]

#### Single-ended 8T SRAM [2.35]

Fig. 2.32(b) shows the single-ended 8T SRAM cell. In write operation, the transistor *MNP* cuts off the positive feedback loop of the back-to-back inverters so that the write margin is improved, as shown in Fig. 2.37. In addition, the access transistor (*MNA*) utilizes RSCE to further improve the write margin. To remove the bit-line leakage problem, the buffer-footers of unaccessed cells are pulled up to '1', which restrains the leakage current flowing from the bit-line.



Fig. 2.37 write operation of 8T SRAM [2.35]

#### Single-ended 10T SRAM [2.36]

Although the 8T subthreshold SRAM bit-cell buffers the read path to eliminate read-disturb noise, read-bit-line leakage, which decreases the sensing margin, is still a critical issue. In Fig. 2.32(c), a 10T bit-cell employs a leakage-reduction read-buffer to remove the read SNM problem. The 10T bit-cell costs ~66% area overhead over the conventional 6T SRAM bit-cell and consumes leakage power as well.

As shown in Fig. 2.32(c), *M10* significantly reduces leakage power relative to the case where it is excluded. In addition to read buffer leakage reduction, the virtual supply ( $VV_{DD}$ ) to the selected bit-cells floats during the write operation to improve write-ability in the subthreshold region as shown in Fig. 2.38.



Fig. 2.38 floating VVDD during write allows robust write operation [2.36]

#### Single-ended 10T SRAM [2.37]

As shown in Fig. 2.32(d), the 10T SRAM bit-cell employs two transistors in the read-buffer for leakage reduction. When read is disabled, node *A* is charged to VDD regardless of the storage node data. This data-independent read-bit-line leakage scheme enables long bit-lines, as shown in Fig. 2.39. For a robust, high-density subthreshold SRAM, particular techniques and peripheral circuits are employed, including the use of RSCE for write margin improvement.

As shown in Fig. 2.40, a virtual ground replica scheme is employed to improve the bit-line sensing margin, together with a write-back scheme for data preservation during write and optimal gate sizing based on subthreshold logical effort.



Fig. 2.39 Effect of data-independent bit-line leakage. [2.37]



Fig. 2.40 VGND replica scheme for ideal bit-line sensing margin [2.37]

# 2.8 Summary

Reducing the supply voltage is the most popular method for power reduction in modern low-power IC design. Aggressive supply voltage scaling, however, causes SRAM to fail because of low read current and low SNM, and PVT variation degrades the margins even further. To address these issues, low-power SRAM design techniques and ultra-low voltage SRAM design considerations have recently been presented. This chapter first introduced the SRAM architecture and peripheral circuitry. Secondly, the dynamic, leakage, and short-circuit power dissipation of deep-submicron circuits were described, and several SRAM power reduction methodologies were surveyed. Thirdly, SRAM stability and the limitations of the conventional 6T SRAM at low supply voltage were illustrated. Finally, several SRAM read/write assist circuits and robust SRAM topologies were described that increase noise margin and improve reliability at low and ultra-low voltage.

# Chapter 3 A Near-/Sub-threshold 10T SRAM Cell Design

# **3.1 Introduction**

In this chapter, a near-/sub-threshold 10-transistor (10T) SRAM bit-cell is proposed. The 10T SRAM bit-cell improves write-ability, eliminates read disturb, and reduces data-dependent bit-line leakage at ultra-low voltage.

## 3.1.1 Conventional Dual-Port SRAM Bit-Cell

Fig. 3.1 shows a conventional dual-port (DP) SRAM bit-cell, which adds two access transistors to the conventional 6T SRAM bit-cell to separate the read and write paths of the storage nodes so that it can be read and written simultaneously. The read and write operations of the DP SRAM bit-cell are the same as those of the 6T SRAM bit-cell, but extra peripheral circuitry is needed to support the dual-port structure.



Fig. 3.1 Conventional dual-port SRAM bit-cell

#### **3.1.2 Conventional Dual-Port SRAM limitation**

As technology and supply voltage scale down, the conventional DP SRAM fails to maintain reliable operation. The conventional DP SRAM has almost the same issues as the 6T SRAM bit-cell.

The exponential effect of threshold voltage variation, the reduction of signal levels, and the degradation of the $I_{on}$-$I_{off}$ ratio are critical issues in sub/near-threshold circuitry. In detail, process variation causes sideways offsets [3.1]. The threshold voltage shifts because of random dopant fluctuations, line-edge roughness, and local oxide thickness variations [3.2]. Furthermore, the reduction of signal levels directly hurts the noise margin of logic circuits. The degradation of the $I_{on}$-$I_{off}$ ratio limits the number of elements that can be shared in array logic, such as memory elements. The combined effect of low supply voltage and process variation results in memory operation failures such as read disturb, write failure, and bit-line leakage. One way to address local variation is to enlarge the transistors, at the cost of larger leakage and capacitance [3.3].

To solve the above-mentioned problems, several effective techniques have been proposed. A read buffer can eliminate read disturb [3.4]-[3.6]. Write-ability can be improved by lowering the cell supply voltage ($V_{CS}$) [3.7], by boosting the write-word-line voltage for the access transistors [3.8], or by applying a negative voltage on the write-bit-line [3.9]. Bit-line leakage can be reduced by changing the bit-cell topology [3.5], [3.6] or by pulling the feet of all unaccessed read-buffers up to $V_{DD}$ [3.10]. However, these techniques require additional peripheral circuitry and overhead power.

## 3.2 A Near-/Sub-threshold 10T SRAM Bit-Cell



Fig. 3.2 The proposed 10T subthreshold SRAM bit-cell

A robust dual-port 10T SRAM bit-cell (Fig. 3.2) is proposed as the storage element of the ULP FIFO memory. It improves the read static noise margin (SNM), reduces write variation, and reduces bit-line leakage in the near-/sub-threshold voltage regime. With a single-ended write port, the proposed SRAM bit-cell can reduce leakage and switching power during write operation [3.11]. The proposed SRAM bit-cell therefore contains only one write-bit-line (*WBL*).

The proposed SRAM bit-cell consists of a cross-coupled inverter pair, a write access transistor (*MN1*), a pass transistor (*MP1*), and a decoupled read-out structure (*MP2*, *MN2*, *MN3*, *MN4*). *MP1* is used to cut off the feedback loop of the inverter pair, which eliminates the voltage-dividing effect between *MN1* and inverter *B* during write operation. To reduce leakage currents, all of the MOSFETs are high-V<sub>t</sub> devices except *MN1* and *MP1*. Using regular-V<sub>t</sub> devices for *MN1* and *MP1* reduces the V<sub>t</sub> loss through them, which improves the hold SNM and write margin.

# **3.2.1 Layout Consideration**



Fig. 3.3 Schematic of the 10T SRAM bit-cell (UMC 90nm tech.)



Fig. 3.4 Layout of the 10T SRAM bit-cell (UMC 90nm tech.)

To improve mismatch and dimension control, the regular layout of the proposed 10T SRAM bit-cell is designed as a "straight-line layout", which facilitates lithography and reduces sensitivity to overlay errors. As shown in Fig. 3.4, the cell is laid out as a thin cell; the bit-line is therefore shorter, which decreases its equivalent RC value.

The cell is designed in UMC 90nm standard process technology. Four metal layers are used in the bit-cell layout. *VDD* and *GND* are routed in the second and third metal layers, the read/write bit-lines (*RBL/WBL*) are routed in the third metal layer, and the read/write word-lines (*RWL/WWL*) are routed in the fourth metal layer.

# **3.3 Dual-Port SRAM Bit-cell Analysis**

The previous chapter reviewed several SRAM bit-cells proposed for different design purposes. Fig. 3.5 lists those with a dual-port design.



Fig. 3.5 Published dual-port SRAM bit-cell configurations

## 3.3.1 Read ability Improvement

Fig. 3.6 shows the read operation of the proposed 10T bit-cell. In read mode, the read-word-line (*RWL*) is high. The read-bit-line (*RBL*) is precharged to $V_{DD}$ before the cell is accessed. Then *MP2* is turned off, and *MN2* and *MN4* are turned on. Depending on the cell data, *RBL* is conditionally discharged to GND through *MN2*, *MN3*, and *MN4*. The proposed SRAM bit-cell therefore keeps the storage nodes away from disturb noise and makes the read SNM as large as the hold SNM.



Fig. 3.6 The read operation of the proposed 10T

Fig. 3.7(a) shows the read SNM of the proposed SRAM bit-cell, the conventional DP 8T, and other state-of-the-art SRAMs [3.4]-[3.6] at ultra-low supply voltage. The proposed SRAM bit-cell has a much better RSNM than the conventional DP 8T.



Fig. 3.8 shows the distribution of the read SNM in Monte Carlo simulation (100,000 runs). Compared with [3.4] and [3.6], the proposed 10T SRAM bit-cell has a minor SNM drop ($\triangle \mu$=18mV, $\triangle \sigma$=6.3mV) because of *MP1*. Nevertheless, it has 1.9X the read SNM and better variation immunity compared with the conventional DP 8T bit-cell.



#### 3.3.2 Write-ability improvement

Fig. 3.9 shows the write operation of the proposed 10T bit-cell. In write mode, the write-word-line (*WWL*) is high. *WBL* is precharged to $V_{DD}$ before the cell is accessed. *WWL* turns on *MN1* and turns off *MP1* simultaneously, so the written data passes through *MN1* and inverters *A* and *B* to node *VC*. This scheme cuts off the positive feedback loop of the inverter pair. The proposed SRAM bit-cell thus enlarges the write margin without any peripheral circuit, especially in the near-/sub-threshold regions.



Fig. 3.9 The write operation of the proposed 10T

Fig. 3.10 shows the write margin distributions of the proposed single-ended write-port 10T bit-cell, for writing "0" and writing "1", and of the other dual-ended write-port bit-cells at a supply voltage of 0.4V in Monte Carlo simulation (100,000 runs). Note that the write margin is defined as the minimum word-line voltage required to flip the cell data. The results show that the proposed SRAM bit-cell has 3.2X the write margin and better variation immunity compared with the other SRAMs.



Fig. 3.10 Distributions of write margin in MC Simulation

## 3.3.3 Bit-line Leakage Reduction

Fig. 3.11 shows the hold operation and hold SNM of the proposed 10T bit-cell. In hold mode, the proposed SRAM bit-cell eliminates the data-dependent bit-line leakage by turning on *MP2*. The drain voltage of *MP2* becomes $V_{DD}$ and forces the leakage current to flow from the cell into *RBL* regardless of the cell data. This scheme reduces the bit-line leakage current within the SRAM cell itself.



Fig. 3.11 The hold operation and hold SNM of the proposed 10T

Fig. 3.12 shows the simplified schematic of the proposed bit-line with data-independent bit-line leakage current. The logic-low level is determined by the balance between the pull-up leakage current of the unaccessed cells and the pull-down read current of the accessed cells. The logic-high level is close to *VDD* because both the bit-line leakage current and the cell current pull *RBL* up. Consequently, the sensing margin is improved significantly, especially at high temperature.



Our proposed 10T SRAM bit-cell has 6% better temperature variation tolerance.



Fig. 3.13 Sensing margin comparisons under worst case column pattern

# 3.4 Dual-Port SRAM Bit-Cell V<sub>MIN</sub> Analysis

## 3.4.1 Iso-Area Bit-cells

Compared with the conventional dual-port (DP) 8T and the single-ended (SE) 8T [3.4], the proposed 10T bit-cell occupies 1.8X and 1.98X the area, respectively. To estimate the minimum supply voltage $V_{min}$ under fair conditions, the bit-cells are best compared at iso-area. Fig. 3.14 shows the thin-cell layouts of the conventional DP 8T mincell and the SE 8T mincell. To keep the bit-line capacitance unchanged, the vertical dimension along the bit-line (2 poly-pitch) is held constant during bit-cell upsizing. Any upsizing is realized in the lateral direction, which increases the word-line capacitance. However, only one word-line is active during a read/write operation, so this approach to bit-cell upsizing results in a minimal increase in power dissipation. Because the area efficiency differs among these bit-cells [3.10], [3.12], the iso-area bit-cells should be evaluated at iso-subarray area. Table 3.1 shows the breakdown of the total subarray area for the DP 8T, SE 8T, and proposed 10T bit-cells.






Fig. 3.14 thin-cell layout (a) conventional DP 8T mincell (b) SE 8T mincell

|                          | DP 8T | SE 8T | This Work |
|--------------------------|-------|-------|-----------|
| Normalized bit-cell area | 1     | 0.91  | 1.8       |
| Total bit-cell area      | N     | 0.91N | 1.8N      |
| Array efficiency         | 70%   | 50%   | 85%       |
| Peripheral circuit area  | 0.43N | 0.91N | 0.32N     |
| Total sub-array area     | 1.43N | 1.82N | 2.12N     |
| Iso-subarray area        | 1.69  | 1.21  | 1.8       |

Table 3.1 Subarray area analysis of DP 8T/SE 8T/10T

The array efficiency for differential sensing (DP 8T), SE 8T, and single-ended sensing (this work) is assumed to be 70%, 50%, and 85%, respectively. As shown in Table 3.1, the DP 8T can be upsized to an iso-subarray area of (2.12N-0.43N)/N=1.69, and the SE 8T to (2.12N-0.91N)/N=1.21. Consequently, at the same subarray area, the DP 8T and SE 8T can be upsized by 1.69 and 1.21, respectively.
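The arithmetic behind Table 3.1 can be reproduced with a short sketch. It assumes that array efficiency means bit-cell area divided by total sub-array area, which is consistent with the tabulated numbers; the function and variable names are illustrative only, and areas are expressed in units of N.

```python
def peripheral_area(bitcell_area, array_efficiency):
    """Peripheral area implied by efficiency = bitcell / (bitcell + peripheral)."""
    return bitcell_area * (1.0 - array_efficiency) / array_efficiency


# Normalized bit-cell areas and assumed array efficiencies from Table 3.1 (N = 1).
cells = {
    "DP 8T": (1.00, 0.70),
    "SE 8T": (0.91, 0.50),
    "This Work": (1.80, 0.85),
}

periph = {name: peripheral_area(area, eff) for name, (area, eff) in cells.items()}
total = {name: cells[name][0] + periph[name] for name in cells}

# Iso-subarray budget: the total sub-array area of the proposed 10T.
budget = total["This Work"]

# Bit-cell area left for each 8T cell after subtracting its own periphery.
iso_area = {name: budget - periph[name] for name in ("DP 8T", "SE 8T")}

for name in cells:
    print(f"{name}: peripheral={periph[name]:.2f}N total={total[name]:.2f}N")
for name, area in iso_area.items():
    print(f"{name}: iso-subarray bit-cell area = {area:.2f}N")
```

Rounded to two decimals, the computed values agree with the peripheral, total, and iso-subarray rows of Table 3.1.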

In the conventional DP 8T and SE 8T bit-cells, the mincell device widths are 200, 200, 200, and 400nm for the pull-up, write-access, read-access, and pull-down transistors, respectively. These bit-cells can be upsized separately for write stability and read stability. As shown in Fig. 3.15(a), for the DP 8T, the write-access transistors are upsized by 4.5X and the other transistors are upsized by 2X. As shown in Fig. 3.15(b), for the SE 8T, the write-access transistors are upsized by 4X and the read-access transistors are upsized by 2X. Single-ended sensing 10T bit-cells [3.5], [3.6] occupy 1.4X the area of the DP 8T mincell; hence, their write-access transistors are upsized by 3X for iso-area comparison.

Table 3.2 lists the device sizing for the various bit-cell topologies and the upsized bit-cells. The device names follow Fig. 3.5.



Fig. 3.15 thin-cell layout (a) Iso-area conventional DP 8T (b) Iso-area SE 8T

| Bit-cell topology | NB/<br>NA | PB/<br>PA | AXR1/<br>AXR2 | AXW1/<br>AXW2 | P1  | P2  | N1  | N2  | N3  | N4  |
|-------------------|-----------|-----------|---------------|---------------|-----|-----|-----|-----|-----|-----|
| DP 8T (1X area)   | 400       | 200       | 200           | 200           | -   | -   | -   | -   | -   | -   |
| DP 8T (Iso-area)  | 800       | 400       | 400           | 900           | -   | -   | -   | -   | -   | -   |
| SE 8T (1X area)   | 400       | 200       | 200           | 200           | -   | -   | -   | -   | -   | -   |
| SE 8T (Iso-area)  | 400       | 200       | 400           | 800           | -   | -   | -   | -   | -   | -   |
| SE 10T (1X area)  | 400       | 200       | -             | 200           | -   | 200 | -   | 200 | 200 | 200 |
| SE 10T (Iso-area) | 400       | 200       | -             | 600           | -   | 200 | -   | 200 | 200 | 200 |
| This Work         | 200       | 200       | -             | -             | 200 | 200 | 200 | 200 | 200 | 200 |

Table 3.2 Device sizing for various bit-cell topology
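As a cross-check, the iso-area widths in Table 3.2 follow from the mincell widths and the upsizing factors stated in the text. The sketch below is illustrative only; the dictionary keys and the `upsize` helper are hypothetical names, and widths are in nm.

```python
# Mincell device widths in nm, as stated in the text.
MINCELL = {
    "pull_down": 400,     # NB/NA
    "pull_up": 200,       # PB/PA
    "read_access": 200,   # AXR1/AXR2
    "write_access": 200,  # AXW1/AXW2
}


def upsize(widths, factors, default=1.0):
    """Scale each device width by its factor, or by `default` if none is given."""
    return {dev: int(round(w * factors.get(dev, default))) for dev, w in widths.items()}


# DP 8T: write access 4.5X, all other devices 2X.
dp8t_iso = upsize(MINCELL, {"write_access": 4.5}, default=2.0)

# SE 8T: write access 4X, read access 2X, remaining devices unchanged.
se8t_iso = upsize(MINCELL, {"write_access": 4.0, "read_access": 2.0})

# SE 10T: only the write access transistor is upsized, by 3X.
se10t_iso_write_access = int(round(MINCELL["write_access"] * 3.0))

print("DP 8T iso-area:", dp8t_iso)
print("SE 8T iso-area:", se8t_iso)
print("SE 10T iso-area write access:", se10t_iso_write_access)
```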

## 3.4.2 Read-Failure Probability

Read static noise margin (SNM) is used to quantify the read stability of the SRAM bit-cells. The read-failure probability ($P_{read-fail}$) is estimated as $P_{read-fail} = Prob$. (read SNM < kT). As shown in Fig. 3.16, if the SNM is lower than the thermal voltage (kT = 26mV at 300K), the stored data can be flipped by noise. Read-V<sub>min</sub> is determined at the 3-sigma read-failure probability $P_{read-fail}=1e-5$.
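The failure criterion above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the SNM follows a Gaussian distribution; the thesis obtains its distributions from Monte Carlo circuit simulation instead. The mean and sigma values below are hypothetical and are not taken from the simulation results.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELECTRON_CHARGE_C = 1.602176634e-19


def thermal_voltage(temp_kelvin):
    """Thermal voltage kT/q in volts."""
    return BOLTZMANN_J_PER_K * temp_kelvin / ELECTRON_CHARGE_C


def gaussian_fail_probability(mean_snm, sigma_snm, threshold):
    """P(SNM < threshold) for a Gaussian SNM with the given mean and sigma."""
    z = (threshold - mean_snm) / sigma_snm
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


v_t = thermal_voltage(300.0)  # close to the 26 mV quoted in the text

# Hypothetical SNM statistics in volts (mean, sigma) at two supply voltages.
hypothetical_snm = {0.30: (0.060, 0.012), 0.40: (0.095, 0.012)}

for vdd, (mu, sigma) in sorted(hypothetical_snm.items()):
    p_fail = gaussian_fail_probability(mu, sigma, v_t)
    print(f"VDD={vdd:.2f} V  P_read-fail={p_fail:.3e}")
```

Under this assumption, the read-V<sub>min</sub> would be the lowest supply voltage at which the computed probability stays at or below the target failure probability.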



Fig. 3.17 plots the read-failure probability versus supply voltage for the various bit-cells. Because of their read-disturb-free scheme, the read stability of the SE 8T bit-cells is the same as their hold-mode stability, and they have the lowest read-failure probability. The proposed 10T bit-cell has a pass transistor (*MP1*) between the inverter pair and therefore has a higher read-failure probability than the SE 8T. Nevertheless, the proposed 10T has a much lower read-failure probability than the DP 8T.



## 3.4.3 Hold-Failure Probability

Similar to the read stability case, hold stability is estimated by computing the hold SNM. The hold-failure probability ($P_{hold-fail}$) is estimated as $P_{hold-fail} = Prob$. (hold SNM < kT). As shown in Fig. 3.18, if the SNM is lower than the thermal voltage (kT = 26mV at 300K), the stored data can be flipped by noise. Hold-V<sub>min</sub> is determined at the 3-sigma hold-failure probability $P_{hold-fail}=1e-5$.



Fig. 3.18 Hold-failure occurrence

As shown in Fig. 3.19, upsizing the DP 8T bit-cell yields more robust inverter characteristics. For this reason, the iso-area DP 8T bit-cell has a lower hold-failure probability and a lower hold $V_{min}$ than the minimum-sized DP 8T. As with the read-failure probability, the hold-failure probability of the proposed 10T rises somewhat because of the pass transistor (*MP1*).



Fig. 3.19 Hold-failure probability comparison

## 3.4.4 Write-Failure Probability

Write-ability indicates how easily data can be written into a bit-cell. In this part, the write margin is defined as $V_{DD} - MIN[V(WWL)]$, where MIN[V(WWL)] is the minimum write-word-line voltage required to flip the bit-cell. The higher the write margin, the more easily data is written into the bit-cell. Therefore, as shown in Fig. 3.20, the write-failure probability (P<sub>write-fail</sub>) is calculated as P<sub>write-fail</sub> = Prob. (write margin < 0V). That is, if MIN[V(WWL)] is larger than V<sub>DD</sub>, the bit-cell would not be flipped due to IR drop on WWL. Write-V<sub>min</sub> is determined at the 3-sigma write-failure probability P<sub>write-fail</sub>=1e-5.
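A minimal sketch of how this definition turns Monte Carlo samples into a failure probability is given below. The sample generator is a hypothetical stand-in for simulator output (a Gaussian with made-up parameters); only the margin definition and the failure criterion come from the text.

```python
import random


def write_margin(vdd, min_wwl_to_flip):
    """Write margin as defined in the text: VDD - MIN[V(WWL)]."""
    return vdd - min_wwl_to_flip


def write_fail_probability(vdd, min_wwl_samples):
    """Fraction of samples whose write margin is negative."""
    failures = sum(1 for v in min_wwl_samples if write_margin(vdd, v) < 0.0)
    return failures / len(min_wwl_samples)


# Hypothetical flip voltages in volts. Real values would come from
# circuit-level Monte Carlo runs, not from this Gaussian.
rng = random.Random(0)
samples = [rng.gauss(0.30, 0.04) for _ in range(100_000)]

p_write_fail = write_fail_probability(0.40, samples)
print(f"estimated P_write-fail at VDD=0.40 V: {p_write_fail:.2e}")
```

With 100,000 samples, the smallest nonzero probability that direct counting can resolve is one failure in 100,000, that is, 1e-5.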



As shown in Fig. 3.21, the write operation of the proposed 10T can be separated into write "0" and write "1", with write "0" being the worst case. The write $V_{min}$ of the proposed 10T therefore depends on write "0". Upsizing the access transistors of the DP 8T, SE 8T, and SE 10T bit-cells yields a lower write-failure probability and a lower write $V_{min}$ than their mincells. However, even with the access transistors of the DP 8T upsized by 4.5X, its write $V_{min}$ is much higher than that of the proposed 10T. Because of the cut-off scheme, the proposed 10T reaches the lowest $V_{min}$ among all the mincell and iso-area bit-cells.



Fig. 3.21 Write-failure probability comparison

# 3.4.5 Iso-Area V<sub>min</sub> Comparison

Table 3.3 compares the estimated $V_{min}$ of the various bit-cell topologies under the iso-area condition. The $V_{min}$ of a bit-cell is determined as

$$V_{\min} = MAX[Read V_{\min}, Write V_{\min}, Hold V_{\min}].$$

As seen in Table 3.3, the proposed 10T shows the lowest $V_{min}$ of 398mV. The added read buffer eliminates read disturb and keeps the read SNM the same as the hold SNM. In addition, cutting off the positive feedback loop of the inverter pair improves the write margin significantly. Although the pass transistor (*MP1*) reduces the hold SNM, and hence the read SNM, DTCMOS technology is employed to limit the impact of the hold SNM degradation. In this bit-cell analysis, the read failure, write failure, and access failure are resolved without any assist circuit. It is believed that this cell topology can serve as a bit-cell topology for further V<sub>min</sub> reduction.

| Bitcell Topology           | Read V<sub>min</sub> (mV) | Write V<sub>min</sub> (mV) | Hold V<sub>min</sub> (mV) | V<sub>min</sub> (mV) |
|----------------------------|---------------------------|----------------------------|---------------------------|----------------------|
| DP 8T Mincell              | 788                   | 746                    | 334                   | 788                   |
| DP 8T bit-cell (Iso-area)  | 574                   | 508                    | 250                   | 574                   |
| SE 8T Mincell [3.4]        | 334                   | 746                    | 334                   | 746                   |
| SE 8T bit-cell (Iso-area)  | 334                   | 511                    | 334                   | 511                   |
| SE 10T Mincell [3.5]       | 334                   | 746                    | 334                   | 746                   |
| SE 10T bit-cell (Iso-area) | 334                   | 550                    | 334                   | 550                   |
| SE 10T Mincell [3.6]       | 334                   | 746                    | 334                   | 746                   |
| SE 10T bit-cell (Iso-area) | 334                   | 550                    | 334                   | 550                   |
| This Work                  | 398                   | 335                    | 398                   | 398                   |

Table 3.3 V<sub>min</sub> comparison of various bit-cell topology
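The last column of Table 3.3 follows directly from the definition $V_{min} = MAX[\text{Read } V_{min}, \text{Write } V_{min}, \text{Hold } V_{min}]$. A short sketch that recomputes it from the tabulated read, write, and hold values (in mV); the dictionary layout and key names are illustrative only:

```python
# (read Vmin, write Vmin, hold Vmin) in mV, copied from Table 3.3.
TABLE_3_3 = {
    "DP 8T mincell": (788, 746, 334),
    "DP 8T iso-area": (574, 508, 250),
    "SE 8T mincell [3.4]": (334, 746, 334),
    "SE 8T iso-area": (334, 511, 334),
    "SE 10T mincell [3.5]": (334, 746, 334),
    "SE 10T iso-area [3.5]": (334, 550, 334),
    "SE 10T mincell [3.6]": (334, 746, 334),
    "SE 10T iso-area [3.6]": (334, 550, 334),
    "This Work": (398, 335, 398),
}

# Overall Vmin is limited by the worst of the three operating modes.
vmin = {name: max(values) for name, values in TABLE_3_3.items()}
best = min(vmin, key=vmin.get)

for name, value in vmin.items():
    print(f"{name}: Vmin = {value} mV")
print(f"lowest Vmin: {best} ({vmin[best]} mV)")
```

The recomputed values show which mode limits each cell: the 8T and SE 10T cells are limited by read or write, while the proposed cell is limited by read and hold at 398mV.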

# **3.5 Power Consumption**

Power consumption is an important parameter in the bit-cell design. In this part, the dual-port bit-cell operation power and leakage power of DP 8T /SE 8T [3.4]/SE 10T [3.5] /10T [3.6]/proposed 10T iso-area bit-cell are compared and discussed.


# 3.5.1 Read Power Consumption

Fig. 3.22 shows the normalized read power versus supply voltage under 0.5V for the proposed 10T SRAM bit-cell and the others. The SE 10T [3.6] bit-cell consumes the lowest read power; however, it has data-dependent bit-line leakage in hold mode [3.5], which degrades the sensing margin. The conventional DP 8T bit-cell consumes the most read power among the read-buffered SRAMs, since a dual-ended read port consumes more power in bit-line swing. Compared with the DP 8T bit-cell, the other bit-cells, with single-ended read ports, consume less power in read operation. A single bit-line structure can thus reduce read power in low-voltage SRAM design.



Fig. 3.22 Read power comparison

## 3.5.2 Write Power Consumption

Fig. 3.23 shows the normalized write power versus supply voltage under 0.5V for the proposed 10T bit-cell and the others. The conventional DP 8T, the SE 8T [3.4], and the SE 10T bit-cells in [3.5] and [3.6] share the same write scheme with different write-access transistor sizes. The proposed 10T bit-cell consumes overhead power to cut off the positive feedback of the inverter pair at the storage node, as well as the leakage of the regular-$V_t$ write-access transistor.



Fig. 3.23 Write power comparison

In the sub-threshold region, write-bit-line leakage dominates, so the proposed 10T consumes more write power than the other bit-cells. In the near-/super-threshold region, however, the single-ended write-bit-line scheme of the proposed 10T SRAM benefits from a lower activity factor, because the probability of charging and discharging the write-bit-line is reduced. [3.12] describes in more detail the relation between the activity factor and the power saving of the single-ended write-bit-line SRAM array scheme.

#### **3.5.3 Leakage Power Consumption**

Fig. 3.24 compares the leakage power versus ultra-low supply voltage for the proposed 10T SRAM and the others. When *VR* is in the "0" state, the $V_t$ of the pass transistor (*MP1*) raises the voltage at *VR* somewhat and increases leakage, but this increase does not significantly affect the leakage power consumption. Because the proposed 10T bit-cell has a single-ended write-bit-line and a single-ended read-bit-line, which reduces the leakage paths to VDD or ground, it consumes the least leakage power among the compared bit-cell designs.



Fig. 3.24 leakage power comparison

## **3.6 Summary**

An ultra-low power 10T near-/sub-threshold dual-port SRAM bit-cell is proposed to improve write-ability, reduce write variation, and eliminate data-dependent bit-line leakage. At ultra-low supply voltage, the proposed 10T bit-cell has a 3.2X improvement in write margin, better write-variation immunity, and a 1.9X improvement in read SNM compared with the conventional dual-port SRAM. Compared with the other dual-port SRAM bit-cells under the iso-area condition, the proposed 10T bit-cell is predicted to have the lowest V<sub>min</sub> in 90-nm technology Monte Carlo simulations. The proposed 10T SRAM has the lowest write power consumption at near-threshold voltage and the lowest leakage power consumption at ultra-low voltage.



# Chapter 4 A 16Kb Near-threshold SRAM-Based FIFO in 90nm CMOS for WBANs

## **4.1 Introduction**

First-in first-out (FIFO) memory is commonly used for data buffering and flow control in many SoC applications. An example is the emerging wireless body area network (WBAN), a breakthrough personal healthcare technology for body condition monitoring and diagnosis. Because of the limited energy source and the long-term stability requirement, robust ultra-low power designs are indispensable for a WBAN system [4.1]. As shown in Fig. 4.1, a major component of the wireless sensor node (WSN) of the system is a FIFO memory, which dominates the total die area and power [4.2]. Reducing the power consumption of the FIFO memory is therefore an urgent design consideration for an optimal WBAN. For high density and low power, a dual-port SRAM-based FIFO is applicable [4.3].



Fig. 4.1 Block diagram of the wireless body area network (WBAN) system wireless sensor node (WSN)

Due to the loose timing constraints of the wireless sensor node, a sub/near-threshold supply voltage is suggested to achieve ultra-low power operation. Moreover, voltage scaling is an effective method to reduce energy in digital circuits due to the quadratic saving in the $CV_{DD}^2$ energy [4.4]. However, as the supply voltage scales down, CMOS circuits become sensitive to noise. Stability is especially important for storage elements that operate under ultra-low voltage. Thus, a robust ultra-low voltage SRAM design has been discussed in the previous chapters to achieve minimum power consumption.

### 4.1.1 Wireless Sensor Node for WBANs



Fig. 4.2 WBAN of Intelligent Sensors for Ambulatory Health Monitoring

Wireless body area network (WBAN) is one of the most suitable technologies for building unobtrusive, scalable, and robust wearable health monitoring systems [4.5]. It consists of a set of mobile, compact, intercommunicating sensors, either wearable or implanted in the human body, which monitor vital body parameters and movements. These devices, communicating through wireless technologies, transmit data from the body to a home base station, from which the data can be forwarded to a hospital, clinic, or cloud computer in real time. Fig. 4.2 shows the generic concept of a wireless body area network of intelligent sensors for patient monitoring [4.6].

The sensors must be lightweight with a small form factor. The size and weight of the sensors are primarily determined by the size and weight of the batteries. Requirements for extended battery life directly oppose the requirement for small form factor and low weight. This implies that sensors have to be extremely power efficient, as frequent battery changes for multiple WBAN sensors would likely hamper users' acceptance and increase the cost. In addition, low power consumption is very important for future generations of implantable sensors that would ideally be self-powered, using energy extracted from the environment. As a result, an ultra-low power WSN is the most crucial design target.

## 4.2 Ultra-Low Power FIFO Memory

In a conventional FIFO memory, there are three major components: storage elements, read/write pointers, and read/write control circuitry, as shown in Fig. 4.3. Storage elements and read/write pointers usually account for most of the power consumption of a FIFO memory. Thus, the key to power minimization is to reduce their power consumption. For high density and low power, SRAM-based storage elements are more suitable than registers and latches. Nevertheless, the degraded voltage margin and the increased device variability are serious challenges to near-/sub-threshold SRAMs [4.7]. In this work, a 10T SRAM bit-cell is proposed to enable robust ultra-low voltage FIFO operation.



Fig. 4.3 A FIFO memory and the power proportion of FIFO Memory

A typical way to construct the read/write pointers of FIFO memory is the utilization of ring shift registers [4.8]. However, the shift-register-based pointers account for a relatively large portion of the total power consumption due to a large number of flip-flops and long metal lines as shown in Fig. 4.3. Such design is not suitable for highly energy-constrained systems such as WBANs. In order to implement an ultra-low power FIFO memory, a counter-based pointer structure is an alternative solution for aggressive power reduction.

For an SRAM-based FIFO, the power dissipation during read/write operations is significant due to the large capacitance on the bit-lines and word-lines. Recently, modified read/write control circuits with adaptive timing adjustment were presented to reduce active power and track process, voltage, and temperature (PVT) variations. In those designs, the worst-case delay of the read/write operation was taken to be writing data "0" and sensing data "0". However, the worst case under PVT variations is not deterministic for a single-ended scheme as the supply voltage scales down. Hence, a worst-case detector is necessary for robust ultra-low voltage operation.



| Pin     | Description         | Pin | Description                    |
|---------|---------------------|-----|--------------------------------|
| CLK_w   | Write Clock Input   | CEN | Chip Enable (0) / Disable (1)  |
| CLK_r   | Read Clock Input    | WEN | Write Enable (0) / Disable (1) |
| D[15:0] | Data in (D[0]=LSB)  | REN | Read Enable (0) / Disable (1)  |
| Q[15:0] | Data out (Q[0]=LSB) |     |                                |

Fig. 4.4 shows the block diagram of the proposed 16Kb SRAM-based FIFO memory, and Table 4.1 lists its pins. Based on the first-in first-out data behavior, an adaptive power control circuit [4.3], which turns off the power supply of the read-out words, is used to minimize the leakage power. Also, the proposed 10T SRAM bit-cell discussed in the previous chapter is adopted as the storage element of our FIFO. In order to further reduce total power consumption and provide robust ultra-low voltage operation, a counter-based pointer structure and an adaptive replica read/write control unit are implemented.

#### 4.2.1 10T SRAM Storage Element

The basic storage element of the ultra-low power FIFO memory is the dual-$V_t$, dual-port 10T SRAM bit-cell proposed in Chapter 3, as shown in Fig. 4.5.



- Single-ended scheme: A major source of power consumption in a memory array is the voltage swing on the bit-lines. Therefore, reducing the bit-line loading results in a significant active power reduction.
- Dual-V<sub>t</sub> structure: The dual-V<sub>t</sub> structure improves hold stability and write ability, enabling more reliable operation under ultra-low supply voltage.
- Read buffer: With the read buffer structure, the 10T bit-cell isolates the storage nodes from read disturb. Moreover, this read buffer can eliminate the data-dependent bit-line leakage and force the leakage current to flow from the cell into the bit-line.
- Cutting off the positive feedback: This scheme can cut off the positive feedback loop of the inverter pair. The 10T bit-cell enlarges the write margin without any peripheral circuit, especially in the near-/sub-threshold regions.

### 4.2.2 Adaptive Power Control

The key idea of leakage power minimization is to reduce the voltage across hardware that is not in use. Taking Fig. 4.6 as an example, grey blocks represent words that contain data, while white blocks represent words that are empty. Empty words do not need data retention, so these don't-care words can be power gated for leakage power minimization. Since the status of every word in the FIFO memory is predictable due to the first-in first-out data behavior, an adaptive power control system, which cuts off the power supply of don't-care words, can be used to efficiently reduce power consumption [4.3].



Fig. 4.6 A FIFO operation example [4.3]

Fig. 4.7(a) shows the finite-state machine (FSM) and the equivalent circuitry of the power gating signal, *power\_on*. Initially, each word is in the *cutoff state*. When a word is about to be written, it changes to the *active state* and the cell supply of the word is charged to VDD. Each written word stays in the *active state* until its data is read out. As shown in Fig. 4.7(b), the power gating circuit is inserted in each word of the FIFO memory. The leakage current of each don't-care word is eliminated by adaptively turning off its power MOS.
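The two-state behavior described above can be sketched as a small behavioral model. This is only an illustration under stated assumptions: the names `WordState`, `next_state`, and the `write_access`/`read_access` flags are hypothetical and do not come from the circuit of Fig. 4.7.

```python
from enum import Enum


class WordState(Enum):
    CUTOFF = 0  # cell supply gated off; the word holds no valid data
    ACTIVE = 1  # cell supply charged to VDD; the word holds unread data


def next_state(state, write_access, read_access):
    """Next power state of one FIFO word (hypothetical behavioral model)."""
    if state is WordState.CUTOFF and write_access:
        return WordState.ACTIVE  # power up when the word is about to be written
    if state is WordState.ACTIVE and read_access:
        return WordState.CUTOFF  # data read out, retention no longer needed
    return state


def power_on(state):
    """Power gating signal: 1 keeps the power MOS on, 0 turns it off."""
    return 1 if state is WordState.ACTIVE else 0
```

A word that is written and later read therefore goes through cutoff, active, and back to cutoff, and its power MOS conducts only while the word holds unread data.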



#### **4.2.3 Counter-based Pointer Structure**

Independent read and write pointers are used as the address pointers that select the accessed word in the FIFO memory. A typical way to construct the read/write pointers of a FIFO memory is to use ring shift registers. However, the number of flip-flops and the length of the metal lines in shift-register-based pointers grow in proportion to the FIFO depth, that is, exponentially in the number of address bits. Such a design is no longer suitable for highly energy-constrained systems, e.g. WBANs. A counter-based pointer structure is proposed to construct energy-efficient pointers, as shown in Fig. 4.8. Since system performance is not the major concern here, the additional delay it introduces in the read/write pointers is acceptable.



Fig. 4.8 The block diagram of the counter-based pointer

A synchronous counter (Fig. 4.9), in contrast to an asynchronous counter, is one whose output bits change state simultaneously, with no ripple. The clock inputs of the flip-flops are connected together, so that each flip-flop receives the same clock pulse at the same time. As the supply voltage decreases, the most energy-efficient flip-flop architecture depends on the switching probability, and the PowerPC flip-flop (Fig. 4.9) achieves a better energy-delay product (EDP) at low activity [4.9]. Therefore, given the ultra-low voltage operation and the low activity of the pointer logic, the $C^2MOS$-based flip-flop is chosen as the basic element.



Fig. 4.9 The synchronous counter



Fig. 4.10 The read pointer

Fig. 4.10 illustrates the read pointer. The 7-bit address ($A_0 \sim A_{N-1}$) is generated by a synchronous counter triggered by the clock. When *CEN=1*, the counter is reset to 0. The *RP* signal is generated by the read control circuit, which is described later. *CLK\_r* is the clock for the read operation. At the positive edge of *CLK\_r* with *REN=0*, the counter output increases by one. The N-bit address is decoded to 2<sup>N</sup> bits using an N-to-2<sup>N</sup> decoder, so only one of these 2<sup>N</sup> bits is asserted to select a word-line. AND gates combine the decoder outputs with *RP* to control the timing, so that the selected read-word-line (*RWL*) is raised at the desired time.
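The pointer behavior described above (reset on *CEN=1*, increment on a *CLK\_r* edge with *REN=0*, one-hot decoding gated by *RP*) can be sketched as follows. This is a minimal behavioral model, not the circuit of Fig. 4.10; the class name `ReadPointer` and its methods are hypothetical, and the wrap-around at 2<sup>N</sup> is an assumption based on the circular nature of a FIFO pointer.

```python
class ReadPointer:
    """Behavioral model of a counter-based read pointer with n_bits address bits."""

    def __init__(self, n_bits=7):
        self.n_bits = n_bits
        self.count = 0

    def clock_edge(self, cen, ren):
        """Apply one positive edge of CLK_r (CEN and REN are active-low enables)."""
        if cen == 1:
            self.count = 0  # chip disabled: reset the counter
        elif ren == 0:
            # Read enabled: advance by one step, wrapping around (assumed).
            self.count = (self.count + 1) % (1 << self.n_bits)
        return self.count

    def word_lines(self, rp):
        """Decode the address to one-hot and AND every output with the RP pulse."""
        return [int(rp == 1 and i == self.count) for i in range(1 << self.n_bits)]
```

With `rp=0` every entry of `word_lines` is 0, which mirrors the role of the AND gates: the decoded address selects the word, while *RP* decides when the read-word-line is actually raised.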

In hardware, this counter-based pointer needs only seven registers and seven long address lines ($A_0$, $A_1$, ..., $A_6$) shared by four decoders. In addition, every two blocks share one decoder in order to reduce the number of decoders. Thus, the counter-based pointer has fewer registers and long metal lines than the shift-register-based pointer, which significantly reduces the power consumption of the read/write pointers (Fig. 4.11).
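To make the register saving concrete, the following sketch counts the flip-flops of one pointer for a given address space. It assumes that a ring shift register uses one flip-flop per selectable word and that a binary counter uses one flip-flop per address bit; the function name `pointer_flip_flops` is hypothetical, and decoders and wiring are not counted.

```python
def pointer_flip_flops(depth):
    """Return (shift-register flip-flops, counter flip-flops) for one pointer."""
    shift_register = depth               # assumed: one flip-flop per word
    counter = (depth - 1).bit_length()   # address bits needed to index `depth` words
    return shift_register, counter
```

For the 128-entry address space of a 7-bit counter, `pointer_flip_flops(128)` gives 128 flip-flops for the shift register against seven for the counter, in line with the seven registers mentioned above.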



Fig. 4.11 Power consumption comparisons

Fig. 4.12 shows the write pointer, which functions similarly to the read pointer. The *WP* signal is generated by the write control circuit, which is described later. *CLK\_w* is the clock for the write operation.



Fig. 4.12 The write pointer

## 4.2.4 Adaptive Replica Read/Write Control Unit

As process variation becomes more severe when the supply voltage is scaled down, the worst case of the write operation is uncertain for a single-ended write-port scheme. Because of the threshold voltage of the access transistor, the delay time of write "1" is sensitive to process and temperature variation, as shown in Fig. 4.13. The worst case of the write delay (write "0" or write "1") varies with process corner and temperature. Therefore, the proposed adaptive replica management unit is used to detect the worst case, ensuring robust write operation and reducing active power consumption.



The adaptive replica management unit consists of a 10T SRAM replica column, a read window control circuit, and a write window control circuit, as shown in Fig. 4.14. The cell data of the replica cell is fixed at logic "1" by wiring $VC_{rp}$ to $V_{DD}$. Because the $WBL_{rp0}$ and $RBL_{rp}$ of the replica column are shared with the write and read window control circuits, respectively, the proposed adaptive replica management unit needs only one replica column.



Fig. 4.14 The adaptive replica read/write control unit

In read mode, the read-pulse signal (*RP*) initially enables the accessed read-word-line (*RWL<sub>i</sub>*) of the SRAM array and the sense amplifiers, including the sense amplifier in the replica column. The read-word-line (*RWL*) active time should be long enough for the sense amplifier to function reliably, but it should be turned off soon after the read operation is finished to cut off the marginal compensation current and thereby reduce the power consumption. For read tracking, the data set in the replica cells duplicates the worst-case data pattern for discharging *RBL* to ground. Therefore, the read window control circuit uses *RBL*<sub>rp</sub> to track the sensing behavior across PVT conditions. As soon as *RBL*<sub>rp</sub> is discharged to ground, the signal R-ok is triggered and disables *RP*. Thus, the read window control circuit can adaptively control the read window.

In a write operation, the write-pulse signal (*WP*) initially enables the accessed write-word-line (*WWL<sub>i</sub>*) of the SRAM array and the write-word-line (*WWLrp*) of the duplicate bit-cell of the replica column. *WP* additionally turns on the enable signals of all write drivers, including the write driver in the replica column. The data, *Din* [15:0], are then written to the accessed word. At the same time, "0" and "1" are written to the replica write cells, respectively.



Fig. 4.15 Write delay versus BL length

For write tracking, the worst-case detector monitors two conditions in the write operation: write "0" and write "1". The data set in the replica column creates the worst case of discharging $WBL_{rp0}$ to ground to write "0", since write "0" is dominated by the capacitance and leakage of the bit-line. On the other hand, $WBL_{rp1}$ is held at $V_{DD}$ to write "1", since write "1" is dominated by the strength of *MN1*. Fig. 4.15 shows the write delay of write "0" and write "1"; the number of bits per bit-line does not affect the delay of write "1", so the bit-line load for write "1" can be omitted. The worst-case detector indicates whether write "0" or write "1" is the worst case. After the replica data has been written to the duplicate bit-cell successfully, *W\_ok* is delayed by an inverter delay line for a wider window margin and then disables *WP*. Accordingly, the adaptive replica management unit can guarantee a sufficient write window.
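The write-window rule described above can be summarized in a few lines: the window must cover whichever replica write finishes last, plus the margin added by the inverter delay line. The sketch below is a behavioral simplification with hypothetical names (`write_window`, `t_write0`, `t_write1`, `t_margin`); the delays are in arbitrary time units and are not taken from Fig. 4.15.

```python
def write_window(t_write0, t_write1, t_margin):
    """Return (worst-case label, write-pulse width) from the two replica delays.

    t_write0 / t_write1: completion times of the replica write "0" / write "1".
    t_margin: extra delay of the inverter delay line that follows W_ok.
    """
    worst = "write0" if t_write0 >= t_write1 else "write1"
    # WP is disabled only after the slower replica write has completed
    # and the delay-line margin has elapsed.
    return worst, max(t_write0, t_write1) + t_margin
```

Because the result depends on the measured replica delays rather than on a fixed assumption, the same rule covers corners where write "1" is slower as well as corners where write "0" is slower.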



## 4.3 A 16kb 0.5V SRAM-Based FIFO in 90nm CMOS

The WBAN consists of multiple wireless sensor nodes (WSNs) and a central processing node (CPN), which are attached to the skin of the human body and integrated in a portable device, respectively, as shown in Fig. 4.16. In order to extend the battery life of the wireless sensor node for WBAN applications, the supply voltage is scaled to 0.4 V (near-threshold region) to reduce the power consumption of the transceiver and its storage circuits.

In the WSN system for WBAN, the body signals from the external readout sensors are accumulated in the FIFO memories. The read/write clock frequency (625kHz/50kHz) and the 16-bit resolution of the proposed FIFO memory, which connects to the readout sensor, are based on the specification of the electrocardiogram (ECG) signal, which requires almost the highest sampling rate and the longest bit-length among human body signals to represent its waveform. An analog-to-digital converter (ADC) with 16-bit resolution is used to sample body signals in [4.10]. Either the OFDM or the MT-CDMA modulation scheme with 5MHz signal bandwidth can be used to transmit data. The read clock ($CLK_r$) frequency can run 8 times slower for lower power consumption due to the processing and behavior of these modulation schemes in [4.10].



#### **4.3.1** Power Consumption Analysis

Fig. 4.17 The contributions of power reduction from energy-efficient techniques

Power consumption of the FIFO memory is based on the system operation characteristic in [4.2], which can be approximately described as:

$$Power = 1\% \times \text{standby power} + 93\% \times \text{write power} + 5\% \times \text{simultaneous read/write power} + 1\% \times \text{read power}$$

Because the application spends a long period writing and only a short period reading, the write power dominates, and reducing it extends the battery life of the whole system. Fig. 4.17 shows the contributions of the energy-efficient techniques to the power reduction. With a 0.5V supply voltage, the proposed design consumes ultra-low read/write/standby power (1.98µW/1.61µW/1.91µW) and has an average power consumption of only 1.64µW. The adaptive power control circuit and the counter-based pointers provide 10.7% and 45.7% of power reduction, respectively. The proposed smart replica read/write control unit provides an additional 1.3% power reduction. The overall power is reduced by 57% at a supply voltage of 0.4V.
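The weighted-average power expression of this section can be evaluated with a short helper. The function name `average_power` is hypothetical, and the simultaneous read/write power has to be supplied by the caller because the text does not report it; all powers are in µW.

```python
def average_power(standby, write, simultaneous, read):
    """Weighted average power using the 1% / 93% / 5% / 1% activity shares."""
    return 0.01 * standby + 0.93 * write + 0.05 * simultaneous + 0.01 * read
```

With the reported standby, write, and read powers (1.91µW, 1.61µW, 1.98µW), the three known terms alone contribute about 1.54µW, of which the write term is about 1.50µW. This illustrates why the write power dominates the average.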

#### 4.3.2 Post-Layout Simulation Result

In this section, a 0.5V 1024-word by 16-bit (16kb) near-threshold asynchronous 10T SRAM-based FIFO is implemented in UMC 90nm technology. Tolerating temperature variation from -20°C to 80°C and all process corners, the proposed FIFO can operate at 625kHz reading frequency and 50kHz writing frequency. Fig. 4.18 shows the floorplan and layout views of the proposed 16Kb SRAM-based FIFO memory. The design profile is summarized in Table 4.2 and compared with the previous work.

|                              | Previous Work  | This work      |
|------------------------------|----------------|----------------|
| Technology                   | UMC 90nm tech. | UMC 90nm tech. |
| Supply Voltage               | 0.5V           | 0.5V           |
| Memory Size                  | 32kb           | 16kb           |
| Write Frequency              | 50kHz          | 50kHz          |
| Read Frequency               | 625kHz         | 625kHz         |
| Operating Temp.              | -40°C ~ 125°C  | -20°C ~ 80°C   |
| Average Power<br>(@TT, 25°C) | 4.81µW         | 1.646µW        |
| Area                         | 854µm x 475µm  | 666µm x 508µm  |

Table 4.2 Summary of the 16kb FIFO memory and comparison

At 0.5V supply voltage, the maximum read and write frequencies reach 3.94MHz and 3.39MHz at the SS corner and $-20^{\circ}$C. According to the requirements of WBANs, the read/write frequencies of the FIFO are set to 625kHz/50kHz. With a 0.5V supply voltage and the specified frequencies, the proposed design consumes ultra-low read/write/standby power (1.98µW/1.61µW/1.91µW) and has an average power consumption of only 1.646µW.



Fig. 4.18 Layout of the 16kb FIFO

## 4.4 Summary

This chapter has presented a robust 16kb near-/sub-threshold asynchronous SRAM-based FIFO memory. The 10T SRAM bit-cell presented in Chapter 3 is used as the storage element to provide read SNM enhancement, write variation reduction, and bit-line leakage reduction. The adaptive power control circuit, counter-based pointers, and smart replica read/write control unit together yield a 57% reduction in total power consumption and track the worst case under severe PVT variations. All the FIFO memory design techniques presented above enable energy-efficient and robust operation for WBAN applications.



# Chapter 5 A 2kb Built-in Row-Control Dynamic Voltage Scaling Near-/Sub-threshold FIFO memory in 65nm CMOS for WBANs

## **5.1 Introduction**

Dynamic voltage scaling (DVS) is a popular approach for reducing the energy of digital circuits. DVS schemes typically have an operating voltage range from the full to half of the maximum supply voltage, depending on the performance requirement. In DVS schemes, a lower voltage is applied to a circuit in low-power mode, while a nominal or boosted voltage is applied in high-performance mode. It is also possible to construct designs that operate over a wider range of supply voltage, from full voltage down to sub-threshold voltage, as illustrated in [5.1]. In addition, a 64 kb reconfigurable SRAM fabricated in a 65 nm low-power CMOS process and operating from 250 mV to 1.2 V is proposed in [5.2]. For high reliability at ultra-low supply voltage and highly efficient power delivery in micro-power systems, a sub-threshold microcontroller with integrated SRAM and a power-efficient switched-capacitor DC-DC converter is presented in [5.3].

## 5.2 Built-in Row-Control DVS FIFO Memory

In this design, the SRAM block is divided into several sub-blocks. During receiving time, the supply voltage of a sub-block can be scaled down to reduce power consumption at the low sample rate. During transmitting time, the supply voltage of a sub-block is kept at the original supply voltage for the high access rate. Accordingly, each sub-block has three operating modes: *Low-power mode*, *Typical mode*, and *Floating mode*. In *Low-power mode*, the FIFO records various physiological signals throughout its lifetime, e.g. heart rate and ECG, or holds data. In *Typical mode*, the FIFO briefly processes and transmits real-time cardiovascular parameters to a host. After the words in a sub-block have been read out, their data no longer needs to be retained, so the sub-block enters *Floating mode*, in which the SRAM sub-block is power gated for leakage power minimization. In this *Low-power mode* dominated scenario, applying the DVS technique is an efficient way to further reduce total energy consumption. Therefore, the supply voltage is scaled down into the sub-threshold region in *Low-power mode* to provide quadratic savings in the active $CV_{DD}^2$ energy.
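As a rough worked example of this quadratic dependence, assume that the switched capacitance $C$ is the same in both modes and take the two cell supply voltages used in this design, 0.5V and 0.3V. Leakage energy and converter losses are ignored in this sketch.

$$\frac{E_{0.3\mathrm{V}}}{E_{0.5\mathrm{V}}} = \frac{C\,(0.3)^2}{C\,(0.5)^2} = 0.36$$

Under these assumptions, the active energy per switching event at 0.3V is about 36% of that at 0.5V.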

Fig. 5.1 shows the block diagram of the proposed 2kb built-in row-controlled DVS SRAM-based FIFO memory, where *VDDL* is supplied by a switched-capacitor (SC) DC-DC converter. To generate the dual supply voltages, the SC DC-DC converter steps down the voltage from the system voltage source instead of using multiple voltage sources.

The proposed built-in row-controlled DVS FIFO is composed of write peripheral circuitry, read peripheral circuitry, an adaptive power switch control system, and SRAM-based storage elements. These circuits operate in different voltage regions. The write peripheral circuitry, which consists of the write driver (WD), write pointer, write-word-line (WWL) driver, and write window controller, operates in the low voltage region. The read peripheral circuitry, which is composed of the sense amplifier, read pointer, read-word-line (RWL) driver, and read window controller, operates in the high voltage region. The adaptive power switch control (APSC) system operates in both voltage regions and provides the proper cell supply voltage to each corresponding sub-block in the storage element. The cell supply is switched among 0.3V, 0.5V, and cut-off, corresponding to the *Low-Power Mode*, *Typical Mode*, and *Floating Mode* of the sub-block, respectively. Therefore, the proposed FIFO can adaptively convert the cell supply voltage according to the operation mode.
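The relation between the three operation modes and the cell supply can be written down as a small lookup. This is a sketch under stated assumptions: the status flags `holds_data`, `being_written`, and `being_read` are hypothetical, and treating a sub-block that has never been written as floating is an assumption rather than something stated in the text.

```python
from enum import Enum


class Mode(Enum):
    LOW_POWER = "Low-power"  # receiving: being written or holding data
    TYPICAL = "Typical"      # transmitting: being read out
    FLOATING = "Floating"    # no data to retain: power gated


# Cell supply in volts per mode; None stands for a cut-off (floating) supply.
CELL_SUPPLY = {Mode.LOW_POWER: 0.3, Mode.TYPICAL: 0.5, Mode.FLOATING: None}


def sub_block_mode(holds_data, being_written, being_read):
    """Pick the mode of one sub-block from three hypothetical status flags."""
    if being_read:
        return Mode.TYPICAL
    if holds_data or being_written:
        return Mode.LOW_POWER
    return Mode.FLOATING
```

For example, `CELL_SUPPLY[sub_block_mode(True, False, False)]` selects the 0.3V supply for a sub-block that only holds data.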



Fig. 5.1 System block diagram of DVS FIFO and behavior time line of sub-block

## **5.3 Storage Element**

For high density and low power, the storage element of the proposed FIFO is composed of dual-port SRAM bit-cells. Dual-port SRAM bit-cells have independent read and write bit-lines. Thus, the proposed FIFO can read and write different words simultaneously.

### 5.3.1 A 9T Sub/Near-threshold SRAM Bit-cell



Fig. 5.2 The proposed 9T sub/near-threshold SRAM bit-cell

The storage element of the ultra-low power FIFO memory is the proposed dual-$V_T$ 9T SRAM cell shown in Fig. 5.2. In order to read and write different words simultaneously, the proposed 9T SRAM bit-cell has independent read and write bit-lines (*RBL/WBL*). Moreover, the single bit-line scheme results in a significant active power reduction [5.6]. In order to reduce leakage currents and enlarge the write margin and SNM, dual-threshold CMOS (DTCMOS) design and reverse short channel effect (RSCE) design are used in the bit-cell. The low-V<sub>t</sub> device reduces the V<sub>t</sub> loss through *MP1* to improve the hold SNM and read SNM, and the use of RSCE reduces the $V_t$ loss further, as shown in Fig. 5.3. In the sub-threshold region, the longer channel length relaxes process variation and improves the $I_{on}/I_{off}$ ratio.



### 5.3.2 Layout Consideration



Fig. 5.4 Schematic of the 9T SRAM bit-cell



Fig. 5.5 Layout of the 9T SRAM bit-cell (TSMC 65 tech.)

As shown in Fig. 5.4 and Fig. 5.5, the regular layout of the proposed 9T SRAM bit-cell is designed as a "straight line layout" and a "thin cell layout". The straight line layout facilitates lithography and reduces sensitivity to overlay errors, which reduces mismatch and increases the SNM. The thin cell layout shortens the bit-line and thus reduces the equivalent RC value of the bit-line. In brief, the straight line layout enhances yield, and the thin cell layout gives better performance. Unfortunately, the proposed 9T SRAM bit-cell incurs a much larger area overhead than the 8T SRAM bit-cell because of the RSCE and DTCMOS design. Nevertheless, the area of the proposed 9T cell is almost the same as that of the 10T cell, as shown in Table 5.1.

|                 | DP 8T | 8T  | 10T  | 9T  |
|-----------------|-------|-----|------|-----|
| #WL             | 2     | 2   | 2    | 2   |
| #BL             | 2     | 3   | 3    | 2   |
| #VGND           | no    | yes | no   | no  |
| RSCE            | no    | yes | yes  | yes |
| Dual-Vt design  | no    | no  | no   | yes |
| Normalized area | 0.88  | 1   | 1.64 | 1.7 |

Table 5.1 Comparison of various DP SRAM bit-cells

The cell is designed in TSMC 65nm standard process technology. There are three metal layers in the bit-cell: *VDDC*, *GND*, *WBL*, and *RBL* are routed in the second metal layer, and *VDDC*, *GND*, *WWL*, and *RWL* are routed in the third metal layer.



Fig. 5.6 Body bias of pMOS of the 9T bit-cell

Another layout consideration of the 9T bit-cell is the pMOS body bias, as shown in Fig. 5.6. If the body of the pMOS is biased at *VCS*, the N-well to N-well pitch is too large to construct a high-density SRAM block. For density, the body of the pMOS must be fixed at 0.3V or 0.5V. However, the body bias affects the cell margin and power. If the body of the pMOS is biased at 0.5V, the pMOS is under reverse body bias (RBB) in write and hold operations. On the other hand, if the body of the pMOS is biased at 0.3V, the pMOS is under forward body bias (FBB) in read operation. As shown in Fig. 5.6, RBB causes only a small degradation in both write margin and hold SNM, while the power consumption with 0.5V body bias is lower than with 0.3V body bias, as shown in Fig. 5.8. Therefore, the body of the pMOS in the proposed 9T bit-cell is biased at 0.5V.



Fig. 5.8 Power consumption in different body bias

## **5.3.3 Operation Analysis**

The proposed 9T SRAM bit-cell is used as the storage element of the proposed FIFO. Each operation uses a different operating voltage: *VDDC* in the hold/read/write modes is *VDDL/VDDH/VDDL*, respectively.

The truth table for each mode and the operating voltages of the proposed 9T SRAM are shown in Table 5.2. *WWL* is enabled only in write mode, when *WBL* is driven by the write driver. *RWL* is enabled in read mode and conditionally discharges the floating *RBL*.

|       | VDDC | WWL  | RWL  | WBL    | RBL      |
|-------|------|------|------|--------|----------|
| Hold  | VDDL | GND  | GND  | VDDL   | VDDH     |
| Read  | VDDH | GND  | VDDH | VDDL   | Floating |
| Write | VDDL | VDDL | GND  | Driven | VDDH     |

Table 5.2 Operation truth table of the 9T SRAM

#### **Hold operation**

In hold mode (Fig. 5.9), the cascaded high-V<sub>t</sub> transistors *MN2* and *MN4* and the extended channel length significantly reduce the bit-line leakage. The low-V<sub>t</sub> device reduces the voltage drop between *VL* and *VC* to limit the loss of hold SNM, as shown in Fig. 5.9 and Fig. 5.10. In addition, the PSC boosts the sub-blocks up to *VDDH* sequentially, so only one or two sub-blocks operate at *VDDH* at any time. This reduces leakage, because sub-blocks operating at *VDDL* have lower bit-line leakage than those operating at *VDDH*. As a result, the proposed 9T SRAM bit-cell has 66% lower bit-line leakage than the 8T cell [5.7] in the worst case (FF corner and $100^{\circ}$C).



Fig. 5.9 Hold mode of the 9T SRAM cell



Fig. 5.10 Hold SNM comparison

#### **Read operation**

In read mode, the proposed 9T SRAM bit-cell operates in the near-threshold region (0.5V). Because the storage node is isolated by the read buffer (*MN2*, *MN3*, and *MN4*), read disturbance is eliminated. As shown in Fig. 5.11, *RWL* turns on *MN2* and *MN4*, and the floating *RBL* is either discharged to ground or held, depending on the cell data. Fig. 5.12 shows the read SNM of different SRAM cells. The proposed 9T SRAM shows a significant improvement over the conventional DP 8T cell and a slight drop relative to the 10T cell [5.8].



Fig. 5.11 Read mode of the 9T SRAM cell



Fig. 5.12 Read SNM comparison



Fig. 5.13 Distribution of read SNM in MC Simulation

Fig. 5.13 shows the distribution of read SNM in Monte Carlo simulation (10000 runs). The conventional 6T SRAM bit-cell has a poor mean ($\mu$) and standard deviation ($\sigma$). The proposed 9T SRAM achieves a reliable mean and standard deviation of read SNM.

The read buffer of the proposed 9T SRAM uses an extended channel length to improve the $I_{on}$-$I_{off}$ ratio. Thus, the margin of the sense amplifier is enhanced by the higher $I_{on}$-$I_{off}$ ratio, and the local V<sub>t</sub> variation is decreased by the longer channel. As shown in Fig. 5.14, increasing the channel length decreases the leakage current and increases the $I_{on}$-$I_{off}$ ratio. A channel length of 120nm minimizes the ratio of T<sub>delay</sub> to the $I_{on}$-$I_{off}$ ratio and gives a 58% improvement in the $I_{on}$-$I_{off}$ ratio. To sum up, the $I_{read}$-$I_{leakage}$ ratio of the proposed 9T SRAM bit-cell is 2.5X better than that of the 8T cell [5.7] in the worst case (FF corner, 100°C). Moreover, the replica circuit is applied to adaptively cut off the *RWL* window to prevent access failure.



Fig. 5.14 Ion/Ioff ratio, and delay versus channel length

#### Write operation

In write mode, the proposed 9T SRAM bit-cell operates in the sub-threshold region (0.3V). As shown in Fig. 5.14, the write-word-line (*WWL*) is "high". The *WBL* is precharged to $V_{DD}$ before the cell is accessed. *WWL* turns on *MN1* and turns off *MP1* simultaneously, so the write data passes through *MN1* and inverters *A* and *B* to the node *VC*. This scheme cuts off the positive feedback loop of the inverter pair. The proposed SRAM bit-cell enlarges the write margin without any peripheral circuit, especially in the near-/sub-threshold regions.



Fig. 5.14 Write mode of the 9T SRAM cell

Fig. 5.15 shows the distribution of write margin for the proposed 9T SRAM and the other SRAMs at a sub-threshold supply voltage in Monte Carlo simulation (10000 runs). The proposed 9T SRAM cell has better write-ability and less variation than the others: it achieves a 69% / 100% improvement in write "0" / "1" margin, respectively, and a 38.8% reduction in write "0" variation (standard deviation) compared to the conventional dual-port SRAM.



Fig. 5.15 Distributions of write margin in MC Simulation

## **5.4 Adaptive Replica Control Circuits**

## **5.4.1 Write Pulse Control Circuit**



Fig. 5.16 The replica column for write window control circuit

The single-ended write-port scheme of the proposed 9T SRAM reduces the active power during write operations. However, variation affects write "0" and write "1" differently across corners and temperatures. Therefore, a write window control circuit is needed in the proposed FIFO; its scheme is shown in Fig. 5.16.

In a write operation, the write-pulse signal (*WP*) initially enables the accessed write-word-line (*WWL<sub>i</sub>*) of the SRAM array and the write-word-line (*WWLrp*) of the duplicate bit-cell of the replica column. *WP* additionally turns on the enable signals of all write drivers, including the write driver in the replica column. The data, *Din [15:0]*, are then written to the accessed word. At the same time, "0" and "1" are written to the replica write cells, respectively. For write tracking, the worst-case detector monitors two conditions in the write operation: write "0" and write "1". The data set in the replica column creates the worst case of discharging $WBL_{rp0}$ to ground to write "0", since write "0" is dominated by the capacitance and leakage of the bit-line. On the other hand, $WBL_{rp1}$ is held at $V_{DD}$ to write "1", since write "1" is dominated by the strength of *MN1*.

## **5.4.2 Read Pulse Control Circuit**

In Fig. 5.17, a 9T SRAM replica column and a read window control circuit automatically control the read-word-line window for PVT variation tolerance. Because the cell data of the replica cells are hardwired to "1", the replica column duplicates the worst case of discharging *RWL* to ground. The replica column therefore has a longer delay than any other column. In this way, the replica column generates an *RWL* pulse long enough for the sense amplifier to sense the correct data. As soon as *R*-ok is triggered, the register in the read window control circuit is reset and turns off the *RP* pulse.

The read-word-line must stay active long enough in read mode for the sense amplifier to complete sensing. Nonetheless, because of bit-line leakage, it should be turned off soon after the read operation finishes to avoid a sensing failure, as shown in Fig. 5.18.


Fig. 5.17 The replica column for read window control circuit



Fig. 5.18 Sensing margin comparisons under worst case column pattern

# **5.5 Adaptive Power Switch Control System**

DVS is one of the most effective techniques for reducing energy consumption. Fig. 5.19 shows the block diagram of the conventional DVS FIFO. When the FIFO switches its supply voltage from *VDDL* to *VDDH*, or from *VDDH* to *VDDL*, a setup time is needed to avoid shorting the supplies and corrupting data through supply-grid noise. In addition, as shown in Fig. 5.19, the large charge and discharge currents of the conventional DVS FIFO cause a large energy overhead at every supply switch. As the FIFO grows, the setup time, charge energy, and discharge energy all increase. Therefore, the conventional DVS controller is no longer suitable.



Fig. 5.19 The conventional DVS FIFO and current waveform

As shown in Fig. 5.20, an adaptive power switch control (APSC) system is proposed to greatly reduce the charge and discharge energy, eliminate the setup time of voltage conversion, and support simultaneous read/write. Instead of sitting outside the FIFO, the proposed APSC system is built into the FIFO, and the write and read circuits are supplied separately at 0.3V and 0.5V, respectively. Following the circularly shifting read/write pointers, the proposed APSC system controls the supply voltage of each sub-block separately. Because of the first-in-first-out data behavior, the status of each sub-block is completely predictable, so the setup time for converting the sub-block supply voltage can be eliminated. In addition, the capacitance of a sub-block is much smaller than that of the whole memory, so the charge and discharge energy is also small. As shown in Fig. 5.20, compared with the conventional DVS FIFO, the proposed built-in row-control DVS FIFO reduces the charge/discharge current by 22X/5X, respectively.



Fig. 5.20 The built-in row-control DVS FIFO and current waveform

Fig. 5.21 shows an operation example of the built-in row-control DVS FIFO. Gray blocks are sub-blocks in *Low-power Mode* that are about to be written or that hold data, dark gray blocks are sub-blocks in *Typical Mode* that are about to be read, and white blocks are sub-blocks in *Floating Mode* whose data have been read out.



At the beginning, every sub-block except the first is in *Floating mode*, with its supply voltage cut off. During write time, when the last word of the first sub-block is about to be written, the second sub-block changes to *Low-power mode*. After every word in the first sub-block has been written, the first sub-block stays in *Low-power mode* and holds the cell data. During read time, as soon as the last word of the first sub-block is about to be read, the second sub-block changes to *Typical mode*. Therefore, only one sub-block is in *Typical mode* at a time. Once the last word of the second sub-block has been read out, the second sub-block returns to *Floating mode*.

The sub-block supply voltage is converted according to the operation mode. In *Low-power mode*, the sub-block supply voltage is scaled down to the sub-threshold voltage (0.3V) to save power at the low sample rate. In *Typical mode*, the sub-block supply voltage is kept at the near-threshold voltage (0.5V) for speed at the high access rate. In *Floating mode*, the sub-block supply can be power-gated to minimize leakage power. To convert the supply voltage, each sub-block needs a power switch and a control circuit.
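The mode sequence described above can be sketched behaviorally in Python. This is a simplified model under stated assumptions, not the implemented control logic: the names `Mode` and `simulate` are hypothetical, the FIFO is filled completely before it is drained, and the first sub-block is assumed to start in *Low-power mode* and to be raised to *Typical mode* before the first read.

```python
from enum import Enum


class Mode(Enum):
    FLOATING = 0   # supply gated off
    LOW_POWER = 1  # 0.3 V: being written or holding data
    TYPICAL = 2    # 0.5 V: being read


WORDS_PER_SUB = 16  # words per sub-block (Section 5.5.2)


def simulate(n_sub):
    """Fill and then drain a FIFO of n_sub sub-blocks; return mode snapshots."""
    modes = [Mode.FLOATING] * n_sub
    modes[0] = Mode.LOW_POWER            # assumption: first sub-block starts powered
    snapshots = []
    n_words = n_sub * WORDS_PER_SUB

    # Receiving time: wake the next sub-block when the last word of the
    # current one is about to be written.
    for w in range(n_words):
        sub, offset = divmod(w, WORDS_PER_SUB)
        if offset == WORDS_PER_SUB - 1 and sub + 1 < n_sub:
            modes[sub + 1] = Mode.LOW_POWER
        snapshots.append(("write", w, tuple(modes)))

    # Transmitting time: raise the next sub-block when the last word of the
    # current one is about to be read, then gate the drained sub-block.
    modes[0] = Mode.TYPICAL              # assumption: raised before the first read
    for w in range(n_words):
        sub, offset = divmod(w, WORDS_PER_SUB)
        if offset == WORDS_PER_SUB - 1:
            if sub + 1 < n_sub:
                modes[sub + 1] = Mode.TYPICAL
            modes[sub] = Mode.FLOATING   # last word has been read out
        snapshots.append(("read", w, tuple(modes)))
    return snapshots
```

Each snapshot is taken after a word access completes, so the brief overlap in which the next sub-block is already being raised while the current one finishes its last read is not visible; in every snapshot at most one sub-block is in *Typical mode*.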

### 5.5.1 Adaptive Power Switch Control Circuit

The APSC circuit, shown in Fig. 5.22 (a), is proposed to adaptively convert the sub-block supply voltage.



Fig. 5.22 (a) Adaptive power switch control circuitry (b) waveform of the signal

The APSC circuit consists of two power pMOS transistors, one discharge nMOS transistor, and control logic, and it is controlled by the read/write pointers. The read/write pointer signals (*RWL/WWL*) are decoded by the control logic to generate the power-switch-high (*PS\_H*), power-switch-low (*PS\_L*), and discharge (*DSCH*) signals. *PS\_H* controls the high-voltage power pMOS (*MPH*) to provide the high supply voltage (*VDDH*), *PS\_L* controls the low-voltage power pMOS (*MPL*) to provide the low supply voltage (*VDDL*), and *DSCH* controls the discharge nMOS. The waveforms of these signals are shown in Fig. 5.22 (b).



The signals and operation modes follow the finite state machine (FSM) shown in Fig. 5.23. Note that, to switch supplies safely, a *Wait State* is necessary when the operation mode changes from *Low-power mode* to *Typical mode*. If *MPL* turns off more slowly than *MPH* turns on, the two supplies can be shorted. To ensure a safe switch, an inverter delay line adds a safety margin by delaying the *PS\_H* signal.

In addition, a *Discharge State* is necessary when the operation mode changes from *Typical mode* to *Floating mode*. In the *Discharge State*, the *DSCH* signal turns on the discharge nMOS to pull the voltage of *VCS* from *VDDH* down to below *VDDL*. This prevents *VCS* from charging the *VDDL* supply if *MPL* is turned on soon afterward. Together, the delayed *PS\_H* signal and the discharge nMOS prevent reverse current.
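The state sequence can be illustrated with a small table-driven model. It is a sketch under assumptions rather than the circuit of Fig. 5.22: the event names (`write_near`, `read_near`, `delay_done`, `read_done`, `discharged`) are hypothetical, switch states are given as on/off rather than as signal polarities, and both power pMOS transistors are assumed to be off during the *Wait State*.

```python
# state -> (MPH on, MPL on, discharge nMOS on)
OUTPUTS = {
    "FLOATING":  (False, False, False),  # supply gated off
    "LOW_POWER": (False, True,  False),  # VCS driven from VDDL
    "WAIT":      (False, False, False),  # MPL turning off, PS_H still delayed
    "TYPICAL":   (True,  False, False),  # VCS driven from VDDH
    "DISCHARGE": (False, False, True),   # VCS pulled below VDDL
}

# (state, event) -> next state; event names are illustrative only
TRANSITIONS = {
    ("FLOATING", "write_near"): "LOW_POWER",
    ("LOW_POWER", "read_near"): "WAIT",
    ("WAIT", "delay_done"): "TYPICAL",
    ("TYPICAL", "read_done"): "DISCHARGE",
    ("DISCHARGE", "discharged"): "FLOATING",
}


def step(state, event):
    """Return the next state; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="FLOATING"):
    """Apply a list of events and return the visited states, including the start."""
    trace = [state]
    for event in events:
        state = step(state, event)
        trace.append(state)
    return trace
```

In this model no state turns on *MPH* and *MPL* together, which is the property the *Wait State* is meant to guarantee, and the discharge nMOS is on only in the *Discharge State*.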

### 5.5.2 Sub-block Size Design Consideration

To maximize the power saving, it is important to select an appropriate sub-block size, which is defined as the number of words in a sub-block. The major considerations in choosing the sub-block size are area overhead and power dissipation. The area overhead is determined by the number of APSC circuits and the size of the power pMOS transistors. A larger sub-block needs larger power pMOS transistors but fewer APSC circuits, so the number of APSC circuits and the size of the power pMOS transistors trade off against each other. Fig. 5.24 shows the simulated area overhead of the power pMOS transistors and APSC circuits combined.

The other consideration is power dissipation, which is determined by the overhead power of the APSC circuits, the charge and discharge overhead power, and the power saved in low-power mode and floating mode. A large sub-block spends less overhead power on APSC circuits but more on charging and discharging. In addition, the power saved in low-power mode and floating mode is small for a large sub-block. Fig. 5.24 shows the trade-off between overhead power and saved power. The smallest power dissipation occurs at 16 words per sub-block, so each sub-block is arranged with 16 words.



Fig. 5.24 The area overhead and power dissipation in different sub-block size

# **5.6 A 2kb Built-in Row-Controlled DVS Near-/Sub-threshold FIFO Memory in 65nm CMOS**

### **5.6.1 Post-Layout Simulation Result**

In this section, a 2kb built-in row-controlled DVS FIFO memory is implemented in TSMC 65nm technology with a 625kHz reading frequency and a 33kHz writing frequency, tolerating temperature variation from -20°C to 80°C. The layout view of the proposed FIFO is shown in Fig. 5.25, and the design profile is summarized in Table 5.3. The proposed design consumes only 0.606µW on average per read/write access. In receiving time, the proposed design consumes only 0.578µW on average, which is 46.2% less than in transmitting time.



Fig. 5.25 Layout of 2kb proposed FIFO



Table 5.3 Summary of the 1kb FIFO memory and comparison

Fig. 5.26 Power reduction by the proposed DVS FIFO



Fig. 5.26 compares the power of the FIFO without DVS (gray bar) and the proposed FIFO with the built-in row-controlled DVS technique. In receiving time, the benefit comes from the quadratic saving in active $CV_{DD}^2$ energy and results in a 49.3% power saving, with a 17.3% power overhead from the adaptive power switch control system. In transmitting time, the proposed DVS FIFO achieves an 18.5% power saving.

### 5.6.2 Design Issue of DVS FIFO

Adjusting the supply voltage of the FIFO in the DVS domain involves charging and discharging the large capacitance of the circuit. This raises two design issues: 1) FIFO performance degrades because of the stall period needed for the power rail voltage to settle within an acceptable range [5.9]; 2) FIFO energy consumption increases because of the supply-voltage adjustment. The energy overhead should be smaller than the energy saving. Note that the overhead energy comes mainly from two sources: the power of the supply switch control circuit, and the charging and discharging of the capacitance on the FIFO supply node.

The first issue is solved by the proposed built-in row-controlled DVS architecture. Because the FIFO changes the supply voltage level automatically in advance, it does not need to stall until the supply voltage is stable, so FIFO access never stalls. For example, when the first sub-block is about to become full, the second sub-block is charged from *Floating Mode* to *Low-power Mode* to prepare for the next access. Likewise, when the first sub-block is about to be read out completely, the second sub-block is charged from *Low-power Mode* to *Typical Mode*.

For the second issue, the energy overhead and energy saving are analyzed in the following section.

### 5.6.3 Analysis of DVS FIFO Energy Consumption

In this section, we analyze the energy overhead and energy saving of the proposed built-in row-control DVS FIFO compared with the conventional FIFO without DVS. We then evaluate these expressions numerically and determine the minimum number of data accesses needed to save energy.

In general, the energy consumption can be expressed as (5.2).

$$E = E_{active} + E_{leakage} + E_{overhead}$$
(5.2)

where $E_{active}$ is the switching energy for read/write, $E_{leakage}$ is the leakage energy of the cells and peripheral circuits, and $E_{overhead}$ is the overhead energy for switching between *Floating/Low-power/Typical mode*. Fig. 5.27 shows the functionality constraints. During the access time, the FIFO accesses data, and both active-switching and static-leakage energy are consumed. The access time ($T_{acc}$) is the sum of the receiving time and the transmitting time ($T_R + T_T$). In the receiving time ($T_R$), $N_{acc}$ words are written into the FIFO. During the transmitting time ($T_T$), $N_{acc}$ words are read out. Therefore, $T_{acc}$ can be written as $N_{acc} \times (T_{write}+T_{read})$.



Fig. 5.27 The receiving time and transmitting time of FIFO

#### A. Energy Composition of Conventional FIFO without DVS

The energy components of the conventional FIFO are modeled below, and the parameters of the conventional FIFO array that are used in the energy model are summarized in Fig. 5.28.



Fig. 5.28 Highlighting parameters critical for determining energy of conventional FIFO without DVS

1) $E_{active}$: The active energy associated with capacitive switching during writes is given by

$$E_{active,WR} = N_{acc} [C_{WWL} V_{DD}^2 + 16C_{WBL} V_{DD}^2 + 16C_{WD} V_{DD}^2].$$
(5.1)

Similarly, the active energy during reads is given by

$$E_{active,RD} = N_{acc} [C_{RWL} V_{DD}^2 + 16C_{RBL} V_{DD}^2 + 16C_{SA} V_{DD}^2].$$
(5.2)

Here,  $N_{acc}$  is the number of word accesses.



Fig. 5.29 Supply voltage of words versus access time for conventional FIFO without DVS

2) $E_{leakage}$: The leakage energy during the access time as shown in Fig. 5.29 is given by

$$\begin{aligned}
E_{leak,Block} &= \left\{ \int_{N_{acc}T_{write}} 16I_{leak,Cell}V_{DD}\,dt + \int_{(N_{acc}-1)T_{write}} 16I_{leak,Cell}V_{DD}\,dt + \cdots + \int_{T_{write}} 16I_{leak,Cell}V_{DD}\,dt \right\} \\
&\quad + \left\{ \int_{N_{acc}T_{read}} 16I_{leak,Cell}V_{DD}\,dt + \int_{(N_{acc}-1)T_{read}} 16I_{leak,Cell}V_{DD}\,dt + \cdots + \int_{T_{read}} 16I_{leak,Cell}V_{DD}\,dt \right\} \\
&= 16I_{leak,Cell}V_{DD}\left[\frac{(N_{acc}+1)N_{acc}}{2} \times T_{write}\right] + 16I_{leak,Cell}V_{DD}\left[\frac{(N_{acc}+1)N_{acc}}{2} \times T_{read}\right]
\end{aligned}$$
(5.3)

Here,  $I_{leak,Cell,W}$  present cell leakage i

$$E_{leak,periphary} = \int_{T_{access}} (I_{leak,periphary,w} + I_{leak,periphary,R}) V_{DD} dt$$

$$= (I_{leak,periphary,w,H} + I_{leak,periphary,R,H})V_{DD}T_{acc}.$$
 (5.4)
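The closed form in (5.3) is an arithmetic series: the integration intervals have lengths $T_{write}, 2T_{write}, \ldots, N_{acc}T_{write}$ (and likewise for $T_{read}$), which sum to $\frac{(N_{acc}+1)N_{acc}}{2}$ periods. The following Python check, a sketch with hypothetical function names, adds the terms one by one and compares the result with the closed form. The numerical values are the conventional-FIFO parameters quoted later in (5.13) and are used here only as sample inputs.

```python
import math

BITS_PER_WORD = 16  # cells per word, the factor 16 in (5.3)


def leak_block_termwise(n_acc, t_write, t_read, i_leak_cell, vdd):
    """Add the leakage intervals of (5.3) one by one (result in joules)."""
    p_word = BITS_PER_WORD * i_leak_cell * vdd  # leakage power of one word
    total = 0.0
    for k in range(1, n_acc + 1):
        total += p_word * k * t_write  # interval of length k * T_write
        total += p_word * k * t_read   # interval of length k * T_read
    return total


def leak_block_closed_form(n_acc, t_write, t_read, i_leak_cell, vdd):
    """Evaluate the closed form on the last line of (5.3)."""
    p_word = BITS_PER_WORD * i_leak_cell * vdd
    series = n_acc * (n_acc + 1) / 2
    return p_word * series * (t_write + t_read)


# Sample inputs: 32 words, 33 us write period, 1.6 us read period,
# 0.5978 nA cell leakage at 0.5 V (values quoted in (5.13)).
ARGS = (32, 33e-6, 1.6e-6, 0.5978e-9, 0.5)
agree = math.isclose(leak_block_termwise(*ARGS),
                     leak_block_closed_form(*ARGS), rel_tol=1e-9)
```

The quadratic growth of this term with $N_{acc}$ is what later makes the leakage contribution dominate the break-even condition.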

#### B. Energy Composition of the proposed FIFO with BR-DVS

For the proposed BR-DVS FIFO, the sub-blocks enter *Low-power Mode* sequentially during $T_{write}$. During $T_{read}$, each sub-block enters *Typical Mode* in turn. The parameters of the array that are used in the energy model are summarized in Fig. 5.30.



Fig. 5.30 Highlighting parameters critical for determining energy of DVS-BR FIFO

1) $E_{active}$: The active energy associated with capacitive switching during writes is given by

$$E_{active,WR} = 16M_{acc} [C_{WWL} V_{DDL}^2 + 16C_{WBL} V_{DDL}^2 + 16C_{WD} V_{DDL}^2] .$$
(5.5)

$$E_{active,RD} = 16M_{acc} [C_{RWL}V_{DDH}^2 + 16C_{RBL}V_{DDH}^2 + 16C_{SA}V_{DDH}^2].$$
(5.6)

Here,  $M_{acc}$  is the number of sub-block accesses,  $V_{DDL}$  is the *Low-power Mode* supply voltage, and  $V_{DDH}$  is the *Typical Mode* supply voltage.
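To make the quadratic dependence in (5.5) concrete, the short sketch below compares the switching energy of one word write at the two supply voltages. It assumes the lumped value of 2.14 pF that (5.13) later substitutes for $C_{WWL}+16C_{WBL}+16C_{WD}$; the function name `write_energy_pj` is hypothetical. The result covers only the active write component, not leakage or switching overhead, so it is not the overall power saving of the FIFO.

```python
C_WRITE_PF = 2.14  # lumped C_WWL + 16*C_WBL + 16*C_WD in pF, value quoted in (5.13)
V_DDH = 0.5        # Typical Mode supply (V)
V_DDL = 0.3        # Low-power Mode supply (V)


def write_energy_pj(vdd, c_pf=C_WRITE_PF):
    """Switching energy C * V^2 of one word write, in pJ (pF times V^2 gives pJ)."""
    return c_pf * vdd ** 2


e_high = write_energy_pj(V_DDH)      # 2.14 * 0.25 pJ
e_low = write_energy_pj(V_DDL)       # 2.14 * 0.09 pJ
active_saving = 1.0 - e_low / e_high  # 1 - 0.09 / 0.25
```

Under these assumptions the ratio depends only on $(V_{DDL}/V_{DDH})^2$, so the lumped capacitance cancels out of the relative saving.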



Fig. 5.31 Supply voltage of words versus access time for BR-DVS FIFO



Here, the size of a sub-block is 256 bit-cells (16 columns × 16 rows); therefore, the sub-block leakage is $256I_{leak,cell}$. In addition, $I_{leak,Cell,L}$ represents the cell leakage in *Low-power mode* and $I_{leak,Cell,T}$ represents the cell leakage in *Typical mode*.
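As a sketch consistent with these definitions, the block leakage of the BR-DVS FIFO can be written by collecting the cell-leakage terms that appear on the left-hand side of inequality (5.12). The grouping below is an assumption read off from (5.12), under which this expression plays the role of (5.7) in (5.11):

$$\begin{aligned}
E_{leak,Block} &= 256I_{leak,Cell,L}V_{DDL}\left[\frac{(M_{acc}+1)M_{acc}}{2} \times 16T_{write}\right] \\
&\quad + 256I_{leak,Cell,L}V_{DDL}\left[\frac{M_{acc}(M_{acc}-1)}{2} \times 16T_{read}\right] \\
&\quad + M_{acc}256I_{leak,Cell,T}V_{DDH} \times 16T_{read}
\end{aligned}$$

The first two terms count sub-blocks held in *Low-power mode* during the write and read phases, and the last term counts the single sub-block that is in *Typical mode* while it is being read.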

$$E_{leak,periphary} = \int_{T_{access}} (I_{leak,periphary,W}) V_{DDL} + (I_{leak,periphary,R}) V_{DDH} dt$$

$$= (I_{leak, periphary, W, L}V_{DDL} + I_{leak, periphary, R, H}V_{DDH})T_{acc}.$$
 (5.8)

3) $E_{overhead}$: The overhead energy incurred in switching between modes is given by

$$E_{overhead} = M_{acc}C_{sub}V_{DDH}(V_{DDH} - V_{DDL}) + M_{acc}C_{sub}V_{DDL}^{2} + E_{PSCTRL}$$
 (5.9)

Here, the total capacitance that is coupled to the supply voltage of sub-block is represented by  $C_{sub}$ , and the overhead of the power switch control circuitry is represented by  $E_{PSCTRL}$ .

#### C. Energy Saving of BR-DVS FIFO

For convenience of calculation, $N_{acc}$ is assumed to be $16M_{acc}$, a multiple of 16, and, according to the specification, $T_{write}$/$T_{read}$ is 33 µs/1.6 µs, respectively. The energy of the conventional FIFO can be expressed as:

$$E_{wo/DVSBR} = (5.1) + (5.2) + (5.3) + (5.4) .$$
(5.10)

The energy of proposed BR-DVS FIFO can be expressed as:

$$E_{w/DVSBR} = (5.5) + (5.6) + (5.7) + (5.8) + (5.9).$$
(5.11)

The following inequality implies that the energy saving must exceed the overhead,

(5.11) < (5.10):

$$\begin{aligned}
& M_{acc}C_{sub}V_{DDH}(V_{DDH} - V_{DDL}) + M_{acc}C_{sub}V_{DDL}^{2} + E_{PSCTRL} \\
&\quad + 256I_{leak,Cell,L}V_{DDL}\left[\frac{(M_{acc}+1)M_{acc}}{2} \times 16T_{write}\right] \\
&\quad + 256I_{leak,Cell,L}V_{DDL}\left[\frac{M_{acc}(M_{acc}-1)}{2} \times 16T_{read}\right] \\
&\quad + M_{acc}256I_{leak,Cell,T}V_{DDH} \times 16T_{read} \\
&< 16M_{acc}(C_{WWL}+16C_{WBL}+16C_{WD})(V_{DDH}^{2}-V_{DDL}^{2}) \\
&\quad + 16I_{leak,Cell,W}V_{DD}\left[\frac{(16M_{acc}+1)16M_{acc}}{2} \times T_{write}\right] \\
&\quad + 16I_{leak,Cell,R}V_{DD}\left[\frac{(16M_{acc}+1)16M_{acc}}{2} \times T_{read}\right] \\
&\quad + (I_{leak,periphary,W,H}V_{DDH}-I_{leak,periphary,W,L}V_{DDL})T_{acc}
\end{aligned}$$
(5.12)

The simulation results of the proposed BR-DVS FIFO provide each parameter. Substituting these parameters into inequality (5.12) gives the following inequality:

$$\begin{aligned}
& M_{acc}(0.0379\,\mathrm{pJ} + 0.0405\,\mathrm{pJ} + 0.008\,\mathrm{pJ}) + 0.183\,\mu\mathrm{W} \times (16M_{acc} \times 34.6\,\mu\mathrm{s}) \\
&\quad + 256 \times 0.1427\,\mathrm{nA} \times 0.3\,\mathrm{V} \left[\frac{(M_{acc}+1)M_{acc}}{2} \times 16 \times 33\,\mu\mathrm{s}\right] \\
&\quad + 256 \times 0.1427\,\mathrm{nA} \times 0.3\,\mathrm{V} \left[\frac{M_{acc}(M_{acc}-1)}{2} \times 16 \times 1.6\,\mu\mathrm{s}\right] \\
&\quad + M_{acc} \times 256 \times 0.341\,\mathrm{nA} \times 0.5\,\mathrm{V} \times 16 \times 1.6\,\mu\mathrm{s} \\
&< 16M_{acc}(2.14\,\mathrm{pF})(0.5^{2}\,\mathrm{V}^{2} - 0.3^{2}\,\mathrm{V}^{2}) \\
&\quad + 16 \times 0.5978\,\mathrm{nA} \times 0.5\,\mathrm{V} \left[\frac{(16M_{acc}+1)16M_{acc}}{2} \times 33\,\mu\mathrm{s}\right] \\
&\quad + 16 \times 0.5978\,\mathrm{nA} \times 0.5\,\mathrm{V} \left[\frac{(16M_{acc}+1)16M_{acc}}{2} \times 1.6\,\mu\mathrm{s}\right] \\
&\quad + (0.292\,\mu\mathrm{A} \times 0.5\,\mathrm{V} - 0.0874\,\mu\mathrm{A} \times 0.3\,\mathrm{V})(16M_{acc} \times 34.6\,\mu\mathrm{s})
\end{aligned}$$
(5.13)

$$\Rightarrow (18.147M_{acc} - 31.457) > 0 \Rightarrow M_{acc} > 1.73$$
(5.14)

Inequality (5.14) means that the proposed BR-DVS FIFO saves energy once the FIFO accesses at least 32 words ($M_{acc}$=2). Furthermore, the more data are accessed, the more energy is saved.
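The break-even condition can also be checked by evaluating both sides of (5.13) directly for integer $M_{acc}$. The sketch below uses only the numerical values quoted in (5.13); the function names are hypothetical, and 34.6 µs is taken as $T_{write}+T_{read}$. Because the quoted parameters are rounded, the exact break-even point obtained this way may differ slightly from the value in (5.14), but the integer conclusion can be compared directly.

```python
T_WRITE, T_READ = 33e-6, 1.6e-6  # seconds, from the specification
V_DDH, V_DDL = 0.5, 0.3          # volts


def brdvs_side(m):
    """Left-hand side of (5.13): overhead and leakage of the BR-DVS FIFO (J)."""
    t_acc = 16 * m * (T_WRITE + T_READ)
    switching = m * (0.0379e-12 + 0.0405e-12 + 0.008e-12)
    control = 0.183e-6 * t_acc
    p_low = 256 * 0.1427e-9 * V_DDL           # one sub-block in Low-power mode
    leak_write = p_low * (m + 1) * m / 2 * 16 * T_WRITE
    leak_read = p_low * m * (m - 1) / 2 * 16 * T_READ
    leak_typical = m * 256 * 0.341e-9 * V_DDH * 16 * T_READ
    return switching + control + leak_write + leak_read + leak_typical


def conventional_side(m):
    """Right-hand side of (5.13): terms of the FIFO without DVS (J)."""
    n = 16 * m                                  # number of word accesses
    t_acc = n * (T_WRITE + T_READ)
    active = n * 2.14e-12 * (V_DDH ** 2 - V_DDL ** 2)
    leak = 16 * 0.5978e-9 * V_DDH * (n + 1) * n / 2 * (T_WRITE + T_READ)
    periphery = (0.292e-6 * V_DDH - 0.0874e-6 * V_DDL) * t_acc
    return active + leak + periphery


def min_sub_blocks(limit=64):
    """Smallest integer M_acc for which (5.13) holds, or None within the limit."""
    for m in range(1, limit + 1):
        if brdvs_side(m) < conventional_side(m):
            return m
    return None
```

With these values the inequality fails for a single sub-block and holds from two sub-blocks onward, which agrees with the 32-word threshold stated above.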

### **5.7 Summary**

An energy-efficient built-in row-control dynamic voltage scaling FIFO memory is presented in this chapter. A 9T SRAM bit-cell is used as the storage element to improve the write margin and reduce write variation. Extended channel length is used to decrease $V_t$ and raise the $I_{on}$-$I_{off}$ ratio. To improve read ability and manage PVT variation, an adaptive window control circuit is designed. A row-based adaptive power switch control circuit switches between two operating supply voltages, 0.5V and 0.3V, so that sub-blocks save power at 0.3V (*Low-power mode*). The analysis shows that the proposed BR-DVS FIFO saves energy once the FIFO accesses at least 32 words. In receiving time and transmitting time, the proposed DVS FIFO saves 49.3% and 18.5% of the power, respectively. In conclusion, the proposed BR-DVS FIFO is energy-efficient for WBAN applications.

# Chapter 6 Conclusions

# **6.1 Conclusions**

Voltage scaling is a popular method for reducing energy in digital circuits because of the quadratic saving in $CV_{DD}^2$. However, the effects of the $I_{ON}/I_{OFF}$ ratio and process variation become prominent challenges, particularly in deeply scaled technologies. They limit circuit operation in the ultra-low voltage regime, particularly for SRAM cells, where minimum-sized transistors are often used. To overcome these challenges, high static noise margin, high write margin, and high sensing margin are required. Thus, the conventional 6T SRAM bit-cell is no longer suitable for ultra-low voltage SRAM design.

In this thesis, a robust ultra-low power near-/sub-threshold 10T SRAM bit-cell is proposed. For power saving, the DTCMOS design reduces leakage power consumption and increases WM/SNM, and the single-ended write port scheme reduces leakage and switching power during write operations. Thus, the proposed 10T bit-cell has ultra-low power consumption at sub/near-threshold voltage. For reliability, the bit-cell cuts off the positive feedback loop of the inverter pair to improve write-ability and reduce write variation during write operations. The read buffer eliminates read disturb to preserve the read SNM and clamps the non-selected cells to increase the $I_{READ}$-$I_{TOTAL\_LEAKAGE}$ ratio. Hence, the proposed 10T bit-cell has the lowest $V_{min}$ among the iso-area bit-cells.

A 16kb 0.5V 10T SRAM-based asynchronous FIFO memory in 90nm CMOS is implemented with the proposed 10T SRAM bit-cell in this thesis. With the proposed energy-efficient techniques, the 16kb 10T SRAM-based FIFO consumes extremely low power and tolerates PVT variation, which makes it especially suitable for WBAN healthcare applications. Furthermore, the built-in row-control DVS 9T SRAM-based FIFO automatically adjusts the sub-block voltage level between the sub-threshold and near-threshold regions without performance loss. The energy analysis shows that the overhead energy of the proposed built-in row-control DVS is compensated by the saved power, and that the more accesses there are, the more power is saved.

### **6.2 Future Work**

Three-dimensional (3-D) hyperintegration is an emerging technology, which vertically stacks and interconnects multiple materials, technologies, and functional components to form highly integrated micro-nano systems. The potential benefits of 3-D integration can vary depending on approach; they include multifunctionality, increased performance, increased data bandwidth, reduced power, small form factor, reduced packaging, increased yield and reliability, flexible heterogeneous integration, and reduced overall costs. It is expected that the industry paradigm will shift to a new industry-fusing technology era that will offer tremendous global opportunities for expanded use of 3-D silicon-based technologies in highly integrated systems. Indeed, 3-D integration is recognized as an enabling technology for bio heterogeneous systems.

Thermal and power constraints are of great concern in any 3-D microarchitecture because die stacking can dramatically increase power density. Thermal issues arise from the electrical power density that increases with the continuous scaling of ICs. Shrinking feature sizes increase transistor leakage power loss and interconnect Joule power loss. These power losses in turn raise the temperature of the IC chip. Our future work can employ the BR-DVS FIFO with PVT sensors to overcome the serious PVT variations in TSV 3-D ICs, as shown in Fig. 6.1.

Fig. 6.1 PVT sensors with built-in row-controlled DVS FIFO in 3D integration



# References

- [1.1] S. K. Gupta, A. Raychowdhury, and K. Roy, "Digital Computation in Subthreshold Region for Ultralow-Power Operation: A Device-Circuit-Architecture Codesign Perspective," *Proceedings of the IEEE*, vol. 98, no. 2, Feb. 2010, pp. 160-190.
- [1.2] A. P. Chandrakasan, D. C. Daly, D. F. Finchelstein, J. Kwong, Y. K. Ramadass, M. E. Sinangil, V. Sze, N. Verma, "Technologies for Ultradynamic Voltage Scaling," *Proceedings of the IEEE*, vol. 98, no. 2, Feb. 2010, pp. 191-214.
- [1.3] W. Zhao and Y. Cao, "New generation of predictive technology model for sub-45 nm early design exploration," *IEEE Trans. Electron Devices*, vol. 53, no. 11, Nov. 2006, pp. 2816–2823.
- [1.4] Y. Wang, U. Bhattacharya, F. Hamzaoglu, P. Kolar, Y.-G. Ng, L. Wei, Y. Zhang, K. Zhang and M. Bohr, "A 4.0 GHz 291 Mb Voltage-Scalable SRAM Design in a 32 nm High-k + Metal-Gate CMOS Technology With Integrated Power Management," in *IEEE JSSC*, vol. 45, no. 1, Jan. 2010, pp. 103-110.
- [1.5] S. Mukhopadhyay, H. Mahmoodi, and K. Roy, "Modeling of failure probability and statistical design of SRAM array for yield enhancement in nanoscaled CMOS," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 24, no. 12, Dec. 2005, pp. 1859–1880.
- [1.6] J. P. Kulkarni and K. Roy "Ultralow-voltage Process-variation-Tolerant Schmitt-Trigger-Based SRAM Design," *IEEE Trans. VLSI System 2011.*
- [1.7] T.-W. Chen, J.-Y. Yu, C.-Y. Yu, and C.-Y. Lee, "A 0.5 V 4.85 Mbps dual-mode baseband transceiver with extended frequency calibration for biotelemetry applications," *IEEE J. of Solid-State Circuits*, vol. 44, no. 11, pp. 2966-2976, Nov. 2009.
- [1.8] J. Kwong, Y. Ramadass, N. Verma, M. Koesler, K. Huber, H. Moormann, A. Chandrakasan, "A 65nm Sub-Vt Microcontroller with Integrated SRAM and Switched-Capacitor DC-DC Converter," *IEEE ISSCC*, Feb 2008, pp. 318-616.

- [2.1] N. Verma, "Analysis Towards Minimization of Total SRAM Energy Over Active and Idle Operating Modes," *IEEE Trans. VLSI Systems*, no.99, 2010, pp. 1-9.
- [2.2] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-Power CMOS Digital Design," *IEEE JSSC*, vol. 27, no. 4, April 1992, pp. 473-484.
- [2.3] L. Chang, D.M. Fried, J. Hergenrother, J.W. Sleight, R.H. Dennard, R.K. Montoye, L. Sekaric, S.J. McNab, A.W. Topol, C.D. Adams, K.W. Guarini, W. Haensch, "Stable SRAM cell design for the 32nm node and beyond," *IEEE Symposium on VLSI*, June 2005, pp. 128- 129.
- [2.4] Kaushik Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage Current Mechanisms and Leakage Reduction Techniques in Deep-Submicrometer CMOS Circuits," *Proceedings of the IEEE*, vol. 91, no. 2, February 2003, pp. 305-327.
- [2.5] F. Fallah and M. Pedram, "Standby and Active Leakage Current Control and Minimization in CMOS VLSI Circuits," *IEICE Trans. Electron*, vol. E88-C, no. 4, April 2005, pp. 509-519.
- [2.6] Semiconductor Industry Association, International Technology Roadmap for Semiconductors, 2003 edition, http://public.itrs.net.
- [2.7] N. Yang, W. K. Henson, and J. Wortman, "A Comparative Study of Gate Direct Tunneling and Drain Leakage Currents in N-MOSFETS with Sub-2-nm Gate Oxides," *IEEE Trans. Electron Devices*, vol. 47, August 2000, pp. 1636-1644.
- [2.8] D.-G. Park et. al, "High-κ/Metal Gate Low Power Bulk Technology -Performance Evaluation of Standard CMOS Logic Circuits, Microprocessor Critical Path Replicas, and SRAM for 45nm and beyond," in Proc. VLSI Technol. Symp., 2009, pp. 90-92.
- [2.9] K. Kim, K. K. Das, R. V. Joshi, C.-T. Chuang, "Leakage Power Analysis of 25-nm Double-Gate CMOS Devices and Circuits," on *IEEE Trans. Electron Device*, vol. 52, no. 5 May 2005, pp. 980-986.
- [2.10] S. Gangwal, S. Mukhopadhyay, K. Roy, "Optimization of Surface Orientation for High-Performance Low-Power and Robust FinFET SRAM," *IEEE CICC*, 2006, pp. 433-436.

- [2.11] H. J. M. Veendrick, "Short-Circuit Dissipation of Static CMOS Circuitry and Its Impact on the Design of Buffer Circuits", *IEEE JSSC*, vol. sc-19, August 1984, pp. 468-473.
- [2.12] J. Davis et al., "A 5.6GHz 64kB Dual-Read Data Cache for the POWER6TM Processor," IEEE ISSCC, February 2006, pp. 622–623.
- [2.13] Ramy E. Aly and Magdy A. Bayoumi, "Low-power cache design using 7T SRAM cell," *IEEE Trans. on Circuits and Systems*, April 2007, pp. 318-322.
- [2.14] F. Boeuf et al., "0.248 μm<sup>2</sup> and 0.334 μm<sup>2</sup> conventional bulk 6T\_SRAM bit-cell for 45nm node low cost general purpose application," in *Proc. VLSI Technol. Symp.*, 2005, pp. 130-131.
- [2.15] F. Arnaud *et al.*, "A functional 0.69µm<sup>2</sup> embedded 6T\_SRAM bit cell for 65nm CMOS platform," in *Proc. VLSI Technol. Symp.*, 2003, pp. 65-66.
- [2.16] K. Kim, H. Mahmoodi and K. Roy, "A Low-Power SRAM Using Bit-Line Charge-Recycling," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 2, Feb. 2008, pp. 446–459.
- [2.17] S. Mukhopadhyay, C. Neau, R. T. Cakici, A. Agarwal, C. H. Kim, and K. Roy, "Gate Leakage Reduction for Scaled Device Using Transistor Stacking," *IEEE Tran. VLSI Systems*, vol. 11, no. 4, Aug 2003, pp. 716-730.
- [2.18] M. Khellah, N. Kim, et al., "A 4.2Ghz, 130Mb/cm2, dual-Vcc SRAM in 65nm CMOS featuring active power management with autonomous compensation of PVT variation & aging impacts," *IEEE ISSCC*, Feb. 2006.
- [2.19] K. Nii et al., "A 90 nm Low Power 32K-Byte Embedded SRAM with Gate Leakage Suppression Circuit for Mobile Applications," Digest of Tech. Papers, Symp. VLSI Circuits, 2003, pp. 247-250.
- [2.20] T.-H. Kim, J. Liu, and C. H. Kim, "A Voltage Scalable 0.26 V, 64 kb 8T SRAM With V<sub>min</sub> Lowering Techniques and Deep Sleep Mode," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 6, June. 2009, pp. 1785–1795.
- [2.21] Mu-Tien Chang, Po-Tsang Huang, Wei Hwang, "A robust ultra-low power asynchronous FIFO memory with self-adaptive power control," *IEEE* SOCC, Sept. 2008, pp. 175-178.

- [2.22] I. J. Chang, J.-J. Kim, S. P. Park, and K. Roy, "A 32 kb 10T subthreshold SRAM array with bitinterleaving and differential read-scheme in 90 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 2, Feb. 2009, pp. 650–658.
- [2.23] A. Kawasumi, T. Yabe, Y. Takeyama, O. Hirabayashi, K. Kushida, A. Tohata, T. Sasaki, A. Katayama, G. Fukano, Y. Fujimura, and N. Otsuka, "A Single-Power-Supply 0.7V 1GHz 45nm SRAM with An Asymmetrical Unit-β-ratio Memory Cell," *IEEE ISSCC*, Feb. 2008, pp. 382-383.
- [2.24] Koichi Takeda, Yasuhiko Hagihara, Yoshiharu Aimoto, Masahiro Nomura, Yoetsu Nakazawa, Toshio Ishii, and Hiroyuki Kobatake, "A Read-Static-Noise-Margin-Free SRAM Cell for Low-VDD and High-Speed Applications," *IEEE JSSC*, January 2006, pp. 113-121.
- [2.25] J. Wang and B. H. Calhoun, "Canary replica feedback for near-DRV stand-by VDD scaling in a 90nm SRAM," in Proc. IEEE Custom Integr. Circuits Conf. Sept. 2007, pp. 29-32.
- [2.26] Jaydeep P. Kulkarni, Keejong Kim, Sang Phill Park and Kaushik Roy, "Process Variation Tolerant SRAM Array for Ultra Low Voltage Applications," *IEEE DAC*, June 2008, pp. 108-113.
- [2.27] Cheng-Hung Lo and Shi-Yu Huang, "P-P-N Based 10T SRAM Cell for Low-Leakage and Resilient Subthreshold Operation," in IEEE JSSC, 2011.
- [2.28] Yi\_Te Chiu, Ming-Hung Chang, Hao-Yi Yang, and Wei Hwang,
- [2.29] I. J. Chang, J. J. Kim, S. P. Park, and K. Roy, "A 32kb 10T Subthreshold SRAM Array with Bit-Interleaving and Differential Read Scheme in 90nm CMOS," *IEEE ISSCC*, February 2008, pp. 388-389.
- [2.30] Jaydeep P. Kulkarni, Keejong Kim, and Kaushik Roy, "A 160 mV, Fully Differential, Robust Schmitt Trigger Based Subthreshold SRAM," *IEEE ISLPED*, August 2007, pp. 171-176.
- [2.31] L. Chang, D.M. Fried, J. Hergenrother, J.W. Sleight, R.H. Dennard, R.K. Montoye, L. Sekaric, S.J. McNab, A.W. Topol, C.D. Adams, K.W. Guarini, W. Haensch, "Stable SRAM cell design for the 32nm node and beyond," *IEEE Symposium on VLSI*, June 2005, pp. 128- 129.
- [2.32] Leland Chang, R. K. Montoye, Yutaka Nakamura, Kevin A. Batson, Richard J. Eickemeyer, Robert H. Dennard, Wilfried Haensch, and Damir Jamsek, "An 8T-SRAM for variability tolerance and low-voltage operation in high-performance caches," *IEEE JSSC*, April 2008, pp. 956-963.

- [2.33] Tae-Hyoung Kim, Jason Liu and Chris H. Kim "An 8T Subthreshold SRAM Cell Utilizing Reverse Short Channel Effect for Write Margin and Read Performance Improvement," *IEEE CICC*, Sept. 2007, pp. 241-244.
- [2.34] Naveen Verma, Anantha P. Chandrakasan, "A 65nm 8T Sub-Vt SRAM Employing Sense-Amplifier Redundancy," IEEE ISSCC, Feb 2007, pp. 328-329.
- [2.35] Yi-Te Chiu, Ming-Hung Chang, Hao-Yi Yang, and Wei Hwang, "Subthreshold Asynchronous FIFO Memory for Wireless Body Area Networks (WBANs)", International Symposium on Medical Information and Communication Technology (ISMICT), March 2010.
- [2.36] B. H. Calhoun and A. P. Chandrakasan, "A 256 kb subthreshold SRAM in 65 nm CMOS," in IEEE Int. Solid State Circuits Conf., Feb. 2006, pp. 628-629.
- [2.37] T.H. Kim, J. Liu, J. Keane, and C.H. Kim, "A 0.2V, 480 kb subthreshold SRAM with 1k cells per bitline for ultra-low-voltage computing," *IEEE J. of Solid-State Circuits*, vol. 43, no. 2, Feb. 2008, pp. 518-529.
- [2.38] Mahmut E. Sinangil, Naveen Verma, and Anantha P. Chandrakasan, "A Reconfigurable 8T Ultra-Dynamic Voltage Scalable (U-DVS) SRAM in 65 nm CMOS", *IEEE JSSC*, Nov. 2009, pp. 3163-3173.
- [2.39] Naveen Verma, Joyce Kwong, and Anantha P. Chandrakasan, "Nanometer MOSFET Variation in Minimum Energy Subthreshold Circuits," *IEEE Trans. on Electron Device*, January 2008, pp. 163-174.
- [2.40] K. Zhang (ed.), Embedded Memories for Nano-Scale VLSI, Series on Integrated Circuits and Systems. Springer Science+Business Media, LLC 2009.
- [2.41] S. Ohbayashi, et al., "A 65 nm SoC Embedded 6T-SRAM Design for Manufacturing with Read and Write Cell Stabilizing Circuits," in Symp. VLSI Circuits Dig. Tech. Papers, 2006.
- [2.42] Makoto Yabuuchi et al., "A 45nm Low-Standby-Power Embedded SRAM with Improved Immunity Against Process and Temperature Variations," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2007, pp. 326–606.

- [2.43] K. Nii et al., "A 45-nm Single-port and Dual-port SRAM family with Robust Read/Write Stabilizing Circuitry under DVFS Environment," in Symp. VLSI Circuits Dig. Tech. Papers, 2008, pp. 212-213.
- [2.44] M. Yabuuchi, K. Nii, Y. Tsukamoto, S. Ohbayashi, Y. Nakase and H. Shinohara, "A 45nm 0.6V Cross-Point 8T SRAM with Negative Biased Read/Write Assist," in Symp. VLSI Circuits Dig. Tech. Papers, 2009, pp. 158-159.
- [2.45] S. Mukhopadhyay, R. M. Rao, J.-J. Kim, and C.-T. Chuang, "SRAM Write-Ability Improvement With Transient Negative Bit-Line Voltage," *IEEE Trans. Comput.-Aid. Des. Integr. Circuits Syst.*, 2009, pp. 1-9.
- [2.46] K. Nii, Y. Tsukamoto, M. Yabuuchi, Y. Masuda, S. Imaoka, K. Usui, S. Ohbayashi, H. Makino, H. Shinohara, "Synchronous Ultra-High-Density 2RW Dual-Port 8T-SRAM With Circumvention of Simultaneous Common-Row-Access," *IEEE JSSC*, vol.44, no.3, March 2009, pp.977-986.

- [3.1] N. Verma, J. Kwong, and A.P. Chandrakasan, "Nanometer MOSFET Variation in Minimum Energy Subthreshold Circuits," *IEEE Trans. on Electron Devices*, Jan. 2008, pp. 163-174.
- [3.2] K. J. Kuhn, "Reducing variation in advanced logic technologies: Approaches to process and design for manufacturability of nanoscale CMOS," *IEEE IEDM*, Dec. 2007, pp.471-474.
- [3.3] M. J. M. Pelgrom, A. C. J. Duinmaijer, and A. P. G. Welbers, "Matching properties of MOS transistors," *IEEE JSSC*, vol.24, Oct. 1989, pp. 1433-1439.
- [3.4] L. Chang, Y. Nakamura, R. K. Montoye, J. Sawada, K. Martin, K. Kinoshita, F. H. Gebara, K. B. Agarwal, D. J. Acharyya, W. Haensch, K. Hosokawa, and D.Jamsek, "A 5.3 GHz 8T-SRAM with operation down to 0.41V in 65nm CMOS," in *Proc. VLSI Circuit Symp.*, Jun. 2007, pp. 252-253.
- [3.5] T.H. Kim, J. Liu, J. Keane, and C.H. Kim, "A 0.2V, 480 kb subthreshold SRAM with 1k cells per bitline for ultra-low-voltage computing," *IEEE J. of Solid-State Circuits*, vol. 43, no. 2, Feb. 2008, pp. 518-529.
- [3.6] B. H. Calhoun and A. P. Chandrakasan, "A 256 kb subthreshold SRAM in 65 nm CMOS," in *IEEE Int. Solid State Circuits Conf.*, Feb. 2006, pp. 628-629.

- [3.7] M. Yabuuchi, K. Nii, Y. Tsukamoto, S. Ohbayashi, S. Imaoka, H. Makino, Y. Yamagami, S. Ishikura, T. Terano, T. Oashi, K. Hashimoto, A. Sebe, G. Okazaki, K. Satomi, H. Akamatsu, H. Shinohara, "A 45nm Low-Standby-Power Embedded SRAM with Improved Immunity Against Process and Temperature Variations," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2007, pp. 326–606.
- [3.8] I. J. Chang, J.-J. Kim, S. Park, and K. Roy, "A 32 kb 10T sub-threshold SRAM array with bit-interleaving and differential read scheme in 90 nm CMOS," *IEEE J. of Solid-State Circuits*, Feb. 2009, pp. 650-658.
- [3.9] S. Mukhopadhyay, R. M. Rao, J.-J. Kim, and C.-T. Chuang, "SRAM Write-Ability Improvement With Transient Negative Bit-Line Voltage," *IEEE Trans. VLSI Systems*, vol. 19, no. 1, January 2011, pp. 24-32.
- [3.10] N. Verma and A. P. Chandrakasan, "A 256kb 65nm 8T subthreshold SRAM employing sense-amplifier redundancy," *IEEE J. of Solid-State Circuits*, Jan. 2008, pp. 141-149.
- [3.11] M.-H. Tu, J.-Y. Lin, M.-C. Tsai and S.-J. Jou, "Single-ended Subthreshold SRAM with Asymmetrical Write/Read-Assist," in *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 57, no. 12, December 2010, pp.3039-3047.

- [3.12] D.-P. Wang, H.-J. Liao, H. Yamauchi, W. Hwang, Y. L. Lin, Y. H. Chen, H. C. Chang, "A 45nm Dual-Port SRAM with Write and Read Capability Enhancement at Low Voltage," *IEEE SOCC*, June 2007, pp. 211-214.

- [4.1] J. Y. Yu, W. C. Liao, and C. Y. Lee, "An MT-CDMA Based Wireless Body Area Network for Ubiquitous Healthcare Monitoring," *IEEE BioCAS*, November 2006.
- [4.2] J. Y. Yu, C. C. Chung, W. C. Liao, and C. Y. Lee, "A sub-mW Multi-Tone CDMA Baseband Transceiver Chipset for Wireless Body Area Network Applications," *IEEE ISSCC*, Feb. 2007, pp. 364-365.
- [4.3] M.T. Chang, P.T. Huang, and W. Hwang, "A Robust Ultra-Low Power Asynchronous FIFO Memory With Self-Adaptive Power Control," *IEEE SOCC*, Oct. 2008, pp. 175-178.

- [4.4] D. Markovic, C. C. Wang, L. P. Alarcon, T.-T. Liu, and J. M. Rabaey, "Ultralow-power design in near-threshold region," in *IEEE Proceedings*, vol. 98, no. 2, pp. 237-252, Feb. 2010.
- [4.5] C. A. Otto, E. Jovanov, and A. Milenkovic, "A WBAN-based System for Health Monitoring at Home," *IEEE-EMBS*, September 2006, pp. 20-23.
- [4.6] E. Jovanov, A. Milenkovic, C. A. Otto, P. C. de Groen, "A wireless body area network of intelligent motion sensors for computer assisted physical rehabilitation," J. of Neuro Engineering and Rehabilitation, 2:6, March 2005.
- [4.7] N. Verma, "Analysis towards minimization of total SRAM energy over active and idle operating modes," in *IEEE Trans. VLSI Systems*, 2010.
- [4.8] K. Nii, et al., "A 65 nm ultra-high-density dual-port SRAM with 0.71um<sup>2</sup> 8T-cell for SoC," Symp. on VLSI Circuits, pp. 130-131, 2006.
- [4.9] B. Fu and P. Ampadu, "Comparative Analysis of Ultra-Low Voltage Flip-Flops for Energy Efficiency," IEEE ISCAS, May 2007, pp. 1173-1176.
- [4.10] Tsan-Wen Chen, Jui-Yuan Yu, Chien-Ying Yu, Chen-Yi Lee, "A 0.5 V 4.85 Mbps Dual-Mode Baseband Transceiver With Extended Frequency Calibration for Biotelemetry Applications," *IEEE JSSC*, vol.44, Nov. 2009, pp. 2966-2976.

- [5.1] Anantha P. Chandrakasan, Denis C. Daly, Daniel Frederic Finchelstein, Joyce Kwong, Yogesh Kumar Ramadass, Mahmut Ersin Sinangil, Vivienne Sze, Naveen Verma, "Technologies for Ultradynamic Voltage Scaling", *IEEE Proceedings*, Vol. 98, Issue 2, Feb. 2010, pp. 191-214.
- [5.2] Mahmut E. Sinangil, N. Verma, A.P. Chandrakasan, "A Reconfigurable 8T Ultra-Dynamic Voltage Scalable (U-DVS) SRAM in 65 nm CMOS", *IEEE JSSC*, Nov. 2009, pp. 3163-3173.
- [5.3] J. Kwong, Y. Ramadass, N. Verma, M. Koesler, K. Huber, H. Moormann, and A.P. Chandrakasan, "A 65nm Sub-Vt Microcontroller with Integrated SRAM and Switched-Capacitor DC-DC Converter," *IEEE JSSC*, vol. 44, Jan. 2009, pp. 115-126.

- [5.4] Hong-Wei Huang, Ke-Horng Chen, and Sy-Yen Kuo, "Dithering Skip Modulation, Width and Dead Time Controllers in Highly Efficient DC-DC Converters for System-On-Chip Applications," *IEEE JSSC*, vol. 42, Nov. 2007, pp. 2451–2465.
- [5.5] Y. K. Ramadass and A. P. Chandrakasan, "Voltage scalable switched capacitor DC-DC converter for ultra-low-power on-chip applications," *IEEE PESC*, June 2007, pp. 2353–2359.
- [5.6] M.-H. Tu, J.-Y. Lin, M.-C. Tsai and S.-J. Jou, "Single-ended Subthreshold SRAM with Asymmetrical Write/Read-Assist," in *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 57, no. 12, December 2010, pp.3039-3047.
- [5.7] L. Chang, D.M. Fried, J. Hergenrother, J.W. Sleight, R.H. Dennard, R.K. Montoye, L. Sekaric, S.J. McNab, A.W. Topol, C.D. Adams, K.W. Guarini, W. Haensch, "Stable SRAM cell design for the 32nm node and beyond," *IEEE Symposium on VLSI*, June 2005, pp. 128-129.
- [5.8] T.H. Kim, J. Liu, J. Keane, and C.H. Kim, "A 0.2V, 480 kb subthreshold SRAM with 1k cells per bitline for ultra-low-voltage computing," *IEEE J. of Solid-State Circuits*, vol. 43, no. 2, Feb. 2008, pp. 518-529.
- [5.9] Mahmut E. Sinangil, N. Verma, A.P. Chandrakasan, "A Reconfigurable 8T Ultra-Dynamic Voltage Scalable (U-DVS) SRAM in 65 nm CMOS", *IEEE JSSC*, Nov. 2009, pp. 3163-3173.

### Wei-Hung Du 杜威宏

#### PERSONAL INFORMATION

Birth Date: April 27, 1987

Birth Place: Tainan, Taiwan

E-Mail Address: dudu0427.ee89g@nctu.edu.tw

#### **EDUCATION**

09/2009 - 09/2011 M.S. in Electronics Engineering, National Chiao Tung University

Thesis: Design and Implementation of Near-/Sub-threshold SRAM-based FIFO Memory for WBAN Application

09/2005 - 06/2009 B.S. in Engineering Science, National Cheng Kung University

#### **PUBLICATIONS**

Wei-Hung Du, Ming-Hung Chang, Hao-Yi Yang, and Wei Hwang, "An Energy-Efficient 10T SRAM-based FIFO Memory Operating in Near-/Sub-threshold Regions," *IEEE SOCC*, Sept. 2011 (accepted).

#### PATENTS

Wei-Hung Du, Ming-Hung Chang, Po-Tsang Huang, and Wei Hwang, "Built-in Row-Controlled Dynamic Voltage Scaling Near-/Sub-threshold Asynchronous FIFO Memory for WBANs," US/TW patent pending (submitted).