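# National Chiao Tung University

## Department of Electronics Engineering and Institute of Electronics

# Doctoral Dissertation

Variation-Aware Ultra-Low Voltage Design for Energy Efficient Chips

Student: Ming-Hung Chang

Advisor: Prof. Wei Hwang

June 2012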

## Variation-Aware Ultra-Low Voltage Design for Energy Efficient Chips

Student: Ming-Hung Chang

Advisor: Prof. Wei Hwang

### National Chiao Tung University

Department of Electronics Engineering and Institute of Electronics

### Doctoral Dissertation

A Dissertation Submitted to the Department of Electronics Engineering and Institute of Electronics, College of Electrical and Computer Engineering, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Electronics Engineering

June 2012, Hsinchu, Taiwan, Republic of China



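### Variation-Aware Ultra-Low Voltage Design for Energy Efficient Chips

Student: Ming-Hung Chang

#### Advisor: Prof. Wei Hwang

Doctoral Program, Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University

#### Chinese Abstract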

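This dissertation presents a dynamic voltage frequency scaling (DVFS) platform built from energy-efficient designs. These designs include an ultra-low voltage temperature sensor, a variation-aware clock generator, and highly reliable ultra-low voltage SRAM and first-in first-out (FIFO) memories. Using the FIFO memory as the verification circuit, a highly robust DVFS system design is realized.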

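The ultra-low voltage, fully on-chip, frequency-domain temperature sensor operates at 0.4V over a temperature range of 0°C~100°C and performs 45k conversions per second. With one-point calibration, the temperature error is only -1.81°C~+1.52°C. It is implemented in a TSMC 65nm process and occupies an area of 990µm<sup>2</sup>. Logical effort is a technique commonly used by digital designers, but conventional logical effort does not account for CMOS operating in different regions, nor for the effects of temperature and process on it. This dissertation proposes a unified logical effort applicable from 0.1V to 1V that reduces the delay estimation error caused by temperature and process variations. Based on this unified logical effort, an ultra-low voltage clock generator is designed whose built-in sensor provides the information needed to dynamically self-adjust the lock-in range error. Implemented in a UMC 65nm process, it generates maximum output frequencies of 625kHz and 5MHz at 0.2V and 0.5V, respectively, while consuming only 0.18µW and 5.17µW, respectively. The clock generator can also synthesize outputs from 1/8 to 4 times the reference frequency.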

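This dissertation designs a 9T SRAM that improves write ability by cutting off the positive feedback loop of the cross-coupled inverter pair. The SRAM also has a read buffer to improve read robustness and reduce leakage current, and a bit-interleaving structure can be combined with it to improve soft error tolerance. Implemented in a UMC 65nm process, the SRAM operates at 0.3V and 909kHz while consuming only 3.51µW. To provide a good storage element for wireless body area network systems, this dissertation also designs a FIFO memory based on 10T SRAM. Implemented in a UMC 90nm process, the FIFO memory operates at 0.4V, consuming only 2.09µW for writes at 50kHz and only 2.25µW for reads at 625kHz.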

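This dissertation provides a DVFS platform with energy-efficient designs, using a FIFO memory based on 8T SRAM as the demonstration circuit. It offers two operating modes: low voltage (0.3V) and high performance (0.5V). When it operates continuously in the low-voltage mode, it saves 69.5% of the power consumption. The platform is suitable for highly robust wireless body area network applications.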

## Variation-Aware Ultra-Low Voltage Design for Energy-Efficient Chips

Student : Ming-Hung Chang

Advisor : Prof. Wei Hwang

### Department of Electronics Engineering & Institute of Electronics National Chiao-Tung University

#### Abstract

Energy-efficient design is a key focus in emerging energy-constrained platforms. A dynamic voltage frequency scaling (DVFS) platform with energy-efficient designs is presented in this thesis. An ultra-low voltage temperature sensor and a variation-aware clock generator are implemented to enable the DVFS platform. Robust near-/sub-threshold SRAM/FIFO memories are designed as the test vehicles of the DVFS platform.

An ultra-low voltage, fully integrated, frequency-domain smart temperature sensor is presented. With one-point calibration, an inaccuracy of -1.81°C~+1.52°C over a 0°C~100°C operating temperature range has been measured for 12 test chips. At a conversion rate of 45k samples/s, the proposed temperature sensor consumes an average power of 520nW and achieves 0.49°C/LSB at 11-bit output resolution. It occupies only 990µm<sup>2</sup> in a TSMC 65-nm general purpose bulk CMOS process. The voltage-/temperature-induced delay estimation error of conventional logical effort is much more severe in the near-/sub-threshold region. Super-/near-/sub-threshold logical effort models are presented to eliminate the delay estimation error caused by voltage and temperature variations. A near-/sub-threshold programmable clock generator is also presented in this thesis. The major challenge of ultra-low voltage (ULV) circuits is that the lock-in range of the delay line is easily affected by environmental variations. The proposed clock generator contains a PVT compensation unit, which consists of a set of delay lines and a PVT detector. The unit is responsible for adjusting the lock-in range of the clock generator to guarantee successful clock lock. In addition, the clock generator can generate an output clock with a frequency from 1/8 to 4 times that of the reference clock. The clock generator has been designed using UMC 65nm CMOS technology. The reference clock frequencies are 625kHz at 0.2V and 5MHz at 0.5V. The power consumption is 0.18µW at 0.2V and 5.17µW at 0.5V. The core area of this clock generator is 0.01mm<sup>2</sup>.

A 9T SRAM bit-cell is presented to enhance write ability by cutting off the positive feedback loop of the SRAM cross-coupled inverter pair. In read mode, an access buffer isolates the storage node from the read path for better read robustness and leakage reduction. A bit-interleaving scheme is enabled by incorporating the proposed 9T SRAM bit-cell with additional write wordlines (WWL/WWLb) for soft error tolerance. A 1Kbit 9T 4-to-1 bit-interleaved SRAM is implemented in 65nm bulk CMOS technology. The experimental results demonstrate that the minimum energy point of the test chip occurs at a 0.3V supply voltage, where it achieves an operating frequency of 909kHz with 3.51µW active power consumption. An ultra-low power (ULP) 16Kbit SRAM-based first-in first-out (FIFO) memory is also presented for wireless body area networks (WBANs). The proposed FIFO memory is capable of operating in the ultra-low voltage (ULV) regime with high variation immunity. A ULP near-/sub-threshold 10-transistor (10T) SRAM bit-cell is proposed as the storage element to improve write variation immunity in the ULV regime and eliminate the data-dependent bit-line leakage. The proposed SRAM-based FIFO memory also features an adaptive power control circuit, counter-based pointers, and a smart replica read/write control unit. The proposed FIFO is implemented in UMC 90nm CMOS technology and achieves a minimum operating voltage of 400mV. The write power is 2.09µW at 50kHz and the read power is 2.25µW at 625kHz.

Finally, a 512-word by 16-bit (8kb) subthreshold asynchronous first-in first-out (FIFO) memory is presented for wireless body area networks (WBANs). Meanwhile, a 1kb dynamic voltage scaling (DVS) 8T SRAM-based FIFO memory is implemented in UMC 65nm technology to operate between 0.5V (near-threshold) and 0.3V (subthreshold), with power consumption of 0.535µW at 625kHz and 0.163µW at 20kHz, respectively. The proposed DVS FIFO memory can provide up to 69.5% power savings when the low-power mode is always engaged, and there is no power overhead if the low-power mode lasts longer than 48.66µs. It is suitable for healthcare applications equipped with DVFS capability.
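The 69.5% figure can be cross-checked against the two power numbers quoted above. The short Python sketch below assumes that the saving is defined as the relative reduction in power between the 0.5V and 0.3V modes; that definition and the names `P_HIGH_UW`, `P_LOW_UW`, and `power_saving` are illustrative assumptions rather than something stated in the thesis.

```python
# Power figures quoted in the abstract for the 1kb 8T SRAM-based DVS FIFO.
P_HIGH_UW = 0.535  # µW at 0.5V, 625kHz (near-threshold mode)
P_LOW_UW = 0.163   # µW at 0.3V, 20kHz (subthreshold mode)


def power_saving(p_high: float, p_low: float) -> float:
    """Fractional power reduction when staying in the low-power mode.

    Assumed definition: 1 - p_low / p_high.
    """
    return 1.0 - p_low / p_high


saving = power_saving(P_HIGH_UW, P_LOW_UW)
print(f"power saving: {saving:.1%}")
```

Because the two modes run at different clock rates (625kHz versus 20kHz), this compares power, not energy per operation.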

## Acknowledgements

I would like to thank my parents and brother for all the support they have given me. Thank you for raising me and guiding me to be the person I am.

I am extremely grateful to my advisor, Prof. Wei Hwang, for providing me with a good research environment and giving me the maximum freedom of research. Thank you for all the constructive comments and suggestions on my research.

I would also like to thank all the laboratory fellows and schoolmates, graduated or still in school. Thank you for making school life more delightful. Special thanks to the MOEA u-PHI project and ITRI project teams, who have been a great help in my research.

To all my friends not mentioned here, thank you for being friends in my life.



# Table of Contents

- Chinese Abstract
- English Abstract
- Acknowledgement
- Table of Contents
- List of Tables
- List of Figures
- 1 Introduction
- 2 Prior Works Review
  - 2.1 Energy Efficient Techniques for Ultra-Low Voltage Designs
    - 2.1.1 Subthreshold Regimes
    - 2.1.2 Near-threshold Regimes
  - 2.2 Ultra-Low Voltage Memories
  - 2.3 Variation-Aware Circuits
  - 2.4 Dynamic Voltage Frequency Scaling
  - 2.5 Wireless Body Area Sensor Networks
- 3 Ultra-Low Voltage Temperature Sensor and Clock Generator Design
  - 3.1 Ultra-Low Voltage Process-Invariant Frequency-Domain Smart Temperature Sensor Design
    - 3.1.1 Previous Work
    - 3.1.2 Subthreshold Frequency-Domain Temperature Sensor Design
      - 3.1.2.1 Design Principles
      - 3.1.2.2 Simulation Results
    - 3.1.3 Ultra-Low Voltage Frequency-Domain Temperature Sensor with Process Variation Immunity Enhancement
      - 3.1.3.1 Design Principles
      - 3.1.3.2 Implementation
    - 3.1.4 Experimental Results in 65nm CMOS
    - 3.1.5 Summary
  - 3.2 Near-/Sub-threshold DLL-based Clock Generator with PVT-aware Locking Range Compensation
    - 3.2.1 Unified Logical Effort Models
      - 3.2.1.1 Super-threshold Region
      - 3.2.1.2 Near-threshold Region
      - 3.2.1.3 Sub-threshold Region
    - 3.2.2 Clock Generator Architecture
    - 3.2.3 PVT-Aware Delay Line Design
      - 3.2.3.1 Variation-Aware Lock-in Delay Line Design
      - 3.2.3.2 PVT Compensation Delay Line Design
      - 3.2.3.3 Delay Ratio of FO1-INV to FO2-NAND
    - 3.2.4 Circuits Implementation
      - 3.2.4.1 Control Unit
      - 3.2.4.2 Phase Detector
      - 3.2.4.3 Simulation Results
    - 3.2.5 Summary
- 4 Ultra-Low Voltage Memory Design
  - 4.1 9T Subthreshold SRAM Design with Bit-Interleaving Scheme
    - 4.1.1 9T Subthreshold SRAM Bit-Cell Design
      - 4.1.1.1 Basic Operations
      - 4.1.1.2 Layout Considerations
    - 4.1.2 Iso-Area SRAM Bit-Cell $V_{\min}$ Analysis
      - 4.1.2.1 Iso-Area Bit-Cells
      - 4.1.2.2 Hold-Failure Probability
      - 4.1.2.3 Read-Failure Probability
      - 4.1.2.4 Write-Failure Probability
      - 4.1.2.5 Iso-Area $V_{\min}$ Comparison
    - 4.1.3 1Kbit 9T SRAM Implementation and Measurement Results in 65nm CMOS
      - 4.1.3.1 Bit-Interleaving Scheme for Soft Error Rate Reduction
                                                                            | 65 |
|     |       | 4.1.3.2    | 1Kbit 9T Bit-Interleaved SRAM Array Implementation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                                            | 67 |
|     |       | 4.1.3.3    | Measurement Results                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                            | 69 |
|     | 4.1.4 | Summar     | ry                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                                            | 71 |
| 4.2 | Energy-Efficient 10T SRAM-based FIFO Memory Design |  |  | 72 |
|     | 4.2.1 | 10T Near-/Sub-threshold SRAM Bit-Cell Design |  |    |
|     |       | 4.2.1.1    | Layout Considerations |    |
|     |       | 4.2.1.2    | Read Ability Improvement |    |
|     |       | 4.2.1.3    | Write Ability Improvement |    |
|     |       | 4.2.1.4    | Bitline Leakage Reduction | 78 |
|     | 4.2.2 | Iso-Area Dual-Port SRAM Bit-Cell $V_{min}$ Analysis |  | 80 |
|     |       | 4.2.2.1    | Iso-Area Bit-Cells | 80 |
|     |       | 4.2.2.2    | Hold-Failure Probability | 83 |
|     |       | 4.2.2.3    | Read-Failure Probability | 83 |
|     |       | 4.2.2.4    | Write-Failure Probability | 84 |
|     |       | 4.2.2.5    | Iso-Area $V_{min}$ Comparison | 85 |
|     |       | 4.2.2.6    | Leakage Current Analysis | 86 |
|     | 4.2.3 | 16Kbit I   | Near-threshold SRAM-based FIFO memory in 90nm CMOS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                                            |    |
|     |       | for WB.    | ANs                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                            | 87 |
|     |       | 4.2.3.1    | Adaptive Power Control Unit                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                                            | 87 |
|     |       | 4.2.3.2    | Counter-based Pointer Structure                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
                                                                            | 89 |
|     |       | 4.2.3.3    | Smart Replica Read/Write Control Units                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                                            | 91 |

|   |     | 4.2.3.4 | Implementations and Simulation Results | 93 |
|---|-----|---------|----------------------------------------|-----|
|   |     | 4.2.4 | Summary | 94 |
| 5 |     |       | Dynamic Voltage Frequency Scaling Platform | 95 |
|   | 5.1 |       | Near-/Sub-threshold Robust 8T SRAM Design | 98 |
|   |     | 5.1.1 | Basic Operations | 100 |
|   |     | 5.1.2 | Layout Considerations | 103 |
|   | 5.2 |       | Asynchronous 8T-SRAM-based FIFO Memory Design in 65nm CMOS | 104 |
|   |     | 5.2.1 | Adaptive Power Control System | 105 |
|   |     | 5.2.2 | Read/Write Pulse Control Circuit Design | 106 |
|   |     | 5.2.2.1 | Read Pulse Control Circuit | 106 |
|   |     | 5.2.2.2 | Write Pulse Control Circuit | 107 |
|   | 5.3 |       | 1Kbit Dynamic Voltage Frequency Scaling 8T-SRAM-based FIFO Memory in 65nm CMOS for DVFS Platform |  |
|   |     | 5.3.1 | Switched Capacitor DC-DC Converter | 110 |
|   |     | 5.3.2 | Supply Switch and DVFS Controller | 111 |
|   |     | 5.3.3 | Implementation and Simulation Results | 113 |
|   |     | 5.3.4 | Energy Consumption Analysis | 114 |
|   | 5.4 |       | Summary |  |
| 6 |     |       | Conclusions and Future Works | 117 |
|   |     |       | Bibliography | 121 |
|   |     |       | Vita | 148 |

# List of Tables

| 3.1 | The Performance Comparison of Recent Temperature Sensors | 35 |
|-----|----------------------------------------------------------|-----|
| 3.2 | Functions of $A(T)$ for super-threshold unified logical effort model considering supply voltage and temperature | 40 |
| 3.3 | Functions of $B(T)$, $C(T)$, and $D(T)$ for near-threshold unified logical effort model, and functions of $E(T)$ and $F(T)$ for sub-threshold unified logical effort model considering supply voltage and temperature | 41 |
| 3.4 | Specifications of the proposed DLL-based clock generator | 52 |
| 4.1 | Proposed 9T Bit-Cell Basic Operations Truth Table | 58 |
| 4.2 | $V_{min}$ Comparison of Various Bit-Cell Topologies $\ . \ . \ . \ . \ . \ . \ . \ .$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                       | 65  |
| 4.3 | Test chips measurement summary and comparison                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
                                                       | 70  |
| 4.4 | Iso-area calculation considering subarray efficiency                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
                                                       | 82  |
| 4.5 | Device sizing for various bit-cell topologies                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
                                                       | 83  |
| 4.6 | $V_{min}$ Comparison of Various Bit-Cell Topologies $\hdots$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                       | 86  |
| 4.7 | $V_{min}$ Proposed 10T SRAM-based FIFO memory $\hdots \hdots \hd$ | 93  |
| 5.1 | Comparison of various SRAM bit-cells                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
                                                       | 104 |
| 5.2 | Specifications of 1Kbit asynchronous DVFS 8T-SRAM-based FIFO $\ . \ . \ .$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
                                                       | 114 |
|     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
                                                       |     |

# List of Figures

| 1.1 | Thirty-five years of semiconductor technology scaling [1.2] | 1 |
|-----|-------------------------------------------------------------------------------------|----|
| 2.1 | Main leakage current components in an NMOS transistor [2.28] | 6 |
| 2.2 | Tradeoff between frequency loss, leakage reduction, and area overhead [2.32]. | 8 |
| 2.3 | Energy and delay in different supply voltage operating regions [2.47] | 9 |
| 2.4 | Memory occupied up to 69% chip power as the emerging applications having more critical energy constraints [2.48] | 10 |
| 2.5 | Classification of variations [2.78]. | 11 |
| 2.6 | Minimum reported supply voltage for recent ultra-low voltage designs, highlighting limitation posed by SRAMs compared with logic [2.14] | 13 |
| 2.7 | Sensor power budgets with common power sources [2.2] | 15 |
| 3.1 | (a) Temperature-to-propagation-delay-difference generator. (b) Temperature-to-frequency-difference generator. | 19 |
| 3.2 | The linearity of temperature sensitive delay line (TSDL) in super-/sub-threshold region. | 19 |
| 3.3 | The proposed ultra-low voltage frequency-domain temperature sensor | 22 |
| 3.4 | Timing diagram of the proposed fixed pulse width generator | 22 |
| 3.5 | Inverter used in sub-threshold temperature sensitive ring oscillator. | 25 |
| 3.6 | The proposed frequency-domain temperature sensor under (a) process variation, and (b) voltage variation | 26 |
| 3.7 | Block diagram of the proposed ultra-low voltage frequency-domain temperature sensor with process variation immunity enhancement | 27 |

| 3.8  | The effect of process variation on the proposed process invariant temperature sensor | 29 |
|------|------------------------------------------------------------------------------------------|----|
| 3.9  | The implementation of the proposed process invariant temperature sensor. | 29 |
| 3.10 | Timing diagram of the proposed process invariant temperature sensor | 30 |
| 3.11 | Microphotograph of the proposed process invariant temperature sensor | 31 |
| 3.12 | Measurement environment for the test chips | 32 |
| 3.13 | Bare die of the test chip on PCB board. | 32 |
| 3.14 | Measured error curves for 12 test chips | 33 |
| 3.15 | Measurement results for 12 test chips | 33 |
| 3.16 | Measurement error curves for supply voltage variations | 34 |
| 3.17 | Concept diagram of PVT compensation. | 36 |
| 3.18 | Two cascaded FO1 inverters | 39 |
| 3.19 | Proposed clock generator for near-/sub-threshold DVFS system | 42 |
| 3.20 | Proposed finite state machine (FSM) | 43 |
| 3.21 | Timing diagram of our FSM operating from Reset to Lock state | 44 |
| 3.22 | Lock-in delay line (lattice delay line [3.28]) used in our proposed clock generator | 45 |
| 3.23 | PVT compensation delay line used in our proposed clock generator | 46 |
| 3.24 | Proposed PVT detector. | 46 |
| 3.25 | Monte Carlo simulations for periods of ring oscillators (composed of FO1-INV and FO2-NAND) (a) 0.2V supply voltage, and (b) 0.5V supply voltage. | 48 |
| 3.26 | Control unit including (a) lock-in delay line controller, and (b) SEL generator. | 48 |
| 3.27 | (a) Phase detector, and (b) RSTPD generator | 50 |
| 3.28 | PVT compensation for locking range of proposed generator at (a) 0.2V, TT, w/o compensation, (b) 0.2V, TT, with compensation, (c) 0.2V, FF, w/o compensation, (d) 0.2V, FF, with compensation, (e) 0.5V, TT, w/o compensation, (f) 0.5V, TT, with compensation, (g) 0.5V, FF, w/o compensation, (h) 0.5V, FF, with compensation. | 51 |
| 3.29 | Layout view of our DLL-based clock generator under UMC 65nm bulk CMOS technology. | 52 |

| 4.1  | Wireless sensor node block diagram for the WBAN system |    |
|------|----------------------------------------------------------------------------------|----|
| 4.2  | Block diagram of the proposed 9T bit-cell. The relative threshold voltage ratio of high $V_t$ MOSFET to regular $V_t$ one is 1.3 to 1 | 57 |
| 4.3  | (a) Proposed 9T bit-cell in hold operation, and (b) HSNM performance comparison. | 58 |
| 4.4  | (a) Proposed 9T bit-cell in read operation, and (b) RSNM performance comparison. | 59 |
| 4.5  | (a) Proposed 9T bit-cell in write operation, and (b) write margin performance comparison. | 59 |
| 4.6  | Layout view of the proposed 9T bit-cell. Its size is $1.92\times$ larger than 6T mincell | 60 |
| 4.7  | Hold-failure probability comparison | 62 |
| 4.8  | Read-failure probability comparison. | 63 |
| 4.9  | Write-failure probability comparison. | 64 |
| 4.10 | Standard 4-to-1 bit-interleaved SRAM array | 66 |
| 4.11 | Schematic illustration of the proposed 9T bit-cells free of write-half-select problem | 66 |
| 4.12 | HSNM distributions of write-half-selected 9T/8T bit-cells | 67 |
| 4.13 | Block diagram of 1Kbit 9T bit-interleaved SRAM | 67 |
| 4.14 | Read replica column and read pulse control circuit. | 68 |
| 4.15 | Write pulse control circuit. | 69 |
| 4.16 | Die photo and layout view for 1Kbit 9T SRAM test chip fabricated in 65nm bulk CMOS process | 71 |
| 4.17 | Measured power of 1Kbit 9T SRAM versus VDD | 72 |
| 4.18 | Standard FIFO memory and its power consumption ratio. | 72 |
| 4.19 | Conventional dual-port 8T bit-cell. | 74 |
| 4.20 | Proposed dual-port 10T bit-cell | 75 |
| 4.21 | Layout view of the proposed dual-port 10T bit-cell in UMC 90nm CMOS technology | 76 |
| 4.22 | Prior dual-port SRAM bit-cells configurations | 77 |
| 4.22 | Prior dual-port SRAM bit-cells configurations                                    | 77 |  |

| 4.23 | (a) Proposed 10T bit-cell in read operation, and (b) Read SNM comparison in read mode | 77 |
|------|---------------------------------------------------------------------------------|----|
| 4.24 | Read SNM distributions of Monte Carlo simulations (100,000 times) | 78 |
| 4.25 | Proposed 10T bit-cell in write operation | 78 |
| 4.26 | Write margin distributions of Monte Carlo simulations (100,000 times) | 79 |
| 4.27 | (a) Proposed 10T bit-cell in hold operation, and (b) Hold SNM comparison in hold mode | 79 |
| 4.28 | Data-independent bitline leakage reduction scheme | 80 |
| 4.29 | Sensing margin comparisons under the worst case scenario | 80 |
| 4.30 | Thin-cell layout style (a) conventional DP 8T mincell, and (b) SE 8T mincell. | 81 |
| 4.31 | Thin-cell layout style (a) conventional DP 8T iso-area bit-cell, and (b) SE 8T iso-area bit-cell. | 82 |
| 4.32 | Hold-failure probability comparison | 84 |
| 4.33 | Read-failure probability comparison. | 85 |
| 4.34 | Write-failure probability comparison. | 86 |
| 4.35 | Write-failure probability comparison. | 87 |
| 4.36 | Block diagram of the proposed 16Kbit SRAM-based FIFO memory | 88 |
| 4.37 | FIFO memory operation example. | 88 |
| 4.38 | (a) Adaptive power control finite state machine, and (b) $(i + 1)_{th}$ word of storage element. | 89 |
| 4.39 | Block diagram of the proposed counter-based pointer | 90 |
| 4.40 | The synchronous counter-based pointer (a) schematic view, and (b) power consumption comparisons. | 90 |
| 4.41 | SRAM write delay in different process corner and temperature | 91 |
| 4.42 | Proposed smart replica read/write control units | 92 |
| 4.43 | (a) Floorplan and layout views of our 16Kbit 10T SRAM-based FIFO memory, and (b) power reduction ratio by the proposed energy-efficient techniques. | 94 |
| 5.1  | Micro-watt wireless wearable healthcare ECG microsystem block diagram | 95 |
| 5.2  | A wireless sensor node with two operating modes: Low-power Mode and High-performance Mode. | 97 |

| 5.3  | Proposed 8T SRAM bit-cell |     |
|------|------------------------------------------------------------------------------------------------|-----|
| 5.4  | $V_t$, $I_{on}$-$I_{off}$-ratio, and delay versus channel length of proposed 8T SRAM bit-cell |     |
| 5.5  | Hold mode of proposed 8T SRAM bit-cell |     |
| 5.6  | Read mode and butterfly curve of proposed 8T SRAM bit-cell | 100 |
| 5.7  | The distributions of read SNM of Monte Carlo simulation |     |
| 5.8  | Read-bitline leakage reduced by read-buffer-footers |     |
| 5.9  | (a) Hierarchical read-bitline scheme with footer in global read-bitline, and (b) $I_{read}$-$I_{leakage}$-ratio of 512-bit (dot-line)/32-bit (solid-line) per read-bitline with/without RSCE and read-buffer-footer |     |
| 5.10 | Equivalent circuit of the proposed 8T SRAM bit-cell in write operation | 103 |
| 5.11 | (a) The distributions of write margin performing Monte Carlo simulation, and (b) write delay performance comparison |     |
| 5.12 | Layout view of the proposed 8T SRAM bit-cell |     |
| 5.13 | Block diagram of proposed asynchronous 8T-SRAM-based FIFO | 105 |
| 5.14 | (a) The adaptive power control system (b) $i_{th}$ word of storage element | 106 |
| 5.15 | The replica column for read operation and read pulse control circuit | 107 |
| 5.16 | The replica column for write operation and write pulse control circuit | 108 |
| 5.17 | Block diagram of the proposed dynamic voltage frequency scaling 8T-SRAM-based FIFO as a demonstration DVFS platform |     |
| 5.18 | Switched capacitor DC-DC converter |     |
| 5.19 | DVFS controller and its timing diagram |     |
| 5.20 | Layout view and die photo of 1Kbit asynchronous DVFS 8T-SRAM-based FIFO |     |
| 5.21 | Energy consumption comparisons of 1Kbit 8T-SRAM-based FIFO with DVFS and without DVFS |     |
| 6.1  | Proposed power management system architecture |     |
| 6.2  | PVT-aware ultra-low voltage DVFS FIFO system |     |
| 6.3  | PVT sensors for 3D-IC package technology |     |

## Chapter 1

## Introduction

Driven by the growing demand for battery-operated or self-powered mobile applications, high energy efficiency has become the driving force in digital circuit design. In most scenarios, the energy harvested from the ambient environment is on the order of microwatts, which requires circuit implementations to be very efficient in energy consumption [1.1]. Ultra-low power designs for wireless devices therefore have three primary concerns: small form factor, long lifetime, and low cost. To fulfill these requirements, emerging digital circuit designs target area efficiency, energy efficiency, and robustness.



Figure 1.1: Thirty-five years of semiconductor technology scaling [1.2].

Advances in subthreshold circuit design have recently demonstrated aggressive reductions in energy consumption. However, subthreshold design has drawbacks: leakage increases dramatically, the $I_{ON}$-$I_{OFF}$ ratio decreases, and the improved energy efficiency comes at the cost of performance loss. As shown in Fig. 1.1, technology scaling shrinks feature size by 70% every generation. However, power density doubles, leakage current increases by 25%, and the $I_{ON}$-$I_{OFF}$ ratio degrades by 60%. For short-channel devices, parameter variations affect design performance more strongly and result in larger threshold voltage variation. On the other hand, dynamic voltage frequency scaling (DVFS) is a popular solution for achieving energy efficiency and performance concurrently. In other words, if the throughput constraint cycles between different operating modes, adjusting the supply voltage to the requirements of each mode can provide significant energy savings.

An overview of this work is as follows. In Chapter 2, previous work and basic energy-efficient techniques are introduced. Wireless body area sensor networks (WBANs) are also discussed to introduce the biomedical device standard. Chapter 3 first presents an ultra-low voltage temperature sensor with high process variation immunity. A unified logical effort model is then presented to speed up the conceptual design of ultra-low voltage circuits. Based on the proposed model, a near-/sub-threshold DLL-based clock generator with PVT-aware locking range compensation is presented. A 9T subthreshold SRAM design with a bit-interleaving scheme is presented in Chapter 4, together with an energy-efficient 10T SRAM-based FIFO memory design. As the test vehicle of the proposed dynamic voltage frequency scaling (DVFS) platform, an 8T-SRAM-based FIFO design in 65nm CMOS is presented in Chapter 5. Finally, conclusions and possible future research directions are discussed in Chapter 6.

## Chapter 2

## Prior Works Review

For emerging battery-powered/energy-harvested portable electronic devices, there are three major design requirements: long lifetime, low cost, and tiny form factor [2.1–2.5]. To meet these requirements, the development of digital system design has concentrated on finding ultra-low-power, robust, and area-efficient solutions.

Power consumption is the sum of dynamic power and leakage power.

$$P_{active} = P_{dynamic} + P_{leakage} = fCV_{DD}^2 + I_{leakage}V_{DD} \tag{2.1}$$

Firstly, lowering supply voltage is an effective strategy to achieve long lifetime since dynamic energy consumption has a square dependence on the supply voltage [2.6, 2.7].

$$P_{dynamic} = f C V_{DD}^2 \tag{2.2}$$

where f is the switching frequency, C is the effective switched capacitance of the circuit, and  $V_{DD}$  is the supply voltage. Secondly, leakage current becomes a critical issue in nanometer regimes since subthreshold leakage currents vary exponentially with threshold voltage [2.8].

$$I_{leakage} = \frac{W}{W_0} I_0 \cdot 10^{(V_{GS} - V_{th})/S} \tag{2.3}$$

where $U_T$ is the thermal voltage, W is the device width, and $S = nU_T \ln 10$ is the subthreshold slope. Leakage power consumption can be much worse when the switching activity is low. Leakage current reduction techniques have therefore become a necessity for energy-efficient chips. Ultra-low voltage operation is being examined because it can consume orders of magnitude less power than standard 1V operation. Meanwhile, the minimum energy operating points of logic and memory usually occur in the subthreshold and near-threshold regions [2.6,2.7,2.9,2.10]. Successful energy-efficient techniques for both the subthreshold and near-threshold regions are discussed in Sec. 2.1. State-of-the-art ultra-low voltage SRAM designs, including new bit-cells, novel sensing schemes, and read/write assist circuits, are introduced in Sec. 2.2.
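As a rough numerical illustration of Eqs. (2.1) and (2.3), the following Python sketch evaluates dynamic and leakage power at a single operating point. All parameter values (`i0`, the width ratio, `n`, the capacitance, and the frequency) are illustrative assumptions and are not taken from this work.

```python
import math

def subthreshold_slope(n, temp_k=300.0):
    """S = n * U_T * ln(10), in volts per decade (as defined after Eq. 2.3)."""
    k_over_q = 8.617333e-5           # Boltzmann constant / electron charge, V/K
    u_t = k_over_q * temp_k          # thermal voltage U_T
    return n * u_t * math.log(10)

def leakage_current(v_gs, v_th, w_ratio, i0, s):
    """Eq. (2.3): I_leakage = (W/W0) * I0 * 10^((V_GS - V_th)/S)."""
    return w_ratio * i0 * 10 ** ((v_gs - v_th) / s)

def active_power(f, c, v_dd, i_leak):
    """Eq. (2.1): returns (P_dynamic, P_leakage, P_active)."""
    p_dyn = f * c * v_dd ** 2
    p_leak = i_leak * v_dd
    return p_dyn, p_leak, p_dyn + p_leak

# Hypothetical operating point (assumed values, for illustration only).
S = subthreshold_slope(n=1.5)        # about 89 mV/decade at 300 K
I_OFF = leakage_current(v_gs=0.0, v_th=0.3, w_ratio=1.0, i0=1e-7, s=S)
P_DYN, P_LEAK, P_ACTIVE = active_power(f=1e6, c=1e-12, v_dd=0.4, i_leak=I_OFF)
```

In this model, halving $V_{DD}$ cuts the dynamic term by a factor of four, while lowering $V_{th}$ by one subthreshold slope $S$ raises the leakage current tenfold.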

However, performance loss and reliability degradation are two major problems for ultra-low voltage design. To retain or improve performance, the threshold voltage must be reduced as well, resulting in an exponential increase of the subthreshold leakage. On the other hand, global systematic and local random environmental variations in process, supply voltage, and temperature (PVT) pose a major challenge to future nanometer circuit design [2.11, 2.12]. In addition, aging variations degrade device robustness and strength when a device is used for a long period of time. Therefore, monitoring of subthreshold leakage, PVT environmental variations, and aging variations, together with smart variability-resistant designs, is necessary. Related research on variation-aware circuits is discussed in Sec. 2.3.

In order to retain excellent energy efficiency while reducing performance loss, dynamic voltage frequency scaling (DVFS) [2.13] is an effective means of handling time-varying workloads in wireless devices. It reduces the supply voltage to extend battery lifetime and provides maximum performance only when required. For applications with a wide spread of workload intensity, DVFS is the key to building an optimum energy-saving system. Recently, the ultra-dynamic voltage scaling (UDVS) technique [2.14, 2.15], in which the supply voltage is reduced below the threshold voltage, was presented. Many successful designs based on DVFS concepts are surveyed in Sec. 2.4.
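To make the benefit concrete, the sketch below estimates the switching-energy saving of a two-mode DVFS scheme using only the quadratic voltage dependence in Eq. (2.2). The two supply voltages and the workload split are assumed example values, and leakage and converter losses are ignored.

```python
def switching_energy_per_op(c_eff, v_dd):
    """Switching energy of one operation, E = C * V_DD^2 (Eq. 2.2 divided by f)."""
    return c_eff * v_dd ** 2

def dvfs_saving(v_high, v_low, low_fraction, c_eff=1.0):
    """Fractional switching-energy saving versus always running at v_high.

    low_fraction is the share of operations that can tolerate the slower,
    low-voltage mode (an assumed workload parameter).
    """
    e_high = switching_energy_per_op(c_eff, v_high)
    e_low = switching_energy_per_op(c_eff, v_low)
    e_dvfs = low_fraction * e_low + (1.0 - low_fraction) * e_high
    return 1.0 - e_dvfs / e_high

# Assumed example: 80% of operations run at 0.3 V, the rest at 0.5 V.
saving = dvfs_saving(v_high=0.5, v_low=0.3, low_fraction=0.8)
```

With these assumed numbers the switching energy drops by about half; a workload that never needs the high-performance mode would approach the $1 - (0.3/0.5)^2 = 64\%$ limit of this simple model.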

One popular energy-limited application with time-varying throughput is healthcare monitoring with wireless body area sensor networks (WBANs). The WBAN standard is under development by IEEE 802.15 TG6 [2.16] for low power devices operating on, in, or around the human body. Typical WBANs consist of sensor nodes, which are recognized as an enabling technology for continuous and noninvasive measurement of vital signs such as body temperature, heart rate, and electrocardiogram (ECG/EKG). However, the wearable nature of the sensor nodes constrains form factor and energy budget because battery replacement may be difficult or impossible. Sec. 2.5 reviews related work on energy-efficient circuit designs for WBANs.

## 2.1 Energy Efficient Techniques for Ultra-Low Voltage Designs

Until the early 2000s, high performance design was the major trend in digital circuits. However, cost-effective cooling solutions can only handle around 100W of power consumption. Meanwhile, power-limited portable devices have grown rapidly over the last 20 years. Traditional low power techniques, including switching activity reduction, pipelining, all-level parallelism, and interconnect/logic optimization, are no longer sufficient for micro-power microsystems. Several effective ideas have drawn attention. Digital-assisted analog design for signal calibration and variation compensation became popular [2.17] as technology scaled down to the nanometer range. A new FDSOI process technology [2.18] and a novel 3-D IC package technique [2.19–2.21] are also primary focuses for providing optimum operation while maintaining energy efficiency. Recently, the primary focus for achieving energy-efficient digital designs has been ultra-low voltage operation [2.3, 2.5, 2.8, 2.22, 2.23].

Ultra-low power circuits demonstrate huge potential for extending the lifetime of portable and biomedical applications, because reducing the supply voltage reduces dynamic energy consumption quadratically. However, leakage in the subthreshold region increases dramatically and drain current decreases exponentially, both of which impact the $I_{ON}$-$I_{OFF}$ ratio and can significantly degrade device performance and reliability. To aid the selection of gate sizes for leakage reduction in ultra-low voltage designs, the widely used logical effort method [2.24] must be modified. Logical effort is defined as the ratio of the input capacitance of a gate to that of an inverter delivering the same amount of output current. It is used to quickly estimate the optimal delay and to optimize super-threshold logic paths. Previous research on subthreshold logical effort for maximum drive current was presented by Keane [2.25], in which a framework was presented for choosing the optimal transistor stack sizing factor for best performance. Later, an ultra-low voltage sizing method was proposed to minimize OFF leakage current and maximize ON active current at the same time [2.26]. These logical effort models extend the original high-performance-oriented design in the super-threshold region to energy-efficiency-oriented design in the near-threshold and subthreshold regions. Meanwhile, supply voltage and temperature variations are both taken into account.
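For reference, the conventional super-threshold logical effort calculation that these works extend can be sketched as follows. The gate values are the usual textbook numbers for an inverter and 2-input NAND/NOR gates (assuming a PMOS/NMOS width ratio of 2); the example path is hypothetical, and the ultra-low voltage corrections of [2.25, 2.26] are not modeled.

```python
import math

# Textbook logical effort g and parasitic delay p (delay unit: tau, p_inv = 1).
GATES = {"inv": (1.0, 1.0), "nand2": (4.0 / 3.0, 2.0), "nor2": (5.0 / 3.0, 2.0)}

def min_path_delay(path, c_in, c_load, branching=1.0):
    """Return (stage_effort, delay) for the minimum-delay sizing of a path.

    path      -- list of gate names from GATES
    c_in      -- input capacitance of the first gate
    c_load    -- load capacitance on the last gate
    branching -- product of branching efforts along the path
    """
    n = len(path)
    g_total = math.prod(GATES[name][0] for name in path)    # path logical effort G
    p_total = sum(GATES[name][1] for name in path)          # path parasitic delay P
    f_total = g_total * branching * (c_load / c_in)         # path effort F = G*B*H
    stage_effort = f_total ** (1.0 / n)                     # equal effort per stage
    return stage_effort, n * stage_effort + p_total

# Hypothetical path: NAND2 -> NOR2 -> inverter driving 20x the input capacitance.
stage_effort, delay = min_path_delay(["nand2", "nor2", "inv"], c_in=1.0, c_load=20.0)
```

The minimum delay is reached when every stage bears the same effort, which is why the path effort is split evenly with the N-th root.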

### 2.1.1 Subthreshold Regimes

Lowering the supply voltage toward the subthreshold region helps keep the power budget of portable devices under control. However, the penalties of working in this region are slower speed, reduced $I_{ON}$-$I_{OFF}$ ratio, and increased sensitivity to variations [2.27]. Generally, energy-stringent applications such as wireless sensor nodes, biomedical sensors, and battery-free electronics tend to have fairly low speed requirements. Although leakage current decreases as the supply voltage scales down, the $I_{ON}$-$I_{OFF}$ ratio in the subthreshold region drops to only 160X (versus 7,000X in the super-threshold region), as stated in [2.10]. This is because the subthreshold conduction drain current is far smaller than the super-threshold one. Therefore, it is essential to identify which leakage current components need to be reduced.



Figure 2.1: Main leakage current components in an NMOS transistor [2.28].

There are four major short-channel leakage mechanisms as illustrated in Fig. 2.1. They are reverse-biased junction leakage current  $(I_{REV})$ , gate induced drain leakage  $(I_{GIDL})$ , gate direct tunneling leakage  $(I_{Gate})$  [2.29], and subthreshold leakage  $(I_{SUB})$  [2.28, 2.30].

For the leakage current of an OFF transistor  $I_{OFF}$ ,

$$I_{OFF} = I_{REV} + I_{GIDL} + I_{SUB} \tag{2.4}$$

Note that  $I_{Gate}$  is not included because the transistor gate is not at a high potential. Because of the low threshold voltage in nanometer technology,  $I_{SUB}$  typically dominates  $I_{OFF}$ .

There are plenty of successful energy-efficient techniques to reduce leakage current in subthreshold region. They are power gating through the use of sleep transistors [2.31–2.37], multiple threshold voltage CMOS (MTCMOS) [2.38, 2.39], and body bias control [2.3, 2.22, 2.30, 2.40–2.45].

The power gating technique adds a header and/or footer (called a sleep transistor) between the actual power/ground rail and the virtual power/ground. It turns off the leakage current path during standby mode. There are three main challenges in designing power gating devices efficiently: the power gating structure, sleep transistor sizing, and supply noise minimization. A power gating structure presented in [2.35] supports both a cutoff mode and an intermediate power-saving and data-retaining mode. In [2.33], an algorithm that estimates the voltage drop and minimizes the sleep transistor size was presented. Moreover, an optimal sizing scheme using an explicit noise and impedance model was developed in [2.32] for supply noise minimization. As shown in Fig. 2.2, the width of the power gating device is a tradeoff between frequency loss, leakage reduction, and area overhead.

Most advanced technologies provide the MTCMOS technique to meet high performance and low power demands. High-$V_{th}$ devices reduce leakage current at the expense of speed. On the other hand, low-$V_{th}$ devices operate faster than normal ones with low leakage overhead. In [2.38], a series-connected low-$V_{th}$ power gating structure with two virtual ground ports was presented to reduce $I_{Gate}$, wake-up time, and rush current. Meanwhile, a design methodology that enables local insertion of sleep devices for sequential and combinational circuits was presented in [2.39]. It also prevented most sneak leakage paths.

Utilizing the body effect, the device threshold voltage can be controlled by the substrate bias. It can provide a high-$V_{th}$ characteristic in standby mode and a low-$V_{th}$ one in active mode [2.30,2.44,2.45]. However, it may increase the depletion width of the MOSFET parasitic junction diode and rapidly increase the band-to-band tunneling (BTBT) current between the substrate and source/drain, especially with halo implants. In [2.40], optimum body bias voltages were generated adaptively for different temperature and process conditions based on PVT monitoring and control systems. Power supply variations were also compensated based on the propagation delay change of an inverter chain.

Figure 2.2: Tradeoff between frequency loss, leakage reduction, and area overhead [2.32].


### 2.1.2 Near-threshold Regimes

Minimum energy operation for logic usually occurs in the subthreshold region. However, it was reported in [2.46] that a 20% increase in energy from the minimum energy point gives a tenfold improvement in performance. Therefore, near-threshold operation can be more energy efficient than subthreshold operation from an energy-delay-product (EDP) point of view. For a broad range of power-constrained computing segments, from sensors to high performance servers, near-threshold operation is preferred because it is more robust than subthreshold operation and more energy efficient than super-threshold operation, as shown in Fig. 2.3 [2.47].
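The EDP argument can be made explicit with a short calculation. Writing $E_{min}$ and $D_{sub}$ for the energy and delay at the minimum energy point (symbols introduced here only for illustration) and taking the figures reported in [2.46] at face value:

$$EDP_{near} = (1.2\,E_{min}) \cdot \frac{D_{sub}}{10} = 0.12\,E_{min} D_{sub} = 0.12\,EDP_{sub}$$

Under these assumptions the near-threshold EDP is roughly 8 times lower than the subthreshold EDP, even though the energy per operation is 20% higher.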

## 2.2 Ultra-Low Voltage Memories

In highly energy-constrained applications, the memory power consumption drives the need for ultra-low voltage operation, as shown in Fig. 2.4 [2.48]. A traditional 6T bit-cell without large area overhead cannot survive in the subthreshold region because of its read disturb nature. Meanwhile, bitline leakage in 6T SRAMs limits the number of bit-cells on a bitline to 16 [2.49]. To overcome the challenges of performing robust ultra-low voltage read/write/hold operations, several successful ultra-low voltage memory designs [2.10, 2.37, 2.48–2.74] were presented. Some presented novel bit-cells to avoid disturbances; others presented read-/write-assist techniques at the architecture level.

Figure 2.3: Energy and delay in different supply voltage operating regions [2.47].

For novel bit-cells, a 5T bit-cell [2.57] used sizing asymmetry to improve read stability. Another 5T bit-cell [2.71] utilizing dynamic read stability was also presented. A read-static-noise-margin-free 7T bit-cell [2.72] was presented to overcome the speed limits of 6T SRAM at a 0.5V supply voltage. Also, a 9T bit-cell [2.50] with a bit-interleaving scheme enhances write ability by cutting off the positive feedback loop of the inverter pair. The 9T bit-cell can operate reliably at the minimum energy point of 0.3V. Meanwhile, 8T [2.53, 2.59, 2.61, 2.63–2.65, 2.67, 2.75] and 10T [2.49, 2.51, 2.66, 2.68, 2.70] bit-cells have various structures. In [2.67], a read buffer was used to ensure read stability, achieving a minimum operating voltage of 350mV. Utilization of the reverse short channel effect in an 8T bit-cell [2.65] improved its write margin and read performance without the aid of peripheral circuits. An asymmetrical write-assist 8T bit-cell [2.59] with a virtual ground biasing scheme was presented to operate at a 0.2V supply voltage. In [2.53], a fully differential 8T bit-cell that allows bit-interleaving was presented to achieve soft-error tolerance. The 10T bit-cell in [2.49] used four extra transistors to implement a read buffer. The buffer solved the read disturbance of the 6T bit-cell and relaxed the bitline integration limitation. A Schmitt trigger (ST) based differential 10T bit-cell [2.70] was presented to achieve $1.56 \times$ higher read static noise margin than the 6T bit-cell at a 0.4V supply voltage. It can operate at a supply voltage of 160mV. Then, another ST-based 10T bit-cell [2.76] was presented to achieve soft-error tolerance by providing a bit-interleaving structure. A detailed iso-area analysis was also reported in [2.51].

Figure 2.4: Memory occupies up to 69% of chip power as emerging applications face more critical energy constraints [2.48].

Other than novel bit-cell structures, peripheral assist techniques for stability improvement were presented in dynamic and static forms [2.56]. The dynamic ones include vertically routed $V_{DD}/V_{SS}$, horizontally routed $V_{DD}/V_{SS}$, $V_{wordline}$ adjustment, $V_{bitline}$ adjustment, and replica scheme [2.68]. The static ones include $V_{wordline}$ setting, dual supply voltage [2.37] or voltage scalable [2.58, 2.60, 2.65], and adaptive body bias control [2.65]. Meanwhile, reducing $V_{wordline}$ and/or increasing bit-cell $V_{DD}$ can increase read stability. During write operations, reducing bit-cell $V_{DD}$ [2.69] and/or employing negative $V_{bitline}$ [2.52, 2.77] can improve write margin [2.54].

## 2.3 Variation-Aware Circuits

Process variations [2.78] can be divided into two parts: global die-to-die (D2D) and local within-die (WID) variations. Global D2D process variations come from different runs, lots, and wafers. Local WID process variations arise from fundamental physical or process control limitations. The major sources of WID process variations include random dopant fluctuation, channel length variation, line edge roughness, oxide charge variation, mobility fluctuation, gate oxide thickness variation, and channel width variation [2.79, 2.80]. The first two are the dominant sources of WID process variations in current technology. The other critical global variation is lifetime aging, which includes negative bias temperature instability (NBTI) [2.81], positive bias temperature instability (PBTI), hot carrier injection (HCI), time-dependent dielectric breakdown (TDDB), and electromigration. Two successful on-chip aging sensors [2.82, 2.83] were presented to monitor the performance degradation. In [2.82], the sensor achieved a direct correlation between the threshold voltage degradation and the phase difference.
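The distinction between D2D and WID variation can be illustrated with a small Monte Carlo sketch. It assumes that both components are independent Gaussian shifts of the threshold voltage; the nominal value and the two standard deviations are made-up example numbers, not measured data.

```python
import random
import statistics

def sample_vth(n_dies, n_devices, vth_nominal=0.30,
               sigma_d2d=0.02, sigma_wid=0.01, seed=0):
    """Sample threshold voltages with a shared D2D shift and a per-device WID shift."""
    rng = random.Random(seed)
    dies = []
    for _ in range(n_dies):
        d2d_shift = rng.gauss(0.0, sigma_d2d)   # identical for all devices on a die
        dies.append([vth_nominal + d2d_shift + rng.gauss(0.0, sigma_wid)
                     for _ in range(n_devices)])
    return dies

dies = sample_vth(n_dies=200, n_devices=50)
die_means = [statistics.mean(d) for d in dies]
spread_between_dies = statistics.stdev(die_means)                     # mostly D2D
spread_within_die = statistics.mean([statistics.stdev(d) for d in dies])  # WID only
```

The spread of the per-die means recovers the assumed D2D sigma, while the spread inside each die recovers the assumed WID sigma, which is the separation that on-chip process sensors aim to observe.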

|        | Extremely slow | Dynamic: slow-changing | Dynamic: fast-changing |
|--------|----------------|------------------------|------------------------|
| Global | Die-to-Die (D2D) Process Variations<br>Lifetime Degradation<br>(NBTI, PBTI, HCI, TDDB) | Voltage Variations (Package/Die)<br>Temperature Variations (Ambient) | PLL Jitter<br>IR drop<br>Ldi/dt |
| Local  | Within-Die (WID) Process Variations | Temperature Variations (Hot-Spot) | Capacitive Coupling<br>Clock-tree Jitter |

Figure 2.5: Classification of variations [2.78].

As shown in Fig. 2.5, the various sources of variations are classified according to their spatial and temporal rate of change [2.78]. Other than process variations, voltage and temperature variations also need to be reduced. Variations lead to lower noise margins, reliability degradation, larger power consumption, and temporal degradation. Much previous research has addressed variation-aware logic [2.1, 2.42, 2.43, 2.79, 2.80, 2.84–2.92] and memories [2.9, 2.11, 2.48, 2.93–2.95] to maintain digital circuit performance and yield.

In [2.84, 2.85], the impact of local spatial variations on digital circuit performance was studied through on-chip measurement of their effect on FET current. A gated oscillator was presented as an all-digital measurement circuit for dynamic supply noise waveforms. Taking WID process variations into account, a variation-aware optimal supply voltage scaling mechanism was presented in [2.89, 2.90]. The voltage transfer characteristic can serve as an indicator of logic functionality [2.9]. Meanwhile, soft error models accounting for D2D and WID process variations in subthreshold SRAM bit-cells were presented in [2.93]. Because SRAMs generally need to retain data, low-leakage data-retention techniques in the presence of variations were analyzed in [2.48].

In order to monitor real-time on-chip environmental status, process, voltage, and temperature sensors are essential for variation-aware circuits. One major focus is smart temperature sensors [2.96–2.107]. Recently, process and voltage sensors [2.108] and threshold voltage sensors [2.81] have also attracted close attention.

## 2.4 Dynamic Voltage Frequency Scaling

Emerging applications such as implantable/wearable medical devices, wireless sensor networks, and handheld electronics are battery-powered or even battery-free. However, the demand for diverse functionalities to be integrated in these applications creates a serious power management bottleneck. Power management techniques [2.109–2.118] are paramount for energy-efficient chips. A 90-nm Itanium family processor [2.117] utilizes on-chip sensors and an embedded microcontroller to measure power and temperature status and to modulate both voltage and frequency to maximize performance. Also, a multidimensional adaptive power management approach [2.116] optimally trades off power and performance by concurrently tuning the supply voltage of RF and digital baseband components. In [2.114], an online-learning algorithm for system-level power management was presented that is extremely lightweight with negligible overhead.

Figure 2.6: Minimum reported supply voltage for recent ultra-low voltage designs, highlighting limitation posed by SRAMs compared with logic [2.14].

Energy consumption is the sum of leakage energy and switching energy. Techniques for leakage energy reduction are discussed in Sec. 2.1. For switching energy reduction, dynamic voltage frequency scaling (DVFS) [2.7, 2.9, 2.13–2.15, 2.19, 2.27, 2.60, 2.61, 2.64, 2.89, 2.90, 2.119–2.127] serves as an energy-efficient solution in response to varying performance requirements. It was reported in [2.7] that the minimum energy point (MEP) occurs in the subthreshold region. The MEP depends heavily on leakage current, which itself depends on supply voltage [2.124]. A circuit was presented to determine an optimal low-activity supply voltage for energy-efficient DVFS. The reported minimum operational supply voltage follows two different trends for SRAMs and logic, as shown in Fig. 2.6. SRAMs pose a critical limitation in DVFS systems because they have far lower activity factors and are more sensitive to leakage than logic.
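The existence of a minimum energy point and its dependence on leakage can be illustrated with a deliberately simplified model. The sketch below is an assumption-laden illustration rather than the method of [2.7] or [2.124]: it uses only the subthreshold current form of Eq. (2.3), and `leak_factor` is a hypothetical parameter that lumps logic depth and activity.

```python
def energy_per_op(v_dd, c_eff=1.0, leak_factor=1000.0, s=0.09):
    """Simplified subthreshold energy model (an assumption, not from this work).

    Switching energy is C*V^2. Leakage energy is I_off * V * t_cycle; with
    t_cycle ~ C*V / I_on and I_on / I_off = 10^(V/S) from Eq. (2.3), it becomes
    C*V^2 * leak_factor * 10^(-V/S).
    """
    switching = c_eff * v_dd ** 2
    leakage = switching * leak_factor * 10 ** (-v_dd / s)
    return switching + leakage

def minimum_energy_point(v_min=0.10, v_max=0.60, step=0.005, **model):
    """Grid search for the supply voltage that minimizes energy per operation."""
    n_steps = int(round((v_max - v_min) / step))
    grid = [v_min + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda v: energy_per_op(v, **model))

v_mep = minimum_energy_point()
v_mep_leaky = minimum_energy_point(leak_factor=10000.0)   # 10x more leakage
```

In this model, lowering the supply voltage reduces switching energy but lengthens the cycle time over which leakage is integrated, so a circuit with a larger leakage share (for example, a low-activity SRAM) reaches its minimum energy at a higher voltage.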

For a DVFS platform, highly efficient power conversion must be achieved by DC-DC converters [2.2, 2.128–2.133] not only in sleep mode under very light load but also in high-speed mode under very heavy load. Meanwhile, generating the clock frequencies [2.134–2.141] of a DVFS platform is another critical challenge. Level converters [2.142, 2.143] capable of converting voltages from the subthreshold to the super-threshold region are also essential. According to [2.48], SRAMs occupy up to 69% of chip power as emerging applications face more critical energy constraints. Several successful DVFS SRAM implementations [2.37, 2.48, 2.58, 2.60, 2.61, 2.64, 2.74, 2.119] have also drawn much attention.

State-of-the-art energy-efficient chips [2.23, 2.144, 2.145] usually operate in ultra-low voltage domains and utilize the DVFS technique. A 180-mV subthreshold FFT processor using a minimum energy design methodology was presented in [2.6]. In [2.146], a fully integrated power management unit was implemented for GSM baseband radios. A 167-processor computational platform with per-processor DVFS circuits was presented in [2.147]. Its DVFS controller provides three methods for voltage and frequency setting: 1) static, 2) dynamic runtime through software, and 3) dynamic runtime through local hardware. A near-/sub-threshold multi-standard JPEG co-processor was presented in [2.148]. It adopts a configurable $V_{th}$ balancing scheme to enable ultra-wide-range $V_{DD}$ scaling. Control of twenty-five power domains was used in an H.264 Full-HD decoding application processor [2.19].

## 2.5 Wireless Body Area Sensor Networks

Wireless body area sensor networks (WBANs) [2.2,2.4,2.149–2.151] are driven by the growing aging population worldwide [2.152]. WBANs follow IEEE 802.15 TG6 [2.16] for low power devices operating on, in, or around the human body. One well-known recent application is wearable medical microsystems that measure human vital signs, e.g. electrocardiogram (ECG), electroencephalography (EEG), heart rate (HR), and blood pressure (BP). Most wearable medical devices include sensors, an analog front end, a digital baseband and signal processing unit, a battery, a reference oscillator, and an RF transceiver. To ease the burden on the wearer, the form factor should be tiny: the volume should be less than $1cm^3$, and the weight should be less than 100g [2.153]. The small form factor also imposes a tight power budget for common sensor power sources, as shown in Fig. 2.7.

To address the energy constraint, a 0.5V to 1.0V 16-bit biomedical signal processing platform in [2.154] achieves $10.2 \times$ and $11.5 \times$ energy reduction when running complete EEG and EKG applications, respectively. Voltage scaling and block-level power gating optimize energy efficiency under applications of varying complexity. In [2.155], an EEG acquisition SoC with an integrated feature extraction processor was presented for chronic seizure detection. It consumed only $9\mu$J per feature vector by reducing the rate of wireless EEG data transmission. Meanwhile, using multi-tone code division multiple access (MT-CDMA) and orthogonal frequency division multiple access (OFDM), a 0.5V dual-mode baseband transceiver [2.156] supports the coexistence of up to 8 users. This chipset can achieve 4.85Mbps with a power consumption of $5.52\mu$W. Two successful general-purpose subthreshold sensor processors [2.3, 2.157] were also presented with excellent energy efficiency of 2.6pJ and 3.5pJ per instruction, respectively. Other energy-efficient techniques for WBANs were presented in [2.158–2.160] for a frequency tracking loop, a signal component separator, and a digitally controlled oscillator.

Figure 2.7: Sensor power budgets with common power sources [2.2].

## Chapter 3

## Ultra-Low Voltage Temperature Sensor and Clock Generator Design

Thermal and power management are major challenges in emerging energy-constrained applications with lifetimes of months to years. A fully integrated, high-resolution, small, and ultra-low power temperature sensor is key to providing the environmental data that management units need to improve efficiency. On the other hand, the pursuit of longer operational lifetimes for portable platforms has driven integrated circuit design into the ultra-low voltage regime, where process, voltage, and temperature (PVT) variations are much more severe than in conventional super-threshold design [3.1–3.3]. In this regime, threshold voltage shifts caused by local variation exponentially exacerbate the weak $I_{ON}$-$I_{OFF}$ ratio. Ensuring functionality in the presence of PVT variations motivates the design of variation-aware near-/sub-threshold circuits [3.4]. Some energy-limited miniature devices are powered by energy harvested from the environment to increase their lifetime, and the supply voltage generated this way is usually no larger than 0.5V. Therefore, a temperature sensor capable of ultra-low voltage operation is essential. Moreover, a new class of package technologies, three-dimensional integrated circuits (3D-IC) [3.5, 3.6], which achieve multi-function integration, improve system speed, and reduce power consumption, makes the on-die hot-spot problem even worse because of increased power density and unbalanced thermal stress distribution. Temperature variations over time induced by the stacking structures in 3D-IC require a fast and area-efficient temperature sensor to enable real-time multiple-location hot-spot detection.

With the evolution of CMOS process technology, the number of transistors in a digital core doubles about every two years. The increases in transistor density and operating frequency have shortened battery life. For some applications, such as wireless body area network (WBAN) sensors, the critical consideration is lifetime rather than operating frequency. A WBAN system provides body signal collection and reliable physical monitoring. It has many wireless sensor nodes (WSNs) attached to or implanted inside the human body. How to perform an ultra-low voltage (ULV) design while conforming to the performance and reliability requirements is an important issue. Despite the degradation in speed and the increased susceptibility to parameter variations, power dissipation can be reduced by operating digital circuits at scaled supply voltages. The operating voltage is scaled down to the near-threshold (e.g. 0.5V) or sub-threshold (e.g. 0.2V) region depending on the power and speed requirements of the target system.

The dynamic voltage and frequency scaling (DVFS) technique is widely used to save power. In addition, advances in ULV circuit design have demonstrated the capability to reduce power consumption. The combination of DVFS and ULV design techniques has great potential for meeting ultra-low power demands. In a DVFS system, clock generation and distribution are realized by the clock generator and clock tree. The main potential problems in the clock system are clock jitter and skew. Jitter comes from the clock generator, and skew comes from the clock tree. They may cause functional errors in digital circuits and become more serious in the ULV region because of environmental variations. The environmental variations include process, voltage, and temperature (PVT); they should be considered carefully when designing ULV clock generators.

## 3.1 Ultra-Low Voltage Process-Invariant Frequency-Domain Smart Temperature Sensor Design

Thermistors and platinum resistors are the two most popular conventional temperature sensors with high temperature detection accuracy. However, they need additional readout circuitry to produce temperature readings. To overcome this, analog-to-digital converters (ADCs) were integrated into so-called smart temperature sensors [3.7,3.8] to provide easily accessible results in digital format. Most high-accuracy and high-resolution temperature sensors are based on the temperature characteristics of parasitic bipolar transistors. The inaccuracy of state-of-the-art smart voltage-domain temperature sensors was $\pm 0.1^{\circ}$C ($3\sigma$) with resolutions of 25mK [3.9] and 10mK [3.10]. Their digital output resolution can be no less than 0.025°C. These results were achieved by using dynamic element matching, a combination of correlated double-sampling and system-level chopping for offset cancellation, precision mismatch-elimination layout, and individual trimming at room temperature after packaging. In [3.11], an energy-efficient "zoom-ADC" architecture was presented to maintain the resolution and accuracy of $\Delta\Sigma$-ADCs. An inaccuracy of $0.2^{\circ}$C ($3\sigma$) with a resolution of 15mK at a conversion rate of 10 samples/s was achieved. However, it is hard to operate these analog voltage-domain temperature sensors in the ultra-low voltage regime.

Recently, a time-to-digital-converter-based (TDC-based) CMOS smart temperature sensor [3.12] without a voltage/current ADC or bandgap reference was presented. The time-domain sensor utilized a temperature-dependent delay line to generate a pulse with a width proportional to the measured temperature. Then, a cyclic TDC was implemented to convert the pulse into a corresponding digital code. Later, an improved version [3.13] with a slow conversion rate was presented with curvature compensation to achieve better accuracy than other time-domain sensors. With two-point calibration, it realized a $-0.4^{\circ}C \sim +0.6^{\circ}C$ inaccuracy ($3\sigma$) over the $0^{\circ}C \sim 90^{\circ}C$ range. Furthermore, process variation is a major challenge that needs to be addressed as technology aggressively scales down. To remove the effect of process variation and reduce the high-volume production cost of two-point calibration, a dual-DLL-based time-domain temperature sensor was presented in [3.14]. Initially, one DLL was in a closed loop while the other was in an open loop to perform the calibration mode of the sensor. This provided the process corner data required by the measurement mode to remove the effect of process variation. The use of DLLs yielded a high measurement bandwidth of 5k samples/s at 7b resolution. However, hundreds of inverters were required in these time-domain sensors to obtain enough pulse delay for sufficient temperature resolution.
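The two-point calibration mentioned for such time-domain sensors can be illustrated with a short sketch. It assumes that the digital output code is approximately linear in temperature; the function names and the calibration codes below are hypothetical and do not come from [3.12] or [3.13].

```python
def two_point_calibration(code_lo, temp_lo, code_hi, temp_hi):
    """Return a function mapping a raw digital code to temperature (deg C).

    Assumes a linear code-to-temperature relation fixed by two measured
    calibration points (code_lo at temp_lo, code_hi at temp_hi).
    """
    if code_hi == code_lo:
        raise ValueError("calibration codes must differ")
    slope = (temp_hi - temp_lo) / (code_hi - code_lo)    # deg C per LSB

    def code_to_temp(code):
        return temp_lo + slope * (code - code_lo)

    return code_to_temp

# Hypothetical calibration: code 100 at 0 deg C and code 820 at 90 deg C.
to_temp = two_point_calibration(code_lo=100, temp_lo=0.0, code_hi=820, temp_hi=90.0)
reading = to_temp(460)    # code midway between the two calibration codes
```

Each chip needs two temperature-controlled measurements to fix the offset and slope, which is the production cost that calibration schemes with fewer points aim to reduce.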

In this section, an on-chip 0.4V area-efficient frequency-domain smart temperature sensor with enhanced process variation immunity is developed in TSMC 65nm general purpose CMOS technology. The rest of this section is organized as follows. Two related state-of-the-art temperature sensors are discussed in Sec. 3.1.1. In Sec. 3.1.2, a frequency-domain temperature sensor for ultra-low voltage operation is proposed. The process variation immunity enhancement of the proposed smart temperature sensor is described in Sec. 3.1.3. Sec. 3.1.4 presents the test chips of the proposed 0.4V frequency-domain temperature sensor and the silicon measurement results. Sec. 3.1.5 gives the summary.



Figure 3.1: (a) Temperature-to-propagation-delay-difference generator. (b) Temperature-to-frequency-difference generator.



Figure 3.2: The linearity of temperature sensitive delay line (TSDL) in super-/subthreshold region.

A temperature-to-propagation-delay-difference generator [3.12] was designed to produce an output pulse whose width is linearly proportional to the measured temperature. As shown in Fig. 3.1(a), the START signal went through two different delay lines. One was temperature sensitive, and the other was temperature insensitive. The difference in propagation delay between the two delay lines,  $T_{d1}-T_{d2}$, was extracted by the XOR gate to form the temperature-dependent output pulse width. Note that the second delay line with low thermal sensitivity was inserted to avoid a large DC offset. However, the characteristics of the temperature sensitive delay line (TSDL) become very different as the supply voltage scales down. There are three operation regions of the MOSFETs: the super-, near-, and sub-threshold regions. The corresponding current equations are listed as follows.

Super-threshold region:  $(V_{GS} >> V_{th})$ 

$$I_{D\_sp} = \frac{1}{2} \mu^* C_{OX} \left(\frac{W}{L}\right) \left(V_{GS} - V_{th}\right)^2 \left(1 + \lambda V_{DS}\right).$$
(3.1)

Near-threshold region:  $(V_{GS} \sim V_{th})$ 

$$I_{D\_near} = \mu^* C_{OX} \left(\frac{W}{L}\right) V_{DS} \left(V_{GS} - V_{th} - \frac{1}{2} V_{DS}\right).$$
(3.2)

Sub-threshold region:  $(V_{GS} < V_{th})$ 

$$I_{D\_sb} = \mu^* C_{OX} \left(\frac{W}{L}\right) (m-1) U_T^2 \exp\left(\frac{V_{GS} - V_{th}}{mU_T}\right)$$
(3.3)

where  $V_{th}$  denotes threshold voltage and  $\mu^*$  denotes the effective channel mobility. The thermal voltage is represented by  $U_T$ . These three parameters are temperature related. Considering the transistor figure of merit for temperature sensing, the temperature coefficient of current (TCC) [3.15] was used. For a long channel transistor, the TCC in the super-threshold region of operation based on (3.1) is given by

$$\begin{aligned} TCC_{sp} &= \frac{1}{I_{D\_sp}} \frac{dI_{D\_sp}}{dT} \\ &= \frac{1}{\mu^*} \frac{d\mu^*}{dT} - \frac{2}{V_{GS} - V_{th}} \frac{dV_{th}}{dT}. \end{aligned}$$
(3.4)

The relative change of  $TCC_{sp}$  is a negative few thousandths per degree because the negative mobility sensitivity dominates. In the sub-threshold region, the TCC based on (3.3) (assuming  $V_{DS}$  is much larger than  $U_T$ ) is given by

$$TCC_{sb} = \frac{1}{I_{D\_sb}} \frac{dI_{D\_sb}}{dT} = \frac{1}{\mu^*} \frac{d\mu^*}{dT} + \frac{2}{T} - \frac{1}{mU_T} \left[\frac{dV_{th}}{dT} + \frac{V_{GS} - V_{th}}{T}\right].$$
(3.5)
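
For readers who want the intermediate step, (3.5) follows from taking the logarithm of (3.3) and differentiating with respect to temperature. The sketch below writes the slope factor as $m$, as in (3.3), and assumes that $C_{OX}$, $W/L$, $m$, and $V_{GS}$ do not depend on temperature and that $U_T \propto T$.

$$\begin{aligned}
\ln I_{D\_sb} &= \ln \mu^* + 2\ln U_T + \frac{V_{GS}-V_{th}}{mU_T} + \text{const}, \\
\frac{d}{dT}\left(2\ln U_T\right) &= \frac{2}{T}, \\
\frac{d}{dT}\left(\frac{V_{GS}-V_{th}}{mU_T}\right) &= -\frac{1}{mU_T}\frac{dV_{th}}{dT} - \frac{V_{GS}-V_{th}}{mU_T\,T}.
\end{aligned}$$

Adding the mobility term $\frac{1}{\mu^*}\frac{d\mu^*}{dT}$ to the two derivatives above reproduces the right-hand side of (3.5).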

The relative change of  $TCC_{sb}$  is now positive because the negative threshold voltage sensitivity dominates in the sub-threshold region due to the exponential dependence of the current on it. As the transistor goes deeper into weak inversion,  $TCC_{sb}$  reaches 6% per degree and more. Based on (3.4) and (3.5), the relationship of the TSDL propagation delay versus temperature in the super-/sub-threshold regions is shown in Fig. 3.2. The TSDL propagation delay in the super-threshold region increases with temperature, whereas that in the sub-threshold region decreases with temperature. However, the TSDL propagation delay in the sub-threshold region is no longer linear in temperature, as shown in Fig. 3.2. Therefore, the characteristics of the TSDL in the sub-threshold region are not suitable for ultra-low voltage temperature measurement.

On the other hand, the temperature insensitive delay line (TIDL) in [3.12] is also hard to implement when the supply voltage is lowered to the near-/sub-threshold region. The design principle of the TIDL was to set  $\partial I_D/\partial T=0$  to yield a thermally independent conduction current. The first challenge is that the conduction current equation in the super-threshold region is very different from that in the sub-threshold region, especially in the power of the  $V_{th}$  term. The second is that the relative change of TCC in the sub-threshold region is a positive few hundredths per degree, while that in the super-threshold region is a negative few thousandths per degree. The third is that the sub-threshold conduction current in (3.3) depends on the square of the thermal voltage,  $U_T^2$.

In [3.16], a temperature-to-frequency-difference generator used a temperature sensitive ring oscillator (TSRO) as the clock source for up-counting and a temperature insensitive ring oscillator (TIRO) as the clock source for down-counting. With the same counting period, the output of the up-down counter was equal to the frequency difference of the two oscillators,  $f_{o1} - f_{o2}$, as shown in Fig. 3.1(b). The counter output,  $f_{o1} - f_{o2}$, was designed to be linearly proportional to the measured temperature. Such a frequency-domain temperature sensor can achieve a conversion rate of up to 366k samples/sec with only 400 $\mu$W power consumption. It adopted a modified TIRO to solve the voltage headroom problem. However, the implementation of the TIRO was still based on setting  $\partial I_{D\_sp}/\partial T=0$  to obtain the minimum thermal sensitivity. Adopting the TIRO in the ultra-low voltage region encounters the same difficulty as the TIDL in [3.12].

### 3.1.2 Subthreshold Frequency-Domain Temperature Sensor Design

The previous super-threshold temperature sensors in Sec. 3.1.1, which rely on a propagation-delay or frequency difference proportional to temperature, are no longer suitable for ultra-low voltage temperature measurement. This is because the sub-threshold conduction current now changes exponentially according to (3.3). Also, the relative change of TCC is now a positive few hundredths per degree in the weak inversion region, and it becomes more sensitive as the transistor goes deeper into weak inversion.



Figure 3.3: The proposed ultra-low voltage frequency-domain temperature sensor.



Figure 3.4: Timing diagram of the proposed fixed pulse width generator.

A frequency-domain temperature sensor is proposed in Fig. 3.3 for ultra-low voltage temperature measurement. It is composed of a sub-threshold temperature sensitive ring oscillator (SB-TSRO), a fixed pulse width generator, a 2-input AND gate, and an S-bit counter. The proposed sensor is designed so that the frequency ratio between the SB-TSRO and the clock source, CLK, of the fixed pulse width generator is proportional to the test temperature. Thus, the proposed temperature sensor can be regarded as a temperature-to-frequency-ratio generator. An N-bit counter and a D flip-flop form the fixed pulse width generator. The CLK for the N-bit counter is derived from the divided system clock, and its frequency equals  $f_{o1}$. Using the most significant bit of the N-bit counter,  $C_{msb}$, to reset the D flip-flop produces the desired pulse width without a comparator. Once START is asserted, enabling CLK to trigger the N-bit counter,  $C_{msb}$  becomes 1 after  $2^{N-1}$  positive edges of CLK. It then resets the output of the D flip-flop, Q, and the N-bit counter. The desired pulse is taken from the D flip-flop output, Q, and its width equals  $2^{N-1}/f_{o1}$. The timing diagram of the proposed fixed pulse width generator is shown in Fig. 3.4. Note that the difference between the D flip-flop delay for Q changing from 0 to 1,  $T_{d1}$, and from 1 to 0,  $T_{d2}$, is negligible since the pulse width, W, is long enough. Also, the long pulse averages out some of the fast voltage fluctuations when the period of the voltage variation is much shorter than the fixed pulse width. Moreover, the SB-TSRO is designed to generate a frequency,  $f_{o2}$, linearly proportional to the measured temperature. Using the 2-input AND gate, the clock output of the SB-TSRO can only trigger the S-bit counter within the pulse width, W. Therefore, the digital output of the S-bit counter is equal to  $2^{N-1} f_{o2}/f_{o1}$.
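
The gating relation above can be illustrated with a minimal behavioral sketch that counts how many SB-TSRO rising edges fit inside the fixed pulse. This is not the implemented circuit: the function name and the example frequencies are hypothetical, the oscillator edges are assumed to be ideally aligned with the start of the pulse, and the flip-flop delays $T_{d1}$ and $T_{d2}$ are ignored.

```python
def gated_count(f_o2_hz: int, f_o1_hz: int, n_bits: int) -> int:
    """Number of SB-TSRO rising edges inside a pulse of width 2^(N-1)/f_o1.

    Integer arithmetic keeps the floor exact:
    count = floor(2^(N-1) * f_o2 / f_o1).
    """
    if f_o1_hz <= 0 or f_o2_hz < 0 or n_bits < 1:
        raise ValueError("f_o1 must be positive, f_o2 non-negative, n_bits >= 1")
    clk_edges = 2 ** (n_bits - 1)  # CLK edges counted before C_msb goes high
    return (clk_edges * f_o2_hz) // f_o1_hz


if __name__ == "__main__":
    # Hypothetical numbers: 1 MHz CLK and a 10-bit pulse-width counter.
    for f_o2 in (2_000_000, 2_500_000, 3_000_000):
        print(f_o2, gated_count(f_o2, 1_000_000, 10))
```

In this model the count grows in proportion to $f_{o2}$ for a fixed $f_{o1}$, which is the property the sensor relies on; the floor operation is the source of the one-LSB quantization.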

#### 3.1.2.1 Design Principles

One of the key components of the proposed sensor is the sub-threshold temperature sensitive ring oscillator (SB-TSRO). It should produce an output clock with frequency as linearly proportional to the measured temperature as possible. The frequency of SB-TSRO constructed by the inverters is proportional to the conduction current since  $f = \frac{I_D}{(V_{DD} \times C_{eq})}$ .

$$f_{SB-TSRO} \propto I_{D\_sb}.$$
(3.6)

Note that supply voltage,  $V_{DD}$ , and equivalent capacitor of an inverter,  $C_{eq}$ , are assumed to be temperature independent. The inversion layer effective mobility depends on temperature according to [3.17]

$$\mu^* = \mu_0 \left(\frac{T}{T_0}\right)^a,\tag{3.7}$$

where a is typically between -1 and -2. Also, the thermal voltage,  $U_T$ , is equal to

$$U_T = \frac{k_B T}{q}.\tag{3.8}$$

By substituting (3.7) and (3.8) into (3.3), the equation becomes

$$I_{D\_sb} = \mu_0 C_{OX} \left(\frac{W}{L}\right) (m-1) \left(\frac{T}{T_0}\right)^a \left(\frac{k_B T}{q}\right)^2 \exp\left\{\frac{q \left[V_{GS} - V_{th} \left(T\right)\right]}{m k_B T}\right\}.$$
(3.9)

Using Taylor series expansion for exponential function, the equation becomes

$$I_{D\_sb} \cong \mu_0 C_{OX} \left(\frac{W}{L}\right) (m-1) \left(\frac{T}{T_0}\right)^a \left(\frac{k_B T}{q}\right)^2 \left\{1 + \frac{q \left[V_{GS} - V_{th} \left(T\right)\right]}{m k_B T}\right\}.$$
(3.10)

After simplification,

$$I_{D\_sb} \cong X_A T^{2+a} \left\{ 1 + \frac{q \left[ V_{GS} - V_{th} \left( T \right) \right]}{m k_B T} \right\} \approx X_A T^{2+a} \left\{ \frac{q \left[ V_{GS} - V_{th} \left( T \right) \right]}{m k_B T} \right\}$$
(3.11)

where  $X_A = \mu_0 C_{OX} \left(\frac{W}{L}\right) (m-1) \left(\frac{k_B^2}{q^2 T_0^a}\right)$ . It is not temperature related. Note that the second term within the curly brackets is much larger than 1.

Based on [3.18], the threshold voltage,  $V_{th}$ , can be expressed as

$$V_{th}(T) = V_{th}(T_0) + \alpha (T - T_0), \qquad (3.12)$$

where  $\alpha$  is a negative coefficient. Thus, the term within the curly brackets of (3.11) depends on the threshold voltage,  $V_{th}$, and the thermal voltage,  $U_T$, and it is proportional to temperature. This also means that the frequency of the SB-TSRO is proportional to temperature based on (3.6). Note that (3.11) is proportional to  $T^{1\sim2}$  since the coefficient a is typically between -1 and -2. The accuracy of this temperature sensor is slightly degraded because the SB-TSRO is not strictly linear.

In order to ensure proposed SB-TSRO operates in sub-threshold region, the design principle of the proposed SB-TSRO device threshold voltage is

$$V_{th}(T) = V_{DD}, \quad T > T_{MAX},$$
 (3.13)



Figure 3.5: Inverter used in sub-threshold temperature sensitive ring oscillator.

where the supply voltage,  $V_{DD}$, is equal to  $V_{GS}$, and  $T_{MAX}$  represents the maximum operating temperature of the sensor. The inverter with an enable function used in the proposed SB-TSRO is shown in Fig. 3.5(a). The threshold voltage behavior can be adjusted by using different multi-threshold CMOS (MTCMOS) settings or by increasing the effective channel length. Based on (3.13), the threshold voltage of the MOSFETs within the proposed SB-TSRO is designed to be  $V_{DD}$  at 125°C for design convenience. The relationship between the SB-TSRO output clock frequency and temperature is approximately linear, as shown in Fig. 3.5(b).

On the other hand, the fixed pulse width generator in Fig. 3.3 requires CLK to create a fixed, temperature insensitive pulse width. The CLK can be easily synthesized from the system clock using a simple frequency divider. The frequency of the CLK,  $f_{o1}$, equals the system clock frequency divided by M. The value of M depends on the frequency generated by the SB-TSRO and the required digital output resolution of the proposed sensor. The immunity of the CLK to variations relies on the external system clock generator. Meanwhile, the temperature sensitivity of CLK is not required to be exactly zero. The sensor functions correctly as long as the approximation line of the CLK frequency versus temperature is not parallel to the SB-TSRO approximation line.

#### 3.1.2.2 Simulation Results

An 11-bit frequency-domain temperature sensor is simulated at 0.4V supply voltage using TSMC 65nm CMOS technology. The SB-TSRO uses regular threshold voltage (RVT) CMOS. The device effective length of the RVT CMOS is adjusted so that its threshold voltage satisfies (3.13). The temperature digital output inaccuracy is  $-3.0^{\circ}C \sim +3.0^{\circ}C$  (without process/voltage variations) over the  $0^{\circ}C \sim 100^{\circ}C$  temperature range after one-point calibration. The conversion rate of the proposed temperature sensor can be as fast as 50k samples/sec.

Figure 3.6: The proposed frequency-domain temperature sensor under (a) process variation, and (b) voltage variation.

The effects of process/voltage variations on the proposed ultra-low voltage temperature sensor are shown in Fig. 3.6. The major source of voltage variation is the supply voltage bouncing caused by digital circuit switching. However, the bouncing noise is averaged out since the frequency of the proposed sensor is much lower than the system clock. As a result, the effect of process variation is worse than that of voltage variation. For the frequency-domain temperature sensor, the process-variation-induced inaccuracy is  $\pm 48^{\circ}$C while the voltage-variation-induced inaccuracy is  $-15.5^{\circ}$C $\sim +5.1^{\circ}$C.

### 3.1.3 Ultra-Low Voltage Frequency-Domain Temperature Sensor with Process Variation Immunity Enhancement

In order to remove the effect of process variation, the CLK obtained by dividing the system clock by M is replaced by the output of a near-threshold temperature sensitive ring oscillator (Near-TSRO), as shown in Fig. 3.7. The frequency of the Near-TSRO is  $f_{o3}$. The S-bit counter is still triggered by the SB-TSRO at frequency  $f_{o2}$. Hence, the output pulse width of the fixed pulse width generator becomes  $2^{N-1}/f_{o3}$, and the corresponding digital output of the S-bit counter is  $2^{N-1}f_{o2}/f_{o3}$.



Figure 3.7: Block diagram of the proposed ultra-low voltage frequency-domain temperature sensor with process variation immunity enhancement.

#### 3.1.3.1 Design Principles

There are two temperature sensitive ring oscillators (TSROs) in the modified frequency-domain temperature sensor for process variation immunity enhancement. One operates in the sub-threshold region, named SB-TSRO, and its frequency is proportional to the conduction current,  $I_{D\_sb}$. The other operates in the near-threshold region, named Near-TSRO, and its frequency is proportional to the conduction current,  $I_{D\_near}$, based on  $f = \frac{I_D}{(V_{DD} \times C_{eq})}$.

$$f_{Near-TSRO} \propto I_{D\_near}.$$
 (3.14)

Based on (3.6) and (3.14), the digital output of S-bit counter can be represented by

$$2^{N-1} f_{o2} / f_{o3} \propto 2^{N-1} I_{D\_sb} / I_{D\_near}.$$
(3.15)

Considering (3.2) and (3.3),  $I_{D\_sb}/I_{D\_near}$  becomes

$$\frac{I_{D\_sb}}{I_{D\_near}} = \frac{(m-1) U_T^2 \exp\left(\frac{V_{GS} - V_{th2}}{mU_T}\right)}{V_{DS} \left(V_{GS} - V_{th3} - \frac{1}{2}V_{DS}\right)},\tag{3.16}$$

where  $V_{th2}$  is the device threshold voltage of SB-TSRO, and  $V_{th3}$  is the device threshold voltage of Near-TSRO. Note that the  $\mu^* C_{OX} \left(\frac{W}{L}\right)$  term is cancelled. Given  $V_{GS} = V_{DS} = V_{DD}$ , the above equation can be simplified as

$$\frac{I_{D\_sb}}{I_{D\_near}} = \frac{(m-1)\left(\frac{k_BT}{q}\right)^2 \exp\left\{\frac{q[V_{DD}-V_{th2}(T)]}{mk_BT}\right\}}{V_{DD}\left[\frac{1}{2}V_{DD}-V_{th3}(T)\right]},$$
(3.17)

where  $U_T = \frac{k_B T}{q}$ . Using Taylor series expansion for exponential function, the equation becomes

$$\frac{I_{D\_sb}}{I_{D\_near}} = \frac{\left(m-1\right) \left(\frac{k_BT}{q}\right)^2 \left\{1 + \frac{q[V_{DD} - V_{th2}(T)]}{mk_BT}\right\}}{V_{DD} \left[\frac{1}{2}V_{DD} - V_{th3}(T)\right]} \approx \frac{\left(m-1\right) \left(\frac{k_BT}{q}\right)^2 \left\{\frac{q[V_{DD} - V_{th2}(T)]}{mk_BT}\right\}}{V_{DD} \left[\frac{1}{2}V_{DD} - V_{th3}(T)\right]} \tag{3.18}$$

Note that the second term within the curly brackets of the numerator is much larger than 1.

The numerator in (3.18) is proportional to temperature when the supply voltage equals the SB-TSRO threshold voltage ( $V_{DD}=V_{th2}$ ). Meanwhile, the denominator of (3.18) is approximately proportional to T. Therefore, the output of the proposed temperature sensor with enhanced process variation immunity is approximately proportional to T. However, the device threshold voltage of the SB-TSRO decreases as temperature increases. Equation (3.18) becomes proportional to  $T^{1\sim2}$  when the term within the curly brackets of the numerator is approximately proportional to T. It is important to point out that the SB-TSRO threshold voltage is not required to be exactly equal to the supply voltage. The proposed frequency-domain temperature sensor functions correctly as long as the approximation line of the SB-TSRO frequency versus temperature is not parallel to the Near-TSRO approximation line.

Equation (3.18) is only valid provided that  $f_{o2}$  is generated in sub-threshold region whereas  $f_{o3}$  is generated in near-threshold region. In order to ensure the SB-TSRO ( $f_{o2}$ ) and the Near-TSRO ( $f_{o3}$ ) operate in sub-threshold and near-threshold region, respectively, the design principles of the device threshold voltage within the two TSROs for the proposed temperature sensor with enhanced process variation immunity are

$$V_{th2}(T) = V_{DD}, \quad T > T_{MAX}$$
 (3.19)

$$V_{th3}(T) = \frac{1}{2}V_{DD}, \quad T < T_{MIN} ,$$
 (3.20)

where  $T_{MAX}$  and  $T_{MIN}$  represent the maximum and minimum temperature operation range of the sensor respectively.

Figure 3.8: The effect of process variation on the proposed process invariant temperature sensor.

On the other hand, the enhanced process variation immunity is achieved by the temperature-to-frequency-ratio structure. Some process parameters of  $I_{D\_sb}$  cancel with those of  $I_{D\_near}$, including the inversion layer mobility, gate oxide capacitance, effective channel width, and effective channel length. The simulation results of the proposed temperature sensor under process variation are shown in Fig. 3.8. Compared to Fig. 3.6(a), the effect of process variation is reduced significantly.
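
The cancellation can be checked numerically with a small sketch based on (3.2) and (3.3) with $V_{GS} = V_{DS} = V_{DD}$. All device values below are hypothetical placeholders, and the sketch assumes, as (3.16) does, that both oscillators share the same prefactor $\beta = \mu^* C_{OX} (W/L)$. It also shows that a shift in threshold voltage does not cancel in the ratio.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant (J/K)
Q_E = 1.602176634e-19   # elementary charge (C)


def i_sub(beta, m, v_dd, v_th2, temp_k):
    """Sub-threshold current of (3.3) with V_GS = V_DD."""
    u_t = K_B * temp_k / Q_E  # thermal voltage U_T
    return beta * (m - 1.0) * u_t ** 2 * math.exp((v_dd - v_th2) / (m * u_t))


def i_near(beta, v_dd, v_th3):
    """Near-threshold current of (3.2) with V_GS = V_DS = V_DD."""
    return beta * v_dd * (0.5 * v_dd - v_th3)


def current_ratio(beta, m=1.5, v_dd=0.4, v_th2=0.45, v_th3=0.15, temp_k=300.0):
    """I_D_sb / I_D_near; the default device values are illustrative only."""
    return i_sub(beta, m, v_dd, v_th2, temp_k) / i_near(beta, v_dd, v_th3)


if __name__ == "__main__":
    # Changing beta (mobility, C_OX, W/L) leaves the ratio unchanged.
    print(current_ratio(1e-4), current_ratio(3e-4))
    # Changing the SB-TSRO threshold voltage does change the ratio.
    print(current_ratio(1e-4, v_th2=0.47))
```

The second print is a reminder that only the common multiplicative parameters drop out; threshold-voltage variation still reaches the output through the exponential term.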

#### 3.1.3.2 Implementation



Figure 3.9: The implementation of the proposed process invariant temperature sensor.

An ultra-low voltage process invariant frequency-domain temperature sensor is implemented in TSMC 65nm bulk CMOS technology. The block diagram is shown in Fig. 3.9. In the proposed temperature sensor, the SB-TSRO still uses regular threshold voltage (RVT) CMOS. For design convenience, the device effective length of the RVT CMOS is adjusted so that its threshold voltage equals  $V_{DD}$  at 125°C, satisfying (3.19). The clock of the fixed pulse width generator is provided by the Near-TSRO instead of the system clock. Low threshold voltage (LVT) CMOS is adopted to construct the inverters within the Near-TSRO. The device effective length of the LVT CMOS is adjusted so that its threshold voltage equals one half of  $V_{DD}$  at -25°C based on (3.20). In order to achieve sufficient temperature resolution, the Near-TSRO has 51 stages, while the SB-TSRO has 13 stages. Note that the EN inputs of both the Near-TSRO and the SB-TSRO are controlled by a version of PW delayed by several SYSCLK cycles, where SYSCLK is the system clock.



Figure 3.10: Timing diagram of the proposed process invariant temperature sensor.

With 0.4V supply voltage, the proposed temperature sensor has two input signals, SYSCLK and START. SYSCLK is provided directly from the system clock and is only used by the control unit. The frequency of SYSCLK is very flexible; the only requirement is that it be faster than 500kHz. That is sufficient for the control unit since the simulated maximum conversion rate of the proposed temperature sensor is 50k samples/sec. START triggers the proposed temperature sensor to perform an on-chip temperature measurement. Each positive edge of START enables one measurement and asserts the Q output of the D flip-flop.  $S_{rst}$  and  $N_{rst}$  are then asserted after several SYSCLK cycles to reset the 11-bit and 10-bit counters, and RDY is reset to 0. Meanwhile, PW becomes 1 to enable both the SB-TSRO and the Near-TSRO. The SB-TSRO provides the clock signal of the 11-bit digital output counter, while the Near-TSRO provides the clock signal of the 10-bit counter. The 10-bit counter of the fixed pulse width generator continues counting until its most significant bit, N[9], is asserted. This resets the D flip-flop so that Q becomes 0. The control unit then resets PW to 0. Also, RDY is asserted after several SYSCLK cycles to indicate that the 11-bit digital output, TS, is ready. TS equals  $512 \times f_{o2}/f_{o3}$. The timing diagram of the proposed temperature sensor is shown in Fig. 3.10.



### 3.1.4 Experimental Results in 65nm CMOS

Figure 3.11: Microphotograph of the proposed process invariant temperature sensor.

To verify the effectiveness and capabilities of the proposed temperature sensor with enhanced process variation immunity, it was designed with full-custom EDA tools and fabricated in a TSMC general purpose 65-nm one-poly ten-metal (1P10M) CMOS process. The impact of process/voltage variations on the proposed temperature sensor is also evaluated in this section. The area of the proposed sensor core is only  $55\mu m \times 18\mu m$  without I/O pads, as shown in Fig. 3.11. The proposed process invariant temperature sensor is composed of a near-threshold ring oscillator, a sub-threshold ring oscillator, a fixed pulse width generator, counters, and a control unit. The proposed sensor shares I/O pads with other designs within the 0.94mm  $\times 0.94$mm chip.



Figure 3.12: Measurement environment for the test chips.



Figure 3.13: Bare die of the test chip on PCB board.

The measurement environment was set up as shown in Fig. 3.12. Before measuring each test chip, the programmable temperature and humidity chamber EZ040-72001 was set to 0°C and left for one hour so that the chamber temperature could stabilize. For the 0°C measurement, the SYSCLK signal for the control unit of the test chip was generated by the pulse/function generator 8116A. Meanwhile, the START signal was issued to reset the test chip and activate the sensor conversion. After the counters of the test chip completed one operation, the RDY signal was asserted by the control unit of the test chip. The 11-bit digital output TS was then recorded by the logic analyzer 16900A. It is worth noting that the test chips were not fully packaged and the bare die was left exposed, as shown in Fig. 3.13. Such a setting improves the fidelity of the on-chip temperature detection in the chamber during measurement. The measurement of the proposed sensor was done in 5°C steps over the 0°C $\sim$ 100°C temperature range. A 0.5°C/min heating slope was set to increase the chamber temperature smoothly. Each temperature measurement was recorded after holding the desired temperature point for 10 minutes.
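
The sweep described above can be turned into a rough time budget per chip. The sketch below is only an estimate under stated assumptions: the function name is hypothetical, the chamber is assumed to follow the 0.5°C/min ramp exactly, and the 10-minute hold is assumed to apply at every setpoint, including 0°C after the initial one-hour soak.

```python
def chamber_schedule(start_c=0, stop_c=100, step_c=5,
                     ramp_c_per_min=0.5, hold_min=10.0, soak_min=60.0):
    """Return (setpoints, total_minutes) for a stepped temperature sweep."""
    setpoints = list(range(start_c, stop_c + 1, step_c))
    ramp_min = step_c / ramp_c_per_min  # minutes to move between two setpoints
    total = (soak_min
             + hold_min * len(setpoints)          # one hold per setpoint
             + ramp_min * (len(setpoints) - 1))   # one ramp between neighbors
    return setpoints, total


if __name__ == "__main__":
    points, minutes = chamber_schedule()
    print(len(points), "setpoints,", minutes, "minutes")
```

Under these assumptions the sweep has 21 setpoints and takes about 470 minutes per chip, which is why the measurement of 12 chips is a multi-day effort.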



Figure 3.14: Measured error curves for 12 test chips.



Figure 3.15: Measurement results for 12 test chips.

Figure 3.16: Measurement error curves for supply voltage variations.

The supply voltage for the test chips is 0.4V. The measurement errors are  $-1.81^{\circ}C \sim +1.52^{\circ}C$  for a total of 12 test chips after one-point calibration, as shown in Fig. 3.14. To ease chip realization, the one-point calibration was performed off-line by linear curve fitting with the digital outputs at 80°C. The corresponding  $3\sigma$  inaccuracy is  $-2.79^{\circ}C \sim +2.78^{\circ}C$. The average effective resolution of the test chips is measured to be  $0.49^{\circ}C/LSB$. The average power consumption is 520nW at 0.4V supply voltage and 45k samples/sec conversion rate. The measurement results of the 12 test chips, shown in Fig. 3.15, exhibit excellent linearity and demonstrate the ability of the proposed temperature sensor to suppress the effect of process variation. As shown in Fig. 3.16, the inaccuracy of the temperature measurement under a supply voltage variation of  $0.36V\sim0.44V$  (10% supply voltage variation) is  $-6^{\circ}C\sim+8^{\circ}C$. As in other frequency-domain sensors, the frequencies of the TSROs depend strongly on the supply voltage. The immunity of the sensor against supply voltage variation is rather poor, and an additional voltage regulator or switched-capacitor DC-DC converter is required to reduce the effect of voltage variation. In Table 3.1, the achieved performance of the proposed ultra-low voltage process invariant frequency-domain temperature sensor is compared with recent temperature sensors [3.9–3.14, 3.16, 3.19–3.22]. The ultra-low voltage operation of the proposed sensor achieves an extremely low power consumption per conversion rate of only 11.6pW/samples/sec.
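
The text does not spell out the fitting procedure, so the following is only one plausible reading of a one-point calibration: a slope shared by all chips (here the reported average of 0.49°C/LSB) and a per-chip offset fixed by the code recorded at 80°C. The function name and the example code value are hypothetical.

```python
def make_one_point_calibration(code_at_ref, ref_temp_c=80.0, slope_c_per_lsb=0.49):
    """Return a function that maps a raw TS code to a temperature in deg C.

    code_at_ref is the digital output recorded for this chip at ref_temp_c;
    the slope is assumed to be common to all chips.
    """
    def to_celsius(code):
        return ref_temp_c + slope_c_per_lsb * (code - code_at_ref)
    return to_celsius


if __name__ == "__main__":
    chip = make_one_point_calibration(code_at_ref=300)  # hypothetical 80 deg C code
    for code in (260, 300, 340):
        print(code, chip(code))
```

Only the offset is chip-specific in this scheme, so a single chamber measurement per chip is enough; any chip-to-chip slope mismatch shows up as the residual error reported in Fig. 3.14.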

### 3.1.5 Summary

A process invariant frequency-domain temperature sensor has been presented to enable on-chip temperature measurement. The sensor was designed to achieve ultra-low voltage operation. It is composed of two temperature sensitive ring oscillators (TSROs). One operates in the near-threshold region (Near-TSRO) as the clock source of the proposed fixed pulse width generator. The other operates in the sub-threshold region (SB-TSRO) as the clock source of the digital output counter. With a 2-input AND circuit, the digital output of the proposed temperature sensor is proportional to the ratio of the SB-TSRO frequency to the Near-TSRO frequency,  $f_{o2}/f_{o3}$. Owing to the different conduction currents in the near-/sub-threshold regions, the effect of process variation on the proposed sensor is greatly suppressed. Meanwhile,  $f_{o2}/f_{o3}$  increases linearly with temperature.

| Sensor    | CMOS Technology | Area ($\mu m^2$)   | Power Consumption (1)   | Conv. Rate (samples/s) (2) | (1)/(2) ($\mu$W/(samples/s)) | Resolution (°C) | Inaccuracy (°C)        | Temp. Range (°C) |
|-----------|-----------------|--------------------|-------------------------|----------------------------|------------------------------|-----------------|------------------------|------------------|
| [3.9]     | $0.7 \mu m$     | 4500000 (w/ pads)  | 25$\mu$A @ 2.5V-5.5V    | 10                         | 8.25                         | 0.025           | $\pm 0.1 \; (3\sigma)$ | $-70 \sim 130$   |
| [3.10]    | $0.7 \mu m$     | 4500000 (w/ pads)  | 75$\mu$A @ 2.5V-5.5V    | 10                         | 24.75                        | 0.010           | $\pm 0.1 \; (3\sigma)$ | $-55 \sim 125$   |
| [3.11]    | $0.16 \mu m$    | 120000 (w/o pads)  | 7.4$\mu$W @ 1.6V        | 10                         | 0.74                         | 0.015           | $\pm 0.2 \; (3\sigma)$ | $-30 \sim 125$   |
| [3.12]    | $0.35 \mu m$    | 175000 (w/o pads)  | 10$\mu$W @ 3.3V         | 10k                        | 0.001                        | 0.160           | $-0.7 \sim 0.9$        | $0 \sim 100$     |
| [3.13]    | $0.35 \mu m$    | 60000 (w/o pads)   | 36.7$\mu$W @ 3.3V       | 2                          | 18.35                        | 0.092           | $-0.25 \sim 0.35$      | $0 \sim 90$      |
| [3.14]    | $0.13 \mu m$    | 120000 (w/o pads)  | 1.2mW @ 1.2V            | 5k                         | 0.24                         | 0.660           | $-1.8 \sim 2.3$        | $0 \sim 100$     |
| [3.16]    | 65nm            | 6600 (w/o pads)    | 400$\mu$W @ 1.2V        | 366k                       | 0.0013                       | 0.043           | $-2.90 \sim 2.75$      | $-40 \sim 110$   |
| [3.19]    | 32nm            | 20000 (w/o pads)   | 1.6mW @ 1.05V           | 1k                         | 1.6                          | 0.450           | S<br>℃                 | $-10 \sim 110$   |
| [3.20]    | $0.18 \mu m$    | 50000 (w/o pads)   | 220nW @ 1.0V            | 100                        | 0.0022                       | 0.300           | $-1.6 \sim 3.0$        | $0 \sim 100$     |
| [3.21]    | 65nm            | 90000 (w/o pads)   | 8.3$\mu$A @ 1.2V        | 2.2                        | 4.53                         | 0.030           | $\pm 0.2 \; (3\sigma)$ | $-70 \sim 125$   |
| [3.22]    | 65nm            | 10000 (w/o pads)   | 150$\mu$W @ 1.0V        | 10k                        | 0.015                        | 0.139           | $-5.1 \sim 3.4$        | $0{\sim}0$       |
| This Work | 65nm            | 990 (w/o pads)     | 520nW @ 0.4V            | 45k                        | 0.000012                     | 0.490           | $-1.81 \sim 1.52$      | $0 \sim 100$     |

Table 3.1: The Performance Comparison of Recent Temperature Sensors

The realization in TSMC general purpose 65nm CMOS technology meets the target of 0.4V supply voltage operation over the temperature range of 0°C to 100°C. The area of the sensor core (without I/O pads) is only  $990\mu m^2$. The power consumption per conversion rate is 11.6pW/samples/sec, which is a hundredfold improvement over previous work [3.13, 3.16]. All these characteristics make the proposed sensor especially suitable for energy-limited miniature portable platforms.
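
The two headline figures can be reproduced from the reported measurements with a short calculation. The helper name below is hypothetical; the inputs are the 520nW power, the 45k samples/sec conversion rate, and the 55µm × 18µm core dimensions quoted in Sec. 3.1.4.

```python
def energy_per_sample_j(power_w: float, samples_per_s: float) -> float:
    """Power divided by conversion rate; W/(samples/s) is the same as J/sample."""
    return power_w / samples_per_s


if __name__ == "__main__":
    this_work_j = energy_per_sample_j(520e-9, 45e3)
    core_area_um2 = 55 * 18
    print(f"{this_work_j * 1e12:.1f} pJ per conversion")
    print(f"{core_area_um2} um^2 core area")
```

Because a watt per (sample per second) is a joule per sample, the 11.6pW/samples/sec figure can equally be read as roughly 11.6pJ of energy per temperature conversion.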

# 3.2 Near-/Sub-threshold DLL-based Clock Generator with PVT-aware Locking Range Compensation

Figure 3.17: Concept diagram of PVT compensation.

In near-/sub-threshold operation, device behavior is affected more seriously by PVT variations than in the super-threshold region. For the clock generator, the affected devices cause the lock-in delay line to have a different delay range. Therefore, the clock generator may fail to lock to the reference clock. Fig. 3.17 shows the concept diagram of PVT compensation. In the typical condition, the reference clock is within the locking range of the lock-in delay line. When there are PVT variations, the locking range is shifted, and the clock generator cannot lock to the reference clock. After adding the PVT compensation, the locking range can be adjusted. Additionally, variation-aware logic design is performed for near-/sub-threshold operation.

Many clock multiplication schemes have been proposed for DVFS systems in the super-threshold region. Phase-locked loops (PLLs) are usually used as clock generators, but their locking period takes hundreds of reference clock cycles. To enhance the flexibility of the clock generator for DVFS systems, an all-digital clock generator was presented [3.23] to produce the output clock by delaying the reference clock dynamically based on the frequency control code. A delay-locked loop (DLL) was also presented for DVFS systems, but it could not generate fractional clocks. The cyclic clock multiplier (CCM) has been presented for DVFS applications, and it has the advantage of creating fractional or multiplied clocks. However, the cyclic clock multiplier with time-to-digital converters (TDCs) for phase error detection occupied a large area and consumed more power. A programmable clock generator is proposed in this work to achieve reliable operation in the near-/sub-threshold region. It adopts the pulse-circulating scheme in [3.24]. The proposed clock generator can produce multiplied and fractional clocks without the area overhead. Compared with DLL-based clock multipliers, the process-induced phase error can be reduced since the pulse always circulates through the same delay line.

This section is organized as follows. Unified logical effort models for the near-/sub-threshold regions are proposed in Sec. 3.2.1. Sec. 3.2.2 describes the system architecture of the proposed clock generator. The PVT compensation technique and the circuit implementation are discussed in Sec. 3.2.3 and Sec. 3.2.4, respectively. Finally, Sec. 3.2.5 concludes this section.

#### 3.2.1 Unified Logical Effort Models

The logical effort model is a method for estimating the path delay of super-threshold circuits by simple calculation [3.25]. Based on it, unified logical effort models that consider supply voltage and temperature are proposed in this work. The proposed models are derived using bulk CMOS 65-/45-/32-nm predictive technology models (PTMs) [3.26] and 90-/65-nm UMC technology models.

The delay of a logic gate in [3.25] is defined as

$$d_{abs} = \tau \left( f + p \right) = \tau \left( gh + p \right) \tag{3.21}$$

where $d_{abs}$ is the absolute delay and $\tau$ is the basic delay unit. $f$, $g$, $h$, and $p$ are the stage effort, logical effort, electrical effort (fanout), and parasitic delay, respectively. These parameters are defined as follows:

$$\tau = v R_{inv} C_{inv}; \quad g = \frac{R_t C_{int}}{R_{inv} C_{inv}}$$
$$h = \frac{C_{out}}{C_{int}}; \quad p = \frac{R_t C_{pt}}{R_{inv} C_{inv}}$$

where $v$ is a constant, $C_{out}$ and $C_{pt}$ are the load and parasitic capacitances, $R_{inv}$ and $C_{inv}$ are the input resistance and capacitance of an inverter template, and $R_t$ and $C_{int}$ are the input resistance and capacitance of the logic gate template under consideration. The conventional model does not take supply voltage and temperature variations into account, which may introduce serious inaccuracy in the delay estimate. To extend the model so that it compensates for environmental variation, the logical effort $g$ can be rewritten as

$$g = \frac{R_t C_{int}}{R_{inv} C_{inv}} = \frac{1}{R_{inv} C_{inv}} \cdot \frac{V_{DD} C_{int}}{I_D} = \frac{k V_{DD} C_{int}}{I_D} \tag{3.22}$$
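As a rough illustration of how (3.21) and (3.22) work together, the following Python sketch evaluates the stage delay and rescales the logical effort between two operating points. The function names and all numeric values (a 5 ps delay unit, a fanout of 4, and the drive currents) are hypothetical and chosen only for illustration; $g = 1$ and $p = 1$ are the conventional normalized inverter-template values, and $k$ and $C_{int}$ are assumed to stay constant between the two operating points.

```python
def stage_delay(tau, g, h, p):
    """Absolute stage delay from (3.21): d_abs = tau * (g * h + p)."""
    return tau * (g * h + p)


def rescale_logical_effort(g_ref, vdd_ref, id_ref, vdd, i_d):
    """Rescale g with the proportionality g ~ V_DD / I_D from (3.22).

    Assumes k and C_int are the same at both operating points.
    """
    return g_ref * (vdd / i_d) / (vdd_ref / id_ref)


# Hypothetical values for illustration only (not taken from this work).
TAU_PS = 5.0              # basic delay unit, in ps
G_INV, P_INV = 1.0, 1.0   # conventional normalized inverter template
FANOUT = 4.0              # electrical effort h

d_nominal = stage_delay(TAU_PS, G_INV, FANOUT, P_INV)

# Halve V_DD while the drive current drops by a factor of 10:
# V_DD / I_D grows by a factor of 5, and so does g.
g_scaled = rescale_logical_effort(G_INV, vdd_ref=1.0, id_ref=100e-6,
                                  vdd=0.5, i_d=10e-6)

# p is kept fixed here purely for simplicity.
d_scaled = stage_delay(TAU_PS, g_scaled, FANOUT, P_INV)

print(d_nominal, g_scaled, d_scaled)
```

The sketch only restates the proportionality in (3.22); the region-specific current equations that follow determine how $I_D$ actually changes with supply voltage.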

From (3.22), the logical effort is inversely proportional to $I_D$. To cover the super-, near-, and sub-threshold regions of the MOSFET in the proposed unified logical effort models, the current equations given by the physical alpha-power law [3.27] are simplified as follows:

$$I_{D\_super} = 2\left(\frac{W}{L}\right)\mu_{eff}C_{ox}\left(\frac{E_cL}{\eta}\right)^{\frac{1}{2}}\left(V_{GS} - V_t\right)^{\frac{3}{2}} \tag{3.23}$$

$$I_{D\_near} = \left(\frac{W}{L}\right) \mu_{eff} C_{ox} \frac{1}{\eta} \left(V_{GS} - V_t\right)^2 \tag{3.24}$$

$$I_{D\_sub} = \left(\frac{W}{L}\right) \mu_0 C_{ox} \frac{\eta}{\beta^2} \exp\left[\left(\frac{\beta}{\eta}\right) \left(V_{GS} - V_t - \frac{\eta}{\beta}\right)\right] \tag{3.25}$$

The driving strengths of the PMOS and NMOS transistors also vary differently across these three regions. The PMOS-to-NMOS width ratio (also called the $\beta$ ratio) is set to 1.5, 2.0, and 2.5 in the sub-, near-, and super-threshold regions, respectively. Two cascaded FO1 inverters, shown in Fig. 3.18, serve as the baseline clock buffer. The total width of the buffer is 128 times the minimum total width. The unified logical effort models considering supply voltage and temperature are presented below.



Figure 3.18: Two cascaded FO1 inverters.

#### 3.2.1.1 Super-threshold Region

In the strong-inversion region, the MOSFET operates with strong carrier velocity saturation. Substituting (3.23) into (3.22) gives

$$g = \frac{kV_{DD}C_{int}}{\left(\frac{W}{L}\right)\mu_{eff}C_{ox}\left(\frac{2E_{c}L}{\eta}\right)^{\frac{1}{2}}(V_{DD} - V_{t})^{\frac{3}{2}}} = \frac{V_{DD}}{C_{super} \cdot \mu_{eff} \cdot (V_{DD} - V_{t})^{\frac{3}{2}}} \tag{3.26}$$

where $C_{super}$ is a constant. Taking $V_{t0}$ as the threshold voltage at 0°C and applying curve fitting, the super-threshold unified logical effort model considering supply voltage and temperature for the baseline clock buffer is defined as

$$g_{u\_super} = \frac{18 V_{DD}}{A (T) \cdot (V_{DD} - V_{t0} + m \cdot T)^{\frac{3}{2}}} \tag{3.27}$$

where $A(T)$ is a second-degree polynomial in temperature that depends on the technology node, as listed in Table 3.2, and $m$ is the slope of the threshold-voltage-versus-temperature line. Note that the super-threshold logical effort $g$ is normalized to 1 at $V_{DD}$ = 1.0 V and T = 25°C. The average absolute model errors for $V_{DD}$ from 1.0 V down to 0.5 V and temperatures from -50°C to 125°C are 3.89%, 3.05%, 4.12%, 8.01%, and 6.55% for the UMC 90-nm, UMC 65-nm, PTM 65-nm, PTM 45-nm, and PTM 32-nm technologies, respectively.
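One way to read the step from (3.26) to (3.27) is sketched below. It assumes a linear temperature dependence of the threshold voltage, with $m$ taken as the magnitude of the slope and $T$ measured in °C so that $V_t = V_{t0}$ at 0°C. It also assumes that the fitted polynomial $A(T)$ captures the temperature dependence of $C_{super} \cdot \mu_{eff}$. Neither assumption is stated explicitly in the text.

$$V_t(T) \approx V_{t0} - m \cdot T \quad\Rightarrow\quad V_{DD} - V_t(T) \approx V_{DD} - V_{t0} + m \cdot T$$

$$\frac{V_{DD}}{C_{super} \cdot \mu_{eff}(T) \cdot (V_{DD} - V_t)^{\frac{3}{2}}} \;\longrightarrow\; \frac{18 V_{DD}}{A(T) \cdot (V_{DD} - V_{t0} + m \cdot T)^{\frac{3}{2}}}$$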

#### 3.2.1.2 Near-threshold Region

In the moderate-inversion region, the MOSFET operates with negligible carrier velocity saturation. Substituting (3.24) into (3.22) gives

$$g = \frac{kV_{DD}C_{int}}{\left(\frac{W}{L}\right)\mu_{eff}C_{ox}\left(\frac{1}{\eta}\right)\left(V_{DD} - V_t\right)^2} = \frac{V_{DD}}{C_{near}\cdot\mu_{eff}\cdot\left(V_{DD} - V_t\right)^2} \tag{3.28}$$

| Technology | A(T)                                                                 |
|------------|----------------------------------------------------------------------|
| UMC90nm    | $1.77 \times 10^{-5} \cdot T^2 - 6.75 \times 10^{-3} \cdot T + 1.67$ |
| UMC65nm    | $3.02 \times 10^{-6} \cdot T^2 - 4.79 \times 10^{-3} \cdot T + 1.93$ |
| PTM65nm    | $4.83 \times 10^{-5} \cdot T^2 - 1.63 \times 10^{-2} \cdot T + 2.30$ |
| PTM45nm    | $7.21 \times 10^{-5} \cdot T^2 - 2.25 \times 10^{-2} \cdot T + 2.93$ |
| PTM32nm    | $5.99 \times 10^{-5} \cdot T^2 - 1.81 \times 10^{-2} \cdot T + 2.30$ |

Table 3.2: Functions of A(T) for the super-threshold unified logical effort model considering supply voltage and temperature

where $C_{near}$ is a constant. Similarly, curve fitting is used to define the near-threshold unified logical effort model considering supply voltage and temperature for the baseline clock buffer:

$$g_{u\_near} = \frac{1}{B(T)V_{DD}^2 + C(T)V_{DD} + D(T)} \tag{3.29}$$

where $B(T)$, $C(T)$, and $D(T)$ are second-degree polynomials in temperature that depend on the technology node, as listed in Table 3.3. Note that the near-threshold logical effort $g$ is normalized to 1 at $V_{DD}$ = 0.5 V and T = 25°C. The average absolute model errors for $V_{DD}$ from 0.5 V down to 0.33 V and temperatures from -50°C to 125°C are 1.57%, 2.57%, 1.20%, 1.44%, and 5.04% for the UMC 90-nm, UMC 65-nm, PTM 65-nm, PTM 45-nm, and PTM 32-nm technologies, respectively.
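To make (3.29) concrete, the following Python sketch evaluates the near-threshold model for two technology nodes, using the corresponding $B(T)$, $C(T)$, and $D(T)$ coefficients from Table 3.3. The names `NEAR_COEFFS`, `poly2`, and `g_u_near` are illustrative and not part of this work, and $T$ is assumed to be expressed in °C, consistent with the normalization point at 25°C.

```python
# Coefficients (a2, a1, a0) of a2*T^2 + a1*T + a0, copied from Table 3.3
# for two technology nodes. T is assumed to be in degrees Celsius.
NEAR_COEFFS = {
    "UMC65nm": {
        "B": (-2.05e-4, -4.81e-2, 15.9),
        "C": (6.54e-5, 5.87e-2, -8.75),
        "D": (3.21e-6, -1.22e-2, 1.30),
    },
    "PTM65nm": {
        "B": (5.09e-4, -1.96e-1, 26.0),
        "C": (-3.36e-4, 1.29e-1, -15.5),
        "D": (5.49e-5, -2.10e-2, 2.39),
    },
}


def poly2(coeffs, temp_c):
    """Evaluate a second-degree polynomial in temperature."""
    a2, a1, a0 = coeffs
    return a2 * temp_c**2 + a1 * temp_c + a0


def g_u_near(vdd, temp_c, tech):
    """Near-threshold unified logical effort from (3.29)."""
    table = NEAR_COEFFS[tech]
    b = poly2(table["B"], temp_c)
    c = poly2(table["C"], temp_c)
    d = poly2(table["D"], temp_c)
    return 1.0 / (b * vdd**2 + c * vdd + d)


# Close to 1 at the normalization point (0.5 V, 25 degrees C).
g_norm = g_u_near(0.5, 25.0, "UMC65nm")
# Larger than g_norm: the baseline buffer is slower at a lower supply voltage.
g_at_0v4 = g_u_near(0.4, 25.0, "UMC65nm")

print(g_norm, g_at_0v4)
```

The sketch is only meaningful inside the fitted range of 0.33 V to 0.5 V and -50°C to 125°C; outside that range the quadratic denominator is not guaranteed to stay positive.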

#### 3.2.1.3 Sub-threshold Region

In the weak-inversion region, diffusion current dominates the MOSFET drain current. Substituting (3.25) into (3.22) gives

$$g = \frac{kV_{DD}C_{int}}{\left(\frac{W}{L}\right)\mu_0 C_{ox}\frac{\eta}{\beta^2}\exp\left[\left(\frac{\beta}{\eta}\right)\left(V_{GS} - V_t - \frac{\eta}{\beta}\right)\right]} = \frac{V_{DD}}{C_{sub}\cdot\mu_0\exp\left[\left(\frac{\beta}{\eta}\right)\left(V_{GS} - V_t - \frac{\eta}{\beta}\right)\right]} \tag{3.30}$$

where $C_{sub}$ is a constant. Similarly, curve fitting is used to define the sub-threshold unified logical effort model considering supply voltage and temperature for the baseline clock buffer:

$$g_{u\_sub} = \frac{1}{E(T) \exp\left[F(T) \cdot (V_{DD} - V_{t0})\right]} \tag{3.31}$$
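Taking the natural logarithm of (3.31) gives a form that may be easier to interpret. This is a direct algebraic consequence of (3.31) under the assumption $E(T) > 0$, not an additional model:

$$\ln g_{u\_sub} = -\ln E(T) - F(T) \cdot (V_{DD} - V_{t0})$$

At a fixed temperature, $\ln g_{u\_sub}$ is therefore linear in $V_{DD}$ with slope $-F(T)$, so $F(T)$ plays the role of the factor $\beta/\eta$ in the exponent of (3.30), while $E(T)$ presumably absorbs the remaining prefactor.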

Table 3.3: Functions of B(T), C(T), and D(T) for the near-threshold unified logical effort model, and functions of E(T) and F(T) for the sub-threshold unified logical effort model, considering supply voltage and temperature

| Technology | B(T) | C(T) | D(T) |
|------------|------|------|------|
| UMC90nm | $4.76 \times 10^{-4} \cdot T^2 - 9.20 \times 10^{-2} \cdot T + 84.7$ | $-3.94 \times 10^{-4} \cdot T^2 - 6.91 \times 10^{-2} \cdot T - 2.35$ | $7.39 \times 10^{-4} \cdot T^2 - 1.11 \times 10^{-2} \cdot T + 0.07$ |
| UMC65nm | $-2.05 \times 10^{-4} \cdot T^2 - 4.81 \times 10^{-2} \cdot T + 15.9$ | $6.54 \times 10^{-5} \cdot T^2 + 5.87 \times 10^{-2} \cdot T - 8.75$ | $3.21 \times 10^{-6} \cdot T^2 - 1.22 \times 10^{-2} \cdot T + 1.30$ |
| PTM65nm | $5.09 \times 10^{-4} \cdot T^2 - 1.96 \times 10^{-1} \cdot T + 26.0$ | $-3.36 \times 10^{-4} \cdot T^2 + 1.29 \times 10^{-1} \cdot T - 15.5$ | $5.49 \times 10^{-5} \cdot T^2 - 2.10 \times 10^{-2} \cdot T + 2.39$ |
| PTM45nm | $1.16 \times 10^{-3} \cdot T^2 - 3.20 \times 10^{-1} \cdot T + 36.0$ | $-8.37 \times 10^{-4} \cdot T^2 + 2.27 \times 10^{-1} \cdot T - 23.7$ | $1.51 \times 10^{-4} \cdot T^2 - 4.01 \times 10^{-2} \cdot T + 4.00$ |
| PTM32nm | $1.25 \times 10^{-3} \cdot T^2 - 3.75 \times 10^{-1} \cdot T + 42.8$ | $-8.93 \times 10^{-4} \cdot T^2 + 2.70 \times 10^{-1} \cdot T - 29.3$ | $1.59 \times 10^{-4} \cdot T^2 - 4.87 \times 10^{-2} \cdot T + 5.11$ |

| Technology | E(T) | F(T) |
|---|---|---|
| UMC90nm | $1.16 \times 10^{-9}T^4 - 2.35 \times 10^{-7}T^3 + 5.64 \times 10^{-6}T^2 + 6.35 \times 10^{-3}T + 0.467$ | $2.36 \times 10^{-4} T^2 - 1.02 \times 10^{-1} T + 21.8$ |
| UMC65nm | $6.88 \times 10^{-10} T^4 - 2.37 \times 10^{-7} T^3 + 2.86 \times 10^{-5} T^2 + 1.20 \times 10^{-2} T + 0.855$ | $2.90 \times 10^{-4} T^2 - 1.06 \times 10^{-1} T + 21.1$ |
| PTM65nm | $7.51 \times 10^{-10} T^4 - 1.46 \times 10^{-7} T^3 - 1.06 \times 10^{-6} T^2 + 1.20 \times 10^{-3} T + 1.020$ | $2.11 \times 10^{-4} T^2 - 9.13 \times 10^{-2} T + 22.2$ |
| PTM45nm | $6.47 \times 10^{-10}T^4 - 1.44 \times 10^{-7}T^3 + 3.09 \times 10^{-6}T^2 + 1.15 \times 10^{-3}T + 0.989$ | $2.08 \times 10^{-4} T^2 - 9.39 \times 10^{-2} T + 22.0$ |
| PTM32nm | $3.29 \times 10^{-10} T^4 - 1.17 \times 10^{-7} T^3 + 1.08 \times 10^{-5} T^2 + 7.29 \times 10^{-4} T + 0.959$ | $1.80 \times 10^{-4} T^2 - 8.95 \times 10^{-2} T + 21.2$ |

where $E(T)$ and $F(T)$ are fourth-degree and second-degree polynomials of temperature, respectively, whose coefficients depend on the technology node as listed in Table 3.3. Note that the logical effort $g$ is normalized to 1 at supply voltage $V_{DD}=0.33V$ and temperature $T=25^{\circ}C$. Over supply voltages from 0.1V to 0.33V and temperatures from -50°C to 125°C, the average absolute model errors are 6.01%, 8.40%, 3.03%, 2.97%, and 5.14% for the UMC 90-nm, UMC 65-nm, PTM 65-nm, PTM 45-nm, and PTM 32-nm technologies, respectively.
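As a numerical illustration, the temperature polynomials of Table 3.3 can be evaluated with Horner's rule. The following is only a sketch: the names `E_COEFFS`, `F_COEFFS`, `horner`, `eval_E`, and `eval_F` are hypothetical, $T$ is assumed to be in degrees Celsius (as suggested by the 25°C normalization point), and the code evaluates only $E(T)$ and $F(T)$, not the complete logical effort expression.

```python
# Coefficients transcribed from Table 3.3, highest power of T first.
# Assumption: T is expressed in degrees Celsius.
E_COEFFS = {
    "UMC90nm": [1.16e-9, -2.35e-7, 5.64e-6, 6.35e-3, 0.467],
    "UMC65nm": [6.88e-10, -2.37e-7, 2.86e-5, 1.20e-2, 0.855],
    "PTM65nm": [7.51e-10, -1.46e-7, -1.06e-6, 1.20e-3, 1.020],
    "PTM45nm": [6.47e-10, -1.44e-7, 3.09e-6, 1.15e-3, 0.989],
    "PTM32nm": [3.29e-10, -1.17e-7, 1.08e-5, 7.29e-4, 0.959],
}
F_COEFFS = {
    "UMC90nm": [2.36e-4, -1.02e-1, 21.8],
    "UMC65nm": [2.90e-4, -1.06e-1, 21.1],
    "PTM65nm": [2.11e-4, -9.13e-2, 22.2],
    "PTM45nm": [2.08e-4, -9.39e-2, 22.0],
    "PTM32nm": [1.80e-4, -8.95e-2, 21.2],
}


def horner(coeffs, t):
    """Evaluate a polynomial whose coefficients are given highest power first."""
    result = 0.0
    for c in coeffs:
        result = result * t + c
    return result


def eval_E(tech, t_celsius):
    """Fourth-degree temperature polynomial E(T) for one technology node."""
    return horner(E_COEFFS[tech], t_celsius)


def eval_F(tech, t_celsius):
    """Second-degree temperature polynomial F(T) for one technology node."""
    return horner(F_COEFFS[tech], t_celsius)


if __name__ == "__main__":
    for tech in E_COEFFS:
        print(tech, eval_E(tech, 25.0), eval_F(tech, 25.0))
```

Horner's rule is used here only because it needs a single multiplication and addition per coefficient; any polynomial evaluation routine would serve equally well.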



#### 3.2.2 Clock Generator Architecture

Figure 3.19: Proposed clock generator for near-/sub-threshold DVFS system.

The architecture of the proposed clock generator is shown in Fig. 3.19. Its main blocks are the pulse generators (PG), phase detector, counter, lock-in delay line, PVT compensation (PVT-comp.) delay line, PVT detector, control unit, and frequency divider. The CLKREF signal enters a PG, which produces a pulse train ($P_{REF}$) with the same frequency as CLKREF. The pulse multiplier generates pulses ($P_{OUT}$) at eight times the frequency of the reference pulse ($P_{REF}$). The divider can divide its input frequency by 2, 4, 6, or 8. The proposed clock generator can therefore output a clock at M/N times the reference clock frequency, where M = 1 or 8 and N = 2, 4, 6, or 8 are selected by the frequency-selection input FS[2:0]. This gives eight different output frequencies, ranging from 0.125X to 4X the reference frequency.
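The eight selectable ratios follow directly from the two multiplier settings and four divider settings. The short sketch below enumerates them; the function names are hypothetical, and the mapping from FS[2:0] codes to (M, N) pairs is not specified in the text, so it is not modelled.

```python
from fractions import Fraction

MULTIPLIERS = (1, 8)      # M: reference pulse passed through or multiplied by 8
DIVIDERS = (2, 4, 6, 8)   # N: divider settings


def output_ratios():
    """Return the distinct M/N ratios in ascending order."""
    return sorted({Fraction(m, n) for m in MULTIPLIERS for n in DIVIDERS})


def output_frequencies(f_ref_hz):
    """Output frequencies (Hz) available for a given reference frequency."""
    return [float(ratio) * f_ref_hz for ratio in output_ratios()]


if __name__ == "__main__":
    print([str(r) for r in output_ratios()])
    print(output_frequencies(5e6))
```

The enumeration yields the ratios 1/8, 1/6, 1/4, 1/2, 1, 4/3, 2, and 4, which matches the stated range of 0.125X to 4X with eight distinct settings.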

To produce $P_{OUT}$ at eight times the frequency of $P_{REF}$, a pulse-circulating scheme is adopted. Each pulse of $P_{REF}$ enters the circulating path and circulates eight times. The path is determined by the path selection signal SEL. When SEL=1, the pulse from $P_{REF}$ can enter the delay line; otherwise, the circulating path is formed. The counter counts the number of times the pulse has travelled around the pulse-circulating path and asserts the signal *Eight8* to the phase detector and the control unit when the count reaches eight. The phase detector therefore compares the phases of $P_{OUT}$ and $P_{REF}$ only when the count is equal to eight.
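The timing of the pulse-circulating scheme can be illustrated with an idealized model. The sketch below assumes that every lap through the loop adds the same delay and that multiplexer and wiring delays are negligible; `pulse_times` and `phase_error` are hypothetical helper names, and the sign convention of the phase error is chosen here for illustration only.

```python
def pulse_times(loop_delay, laps=8):
    """Arrival times of the circulating pulse at the loop output,
    measured from the P_REF pulse that launched it (idealized)."""
    return [k * loop_delay for k in range(1, laps + 1)]


def phase_error(t_ref, loop_delay, laps=8):
    """Arrival time of the last circulated pulse minus the arrival time
    of the next P_REF pulse. Zero means the loop is locked; a positive
    value means the loop delay is too long."""
    return laps * loop_delay - t_ref


if __name__ == "__main__":
    t_ref = 80          # reference period, arbitrary time units
    print(pulse_times(10))          # evenly spaced when 8 * delay == t_ref
    print(phase_error(t_ref, 10))   # locked
    print(phase_error(t_ref, 11))   # loop too slow
```

When the loop delay equals one eighth of the reference period, the eight pulses are evenly spaced and the eighth coincides with the next reference pulse, which is the condition the phase detector checks.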



Figure 3.20: Proposed finite state machine (FSM).

Fig. 3.20 shows the operating procedure of the proposed clock generator. After the clock generator is reset, the finite state machine (FSM) passes through three steps: PVT compensation, SAR (successive approximation register) control, and lock. In the first step, the system enters the PVT compensation state. To compensate the locking range for delay variations, the clock generator uses the PVT compensation technique to add adequate delay to the lock-in delay line. In this step, the locking range of the two delay lines is adjusted so that the period of the reference clock falls within it. After PVT compensation, the FSM enters the second step, SAR control, which uses a binary search algorithm to track the reference clock. In this step, the control unit changes the control codes (C[5:0]) according to the comparison results of the phase detector (*LEAD* and *LAG*). The total delay of the lock-in delay line and the PVT-comp. delay line is tuned to be equal to the period of the reference clock, at which point the pulse multiplier is locked. Finally, the FSM enters the lock state, and the clock generator can output a clock with multiplied or divided frequency. The feedback loop consists of the output clock, the phase detector, and the control unit. Because the SAR algorithm is open-loop, it cannot track environmental variations, so a second strategy is used to keep the system in a closed loop once SAR control has finished: the control unit continues tracking by incrementing or decrementing C[5:0] by 1 at a time, which keeps the clock generator locked to the reference clock. Fig. 3.21 shows the timing diagram of the proposed FSM operating from the Reset state to the Lock state. The control codes C[5:0] change every two clock cycles. The diagram also shows that PVT compensation of the lock-in delay line's range takes only two reference clock cycles.



Figure 3.21: Timing diagram of our FSM operating from Reset to Lock state.


#### 3.2.3 PVT-Aware Delay Line Design

#### 3.2.3.1 Variation-Aware Lock-in Delay Line Design

The lock-in delay line (LIDL) is modified from the nested lattice delay line (NLDL) [3.28], as shown in Fig. 3.22. Compared with the NLDL, the LIDL saves circuit area by using a 14-stage FO2-NAND chain instead of the lattice delay line (LDL) as the block delay, while keeping the advantages of the NLDL. First, the LIDL has equal rising and falling times. Second, the maximum operating frequency stays the same as the tuning range increases. Finally, the variation is only half that of a conventional configuration. The locking range of the lock-in delay line is from $4D_{NAND,FO2}$ to $130D_{NAND,FO2}$. Initially, the lock-in delay line is set to approximately the midpoint of the locking range, $64D_{NAND,FO2}$.

Figure 3.22: Lock-in delay line (lattice delay line [3.28]) used in our proposed clock generator.

When the supply voltage is lowered into the sub-threshold region, two critical factors affect functionality [3.29]. First, the ratio of $I_{ON}$ to $I_{OFF}$ in logic gates decreases. Second, random dopant fluctuation is a source of local variation in the sub-threshold region [3.4]. These two factors result in not only reduced output swings in CMOS logic gates but also a skewed voltage transfer curve (VTC). Upsizing transistors is one technique for mitigating local variation: the work in [3.30] showed that the standard deviation of $V_t$ varies inversely with the square root of the channel area. In the proposed clock generator, a back-to-back configuration is used to determine the transistor lengths and widths and to verify that the circuit functions correctly.

#### 3.2.3.2 PVT Compensation Delay Line Design

Fig. 3.23 shows the PVT compensation (PVT-comp.) delay line, which is also similar to the nested lattice delay line (NLDL). As shown in Fig. 3.19, the PVT-comp. delay line is controlled by D[5:0]. In the PVT compensation state, the PVT detector senses the environmental conditions and records them as a counter value, *count*. This value is then decoded into the control code D[5:0] so that the PVT-comp. delay line provides adequate delay.



Figure 3.23: PVT compensation delay line used in our proposed clock generator.



The PVT detector is shown in Fig. 3.24. It consists of a PVT sensing circuit, a counter and a decoder. The PVT sensing circuit uses a ring oscillator which can be switched on or off. When the clock generator is in PVT compensation state, the switch signal is turned on for one reference clock cycle.

The ring oscillator of the PVT sensing circuit is composed of 62 FO1-INV stages and one NAND stage. According to the Monte Carlo simulation results, the period of the ring oscillator's output is nearly equal to the delay of 128 FO1-INV stages, $128D_{INV}$. With the oscillator enabled for one reference clock period $T_D$, the counter value *count* is therefore

$$count = \frac{T_D}{128 \times D_{INV}} \tag{3.32}$$

The delay relationship between FO1-INV and FO2-NAND is expressed in (3.33); it is examined in Sec. 3.2.3.3.

$$D_{NAND,FO2} = 2 \times D_{INV} \tag{3.33}$$

where $D_{NAND,FO2}$ is the delay of an FO2-NAND. Substituting (3.33) into (3.32) gives

$$count = \frac{T_D}{64 \times D_{NAND,FO2}} \tag{3.34}$$

The pulse propagates through the delay line eight times because the clock generator adopts the pulse-circulating scheme, with output pulses at eight times the reference frequency. To lock to the reference clock, the target delay of the two delay lines combined should therefore equal $T_D/8$. From (3.34),

$$\frac{T_D}{8} = count \times 8 \times D_{NAND,FO2} = (count \times 8 - 64) \times D_{NAND,FO2} + 64 \times D_{NAND,FO2} \tag{3.35}$$

The delay of the entire delay line is divided into two parts: the delay provided by the PVT-comp. delay line and the delay provided by the lock-in delay line. The initial delay of the lock-in delay line is set to $64 \times D_{NAND,FO2}$, so from (3.35) the remaining delay, $(count \times 8 - 64) \times D_{NAND,FO2}$, is supplied by the PVT-comp. delay line. The unit delay step of the PVT-comp. delay line is $32 \times D_{NAND,FO2}$. The control code D[5:0] is therefore obtained by dividing the PVT-comp. delay in (3.35) by $32 \times D_{NAND,FO2}$:

$$D[5:0] = \frac{1}{32 \times D_{NAND,FO2}} \left[ (count \times 8 - 64) \times D_{NAND,FO2} \right] = \frac{count}{4} - 2 \tag{3.36}$$

Note that the minimum value of D[5:0] is 0 because the delay provided by the PVT-comp. delay line cannot be negative. In the decoder that implements (3.36), the division of *count* by 4 is realized as a 2-bit right shift to reduce area overhead.
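The decoding in (3.32) to (3.36) can be sketched in a few lines. The function names below are hypothetical, delays are expressed in units of $D_{NAND,FO2}$, and the upper saturation at the 6-bit maximum is an assumption that the text does not state; only the lower clamp at 0 comes from the text.

```python
def expected_count(t_d, d_inv):
    """Counter value after enabling the ring oscillator for one reference
    period t_d, per (3.32): count = T_D / (128 * D_INV), truncated."""
    return int(t_d // (128 * d_inv))


def pvt_code(count, width=6):
    """Control code D[5:0] per (3.36): count/4 - 2, with the division
    done as a 2-bit right shift. Clamped at 0 as stated in the text;
    the upper clamp at 2**width - 1 is an assumed saturation."""
    code = (count >> 2) - 2
    return max(0, min(code, (1 << width) - 1))


def total_delay_units(count, lock_in_units=64, step_units=32):
    """Lock-in delay plus PVT-comp. delay, in units of D_NAND,FO2."""
    return lock_in_units + pvt_code(count) * step_units


if __name__ == "__main__":
    count = 40
    print(pvt_code(count))            # D[5:0]
    print(total_delay_units(count))   # equals 8 * count when count % 4 == 0
```

For example, with *count* = 40 the code is 40/4 - 2 = 8, so the PVT-comp. delay line adds $8 \times 32 = 256$ unit delays to the initial 64, giving 320 unit delays, which equals $count \times 8$ as required by (3.35). When *count* is not a multiple of 4, the shift truncates and the lock-in delay line has to absorb the remainder.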

#### 3.2.3.3 Delay Ratio of FO1-INV to FO2-NAND

The delay ratio of an inverter with a fan-out of 1 (FO1-INV) to a NAND gate with a fan-out of 2 (FO2-NAND) is examined in this subsection. This characteristic is used by the PVT compensation to adjust the locking range of the delay line. The FO1-INV is the cell of the PVT sensing circuit in the PVT detector, while the FO2-NAND delay is the tunable unit delay step in the lock-in delay line. The NMOS and PMOS transistors in the FO1-INV have the same size.

Figure 3.25: Monte Carlo simulations for periods of ring oscillators (composed of FO1-INV and FO2-NAND) (a) 0.2V supply voltage, and (b) 0.5V supply voltage.

Fig. 3.25 shows Monte Carlo simulation results of the oscillators at 0.2V (sub-threshold region) and 0.5V (near-threshold region). At both supply voltages, the delay ratio of FO2-NAND to FO1-INV is approximately 2, and this ratio remains unchanged under various PVT conditions. This property is used in the PVT compensation delay line for locking range tuning.

#### 3.2.4 Circuits Implementation

#### 3.2.4.1 Control Unit



Figure 3.26: Control unit including (a) lock-in delay line controller, and (b) SEL generator.

The control unit generates the control signals C[5:0] and SEL. It consists of two parts: the lock-in delay line controller and the SEL generator. The lock-in delay line controller generates C[5:0], the control code that adjusts the delay of the lock-in delay line so that the output clock aligns with the reference clock. The SEL generator produces SEL, which selects the input of the pulse-circulating path according to the signal *Eight8* and the reference pulse.

The lock-in delay line controller, shown in Fig. 3.26(a), combines two locking strategies: SAR (successive approximation register) control and counter control. The SAR-controlled strategy uses a binary search algorithm, which achieves a short locking time with low hardware complexity. However, because it is open-loop, it cannot track environmental variations. The counter-controlled strategy is added to solve this problem: its closed-loop characteristic allows it to track environmental variations. When the clock generator starts, it first uses the SAR strategy for fast locking and then switches to the counter-controlled strategy once the SAR search has finished. The lock-in delay line control code C[5:0] is fed back to the combinational logic blocks, and a multiplexer selects which locking strategy is applied. When the clock generator is in the locked state, the multiplexer selects the counter-controlled strategy to track environmental variations.
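The two locking strategies can be sketched behaviorally as follows. This is not the circuit of Fig. 3.26(a): the linear delay model `delay_units` is an assumption chosen so that a 6-bit code spans the stated $4D_{NAND,FO2}$ to $130D_{NAND,FO2}$ range, the comparison stands in for the phase detector's *LEAD*/*LAG* outputs, and all function names are hypothetical.

```python
def delay_units(code):
    """Assumed monotonic delay model in units of D_NAND,FO2:
    code 0 -> 4 units, code 63 -> 130 units."""
    return 4 + 2 * code


def sar_search(target_units, bits=6):
    """Binary search from the MSB down: keep a bit if the resulting
    delay does not exceed the target. Takes exactly `bits` comparisons."""
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        if delay_units(trial) <= target_units:
            code = trial
    return code


def counter_step(code, target_units, bits=6):
    """Closed-loop tracking after SAR: move the code by at most one
    step per comparison, saturating at the ends of the code range."""
    delay = delay_units(code)
    if delay < target_units and code < (1 << bits) - 1:
        return code + 1
    if delay > target_units and code > 0:
        return code - 1
    return code


if __name__ == "__main__":
    code = sar_search(64)          # fast initial lock
    print(code, delay_units(code))
    code = counter_step(code, 70)  # target drifts; track one step at a time
    print(code, delay_units(code))
```

The SAR search settles in a fixed number of comparisons but never revisits its decisions, whereas the counter step keeps following the target in single-code increments, which is why the controller hands over to it after the initial lock.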

In Fig. 3.19, the SEL signal selects whether the pulses entering the path come from $P_{OUT}$ or $P_{REF}$. If the pulse comes from $P_{REF}$, the circulating pulses are re-adjusted. If the pulse comes from $P_{OUT}$, the pulse-circulating path is formed. Fig. 3.26(b) shows the block diagram of the SEL generator, which operates differently in the SAR and Lock states. In the SAR state, *SEL* is inverted at every negative edge of $P_{REF}$. In the Lock state, *SEL* is determined by $P_{REF}$ and *Eight8*: it is high when $P_{REF}$ is high or when the 8th pulse of $P_{OUT}$ arrives. The latter condition prevents a 9th pulse from propagating through the pulse-circulating path prematurely.
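The two operating modes of the SEL generator can be summarized with a small behavioral sketch. The function `next_sel`, its argument names, and the string state labels are hypothetical, and the model ignores gate delays and the actual flip-flop structure of Fig. 3.26(b).

```python
def next_sel(state, sel, p_ref, eight8, p_ref_falling_edge):
    """Behavioral model of the SEL generator.

    state: "SAR" or "LOCK"
    sel: current SEL value
    p_ref: current level of the reference pulse
    eight8: True once the 8th circulated pulse has been counted
    p_ref_falling_edge: True if P_REF has just fallen
    """
    if state == "SAR":
        # SEL toggles on every negative edge of P_REF.
        return (not sel) if p_ref_falling_edge else sel
    if state == "LOCK":
        # SEL is high while P_REF is high or once the 8th pulse arrives,
        # which keeps a 9th pulse out of the circulating path.
        return bool(p_ref or eight8)
    raise ValueError(f"unknown state: {state}")


if __name__ == "__main__":
    print(next_sel("SAR", False, False, False, True))
    print(next_sel("LOCK", False, False, True, False))
```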

#### 3.2.4.2 Phase Detector

In Fig. 3.27(a), the phase detector compares the arrival times of $P_{REF}$ and the 8th $P_{OUT}$ pulse. A conventional phase detector uses only two D flip-flops, which is not suitable for the pulse-circulating scheme because the flip-flops are easily affected by the other pulses. We therefore added another two D flip-flops in front of them.

Figure 3.27: (a) Phase detector, and (b) RSTPD generator.

The *Eight8* signal allows the modified phase detector to compare only $P_{REF}$ and the 8th pulse of $P_{OUT}$, without interference from the other pulses of $P_{OUT}$. In addition, the *RSTPD* signal, which resets the four D flip-flops, is used to control the modified phase detector. Fig. 3.27(b) shows the RSTPD generator, which consists of two D flip-flops and takes the reference clock as its input. The phase comparison is performed every two clock cycles.

#### 3.2.4.3 Simulation Results

The proposed programmable clock generator for near-/sub-threshold DVFS systems is implemented in UMC 65nm CMOS technology and operates over a supply voltage range from 0.2V to 0.5V. At 0.2V, the reference clock frequency is 156kHz, and the generator consumes $0.18\mu$W at its maximum output frequency of 625kHz. At 0.5V, the reference clock frequency is 5MHz, and the generator consumes $5.17\mu$W at its maximum output frequency of 20MHz. Fig. 3.28 demonstrates the PVT compensation of the clock generator's locking range. Before compensation, environmental variations leave the reference clock outside the locking range, so the clock generator cannot output a multiplied clock. After PVT compensation, the reference clock is within the locking range under the various environmental conditions. Table 3.4 summarizes the performance of the proposed clock generator, and its layout is shown in Fig. 3.29. The core area of the clock generator is $77\mu$m × $125\mu$m.
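A few derived figures can be cross-checked from the reported numbers. The sketch below is only arithmetic on the values quoted above; the energy per output cycle is a derived quantity that is not reported in the text, and the helper names are hypothetical.

```python
# Values quoted in the simulation results (reference clock, maximum
# output frequency, and power at each supply voltage).
REF_HZ = {"0.2V": 156e3, "0.5V": 5e6}
MAX_OUT_HZ = {"0.2V": 625e3, "0.5V": 20e6}
POWER_W = {"0.2V": 0.18e-6, "0.5V": 5.17e-6}


def core_area_mm2(width_um=77.0, height_um=125.0):
    """Core area in mm^2 from the layout dimensions in micrometres."""
    return width_um * height_um * 1e-6


def multiplication_ratio(vdd):
    """Maximum output frequency divided by the reference frequency."""
    return MAX_OUT_HZ[vdd] / REF_HZ[vdd]


def energy_per_cycle_pj(vdd):
    """Derived (not reported) energy per output clock cycle, in pJ."""
    return POWER_W[vdd] / MAX_OUT_HZ[vdd] * 1e12


if __name__ == "__main__":
    print(core_area_mm2())
    for vdd in REF_HZ:
        print(vdd, multiplication_ratio(vdd), energy_per_cycle_pj(vdd))
```

The area works out to about 0.0096 mm², and both operating points correspond to the 4X setting. The 0.2V ratio is marginally above 4, which suggests the 156kHz reference is a rounded value.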



Figure 3.28: PVT compensation for locking range of proposed generator at (a) 0.2V, TT, w/o compensation, (b) 0.2V, TT, with compensation, (c) 0.2V, FF, w/o compensation, (d) 0.2V, FF, with compensation, (e) 0.5V, TT, w/o compensation, (f) 0.5V, TT, with compensation, (g) 0.5V, FF, w/o compensation, (h) 0.5V, FF, with compensation.



Figure 3.29: Layout view of our DLL-based clock generator under UMC 65nm bulk CMOS technology.



Table 3.4: Specifications of the proposed DLL-based clock generator

| Specification            | Value                                               |
|--------------------------|-----------------------------------------------------|
| Supply Voltage           | 0.2V-0.5V                                           |
| Process                  | UMC65nm                                             |
| Active Area              | $0.077 \times 0.125\ mm^2$                          |
| Reference Clock          | 156kHz@0.2V / 5MHz@0.5V                             |
| Maximum Output Frequency | 625kHz@0.2V / 20MHz@0.5V                            |
| Minimum Output Frequency | 19.5kHz@0.2V / 625kHz@0.5V                          |
| Output Jitter            | 60ns@625kHz, 0.2V / 4ns@20MHz, 0.5V                 |
| Power Consumption        | $0.18\mu$W@625kHz, 0.2V / $5.17\mu$W@20MHz, 0.5V    |

#### 3.2.5 Summary

A near-/sub-threshold programmable clock generator is proposed in this section [3.31]. First, unified logical effort models [3.32] are presented in Sec. 3.2.1, extending the traditional model with voltage and temperature dependence across all MOS operating regions. Second, the major challenge for ultra-low voltage circuits is that the lock-in range of the delay line is easily affected by environmental variations. The proposed clock generator therefore includes a PVT compensation unit consisting of a delay line and a PVT detector, which adjusts the lock-in range of the clock generator to guarantee successful locking. In addition, variation-aware logic design is applied in the clock generator to improve reliability against process variation, and the pulse-circulating scheme suppresses process-induced output clock jitter. The clock generator can produce output clocks from 1/8 to 4 times the reference clock frequency. It has been designed in UMC 65nm CMOS technology. The reference clock frequencies are 156kHz at 0.2V and 5MHz at 0.5V, giving maximum output frequencies of 625kHz and 20MHz, respectively. The corresponding power consumptions are $0.18\mu$W and $5.17\mu$W. The core area of the clock generator is approximately $0.01mm^2$.



## Chapter 4

## **Ultra-Low Voltage Memory Design**

Embedded SRAMs dominate the power consumption, area, performance, and yield of emerging portable electronic devices. These devices require low energy consumption for long operational lifetimes, as they are often battery powered. Subthreshold SRAM design is widely adopted because lowering the supply voltage reduces energy consumption quadratically [4.1]. However, when the supply voltage is below the transistor threshold voltage, SRAM variability increases severely with respect to the design and process parameters that determine the ratio of device strengths [4.2]. Major subthreshold SRAM stability issues include process-induced device variation, a decreasing $I_{ON}$-to-$I_{OFF}$ ratio, and random threshold voltage variation $(\sigma_{VT})$ [4.3]. The standard 6T bit-cell fails to operate reliably in weak inversion because read current disturbance degrades the static noise margin (SNM). Various bit-cells with more than six transistors have been presented to address the read reliability issue, such as 8T bit-cells [4.4, 4.5], which add two transistors as a read buffer to isolate the storage node from the bitline, resulting in better read stability. In [4.5], a 64Kbit SRAM utilizes the reverse short channel effect (RSCE) in the bit-cell, which improves read performance and write margin without peripheral assist circuits. Although these cells tolerate read disturbance, read failures can still occur if the read bitline is discharged by leakage from unselected bit-cells. Moreover, a fully differential 10T bit-cell has been proposed for high read stability [4.6]. This 10T subthreshold SRAM also employs an efficient bit-interleaving structure to improve soft-error immunity. The Schmitt Trigger II (ST-2) bit-cell [4.7] is based on a differential-sensing subthreshold SRAM and can cope with the conflicting read and write design requirements. A built-in feedback mechanism is also incorporated in the ST-2 bit-cell to enhance process variation tolerance.



Figure 4.1: Wireless sensor node block diagram for the WBAN system.

On the other hand, asynchronous first-in first-out (FIFO) memory is a key component of communication systems for buffering and flow control. It passes data and control information between two independent clock domains, which provides a significant resource-sharing advantage. One well-known FIFO application is chip multiprocessors with globally asynchronous locally synchronous (GALS) clocking styles [4.8, 4.9]. They utilize a large number of asynchronous FIFOs to hide much of the GALS performance penalty and to transfer information across inter-processor clock domain boundaries. Recently, the wireless body area sensor network (WBAN) has emerged as a breakthrough personal healthcare technology for body condition monitoring and diagnosis. Because of the limited energy source and the long-term stability requirement of a WBAN system, robust ultra-low power designs are indispensable [4.10, 4.11]. As shown in Fig. 4.1, one primary component of the wireless sensor node is an asynchronous FIFO memory, which dominates the total die area and power consumption. Accordingly, reducing the power consumption of the FIFO memory is an urgent design consideration for optimal WBANs. Voltage scaling is a popular method to reduce energy in digital circuits because of its quadratic energy saving. To achieve highly reliable and energy-efficient operation, a dual-port-SRAM-based asynchronous FIFO memory operating in the near-/sub-threshold regions is applicable.

## 4.1 9T Subthreshold SRAM Design with Bit-Interleaving Scheme

The soft-error problem is more critical for ultra-low voltage SRAMs than for super-threshold SRAMs because the critical charge at the storage node is much smaller. As reported in [4.12], the soft-error rate (SER) increases by 18% for every 10% reduction in supply voltage. To enhance the soft-error immunity of subthreshold SRAMs, a bit-interleaving scheme is generally preferred. It spatially separates the bits of a word within a row, so that only simple single-bit error correction coding (ECC) is needed. However, the read-buffered 8T bit-cell designed with a bit-interleaving scheme suffers from write-half-select disturbance. To solve this issue, an array architecture and circuits with 12% area overhead compared to the 8T SRAM design were presented in [4.13]. The array architecture addresses the half-select problem by decoupling the large bitline capacitance from half-selected cells. In [4.6], a fully differential 10T bit-cell was presented for high read stability. It supports bit-interleaving by means of vertical and horizontal wordlines, but it requires a boosted-wordline technique to maintain robust write operation. Recently, a fully differential 8T SRAM with a column-based dynamic supply scheme was presented in [4.14]. By utilizing different cell supply voltages for the basic modes, it successfully separates the read/write/standby operations, which allows it to be bit-interleaved.
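To give a feel for the SER trend quoted from [4.12], the following minimal sketch extrapolates it over a larger voltage range. It assumes that the 18% increase compounds with every successive 10% supply reduction; this compounding model and the function name `ser_multiplier` are illustrative assumptions, not something stated in [4.12] or measured in this work.

```python
import math

def ser_multiplier(v_from, v_to, ser_gain=0.18, v_step=0.10):
    """Relative SER after scaling the supply from v_from down to v_to.

    Assumption: every 10% supply reduction multiplies the SER by 1.18,
    and the effect compounds over successive reductions.
    """
    # Number of successive 10% reductions needed to go from v_from to v_to.
    steps = math.log(v_to / v_from) / math.log(1.0 - v_step)
    return (1.0 + ser_gain) ** steps

# One 10% reduction reproduces the quoted 18% increase.
print(round(ser_multiplier(1.0, 0.9), 2))
# Extrapolated factor for scaling from 1.0V to 0.3V (illustrative only).
print(ser_multiplier(1.0, 0.3))
```

Under this assumption the SER grows by several times when the supply is scaled into the subthreshold region, which is the motivation for combining bit-interleaving with simple ECC.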

In Sec. 4.1, we propose a 9T bit-cell whose write ability is enhanced by inserting pass transistors into the cross-coupled inverter pair. To allow a bit-interleaving array scheme for the 9T bit-cells, two additional write-wordlines (WWL/WWLb) are used. An iso-area SRAM stability analysis and experimental results from a fabricated test chip are also presented. Sec. 4.1.1 describes the basic operations and layout considerations of our 9T bit-cell. The iso-area SRAM  $V_{min}$  analysis is discussed in Sec. 4.1.2. Sec. 4.1.3 shows the implementation of a 1Kbit 9T SRAM with the bit-interleaving array scheme and the measurement results. Sec. 4.1.4 gives the summary.



Figure 4.2: Block diagram of the proposed 9T bit-cell. The relative threshold voltage ratio of high  $V_t$  MOSFET to regular  $V_t$  one is 1.3 to 1.

# 4.1.1 9T Subthreshold SRAM Bit-Cell Design

The block diagram of the proposed 9T bit-cell is shown in Fig. 4.2. It adopts the multiple-threshold CMOS (MTCMOS) technique: high  $V_t$  devices save leakage, and regular  $V_t$  devices increase the write margin and hold static noise margin (HSNM). Three n-type transistors, MAR, MAW, and MDR, form the access buffer. Their channel length is increased to 100nm to exploit the reverse short channel effect [4.15] for a better  $I_{ON}$-$I_{OFF}$ ratio and less threshold voltage variation caused by random dopant fluctuation. The voltage drop during write operation is reduced by choosing regular  $V_t$  devices for the access transistors, MAR and MAW. The pass transistors, MNP and MNN, are inserted into the cross-coupled inverter pair to enhance write ability. They also degrade the HSNM, but this degradation is kept negligible by making them regular  $V_t$  devices. The other MOSFETs in the proposed bit-cell are high  $V_t$  devices for leakage reduction. A single-bitline scheme and a virtual ground signal, VVSS, are also adopted for leakage reduction. VVSS is tied to  $V_{DD}$  except in read mode, when it is discharged to ground. To enable an SRAM design with the bit-interleaving scheme, there are three wordlines, WL, WWL, and WWLb, in our 9T bit-cell. The write-wordline, WWL, is driven high and its complementary signal, WWLb, is driven low only in write mode, whereas WL is enabled in both read and write modes. The truth table for the different operation modes of the 9T bit-cell is shown in Table 4.1.

| Mode  | WL   | WWL  | WWLb | VVSS |
|-------|------|------|------|------|
| Hold  | low  | low  | high | high |
| Read  | high | low  | high | low  |
| Write | high | high | low  | high |

Table 4.1: Proposed 9T Bit-Cell Basic Operations Truth Table
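The truth table can also be captured as a small lookup structure, which is convenient for a behavioral model or a testbench. The sketch below transcribes Table 4.1 directly; the names `TRUTH_TABLE`, `control_signals`, and `is_complementary` are hypothetical helpers introduced here for illustration.

```python
# Control-signal levels per operating mode, transcribed from Table 4.1.
TRUTH_TABLE = {
    "hold":  {"WL": "low",  "WWL": "low",  "WWLb": "high", "VVSS": "high"},
    "read":  {"WL": "high", "WWL": "low",  "WWLb": "high", "VVSS": "low"},
    "write": {"WL": "high", "WWL": "high", "WWLb": "low",  "VVSS": "high"},
}

def control_signals(mode):
    """Return the wordline and virtual-ground levels for a given mode."""
    return TRUTH_TABLE[mode.lower()]

def is_complementary(mode):
    """Check that WWLb is the complement of WWL in the given mode."""
    signals = control_signals(mode)
    return signals["WWL"] != signals["WWLb"]

for mode in TRUTH_TABLE:
    print(mode, control_signals(mode), is_complementary(mode))
```

The table makes two properties easy to check: VVSS is low only in read mode, and WWL/WWLb toggle only in write mode.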

# 4.1.1.1 Basic Operations



Figure 4.3: (a) Proposed 9T bit-cell in hold operation, and (b) HSNM performance comparison.

During hold mode, as shown in Fig. 4.3(a), *MAR* and *MAW* of the access buffer are turned off to form a cascaded transistor structure, which reduces the bitline leakage current considerably. Meanwhile, the single bit-line scheme and the regular  $V_t$  access transistors are utilized to reduce the voltage drop in HSNM, as shown in Fig. 4.3(b).

In read mode, as shown in Fig. 4.4(a), *MAW* is turned off to isolate the read path from the storage node, thus eliminating read disturbance. Because of this isolation, the read static noise margin (RSNM) is nearly the same as the HSNM. The channel length of the read buffer transistors, *MAR* and *MDR*, is increased to 100nm to improve the single-ended read delay and stability. The RSNM distribution obtained from 10000 Monte Carlo simulation runs is depicted in Fig. 4.4(b). Because of the read/write conflict in the conventional 6T bit-cell, it has a poorer RSNM than the other subthreshold bit-cells. For our 9T bit-cell, the proposed access buffer structure provides a mean RSNM of 78mV. Note that the foot of the read buffer is connected to the virtual ground, VVSS. While the foot of the selected word is pulled to GND during a read operation, the feet of all the other read buffers, in all rows of the SRAM arrays, are connected to VDD for leakage reduction. However, the read delay of this 9T bit-cell is still  $1.38 \times$  slower than that of the differential 10T bit-cell [4.16] under the iso-area condition at 0.3V supply voltage. This is acceptable since speed is not the primary constraint in subthreshold design.

Figure 4.4: (a) Proposed 9T bit-cell in read operation, and (b) RSNM performance comparison.



Figure 4.5: (a) Proposed 9T bit-cell in write operation, and (b) write margin performance comparison.

Writing "1" is the worst case for this 9T bit-cell because it is much harder to pass a "1" than a "0" through the two series n-type transistors, MAR and MAW, of the access buffer, as shown in Fig. 4.5(a). However, the pass transistors, MNP and MNN, are turned off during the write, which improves the write margin of our 9T bit-cell by breaking the positive feedback loop of the cross-coupled inverter pair. Meanwhile, the virtual ground, VVSS, is tied to VDD in write mode, which helps the write-"1" operation. The write margin of this 9T bit-cell is  $1.64 \times$  better than that of the 10T bit-cell under the iso-area condition at 0.3V supply voltage, as shown in Fig. 4.5(b). Note that the disturbance caused by the virtual ground during the write-"0" operation slightly increases the write delay.



# 4.1.1.2 Layout Considerations

Figure 4.6: Layout view of the proposed 9T bit-cell. Its size is  $1.92 \times$  larger than 6T mincell.


The proposed 9T bit-cell is laid out in 65nm bulk CMOS technology using logic design rules. As shown in Fig. 4.6, the bit-cell measures  $2.34\mu m \times 0.83\mu m$  in a 2-poly-pitch thin-cell style. The write-wordline, WWL, runs in the direction of the bitline. Its complementary signal, WWLb, and the virtual ground, VVSS, run in the direction of the wordline. The 9T bit-cell occupies  $1.55 \times$  the area of an 8T bit-cell. This is due to i) the additional pass transistors and ii) the three-transistor access buffer with 100nm channel length.

# 4.1.2 Iso-Area SRAM Bit-Cell V<sub>min</sub> Analysis

# 4.1.2.1 Iso-Area Bit-Cells

To analyze the minimum operation voltage  $(V_{min})$  for SRAM bit-cell hold/read/write operations, it is only fair to compare the bit-cells under an iso-area condition. In this work,  $2\times$  the area of the 6T mincell is adopted as the benchmark since the proposed 9T bit-cell consumes approximately  $1.92\times$  that area, as shown in Fig. 4.6. In a thin-cell layout approach, the SRAM bit-cell area is dominated by the contact and diffusion spacing. If the bit-cell area is expanded by increasing the channel length along the bitline direction, the bitline power consumption increases because of the larger bitline capacitance. Since bitline power is the major part of the overall SRAM power consumption [4.16], the best way to upsize a bit-cell is to increase the device widths along the wordline direction. Monte Carlo simulations are performed using 65nm bulk CMOS technology models that include global and local process variations. The bit-cell failure probability is estimated assuming a Gaussian distribution of the threshold voltage.

- 6T Iso-Area Bit-Cell: In this work, we use 6T mincell device widths of 120, 120, and 240nm for pull-up/access/pull-down transistors, respectively. For 2× larger area, the 6T mincell device widths need to be upsized by 4×. All transistors in the 6T mincell are upsized uniformly to improve the read-stability and write-ability simultaneously.
- 8T Iso-Subarray-Area Bit-Cell: Single-ended 8T SRAM designs often prefer a hierarchical bitline architecture to improve performance. This architecture is adopted because of large-signal sensing and the evaluation-delay/noise-immunity tradeoff at the local bitline node. As a result, the array efficiency of a single-ended 8T bit-cell design is about 15%-30% lower than that of a 6T bit-cell array design. Considering this difference in array efficiency, the 8T iso-area bit-cell V<sub>min</sub> should be evaluated under an iso-subarray-area condition. In this work, any additional area is used for the write access transistors to improve the write-V<sub>min</sub>, because the 8T bit-cell already has a buffered read structure. We use 8T iso-subarray-area bit-cell device widths of 120, 360, 240, and 120nm for the pull-up/access/pull-down/buffered-read transistors, respectively.
- 10T Iso-Area Bit-Cell: A 10T bit-cell [4.6] with separated read/write operation was presented to enable the bit-interleaving scheme. A differential read path, instead of a single-ended one, is designed to ensure reliable operation. It is also a read-disturb-free design. Thus, any additional area is used to upsize the write-access transistors for the iso-area comparison. The differential 10T bit-cell consumes about  $1.66 \times$  the area of the 6T mincell [4.16]. In this work, we use 10T iso-area bit-cell device widths of 120, 240, 240, and 120nm for the pull-up/access/pull-down/buffered-read transistors, respectively.



Figure 4.7: Hold-failure probability comparison.


# 4.1.2.2 Hold-Failure Probability

Hold static noise margin (HSNM) is used to quantify the hold-stability of the SRAM bit-cells. Hold-failure probability  $(P_{\text{hold-fail}})$  is estimated as

$$P_{\text{hold-fail}} = Prob.(HSNM < kT). \tag{4.1}$$

If the HSNM is lower than the thermal voltage ($kT/q \approx 26$mV at 300K, written as $kT$ in Eq. (4.1)), the bit-cell contents can be flipped by thermal noise. Hold-V<sub>min</sub> is determined at the 3-sigma hold-failure probability (i.e.,  $P_{\text{hold}-\text{fail}}=10^{-5}$ ). Fig. 4.7 shows the hold-failure probability versus supply voltage for the 6T mincell, the 6T/8T/10T iso-area bit-cells, and the proposed 9T iso-area bit-cell. Because the cross-coupled inverter pair in the 8T/10T bit-cells has the same size as that in the 6T mincell, these cells have a similar hold-failure probability. The HSNM of the upsized 6T iso-area bit-cell is better than that of the 6T mincell. In this work, the HSNM of our 9T bit-cell is slightly degraded because of the pass transistors inserted into the cross-coupled inverter pair.
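Equation (4.1) can be evaluated in closed form if the noise margin itself is approximated as Gaussian. The following minimal sketch does this. It is a simplification: the analysis in this work obtains the distribution from Monte Carlo simulation with Gaussian threshold-voltage variation, and the sigma values used below are hypothetical placeholders rather than simulated results.

```python
import math

KT_MV = 26.0  # failure threshold used in Eq. (4.1), in mV

def fail_probability(mean_mv, sigma_mv, threshold_mv=KT_MV):
    """P(SNM < threshold) for SNM ~ Normal(mean_mv, sigma_mv)."""
    z = (threshold_mv - mean_mv) / sigma_mv
    # Standard normal CDF expressed through the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Hypothetical example: a mean margin of 78mV with two assumed spreads.
for sigma in (10.0, 15.0):
    print(sigma, fail_probability(78.0, sigma))
```

The example illustrates why the failure probability is so sensitive to variation: for a fixed mean margin, a modest increase in sigma raises the tail probability by orders of magnitude.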



Figure 4.8: Read-failure probability comparison.

# 4.1.2.3 Read-Failure Probability

Similar to the hold-stability case, read stability is estimated by computing the read static noise margin (RSNM). Fig. 4.8 plots the read-failure probability versus supply voltage for the 6T mincell, the 6T/8T/10T iso-area bit-cells, and the proposed 9T iso-area bit-cell. As shown in the inset, the read-failure probability ( $P_{\text{read-fail}}$ ) is calculated as

$$P_{\text{read-fail}} = Prob.(RSNM < kT). \tag{4.2}$$

Read-V<sub>min</sub> is determined at the 3-sigma read-failure probability (i.e.,  $P_{\text{read-fail}}=10^{-5}$ ). For the 8T and 10T bit-cells, the read stability is the same as the hold stability because the bit-cell nodes are not disturbed during the read operation. As explained above, the cross-coupled inverter pair in the 8T/10T bit-cells has the same size as that in the 6T mincell, so their read-failure probability is similar to the hold-failure probability shown in Fig. 4.7. It is also observed that upsizing the 6T device dimensions enhances the RSNM. In this work, MAW of the access buffer in our 9T bit-cell is turned off to prevent read disturb. Therefore, its read-failure probability is the same as its hold-failure probability.



Figure 4.9: Write-failure probability comparison.

# 4.1.2.4 Write-Failure Probability

Write-ability indicates how easy or difficult it is to write to the bit-cell. The write margin (WM) is defined as  $V_{DD}$-Min.[V(WWL)], where Min.[V(WWL)] is the minimum write-wordline voltage required to flip the bit-cell. The higher the write margin, the more easily data is written into the bit-cell. Fig. 4.9 shows the write-failure probability versus supply voltage. As shown in the inset, the write-failure probability ( $P_{\text{write-fail}}$ ) is estimated as

$$P_{\text{write-fail}} = Prob.(WM < 0mV). \tag{4.3}$$

Write- $V_{min}$  is determined at the 3-sigma write-failure probability (i.e.,  $P_{write-fail}=10^{-5}$ ). The 10T bit-cell has a higher write-failure probability than the 6T/8T bit-cells under the iso-area condition because of its series access transistors. In this work, cutting off the positive feedback loop of the cross-coupled inverter pair in the 9T bit-cell considerably improves its write-ability.
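The way a  $V_{min}$  value is read off a failure-probability curve can be sketched as a one-dimensional search for the lowest supply voltage that meets the target. The sketch below uses bisection on a purely hypothetical model in which the mean write margin grows linearly with the supply voltage and its spread is constant; the parameters `slope`, `v0`, and `sigma` are placeholders and do not reproduce the simulated curves in Fig. 4.9.

```python
import math

def p_write_fail(vdd, slope=0.8, v0=0.25, sigma=0.03):
    """Hypothetical model: WM ~ Normal(slope * (vdd - v0), sigma), in volts.

    Returns P(WM < 0), following the form of Eq. (4.3).
    """
    mean_wm = slope * (vdd - v0)
    return 0.5 * math.erfc(mean_wm / (sigma * math.sqrt(2.0)))

def find_vmin(p_fail_fn, target=1e-5, lo=0.1, hi=1.2, tol=1e-4):
    """Lowest VDD in [lo, hi] with p_fail_fn(VDD) <= target.

    Assumes p_fail_fn decreases monotonically as VDD increases.
    """
    if p_fail_fn(hi) > target:
        raise ValueError("target not met even at the highest voltage")
    if p_fail_fn(lo) <= target:
        return lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p_fail_fn(mid) <= target:
            hi = mid  # target met: try a lower voltage
        else:
            lo = mid  # target missed: a higher voltage is needed
    return hi

print(find_vmin(p_write_fail))
```

The same search applies unchanged to the hold and read cases once the corresponding failure-probability function is substituted.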

# 4.1.2.5 Iso-Area $V_{min}$ Comparison

Table 4.2 compares the estimated  $V_{min}$  for various bit-cell topologies.  $V_{min}$  is calculated as the maximum of Hold- $V_{min}$ , Read- $V_{min}$ , and Write- $V_{min}$ . The  $V_{min}$  of the 8T and 10T bit-cells is limited by the write operation. The proposed 9T bit-cell enhances the write-ability by utilizing the pass transistors within the cross-coupled inverter pair, at the cost of a slight HSNM/RSNM degradation. Our 9T bit-cell has the lowest  $V_{min}$  among these state-of-the-art bit-cells. Note that the effect of various read/write assist techniques is not taken into account.

| Bit-Cell Topology                     | Hold $V_{min}$ | Read $V_{min}$ | Write $V_{min}$ | $V_{min}$ |
|---------------------------------------|----------------|----------------|-----------------|-----------|
| 6T mincell (1X Area)                  | 450mV          | 1000mV         | 826mV           | 1000mV    |
| 6T bit-cell (4X upsized, Iso-Area)    | 313mV          | 599mV          | 588mV           | 599mV     |
| 8T bit-cell [4.4] (Iso-Subarray-Area) | 450mV          | 450mV          | 591mV           | 591mV     |
| 10T bit-cell [4.6] (Iso-Area)         | 450mV          | 450mV          | 630mV           | 630mV     |
| This Work (Iso-Area)                  | 470mV          | 470mV          | 415mV           | 470mV     |

Table 4.2: V<sub>min</sub> Comparison of Various Bit-Cell Topologies
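The last column of Table 4.2 follows directly from the other three. As a quick consistency check, the sketch below recomputes the overall  $V_{min}$  as the maximum of the hold, read, and write values transcribed from the table, and reports which operation limits each cell. The dictionary keys and function names are shorthand introduced here.

```python
# (Hold, Read, Write) V_min in mV, transcribed from Table 4.2.
VMIN_MV = {
    "6T mincell": (450, 1000, 826),
    "6T iso-area": (313, 599, 588),
    "8T iso-subarray-area": (450, 450, 591),
    "10T iso-area": (450, 450, 630),
    "9T (this work)": (470, 470, 415),
}
MODES = ("hold", "read", "write")

def overall_vmin(entry):
    """Overall V_min is the worst (highest) of the three per-mode values."""
    return max(entry)

def limiting_mode(entry):
    """First mode that reaches the overall V_min (ties resolve to the earlier mode)."""
    return MODES[entry.index(max(entry))]

for name, entry in VMIN_MV.items():
    print(f"{name}: V_min = {overall_vmin(entry)}mV, limited by {limiting_mode(entry)}")
```

For the proposed 9T bit-cell the hold and read values tie at 470mV, so the limit moves from the write operation (as in the 8T and 10T cells) to the hold/read stability.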

# 4.1.3 1Kbit 9T SRAM Implementation and Measurement Results in 65nm CMOS

# 4.1.3.1 Bit-Interleaving Scheme for Soft Error Rate Reduction

Soft errors are caused by energetic particle radiation, thermal neutrons, random noise, or signal integrity problems. A soft error is a signal or datum that is wrong but does not imply a permanent fault in the hardware. Since the circuit works correctly again once the data is rewritten, a soft error may flip the stored data but does not damage the circuit. There are several physical methods to minimize the soft error rate, including judicious device design, critical node isolation using a deep N-well structure, the use of $^{210}$Pb-free chip package and substrate materials, and increasing the capacitance of critical nodes in the layout geometry [4.17].

Since contiguous bit-cells can be corrupted by a single radiation strike, the interleaving scheme has the benefit that the affected bits belong to different logical words. Most soft error events are then single-bit errors within a word, which can be corrected effectively by a properly implemented error correction code (ECC). In [4.18], ECC reduces the failure rate by over four orders of magnitude, at the cost of system latency, throughput, and area overhead. In contrast, a non-interleaved scheme may encounter multiple bit errors in one word because the bit-cells of a word are contiguous, and a single soft error may flip several adjacent bits simultaneously. In that case, a more powerful and complex ECC design is required for acceptable reliability [4.19]. A better way to reduce the soft error rate is therefore to implement an SRAM bit-cell that supports the bit-interleaving scheme.
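The benefit of interleaving under a multi-cell upset can be illustrated with a few lines of code. The sketch assumes a common physical ordering in which column `c` holds bit `c // N` of the word selected by `c % N` for N-to-1 interleaving; this mapping and the helper names are illustrative assumptions, not a description of the fabricated layout.

```python
def column_owner(col, interleave=4):
    """Map a physical column to (word_select, bit_index).

    Assumed layout: col = bit_index * interleave + word_select.
    """
    return col % interleave, col // interleave

def bits_hit_per_word(first_col, burst_len, interleave=4):
    """Count how many bits of each logical word fall inside a burst of adjacent columns."""
    hits = {}
    for col in range(first_col, first_col + burst_len):
        word, _ = column_owner(col, interleave)
        hits[word] = hits.get(word, 0) + 1
    return hits

# A strike corrupting 4 adjacent columns in one row.
print(bits_hit_per_word(5, 4, interleave=4))  # 4-to-1 interleaved
print(bits_hit_per_word(5, 4, interleave=1))  # non-interleaved
```

With 4-to-1 interleaving, a burst of up to four adjacent cells leaves at most one flipped bit in any word, so single-error-correcting ECC is sufficient; without interleaving, all four flipped bits land in the same word.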



Figure 4.10: Standard 4-to-1 bit-interleaved SRAM array.



Figure 4.11: Schematic illustration of the proposed 9T bit-cells free of write-half-select problem.

The proposed 9T bit-cell not only enhances the robustness of the write operation but also supports an efficient bit-interleaving scheme to achieve soft error tolerance with simple error correction codes. A standard 4-to-1 bit-interleaved SRAM array is adopted, as shown in Fig. 4.10. To show that our 9T bit-cell is free of the write-half-select problem, a schematic illustration with four bit-cells in different operation modes is shown in Fig. 4.11.



Figure 4.12: HSNM distributions of write-half-selected 9T/8T bit-cells.

Monte Carlo simulations of HSNM are performed using 65nm bulk CMOS technology models that include global and local process variations. For the write-half-selected 8T bit-cell, the HSNM distribution is degraded by the disturbance, as shown in Fig. 4.12. In this work, the write-half-selected bit-cell in the same row,  $SNM\_R$, is disturbance free because MAW is turned off by WWL. Meanwhile, the bit-cell in the same column,  $SNM\_C$, is not affected by the disturbance because MAR is turned off by WL. Their HSNM distributions are nearly the same as that of the bit-cell in hold mode,  $SNM\_Hold$, as shown in Fig. 4.12.

# 4.1.3.2 1Kbit 9T Bit-Interleaved SRAM Array Implementation



Figure 4.13: Block diagram of 1Kbit 9T bit-interleaved SRAM.

The block diagram of the proposed 1Kbit 9T subthreshold SRAM is shown in Fig. 4.13. It consists of address decoders, write drivers, sense amplifiers, a word-line pulse width controller, replica columns, and the storage element. The storage element is composed of the bit-interleaved 9T SRAM array described in the previous sections. Meanwhile, 9T SRAM replica columns for both read and write are designed to automatically adjust the word-line pulse width for PVT variation tolerance.

The address decoder of an SRAM array converts an N-bit address into  $2^N$  select lines for the physical words in the array. Since the proposed SRAM uses a 4-to-1 bit-interleaved array architecture, the two least significant address bits (A[1:0]) are decoded to select the columns to be read or written. The four select signals (sel\_A, sel\_B, sel\_C, and sel\_D) connect the interleaved bit-lines to the write drivers and sense amplifiers through MUXs. The remaining address bits (A[5:2]) are decoded to select the accessed row. In other words, the row decoder converts A[5:2] on the address bus into the corresponding row word-lines (WL and WWLb), as depicted in Fig. 4.13.
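The address split described above can be summarized in a short behavioral sketch for the 64-word array. The assignment of A[1:0] = 0, 1, 2, 3 to sel\_A through sel\_D and the helper names are assumptions made for illustration; the source only states which address bits drive the column and row decoders.

```python
SELECT_NAMES = ("sel_A", "sel_B", "sel_C", "sel_D")  # assumed order for A[1:0] = 0..3
NUM_ROWS = 16  # 4 row-address bits, A[5:2]

def decode_address(addr):
    """Split a 6-bit word address into (row index, column-select signal)."""
    if not 0 <= addr < 64:
        raise ValueError("address must fit in 6 bits")
    row = addr >> 2        # A[5:2] selects the row
    col_sel = addr & 0b11  # A[1:0] selects one of four interleaved words
    return row, SELECT_NAMES[col_sel]

def one_hot_row(addr):
    """One-hot row-select vector, as a row decoder would produce."""
    row, _ = decode_address(addr)
    return [1 if r == row else 0 for r in range(NUM_ROWS)]

print(decode_address(0b101110))
print(one_hot_row(0b101110))
```

Sixteen rows with four interleaved 16-bit words per row give the 64 words (1Kbit) of the test array.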



Figure 4.14: Read replica column and read pulse control circuit.

• Read Pulse Control Circuit: The word-line active time in read mode should be long enough for the sense amplifier to function reliably, but the word-line should be turned off soon after the read operation is finished to cut off the marginal compensation current and reduce the power consumption. A 9T SRAM replica column and a read pulse control circuit are designed to automatically adjust the word-line pulse width for PVT variation tolerance, as shown in Fig. 4.14. The replica column creates the worst case for discharging the bit-line voltage to ground (all bit-cells store "1"), so its delay is longer than that of any other column. In this way, the replica column generates the longest word-line pulse width needed for the sense amplifier to capture the read data accurately. All of the bit-cells in the replica column are hardwired to "1" so that the R\_ok pulse is generated in every read cycle. Finally, a delay line is inserted at the output of the replica column's sense amplifier to provide enough word-line width margin for variation tolerance.

• Write Pulse Control Circuit: The write pulse control circuit is shown in Fig. 4.15. At the positive clock edge, if CEN=0 and WEN=0, the write pulse control circuit generates a write pulse signal (WP), which is later disabled by the W\_ok pulse. The write pulse signal, WP, controls the write behavior of the proposed 9T SRAM array, including the word-line width control and the write driver control. The word-line width is made as long as possible to ensure a robust write operation.

Figure 4.15: Write pulse control circuit.

# 4.1.3.3 Measurement Results

The test chip, fabricated in UMC 65nm CMOS technology, contains a 1Kbit (64-word by 16-bit) 9T 4-to-1 bit-interleaved SRAM with a core size of  $182.25 \times 45.46 \mu m^2$  using logic design rules. The die photo and layout view are shown in Fig. 4.16. The proposed 9T bit-cell is  $1.92 \times$  larger than the standard 6T thin-cell layout based on the same design rules. Several circuit designs, including the address decoder, wordline driver, replica column, and read/write pulse controller, were presented in [4.20]. Note that a 9T SRAM replica column and a read pulse controller are implemented to adaptively control the wordline pulse width for PVT variation tolerance. The test patterns are generated by a 16900A logic analyzer, and the outputs of the test chips are captured by the logic analyzer and a digital oscilloscope. Eighteen dies were measured, and Table 4.3 shows the measurement summary. The chip operates successfully from 33MHz at a supply voltage of 0.6V down to 0.48MHz at 0.27V. The average leakage current of the test chips is 585nA at 0.3V. Up to  $3.91 \times$  energy saving is achieved by scaling the supply voltage from 0.6V to 0.3V. The minimum energy point is at a supply voltage of 0.3V, where the energy consumption is  $3.86\mu$W/MHz, as shown in Fig. 4.17.

Figure 4.16: Die photo and layout view of the 1Kbit 9T SRAM test chip fabricated in 65nm bulk CMOS process.

|                   | 2008 JSSC [4.4]           | 2009 JSSC [4.5]           | 2009 JSSC [4.6]   | 2011 TCAS-I [4.14] | This Work               |
|-------------------|---------------------------|---------------------------|-------------------|--------------------|-------------------------|
| Bit-cell          | 8T                        | 8T                        | 10T               | 8T                 | 9T                      |
| Bit-cell Size*    | 1                         | 1.2                       | 1.26**            | -                  | 1.55                    |
| #WL               | 1RWL+1WL                  | 1WWL+1RWL                 | 1WWL+1WL          | 1WL(+1CS)          | 2WWL+1WL                |
| #BL               | 2WBL+1RBL                 | 2WBL+1RBL                 | 2BL               | 2BL                | 1BL                     |
| Bit-interleaving  | N                         | N                         | Y                 | Y                  | Y                       |
| Technology        | 65nm CMOS                 | 130nm CMOS                | 90nm CMOS         | 65nm CMOS          | 65nm CMOS               |
| Memory Size       | 256Kbit                   | 64Kbit                    | 32Kbit            | 8Kbit              | 1Kbit                   |
| Chip Size         | $1.12 \times 1.89 \ mm^2$ | $0.72 \times 0.85 \ mm^2$ | $4 \times 2 mm^2$ | N/A                | $0.9 \times 0.9 \ mm^2$ |
| Operating Voltage | 0.35V                     | 0.23V                     | 0.16V             | 0.2V               | 0.3V                    |
| Frequency         | 25kHz                     | 100kHz                    | 0.5kHz            | 41kHz              | 909kHz                  |
| Active Power      | $3.39\mu$W                | $0.989\mu$W               | $0.123\mu$W       | $0.012\mu$W***     | $3.51\mu$W              |
| Power/Frequency   | 135.6pJ                   | 9.89pJ                    | 246pJ             | 0.29pJ****         | 3.86pJ                  |

\*: The bit-cell size is normalized to [4.4]. \*\*: The 10T bit-cell size was reported in [4.16]. \*\*\*: It was estimated from Fig. 20 in [4.14]. \*\*\*\*: The best value was 0.13pJ at 0.3V supply voltage.

Table 4.3: Test chips measurement summary and comparison
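The Power/Frequency row of Table 4.3 is the energy per cycle, obtained by dividing the active power by the operating frequency (1$\mu$W/MHz equals 1pJ). The sketch below recomputes it from the tabulated power and frequency values and compares the measured energy saving with the ideal quadratic scaling of dynamic energy with supply voltage. The function names are introduced here for illustration, and the ideal figure ignores leakage energy.

```python
def energy_per_cycle_pj(power_uw, freq_khz):
    """Energy per cycle in pJ from active power in uW and frequency in kHz."""
    # uW / kHz = nJ per cycle; multiply by 1e3 to convert to pJ.
    return power_uw / freq_khz * 1e3

def ideal_dynamic_saving(v_high, v_low):
    """Ideal dynamic-energy ratio when scaling the supply, assuming E ~ C * V^2."""
    return (v_high / v_low) ** 2

# (active power in uW, frequency in kHz), transcribed from Table 4.3.
CHIPS = {
    "[4.4]": (3.39, 25.0),
    "[4.5]": (0.989, 100.0),
    "[4.6]": (0.123, 0.5),
    "[4.14]": (0.012, 41.0),
    "This work": (3.51, 909.0),
}

for name, (power_uw, freq_khz) in CHIPS.items():
    print(f"{name}: {energy_per_cycle_pj(power_uw, freq_khz):.2f} pJ")

print("ideal saving, 0.6V to 0.3V:", ideal_dynamic_saving(0.6, 0.3))
```

The ideal quadratic saving for halving the supply is 4×, so the measured 3.91× saving between 0.6V and 0.3V is close to the dynamic-energy limit.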

# 4.1.4 Summary

A subthreshold 9T SRAM with a bit-interleaving scheme, fabricated in a 65nm bulk CMOS process, is able to operate at a supply voltage of 0.27V. By inserting the pass transistors into the cross-coupled inverter pair, the Write- $V_{min}$  can be lowered to 415mV at the 3-sigma write-failure probability. At the minimum energy point of 0.3V, the test chip operates at 909kHz with  $3.51\mu$W active power consumption. The proposed 9T SRAM design meets the requirements of emerging ultra-low power applications.

Figure 4.17: Measured power of 1Kbit 9T SRAM versus VDD.

## 4.2 Energy-Efficient 10T SRAM-based FIFO Memory



Figure 4.18: Standard FIFO memory and its power consumption ratio.

A standard FIFO memory consists of three major parts: storage elements, read/write pointers, and read/write control units. The storage elements and read/write pointers usually account for most of the power consumption of a FIFO memory, so the key to power minimization is to reduce their power consumption. For high density and low power, SRAM-based storage elements are more suitable than registers and latches. Nevertheless, the degraded voltage margin and the increased device variability are serious challenges for near-/sub-threshold SRAMs [4.21]. A robust implementation of the read/write pointers is another major research topic in ultra-low voltage FIFO design.

For an ultra-low voltage SRAM-based FIFO, the power dissipated during read/write operations is significant because of the large capacitance on the bit-lines and word-lines. Recently, modified read/write control circuits with adaptive timing adjustment were presented to reduce active power and track process, voltage, and temperature (PVT) variations. In those designs, the worst-case delay of the read/write operation was taken to be writing data "0" and sensing data "0". However, as the supply voltage scales down, the worst case under PVT variations is no longer deterministic for a single-ended scheme. Hence, a worst-case detector is necessary for robust ultra-low voltage operation. On the other hand, a typical way to construct the read/write pointers of a FIFO memory is to use ring shift registers [4.22]. However, shift-register-based pointers account for a relatively large portion of the total power consumption because of the large number of flip-flops and long metal lines, as shown in Fig. 4.18. Such a design is not suitable for highly energy-constrained systems such as WBANs. To implement an ultra-low power FIFO memory, a counter-based pointer structure is an alternative solution for aggressive power reduction.
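The flip-flop count behind this argument can be made concrete with a small behavioral comparison. In the sketch below, a ring (one-hot) shift-register pointer uses one flip-flop per FIFO entry, whereas a binary counter pointer needs only about log2(depth) flip-flops plus an address decoder. The depth of 256 entries is a hypothetical value chosen for illustration, and the classes are behavioral models rather than the circuits used in this work.

```python
class RingPointer:
    """One-hot ring shift register: one flip-flop per FIFO entry."""
    def __init__(self, depth):
        self.state = [1] + [0] * (depth - 1)
    def advance(self):
        # Rotate the single '1' by one position, wrapping around at the end.
        self.state = [self.state[-1]] + self.state[:-1]
    def index(self):
        return self.state.index(1)
    def flip_flops(self):
        return len(self.state)

class CounterPointer:
    """Binary counter: about log2(depth) flip-flops; a decoder is needed afterwards."""
    def __init__(self, depth):
        self.depth = depth
        self.count = 0
    def advance(self):
        self.count = (self.count + 1) % self.depth
    def index(self):
        return self.count
    def flip_flops(self):
        return max(1, (self.depth - 1).bit_length())

depth = 256  # hypothetical FIFO depth
ring, counter = RingPointer(depth), CounterPointer(depth)
for _ in range(300):
    ring.advance()
    counter.advance()
print(ring.index(), counter.index())            # both pointers address the same entry
print(ring.flip_flops(), counter.flip_flops())  # storage cost of each pointer
```

Both pointers address the same entry after any number of increments, but the counter-based version needs far fewer flip-flops, which is the source of the power reduction targeted here.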

In Sec. 4.2.1, a 10T bit-cell with enhanced read and write abilities is discussed. It improves the read static noise margin, reduces write variations, and mitigates the bitline leakage issue in the ultra-low voltage regime. With a single-ended write scheme, our 10T bit-cell also reduces leakage and switching power during write operation. An iso-area dual-port SRAM stability analysis is presented in Sec. 4.2.2. Sec. 4.2.3 shows the implementation of a 16Kbit 10T near-threshold SRAM-based FIFO memory in 90nm CMOS for WBANs. Sec. 4.2.4 gives the summary.

# 4.2.1 10T Near-/Sub-threshold SRAM Bit-Cell Design

The conventional storage element for a FIFO memory is a dual-port (DP) 8T bit-cell, as shown in Fig. 4.19. It adds two access transistors to the standard 6T bit-cell to build up



Figure 4.19: Conventional dual-port 8T bit-cell.

independent read and write paths. Such a DP 8T bit-cell can read and write simultaneously. Its read and write operations are similar to those of the 6T bit-cell, but extra peripheral circuitry is required. As technology and supply voltage scale down, the conventional DP 8T bit-cell fails to maintain reliable operation. The exponential effect of threshold voltage variation, the reduction of signal level, and the degradation of the $I_{on}$-$I_{off}$ ratio are critical issues in sub-/near-threshold circuits. In detail, process variation causes sideways offsets [4.23]. The threshold voltage shifts because of random dopant fluctuations, line-edge roughness, and local oxide thickness variations [4.24]. Furthermore, the reduced signal level directly hurts the noise margin of logic gates. The degradation of the $I_{on}$-$I_{off}$ ratio limits the number of elements that can share a line in array logic such as memory. The combined effect of low supply voltage and process variation results in memory operation failures such as read disturb, write failure, and bit-line leakage.

To solve the above-mentioned problems, several effective techniques have been proposed. A read buffer can eliminate read disturb [4.25–4.27]. Write-ability can be improved by lowering the cell supply voltage [4.28], by boosting the write-word-line voltage for the access transistors [4.6], or by applying a negative voltage to the write-bit-line [4.29]. Bit-line leakage can be reduced by altering the bit-cell topology [4.25, 4.26], or by pulling the feet of all the unaccessed read buffers up to VDD [4.4]. However, the above techniques require additional peripheral circuitry and incur power overhead.

A robust 10T SRAM bit-cell shown in Fig. 4.20 is proposed to be the storage element



Figure 4.20: Proposed dual-port 10T bit-cell.

of near-/sub-threshold FIFO memory. It improves the read static noise margin (SNM) and reduces write variations. It also reduces bit-line leakage in the near-/sub-threshold voltage regime. With a single-ended write port (WBL), our 10T bit-cell can reduce leakage and switching power during write operation. It consists of a cross-coupled inverter pair, a write access transistor (MN1), a pass transistor (MP1), and a decoupled read-out structure (MP2, MN2, MN3, MN4). MP1 is used to cut off the feedback loop of the inverter pair, which eliminates the voltage-dividing effect between MN1 and the inverter pair during write operation. To reduce leakage currents, all of the MOSFETs are high-$V_t$ devices except MN1 and MP1. The regular-$V_t$ devices reduce the $V_t$ loss through MN1 and MP1 to improve the hold SNM and write margin.

### 4.2.1.1 Layout Considerations

To improve mismatch and dimension control, the proposed DP 10T bit-cell uses a regular "straight line layout", which facilitates lithography and reduces sensitivity to overlay errors. As shown in Fig. 4.21, the cell layout follows the thin-cell style; therefore, the bitline can be shorter, which reduces its equivalent RC value. The cell is designed in UMC 90nm bulk CMOS standard process technology. Four metal layers are used in the bit-cell layout. VDD and GND are routed in the second and third metal layers, the read/write bitlines (RBL/WBL) are routed in the third metal layer, and the read/write word-lines (RWL/WWL) are routed in the fourth metal layer. For performance comparison, the conventional 8T bit-cell and several state-of-the-art bit-cells are used as shown in Fig.



Figure 4.21: Layout view of the proposed dual-port 10T bit-cell in UMC 90nm CMOS technology.

4.22. They all have a dual-port structure, which makes them candidates for the FIFO memory storage element.

# 4.2.1.2 Read Ability Improvement

In read mode, the read-wordline (RWL) is set to "High," while the read-bitline (RBL) is precharged to "High" before the bit-cell is accessed, as shown in Fig. 4.23(a). Thus, MP2 is turned off while MN2 and MN4 are turned on. Depending on the cell data, the RBL is conditionally discharged to GND through MN2, MN3 and MN4. Therefore, the proposed SRAM bit-cell keeps the storage node away from noise disturbance and makes the read SNM as large as the hold SNM. As shown in Fig. 4.23(b), our 10T bit-cell has a much better read SNM than the conventional DP 8T.

Fig. 4.24 shows the distribution of read SNM in Monte Carlo simulations (100,000 runs). Although the proposed 10T bit-cell has a minor SNM drop ($\Delta\mu$=18mV, $\Delta\sigma$=6.3mV) due to *MP*1, it has 1.9X the read SNM and better variation immunity compared with the conventional DP 8T bit-cell.



Figure 4.22: Prior dual-port SRAM bit-cells configurations.



Figure 4.23: (a) Proposed 10T bit-cell in read operation, and (b) Read SNM comparison in read mode.

# 4.2.1.3 Write Ability Improvement

In write mode, the write-wordline (WWL) is set to "High," while WBL is precharged to "High" before the cell is accessed. As shown in Fig. 4.25, WWL turns on MN1 and turns off MP1 simultaneously. Thus, the write-in data passes through MN1 and inverters A and B to the node VC. The proposed scheme cuts off the positive feedback loop of the inverter pair. Our 10T bit-cell enlarges the write margin without any peripheral circuit, especially in the near-/sub-threshold regions.

Fig. 4.26 shows the distribution of write margin at supply voltage of 0.4V in Monte



Figure 4.24: Read SNM distributions of Monte Carlo simulations (100,000 times).



Figure 4.25: Proposed 10T bit-cell in write operation.

Carlo simulations (100,000 times). Note that the write margin is defined as the minimum word-line voltage required to flip the cell data. Our 10T bit-cell has 3.2X the write margin and better variation immunity compared with the other SRAMs.

# 4.2.1.4 Bitline Leakage Reduction

In hold mode, the proposed 10T bit-cell eliminates the data-dependent bit-line leakage by turning on MP2. The drain voltage of MP2 becomes VDD and forces the leakage current to flow from the cell into RBL regardless of the cell data, as shown in Fig. 4.27(a). The hold SNM is very similar to that of the conventional DP 8T bit-cell, as shown in Fig. 4.27(b).

Fig. 4.28 shows the simplified view of proposed data-independent bitline leakage reduction scheme. The logic low is decided by the balance between the pull up leakage current of unaccessed cells and the pull down read current of the accessed cells. The logic



Figure 4.26: Write margin distributions of Monte Carlo simulations (100,000 times).



Figure 4.27: (a) Proposed 10T bit-cell in hold operation, and (b) Hold SNM comparison in hold mode.


high level is close to VDD because both bitline leakage current and cell current are pulling up the RBL. Consequently, the sensing margin is improved significantly especially in high temperature environment.

To verify our leakage reduction scheme, our 10T bit-cell is simulated in the worst-case scenario, e.g. the FF corner with 256 bits per read-bitline. The sensing margin of the single-ended (SE) 8T [4.27] drops from 300mV at 0°C to zero at 50°C, as shown in Fig. 4.29. A buffer footer can be attached to the SE 8T bit-cell [4.4] for performance enhancement. Even so, our 10T bit-cell has 6% better temperature variation tolerance than the SE 8T bit-cell with this additional peripheral circuit, as shown in Fig. 4.29.



Figure 4.28: Data-independent bitline leakage reduction scheme.



Figure 4.29: Sensing margin comparisons under the worst case scenario.

# 4.2.2 Iso-Area Dual-Port SRAM Bit-Cell $V_{min}$ Analysis

# 4.2.2.1 Iso-Area Bit-Cells

The proposed dual-port (DP) 10T SRAM bit-cell occupies $1.8 \times$ and $1.98 \times$ the area of the conventional DP 8T bit-cell and the single-ended (SE) 8T bit-cell [4.27], respectively. The V<sub>min</sub> analysis is fair only when all the SRAM bit-cells are compared under the iso-area condition. Using identical standard process design rules, the minimum-size layouts of the conventional DP 8T and SE 8T bit-cells are shown in Fig. 4.30(a) and Fig. 4.30(b), respectively. To enlarge the mincells to the same area as our 10T bit-cell, the upsizing direction of the bit-cell and the subarray efficiency need to be



Figure 4.30: Thin-cell layout style (a) conventional DP 8T mincell, and (b) SE 8T mincell.

discussed. The vertical dimension along the bitline is kept unchanged ($2 \times$ poly-pitch) so that the bitline capacitance does not increase. Thus, all bit-cell upsizing is in the lateral direction and affects the wordline capacitance only. This minimizes the increase in power, since the switching power is mainly consumed by the bitlines. For subarray efficiency, SE 8T bit-cell designs often prefer a hierarchical bitline architecture to improve performance. Thus, the SE 8T bit-cell array efficiency is about 15%-30% lower than that of the conventional DP 8T bit-cell. In this work, only single sensing is used to enhance the subarray efficiency. According to the calculation listed in Table 4.4, the conventional DP 8T bit-cell and the SE 8T bit-cell need to be upsized to $1.69 \times$ and $1.21 \times$, respectively.

Using UMC 90nm bulk CMOS technology, the device widths of the conventional DP 8T mincell are 200, 200, 200, and 400nm for the pull-up, write-access, read-access, and pull-down transistors, respectively. To achieve the iso-area condition, the write-access transistors of the DP 8T bit-cell are upsized by $4.5\times$, and all the other transistors of the DP 8T bit-cell are upsized by $2\times$, as shown in Fig. 4.31(a). Meanwhile, the write-access transistors of the SE 8T bit-cell are upsized by $4\times$, and the read-access transistors of the SE 8T bit-cell are upsized by $2\times$, as shown in Fig. 4.31(b). On the other hand, the single-ended 10T bit-cells [4.25, 4.26] occupy $1.4\times$ the area of the DP 8T mincell. Thus, their



Figure 4.31: Thin-cell layout style (a) conventional DP 8T iso-area bit-cell, and (b) SE 8T iso-area bit-cell.

|                         | DP 8T  | SE 8T  | This Work |
|-------------------------|--------|--------|-----------|
| Bit-cell area           | 1X     | 0.91X  | 1.8X      |
| No. of bit-cells        | N      | N      | N         |
| Total bit-cell area     | NX     | 0.91NX | 1.8NX     |
| Subarray efficiency     | 70%    | 50%    | 85%       |
| Peripheral circuit area | 0.43NX | 0.91NX | 0.32NX    |
| Total subarray area     | 1.43NX | 1.82NX | 2.12NX    |
| Iso-area factor         | 1.69   | 1.21   | 1.8       |

Table 4.4: Iso-area calculation considering subarray efficiency

write-access transistors are upsized by $3 \times$ for the iso-area condition. The device sizes of the conventional DP 8T, SE 8T, SE 10T, and our proposed 10T bit-cells are listed in Table 4.5. (Their schematics are shown in Fig. 4.22.)

| Topology          | NA/NB | PA/PB | AXR1/2 | AXW1/2 | P1/P2 | N1/N2 | N3/N4 |
|-------------------|-------|-------|--------|--------|-------|-------|-------|
| DP 8T (mincell)   | 400   | 200   | 200    | 200    | -     | -     | -     |
| DP 8T (iso-area)  | 800   | 400   | 400    | 900    | -     | -     | -     |
| SE 8T (mincell)   | 400   | 200   | 200    | 200    | -     | -     | -     |
| SE 8T (iso-area)  | 400   | 200   | 400    | 800    | -     | -     | -     |
| SE 10T (mincell)  | 400   | 200   | -      | 200    | -/200 | -/200 | 200   |
| SE 10T (iso-area) | 400   | 200   | -      | 600    | -/200 | -/200 | 200   |
| This Work         | 200   | 200   | -      | -      | 200   | 200   | 200   |

Table 4.5: Device sizing (transistor widths in nm) for various bit-cell topologies
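The iso-area factors in Table 4.4 can be reproduced with a few lines of arithmetic. The following Python sketch assumes (this is a reading of the table, not a rule stated in the text) that the peripheral circuit area of each design stays fixed while only the bit-cells are upsized until the total subarray area matches that of this work. The function and variable names are illustrative.

```python
# Sketch of the iso-area calculation behind Table 4.4.
# Assumption: peripheral area is held constant while the bit-cells are upsized.
# Areas are per bit-cell, in units of X (the DP 8T mincell area).

def subarray_areas(cell_area, efficiency):
    """Return (peripheral_area, total_area) for a given subarray efficiency."""
    total = cell_area / efficiency
    return total - cell_area, total

def iso_area_cell(cell_area, efficiency, target_total):
    """Bit-cell area that makes the total subarray area equal to target_total."""
    peripheral, _ = subarray_areas(cell_area, efficiency)
    return target_total - peripheral

# (bit-cell area in X, subarray efficiency) taken from Table 4.4.
designs = {"DP 8T": (1.00, 0.70), "SE 8T": (0.91, 0.50), "This Work": (1.80, 0.85)}

_, target = subarray_areas(*designs["This Work"])
iso = {name: round(iso_area_cell(a, e, target), 2) for name, (a, e) in designs.items()}
print(iso)
```

Under this assumption the sketch gives 1.69 for the DP 8T and 1.21 for the SE 8T, which matches the last row of Table 4.4.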

#### 4.2.2.2 Hold-Failure Probability

Hold static noise margin (HSNM) is used to quantify the hold-stability of the SRAM bit-cells. Hold-failure probability  $(P_{\text{hold-fail}})$  is estimated as

$$P_{\text{hold-fail}} = Prob.(HSNM < kT). \tag{4.4}$$

If the HSNM is lower than the thermal voltage (kT/q, written kT here, is about 26mV at 300K), the bit-cell contents can be flipped by thermal noise. Hold-V<sub>min</sub> is determined at the 3-sigma hold-failure probability (i.e., $P_{\text{hold-fail}}=10^{-5}$). Fig. 4.32 shows the hold-failure probability versus supply voltage for the DP 8T, SE 8T, SE 10T, and our proposed 10T iso-area bit-cells. Upsizing the transistor widths of the DP 8T bit-cell yields more robust inverter characteristics. Thus, the HSNM of the DP 8T iso-area bit-cell is better than that of the DP 8T mincell. In this work, the HSNM of our 10T bit-cell is slightly degraded because of the pass transistor inserted into the cross-coupled inverter pair.
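The criterion in Eq. (4.4) can be illustrated with a minimal Python sketch. It assumes that the noise margin at each supply voltage is Gaussian with a known mean and standard deviation, and that the failure probability decreases monotonically with supply voltage. The sweep values below are hypothetical placeholders, not data read from Fig. 4.32.

```python
from statistics import NormalDist

KT = 0.026        # threshold used in Eq. (4.4), in volts
P_TARGET = 1e-5   # failure probability at which V_min is read off

def fail_probability(snm_mean, snm_sigma, threshold=KT):
    """P(SNM < threshold) under the Gaussian assumption."""
    return NormalDist(snm_mean, snm_sigma).cdf(threshold)

def find_vmin(sweep, p_target=P_TARGET):
    """Lowest supply voltage in the sweep whose failure probability meets the target."""
    passing = [vdd for vdd, mu, sigma in sweep
               if fail_probability(mu, sigma) <= p_target]
    return min(passing) if passing else None

# Hypothetical sweep: (VDD in V, mean SNM in V, sigma of SNM in V).
sweep = [(0.20, 0.050, 0.012), (0.30, 0.085, 0.013), (0.40, 0.125, 0.014)]
print(find_vmin(sweep))
```

The same two functions apply to the read case by substituting RSNM statistics for HSNM statistics.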

### 4.2.2.3 Read-Failure Probability

Similar to the hold stability case, read-stability is estimated by computing the read static noise margin (RSNM). Fig. 4.33 plots the read-failure probability variation versus supply voltage for DP 8T, SE 8T, SE 10T, and the proposed 10T bit-cells. As shown in inset, read-failure probability ( $P_{\text{read-fail}}$ ) is calculated as

$$P_{\text{read-fail}} = Prob.(RSNM < kT). \tag{4.5}$$



Figure 4.32: Hold-failure probability comparison.

Read-V<sub>min</sub> is determined at the 3-sigma read-failure probability (i.e., $P_{\text{read-fail}}=10^{-5}$). Due to its read-disturb-free scheme, the SE 8T iso-area bit-cell has better read stability than all the other bit-cells. Its read-failure probability is the same as its hold-failure probability because the bit-cell nodes are not disturbed during the read operation. In this work, MP2 is turned off as shown in Fig. 4.23 to provide read-disturb isolation. With the read buffer, our 10T bit-cell has a lower read-failure probability than the conventional DP 8T iso-area bit-cell.

### 4.2.2.4 Write-Failure Probability

Write-ability gives an indication of how easy or difficult it is to write to the bit-cell. Write margin (WM) is defined as $V_{DD}$-Min.[V(WWL)], where Min.[V(WWL)] is the minimum write-wordline voltage required to flip the bit-cell. The higher the write margin, the easier it is to write data into the bit-cell. Fig. 4.34 shows the write-failure probability versus supply voltage. As shown in the inset, the write-failure probability ($P_{\rm write-fail}$) is estimated as

$$P_{\text{write-fail}} = Prob.(WM < 0mV). \tag{4.6}$$

Write-$V_{min}$ is determined at the 3-sigma write-failure probability (i.e., $P_{write-fail}=10^{-5}$). As shown in Fig. 4.34, the write-failure probability of the proposed 10T iso-area bit-cell is



Figure 4.33: Read-failure probability comparison.

simulated in two different conditions: write-1 and write-0 operations. The worst case of our bit-cell is the write-0 case. Although upsizing the access transistors of the DP 8T, SE 8T, and SE 10T iso-area bit-cells lowers their write-failure probability, they still require additional write-assist techniques to match the write-failure probability of our bit-cell. This shows that the pass transistor MP1, used to cut off the inverter pair, is an effective topology for ultra-low voltage dual-port SRAM bit-cell design.
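Eq. (4.6) can also be evaluated directly on Monte Carlo samples. The sketch below is a minimal illustration, assuming each simulation run reports the minimum write-wordline voltage needed to flip the cell; the sample values are hypothetical and far fewer than the number of runs needed in practice.

```python
def write_margin(vdd, min_wwl):
    """WM = VDD - Min.[V(WWL)], the definition used in this section."""
    return vdd - min_wwl

def write_fail_probability(vdd, min_wwl_samples):
    """Fraction of Monte Carlo samples with WM < 0, following Eq. (4.6)."""
    fails = sum(1 for v in min_wwl_samples if write_margin(vdd, v) < 0)
    return fails / len(min_wwl_samples)

# Hypothetical minimum flipping voltages (V) from five Monte Carlo runs.
samples = [0.31, 0.36, 0.42, 0.28, 0.39]
print(write_fail_probability(0.40, samples))
```

A sample counts as a failure when the cell cannot be flipped even with the wordline at the full supply voltage. Resolving a probability as small as $10^{-5}$ this way requires on the order of $10^{5}$ or more samples.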

# 4.2.2.5 Iso-Area V<sub>min</sub> Comparison

Table 4.6 compares the estimated $V_{min}$ for various bit-cell topologies. $V_{min}$ is calculated as the maximum of Hold-$V_{min}$, Read-$V_{min}$, and Write-$V_{min}$. The $V_{min}$ of the SE 8T and SE 10T bit-cells is limited by the write operation. The proposed 10T bit-cell enhances write-ability by inserting the pass transistor within the cross-coupled inverter pair, at the cost of a slight degradation in HSNM/RSNM. Our 10T bit-cell has the lowest $V_{min}$ among these state-of-the-art bit-cells. Note that the effect of various read/write assist techniques is not taken into account.



Figure 4.34: Write-failure probability comparison.

| Bit-Cell Topology          | Hold $V_{min}$ | Read $V_{min}$ | Write $V_{min}$ | $V_{min}$ |
|----------------------------|----------------|----------------|-----------------|-----------|
| DP 8T mincell (1X Area)    | 334mV          | 788mV          | 746mV           | 788mV     |
| DP 8T bit-cell (Iso-Area)  | 250mV          | 574mV          | 508mV           | 574mV     |
| SE 8T mincell [4.27]       | 334mV          | 334mV          | 746mV           | 746mV     |
| SE 8T bit-cell (Iso-Area)  | 334mV          | 334mV          | 511mV           | 511mV     |
| SE 10T mincell [4.26]      | 334mV          | 334mV          | 746mV           | 746mV     |
| SE 10T bit-cell (Iso-Area) | 334mV          | 334mV          | 550mV           | 550mV     |
| SE 10T mincell [4.25]      | 334mV          | 334mV          | 746mV           | 746mV     |
| SE 10T bit-cell (Iso-Area) | 334mV          | 334mV          | 550mV           | 550mV     |
| This Work (Iso-Area)       | 398mV          | 398mV          | 335mV           | 398mV     |

Table 4.6: V<sub>min</sub> Comparison of Various Bit-Cell Topologies
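The last column of Table 4.6 follows from the other three by taking the maximum. The short Python sketch below recomputes it from the table entries and reports which operation limits each topology; the two SE 10T designs have identical entries and are listed once. The dictionary keys and function names are illustrative.

```python
# Rows of Table 4.6 as (Hold, Read, Write) V_min in mV.
table_4_6 = {
    "DP 8T mincell": (334, 788, 746),
    "DP 8T iso-area": (250, 574, 508),
    "SE 8T mincell": (334, 334, 746),
    "SE 8T iso-area": (334, 334, 511),
    "SE 10T mincell": (334, 334, 746),
    "SE 10T iso-area": (334, 334, 550),
    "This Work iso-area": (398, 398, 335),
}

def overall_vmin(hold, read, write):
    """The cell must hold, read, and write correctly, so the largest value sets V_min."""
    return max(hold, read, write)

def limiting_operations(hold, read, write):
    """Names of the operation(s) whose V_min equals the overall V_min."""
    values = {"hold": hold, "read": read, "write": write}
    worst = max(values.values())
    return [op for op, v in values.items() if v == worst]

for name, row in table_4_6.items():
    print(name, overall_vmin(*row), limiting_operations(*row))
```

For the SE 8T and SE 10T rows the limiting operation is the write, while the proposed cell is limited by hold and read.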

# 4.2.2.6 Leakage Current Analysis

Fig. 4.35 compares the leakage current versus supply voltage of the conventional DP 8T bit-cell, the SE 8T bit-cell [4.27], two SE 10T bit-cells [4.25, 4.26], and the proposed 10T SRAM bit-cell. Our 10T bit-cell has a single-ended write-bitline and a single-ended read-bitline, which reduces the leakage paths to VDD or ground. The threshold voltage $V_t$ of the pass transistor (MP1) may raise the voltage at VR and increase leakage power when VR is in the "0" state. However, this does not have a significant impact on leakage power consumption. Thus, the proposed 10T bit-cell consumes the least leakage power among the conventional DP 8T bit-cell and the selected bit-cells.



Figure 4.35: Leakage current comparison.

# 4.2.3 16Kbit Near-threshold SRAM-based FIFO memory in 90nm CMOS for WBANs

The block diagram of the proposed 16Kbit near-threshold SRAM-based FIFO memory is shown in Fig. 4.36. Owing to the first-in first-out data behavior, an adaptive power control circuit is proposed in Sec. 4.2.3.1. It turns off the power supply of words that have been read out to minimize leakage power consumption. Our 10T bit-cell presented in Sec. 4.2.1 is adopted for robust ultra-low voltage operation. In addition, a counter-based pointer structure and smart replica read/write control units are described in Sec. 4.2.3.2 and Sec. 4.2.3.3, respectively.

# 4.2.3.1 Adaptive Power Control Unit

The key idea of leakage power minimization is to reduce the voltage swing on idle hardware. Taking Fig. 4.37 as an example, grey blocks represent words that contain data, while white blocks represent words that are empty. Empty words do not need to retain data; thus, these don't-care words can be power gated to minimize leakage power. Since the status of every word in the FIFO memory is predictable owing to the first-in first-out data behavior, an adaptive power control system that cuts off the power supply of don't-care words can efficiently reduce leakage power consumption.

Figure 4.36: Block diagram of the proposed 16Kbit SRAM-based FIFO memory.



Figure 4.37: FIFO memory operation example.

A finite state machine (FSM), shown in Fig. 4.38(a), is designed to generate a control signal, *power\_on*, that turns the power MOS on and off. Initially, each word is in the cutoff state. When a word is about to be written, it changes to the active state and the cell supply of the word is charged to VDD. Each written word stays in the active state until its data is read out. As shown in Fig. 4.38(b), the power gating circuit is inserted in each word of the proposed FIFO memory. The leakage current of each don't-care word is suppressed by adaptively turning off its power MOS.



Figure 4.38: (a) Adaptive power control finite state machine, and (b)  $(i + 1)_{th}$  word of storage element.
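The per-word power control described above can be modeled behaviorally. The following Python sketch is an illustration only: the class names are hypothetical, the two states follow the text (cutoff and active), and full/empty checking is deliberately left out.

```python
class WordPowerControl:
    """Two-state power-gating FSM for one FIFO word."""

    def __init__(self):
        self.state = "cutoff"          # every word starts with its supply cut off

    @property
    def power_on(self):
        """Control signal for the power MOS of this word."""
        return self.state == "active"

    def write(self):
        self.state = "active"          # supply is charged to VDD for the write

    def read(self):
        self.state = "cutoff"          # data consumed, word becomes don't-care


class PowerGatedFifo:
    """Behavioral FIFO that tracks only which words are powered."""

    def __init__(self, depth):
        self.words = [WordPowerControl() for _ in range(depth)]
        self.wr_ptr = 0
        self.rd_ptr = 0

    def push(self):
        self.words[self.wr_ptr].write()
        self.wr_ptr = (self.wr_ptr + 1) % len(self.words)

    def pop(self):
        self.words[self.rd_ptr].read()
        self.rd_ptr = (self.rd_ptr + 1) % len(self.words)

    def powered_words(self):
        return sum(w.power_on for w in self.words)


fifo = PowerGatedFifo(depth=8)
for _ in range(3):
    fifo.push()
fifo.pop()
print(fifo.powered_words())
```

In this model only the words between the read and write pointers draw leakage, which is the behavior the adaptive power control aims for.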

#### 4.2.3.2 Counter-based Pointer Structure

Independent read and write pointers are used as the address pointers that select the accessed word in the FIFO memory. A typical way to construct the read/write pointers of a FIFO memory is to use ring shift registers. However, as the depth of the FIFO increases, the number of flip-flops and the length of the metal lines in shift-register-based pointers grow with it (one flip-flop per word, that is, exponentially in the number of address bits). Such a design is no longer suitable for highly energy-constrained systems, e.g. WBANs. A counter-based pointer structure is proposed to construct energy-efficient pointers, as shown in Fig. 4.39. Since system performance is not the major concern here, the extra pointer delay it introduces is acceptable. A synchronous counter is used to count clock pulses without skew problems. As the supply voltage decreases, the most energy-efficient flip-flop architecture depends on the switching probability, and the PowerPC flip-flop achieves a better energy-delay product (EDP) at low activity [4.30]. Therefore, the $C^2MOS$-based flip-flop is chosen as the basic element.



Figure 4.39: Block diagram of the proposed counter-based pointer.

In the proposed counter-based pointer, the N-bit address (A0 to AN-1) is generated by a synchronous counter triggered by the clock, as shown in Fig. 4.40(a). The N-bit address is decoded to $2^{N}$ bits using an N-to-$2^{N}$ decoder. Of these $2^{N}$ bits, only one is asserted as the selected wordline, and its timing is controlled by the signal *Pulse*, which is generated by the smart replica control unit. In hardware, this counter-based pointer needs only seven registers and seven long address lines (A0, A1, ..., AN-1) shared by four decoders. In addition, every two blocks share a decoder to reduce the number of decoders. Thus, the counter-based pointer has fewer registers and long metal lines than the shift-register-based pointer. It reduces the power consumption of the read/write pointers by 34%, as shown in Fig. 4.40(b).



Figure 4.40: The synchronous counter-based pointer (a) schematic view, and (b) power consumption comparisons.
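A behavioral view of the counter-based pointer may help here. The Python sketch below models an N-bit counter followed by a one-hot decoder gated by a pulse input. It is an idealized illustration: the class and function names are hypothetical, and the flip-flop comparison ignores decoder logic and the block-level sharing described in the text.

```python
class CounterPointer:
    """N-bit synchronous counter followed by an N-to-2^N one-hot decoder."""

    def __init__(self, n_bits):
        self.n_bits = n_bits
        self.address = 0

    def clock(self):
        """Advance the address by one on a clock edge, wrapping at 2^N."""
        self.address = (self.address + 1) % (1 << self.n_bits)

    def wordlines(self, pulse):
        """One-hot wordline vector; all lines stay low unless pulse is asserted."""
        return [int(bool(pulse) and i == self.address)
                for i in range(1 << self.n_bits)]


def flip_flop_count(depth):
    """Idealized flip-flop count for a pointer that addresses `depth` words."""
    return {"shift_register": depth, "counter": (depth - 1).bit_length()}


ptr = CounterPointer(n_bits=3)
ptr.clock()
ptr.clock()
print(ptr.wordlines(pulse=True))
print(flip_flop_count(128))
```

The comparison shows why the counter scales better: a ring shift register needs one flip-flop per word, whereas the counter needs only as many flip-flops as there are address bits.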

#### 4.2.3.3 Smart Replica Read/Write Control Units

As process variation becomes more severe when the supply voltage is scaled down, the worst case of the write operation is uncertain for a single-ended write port. Because of the threshold voltage of the access transistor, the write "1" delay is sensitive to process and temperature variation, as shown in Fig. 4.41. The worst-case write delay (write "0" or write "1") varies with process corner and temperature. Therefore, the proposed adaptive replica management unit is used to detect the worst case, which ensures robust write operation and reduces active power consumption.



Figure 4.41: SRAM write delay in different process corner and temperature.


The adaptive replica management unit consists of a 10T SRAM replica column, a read window control circuit, and a write window control circuit, as shown in Fig. 4.42. The cell data of the replica cell is fixed at logic "1" by wiring $VC_{rp}$ to VDD. Because the $WBL_{rp0}$ and $RBL_{rp}$ of the replica column are shared with the write and read window control circuits, respectively, the proposed adaptive replica management unit needs only one replica column.

In read mode, read-pulse signal (RP) initially enables the accessed read-wordline  $(RWL_i)$  of the SRAM array and sense amplifier including the sense amplifier in replica column. The read-wordline (RWL) active time should be long enough for the sense amplifier to function reliably, but it should be turned off soon after the read operation is finished. It cuts off the marginal compensation current in order to reduce the power consumption. For read tracking, the set data for replica cells duplicates the worst case data pattern for discharging RBL to ground. Therefore, the read window control circuit



Figure 4.42: Proposed smart replica read/write control units.

utilizes $RBL_{rp}$ to track the sense behavior across various PVT conditions. As soon as RBL is discharged to ground, the signal *R-ok* is triggered and disables RP. Thus, the read window control circuit can adaptively control the read window.

In a write operation, the write-pulse signal (WP) initially enables the accessed write-wordline ($WWL_i$) of the SRAM array and the write-wordline ($WWL_{rp}$) of the duplicate bit-cell in the replica column. WP additionally turns on the enable signals of all write drivers, including the write driver in the replica column. The data, Din[15:0], are then written to the accessed word. At the same time, "0" and "1" are written to the replica write cells, respectively. For write tracking, the worst-case detector detects two conditions in the write operation: write "0" and write "1". The set data in the replica column creates the worst case of discharging $WBL_{rp0}$ to ground to write "0", as the major effect on write "0" is the capacitance and leakage of the bit-line. On the other hand, $WBL_{rp1}$ is held at VDD to write "1", as the dominant effect on write "1" is the strength of MN1. The proposed worst-case detector indicates whether write "0" or write "1" is the worst case. After the replica data is written to the duplicate bit-cell successfully, the signal *W-ok* is delayed by an inverter delay line for a wider window margin and then disables WP. Accordingly, the adaptive replica management unit can guarantee a sufficient write window in different PVT environments.
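The effect of the worst-case detector on the write window can be summarized in a small timing model. The Python sketch below assumes the write pulse must stay high until the slower of the two replica writes completes, plus the extra margin from the delay line. The function name and all delay values are hypothetical.

```python
def write_window(t_write0, t_write1, margin):
    """Return (worst_case, window).

    worst_case names the slower replica write, and window is the time the
    write pulse stays high: the slower delay plus the delay-line margin.
    """
    worst_case = "write0" if t_write0 >= t_write1 else "write1"
    return worst_case, max(t_write0, t_write1) + margin

# Hypothetical replica write delays (ns) at two different PVT points.
print(write_window(t_write0=120.0, t_write1=95.0, margin=10.0))
print(write_window(t_write0=80.0, t_write1=140.0, margin=10.0))
```

Because the window follows whichever case is slower at the current PVT point, it is neither fixed at a pessimistic value nor too short for the slower write.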

## 4.2.3.4 Implementations and Simulation Results

Fig. 4.43 shows the floorplan and layout views of the proposed 16Kb SRAM-based FIFO memory. It is implemented in UMC 90nm CMOS technology. The specifications are listed in Table 4.7. At a 0.4V supply voltage, the maximum read and write frequencies reach 3.05MHz and 5.5MHz, respectively. According to the requirements of WBANs, the read/write frequencies of the FIFO are set to 625kHz/50kHz. At a 0.4V supply voltage and the specified frequencies, the proposed design consumes $2.1\mu$W on average per read/write access.

| Technology            | UMC 90nm CMOS               |
|-----------------------|-----------------------------|
| Memory Size           | 16Kbit (1024×16-bit)        |
| Supply Voltage        | 0.4V                        |
| Max. Read/Write Freq. | 3.05MHz/5.5MHz              |
| Operating Temp.       | -20°C to 80°C               |
| Average Power         | 2.1µW                       |
| Active Area           | 666µm × 508µm (338µm²/word) |

Table 4.7: Specifications of the proposed 10T SRAM-based FIFO memory



Figure 4.43: (a) Floorplan and layout views of our 16Kbit 10T SRAM-based FIFO memory, and (b) power reduction ratio by the proposed energy-efficient techniques.

## 4.2.4 Summary

Sec. 4.2 has presented a 16Kb robust near-/sub-threshold asynchronous SRAM-based FIFO memory in UMC 90nm standard bulk CMOS technology. A 10T SRAM bit-cell is proposed to provide read SNM enhancement, write margin improvement, and bit-line leakage reduction. Together, the adaptive power control circuit, the counter-based pointer, and the smart replica read/write control unit reduce total power consumption by 57% and track the worst case under severe PVT variations. All of the presented FIFO memory design techniques enable energy-efficient and robust operation for WBAN applications.

## Chapter 5

# **Dynamic Voltage Frequency Scaling Platform**



Figure 5.1: Micro-watt wireless wearable healthcare ECG microsystem block diagram

Driven by the growing demand for battery-operated or self-powered mobile applications, high energy efficiency has become an important design issue. In most scenarios, the energy harvested from the ambient is on the order of micro-watts, which requires the circuits to be very efficient in terms of energy consumption [5.1]. The growing aging population and skyrocketing healthcare costs are the main driving forces behind the fundamental transformation of the current hospital-centered healthcare system. An individual-centered system is a cost-effective and responsive way to deliver healthcare services. This new paradigm of personal health aims at a wireless wearable healthcare microsystem. The block diagram of the target micro-watt wireless personal healthcare ECG microsystem is shown in Fig. 5.1. It uses an ECG-based wireless sensor (WiBoC chipset) and transmits the signals to a mobile phone with an embedded expert system. With the aid of a 3G system, a remote healthcare monitoring center receives the uploaded signals and stores them in the application server. According to the events from this server, the service personnel respond to the emergency and take the corresponding action [5.2]. The major features of the system include:

- Capacity of persons on service: 10000
- Monitoring update: every 6 seconds
- Service time: 24 hours/day; 365 days/year
- Service coverage: Taiwan area corresponding to the GSM/3G coverage area
- Start of abnormal EKG message to start of hospital response: 2 minutes
- Event summary report: every 1 month
- Event trace-back duration (by user): 1 year
- Event trace-back duration (by system storage): 20 years
- GPS positioning: update every 1 to 10 minutes (optional) with assisted GPS system
- Monitoring items: single-lead ECG; positioning
- Device size for body monitoring: 4cm x 5cm x 2cm
- WiBoC chipset battery duration: 1 day continuous (rechargeable)
- Accumulated number of persons on testing: 150
- Cooperating hospital: Taipei Medical University / Wan-Fang Medical Center

Also, the WiBoC wireless chipset specification is as follows:

- Form factor: SiP/PoP on flexible substrate
- Channel: 1395-1400MHz (WMTS, FCC compliance)
- Bandwidth: 5MHz
- Information rate: 4kbps (16bits, 250Hz)
- System information rate: 960kbps (additional 240kbps system overhead)
- Maximum data rate: 7.27Mbps (uplink/downlink)
- WSN duty cycle: 0.46
- Multiple access: time division multiple access (TDMA)
- Modulation: OFDM/QPSK
- Power consumption: 31.4mW/24.4mW (TX/RX, exclude ADC)
- Operation duration: more than 1 week (600mAh battery)
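Two of the figures in this specification can be checked with simple arithmetic. The Python sketch below recomputes the information rate from the sample width and sampling rate, and derives the average current implied by the battery capacity and the one-week operation target. The function names are illustrative, and the current figure is an upper bound that ignores self-discharge, conversion losses, and any energy harvested from the PV cell.

```python
def information_rate_bps(bits_per_sample, sample_rate_hz):
    """Raw information rate of a sampled signal in bits per second."""
    return bits_per_sample * sample_rate_hz

def max_average_current_ma(capacity_mah, duration_hours):
    """Average current that would drain the battery in exactly duration_hours."""
    return capacity_mah / duration_hours

print(information_rate_bps(16, 250))
print(round(max_average_current_ma(600, 7 * 24), 2))
```

The first value matches the listed 4kbps, and the second shows that running for more than one week on a 600mAh battery requires an average current below roughly 3.57mA.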

The proposed microsystem is self-powered by a photovoltaic (PV) cell and a rechargeable battery. The supply voltage for the entire system is below 0.5V, and system-in-package (SiP) or three-dimensional integrated circuit (3D-IC) technology will be applied to ensure wearability. Advances in sub-threshold circuit design have recently demonstrated aggressive reductions in energy consumption. However, the drawback of sub-threshold design is that the increased energy efficiency comes at the cost of performance. To allocate resources effectively in systems with a time-varying throughput constraint, a dynamic voltage frequency scaling (DVFS) platform is a popular solution that provides energy efficiency and performance concurrently. In other words, if the throughput constraint cycles between different operating modes, adjusting the supply voltage to the requirements of each mode can provide significant energy savings.



Figure 5.2: A wireless sensor node with two operating modes: *Low-power Mode* and *High-performance Mode*.

One system with a time-varying throughput constraint is the healthcare-monitoring wearable body area sensor network (WBAN), driven by the growing aging population worldwide [5.3]. Wearable medical microsystems have been recognized as an enabling technology for continuous and noninvasive measurement of vital signs, e.g. electrocardiogram (ECG), heart rate (HR), and blood pressure (BP). A DVFS scheme is suitable for achieving a long system lifetime from limited energy sources. Fig. 5.2 shows a scenario for a wireless wearable healthcare ECG sensor node with two operating modes: *Low-power Mode* and *High-performance Mode*. The sensor node stays mostly in *low-power mode* throughout its lifetime, during which it records human ECG signals at a very low rate. However, during short time intervals, it switches to *high-performance mode* to provide real-time data acquisition and transmission for professional ECG analysis. In this chapter, a dynamic voltage scaling 8T-SRAM-based FIFO is designed for operation in both the near-threshold and sub-threshold regions as a demonstration DVFS platform.
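The benefit of the two-mode scenario can be estimated with a time-weighted average. The Python sketch below is a minimal illustration that ignores the energy and time cost of switching between modes; the power values and duty fraction are hypothetical and are not measurements from this work.

```python
def average_power(p_low, p_high, high_duty):
    """Time-weighted average power of a node that spends a fraction
    high_duty of its lifetime in high-performance mode."""
    if not 0.0 <= high_duty <= 1.0:
        raise ValueError("high_duty must be between 0 and 1")
    return high_duty * p_high + (1.0 - high_duty) * p_low

def saving_vs_always_high(p_low, p_high, high_duty):
    """Fractional power saving relative to staying in high-performance mode."""
    return 1.0 - average_power(p_low, p_high, high_duty) / p_high

# Hypothetical mode powers in microwatts and a 10% high-performance duty.
print(average_power(p_low=3.0, p_high=10.0, high_duty=0.1))
print(saving_vs_always_high(p_low=3.0, p_high=10.0, high_duty=0.1))
```

The smaller the fraction of time spent in high-performance mode, the closer the average power approaches the low-power figure.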

## 5.1 Near-/Sub-threshold Robust 8T SRAM Design

Ubiquitous personal healthcare inspection (uPHI) in a wireless body area network (WBAN) requires continuous signal monitoring for several days or even longer. To enable cable-free body monitoring with micro-watt biomedical acquisition devices, a power-efficient asynchronous SRAM-based FIFO is required for long-term storage of physiological conditions. Because the timing constraint of the WBAN sensor node is loose, an ultra-low supply voltage is an effective way to minimize active energy [5.4]. However, the currents of SRAMs operated in the weak inversion region are several orders of magnitude lower than in the strong inversion region. They are also far more sensitive to threshold voltage, because they vary exponentially with it. Consequently, robustness and reliability are the major considerations in ultra-low voltage SRAM design.

Fig. 5.3 shows the proposed 8T SRAM cell. All MOSFETs are high-$V_t$ devices to reduce leakage power, except the write-assisted pass transistor (MNP) within the inverter pair. The MNP is a regular-$V_t$ device adopted to cut off the positive feedback loop of the inverter pair during write operation. The write pass transistor (MNA) utilizes the reverse short channel effect (RSCE) to reduce write operation variation, because the local $V_t$ variation, $\sigma V_t$, is inversely proportional to the square root of the product of channel length and width [5.5]. Together, these techniques substantially enlarge the write margin ($\mu$=166mV@0.4V) of this 8T SRAM. Also,



Figure 5.3: Proposed 8T SRAM bit-cell

a single write-bitline scheme is adopted to reduce active power during write operation.
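The dependence of local $V_t$ variation on device area mentioned above can be illustrated with a short calculation. This is a minimal sketch that assumes only the stated proportionality, $\sigma V_t \propto 1/\sqrt{WL}$; the function name and the example dimensions are hypothetical and are not taken from the proposed cell.

```python
import math


def sigma_vt_ratio(w_old, l_old, w_new, l_new):
    """Return sigma_Vt(new) / sigma_Vt(old), assuming sigma_Vt ~ 1/sqrt(W*L).

    Only the ratio of gate areas matters, so any consistent length unit works.
    """
    return math.sqrt((w_old * l_old) / (w_new * l_new))


# Hypothetical example: double the channel length at a fixed width.
ratio = sigma_vt_ratio(w_old=1.0, l_old=1.0, w_new=1.0, l_new=2.0)
print(f"sigma_Vt scales by {ratio:.3f}")
```

Under this assumption, lengthening the channel of a device such as MNA reduces its local $V_t$ spread by the square root of the area increase, which is the motivation given for accepting a longer channel.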



Figure 5.4:  $V_t$ ,  $I_{on}$ - $I_{off}$ -ratio, and delay versus channel length of proposed 8T SRAM bit-cell

Meanwhile, the read buffer (MRA-MRB) utilizes RSCE to improve the $I_{read}$-$I_{leakage}$ ratio ($I_{on}$-$I_{off}$ ratio), because $V_t$ decreases as the channel length increases. A higher $I_{read}$-$I_{leakage}$ ratio enhances not only the capacity of each read-bitline but also the margin of the sense amplifiers. As shown in Fig. 5.4, the optimal channel length for a better $I_{on}$-$I_{off}$ ratio (4.1X enhancement) and minimum delay is 100nm. The MRA-MRB also keeps

the storage node away from disturbance noise and eliminates the read SNM limitation of the conventional dual-port 8T SRAM. The read SNM of this 8T SRAM ($\mu$=117mV@0.4V) is as good as that of previous read-buffered SRAMs, even with the *MNP*.

### 5.1.1 Basic Operations



Figure 5.5: Hold mode of proposed 8T SRAM bit-cell

For successful ultra-low voltage SRAM operation, various SRAM bit-cell topologies have been presented, such as 8T [5.6] and 10T [5.7,5.8] schemes. In hold mode, both the read-wordline and the write-wordline are "0" to turn off the access transistors, *MNA* and *MRA*, as shown in Fig. 5.5. Although the proposed 8T SRAM bit-cell has a pass transistor, *MNP*, between the inverter pair, its hold SNM is still as good as that of the conventional dual-port SRAM and state-of-the-art SRAMs.



Figure 5.6: Read mode and butterfly curve of proposed 8T SRAM bit-cell

A read-buffer structure (MRA-MRB) is adopted in the proposed 8T SRAM bit-cell, similar



Figure 5.7: The distributions of read SNM of Monte Carlo simulation

to [5.6][5.7]. The structure keeps the storage node away from disturbance noise and eliminates the read SNM limitation of the conventional dual-port SRAM. The write-assisted pass transistor (MNP) added for write ability enhancement causes a 7mV read SNM drop compared to [5.6][5.7], as shown in Fig. 5.6. It also has a slight impact on the distribution of the read SNM (10,000 Monte Carlo runs), as shown in Fig. 5.7.



Figure 5.8: Read-bitline leakage reduced by read-buffer-footers

To further improve the $I_{read}$-$I_{leakage}$ ratio, read-buffer-footers are attached to each read-buffer in all rows of the SRAM array, as shown in Fig. 5.8. During a read operation, the feet of all read-buffers, vvss, remain at VDD except those of the accessed word, which are pulled to GND. The unwanted leakage current of the unselected read-buffers is therefore reduced. In addition, employing RSCE provides high drive current, low device capacitance, less sensitivity to random dopant fluctuations, and a better subthreshold swing. The channel length of the read-buffer-footers is set to 100nm, where RSCE gives the best result (shown in Fig. 5.4). Also, a hierarchical read-bitline with



Figure 5.9: (a) Hierarchical read-bitline scheme with footer in global read-bitline, and (b)  $I_{read}$ - $I_{leakage}$ -ratio of 512-bit(dot-line)/32-bit(solid-line) per read-bitline with/without RSCE and read-buffer-footer

an RSCE footer scheme is applied, as shown in Fig. 5.9(a), to enhance the sensing margin and to tolerate process, voltage, and temperature variations in the ultra-low voltage region. As shown in Fig. 5.9(b), both the 512-bit/read-bitline and the 32-bit/read-bitline configurations that use the RSCE and read-buffer-footer techniques achieve a 10X larger $I_{read}$-$I_{leakage}$ ratio than those without them.

During a write operation, the write-wordline is set to VDD to turn on the write pass transistor (MNA) and turn off the write-assisted pass transistor (MNP). The equivalent circuit of the proposed 8T SRAM bit-cell in write operation is shown in Fig. 5.10. The MNP substantially enlarges the write margin ($\mu$=166mV@0.4V) by cutting off the positive feedback loop of the inverter pair. Moreover, the MNP together with the RSCE-based MNA yields a 68.8% reduction in write margin variation compared with the conventional dual-port SRAM, as shown in Fig. 5.11(a), which plots the write margin distribution of the proposed 8T SRAM at a supply voltage of 0.4V over 10,000 Monte Carlo runs. In addition to the improved write ability and write stability, the write delay of the proposed 8T SRAM bit-cell is 46.5% better than that of the conventional dual-port SRAM at a supply voltage of 0.4V. The write delay is equivalent to the propagation



Figure 5.10: Equivalent circuit of the proposed 8T SRAM bit-cell in write operation



Figure 5.11: (a) The distributions of write margin performing Monte Carlo simulation, and (b) write delay performance comparison

delay from the write-bitline through MNA, Q, and $Q_b$ to $Q_c$. As the supply voltage is lowered, the write delay improvement becomes larger, as shown in Fig. 5.11(b).

### 5.1.2 Layout Considerations

The layout view of the proposed 8T SRAM bit-cell is shown in Fig. 5.12. The longer channel length of the transistors utilizing RSCE and the additional write-assisted pass transistor (MNP) lead to a larger area overhead compared with previous SRAM designs. Nonetheless, the proposed 8T SRAM bit-cell provides better read/write delay performance, write margin



Figure 5.12: Layout view of the proposed 8T SRAM bit-cell

| Feature                 | 6T  | 8T [5.6] | 10T [5.7] | This Work |
|-------------------------|-----|----------|-----------|-----------|
| #WL                     | 1   | 2        | 2         | 2         |
| #BL                     | 2   | 3        | 2         | 2         |
| #VGND                   |     | 1        | 1         | 1         |
| RSCE                    |     | N        | Y         | Y         |
| Sub-$V_t$ Operation     | N   | Y        | Y         | Y         |
| Normalized Area         | 0.8 | 1        | 1.61      | 1.55      |
| Half Select Disturbance | Y   | Y        | N         | N         |

Table 5.1: Comparison of various SRAM bit-cells

variation reduction, and less switching activity. A detailed comparison of various SRAM bit-cells is given in Table 5.1.

The proposed 8T SRAM bit-cell is implemented in UMC 65nm SP 1P10M CMOS technology. Four metal layers are used in the SRAM layout: *VDD*, *vvss*, and *GND* are routed in the 2nd metal layer, the bitlines and the column-based write-wordline are routed in the 3rd metal layer, and the remaining wordlines are routed in the 4th metal layer.

## 5.2 Asynchronous 8T-SRAM-based FIFO Memory Design in 65nm CMOS

As shown in Fig. 5.13, the proposed asynchronous 8T-SRAM-based FIFO is composed of read/write pointers, read/write control circuitries, an adaptive power control system, and the



Figure 5.13: Block diagram of proposed asynchronous 8T-SRAM-based FIFO

8T SRAM array discussed in the previous section. The addresses of the 8T-SRAM-based storage elements are generated automatically by the read/write pointers. The asynchronous clock signals, CKR and CKW, are connected to the read/write control circuitries with clock gating.

### 5.2.1 Adaptive Power Control System

Leakage power minimization of the 8T-SRAM-based FIFO is another critical issue. An adaptive power control system similar to the one discussed in Sec. 4.2.3.1 is adopted. The proposed adaptive power control system (APCS) is shown in Fig. 5.14(a). The finite state machine (FSM) and the equivalent adaptive power control circuit are implemented to generate the control signal, *power\_on*. The signals *WWL* and *P* are generated by the write pointer and the read pointer, respectively. The basic function of the adaptive control signal generation for each word can be described as follows.

    if (CEN)        power_on = 0;         // cutoff mode
    elseif (WWL)    power_on = 1;         // active mode
    elseif (P)      power_on = 0;         // cutoff mode
    else            power_on = power_on;  // hold the previous state

Figure 5.14: (a) The adaptive power control system (b)  $i_{th}$  word of storage element

As shown in Fig. 5.14(b), a power gating circuit is inserted in each word of the FIFO memory. The APCS minimizes the leakage current of every word that holds invalid data. For the proposed 8T-SRAM-based FIFO, the APCS incurs only 9.7% overhead but reduces total power by 73%.
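The priority rule for *power\_on* described above can be sketched as a small next-state function. This is a behavioral illustration only, not the implemented circuit; the function name `next_power_on` and the Boolean encoding of *CEN*, *WWL*, and *P* are assumptions made for the example.

```python
def next_power_on(cen: bool, wwl: bool, p: bool, power_on: bool) -> bool:
    """Next power-gating state of one FIFO word (behavioral sketch).

    Priority follows the rule in the text: CEN, then WWL, then P, else hold.
    """
    if cen:
        return False      # cutoff mode
    if wwl:
        return True       # active mode: the word is being written
    if p:
        return False      # cutoff mode: the word has been read out
    return power_on       # otherwise keep the previous state


# Lifecycle of one word: idle -> written -> held -> read out.
state = False
history = []
for cen, wwl, p in [(False, False, False),   # idle, stays off
                    (False, True, False),    # write pointer selects the word
                    (False, False, False),   # valid data is retained
                    (False, False, True)]:   # read pointer releases the word
    state = next_power_on(cen, wwl, p, state)
    history.append(state)
print(history)
```

In this sketch a word is powered only between its write and its read, which is the interval during which it holds valid data.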

### 5.2.2 Read/Write Pulse Control Circuit Design

#### 5.2.2.1 Read Pulse Control Circuit

The active time of the read-wordline (*RWL*) in a read operation should be long enough for the sense amplifier to sense data reliably, but the wordline should be turned off soon after the read operation finishes to reduce the power consumed by the marginal compensation current. To adjust the pulse width of *RWL* adaptively according to the current process, voltage, and temperature status, a replica column designed for the worst-case scenario provides the proposed read pulse control circuit with a trigger signal, $R_{ok}$. The replica column monitors the time period needed to discharge voltage of *RWL* to ground (all bit-cells are set to be "0"). In this way, the replica column can generate the longest *RWL* pulse width



Figure 5.15: The replica column for read operation and read pulse control circuit

required for the sense amplifier to sense the readout data accurately. All the 8T SRAM bit-cells in the replica column are hardwired to "0" to ensure that the worst-case scenario occurs. In addition, an inverter delay line is inserted to delay the $R_{ok}$ output of the replica column, which further guarantees enough *RWL* pulse width margin for variation tolerance. Once $R_{ok}$ is received by the read pulse control circuit, it disables RP. The waveforms of the read pulse control circuit signals are shown in Fig. 5.15. Note that the control circuit generates the read window signal, RP, and the read pointer signal, READ. They are used to control the read pointer and to adaptively turn off the read-wordline, $RWL_i$.

#### 5.2.2.2 Write Pulse Control Circuit

The write-bitline is one of the largest capacitive parts of the memory because it inevitably connects to a large number of bit-cells. As a result, the power dissipated in the write-bitlines accounts for about half of the SRAM active power consumption during write operation. To control the pulse width of *WWL* efficiently and ensure that the selected word is written reliably, a replica column is implemented for the worst-case scenario to provide the proposed write pulse control



Figure 5.16: The replica column for write operation and write pulse control circuit

circuit with a trigger signal, $W_{ok}$. The replica column and the write pulse control circuit are shown in Fig. 5.16 along with the corresponding waveforms. In a write operation, the write pulse signal (WP) initially enables the selected write-wordline ($WWL_i$) of the SRAM array and the write-wordline ($WWL_{rp}$) of the replica column. Meanwhile, WP also turns on the enable signals of all write drivers, including that of the replica column. The data, $D_{in}$, are then written to the selected word. Simultaneously, "0" is written to the bit-cell of the replica column, whose cells all originally store "1", because this produces the longest write delay, which is the scenario to be monitored. Similarly, an inverter delay line is inserted to delay the $W_{ok}$ output of the replica column, which further guarantees enough *WWL* pulse width margin for variation tolerance. Once $W_{ok}$ is received by the write pulse control circuit, it disables WP. Note that the write pulse control circuit generates the write window signal, WP, and the write pointer signal, WRITE. They are used to control the write pointer and to adaptively turn off the write-wordline, $WWL_i$. The control scheme of the replica columns in the proposed adaptive power control system is identical to that of the regular bit-cells.

## 5.3 1Kbit Dynamic Voltage Frequency Scaling 8T-SRAM-based FIFO Memory in 65nm CMOS for DVFS Platform

Dynamic voltage frequency scaling (DVFS) is widely used to manage switching power consumption in battery-powered devices. DVFS reduces energy consumption by adjusting the system supply voltage over a large range according to the performance requirement. Low-voltage modes are used to minimize the power consumption of components such as CPUs, DSPs, and memories, and the voltage is raised when a computationally intensive mode is entered. Some DVS systems and applications, including video coding and medical monitoring, are illustrated in [5.4]. Besides, a 64Kbit reconfigurable SRAM fabricated in a 65nm low-power CMOS process and operating from 250mV to 1.2V is proposed in [5.9]. For high reliability at ultra-low supply voltage and highly efficient power delivery in a micro-power system, a subthreshold microcontroller with integrated SRAM and a power-efficient switched capacitor DC-DC converter is presented in [5.10].

The proposed DVFS platform has two operating modes, Low-Power Mode and Performance Mode, because the signals that characterize cardiac activity, e.g. heart rate and ECG, vary at a very low rate. The DVFS FIFO operates in Low-Power Mode to record various physiological signals throughout its lifetime and switches to Performance Mode only briefly to process and transmit real-time cardiovascular parameters to a host. In this Low-Power Mode dominated scenario, applying the DVFS technique further reduces total energy consumption, a benefit attributed to the quadratic savings in active $CV_{DD}^2$ energy. In this work, the device performs sub-threshold operation in Low-Power Mode and near-threshold operation in Performance Mode.
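The quadratic dependence of active energy on supply voltage can be made concrete for the two supply levels used in this chapter. The sketch below assumes that the switched capacitance and activity factor are identical in both modes, so that only the $V_{DD}^2$ term changes; it does not model leakage or short-circuit energy, and the function name is hypothetical.

```python
def active_energy_ratio(v_low: float, v_high: float) -> float:
    """Ratio of active energy per operation, E(v_low) / E(v_high).

    Assumes E_active = alpha * C * VDD^2 with the same alpha and C in both modes.
    """
    return (v_low / v_high) ** 2


ratio = active_energy_ratio(0.3, 0.5)
print(f"active energy at 0.3V is {ratio:.2f}x that at 0.5V")
```

Under these assumptions, each operation at 0.3V costs roughly a third of the active energy of the same operation at 0.5V.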

Fig. 5.17 shows the system block diagram of the proposed DVFS SRAM-based FIFO,



Figure 5.17: Block diagram of the proposed dynamic voltage frequency scaling 8T-SRAMbased FIFO as a demonstration DVFS platform.

which also shows simplified diagrams of the switched capacitor (SC) DC-DC converter, the DVFS controller, the supply switch, and the sub/near-threshold programmable clock generator. To generate read/write clock signals with different frequencies, the clock generator discussed in Sec. 3.2 is employed to create an output clock at 1/8 to 4 times the frequency of the reference clock. The SC DC-DC converter, the DVFS controller, and the supply switch are described in detail in the following sections.

### 5.3.1 Switched Capacitor DC-DC Converter

To realize the full energy savings of subthreshold operation, a switched capacitor DC-DC converter that supplies ultra-low voltages at high efficiency is essential. Since the power consumption of the SRAM load circuits drops exponentially at subthreshold voltages, the DC-DC converter was designed to deliver a maximum load power of $100\mu$W. This reduced load power demand makes switched capacitor DC-DC conversion an ideal choice for this application. A DC-DC converter provides the second supply voltage by stepping down the system voltage source, such as a battery, which saves space compared with using multiple voltage sources. The DC-DC converter is composed of a comparator, a non-overlapping clock generator, and a switch matrix [5.11]. Multiple topologies allow the DC-DC converter to achieve scalable voltage generation while minimizing conduction loss. It provides a variable output voltage by adjusting the ratio of the charge transfer capacitors, and the relationship between $V_{ref}$ and $V_L$ decides whether the charge transfer capacitors are charged or discharged.



Figure 5.18: Switched capacitor DC-DC converter.

An on-chip switched capacitor DC-DC converter that provides a regulated supply voltage is considered in this work. Fig. 5.18 shows a switched capacitor DC-DC converter that uses pulse frequency modulation (PFM) mode (the comparator is clocked by the signal clk) to regulate the output voltage [5.12]. PFM mode control is crucial to achieving high efficiency in the extremely low power system being built. When the output voltage ($V_L$) is above the reference voltage ($V_{ref}$), the switches are all set to the $\Phi_1$ mode, in which $V_L$ is discharged. When $V_L$ falls below $V_{ref}$, the comparator triggers a $\Phi_2$ pulse, which charges up the output load capacitor ($C_L$). The non-overlapping clock generator block prevents any overlap between the active phases of $\Phi_1$ and $\Phi_2$. The switched capacitor matrix block contains the charge transfer switches and the charge transfer capacitors.
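The PFM regulation loop described above can be illustrated with a simple behavioral model. This is a sketch under strong assumptions, not a model of the actual converter: the load removes a fixed voltage step from $C_L$ every clock cycle, each $\Phi_2$ pulse adds a fixed step, and the switch matrix and gain settings are ignored. All names and numeric values are hypothetical, and voltages are expressed in integer microvolts to keep the arithmetic exact.

```python
def simulate_pfm(v_ref_uv, v_start_uv, cycles, droop_uv, boost_uv):
    """Clocked-comparator PFM loop: one comparison per clk cycle.

    Returns the V_L trace (in microvolts) and the number of phi2 pulses issued.
    """
    v_l = v_start_uv
    trace = []
    pulses = 0
    for _ in range(cycles):
        if v_l < v_ref_uv:
            # Comparator fires a phi2 pulse: charge is delivered to C_L.
            v_l += boost_uv
            pulses += 1
        # The load discharges C_L during every cycle (phi1 phase).
        v_l -= droop_uv
        trace.append(v_l)
    return trace, pulses


# Hypothetical operating point: 0.3V reference, 4mV charge step per pulse.
heavy_trace, heavy_pulses = simulate_pfm(300_000, 300_000, 1000, 1_000, 4_000)
light_trace, light_pulses = simulate_pfm(300_000, 300_000, 1000, 500, 4_000)
print(heavy_pulses, light_pulses)
```

In this model the output stays within one charge step of the reference, and halving the load halves the number of $\Phi_2$ pulses. That is the property that makes PFM attractive at very light loads: switching activity scales with the delivered power.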

### 5.3.2 Supply Switch and DVFS Controller

Switching between voltage supplies at run-time can cause supply grid noise, shorting between supplies, and corruption of stored data. Therefore, a run-time supply switch is used to properly control the power gating of the different voltage levels and the conversion time. Switching between power supplies is performed dynamically, which reduces power consumption without a significant impact on performance and robustness.



Figure 5.19: DVFS controller and its timing diagram.

In this work, the supply switch controls the header switching to provide either 0.5V or 0.3V as vddf, as shown in Fig. 5.17. The 0.3V supply is provided by the SC DC-DC converter described earlier, and the 0.5V supply is provided by $V_{BAT}$. The DVFS controller issues the enable signals (CE, WE, and RE) to the FIFO and the handshaking signal (level) to the supply switch. Following a request for a voltage switch (a change in the signal MODE), correct operation of the FIFO is guaranteed by disabling the read/write enable signals (WE=1 and RE=1) before the voltage is actually switched. After the FIFO finishes all operations, the supply switch begins to switch between power supplies. Stalling the FIFO prevents processor operation during the period when the voltage supply is not completely connected. The data stored in the FIFO are preserved within the internal circuits. When charging or discharging is finished, a confirmation signal (busy) is transmitted back to the DVFS controller. Finally, the FIFO is enabled again. The finite

state machine (FSM) of DVFS controller and the timing diagram are shown in Fig. 5.19.
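The stall, switch, and resume sequence described above can be sketched as a three-state machine. The actual FSM is the one shown in Fig. 5.19; the state names, the `fifo_idle` input, the encoding of `level`, and the convention that `busy` stays high while vddf is still charging or discharging are all assumptions made for this illustration.

```python
from enum import Enum


class State(Enum):
    RUN = "run"        # FIFO enabled at the current supply level
    STALL = "stall"    # WE/RE disabled, waiting for the FIFO to finish
    SWITCH = "switch"  # supply switch is charging/discharging vddf


class DvfsController:
    """Behavioral sketch of the mode-change handshake (hypothetical encoding)."""

    def __init__(self, mode: int = 0):
        self.state = State.RUN
        self.level = mode   # assumed: 0 selects 0.3V, 1 selects 0.5V
        self.we = 0         # enables are disabled at 1, as stated in the text
        self.re = 0

    def step(self, mode: int, fifo_idle: bool, busy: bool) -> State:
        if self.state is State.RUN and mode != self.level:
            self.we = self.re = 1          # stall the FIFO before switching
            self.state = State.STALL
        elif self.state is State.STALL and fifo_idle:
            self.level = mode              # ask the supply switch to change vddf
            self.state = State.SWITCH
        elif self.state is State.SWITCH and not busy:
            self.we = self.re = 0          # supply settled: enable the FIFO again
            self.state = State.RUN
        return self.state


ctrl = DvfsController(mode=0)
sequence = [ctrl.step(1, fifo_idle=False, busy=False),  # MODE changes
            ctrl.step(1, fifo_idle=True, busy=False),   # FIFO has finished
            ctrl.step(1, fifo_idle=True, busy=True),    # vddf still settling
            ctrl.step(1, fifo_idle=True, busy=False)]   # switch confirmed
print([s.value for s in sequence])
```

The point of the sketch is the ordering constraint: the enables are withdrawn before `level` changes, and they are restored only after the supply switch reports completion.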


### 5.3.3 Implementation and Simulation Results

Figure 5.20: Layout view and die photo of 1Kbit asynchronous DVFS 8T-SRAM-based FIFO.

The layout of the proposed FIFO is shown in Fig. 5.20. The 1Kbit asynchronous 8T-SRAM-based FIFO is implemented in UMC 65nm technology with 0.3V/0.5V supply voltages and 625kHz/20kHz read/write frequencies, and it is designed to tolerate temperature variation from -40°C to 125°C across all process corners, as summarized in Table 5.2. At 0.5V and 0.3V supply voltage, the read/write/standby power consumptions are $0.624\mu$W / $0.527\mu$W / $0.468\mu$W and $0.196\mu$W / $0.160\mu$W / $0.159\mu$W, respectively.

$$P_{average} = 1\% P_{standby} + 93\% P_{write} + 5\% P_{simultaneous} + 1\% P_{read}$$
(5.1)

According to the average power calculation in (5.1), the proposed design consumes only $0.535\mu$W on average. Note that the read operation would fail at a 0.3V supply voltage in the SS corner below 10°C. However, the WSN baseband module only reads out the data accumulated in the FIFO in the performance mode (vddf=0.5V). Accordingly, the proposed DVFS FIFO only performs write operations in the low-power mode (vddf=0.3V), and read operations are forbidden in this mode. The system stays in low-power mode most of the time, which saves energy.
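The weighting in (5.1) can be checked numerically against the reported figures. The sketch below evaluates the weighted sum and, because $P_{simultaneous}$ is not listed among the reported values in this section, also backs out the value that (5.1) implies for it from the stated average. That implied number is only a consistency check derived from the equation, not a reported result, and the function names are hypothetical.

```python
WEIGHTS = {"standby": 0.01, "write": 0.93, "simultaneous": 0.05, "read": 0.01}


def average_power(standby, write, simultaneous, read):
    """Weighted average power following the activity mix of (5.1)."""
    return (WEIGHTS["standby"] * standby + WEIGHTS["write"] * write
            + WEIGHTS["simultaneous"] * simultaneous + WEIGHTS["read"] * read)


def implied_simultaneous(p_average, standby, write, read):
    """Solve (5.1) for P_simultaneous given the other terms and the average."""
    partial = (WEIGHTS["standby"] * standby + WEIGHTS["write"] * write
               + WEIGHTS["read"] * read)
    return (p_average - partial) / WEIGHTS["simultaneous"]


# Reported values at 0.5V, in microwatts: read 0.624, write 0.527, standby 0.468.
p_sim = implied_simultaneous(0.535, standby=0.468, write=0.527, read=0.624)
print(f"implied P_simultaneous at 0.5V: {p_sim:.3f} uW")
print(f"round trip average: {average_power(0.468, 0.527, p_sim, 0.624):.3f} uW")
```

Because write operations carry 93% of the weight, the average is dominated by the write power, which is consistent with a sensor node that mostly records data.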

| Parameter             | Value                                                            |
|-----------------------|------------------------------------------------------------------|
| Supply Voltage        | 0.5V/0.3V (read/write)                                           |
| Process               | UMC 65nm                                                         |
| Active Area           | $0.95 \times 0.85\,\mathrm{mm}^2$                                |
| Operation Frequency   | 625kHz/20kHz (read/write)                                        |
| Operation Temperature | $-40^{\circ}\text{C}$ to $125^{\circ}\text{C}$                   |
| Power at 0.5V         | $0.624\mu$W / $0.527\mu$W / $0.468\mu$W (read/write/standby)     |
| Power at 0.3V         | $0.196\mu$W / $0.160\mu$W / $0.159\mu$W (read/write/standby)     |
| Average Power         | $0.535\mu$W / $0.163\mu$W (read/write)                           |
| Leakage Current       | $0.935\mu$A / $0.499\mu$A (read/write)                           |

Table 5.2: Specifications of 1Kbit asynchronous DVFS 8T-SRAM-based FIFO

### 5.3.4 Energy Consumption Analysis

Adjusting the supply voltage involves charging and discharging the large capacitance of the FIFO. This raises two important considerations: 1) the energy overhead associated with changing the supply voltage level should at least be compensated by the energy savings in the low-power mode, and 2) the FIFO read/write operation should stall until the voltage transients damp to an acceptable range [5.9]. The overhead energy in the first consideration comes from the power discharged by the switched capacitor DC-DC converter and the power of the supply switch control circuit. It is also associated with the charging and discharging of the capacitances at the supply node of the FIFO.

To calculate the energy overhead of the FIFO during charging and discharging, the total capacitance connected to the supply node (VVDD) of the FIFO needs to be determined. The capacitances include the $C_{gd}$ of the power gating pMOS transistors, the $C_{body}$ of the NWELLs (since the bodies of the pMOS transistors of the FIFO are all connected to vddf), the wiring capacitance, and the decoupling capacitors. The energy consumption can be expressed as (5.2).

$$E = E_{active} + E_{leakage} + E_{short-circuit} + E_{other}$$
(5.2)

where  $E_{active} = \alpha C_L V_{DD}^2$ ,  $E_{leakage} = \int V_{DD} I_{leakage} dt$ , and  $E_{short-circuit} = t_{sc} V_{DD} I_{peak}$ . The main benefit of voltage scaling is attributed to the quadratic savings in active energy,  $E_{active}$ . Furthermore, leakage energy is also reduced at low voltages not only because of

smaller potential difference across devices but also due to second order effects determining device  $V_t$  such as drain-induced barrier lowering (DIBL).

The simulation results of the proposed 1Kbit DVFS 8T-SRAM-based FIFO show that the charging energy and the discharging energy are 16.150pJ and 1.951pJ, respectively. The average power consumption of the proposed 1Kbit 8T-SRAM-based FIFO is $0.535\mu$W at 0.5V and $0.163\mu$W at 0.3V. Let $T_L$ denote the time spent in *Low-Power Mode*, $T_P$ the time spent in *Performance Mode*, and N the number of mode conversions. The energy of the proposed DVFS 1Kbit 8T-SRAM-based FIFO can then be expressed as

$$E_{w/DVFS} = T_P \times 0.535\mu W + T_L \times 0.163\mu W + N \times (16.15pJ + 1.951pJ)$$
(5.3)

The energy of the 1Kbit 8T-SRAM-based FIFO without DVFS can be expressed as

$$E_{wo/DVFS} = (T_P + T_L) \times 0.535 \mu W$$
(5.4)

Requiring the energy with DVFS to be lower than the energy without it, i.e. (5.3) < (5.4), gives the following inequality:

$$T_P \times 0.535 \mu W + T_L \times 0.163 \mu W + N \times (16.15 pJ + 1.951 pJ) < (T_P + T_L) \times 0.535 \mu W$$
 (5.5)

$$T_{L} > 48.659\,\mu\text{s}$$
 (5.6)

The inequality (5.6) means that if the FIFO stays in the low-power mode longer than  $48.659\mu$ s (N=1), it is more advantageous to apply DVFS. The scenario is shown in Fig. 5.21. For WBAN healthcare applications, the period of sensor nodes in low-power mode is usually no less than  $200\mu$ s.
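The step from (5.5) to (5.6) follows by cancelling the $T_P$ terms, which leaves $T_L \times (0.535 - 0.163)\mu W > N \times 18.101$pJ. The sketch below reproduces this break-even time from the reported powers and switching energies and compares the two energy expressions on either side of it; the function names and the sample durations of 40$\mu$s and 200$\mu$s are chosen for illustration only.

```python
P_PERF_W = 0.535e-6        # average power at 0.5V
P_LOW_W = 0.163e-6         # average power at 0.3V
E_SWITCH_J = 16.150e-12 + 1.951e-12   # charging + discharging energy per cycle


def energy_with_dvfs(t_p, t_l, n=1):
    """Energy in joules following (5.3)."""
    return t_p * P_PERF_W + t_l * P_LOW_W + n * E_SWITCH_J


def energy_without_dvfs(t_p, t_l):
    """Energy in joules following (5.4)."""
    return (t_p + t_l) * P_PERF_W


def break_even_time(n=1):
    """Minimum low-power time in seconds for DVFS to pay off."""
    return n * E_SWITCH_J / (P_PERF_W - P_LOW_W)


t_be = break_even_time()
print(f"break-even T_L = {t_be * 1e6:.3f} us")
for t_l in (40e-6, 200e-6):
    better = energy_with_dvfs(0.0, t_l) < energy_without_dvfs(0.0, t_l)
    print(f"T_L = {t_l * 1e6:.0f} us -> DVFS saves energy: {better}")
```

Since $T_P$ cancels, the break-even time depends only on the power difference between the two modes and on the switching energy, and it scales linearly with the number of conversions N.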

## 5.4 Summary

A robust asynchronous 8T-SRAM-based FIFO memory with an adaptive power control system is presented in this chapter. A novel 8T SRAM bit-cell is used as the storage element to improve the write margin and to reduce write variation. The proposed 8T bit-cell is more suitable for WBAN applications because its area overhead is lower than that of the 10T cell [5.7] (Table 5.1). RSCE is utilized to decrease $V_t$ and to raise the $I_{on}$-$I_{off}$ ratio. To improve read ability, RSCE hierarchical read-bitlines are designed. With the adaptive power control system and



Figure 5.21: Energy consumption comparisons of 1Kbit 8T-SRAM-based FIFO with DVFS and without DVFS.

complementary power gating technique, along with clock-gated shift register based logic pointers, leakage power in the FIFO memory array is minimized. Dynamic voltage scaling is a well-known technique to reduce energy consumption under time-varying performance constrained scenarios.

To demonstrate the benefit of a dynamic voltage frequency scaling system, the design switches between two supply voltages, 500mV and 300mV, for the high-performance and low-power modes. For ultra-low voltage operation, a robust dynamic voltage scalable SRAM-based FIFO memory with adaptive power control is presented. The simulation shows that if the FIFO stays in subthreshold mode longer than $48.66\mu$s, applying the DVFS platform is advantageous. In conclusion, the proposed DVFS FIFO is suitable for WBAN applications.

## Chapter 6

## **Conclusions and Future Works**

Energy-efficient, ultra-low voltage circuits are a key focus of emerging design trends. In this thesis, a fully integrated, area-efficient, and ultra-low power temperature sensor is presented. It provides vital environmental data to overcome PVT variations and enhance energy efficiency. With this sensor, a locking range compensation technique for the near-/sub-threshold regions is applied to the DLL-based clock generator. A prototype test chip of an ultra-low voltage FIFO memory in a dynamic voltage frequency scaling (DVFS) platform is presented to combine the benefits of the proposed energy-efficient designs with a practical implementation of a power management controller for energy-efficient chips.

Previous work on energy-efficient techniques and wireless body area sensor network (WBAN) applications is introduced in Chapter 2. A frequency-domain temperature sensor is presented in Chapter 3 to enable on-chip temperature measurement. The sensor was designed to achieve ultra-low voltage operation with reasonable immunity to process variation. A test chip fabricated in TSMC general purpose 65nm CMOS technology meets the target of operating at a 0.4V supply voltage over the temperature range of 0°C to 100°C. The power consumption per conversion rate is 11.6pW/samples/sec, which is a hundredfold improvement over previous work [6.1, 6.2]. A programmable clock generator was proposed for a near-/sub-threshold dynamic voltage and frequency scaling system. With the proposed PVT compensation technique, the clock generator is protected against PVT variations under ultra-low voltage operation from 0.2V to 0.5V. The proposed clock generator has been implemented in UMC 65nm CMOS technology. The measurement results show power consumptions of 5.17$\mu$W at 0.5V, 20MHz and 0.18$\mu$W at 0.2V, 625kHz. It is suitable as the clock source for emerging ultra-low voltage energy-constrained applications.

In Chapter 4, a 9T bit-cell is proposed to enhance write ability by cutting off the positive feedback loop of the SRAM cross-coupled inverter pair. In read mode, an access buffer isolates the storage node from the read path for better read robustness and leakage reduction. A bit-interleaving scheme for soft error tolerance is enabled by incorporating additional write-wordlines (WWL/WWLb) into the proposed 9T SRAM bit-cell. A 1Kbit 9T 4-to-1 bit-interleaved SRAM is implemented in 65nm bulk CMOS technology. The experimental results demonstrate that the minimum energy point of the test chip occurs at a 0.3V supply voltage, where it achieves an operating frequency of 909kHz with $3.51\mu$W active power consumption. Meanwhile, an ultra-low power (ULP) 16Kb SRAM-based first-in first-out (FIFO) memory is proposed for wireless body area networks (WBANs). The proposed FIFO memory is capable of operating in the ultra-low voltage (ULV) regime with high variation immunity. A ULP near-/sub-threshold 10-transistor (10T) SRAM bit-cell is proposed as the storage element to improve write variation in the ULV regime and to eliminate the data-dependent bit-line leakage. The proposed SRAM-based FIFO memory also features an adaptive power control circuit, counter-based pointers, and a smart replica read/write control unit. The proposed FIFO is implemented in UMC 90nm CMOS technology and achieves a minimum operating voltage of 400mV. The write power is $2.09\mu$W at 50kHz and the read power is $2.25\mu$W at 625kHz.

An ultra-low voltage asynchronous first-in first-out (FIFO) memory is proposed for wireless body area networks (WBANs) in Chapter 5. To meet the ultra-low power requirement, a novel subthreshold 8-transistor (8T) SRAM cell is presented, which improves the write margin and reduces write variation in the subthreshold regime. The reverse short-channel effect (RSCE) is utilized in the read-buffer and the write access transistor to improve read/write ability. In addition, an adaptive write-wordline window control scheme is proposed for lower write power and process-voltage-temperature (PVT) tracking. A 1kb dynamic voltage scaling 8T SRAM-based FIFO memory is implemented in UMC 65nm technology to operate between 0.5V (near-threshold) and 0.3V (subthreshold), with power consumptions of $0.535\mu$W at 625kHz and $0.163\mu$W at 20kHz, respectively. The proposed DVS FIFO memory provides up to 69.5% power savings when low-power mode is always engaged, and the switching overhead is recovered if the period of low-power mode is longer than 48.66$\mu$s. It is suitable for healthcare applications equipped with DVFS capability.



Figure 6.1: Proposed power management system architecture.

In this work, only a fixed dual voltage supply scheme is chosen because it is sufficient for energy-efficient WBAN applications. However, a dynamic multiple voltage supply scheme is preferred for minimum energy point tracking. To provide multiple voltage sources for DVFS energy-efficient chips, we presented a power management system for solar energy harvesting applications in [6.3]. It receives power from a photovoltaic (PV) cell and generates different voltage levels: 1V to 0.3V for analog circuitry and low-power digital circuitry, -1.2V for the super-cutoff technique in memory circuitry, and 10V for FLASH memory or I/O components. In the proposed power management system, the highly power-efficient switched capacitor (SC) DC-DC converter and the voltage regulator are the two key components. A fully digitally controlled voltage regulator was first presented in [6.4]. It has a high current efficiency of 99.8% with only 164.5$\mu$A quiescent current. However, the accuracy of the digital error detector in the proposed voltage regulator is heavily affected by PVT variations. Therefore, variation-aware voltage regulators and SC DC-DC converters require more research effort for the DVFS platform.

Another interesting research topic is applying the temperature sensor proposed in Chapter 3 to our DVFS platform, shown in Fig. 6.2, and to TSV 3D-IC package technology, shown in Fig. 6.3. Because the sensor is highly area- and energy-efficient, several hundred sensors can be deployed within every layer of a 3D-IC chip. With such rich temperature information, an advanced dynamic temperature management unit could provide a smart solution to the hot-spot issue.

Figure 6.2: PVT-aware ultra-low voltage DVFS FIFO system.

Figure 6.3: PVT sensors for 3D-IC package technology.

# Bibliography for Chapter 1

- [1.1] A. P. Chandrakasan, D. C. Daly, J. Kwong, and Y. K. Ramadass, "Next generation micropower systems," in *IEEE Symp. on VLSI Circuits*, Jun. 2008, pp. 2–5.
- [1.2] G. Gammie, A. Wang, H. Mair, R. Lagerquist, M. Chau, P. Royannez, S. Gururajarao, and U. Ko, "SmartReflex power and performance management technologies for 90 nm, 65 nm, and 45 nm mobile application processors," *Proc. IEEE*, vol. 98, no. 2, pp. 144–159, Feb. 2010.



# Bibliography for Chapter 2

- [2.1] R. Jorgenson, L. Sorensen, D. Leet, M. Hagedorn, D. Lamb, T. Friddell, and W. Snapp, "Ultralow-power operation in subthreshold regimes applying clockless logic," *Proc. IEEE*, vol. 98, no. 2, pp. 299–314, Feb. 2010.
- [2.2] G. Chen, S. Hanson, D. Blaauw, and D. Sylvester, "Circuit design advances for wireless sensing applications," *Proc. IEEE*, vol. 98, no. 11, pp. 1808–1827, Nov. 2010.
- [2.3] S. Hanson, B. Zhai, M. Seok, B. Cline, K. Zhou, M. Singhal, M. Minuth, J. Olson, L. Nazhandali, T. Austin, D. Sylvester, and D. Blaauw, "Exploring variability and performance in a sub-200-mV processor," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 881–891, Apr. 2008.
- [2.4] M. A. Hanson, H. C. Powell, A. T. Barth, K. Ringgenberg, B. H. Calhoun, J. H. Aylor, and J. Lach, "Body area sensor networks: Challenges and opportunities," *IEEE Computer*, vol. 42, no. 1, pp. 58–65, Jan. 2009.
- [2.5] M. Horowitz, D. Stark, and E. Alon, "Digital circuit design trends," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 757–761, Apr. 2008.
- [2.6] A. Wang and A. P. Chandrakasan, "A 180-mV subthreshold FFT processor using a minimum energy design methodology," *IEEE J. Solid-State Circuits*, vol. 40, no. 1, pp. 310–319, Jan. 2005.
- [2.7] B. H. Calhoun, A. Wang, and A. P. Chandrakasan, "Modeling and sizing for minimum energy operation in subthreshold circuits," *IEEE J. Solid-State Circuits*, vol. 40, no. 9, pp. 1778–1786, Sep. 2005.
- [2.8] J. T. Kao and A. P. Chandrakasan, "Dual-threshold voltage techniques for low-power digital circuits," *IEEE J. Solid-State Circuits*, vol. 35, no. 7, pp. 1009–1018, Jul. 2000.
- [2.9] N. Verma, J. Kwong, and A. P. Chandrakasan, "Nanometer MOSFET variation in minimum energy subthreshold circuits," *IEEE Trans. Electron Devices*, vol. 55, no. 1, pp. 163–174, Jan. 2008.
- [2.10] S. Hanson, B. Zhai, K. Bernstein, D. Blaauw, A. Bryant, L. Chang, K. K. Das, W. Haensch, E. J. Nowak, and D. M. Sylvester, "Ultralow-voltage, minimum-energy CMOS," *IBM Journal of Research and Development*, vol. 50, no. 4/5, pp. 469–490, Jul. 2006.
- [2.11] S. Ghosh and K. Roy, "Parameter variation tolerance and error resiliency: New design paradigm for the nanoscale era," *Proc. IEEE*, vol. 98, no. 10, pp. 1718–1751, Oct. 2010.
- [2.12] H. Yamauchi, "A discussion on SRAM circuit design trend in deeper nanometer-scale technologies," *IEEE Trans. VLSI Syst.*, vol. 18, no. 5, pp. 763–774, May 2010.
- [2.13] T. Burd, T. Pering, A. Stratakos, and R. Brodersen, "A dynamic voltage scaled microprocessor system," *IEEE J. Solid-State Circuits*, vol. 35, no. 11, pp. 1571–1580, Nov. 2000.
- [2.14] A. P. Chandrakasan, D. C. Daly, D. F. Finchelstein, J. Kwong, Y. K. Ramadass, M. E. Sinangil, V. Sze, and N. Verma, "Technologies for ultradynamic voltage scaling," *Proc. IEEE*, vol. 98, no. 2, pp. 191–214, Feb. 2010.
- [2.15] B. H. Calhoun and A. P. Chandrakasan, "Ultra-dynamic voltage scaling (UDVS) using sub-threshold operation and local voltage dithering," *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 238–245, Jan. 2006.
- [2.16] Body Area Networks (BAN). IEEE 802.15 WPAN Task Group 6. [Online]. Available: http://www.ieee802.org/15/pub/TG6.html/
- [2.17] L. L. Lewyn, T. Ytterdal, C. Wulff, and K. Martin, "Analog circuit design in nanoscale CMOS technologies," *Proc. IEEE*, vol. 97, no. 10, pp. 1687–1714, Oct. 2009.
- [2.18] S. A. Vitale, P. W. Wyatt, N. Checka, J. Kedzierski, and C. L. Keast, "FDSOI process technology for subthreshold-operation ultralow-power electronics," *Proc. IEEE*, vol. 98, no. 2, pp. 333–342, Feb. 2010.
- [2.19] Y. Kikuchi, M. Takahashi, T. Maeda, M. Fukuda, Y. Koshio, H. Hara, H. Arakida, H. Yamamoto, Y. Hagiwara, T. Fujita, M. Watanabe, H. Ezawa, T. Shimazawa, Y. Ohara, T. Miyamori, M. Hamada, M. Takahashi, and Y. Oowaki, "A 40 nm 222 mW H.264 full-HD decoding, 25 power domains, 14-core application processor with x512b stacked DRAM," *IEEE J. Solid-State Circuits*, vol. 46, no. 1, pp. 32–41, Jan. 2011.
- [2.20] V. F. Pavlidis and E. G. Friedman, "Interconnect-based design methodologies for three-dimensional integrated circuits," *Proc. IEEE*, vol. 97, no. 1, pp. 123–140, Jan. 2009.
- [2.21] A. W. Topol, D. C. L. Tulipe, L. Shi, D. J. Frank, K. Bernstein, S. E. Steen, A. Kumar, G. U. Singco, A. M. Young, K. W. Guarini, and M. Ieong, "Three-dimensional integrated circuits," *IBM Journal of Research and Development*, vol. 50, no. 4/5, pp. 491–506, Jul./Sep. 2006.
- [2.22] G. Gammie, A. Wang, H. Mair, R. Lagerquist, M. Chau, P. Royannez, S. Gururajarao, and U. Ko, "SmartReflex power and performance management technologies for 90 nm, 65 nm, and 45 nm mobile application processors," *Proc. IEEE*, vol. 98, no. 2, pp. 144–159, Feb. 2010.
- [2.23] A. P. Chandrakasan, D. C. Daly, J. Kwong, and Y. K. Ramadass, "Next generation micro-power systems," in *IEEE Symp. on VLSI Circuits*, Jun. 2008, pp. 2–5.
- [2.24] D. Harris, R. F. Sproull, and I. E. Sutherland, *Logical Effort: Designing Fast CMOS Circuits*. San Francisco, CA: Morgan Kaufmann, 1999.
- [2.25] J. Keane, H. Eom, T.-H. Kim, S. Sapatnekar, and C. Kim, "Stack sizing for optimal current drivability in subthreshold circuits," *IEEE Trans. VLSI Syst.*, vol. 16, no. 5, pp. 598–602, May 2008.
- [2.26] M.-H. Chang, C.-Y. Hsieh, M.-W. Chen, and W. Hwang, "Logical effort models with voltage and temperature extension in super-/near-/sub-threshold regions," in *Int'l Symp. on VLSI Design, Automation, and Test*, Apr. 2011, pp. 213–216.
- [2.27] B. H. Calhoun, J. F. Ryan, S. Khanna, M. Putic, and J. Lach, "Flexible circuits and architectures for ultralow power," *Proc. IEEE*, vol. 98, no. 2, pp. 267–282, Feb. 2010.
- [2.28] F. Fallah and M. Pedram, "Standby and active leakage current control and minimization in CMOS VLSI circuits," *IEICE Trans. on Electronics*, vol. E88-C, no. 4, pp. 509–519, 2005.
- [2.29] Y. Chen, H. Li, K. Roy, and C.-K. Koh, "Gated decap: Gate leakage control of on-chip decoupling capacitors in scaled technologies," *IEEE Trans. VLSI Syst.*, vol. 17, no. 12, pp. 1749–1752, Dec. 2009.
- [2.30] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits," *Proc. IEEE*, vol. 91, no. 2, pp. 305–327, Feb. 2003.
- [2.31] S. Hanson, M. Seok, Y.-S. Lin, Z. Y. Foo, D. Kim, Y. Lee, N. Liu, D. Sylvester, and D. Blaauw, "A low-voltage processor for sensing applications with picowatt standby mode," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1145–1155, Apr. 2009.
- [2.32] J. Gu, H. Eom, J. Keane, and C. H. Kim, "Sleep transistor sizing and adaptive control for supply noise minimization considering resonance," *IEEE Trans. VLSI Syst.*, vol. 17, no. 9, pp. 1203–1211, Sep. 2009.
- [2.33] D.-S. Chiou, S.-H. Chen, and S.-C. Chang, "Sleep transistor sizing for leakage power minimization considering charge balancing," *IEEE Trans. VLSI Syst.*, vol. 17, no. 9, pp. 1330–1334, Sep. 2009.
- [2.34] A. Abdollahi, F. Fallah, and M. Pedram, "A robust power gating structure and power mode transition strategy for MTCMOS design," *IEEE Trans. VLSI Syst.*, vol. 15, no. 1, pp. 80–89, Jan. 2007.
- [2.35] S. Kim, S. V. Kosonocky, D. R. Knebel, K. Stawiasz, and M. C. Papaefthymiou, "A multi-mode power gating structure for low-voltage deep-submicron CMOS ICs," *IEEE Trans. Circuits Syst. II*, vol. 54, no. 7, pp. 586–590, Jul. 2007.
- [2.36] Y. Shin, S. Heo, H.-O. Kim, and J. Y. Choi, "Supply switching with ground collapse: simultaneous control of subthreshold and gate leakage current in nanometer-scale CMOS circuits," *IEEE Trans. VLSI Syst.*, vol. 15, no. 7, pp. 758–766, Jul. 2007.
- [2.37] M. Khellah, D. Somasekhar, Y. Ye, N. S. Kim, J. Howard, G. Ruhl, M. Sunna, J. Tschanz, N. Borkar, F. Hamzaoglu, G. Pandya, A. Farhang, K. Zhang, and V. De, "A 256-Kb dual-V<sub>CC</sub> SRAM building block in 65-nm CMOS process with actively clamped sleep transistor," *IEEE J. Solid-State Circuits*, vol. 42, no. 1, pp. 233–242, Jan. 2007.
- [2.38] K. K. Kim, H. Nan, and K. Choi, "Ultralow-voltage power gating structure using low threshold voltage," *IEEE Trans. Circuits Syst. II*, vol. 56, no. 12, pp. 926–930, Dec. 2009.
- [2.39] B. H. Calhoun, F. A. Honore, and A. P. Chandrakasan, "A leakage reduction methodology for distributed MTCMOS," *IEEE J. Solid-State Circuits*, vol. 39, no. 5, pp. 818–826, May 2004.
- [2.40] K. K. Kim and Y.-B. Kim, "A novel adaptive design methodology for minimum leakage power considering PVT variations on nanoscale VLSI systems," *IEEE Trans. VLSI Syst.*, vol. 17, no. 4, pp. 517–528, Apr. 2009.
- [2.41] A. Agarwal, S. Mukhopadhyay, A. Raychowdhury, K. Roy, and C. H. Kim, "Leakage power analysis and reduction for nanoscale circuits," *IEEE Micro*, vol. 26, no. 2, pp. 68–80, Mar.-Apr. 2006.
- [2.42] J. W. Tschanz, S. Narendra, R. Nair, and V. De, "Effectiveness of adaptive supply voltage and body bias for reducing impact of parameter variations in low power and high performance microprocessors," *IEEE J. Solid-State Circuits*, vol. 38, no. 5, pp. 826–829, May 2003.
- [2.43] J. W. Tschanz, J. T. Kao, S. G. Narendra, R. Nair, D. A. Antoniadis, A. P. Chandrakasan, and V. De, "Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage," *IEEE J. Solid-State Circuits*, vol. 37, no. 11, pp. 1396–1402, Nov. 2002.
- [2.44] T. Inukai, T. Hiramoto, and T. Sakurai, "Variable threshold voltage CMOS (VTCMOS) in series connected circuits," in *IEEE Int'l Symp. on Low Power Electronics and Design*, Aug. 2001, pp. 201–206.
- [2.45] H. Soeleman, K. Roy, and B. C. Paul, "Robust subthreshold logic for ultra-low power operation," *IEEE Trans. VLSI Syst.*, vol. 9, no. 1, pp. 90–99, Feb. 2001.
- [2.46] D. Markovic, C. Wang, L. Alarcon, T.-T. Liu, and J. Rabaey, "Ultralow-power design in near-threshold region," *Proc. IEEE*, vol. 98, no. 2, pp. 237–252, Feb. 2010.
- [2.47] R. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge, "Near-threshold computing: Reclaiming Moore's law through energy efficient integrated circuits," *Proc. IEEE*, vol. 98, no. 2, pp. 253–266, Feb. 2010.
- [2.48] N. Verma, "Analysis towards minimization of total SRAM energy over active and idle operating modes," *IEEE Trans. VLSI Syst.*, vol. 19, no. 9, pp. 1695–1703, Sep. 2011.
- [2.49] B. H. Calhoun and A. P. Chandrakasan, "A 256-kb 65-nm sub-threshold SRAM design for ultra-low-voltage operation," *IEEE J. Solid-State Circuits*, vol. 42, no. 3, pp. 680–688, Mar. 2007.
- [2.50] M.-H. Chang, Y.-T. Chiu, S.-L. Lai, and W. Hwang, "A 1kb 9T subthreshold SRAM with bit-interleaving scheme in 65nm CMOS," in *IEEE Int'l Symp. on Low Power Electronics and Design*, Aug. 2011, pp. 291–296.
- [2.51] J. P. Kulkarni and K. Roy, "Ultralow-voltage process-variation-tolerant Schmitt-trigger-based SRAM design," *IEEE Trans. VLSI Syst.*, vol. 20, no. 2, pp. 319–332, Feb. 2012.
- [2.52] S. Mukhopadhyay, R. M. Rao, J.-J. Kim, and C.-T. Chuang, "SRAM write-ability improvement with transient negative bit-line voltage," *IEEE Trans. VLSI Syst.*, vol. 19, no. 1, pp. 24–32, Jan. 2011.
- [2.53] A.-T. Do, J. Low, J. Low, Z.-H. Kong, X. Tan, and K.-S. Yeo, "An 8T differential SRAM with improved noise margin for bit-interleaving in 65 nm CMOS," *IEEE Trans. Circuits Syst. I*, vol. 58, no. 6, pp. 1252–1263, Jun. 2011.
- [2.54] P. Kolar, E. Karl, U. Bhattacharya, F. Hamzaoglu, H. Nho, Y.-G. Ng, Y. Wang, and K. Zhang, "A 32 nm high-k metal gate SRAM with adaptive dynamic stability enhancement for low-voltage operation," *IEEE J. Solid-State Circuits*, vol. 46, no. 1, pp. 76–84, Jan. 2011.
- [2.55] P.-T. Huang and W. Hwang, "A 65 nm 0.165 fJ/Bit/Search 256×144 TCAM macro design for IPv6 lookup tables," *IEEE J. Solid-State Circuits*, vol. 46, no. 2, pp. 507–519, Feb. 2011.
- [2.56] M. Qazi, M. E. Sinangil, and A. P. Chandrakasan, "Challenges and directions for low-voltage SRAM," *IEEE Des. Test Comput.*, vol. 28, no. 1, pp. 32–43, Jan.-Feb. 2011.
- [2.57] S. Nalam and B. H. Calhoun, "5T SRAM with asymmetric sizing for improved read stability," *IEEE J. Solid-State Circuits*, vol. 46, no. 10, pp. 2431–2442, Oct. 2011.
- [2.58] S.-C. Luo and L.-Y. Chiou, "A sub-200-mV voltage-scalable SRAM with tolerance of access failure by self-activated bitline sensing," *IEEE Trans. Circuits Syst. II*, vol. 57, no. 6, pp. 440–445, Jun. 2010.
- [2.59] M.-H. Tu, J.-Y. Lin, M.-C. Tsai, S.-J. Jou, and C.-T. Chuang, "Single-ended subthreshold SRAM with asymmetrical write/read-assist," *IEEE Trans. Circuits Syst. I*, vol. 57, no. 12, pp. 3039–3047, Dec. 2010.
- [2.60] Y. Wang, U. Bhattacharya, F. Hamzaoglu, P. Kolar, Y.-G. Ng, L. Wei, Y. Zhang, K. Zhang, and M. Bohr, "A 4.0 GHz 291 Mb voltage-scalable SRAM design in a 32 nm high-k + metal-gate CMOS technology with integrated power management," *IEEE J. Solid-State Circuits*, vol. 45, no. 1, pp. 103–110, Jan. 2010.
- [2.61] M. E. Sinangil, N. Verma, and A. P. Chandrakasan, "A reconfigurable 8T ultra-dynamic voltage scalable (U-DVS) SRAM in 65nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 11, pp. 3163–3173, Nov. 2009.
- [2.62] M. Sharifkhani and M. Sachdev, "An energy efficient 40 Kb SRAM module with extended read/write noise margin in 0.13 µm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 2, pp. 620–630, Feb. 2009.
- [2.63] K. Nii, Y. Tsukamoto, M. Yabuuchi, Y. Masuda, S. Imaoka, K. Usui, S. Ohbayashi, H. Makino, and H. Shinohara, "Synchronous ultra-high-density 2RW dual-port 8T-SRAM with circumvention of simultaneous common-row-access," *IEEE J. Solid-State Circuits*, vol. 44, no. 3, pp. 977–986, Mar. 2009.
- [2.64] J. Kwong, Y. K. Ramadass, N. Verma, and A. P. Chandrakasan, "A 65nm sub-v<sub>t</sub> microcontroller with integrated SRAM and switched capacitor DC-DC converter," *IEEE J. Solid-State Circuits*, vol. 44, no. 1, pp. 115–126, Jan. 2009.
- [2.65] T.-H. Kim, J. Liu, and C. H. Kim, "A voltage scalable 0.26V, 64kb 8T SRAM with V<sub>min</sub> lowering techniques and deep sleep mode," *IEEE J. Solid-State Circuits*, vol. 44, no. 6, pp. 1785–1795, Jun. 2009.
- [2.66] I. J. Chang, J.-J. Kim, S. P. Park, and K. Roy, "A 32kb 10T sub-threshold SRAM array with bit-interleaving and differential read scheme in 90nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 2, pp. 650–658, Feb. 2009.
- [2.67] N. Verma and A. P. Chandrakasan, "A 256kb 65nm 8T subthreshold SRAM employing sense-amplifier redundancy," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 141–149, Jan. 2008.
- [2.68] T.-H. Kim, J. Liu, J. Keane, and C. H. Kim, "A 0.2V, 480kb subthreshold SRAM with 1 k cells per bitline for ultra-low-voltage computing," *IEEE J. Solid-State Circuits*, vol. 43, no. 2, pp. 518–529, Feb. 2008.
- [2.69] F.-S. Lai and C.-F. Lee, "On-chip voltage down converter to improve SRAM read/write margin and static power for sub-nano CMOS technology," *IEEE J. Solid-State Circuits*, vol. 42, no. 9, pp. 2061–2070, Sep. 2007.
- [2.70] J. P. Kulkarni, K. Kim, and K. Roy, "A 160 mV robust Schmitt trigger based subthreshold SRAM," *IEEE J. Solid-State Circuits*, vol. 42, no. 10, pp. 2303–2313, Oct. 2007.
- [2.71] S. Cosemans, W. Dehaene, and F. Catthoor, "A low-power embedded SRAM for wireless applications," *IEEE J. Solid-State Circuits*, vol. 42, no. 7, pp. 1607–1617, Jul. 2007.
- [2.72] K. Takeda, Y. Hagihara, Y. Aimoto, M. Nomura, Y. Nakazawa, T. Ishii, and H. Kobatake, "A read-static-noise-margin-free SRAM cell for low-VDD and high-speed applications," *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 113–121, Jan. 2006.
- [2.73] J. Chen, L. T. Clark, and T.-H. Chen, "An ultra-low-power memory with a subthreshold power supply voltage," *IEEE J. Solid-State Circuits*, vol. 41, no. 10, pp. 2344–2353, Oct. 2006.
- [2.74] N. S. Kim, K. Flautner, D. Blaauw, and T. Mudge, "Circuit and microarchitectural techniques for reducing cache leakage power," *IEEE Trans. VLSI Syst.*, vol. 12, no. 2, pp. 167–184, Feb. 2004.
- [2.75] J. Kwong, Y. Ramadass, N. Verma, M. Koesler, K. Huber, H. Moormann, and A. Chandrakasan, "A 65nm sub-v<sub>t</sub> microcontroller with integrated SRAM and switched capacitor DC-DC converter," in *IEEE Int'l Solid-State Circuits Conf.*, Feb. 2008, pp. 318–319, 616.
- [2.76] Y.-M. Chang, M.-H. Chang, and W. Hwang, "A 2.1-μW 0.3V-1.0V wide-locking range multiphase DLL using self-estimated SAR algorithm," in *IEEE Int'l System-on-Chip Conf.*, Sep. 2009, pp. 115–118.
- [2.77] D. P. Wang, H. J. Liao, H. Yamauchi, Y. H. Chen, Y. L. Lin, S. H. Lin, D. C. Liu, H. C. Chang, and W. Hwang, "A 45nm dual-port SRAM with write and read capability enhancement at low voltage," in *IEEE Int'l System-on-Chip Conf.*, Sep. 2007, pp. 211–214.
- [2.78] D. Bull, S. Das, K. Shivashankar, G. Dasika, K. Flautner, and D. Blaauw, "A power-efficient 32 bit ARM processor using timing-error detection and correction for transient-error tolerance and adaptation to PVT variation," *IEEE J. Solid-State Circuits*, vol. 46, no. 1, pp. 18–31, Jan. 2011.
- [2.79] M. H. Abu-Rahma and M. Anis, "A statistical design-oriented delay variation model accounting for within-die variations," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 27, no. 11, pp. 1983–1995, Nov. 2008.
- [2.80] S. Mukhopadhyay, K. Kim, K. A. Jenkins, C.-T. Chuang, and K. Roy, "An on-chip test structure and digital measurement method for statistical characterization of local random variability in a process," *IEEE J. Solid-State Circuits*, vol. 43, no. 9, pp. 1951–1963, Sep. 2008.
- [2.81] J. Keane, T.-H. Kim, and C. H. Kim, "An on-chip NBTI sensor for measuring pMOS threshold voltage degradation," *IEEE Trans. VLSI Syst.*, vol. 18, no. 6, pp. 947–956, Jun. 2010.
- [2.82] K. K. Kim, W. Wang, and K. Choi, "On-chip aging sensor circuits for reliable nanometer MOSFET digital circuits," *IEEE Trans. Circuits Syst. II*, vol. 57, no. 10, pp. 798–802, Oct. 2010.
- [2.83] T.-H. Kim, R. Persaud, and C. H. Kim, "Silicon odometer: An on-chip reliability monitor for measuring frequency degradation of digital circuits," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 874–880, Apr. 2008.
- [2.84] N. Drego, A. Chandrakasan, and D. Boning, "All-digital circuits for measurement of spatial variation in digital circuits," *IEEE J. Solid-State Circuits*, vol. 45, no. 3, pp. 640–651, Mar. 2010.
- [2.85] R. Rao, K. A. Jenkins, and J.-J. Kim, "A local random variability detector with complete digital on-chip measurement circuitry," *IEEE J. Solid-State Circuits*, vol. 44, no. 9, pp. 2616–2623, Sep. 2009.
- [2.86] Y. Ogasahara, M. Hashimoto, and T. Onoye, "All-digital ring-oscillator-based macro for sensing dynamic supply noise waveform," *IEEE J. Solid-State Circuits*, vol. 44, no. 6, pp. 1745–1755, Jun. 2009.
- [2.87] S. Das, C. Tokunaga, S. Pant, W.-H. Ma, S. Kalaiselvan, K. Lai, D. M. Bull, and D. T. Blaauw, "RazorII: In situ error detection and correction for PVT and SER tolerance," *IEEE J. Solid-State Circuits*, vol. 44, no. 1, pp. 32–48, Jan. 2009.
- [2.88] S.-C. Lin and K. Banerjee, "A design-specific and thermally-aware methodology for trading-off power and performance in leakage-dominant CMOS technologies," *IEEE Trans. VLSI Syst.*, vol. 16, no. 11, pp. 1488–1498, Nov. 2008.
- [2.89] M. Elgebaly and M. Sachdev, "Variation-aware adaptive voltage scaling system," *IEEE Trans. VLSI Syst.*, vol. 15, no. 5, pp. 560–571, May 2007.
- [2.90] N. Azizi, M. M. Khellah, V. K. De, and F. N. Najm, "Variations-aware low-power design and block clustering with voltage scaling," *IEEE Trans. VLSI Syst.*, vol. 15, no. 7, pp. 746–757, Jul. 2007.
- [2.91] H. F. Hamann, A. Weger, J. A. Lacey, Z. Hu, P. Bose, E. Cohen, and J. Wakil, "Hotspot-limited microprocessors: Direct temperature and power distribution measurements," *IEEE J. Solid-State Circuits*, vol. 42, no. 1, pp. 56–65, Jan. 2007.
- [2.92] T. Fischer, J. Desai, B. Doyle, S. Naffziger, and B. Patella, "A 90-nm variable frequency clock system for a power-managed Itanium architecture processor," *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 218–228, Jan. 2006.
- [2.93] H. Mostafa, M. H. Anis, and M. Elmasry, "Analytical soft error models accounting for die-to-die and within-die variations in sub-threshold SRAM cells," *IEEE Trans. VLSI Syst.*, vol. 19, no. 2, pp. 182–195, Feb. 2011.
- [2.94] I. J. Chang, S.-P. Park, and K. Roy, "Exploring asynchronous design techniques for process-tolerant and energy-efficient subthreshold operation," *IEEE J. Solid-State Circuits*, vol. 45, no. 2, pp. 401–410, Feb. 2010.
- [2.95] S. Hanson, M. Seok, D. Sylvester, and D. Blaauw, "Nanometer device scaling in subthreshold logic and SRAM," *IEEE Trans. Electron Devices*, vol. 55, no. 1, pp. 175–185, Jan. 2008.
- [2.96] P. Chen, S.-C. Chen, Y.-S. Shen, and Y.-J. Peng, "All-digital time-domain smart temperature sensor with an inter-batch inaccuracy of -0.7°C~+0.6°C after one-point calibration," *IEEE Trans. Circuits Syst. I*, vol. 58, no. 5, pp. 913–920, May 2011.
- [2.97] C.-C. Chung and C.-R. Yang, "An autocalibrated all-digital temperature sensor for on-chip thermal monitoring," *IEEE Trans. Circuits Syst. II*, vol. 58, no. 2, pp. 105–109, Feb. 2011.
- [2.98] K. Souri and K. Makinwa, "A 0.12 mm<sup>2</sup> 7.4 $\mu$W micropower temperature sensor with an inaccuracy of $\pm 0.2^{\circ}$C (3$\sigma$) from $-30^{\circ}$C to $125^{\circ}$C," *IEEE J. Solid-State Circuits*, vol. 46, no. 7, pp. 1693–1700, Jul. 2011.
- [2.99] P. Chen, C.-C. Chen, Y.-H. Peng, K.-M. Wang, and Y.-S. Wang, "A time-domain SAR smart temperature sensor with curvature compensation and a 3σ inaccuracy of -0.4°C~+0.6°C over a 0°C to 90°C range," *IEEE J. Solid-State Circuits*, vol. 45, no. 3, pp. 600–609, Mar. 2010.
- [2.100] E. Kursun and C.-Y. Cher, "Temperature variation characterization and thermal management of multicore architectures," *IEEE Micro*, vol. 29, no. 1, pp. 116–126, Jan.-Feb. 2009.
- [2.101] M. K. Law and A. Bermak, "A 405-nW CMOS temperature sensor based on linear MOS operation," *IEEE Trans. Circuits Syst. II*, vol. 56, no. 12, pp. 891–895, Dec. 2009.
- [2.102] M. Sasaki, M. Ikeda, and K. Asada, "A temperature sensor with an inaccuracy of -1/+0.8 °C using 90-nm 1-V CMOS for online thermal monitoring of VLSI circuits," *IEEE Trans. Semicond. Manuf.*, vol. 21, no. 2, pp. 201–208, May 2008.
- [2.103] E. Socher, S. M. Beer, and Y. Nemirovsky, "Temperature sensitivity of SOI-CMOS transistors for use in uncooled thermal sensing," *IEEE Trans. Electron Devices*, vol. 52, no. 12, pp. 2784–2790, Dec. 2005.
- [2.104] P. Chen, C.-C. Chen, C.-C. Tsai, and W.-F. Lu, "A time-to-digital-converter-based CMOS smart temperature sensor," *IEEE J. Solid-State Circuits*, vol. 40, no. 8, pp. 1642–1648, Aug. 2005.
- [2.105] M. A. P. Pertijs, K. A. A. Makinwa, and J. H. Huijsing, "A CMOS smart temperature sensor with a 3σ inaccuracy of ±0.1°C from -55°C to 125°C," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2805–2815, Dec. 2005.
- [2.106] I. M. Filanovsky and A. Allam, "Mutual compensation of mobility and threshold voltage temperature effects with applications in CMOS circuits," *IEEE Trans. Circuits Syst. I*, vol. 48, no. 7, pp. 876–884, Jul. 2001.
- [2.107] P. Krummenacher and H. Oguey, "Smart temperature sensor in CMOS technology," *Sensors and Actuators A: Physical*, vol. 22, no. 1-3, pp. 636–638, Jun. 1989.
- [2.108] S.-W. Chen, M.-H. Chang, W.-C. Hsieh, and W. Hwang, "Fully on-chip temperature, process, and voltage sensors," in *IEEE Int'l Symp. on Circuits and Systems*, May 2010, pp. 897–900.
- [2.109] Y. K. Ramadass and A. P. Chandrakasan, "A battery-less thermoelectric energy harvesting interface circuit with 35 mV startup voltage," *IEEE J. Solid-State Circuits*, vol. 46, no. 1, pp. 333–341, Jan. 2011.
- [2.110] —, "An efficient piezoelectric energy harvesting interface circuit using a bias-flip rectifier and shared inductor," *IEEE J. Solid-State Circuits*, vol. 45, no. 1, pp. 189–204, Jan. 2010.
- [2.111] Y. K. Ramadass, A. A. Fayed, and A. P. Chandrakasan, "A fully-integrated switched-capacitor step-down DC-DC converter with digital capacitance modulation in 45 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 45, no. 12, pp. 2557–2565, Dec. 2010.
- [2.112] E. Carlson, K. Strunz, and B. Otis, "A 20 mV input boost converter with efficient digital control for thermoelectric energy harvesting," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 741–750, Apr. 2010.

- [2.113] H. Kim, S.-M. Kang, K.-J. Park, C.-W. Baek, and J.-S. Park, "Power management circuit for wireless ubiquitous sensor nodes powered by scavenged energy," *IET Electron. Lett.*, vol. 45, no. 7, pp. 373–374, Mar. 2009.
- [2.114] G. Dhiman and T. S. Rosing, "System-level power management using online learning," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 28, no. 5, pp. 676–689, May 2009.
- [2.115] H. Lhermet, C. Condemine, M. Plissonnier, R. Salot, P. Audebert, and M. Rosset, "Efficient power management circuit: from thermal energy harvesting to above-IC microbattery energy storage," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 246–255, Jan. 2008.
- [2.116] R. Senguttuvan, S. Sen, and A. Chatterjee, "Multidimensional adaptive power management for low-power operation of wireless devices," *IEEE Trans. Circuits Syst. II*, vol. 55, no. 9, pp. 867–871, Sep. 2008.
- [2.117] R. McGowen, C. A. Poirier, C. Bostak, J. Ignowski, M. Millican, W. H. Parks, and S. Naffziger, "Power and temperature control on a 90-nm Itanium family processor," *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 229–237, Jan. 2006.
- [2.118] J. A. Paradiso and T. Starner, "Energy scavenging for mobile and wireless electronics," *IEEE Pervasive Computing*, vol. 4, no. 1, pp. 18–27, Jan.-Mar. 2005.
- [2.119] J. Wang and B. H. Calhoun, "Minimum supply voltage and yield estimation for large SRAMs under parametric variations," *IEEE Trans. VLSI Syst.*, vol. 19, no. 11, pp. 2120–2125, Nov. 2011.
- [2.120] L. Chang, D. Frank, R. Montoye, S. Koester, B. Ji, P. Coteus, R. Dennard, and W. Haensch, "Practical strategies for power-efficient computing technologies," *Proc. IEEE*, vol. 98, no. 2, pp. 215–236, Feb. 2010.
- [2.121] D. Ma and R. Bondade, "Enabling power-efficient DVFS operations on silicon," *IEEE Circuits Syst. Mag.*, vol. 10, no. 1, pp. 14–30, First Quarter 2010.
- [2.122] P. Choudhary and D. Marculescu, "Power management of voltage/frequency island-based systems using hardware-based methods," *IEEE Trans. VLSI Syst.*, vol. 17, no. 3, pp. 427–438, Mar. 2009.
- [2.123] S. Chandra, K. Lahiri, A. Raghunathan, and S. Dey, "Variation-tolerant dynamic power management at the system-level," *IEEE Trans. VLSI Syst.*, vol. 17, no. 9, pp. 1220–1232, Sep. 2009.
- [2.124] Y. Ikenaga, M. Nomura, Y. Nakazawa, and Y. Hagihara, "A circuit for determining the optimal supply voltage to minimize energy consumption in LSI circuit operations," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 911–918, Apr. 2008.
- [2.125] T. Mudge, "Power: A first-class architectural design constraint," *IEEE Computer*, vol. 34, no. 4, pp. 52–58, Apr. 2001.
- [2.126] T. D. Burd and R. W. Brodersen, "Design issues for dynamic voltage scaling," in *IEEE Int'l Symp. on Low Power Electronics and Design*, Aug. 2000, pp. 9–14.
- [2.127] V. Gutnik and A. Chandrakasan, "Embedded power supply for low-power DSP," *IEEE Trans. VLSI Syst.*, vol. 5, no. 4, pp. 425–435, Dec. 1997.
- [2.128] M. Barai, S. Sengupta, and J. Biswas, "Dual-mode multiple-band digital controller for high-frequency DC-DC converter," *IEEE Trans. Power Electron.*, vol. 24, no. 3, pp. 752–766, Mar. 2009.
- [2.129] L. Corradini, E. Orietti, P. Mattavelli, and S. Saggini, "Digital hysteretic voltage-mode control for DC-DC converters based on asynchronous sampling," *IEEE Trans. Power Electron.*, vol. 24, no. 1, pp. 201–211, Jan. 2009.
- [2.130] P. Li, L. Xue, P. Hazucha, T. Karnik, and R. Bashirullah, "A delay-locked loop synchronization scheme for high-frequency multiphase hysteretic DC-DC converters," *IEEE J. Solid-State Circuits*, vol. 44, no. 11, pp. 3131–3145, Nov. 2009.
- [2.131] B. Sahu and G. A. Rincon-Mora, "An accurate, low-voltage, CMOS switching power supply with adaptive on-time pulse-frequency modulation (PFM) control," *IEEE Trans. Circuits Syst. I*, vol. 54, no. 2, pp. 312–321, Feb. 2007.
- [2.132] H.-W. Huang, K.-H. Chen, and S.-Y. Kuo, "Dithering skip modulation, width and dead time controllers in highly efficient DC-DC converters for system-on-chip applications," *IEEE J. Solid-State Circuits*, vol. 42, no. 11, pp. 2451–2465, Nov. 2007.
- [2.133] K.-M. Keung, V. Manne, and A. Tyagi, "A novel charge recycling design scheme based on adiabatic charge pump," *IEEE Trans. VLSI Syst.*, vol. 15, no. 7, pp. 733–745, Jul. 2007.
- [2.134] A. Arakali, S. Gondi, and P. K. Hanumolu, "Analysis and design techniques for supply-noise mitigation in phase-locked loops," *IEEE Trans. Circuits Syst. I*, vol. 57, no. 11, pp. 2880–2889, Nov. 2010.
- [2.135] T. Toifl, C. Menolfi, P. Buchmann, M. Kossel, T. Morf, and M. L. Schmatz, "A 1.25-5 GHz clock generator with high-bandwidth supply-rejection using a regulated-replica regulator in 45-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 11, pp. 2901–2910, Nov. 2009.
- [2.136] B. Mesgarzadeh and A. Alvandpour, "A low-power digital DLL-based clock generator in open-loop mode," *IEEE J. Solid-State Circuits*, vol. 44, no. 7, pp. 1907–1913, Jul. 2009.
- [2.137] R.-J. Yang and S.-I. Liu, "A 40-550 MHz harmonic-free all-digital delay-locked loop using a variable SAR algorithm," *IEEE J. Solid-State Circuits*, vol. 42, no. 2, pp. 361–373, Feb. 2007.
- [2.138] H.-H. Chang and S.-I. Liu, "A wide-range and fast-locking all-digital cycle-controlled delay-locked loop," *IEEE J. Solid-State Circuits*, vol. 40, no. 3, pp. 661–670, Mar. 2005.
- [2.139] Z. Yu and B. Baas, "High performance, energy efficiency, and scalability with GALS chip multiprocessors," *IEEE Trans. VLSI Syst.*, vol. 17, no. 1, pp. 66–79, Jan. 2009.
- [2.140] R. Dobkin, R. Ginosar, and C. P. Sotiriou, "High rate data synchronization in GALS SoCs," *IEEE Trans. VLSI Syst.*, vol. 14, no. 10, pp. 1063–1074, Oct. 2006.
- [2.141] A. Chattopadhyay and Z. Zilic, "GALDS: a complete framework for designing multiclock ASICs and SoCs," *IEEE Trans. VLSI Syst.*, vol. 13, no. 6, pp. 641–654, Jun. 2005.
- [2.142] S. Wooters, B. Calhoun, and T. Blalock, "An energy-efficient subthreshold level converter in 130-nm CMOS," *IEEE Trans. Circuits Syst. II*, vol. 57, no. 4, pp. 290–294, Apr. 2010.
- [2.143] J. C. Chi, H. H. Lee, S. H. Tsai, and M. C. Chi, "Gate level multiple supply voltage assignment algorithm for power optimization under timing constraint," *IEEE Trans. VLSI Syst.*, vol. 15, no. 6, pp. 637–648, Jun. 2007.
- [2.144] N. Ickes, G. Gammie, M. E. Sinangil, R. Rithe, J. Gu, A. Wang, H. Mair, S. Datla, B. Rong, S. Honnavara-Prasad, L. Ho, G. Baldwin, D. Buss, A. P. Chandrakasan, and U. Ko, "A 28 nm 0.6 V low power DSP for mobile applications," *IEEE J. Solid-State Circuits*, vol. pp, no. 99, p. 1, Jan. 2012.
- [2.145] H. Kaul, M. Anders, S. Mathew, S. Hsu, A. Agarwal, R. Krishnamurthy, and S. Borkar, "A 320 mV 56 μW 411 GOPS/Watt ultra-low voltage motion estimation accelerator in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 1, pp. 107–114, Jan. 2009.
- [2.146] M. Hammes, C. Kranz, D. Seippel, J. Kissing, and A. Leyk, "Evolution on SoC integration: GSM baseband-radio in 0.13 μm CMOS extended by fully integrated power management unit," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 236–245, Jan. 2008.
- [2.147] D. N. Truong, W. H. Cheng, T. Mohsenin, Z. Yu, A. T. Jacobson, G. Landge, M. J. Meeuwsen, C. Watnik, A. T. Tran, Z. Xiao, E. W. Work, J. W. Webb, P. V. Mejia, and B. M. Baas, "A 167-processor computational platform in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1130–1144, Apr. 2009.
- [2.148] Y. Pu, J. Pineda de Gyvez, H. Corporaal, and Y. Ha, "An ultra-low-energy multistandard JPEG co-processor in 65 nm CMOS with sub/near threshold supply voltage," *IEEE J. Solid-State Circuits*, vol. 45, no. 3, pp. 668–680, Mar. 2010.
- [2.149] B. H. Calhoun, J. Lach, J. Stankovic, D. D. Wentzloff, K. Whitehouse, A. T. Barth, J. K. Brown, Q. Li, S. Oh, N. E. Roberts, and Y. Zhang, "Body sensor networks: A holistic approach from silicon to users," *Proc. IEEE*, vol. 100, no. 1, pp. 91–106, Jan. 2012.
- [2.150] J. Ko, C. Lu, M. Srivastava, J. Stankovic, A. Terzis, and M. Welsh, "Wireless sensor networks for healthcare," *Proc. IEEE*, vol. 98, no. 11, pp. 1947–1960, Nov. 2010.
- [2.151] D. C. Daly and A. P. Chandrakasan, "An energy-efficient OOK transceiver for wireless sensor networks," *IEEE J. Solid-State Circuits*, vol. 42, no. 5, pp. 1003–1011, May 2007.
- [2.152] X.-F. Teng, Y.-T. Zhang, C. C.-Y. Pong, and P. Bonato, "Wearable medical systems for p-Health," *IEEE Reviews in Biomedical Engineering*, vol. 1, pp. 62–74, 2008.
- [2.153] T. Schlebusch, L. Rothlingshoofer, S. Kim, M. Kony, and S. Leonhardt, "On the road to a textile integrated bioimpedance early warning system for lung edema," in *Int'l Conf. on Body Sensor Networks*, Jun. 2010, pp. 302–307.
- [2.154] J. Kwong and A. P. Chandrakasan, "An energy-efficient biomedical signal processing platform," *IEEE J. Solid-State Circuits*, vol. 46, no. 7, pp. 1742–1753, Jul. 2011.
- [2.155] N. Verma, A. Shoeb, J. Bohorquez, J. Dawson, J. Guttag, and A. P. Chandrakasan, "A micro-power EEG acquisition SoC with integrated feature extraction processor for a chronic seizure detection system," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 804–816, Apr. 2010.
- [2.156] T.-W. Chen, J.-Y. Yu, C.-Y. Yu, and C.-Y. Lee, "A 0.5 V 4.85 Mbps dual-mode baseband transceiver with extended frequency calibration for biotelemetry applications," *IEEE J. Solid-State Circuits*, vol. 44, no. 11, pp. 2966–2976, Nov. 2009.
- [2.157] B. Zhai, S. Pant, L. Nazhandali, S. Hanson, J. Olson, A. Reeves, M. Minuth, R. Helfand, T. Austin, D. Sylvester, and D. Blaauw, "Energy-efficient subthreshold processor design," *IEEE Trans. VLSI Syst.*, vol. 17, no. 8, pp. 1127–1137, Aug. 2009.
- [2.158] W.-H. Sung, J.-Y. Yu, and C.-Y. Lee, "A robust frequency tracking loop for energy-efficient crystalless WBAN systems," *IEEE Trans. Circuits Syst. II*, vol. 58, no. 10, pp. 637–641, Oct. 2011.
- [2.159] T.-W. Chen, P.-Y. Tsai, J.-Y. Yu, and C.-Y. Lee, "A sub-mW all-digital signal component separator with branch mismatch compensation for OFDM LINC transmitters," *IEEE J. Solid-State Circuits*, vol. 46, no. 11, pp. 2514–2523, Nov. 2011.
- [2.160] S.-Y. Hsu, J.-Y. Yu, and C.-Y. Lee, "A sub-10-μW digitally controlled oscillator based on hysteresis delay cell topologies for WBAN applications," *IEEE Trans. Circuits Syst. II*, vol. 57, no. 12, pp. 951–955, Dec. 2010.
- [2.161] A. L. Aita, M. A. P. Pertijs, and K. A. A. Makinwa, "A CMOS smart temperature sensor with a batch-calibrated inaccuracy of ±0.25°C (3σ) from -70°C to 130°C," in *IEEE Int'l Solid-State Circuits Conf.*, Feb. 2009, pp. 342–343, 343a.
- [2.162] M. Alam, B. Weir, and A. Silverman, "A future of function or failure?" *IEEE Circuits Devices Mag.*, vol. 18, no. 2, pp. 42–48, Mar. 2002.
- [2.163] E. Alon and M. Horowitz, "Integrated regulation for energy-efficient digital circuits," *IEEE J. Solid-State Circuits*, vol. 43, no. 8, pp. 1795–1807, Aug. 2008.
- [2.164] H. Alstad and S. Aunet, "Three subthreshold flip-flop cells characterized in 90 nm and 65 nm CMOS technology," in *IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems*, Apr. 2008, pp. 1–4.
- [2.165] H. P. Alstad and S. Aunet, "Seven subthreshold flip-flop cells," in *Norchip*, Nov. 2007, pp. 1–4.
- [2.166] A. Arakali, S. Gondi, and P. K. Hanumolu, "Low-power supply-regulation techniques for ring oscillators in phase-locked loops using a split-tuned architecture," *IEEE J. Solid-State Circuits*, vol. 44, no. 8, pp. 2169–2181, Aug. 2009.
- [2.167] A. Bakker and J. H. Huijsing, "Micropower CMOS temperature sensor with digital output," *IEEE J. Solid-State Circuits*, vol. 31, no. 7, pp. 933–937, Jul. 1996.
- [2.168] R. Baumann, "The impact of technology scaling on soft error rate performance and limits to the efficacy of error correction," in *IEEE Int'l Electron Devices Meeting*, Dec. 2002, pp. 329–332.
- [2.169] E. Beigne, F. Clermidy, H. Lhermet, S. Miermont, Y. Thonnart, X.-T. Tran, A. Valentian, D. Varreau, P. Vivet, X. Popon, and H. Lebreton, "An asynchronous power aware and adaptive NoC based circuit," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1167–1177, Apr. 2009.
- [2.170] K. A. Bowman, B. L. Austin, J. C. Eble, X. Tang, and J. D. Meindl, "A physical alpha-power law MOSFET model," *IEEE J. Solid-State Circuits*, vol. 34, no. 10, pp. 218–222, Oct. 1999.
- [2.171] K. A. Bowman, S. G. Duvall, and J. D. Meindl, "Impact of die-to-die and within-die parameter fluctuations on the maximum clock frequency distribution for gigascale integration," *IEEE J. Solid-State Circuits*, vol. 37, no. 2, pp. 183–190, Feb. 2002.
- [2.172] R. W. Brodersen, A. Chandrakasan, and S. Sheng, "Technologies for personal communications," in *IEEE Symp. on VLSI Circuits*, May 1991, pp. 5–9.
- [2.173] B. H. Calhoun, J. Bolus, S. Khanna, A. D. Jurik, A. C. Weaver, and T. N. Blalock, "Subthreshold operation and cross-hierarchy design for ultra low power wearable sensors," in *IEEE Int'l Symp. on Circuits and Systems*, May 2009, pp. 1437–1440.

- [2.174] B. H. Calhoun and A. P. Chandrakasan, "A 256kb sub-threshold SRAM in 65nm CMOS," in *IEEE Int'l Solid-State Circuits Conf.*, Feb. 2006, pp. 2592–2601.
- [2.175] B. H. Calhoun, S. Khanna, R. Mann, and J. Wang, "Sub-threshold circuit design with shrinking CMOS devices," in *IEEE Int'l Symp. on Circuits and Systems*, May 2009, pp. 2541–2544.
- [2.176] B. H. Calhoun, S. Khanna, Y. Zhang, J. Ryan, and B. Otis, "System design principles combining sub-threshold circuit and architectures with energy scavenging mechanisms," in *IEEE Int'l Symp. on Circuits and Systems*, May 2010, pp. 269–272.
- [2.177] N. Chandra, A. K. Yati, and A. Bhattacharyya, "Extended-Sakurai-Newton MOSFET model for ultra-deep-submicrometer CMOS digital design," in *Proc. Int. Conf. VLSI Design*, Jan. 2009, pp. 247–252.
- [2.178] A. Chandrakasan and R. Brodersen, *Adaptive Power Supply Systems*. Wiley-IEEE Press, 1998.
- [2.179] C.-H. Chang, M.-H. Chang, and W. Hwang, "A flexible two-layer external memory management for H.264/AVC decoder," in *IEEE Int'l System-on-Chip Conf.*, Sep. 2007, pp. 219–222.
- [2.180] L. Chang, R. K. Montoye, Y. Nakamura, K. A. Batson, R. J. Eickemeyer, R. H. Dennard, W. Haensch, and D. Jamsek, "An 8T-SRAM for variability tolerance and low-voltage operation in high-performance caches," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 956–963, Apr. 2008.
- [2.181] M.-H. Chang, Multimode wireless transmitter design for radio convergence. Master's thesis, National Chiao Tung University, Hsin-Chu, Taiwan, 2001.
- [2.182] M.-H. Chang, L.-P. Chuang, Y.-M. Chang, and W. Hwang, "A 300-mV 36-μW multiphase dual digital clock output generator with self-calibration," in *IEEE Int'l System-on-Chip Conf.*, Sep. 2008, pp. 97–100.
- [2.183] M.-H. Chang, C.-Y. Hsieh, M.-W. Chen, and W. Hwang, "Near-/sub-threshold DLL-based clock generator with PVT-aware locking range compensation," in *IEEE Int'l Symp. on Low Power Electronics and Design*, Aug. 2011, pp. 15–20.
- [2.184] M.-H. Chang, J.-Y. Wu, W.-C. Hsieh, S.-Y. Lin, Y.-W. Liang, and W. Hwang, "High efficiency power management system for solar energy harvesting applications," in *IEEE Asia-Pacific Conf. on Circuits and Systems*, Dec. 2010, pp. 879–882.
- [2.185] M.-H. Chang, Z.-X. Yang, and W. Hwang, "A 1.9mW portable ADPLL-based frequency synthesizer for high speed clock generation," in *IEEE Int'l Symp. on Circuits and Systems*, May 2007, pp. 1137–1140.
- [2.186] T.-W. Chen and C.-Y. Lee, *u-PHI specification*, 2010.
- [2.187] C.-Y. Cheng, M.-H. Chang, and W. Hwang, "Power-gating sense amplifier of low power pseudo SRAM," in *Int'l Symp. on VLSI Design, Automation, and Test*, Apr. 2007, pp. 260–263.
- [2.188] W. H. Cheng and B. M. Baas, "Dynamic voltage and frequency scaling circuits with two supply voltages," in *IEEE Int'l Symp. on Circuits and Systems*, May 2008, pp. 1236–1239.
- [2.189] Y.-T. Chiu, M.-H. Chang, H.-Y. Yang, and W. Hwang, "Subthreshold asynchronous FIFO memory for wireless body area networks (WBANs)," in *Int'l Symp. on Medical Information and Communication Technology*, Mar. 2010, pp. 1–4.
- [2.190] K.-S. Chong, B.-H. Gwee, and J. S. Chang, "Energy-efficient synchronous-logic and asynchronous-logic FFT/IFFT processors," *IEEE J. Solid-State Circuits*, vol. 42, no. 9, pp. 2034–2045, Sep. 2007.
- [2.191] L.-P. Chuang, M.-H. Chang, P.-T. Huang, C.-H. Kan, and W. Hwang, "A 5.2mW all-digital fast-lock self-calibrated multiphase delay-locked loop," in *IEEE Int'l Symp. on Circuits and Systems*, May 2008, pp. 3342–3345.
- [2.192] N. Derhacobian, S. C. Hollmer, N. Gilbert, and M. N. Kozicki, "Power and energy perspectives of nonvolatile memory technologies," *Proc. IEEE*, vol. 98, no. 2, pp. 283–298, Feb. 2010.
- [2.193] N. Derhacobian, V. Vardanian, and Y. Zorian, "Embedded memory reliability: the SER challenge," in *IEEE Int'l Workshop on Memory Technology, Design, and Testing*, Aug. 2004, pp. 104–110.
- [2.194] P. Emma and E. Kursun, "Opportunities and challenges for 3D systems and their design," *IEEE Des. Test. Comput.*, vol. 26, no. 5, pp. 6–14, Sep.-Oct. 2009.
- [2.195] S. Fisher, A. Teman, D. Vaysman, A. Gertsman, O. Yadid-Pecht, and A. Fish, "Ultra-low power subthreshold flip-flop design," in *IEEE Int'l Symp. on Circuits and Systems*, May 2009, pp. 1573–1576.
- [2.196] B. Fu and P. Ampadu, "Comparative analysis of ultra-low voltage Flip-Flops for energy efficiency," in *IEEE Int'l Symp. on Circuits and Systems*, May 2007, pp. 1173–1176.
- [2.197] S. K. Gupta, A. Raychowdhury, and K. Roy, "Digital computation in subthreshold region for ultralow-power operation: a device-circuit-architecture co-design perspective," *Proc. IEEE*, vol. 98, no. 2, pp. 160–190, Feb. 2010.
- [2.198] P. Hazucha, T. Karnik, J. Maiz, S. Walstra, B. Bloechel, J. Tschanz, G. Dermer, S. Hareland, P. Armstrong, and S. Borkar, "Neutron soft error rate measurements in 90-nm CMOS process and scaling trends in SRAM from 0.25-μm to 90-nm generation," in *IEEE Int'l Electron Devices Meeting*, Dec. 2003, pp. 21.5.1–21.5.4.
- [2.199] T. Hiramoto, "Ultra-low-voltage operation: Device perspective," in *IEEE Int'l Symp. on Low Power Electronics and Design*, Aug. 2011, pp. 59–60.
- [2.200] W.-C. Hsieh and W. Hwang, "Low quiescent current variable output digital controlled voltage regulator," in *IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2010, pp. 609–612.
- [2.201] K. Itoh, "Adaptive circuits for the 0.5-V nanoscale CMOS era," in *IEEE Int'l Solid-State Circuits Conf.*, Feb. 2009, pp. 14–20.
- [2.202] E. Karl, P. Singh, D. Blaauw, and D. Sylvester, "Compact in-situ sensors for monitoring negative-bias-temperature-instability effect and oxide degradation," in *IEEE Int'l Solid-State Circuits Conf.*, Feb. 2008, pp. 410–411, 623.
- [2.203] K. Kim, H. Lee, S. Jung, and C. Kim, "A 366kS/s 400μW 0.0013mm<sup>2</sup> frequency-to-digital converter based CMOS temperature sensor utilizing multiphase clock," in *IEEE Custom Integrated Circuits Conf.*, Sep. 2009, pp. 203–206.
- [2.204] T.-H. Kim, H. Eom, J. Keane, and C. H. Kim, "Utilizing reverse short channel effect for optimal subthreshold circuit design," in *IEEE Int'l Symp. on Low Power Electronics and Design*, Oct. 2006, pp. 127–130.

- [2.205] M. Koyanagi, "3D super chip technology to achieve low-power and high-performance system-on-a chip," in *IEEE Int'l Symp. on Low Power Electronics and Design*, Aug. 2011, pp. 61–62.
- [2.206] K. J. Kuhn, "Reducing variation in advanced logic technologies: Approaches to process and design for manufacturability of nanoscale CMOS," in *IEEE Int'l Electron Devices Meeting*, Dec. 2007, pp. 471–474.
- [2.207] J. P. Kulkarni, K. Kim, S.-P. Park, and K. Roy, "Process variation tolerant SRAM array for ultra low voltage applications," in *IEEE Design Automation Conf.*, Jun. 2008, pp. 108–113.
- [2.208] H.-S. Lee, L. Brooks, and C. G. Sodini, "Zero-crossing-based ultra-low-power A/D converters," *Proc. IEEE*, vol. 98, no. 2, pp. 315–332, Feb. 2010.
- [2.209] L. Lewyn and N. Williams, "Is a new paradigm for nanoscale analog CMOS design needed?" *Proc. IEEE*, vol. 99, no. 1, pp. 3–6, Jan. 2011.
- [2.210] Y. W. Li, H. Lakdawala, A. Raychowdhury, G. Taylor, and K. Soumyanath, "A 1.05V 1.6mW 0.45°C 3σ-resolution ΔΣ-based temperature sensor with parasitic-resistance compensation in 32nm CMOS," in *IEEE Int'l Solid-State Circuits Conf.*, Feb. 2009, pp. 340–341, 341a.
- [2.211] W.-M. Lin, C.-C. Chen, and S.-I. Liu, "An all-digital clock generator for dynamic frequency scaling," in *Int'l Symp. on VLSI Design, Automation, and Test*, Apr. 2009, pp. 251–254.
- [2.212] Y.-S. Lin, D. Sylvester, and D. Blaauw, "An ultra low power 1V, 220nW temperature sensor for passive wireless applications," in *IEEE Custom Integrated Circuits Conf.*, Sep. 2008, pp. 507–510.
- [2.213] J. Maiz, S. Hareland, K. Zhang, and P. Armstrong, "Characterization of multi-bit soft error events in advanced SRAMs," in *IEEE Int'l Electron Devices Meeting*, Dec. 2003, pp. 21.4.1–21.4.4.
- [2.214] P. P. Mercier, D. C. Daly, and A. P. Chandrakasan, "An energy-efficient all-digital UWB transmitter employing dual capacitively-coupled pulse-shaping drivers," *IEEE J. Solid-State Circuits*, vol. 44, no. 6, pp. 1679–1688, Jun. 2009.
- [2.215] K. Muhammad, R. B. Staszewski, and D. Leipold, "Digital RF processing: toward low-cost reconfigurable radios," *IEEE Communications Magazine*, vol. 43, no. 8, pp. 105–113, Aug. 2005.
- [2.216] S. Mukhopadhyay, R. Rao, J. J. Kim, and C. T. Chuang, "Capacitive coupling based transient negative bit-line voltage (Tran-NBL) scheme for improving write-ability of SRAM design in nanometer technologies," in *IEEE Int'l Symp. on Circuits and Systems*, May 2008, pp. 384–387.
- [2.217] A. Mäntyniemi, T. Rahkonen, and J. Kostamovaara, "A CMOS time-to-digital converter (TDC) based on a cyclic time domain successive approximation interpolation method," *IEEE J. Solid-State Circuits*, vol. 44, no. 11, pp. 3067–3078, Nov. 2009.
- [2.218] Y. Osaki, T. Hirose, K. Matsumoto, N. Kuroki, and M. Numa, "Delay-compensation techniques for ultra-low-power subthreshold CMOS digital LSIs," in *IEEE Int'l Midwest Symp. on Circuits and Systems*, Aug. 2009, pp. 503–506.

- [2.219] C.-S. Peng, M.-H. Chang, and K.-A. Wen, "Early-late gate receiving for Bluetooth packet," in *Int'l Symp. on VLSI Technology, Systems, and Applications*, Apr. 2001, pp. 51–60.
- [2.220] Y. K. Ramadass, "Energy processing circuits for low-power applications," Ph.D. dissertation, Massachusetts Inst. of Technology, Cambridge, MA, Jun. 2009.
- [2.221] Y. K. Ramadass and A. P. Chandrakasan, "Voltage scalable switched capacitor DC-DC converter for ultra-low-power on-chip applications," in *IEEE Power Electronics Specialists Conf.*, Jun. 2007, pp. 2353–2359.
- [2.222] G. W. Roberts and M. Ali-Bakhshian, "A brief introduction to time-to-digital and digital-to-time converters," *IEEE Trans. Circuits Syst. II*, vol. 57, no. 3, pp. 153–157, Mar. 2010.
- [2.223] H. Saito, M. Nakajima, T. Okamoto, Y. Yamada, A. Ohuchi, N. Iguchi, T. Sakamoto, K. Yamaguchi, and M. Mizuno, "A chip-stacked memory for on-chip SRAM-rich SoCs and processors," *IEEE J. Solid-State Circuits*, vol. 45, no. 1, pp. 15–22, Jan. 2010.
- [2.224] T. Sakurai and A. R. Newton, "Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas," *IEEE J. Solid-State Circuits*, vol. 25, no. 2, pp. 584–594, Apr. 1990.
- [2.225] T. Sakurai, "Designing ultra-low voltage logic," in *IEEE Int'l Symp. on Low Power Electronics and Design*, Aug. 2011, pp. 57–58.
- [2.226] F. Sebastiano, L. Breems, K. Makinwa, S. Drago, D. Leenaerts, and B. Nauta, "A 1.2V 10μW NPN-based temperature sensor in 65nm CMOS with an inaccuracy of ±0.2°C (3σ) from -70°C to 125°C," in *IEEE Int'l Solid-State Circuits Conf.*, Feb. 2010, pp. 312–313.
- [2.227] A. Shibayama, K. Nose, S. Torii, M. Mizuno, and M. Edahiro, "Skew-tolerant global synchronization based on periodically all-in-phase clocking for multi-core SOC platforms," in *IEEE Symp. on VLSI Circuits*, Jun. 2007, pp. 158–159.
- [2.228] M. E. Sinangil, N. Verma, and A. P. Chandrakasan, "A 45nm 0.5V 8T column-interleaved SRAM with on-chip reference selection loop for sense-amplifier," in *IEEE Asian Solid-State Circuits Conf.*, Nov. 2009, pp. 225–228.
- [2.229] C. Slayman, "Soft error trends and mitigation techniques in memory devices," in *The Annual Reliability and Maintainability Symposium*, Jan. 2011, pp. 1–5.
- [2.230] K. Souri, M. Kashmiri, and K. Makinwa, "A CMOS temperature sensor with an energy-efficient zoom ADC and an inaccuracy of ±0.25°C (3σ) from -40°C to 125°C," in *IEEE Int'l Solid-State Circuits Conf.*, Feb. 2010, pp. 310–311.
- [2.231] J. A. Starzyk and H. He, "A novel low-power logic circuit design scheme," *IEEE Trans. Circuits Syst. II*, vol. 54, no. 2, pp. 176–180, Feb. 2007.
- [2.232] I. Sutherland, R. F. Sproull, and D. Harris, *Logical Effort: Designing Fast CMOS Circuits*. Morgan Kaufmann, 1999.
- [2.233] Y. Taur and T. H. Ning, *Fundamentals of Modern VLSI Devices*. Cambridge, U.K.: Cambridge Univ. Press, 1998.
- [2.234] L. Vercesi, A. Liscidini, and R. Castello, "Two-dimensions vernier time-to-digital converter," *IEEE J. Solid-State Circuits*, vol. 45, no. 8, pp. 1504–1512, Aug. 2010.
- [2.235] C. P. L. van Vroonhoven and K. A. A. Makinwa, "A CMOS temperature-to-digital converter with an inaccuracy of ±0.5°C (3σ) from -55 to 125°C," in *IEEE Int'l Solid-State Circuits Conf.*, Feb. 2008, pp. 576–577, 637.
- [2.236] A. C.-W. Wong, D. McDonagh, G. Kathiresan, O. C. Omeni, O. El-Jamaly, T. C.-K. Chan, P. Paddan, and A. J. Burdett, "A 1V, micropower system-on-chip for vital-sign monitoring in wireless body sensor networks," in *IEEE Int'l Solid-State Circuits Conf.*, Feb. 2008, pp. 138–139, 602.
- [2.237] K. Woo, S. Meninger, T. Xanthopoulos, E. Crain, D. Ha, and D. Ham, "Dual-DLL-based CMOS all-digital temperature sensor for microprocessor thermal monitoring," in *IEEE Int'l Solid-State Circuits Conf.*, Feb. 2009, pp. 68–69, 69a.
- [2.238] M. Yabuuchi, K. Nii, Y. Tsukamoto, S. Ohbayashi, S. Imaoka, H. Makino, Y. Yamagami, S. Ishikura, T. Terano, T. Oashi, K. Hashimoto, A. Sebe, G. Okazaki, K. Satomi, H. Akamatsu, and H. Shinohara, "A 45nm low-standby-power embedded SRAM with improved immunity against process and temperature variations," in *IEEE Int'l Solid-State Circuits Conf.*, Feb. 2007, pp. 326–327, 606.
- [2.239] H.-I. Yang, M.-H. Chang, S.-Y. Lai, H.-F. Wang, and W. Hwang, "A low-power low-swing single-ended multi-port SRAM," in *Int'l Symp. on VLSI Design, Automation, and Test*, Apr. 2007, pp. 28–31.
- [2.240] H.-I. Yang, M.-H. Chang, T.-J. Lin, S.-H. Ou, S.-S. Deng, C.-W. Liu, and W. Hwang, "A controllable low-power dual-port embedded SRAM for DSP processor," in *IEEE Int'l Workshop on Memory Technology, Design, and Testing*, Dec. 2007, pp. 27–30.
- [2.241] Y.-W. Yang and K. S.-M. Li, "Temperature-aware dynamic frequency and voltage scaling for reliability and yield enhancement," in *Asia and South Pacific Design Automation Conf. (ASPDAC)*, Jan. 2009, pp. 49–54.
- [2.242] C.-Y. Yu, J.-Y. Yu, and C.-Y. Lee, "An eCrystal oscillator with self-calibration capability," in *IEEE Int'l Symp. on Circuits and Systems*, May 2009, pp. 237–240.
- [2.243] J. Yu, F. F. Dai, and R. C. Jaeger, "A 12-bit Vernier ring time-to-digital converter in 0.13µm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 830–842, Apr. 2010.
- [2.244] B. R. Zeydel, D. Baran, and V. G. Oklobdzija, "Energy-efficient design methodologies: High-performance VLSI adders," *IEEE J. Solid-State Circuits*, vol. 45, no. 6, pp. 1220–1233, Jun. 2010.
- [2.245] B. Zhai, D. Blaauw, D. Sylvester, and K. Flautner, "Theoretical and practical limits of dynamic voltage scaling," in *IEEE Design Automation Conf.*, Jun. 2004, pp. 868–873.
- [2.246] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, "Analysis and mitigation of variability in subthreshold design," in *IEEE Int'l Symp. on Low Power Electronics and Design*, Aug. 2005, pp. 20–25.
- [2.247] X. Zhao, J. Tolbert, C. Liu, S. Mukhopadhyay, and S. K. Lim, "Variation-aware clock network design methodology for ultra-low voltage (ULV) circuits," in *IEEE Int'l Symp. on Low Power Electronics and Design*, Aug. 2011, pp. 9–14.
- [2.248] C. Zheng and D. Ma, "A 10-MHz green-mode automatic reconfigurable switching converter for DVS-enabled VLSI systems," *IEEE J. Solid-State Circuits*, vol. 46, no. 6, pp. 1464–1477, Jun. 2011.

- [2.249] Predictive technology model. Nanoscale Integration and Modeling (NIMO) Group, ASU. [Online]. Available: http://ptm.asu.edu/



- [3.1] S. K. Gupta, A. Raychowdhury, and K. Roy, "Digital computation in subthreshold region for ultralow-power operation: a device-circuit-architecture co-design perspective," *Proc. IEEE*, vol. 98, no. 2, pp. 160–190, Feb. 2010.
- [3.2] A. P. Chandrakasan, D. C. Daly, D. F. Finchelstein, J. Kwong, Y. K. Ramadass, M. E. Sinangil, V. Sze, and N. Verma, "Technologies for ultradynamic voltage scaling," *Proc. IEEE*, vol. 98, no. 2, pp. 191–214, Feb. 2010.
- [3.3] K. Itoh, "Adaptive circuits for the 0.5-V nanoscale CMOS era," in *IEEE Int'l Solid-State Circuits Conf.*, Feb. 2009, pp. 14–20.
- [3.4] J. Kwong, Y. K. Ramadass, N. Verma, and A. P. Chandrakasan, "A 65nm sub-V<sub>t</sub> microcontroller with integrated SRAM and switched capacitor DC-DC converter," *IEEE J. Solid-State Circuits*, vol. 44, no. 1, pp. 115–126, Jan. 2009.
- [3.5] V. F. Pavlidis and E. G. Friedman, "Interconnect-based design methodologies for three-dimensional integrated circuits," *Proc. IEEE*, vol. 98, no. 1, pp. 123–140, Jan. 2009.
- [3.6] A. W. Topol, D. C. L. Tulipe, L. Shi, D. J. Frank, K. Bernstein, S. E. Steen, A. Kumar, G. U. Singco, A. M. Young, K. W. Guarini, and M. Ieong, "Three-dimensional integrated circuits," *IBM Journal of Research and Development*, vol. 50, no. 4/5, pp. 491–506, Jul./Sep. 2006.
- [3.7] P. Krummenacher and H. Oguey, "Smart temperature sensor in CMOS technology," *Sensors and Actuators A: Physical*, vol. 22, no. 1-3, pp. 636–638, Jun. 1989.
- [3.8] A. Bakker and J. H. Huijsing, "Micropower CMOS temperature sensor with digital output," *IEEE J. Solid-State Circuits*, vol. 31, no. 7, pp. 933–937, Jul. 1996.
- [3.9] A. L. Aita, M. A. P. Pertijs, and K. A. A. Makinwa, "A CMOS smart temperature sensor with a batch-calibrated inaccuracy of ±0.25°C (3σ) from -70°C to 130°C," in *IEEE Int'l Solid-State Circuits Conf.*, Feb. 2009, pp. 342–343, 343a.
- [3.10] M. A. P. Pertijs, K. A. A. Makinwa, and J. H. Huijsing, "A CMOS smart temperature sensor with a 3σ inaccuracy of ±0.1°C from -55°C to 125°C," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2805–2815, Dec. 2005.
- [3.11] K. Souri and K. Makinwa, "A 0.12 mm<sup>2</sup> 7.4 μW micropower temperature sensor with an inaccuracy of ±0.2°C (3σ) from -30 °C to 125 °C," *IEEE J. Solid-State Circuits*, vol. 46, no. 7, pp. 1693–1700, Jul. 2011.
- [3.12] P. Chen, C.-C. Chen, C.-C. Tsai, and W.-F. Lu, "A time-to-digital-converter-based CMOS smart temperature sensor," *IEEE J. Solid-State Circuits*, vol. 40, no. 8, pp. 1642–1648, Aug. 2005.
- [3.13] P. Chen, C.-C. Chen, Y.-H. Peng, K.-M. Wang, and Y.-S. Wang, "A time-domain SAR smart temperature sensor with curvature compensation and a 3σ inaccuracy of -0.4°C ~ +0.6°C over a 0°C to 90°C range," *IEEE J. Solid-State Circuits*, vol. 45, no. 3, pp. 600–609, Mar. 2010.
- [3.14] K. Woo, S. Meninger, T. Xanthopoulos, E. Crain, D. Ha, and D. Ham, "Dual-DLL-based CMOS all-digital temperature sensor for microprocessor thermal monitoring," in *IEEE Int'l Solid-State Circuits Conf.*, Feb. 2009, pp. 68–69, 69a.
- [3.15] E. Socher, S. M. Beer, and Y. Nemirovsky, "Temperature sensitivity of SOI-CMOS transistors for use in uncooled thermal sensing," *IEEE Trans. Electron Devices*, vol. 52, no. 12, pp. 2784–2790, Dec. 2005.
- [3.16] K. Kim, H. Lee, S. Jung, and C. Kim, "A 366kS/s 400μW 0.0013mm<sup>2</sup> frequency-to-digital converter based CMOS temperature sensor utilizing multiphase clock," in *IEEE Custom Integrated Circuits Conf.*, Sep. 2009, pp. 203–206.
- [3.17] Y. Taur and T. H. Ning, *Fundamentals of Modern VLSI Devices*. Cambridge, U.K.: Cambridge Univ. Press, 1998.
- [3.18] I. M. Filanovsky and A. Allam, "Mutual compensation of mobility and threshold voltage temperature effects with applications in CMOS circuits," *IEEE Trans. Circuits Syst. I*, vol. 48, no. 7, pp. 876–884, Jul. 2001.
- [3.19] Y. W. Li, H. Lakdawala, A. Raychowdhury, G. Taylor, and K. Soumyanath, "A 1.05V 1.6mW 0.45°C 3σ-resolution ΔΣ-based temperature sensor with parasitic-resistance compensation in 32nm CMOS," in *IEEE Int'l Solid-State Circuits Conf.*, Feb. 2009, pp. 340–341, 341a.
- [3.20] Y.-S. Lin, D. Sylvester, and D. Blaauw, "An ultra low power 1V, 220nW temperature sensor for passive wireless applications," in *IEEE Custom Integrated Circuits Conf.*, Sep. 2008, pp. 507–510.
- [3.21] F. Sebastiano, L. Breems, K. Makinwa, S. Drago, D. Leenaerts, and B. Nauta, "A 1.2V 10μW NPN-based temperature sensor in 65nm CMOS with an inaccuracy of ±0.2°C (3σ) from -70°C to 125°C," in *IEEE Int'l Solid-State Circuits Conf.*, Feb. 2010, pp. 312–313.
- [3.22] C.-C. Chung and C.-R. Yang, "An autocalibrated all-digital temperature sensor for on-chip thermal monitoring," *IEEE Trans. Circuits Syst. II*, vol. 58, no. 2, pp. 105–109, Feb. 2011.
- [3.23] A. Shibayama, K. Nose, S. Torii, M. Mizuno, and M. Edahiro, "Skew-tolerant global synchronization based on periodically all-in-phase clocking for multi-core SOC platforms," in *IEEE Symp. on VLSI Circuits*, Jun. 2007, pp. 158–159.
- [3.24] W.-M. Lin, C.-C. Chen, and S.-I. Liu, "An all-digital clock generator for dynamic frequency scaling," in *Int'l Symp. on VLSI Design, Automation, and Test*, Apr. 2009, pp. 251–254.
- [3.25] I. Sutherland, R. F. Sproull, and D. Harris, *Logical Effort: Designing Fast CMOS Circuits*. Morgan Kaufmann, 1999.
- [3.26] Predictive technology model. Nanoscale Integration and Modeling (NIMO) Group, ASU. [Online]. Available: http://ptm.asu.edu/
- [3.27] K. A. Bowman, B. L. Austin, J. C. Eble, X. Tang, and J. D. Meindl, "A physical alpha-power law MOSFET model," *IEEE J. Solid-State Circuits*, vol. 34, no. 10, pp. 218–222, Oct. 1999.
- [3.28] R.-J. Yang and S.-I. Liu, "A 40-550 MHz harmonic-free all-digital delay-locked loop using a variable SAR algorithm," *IEEE J. Solid-State Circuits*, vol. 42, no. 2, pp. 361–373, Feb. 2007.
- [3.29] B. Mesgarzadeh and A. Alvandpour, "A low-power digital DLL-based clock generator in open-loop mode," *IEEE J. Solid-State Circuits*, vol. 44, no. 7, pp. 1907–1913, Jul. 2009.
- [3.30] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, "Analysis and mitigation of variability in subthreshold design," in *IEEE Int'l Symp. on Low Power Electronics and Design*, Aug. 2005, pp. 20–25.
- [3.31] M.-H. Chang, C.-Y. Hsieh, M.-W. Chen, and W. Hwang, "Near-/sub-threshold DLL-based clock generator with PVT-aware locking range compensation," in *IEEE Int'l Symp. on Low Power Electronics and Design*, Aug. 2011, pp. 15–20.
- [3.32] M.-H. Chang, C.-Y. Hsieh, M.-W. Chen, and W. Hwang, "Logical effort models with voltage and temperature extension in super-/near-/sub-threshold regions," in *Int'l Symp. on VLSI Design, Automation, and Test*, Apr. 2011, pp. 213–216.



- [4.1] B. H. Calhoun, J. F. Ryan, S. Khanna, M. Putic, and J. Lach, "Flexible circuits and architectures for ultralow power," *Proc. IEEE*, vol. 98, no. 2, pp. 267–282, Feb. 2010.
- [4.2] S. Hanson, B. Zhai, K. Bernstein, D. Blaauw, A. Bryant, L. Chang, K. K. Das, W. Haensch, E. J. Nowak, and D. M. Sylvester, "Ultralow-voltage, minimum-energy CMOS," *IBM Journal of Research and Development*, vol. 50, no. 4/5, pp. 469–490, Jul. 2006.
- [4.3] H. Yamauchi, "A discussion on SRAM circuit design trend in deeper nanometer-scale technologies," *IEEE Trans. VLSI Syst.*, vol. 18, no. 5, pp. 763–774, May 2010.
- [4.4] N. Verma and A. P. Chandrakasan, "A 256kb 65nm 8T subthreshold SRAM employing sense-amplifier redundancy," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 141–149, Jan. 2008.
- [4.5] T.-H. Kim, J. Liu, and C. H. Kim, "A voltage scalable 0.26V, 64kb 8T SRAM with V<sub>min</sub> lowering techniques and deep sleep mode," *IEEE J. Solid-State Circuits*, vol. 44, no. 6, pp. 1785–1795, Jun. 2009.
- [4.6] I. J. Chang, J.-J. Kim, S. P. Park, and K. Roy, "A 32kb 10T sub-threshold SRAM array with bit-interleaving and differential read scheme in 90nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 2, pp. 650–658, Feb. 2009.
- [4.7] J. P. Kulkarni, K. Kim, S.-P. Park, and K. Roy, "Process variation tolerant SRAM array for ultra low voltage applications," in *IEEE Design Automation Conf.*, Jun. 2008, pp. 108–113.
- [4.8] Z. Yu and B. Baas, "High performance, energy efficiency, and scalability with GALS chip multiprocessors," *IEEE Trans. VLSI Syst.*, vol. 17, no. 1, pp. 66–79, Jan. 2009.
- [4.9] D. N. Truong, W. H. Cheng, T. Mohsenin, Z. Yu, A. T. Jacobson, G. Landge, M. J. Meeuwsen, C. Watnik, A. T. Tran, Z. Xiao, E. W. Work, J. W. Webb, P. V. Mejia, and B. M. Baas, "A 167-processor computational platform in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1130–1144, Apr. 2009.
- [4.10] T.-W. Chen, J.-Y. Yu, C.-Y. Yu, and C.-Y. Lee, "A 0.5 V 4.85 Mbps dual-mode baseband transceiver with extended frequency calibration for biotelemetry applications," *IEEE J. Solid-State Circuits*, vol. 44, no. 11, pp. 2966–2976, Nov. 2009.
- [4.11] T.-W. Chen, P.-Y. Tsai, J.-Y. Yu, and C.-Y. Lee, "A sub-mW all-digital signal component separator with branch mismatch compensation for OFDM LINC transmitters," *IEEE J. Solid-State Circuits*, vol. 46, no. 11, pp. 2514–2523, Nov. 2011.
- [4.12] P. Hazucha, T. Karnik, J. Maiz, S. Walstra, B. Bloechel, J. Tschanz, G. Dermer, S. Hareland, P. Armstrong, and S. Borkar, "Neutron soft error rate measurements in 90-nm CMOS process and scaling trends in SRAM from 0.25-μm to 90-nm generation," in *IEEE Int'l Electron Devices Meeting*, Dec. 2003, pp. 21.5.1–21.5.4.

- [4.13] M. E. Sinangil, N. Verma, and A. P. Chandrakasan, "A 45nm 0.5V 8T column-interleaved SRAM with on-chip reference selection loop for sense-amplifier," in *IEEE Asian Solid-State Circuits Conf.*, Nov. 2009, pp. 225–228.
- [4.14] A.-T. Do, J. Low, J. Low, Z.-H. Kong, X. Tan, and K.-S. Yeo, "An 8T differential SRAM with improved noise margin for bit-interleaving in 65 nm CMOS," *IEEE Trans. Circuits Syst. I*, vol. 58, no. 6, pp. 1252–1263, Jun. 2011.
- [4.15] T.-H. Kim, H. Eom, J. Keane, and C. H. Kim, "Utilizing reverse short channel effect for optimal subthreshold circuit design," in *IEEE Int'l Symp. on Low Power Electronics and Design*, Oct. 2006, pp. 127–130.
- [4.16] J. P. Kulkarni and K. Roy, "Ultralow-voltage process-variation-tolerant Schmitt-trigger-based SRAM design," *IEEE Trans. VLSI Syst.*, vol. 20, no. 2, pp. 319–332, Feb. 2012.
- [4.17] N. Derhacobian, V. Vardanian, and Y. Zorian, "Embedded memory reliability: the SER challenge," in *IEEE Int'l Workshop on Memory Technology, Design, and Testing*, Aug. 2004, pp. 104–110.
- [4.18] R. Baumann, "The impact of technology scaling on soft error rate performance and limits to the efficacy of error correction," in *IEEE Int'l Electron Devices Meeting*, Dec. 2002, pp. 329–332.
- [4.19] J. Maiz, S. Hareland, K. Zhang, and P. Armstrong, "Characterization of multi-bit soft error events in advanced SRAMs," in *IEEE Int'l Electron Devices Meeting*, Dec. 2003, pp. 21.4.1–21.4.4.
- [4.20] M.-H. Chang, Y.-T. Chiu, S.-L. Lai, and W. Hwang, "A 1kb 9T subthreshold SRAM with bit-interleaving scheme in 65nm CMOS," in *IEEE Int'l Symp. on Low Power Electronics and Design*, Aug. 2011, pp. 291–296.
- [4.21] N. Verma, "Analysis towards minimization of total SRAM energy over active and idle operating modes," *IEEE Trans. VLSI Syst.*, vol. 19, no. 9, pp. 1695–1703, Sep. 2011.
- [4.22] K. Nii, Y. Tsukamoto, M. Yabuuchi, Y. Masuda, S. Imaoka, K. Usui, S. Ohbayashi, H. Makino, and H. Shinohara, "Synchronous ultra-high-density 2RW dual-port 8T-SRAM with circumvention of simultaneous common-row-access," *IEEE J. Solid-State Circuits*, vol. 44, no. 3, pp. 977–986, Mar. 2009.
- [4.23] N. Verma, J. Kwong, and A. P. Chandrakasan, "Nanometer MOSFET variation in minimum energy subthreshold circuits," *IEEE Trans. Electron Devices*, vol. 55, no. 1, pp. 163–174, Jan. 2008.
- [4.24] K. J. Kuhn, "Reducing variation in advanced logic technologies: Approaches to process and design for manufacturability of nanoscale CMOS," in *IEEE Int'l Electron Devices Meeting*, Dec. 2007, pp. 471–474.
- [4.25] B. H. Calhoun and A. P. Chandrakasan, "A 256-kb 65-nm sub-threshold SRAM design for ultra-low-voltage operation," *IEEE J. Solid-State Circuits*, vol. 42, no. 3, pp. 680–688, Mar. 2007.
- [4.26] T.-H. Kim, J. Liu, J. Keane, and C. H. Kim, "A 0.2V, 480kb subthreshold SRAM with 1 k cells per bitline for ultra-low-voltage computing," *IEEE J. Solid-State Circuits*, vol. 43, no. 2, pp. 518–529, Feb. 2008.

- [4.27] L. Chang, R. K. Montoye, Y. Nakamura, K. A. Batson, R. J. Eickemeyer, R. H. Dennard, W. Haensch, and D. Jamsek, "An 8T-SRAM for variability tolerance and low-voltage operation in high-performance caches," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 956–963, Apr. 2008.
- [4.28] M. Yabuuchi, K. Nii, Y. Tsukamoto, S. Ohbayashi, S. Imaoka, H. Makino, Y. Yamagami, S. Ishikura, T. Terano, T. Oashi, K. Hashimoto, A. Sebe, G. Okazaki, K. Satomi, H. Akamatsu, and H. Shinohara, "A 45nm low-standby-power embedded SRAM with improved immunity against process and temperature variations," in *IEEE Int'l Solid-State Circuits Conf.*, Feb. 2007, pp. 326–327, 606.
- [4.29] S. Mukhopadhyay, R. M. Rao, J.-J. Kim, and C.-T. Chuang, "SRAM write-ability improvement with transient negative bit-line voltage," *IEEE Trans. VLSI Syst.*, vol. 19, no. 1, pp. 24–32, Jan. 2011.
- [4.30] B. Fu and P. Ampadu, "Comparative analysis of ultra-low voltage Flip-Flops for energy efficiency," in *IEEE Int'l Symp. on Circuits and Systems*, May 2007, pp. 1173–1176.



- [5.1] A. P. Chandrakasan, D. C. Daly, J. Kwong, and Y. K. Ramadass, "Next generation micropower systems," in *IEEE Symp. on VLSI Circuits*, Jun. 2008, pp. 2–5.
- [5.2] T.-W. Chen and C.-Y. Lee, u-PHI specification, 2010.
- [5.3] X.-F. Teng, Y.-T. Zhang, C. C.-Y. Pong, and P. Bonato, "Wearable medical systems for p-Health," *IEEE Reviews in Biomedical Engineering*, vol. 1, pp. 62–74, 2008.
- [5.4] A. P. Chandrakasan, D. C. Daly, D. F. Finchelstein, J. Kwong, Y. K. Ramadass, M. E. Sinangil, V. Sze, and N. Verma, "Technologies for ultradynamic voltage scaling," *Proc. IEEE*, vol. 98, no. 2, pp. 191–214, Feb. 2010.
- [5.5] T.-H. Kim, H. Eom, J. Keane, and C. H. Kim, "Utilizing reverse short channel effect for optimal subthreshold circuit design," in *IEEE Int'l Symp. on Low Power Electronics and Design*, Oct. 2006, pp. 127–130.
- [5.6] N. Verma and A. P. Chandrakasan, "A 256kb 65nm 8T subthreshold SRAM employing sense-amplifier redundancy," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 141–149, Jan. 2008.
- [5.7] T.-H. Kim, J. Liu, J. Keane, and C. H. Kim, "A 0.2V, 480kb subthreshold SRAM with 1 k cells per bitline for ultra-low-voltage computing," *IEEE J. Solid-State Circuits*, vol. 43, no. 2, pp. 518–529, Feb. 2008.
- [5.8] B. H. Calhoun and A. P. Chandrakasan, "A 256-kb 65-nm sub-threshold SRAM design for ultra-low-voltage operation," *IEEE J. Solid-State Circuits*, vol. 42, no. 3, pp. 680–688, Mar. 2007.
- [5.9] M. E. Sinangil, N. Verma, and A. P. Chandrakasan, "A reconfigurable 8T ultra-dynamic voltage scalable (U-DVS) SRAM in 65nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 11, pp. 3163–3173, Nov. 2009.
- [5.10] J. Kwong, Y. K. Ramadass, N. Verma, and A. P. Chandrakasan, "A 65nm sub-$V_t$ microcontroller with integrated SRAM and switched capacitor DC-DC converter," *IEEE J. Solid-State Circuits*, vol. 44, no. 1, pp. 115–126, Jan. 2009.
- [5.11] H.-W. Huang, K.-H. Chen, and S.-Y. Kuo, "Dithering skip modulation, width and dead time controllers in highly efficient DC-DC converters for system-on-chip applications," *IEEE J. Solid-State Circuits*, vol. 42, no. 11, pp. 2451–2465, Nov. 2007.
- [5.12] Y. K. Ramadass and A. P. Chandrakasan, "Voltage scalable switched capacitor DC-DC converter for ultra-low-power on-chip applications," in *IEEE Power Electronics Specialists Conf.*, Jun. 2007, pp. 2353–2359.

- [6.1] K. Kim, H. Lee, S. Jung, and C. Kim, "A 366kS/s 400µW 0.0013mm<sup>2</sup> frequency-to-digital converter based CMOS temperature sensor utilizing multiphase clock," in *IEEE Custom Integrated Circuits Conf.*, Sep. 2009, pp. 203–206.
- [6.2] Y.-S. Lin, D. Sylvester, and D. Blaauw, "An ultra low power 1V, 220nW temperature sensor for passive wireless applications," in *IEEE Custom Integrated Circuits Conf.*, Sep. 2008, pp. 507–510.
- [6.3] M.-H. Chang, J.-Y. Wu, W.-C. Hsieh, S.-Y. Lin, Y.-W. Liang, and W. Hwang, "High efficiency power management system for solar energy harvesting applications," in *IEEE Asia-Pacific Conf. on Circuits and Systems*, Dec. 2010, pp. 879–882.
- [6.4] W.-C. Hsieh and W. Hwang, "Low quiescent current variable output digital controlled voltage regulator," in *IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2010, pp. 609–612.



# Vita

#### PERSONAL INFORMATION

| Birth Date:  | May 10, 1978 |
|--------------|--------------|
| Birth Place: | Tainan, Taiwan, R.O.C. |
| Address:     | Department of Electronics Engineering, National Chiao Tung University, No. 1001 Ta-Hsueh Road, Hsin-Chu 300, Taiwan, R.O.C. |
| E-Mail:      | tako.ee88g@nctu.edu.tw |

#### **EDUCATION**

- B.S. [2000] Department of Electronics Engineering, National Chiao-Tung University.
- M.S. [2001] Institute of Electronics, National Chiao-Tung University.
- M.E. [2004] Department of Electrical Engineering, Texas A&M University, U.S.A.

#### WORK/RESEARCH/PROJECT EXPERIENCES

- MIRC, EE NCTU, Ministry of Economic Affairs, Research Engineer, "uPHI: Wireless Body Area Network Core Technology"
- MIRC, EE NCTU, National Science Council, Research Engineer, "Multi-System Merging and Green Computing Techniques for Wireless Video Entertainment - Low Power On-Demand Memory System for Multi-Core Design"
- MIRC, EE NCTU, Industrial Technology Research Institute, Research Engineer, "Micro-Watt DSP Processor for Multi-Core Applications"
- MIRC, EE NCTU, Ministry of Economic Affairs, Research Engineer, "Advanced System Designs for High-Performance and Low-Power Dual-Core Processors"

#### **HONORS**

- 1997 Fall NCTU EE Excellence Studentship Award
- 2000 First Rank entering Institute of Electronics, National Chiao-Tung University

#### **EXTRACURRICULAR ACTIVITIES**

- 1998 Vice President, Student Association of Electronics Engineering, NCTU
- 1997 President of Freshman Welcome Camp, Electronics Engineering, NCTU

# **Publications**

#### JOURNAL PUBLICATIONS

- [1] Ming-Hung Chang, Yi-Te Chiu, and Wei Hwang, "Design and iso-area V<sub>min</sub> analysis of 9T subthreshold SRAM with bit-interleaving scheme in 65nm CMOS," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 59, no. 7, Jul. 2012, accepted for publication.
- [2] Ming-Hung Chang, Shang-Yuan Lin, and Wei Hwang, "A 0.4V 520nW 990μm<sup>2</sup> fully integrated frequency-domain smart temperature sensor in 65nm CMOS," *ASP Journal of Low Power Electronics*, vol. 8, no. 1, Feb. 2012, pp. 63-72.
- [3] Ming-Hung Chang, Yi-Te Chiu, and Wei Hwang, "An asynchronous subthreshold 8T-SRAM-based FIFO memory for WBANs in 65nm CMOS," submitted to *IEEE Transactions on Circuits and Systems I: Regular Papers*.
- [4] Ming-Hung Chang, Wei-Hung Du, and Wei Hwang, "A 2kb built-in row-controlled dynamic voltage scaling near-/sub-threshold FIFO memory for WBANs," to be submitted to *IEEE Transactions on Circuits and Systems I: Regular Papers*.

#### CONFERENCE PUBLICATIONS

- [1] Wei-Hung Du, Po-Tsang Huang, Ming-Hung Chang, and Wei Hwang, "A 2kb built-in row-controlled dynamic voltage scaling near-/sub-threshold FIFO memory for WBANs," in *IEEE International Symposium on VLSI Design, Automation, and Test*, Apr. 2012.
- [2] Wei-Hung Du, Ming-Hung Chang, Hao-Yi Yang, and Wei Hwang, "An energy-efficient 10T SRAM-based FIFO memory operating in near-/sub-threshold regions," in *IEEE System-on-Chip Conference*, Sep. 2011, pp. 19-23.
- [3] Ming-Hung Chang, Chung-Ying Hsieh, Mei-Wei Chen, and Wei Hwang, "Near-/sub-threshold DLL-based clock generator with PVT-aware locking range compensation," in *Int'l Symp. on Low Power Electronics and Design*, Aug. 2011, pp. 15-20.
- [4] Ming-Hung Chang, Yi-Te Chiu, Shu-Lin Lai, and Wei Hwang, "A 1kb 9T subthreshold SRAM with bit-interleaving scheme in 65nm CMOS," in *Int'l Symp. on Low Power Electronics and Design*, Aug. 2011, pp. 291-296.
- [5] Ming-Hung Chang, Chung-Ying Hsieh, Mei-Wei Chen, and Wei Hwang, "Logical effort models with voltage and temperature extension in super-/near-/sub-threshold regions," in *IEEE International Symposium on VLSI Design, Automation, and Test*, Apr. 2011, pp. 213-216.

- [6] Ming-Hung Chang, Jung-Yi Wu, Wei-Chih Hsieh, Shang-Yuan Lin, You-Wei Liang, and Wei Hwang, "High efficiency power management system for solar energy harvesting applications," in *IEEE Asia Pacific Conf. on Circuits and Systems*, Dec. 2010, pp. 879-882.
- [7] Shi-Wen Chen, Ming-Hung Chang, Wei-Chih Hsieh, and Wei Hwang, "Fully on-chip temperature, process, and voltage sensors," in *IEEE Int'l Symp. on Circuits and Systems*, May 2010, pp. 897-900.
- [8] Yi-Te Chiu, Ming-Hung Chang, Hao-Yi Yang, and Wei Hwang, "Subthreshold asynchronous FIFO memory for wireless body area networks (WBANs)," in *Int'l Symp. on Medical Information and Communication Technology*, Mar. 2010.
- [9] Yi-Ming Chang, Ming-Hung Chang, and Wei Hwang, "A 2.1-µW 0.3V-1.0V Wide-Locking Range Multiphase DLL Using Self-Estimated SAR Algorithm," in *IEEE System-on-Chip Conference*, Sep. 2009, pp. 115-118.
- [10] Ming-Hung Chang, Li-Pu Chuang, Yi-Ming Chang, and Wei Hwang, "A 300-mV 36-μW Multiphase Dual Digital Clock Output Generator with Self-Calibration," in *IEEE System-on-Chip Conference*, Sep. 2008, pp. 97-100.
- [11] Li-Pu Chuang, Ming-Hung Chang, Po-Tsang Huang, Chih-Hao Kan, and Wei Hwang, "A 5.2mW All-Digital Fast-Lock Self-Calibrated Multiphase Delay-Locked Loop," in *IEEE International Symposium on Circuits and Systems*, May 2008, pp. 3342-3345.
- [12] Hao-I Yang, Ming-Hung Chang, Tay-Jyi Lin, Shih-Hao Ou, Siang-Sen Deng, Chih-Wei Liu, and Wei Hwang, "A Controllable Low-Power Dual-Port Embedded SRAM for DSP Processor," in *IEEE Intl. Workshop Memory Technology, Design, and Testing*, Dec. 2007, pp. 27-30.
- [13] Chang-Hsuan Chang, Ming-Hung Chang, and Wei Hwang, "A Flexible Two-Layer External Memory Management for H.264/AVC Decoder," in *IEEE System-on-Chip Conference*, Sep. 2007, pp. 219-222.
- [14] Ming-Hung Chang, Zong-Xi Yang, and Wei Hwang, "A 1.9mW Portable ADPLL-Based Frequency Synthesizer for High Speed Clock Generation," in *IEEE Int'l Symp. on Circuits and Systems*, May 2007, pp. 1137-1140.
- [15] Hao-I Yang, Ming-Hung Chang, Ssu-Yun Lai, Hsiang-Fei Wang, and Wei Hwang, "A Low-Power Low-Swing Single-Ended Multi-Port SRAM," in *IEEE Int'l Symp. on VLSI Design, Automation, and Test*, Apr. 2007, pp. 28-31.
- [16] Ching-Yun Cheng, Ming-Hung Chang, and Wei Hwang, "Power-Gating Sense Amplifier of Low Power Pseudo SRAM," in *IEEE Int'l Symp. on VLSI Design, Automation, and Test*, Apr. 2007, pp. 260-263.
- [17] Chia-Sheng Peng, Ming-Hung Chang, and Kuei-Ann Wen, "Early-Late Gate Receiving for Bluetooth Packet," in *IEEE Int'l Symp. on VLSI Technology, Systems, and Applications*, Mar. 2001, pp. 57-60.

#### PATENTS

- [1] Po-Tsang Huang, Shu-Wei Chang, Ming-Hung Chang, and Wei Hwang, "內儲存無關項之階層式搜尋線" [Stored Don't-Care Based Hierarchical Search Line], TW Patent I321793, Mar. 2010.
- [2] Po-Tsang Huang, Shu-Wei Chang, Ming-Hung Chang, and Wei Hwang, "Stored Don't-Care Based Hierarchical Search Line," US Patent 7,525,827, Apr. 2009.
- [3] Yi-Te Chiu, Ming-Hung Chang, Hao-Yi Yang, and Wei Hwang, "Dual-port subthreshold SRAM cell," US Application Pending (13/243,690), Sep. 2011.
- [4] Yi-Te Chiu, Ming-Hung Chang, Hao-Yi Yang, and Wei Hwang, "雙埠次臨界靜態隨機存取記憶體單元" [Dual-port subthreshold SRAM cell], TW Application Pending (100119160), Jun. 2011.
- [5] Chung-Ying Hsieh, Ming-Hung Chang, and Wei Hwang, "A Programmable Clock Generator for Sub- and Near-Threshold DVFS System," US Application Pending (13/155,523), Jun. 2011.
- [6] Chung-Ying Hsieh, Ming-Hung Chang, and Wei Hwang, "Thermally Robust Buffered Clock Tree Using Logical Effort Compensation," US Application Pending (13/067,232), May 2011.
- [7] Yi-Te Chiu, Ming-Hung Chang, Hao-Yi Yang, and Wei Hwang, "Subthreshold SRAM cell," US Application Pending (13/096,796), Apr. 2011.
- [8] Chung-Ying Hsieh, Ming-Hung Chang, and Wei Hwang, "用於次臨界/近臨界動態電壓與頻率調節系統之可程式化時脈產生器" [Programmable clock generator for sub-/near-threshold dynamic voltage and frequency scaling systems], TW Application Pending (100107526), Mar. 2011.
- [9] Yi-Te Chiu, Ming-Hung Chang, Hao-Yi Yang, and Wei Hwang, "次臨界靜態隨機存取記憶體單元" [Subthreshold SRAM cell], TW Application Pending (100107824), Mar. 2011.
- [10] Shi-Wen Chen, Ming-Hung Chang, Wei-Chih Hsieh, and Wei Hwang, "Fully on-chip all digital process invariant temperature sensor," TW Application Pending, Jan. 2011.
- [11] Chung-Ying Hsieh, Ming-Hung Chang, and Wei Hwang, "使用邏輯努力補償之溫度強健緩衝時脈樹" [Thermally robust buffered clock tree using logical effort compensation], TW Application Pending (099146856), Dec. 2010.
- [12] Shi-Wen Chen, Ming-Hung Chang, Wei-Chih Hsieh, and Wei Hwang, "Fully on-chip temperature, process, and voltage sensors," US Application Pending (12/910,199), Oct. 2010.
- [13] Shi-Wen Chen, Ming-Hung Chang, Wei-Chih Hsieh, and Wei Hwang, "全晶上寬工作電壓溫度製程電壓感測器" [Fully on-chip wide-operating-voltage temperature, process, and voltage sensors], TW Application Pending (099129470), Sep. 2010.
- [14] Jung-Yi Wu, Ming-Hung Chang, Wei-Chih Hsieh, and Wei Hwang, "Fully on-chip fast pumping up to high voltage charge pump," US Application Pending (12/827,111), Jun. 2010.
- [15] Jung-Yi Wu, Ming-Hung Chang, Wei-Chih Hsieh, and Wei Hwang, "可全部整合至晶片中的快速充電電荷幫浦" [Fast-charging charge pump that can be fully integrated on chip], TW Application Pending (099106829), Mar. 2010.
- [16] Li-Pu Chuang, Ming-Hung Chang, and Wei Hwang, "全數位快速鎖定自我校正相位延遲鎖定電路" [All-digital fast-lock self-calibrated delay-locked loop circuit], TW Patent Pending (097136541), Sep. 2008.

