## National Chiao Tung University

## Institute of Electrical Control Engineering

## Doctoral Dissertation

Bootstrapped Circuit Techniques for Near-threshold On-chip Data Link

Student: Ying-Chieh Ho

Advisor: Prof. Chau-Chin Su

June 2012

### Bootstrapped Circuit Techniques for Near-threshold On-chip Data Link

Student: Ying-Chieh Ho

Advisor: Chau-Chin Su

A Doctoral Dissertation



Submitted to the Institute of Electrical Control Engineering, College of Electrical Engineering, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Electrical Control Engineering

June 2012

Hsinchu, Taiwan, Republic of China

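#### Bootstrapped Circuit Techniques for Near-threshold On-chip Data Link

Student: Ying-Chieh Ho

Advisor: Prof. Chau-Chin Su

#### Doctoral Program, Institute of Electrical Control Engineering, National Chiao Tung University

#### Chinese Abstract

In recent years, "environmental protection, green energy, and sustainability" have become a focus of development in every sector. For electronic products, the battery is the main source of energy, and extending battery life reduces battery consumption; low-power design, in turn, lets circuits consume less power and so extends battery life. According to $P = fCV^2$, lowering the operating voltage and reducing the capacitive load together can reduce dynamic power by several orders of magnitude. Lowering the operating voltage is the most intuitive and effective way to achieve low power. Many studies even operate circuits in the near-threshold region or directly in the sub-threshold region. Nanometer technology is already widely used in low-power applications, including RF, analog, AD/DA, and MPU designs, with still lower power in designs for physiological signal detection. These designs take full advantage of the reduced device loading in nanometer technology and push the limits of sub-threshold current.

Near-threshold circuit design operates devices in the near-threshold region to cut power consumption substantially and achieve energy-efficient operation. However, it has several major bottlenecks. First, the operating speed is low, so it is mostly applied to biomedical chips or other slow systems. Second, static leakage power becomes more serious in the near-threshold region. Third, severe process variation affects yield and production cost.

In this dissertation, we propose data link circuit designs for a near-threshold-voltage system on chip (SoC), together with a series of new bootstrap techniques that address the problems of near-threshold circuit design. The main concept of the proposed bootstrap techniques is to give the circuit bidirectional boosting. "Bidirectional" means that the boosting acts on the P-type and N-type devices at the same time, greatly increasing the driving capability while suppressing static leakage. Compared with a conventional circuit operating in the near-threshold region, an improvement of two orders of magnitude is obtained. Another advantage is that the bootstrap technique lets a circuit supplied at a sub-threshold voltage operate in the ordinary triode region, which makes the circuit model more accurate. Monte Carlo analysis of the circuits clearly shows that the effect of process variation is greatly reduced as a result.

We present four related circuits. (1) A bootstrapped inverter with active leakage reduction for clock networks. Operating at 0.2V, it provides a stable 10MHz clock even for a clock tree with 1cm of on-chip interconnect, and it suppresses the severe static leakage current of low-voltage operation. In addition, the design uses gate boosting so that most devices operate in the conducting region, which greatly reduces the effect of process variation. (2) A bootstrapped repeater for on-chip buses that effectively suppresses inter-symbol interference (ISI). At $V_{DD} = 0.3V$, a single channel reaches a data rate of up to 100Mbps (using a $2^{10}-1$ PRBS), and even at $V_{DD} = 0.1V$ it still achieves 0.8Mbps. (3) Seeking the most energy-efficient design, we propose high-boosting repeaters whose pre-drivers provide 3X and 4X boosting without sacrificing operating speed. Applied as repeaters in an on-chip bus with an operating voltage of only $V_{DD} = 0.15V$, they reach a data rate of up to 5Mbps with an energy consumption of only 35.2fJ per bit. (4) Finally, we propose a bootstrapped ring oscillator and complete an all-digital PLL (ADPLL) that operates at near-threshold voltage. At 0.5V, the ADPLL provides an output frequency of 480MHz while consuming only 78µW; at 0.25V, it still provides an output frequency of 44.8MHz while consuming 2.4µW.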



#### Bootstrapped Circuit Techniques for Near-threshold On-chip Data Link

Student : Ying-Chieh Ho

Advisor : Chau-Chin Su

Institute of Electrical Control Engineering National Chiao Tung University

#### ABSTRACT

For sustainable electronic devices, ultra-low-power design is essential to prolong battery life. According to  $P = fCV^2$ , scaling down the supply voltage is the most effective way to reduce power consumption. According to the forecast of the International Technology Roadmap for Semiconductors (ITRS), the supply voltage will be scaled to 0.5V for low-power applications within the next generation. Scaling the supply voltage to near the threshold voltage is the most favorable solution for low-power designs. Moreover, thanks to their small device loading, nano-scaled devices push the speed limit in the near-threshold region. Nano-scaled processes are broadly applied to ultra-low-power designs, including RF, AD/DA, and MPU circuits, and especially to biomedical applications. Emerging embedded biomedical applications have pushed low-power design to a further extreme.

To achieve energy-efficient operation, designs are operated from a near-threshold supply. However, near-threshold circuit design is challenging. First, the driving capability ( $I_{on}$ ) is limited, which restricts such circuits to slow systems. Second, the static leakage power becomes severe and the  $I_{on}/I_{off}$  ratio decreases. Moreover, process variations worsen significantly, affecting the circuit performance, the power efficiency, and the fabrication yield.

In this dissertation, we propose circuit designs for an on-chip data link system using a near-threshold supply. To address the design issues in the near-threshold region, we have developed several bootstrapped circuits. The main contribution of the proposed bootstrapped techniques is to boost the gate voltage on both sides, that is, to boost the gate voltages of the PMOS and NMOS devices at the same time. The proposed circuits both increase the driving capability, by boosting signals into the super-threshold region, and reduce the leakage current. When the circuit is operated in the sub-threshold region, an improvement of two orders of magnitude is achieved. In addition, the bootstrapped circuits operate in the triode region even with a near-threshold supply, which explains why process variation affects the proposed design scheme to a lesser extent. We verify this with Monte Carlo simulations.

Four building blocks using bootstrapped circuits in an on-chip data link are proposed. The first is a bootstrapped CMOS inverter applied to an on-chip clock network. In addition to improving the driving capability, a large gate voltage swing from  $-V_{DD}$  to  $2V_{DD}$ suppresses the sub-threshold leakage current. The test chip achieves 10MHz operation under a 200mV  $V_{DD}$  with a power consumption of 1.01µW. Monte Carlo analysis indicates that the sigma of the delay time is only 2.9ns at 0.2V operation. The second is an ISI-suppressed bootstrapped repeater applied to an on-chip bus. Bootstrapped CMOS repeaters are inserted to drive a 10mm on-chip bus. Additionally, a precharge enhancement scheme increases the speed of data transmission, and a leakage current reduction technique suppresses ISI jitter. The measured results demonstrate that, for a 10mm on-chip bus, the repeater achieves a 100Mbps data rate at 0.3V and still 0.8Mbps at 0.1V. The third part investigates the performance of interconnects with repeater insertion in the sub-threshold region. A 3X CMOS pre-driver and a 4X one are proposed to enhance the driving capability. Compared with the conventional repeater, the proposed ones have higher energy efficiency. The measured results show that the 3X (4X) pre-driver achieves a 5Mbps (1.5Mbps) data rate at 0.15V with an efficiency of 35.2fJ (32.8fJ). In the last part, we present a near-threshold ADPLL built around a bootstrapped digitally-controlled ring oscillator (BDCO). The BDCO is composed of a bootstrapped ring oscillator (BTRO) and a weighted thermometer-controlled resistance network (WTRN). The proposed bootstrapped delay cell generates a large gate voltage swing to improve the driving capability significantly. The boosted output swing keeps the transistors operating in the linear region, so the output frequency is highly linear in  $V_{DD}$  even with a near-threshold supply. Based on the transfer characteristic of the BTRO, the WTRN provides linear control as the supply voltage is swept. The proposed ADPLL oscillates from 36.8 to 480MHz with a power consumption of 2.4-78µW under a supply voltage of 0.25-0.5V.

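#### Acknowledgements

Time flies. In the blink of an eye, six years have passed since I left industry and returned to school. More than two thousand days went by in a flash, and what remains in my mind is deep gratitude. The road was full of setbacks, and challenges came one wave after another. More than once I doubted whether I could finish this degree, but at this moment I have completed a major stage of my career plan.

Many people helped and accompanied me along the way and made me who I am today. Besides thanking heaven, there are truly too many people to thank. I know in my heart that had even one of these benefactors been missing, just one, my degree might never have been completed. In the days ahead I will go on creating my own future and value, but before that I thank everyone who has accompanied, encouraged, and supported me.

I thank my advisor, Prof. Chau-Chin Su (蘇朝琴), for years of guidance. The professor's rigorous and meticulous way of thinking in academic research and tolerant, accommodating way of dealing with people have both benefited me greatly. Over these years I changed my former attitude to learning: rather than learning by rote, I opened my mind and met the endless sea of learning with humility, imagination, and enthusiasm, and I have gained a great deal as a result.

I thank my master's advisor, Prof. 吴安宇. Although circumstances of time and place kept me from remaining in your group, every time we met you never failed to remind me that, besides focusing on research, I should attend to planning for the future and to my writing skills; I keep this in mind. I thank Prof. 洪浩喬, who has been both a teacher and a friend; thank you for sharing so much of your research experience beyond what you taught in class. When I was charging about blindly in the vast sea of academia, having a senior at my side to offer advice filled me with confidence. I thank Prof. 莊景德 and Prof. 李鎮宜 for the chip area and tape-out opportunities provided through their projects. Without these chips our ideas would have remained empty talk, and these papers would never have been produced. I thank Prof. 周世傑 for introducing me to scholars from around the world at a conference in Paris, France, which broadened my horizons.

I also thank Dr. 曾煜輝 and Dr. 徐仁乾 for their comradeship. I will never forget the days we worked hard together, and I hope everyone's hard work will be rewarded in the future. I thank 小馬 for firing the first shot in the on-chip bus research, and I thank the other co-authors of these papers, 家齊 and 于昇; it was an honor to discuss and grow with you both on this topic, and now the whole world has seen our results. Thank you to the friends who lived together in the big family of 918: 丸子, 教主, 楙軒, 小潘潘, 方董, and all the other junior students of these six or seven years, for your help and tolerance. Thanks also to the assistants who worked alongside us on the projects over the years: 雅雯, 上容, 俊秀, 豐文, 伉佑, and 美玲.

Thanks as well to friends in other research groups: Prof. 李淑敏, Prof. 蕭志龍, Dr. 盧台祐, Dr. 楊皓義, Dr. 杜明賢, Dr. 胡璧合, Dr. 蔡玉章, Dr. 陳嘉怡, Dr. 范銘隆, Dr. 洪紹峰, Dr. 許書餘, and 劉小胖, 致煌, 勖哲, 柏鈞, 瑋庭, and the other junior students. Thank you for lending a hand at the right moments and making my research go more smoothly.

Finally, I thank my beloved family: my parents, my elder sister, and my elder brother. I can never repay the upbringing and earnest hopes you have given me. Thank you to my wife, 佳慧; with your support I could pursue my studies without worry, and without your love there would be no doctoral degree. And to my precious daughter 苡瑄: Daddy thanks you too. Because of you, Daddy faces the future with more courage; with you, Daddy's life has more meaning.

Dedicated to my family.

Ying-Chieh Ho

at NCTU, 電資 303

2012/6/27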

## **Table of Contents**

| Section | Title |
|---------|-------|
| | Chinese Abstract |
| | Abstract |
| | Table of Contents |
| Chapter 1 | Introduction |
| 1.1. | Challenges in Nano-Scaled Near-threshold Design |
| 1.2. | Near-threshold On-chip Data Link |
| 1.3. | Organization of the Dissertation |
| Chapter 2 | Background Review |
| 2.1. | Effects in Nano-scaled Process [6] |
| 2.1.1. | Short-Channel Effect |
| 2.1.2. | Narrow-Width Effect |
| 2.1.3. | Sub-threshold Leakage [6, 9] |
| 2.1.4. | Drain-Induced Barrier Lowering [6] |
| 2.1.5. | Gate-Induced Drain Leakage [6, 10] |
| 2.1.6. | Gate Leakage [11] |
| 2.2. | Challenges in Ultra Low-voltage Designs |
| 2.2.1. | Degradation of Driving Capability |
| 2.2.2. | Leakage Power and Ion-to-Ioff Ratio [8, 12] |
| 2.2.3. | Process, Voltage and Temperature Variation |
| 2.3. | Low-voltage Design Techniques |
| 2.3.1. | Bootstrap Techniques |
| 2.3.2. | Dynamic Voltage and Frequency Scaling |
| 2.3.3. | Multi-threshold MOS Control |
| 2.3.4. | Bulk-driven Technique |
| 2.4. | Summary |
| Chapter 3 | Near-threshold Clock Network |
| 3.1. | Overview of On-chip Interconnect |
| 3.1.1. | RC-Interconnect with repeater insertion |
| 3.1.2. | Time constant, power dissipation and FoM |
| 3.2. | Proposed Active Leakage Reduction Bootstrapped Inverter |
| 3.3. | Detail Evaluation and Discussion |
| 3.3.1. | Boosting Efficiency |
| 3.3.2. | Reduction of Leakage Current |
| 3.3.3. | Delay Time Analysis |
| 3.3.4. | Delay Time Analysis of Process Variation |
| 3.4. | Implementation and Experimental Results |
| 3.4.1. | Implementation of the Bootstrap Capacitor |
| 3.4.2. | Chip Implementation and Measurement |
| 3.5. | Summary |
| Chapter 4 | Near-threshold On-chip Bus |
| 4.1. | Proposed On-chip Bus Architecture |
| 4.2. | ISI-suppressed Bootstrapped Driver |
| 4.3. | Detailed Evaluation and Comparisons |
| 4.3.1. | Boosting Efficiency |
| 4.3.2. | Leakage Current Reduction |
| 4.3.3. | Leakage Power Analysis |
| 4.3.4. | ISI Suppression |
| 4.3.5. | Energy Efficiency |
| 4.3.6. | Monte Carlo Simulations |
| 4.4. | Experimental Setup and Measurement |
| 4.4.1. | Chip implementation |
| 4.4.2. | Measured Waveforms |
| 4.4.3. | Leakage Power Measurement |
| 4.5. | Summary |
| Chapter 5 | High-boosting Pre-driver |
| 5.1. | Proposed High-boosting Pre-driver |
| 5.2. | High-boosting Pre-driver in Long Interconnects |
| 5.2.1. | Leakage Current Reduction |
| 5.2.2. | Energy Efficiency |
| 5.2.3. | Boosting Efficiency |
| 5.2.4. | Monte Carlo Simulations |
| 5.3. | Experiment and Measurement Results |
| 5.3.1. | Chip implementation |
| 5.3.2. | Measured Waveforms |
| 5.4. | Summary |
| Chapter 6 | Near-threshold ADPLL |
| 6.1. | Architecture of Proposed All-Digital PLL |
| 6.1.1. | PFD, PS and TDC |
| 6.1.2. | DLF |
| 6.1.3. | Bootstrapped Digitally-Controlled Oscillator |
| 6.1.3.1. | Bootstrapped Ring Oscillator |
| 6.1.3.2. | Weighted-Thermometer Code Control |
| 6.1.4. | SDM |
| 6.2.1. | Power Analysis of BTRO |
| 6.2.2. | Linearity Analysis of BTRO |
| 6.3. | Experimental Results and Comparisons |
| 6.3.1. | Chip Implementation |
| 6.3.2. | Measured Results |
| 6.3.3. | Comparisons |
| 6.4. | Conclusions |
| Chapter 7 | Conclusions |
| | References |
| | VITA |
| | Publication List |



# Chapter 1 Introduction

In the past few years, low-voltage and low-power designs have attracted significant attention because of the popularity of portable devices. Emerging embedded biomedical applications have pushed low-power design to a further extreme. According to  $P=fCV^2$ , scaling the supply voltage to near the threshold voltage is the most favorable solution for low-power designs. A 180mV, 1024-point FFT processor was a pioneering sub-threshold design [1], followed by [2]. Sub-threshold SRAM is another important category [3]. Other designs include a 6-bit flash ADC for use at 0.2–0.9V and a 14-tap 8-bit finite impulse response (FIR) filter running at 20MHz under 0.27V [4-5].
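The quadratic dependence in $P=fCV^2$ can be illustrated with a short numerical sketch. The clock frequency and switched capacitance below are hypothetical values chosen only for illustration; they are not measurements from this work.

```python
def dynamic_power(f_hz, c_farad, vdd):
    """Dynamic switching power P = f * C * V^2, in watts."""
    return f_hz * c_farad * vdd ** 2


# Hypothetical operating point: 10 MHz clock, 1 pF switched capacitance.
F_CLK = 10e6
C_SW = 1e-12

p_nominal = dynamic_power(F_CLK, C_SW, 1.0)   # conventional 1.0 V supply
p_near_vth = dynamic_power(F_CLK, C_SW, 0.2)  # near-threshold 0.2 V supply

print(f"P at 1.0 V: {p_nominal * 1e6:.2f} uW")
print(f"P at 0.2 V: {p_near_vth * 1e6:.2f} uW")
print(f"Reduction factor: {p_nominal / p_near_vth:.0f}x")
```

At a fixed frequency and capacitance, a fivefold reduction of the supply gives a 25-fold reduction of dynamic power. In practice the achievable frequency also drops with the supply, which is the speed penalty discussed next.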

"Sustainability" was the theme of ASSCC 2011 and ISSCC 2012, which focused on design techniques for energy-efficient, low-voltage circuits and for improving battery lifetime. A panel discussion on 0.5V systems was also held during ASSCC 2011 and pointed out the challenges of this new trend. However, energy-efficient designs under a low-voltage supply usually suffer speed degradation, so a new circuit design strategy should strike a good trade-off between energy efficiency and speed. In addition, nano-scale effects, the  $I_{on}/I_{off}$  ratio, and process variations worsen significantly, affecting the circuit performance, the power efficiency (leakage power), and the fabrication yield.

#### **1.1. Challenges in Nano-Scaled Near-threshold Design**

As technology continues to scale down, the performance of nano-scaled devices is influenced by many factors, such as threshold voltage, physical channel dimensions, doping concentration, gate oxide thickness, and supply voltage. Fluctuations in these factors give rise to the *short-channel effect* (SCE), the *narrow-width effect*, *drain-induced barrier lowering* (DIBL), *gate-induced drain leakage* (GIDL), and gate leakage. These effects become a critical bottleneck in the trade-off among speed, power, and cost requirements.

Near-threshold circuit design is affected significantly by the degradation of the driving capability and of the  $I_{on}/I_{off}$  ratio, and by variations. Although circuits scaled down to a near-threshold supply can achieve ultra-low power consumption, CMOS devices then require a large area to compensate for the reduced driving capability. A conventional CMOS circuit also incurs a severe  $I_{off}$  problem in nanometer processes. In addition, near-threshold circuits suffer from serious process, voltage and temperature (PVT) variations, which can change performance by a factor of several.

#### **1.2. Near-threshold On-chip Data Link**

Fig. 1-1 shows a block diagram of an on-chip data link system. Depending on the system requirements, a serializer/deserializer might be needed. Apart from the serializer/deserializer, the on-chip bus and the local oscillator are the most important macros in the system.

On-chip interconnects become a bottleneck with respect to speed, power, cost, and noise as technology scales to the nanometer regime. Among on-chip bus design approaches, repeater insertion is a popular method for interconnects. In this dissertation, we discuss the challenges and design issues of a near-threshold clock buffer and a nano-scaled near-threshold data link circuit. To solve these problems, we propose a new on-chip clock network and data bus based on several bootstrapped techniques.



Fig. 1-1 Basic function blocks of on-chip data link.

*Phase-locked loops* (PLLs) often serve as the local oscillator. In this dissertation, we develop a *bootstrapped ring oscillator* (BTRO) that can operate from a 0.2-0.6V supply. Owing to the bootstrapped technique, its output frequency is highly linear in the supply voltage. Based on this feature, a new ADPLL with the BTRO is proposed as well. It achieves 480MHz while consuming only 78  $\mu$W.

#### **1.3. Organization of the Dissertation**

The rest of the dissertation is organized as follows. Chapter 2 reviews the background of this dissertation. First, several effects in nano-scaled devices are introduced. Challenges in low-voltage circuit design are discussed as well, and some reported low-voltage techniques are reviewed. Chapter 3 introduces the repeated-RC on-chip interconnect architecture and develops a bootstrapped inverter applied to a 0.2V clock network. The inverter also features an active leakage current reduction technique to save leakage power. Chapter 4 introduces a low-voltage on-chip bus with an ISI-suppressed bootstrapped repeater. To achieve high energy efficiency, Chapter 5 introduces high-boosting bootstrapped repeaters. In Chapter 6, we present a near-threshold ADPLL using a bootstrapped digitally-controlled oscillator (DCO). Finally, Chapter 7 draws conclusions and outlines future work.



# Chapter 2 Background Review

In the past few decades, the scaling of CMOS technologies has been the major driving force behind Moore's Law. As technology scales to the nanometer regime, process parameters no longer follow a single scaling factor, because carrier velocity saturation and the increasing sub-threshold leakage current become serious. As the channel length and the gate-oxide thickness continue to shrink, several non-ideal effects begin to affect circuits. Additionally, lowering the supply of nano-scaled designs to the near-threshold region has several detrimental impacts. This chapter briefly reviews the effects in nano-scaled near-threshold design and then introduces popular low-voltage design techniques.

#### 2.1. Effects in Nano-scaled Process [6]

#### 2.1.1. Short-Channel Effect

The *short-channel effect* (SCE) occurs in a MOSFET whose channel length is of the same order of magnitude as the depletion-layer widths of the source and drain junctions. The SCE is often modeled as charge sharing, in which the source and drain depletion regions share part of the charge under the gate. Using the depletion approximation, the threshold voltage  $V_{th}$  of a MOSFET can be written as

$$V_{th} = V_{fb} + 2\Phi_f + \frac{Q_B}{C_{OX}}$$
(2.1)

where  $V_{fb}$  is the flat-band voltage;  $\Phi_f$  is the Fermi potential;  $Q_B$  is the charge in the channel region; and  $C_{OX}$  is the oxide capacitance. As the channel length shrinks, the charge that the gate must support in the doped region is reduced significantly. As a result, the threshold voltage falls as the channel length decreases; equivalently, it rises with increasing channel length.



Fig. 2-1. Threshold voltage with change in channel length due to SCE [6].

Halo doping, a non-uniform channel doping used in modern processes to adjust the threshold voltage, gives rise to the so-called *reverse short-channel effect* (RSCE). The increase in threshold voltage comes from the extra doping charge near the source and drain regions. As the device length is reduced, the threshold voltage of the device increases, which is the opposite of what is expected from the SCE [7-8].

#### 2.1.2. Narrow-Width Effect

The *narrow-width effect* (NWE) occurs when the threshold voltage  $V_{th}$  of a nano-scaled MOSFET is modulated by the gate width, so that the device width in turn modulates the drain current. According to Eq.(2.1), the NWE has two main causes. First, the extra charge in the gate-induced depletion region results in an increase of the threshold voltage. Second, the channel doping is higher along the width dimension: because dopants encroach under the gate, a higher voltage is needed to invert the channel. Fig. 2-2 shows the threshold voltage as a function of channel width.



Fig. 2-2. Threshold voltage with change in channel width due to NWE.

#### 2.1.3. Sub-threshold Leakage [6, 9]

In a nano-scaled device, the sub-threshold (weak-inversion) current  $I_{sub}$  flows when the gate-source voltage is below the threshold voltage  $V_{th}$ . It can be expressed as Eq.(2.2).

$$I_{sub} = \mu C_{dep} \frac{W}{L} V_T^2 \exp\left(\frac{V_{GS} - V_{th}}{nV_T}\right) \left(1 - \exp\left(\frac{-V_{DS}}{V_T}\right)\right)$$
(2.2)

where  $\mu$  is the effective mobility;  $C_{dep}$  is the depletion capacitance; W and L are the width and length of the device;  $V_T$  is the thermal voltage;  $V_{GS}$  is the gate-to-source voltage; n is the sub-threshold slope factor; and  $V_{DS}$  is the drain-to-source voltage.
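As a rough numerical illustration of Eq.(2.2), the sketch below evaluates $I_{sub}$ and the gate voltage needed to change it by one decade. All device parameters (`v_th`, `n`, `mu`, `c_dep`, `w_over_l`) are hypothetical placeholders in SI units, not values extracted from any process used in this work.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def thermal_voltage(temp_k=300.0):
    """Thermal voltage V_T = kT/q, in volts."""
    return K_B * temp_k / Q_E


def i_sub(v_gs, v_ds, v_th, n, mu, c_dep, w_over_l, temp_k=300.0):
    """Sub-threshold current of Eq.(2.2), in amperes."""
    v_t = thermal_voltage(temp_k)
    prefactor = mu * c_dep * w_over_l * v_t ** 2
    gate_term = math.exp((v_gs - v_th) / (n * v_t))
    drain_term = 1.0 - math.exp(-v_ds / v_t)
    return prefactor * gate_term * drain_term


def subthreshold_swing(n, temp_k=300.0):
    """Gate voltage that changes I_sub by 10x, in volts per decade."""
    return n * thermal_voltage(temp_k) * math.log(10.0)


# Hypothetical device parameters, chosen only for illustration.
PARAMS = dict(v_th=0.35, n=1.4, mu=0.03, c_dep=1e-3, w_over_l=10.0)

for v_gs in (0.10, 0.20, 0.30):
    current = i_sub(v_gs, v_ds=0.2, **PARAMS)
    print(f"V_GS = {v_gs:.2f} V -> I_sub = {current:.3e} A")
print(f"Swing: {subthreshold_swing(PARAMS['n']) * 1e3:.1f} mV/decade")
```

The exponential gate term dominates: every increase of $V_{GS}$ by $nV_T\ln 10$ multiplies the current by ten, while the drain term saturates toward 1 once $V_{DS}$ exceeds a few $V_T$.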

In contrast to the strong-inversion region, the sub-threshold current is dominated by diffusion current. This diffusion-driven carrier movement is similar to charge flow in BJTs. The sub-threshold current is also affected by other phenomena, such as drain-induced barrier lowering (DIBL) and gate-induced drain leakage (GIDL), which are introduced in the following sections.



#### 2.1.4. Drain-Induced Barrier Lowering [6]

Fig. 2-3. Drain current of an NMOS device vs.  $V_G$  in the near-threshold region.

In micron-scaled devices, the source and drain are far enough apart that their depletion regions do not interact. In that case, the threshold voltage is nearly independent of the channel length and drain bias. In the off state, the potential barrier between the source and drain prevents electrons from flowing to the drain. In a short-channel device,  $V_{th}$  varies with channel length because of the SCE. In addition, DIBL lowers the energy barrier as the drain voltage increases [6]. When a higher drain voltage is applied to a short-channel device, the barrier is lowered further and the drain current increases. Fig. 2-3 plots  $I_D$  as a function of  $V_G$  and illustrates the DIBL effect as the drain voltage increases. As shown in Fig. 2-3, DIBL lowers the threshold voltage but leaves the slope in the near-threshold region unchanged.

#### 2.1.5. Gate-Induced Drain Leakage [6, 10]

Gate-induced drain leakage (GIDL) arises from high-field effects in the drain junction of a MOSFET. It usually occurs when the applied gate voltage strengthens the electric field in or around the gated PN junction, so that high-field effects such as avalanche multiplication and band-to-band tunneling (BTBT) become severe. The leakage current of a reverse-biased gated diode may therefore increase dramatically once a negative gate voltage causes field crowding and a peak field. To suppress GIDL, a thicker oxide and a lower electric field can be used; very high drain doping can also be considered. Fig. 2-3 also shows GIDL in the drain current characteristics of an NMOS device at different drain voltages.

#### 2.1.6. Gate Leakage [11]

In nanometer technology, the gate oxide thickness  $T_{OX}$  has been scaled to values in the range of 12–22Å. With such thin oxides, a large gate tunneling leakage current  $I_{gate}$  appears in addition to the DIBL-related leakage described above.  $I_{gate}$  increases because an electron has a finite probability of tunneling directly through the SiO<sub>2</sub> layer. This probability is a strong exponential function of  $T_{OX}$ : thinning  $T_{OX}$  by only 2Å may increase  $I_{gate}$  by an order of magnitude.  $T_{OX}$  is therefore the most sensitive of the physical dimensions. Typically,  $I_{gate}$  is much smaller than the sub-threshold leakage current  $I_{sub}$  when  $T_{OX}$  is larger than 20Å. At the simulation level, the BSIM4 model (level = 54) includes nano-scale effects such as GIDL and DIBL, and it takes  $I_{gate}$  into account as well. For fast and reliable simulation, several models of the gate leakage current have also been reported.
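The sensitivity to oxide thickness can be made concrete with a back-of-the-envelope sketch. It simply extrapolates the rule of thumb quoted above (one decade of $I_{gate}$ per 2Å of $T_{OX}$) across the 12–22Å range; the function name, the 20Å reference point, and the assumption that the rule holds uniformly over the whole range are illustrative simplifications, not a tunneling model.

```python
def gate_leakage_scale(t_ox_angstrom, t_ref_angstrom=20.0,
                       angstrom_per_decade=2.0):
    """Relative I_gate versus a reference oxide thickness (dimensionless).

    Assumes one decade of gate leakage per `angstrom_per_decade` of
    oxide thinning, applied uniformly (a rule-of-thumb extrapolation).
    """
    return 10.0 ** ((t_ref_angstrom - t_ox_angstrom) / angstrom_per_decade)


for t_ox in (22.0, 20.0, 18.0, 16.0, 14.0, 12.0):
    scale = gate_leakage_scale(t_ox)
    print(f"T_ox = {t_ox:4.1f} angstrom -> relative I_gate = {scale:.3g}")
```

Under this assumption, scaling the oxide from 22Å to 12Å raises the gate leakage by about five decades, which is why $I_{gate}$ stops being negligible relative to $I_{sub}$ below roughly 20Å.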

#### 2.2. Challenges in Ultra Low-voltage Designs

#### 2.2.1. Degradation of Driving Capability

When a MOSFET device is operated in the super- $V_{th}$  region, the drain current operated in the saturation region is a function of the gate voltage. It can be represented as Eq.(2.3).

$$I_{D,Sat} = \mu C_{ox} \frac{W}{L} (V_{GS} - V_{th})^2 (1 + \lambda V_{DS}).$$
(2.3)

where  $C_{ox}$  is the gate oxide capacitance per unit area and  $\lambda$  is the channel-length modulation factor. According to Eq.(2.3), the drain current  $I_{D,Sat}$  decreases quadratically as the gate voltage is lowered. When the gate voltage drops further into the sub-threshold region, the drain current starts to decrease exponentially, as shown in Eq.(2.2). In other words, poor driving capability is the first design issue for a design operated in the near-threshold region. In normal 1V designs, device sizing is often used to increase the driving capability. However, the gate capacitance of a MOS device drops only slightly when the gate drive is lowered to near the threshold voltage. As a result, enlarging the device to enhance the driving capability is not an attractive option in the near-threshold region.

#### 2.2.2. Leakage Power and Ion-to-Ioff Ratio [8, 12]

 $I_{on}$ -to- $I_{off}$  ratio becomes a critical factor in near-threshold digital circuits and near-threshold circuits. The inherently small  $I_{on}$ -to- $I_{off}$  ratio dominates how many transistors can be connected per node. As reported in [12], the degradation in  $I_{on}$ -to- $I_{off}$  is from approximately 10<sup>7</sup> to 10<sup>4</sup> and it implies that there is a strong interaction between the ON and the OFF devices in sub-threshold region when it comes to setting the voltage level of critical signals. Unfortunately, this causes a relevant failure mechanism in circuit operation. As illustrated in Fig. 2-4, an inverter is served as a driver with a capacitive load of 200 fF while  $V_{DD}$  is being swept from 0.1–0.3V. The circuit is operated to the limit of the speed. Obviously, the leakage power becomes a greater portion of the total power consumption while  $V_{DD}$  keeps going lower.



Fig. 2-4. Leakage power on a repeater at subthreshold supply.

#### 2.2.3. Process, Voltage and Temperature Variation

Performance variation induced by *process, voltage and temperature* (PVT) corners makes circuit design in the near-threshold region tremendously challenging. First of all, process variability affects the current through process parameters such as mobility and threshold voltage, and even a small variation may lead to an exponentially large mismatch. Process variation is divided into two major categories [13], and it is further classified into more specific categories according to its physical range on a wafer or on a die [14]. Fig. 2-5 depicts  $I_D$  as a function of the gate voltage in the near-threshold region, illustrating the effects of process and voltage at room temperature. It shows that the variation of  $I_D$  caused by process and voltage fluctuations becomes worse as the supply voltage is lowered.

Whereas process variation is static once a die has been fabricated, supply voltage variation relates to fluctuations during circuit operation. Real-time fluctuations caused by voltage drop or inductive effects in the wires may result in functional failure [14-15]. Temperature is another important factor for variation and reliability in a nano-scale chip, especially when the supply voltage is lowered to the near-threshold region. The sub-threshold current depends strongly on temperature owing to the parameter  $V_T$. In contrast to the current in the super-threshold region,  $I_D$  increases as the temperature rises. The measured temperature sensitivity of the threshold voltage is about 0.8 mV/°C [6].



Fig. 2-5. Drain current in different corners in the near-threshold region.

#### 2.3. Low-voltage Design Techniques

As mentioned, circuit design in the near-threshold region has many challenges. Several techniques have been reported to solve the problems or improve energy efficiency. They are briefly reviewed in the following sections.

#### 2.3.1. Bootstrap Techniques

Bootstrapping is an effective means of raising the driving efficiency and thus the speed. A previous work developed a bootstrapped CMOS driver for large capacitive loads, shown in Fig. 2-6 [16]. According to [16], the bootstrapped driver consists of a pull-up and a pull-down control pair that drive the PMOS and NMOS transistors, respectively. The gate voltages of the PMOS and NMOS driver transistors are kept at  $V_{DD}$  and 0 in the cut-off phase. In the driving phase, the gates are fed  $-V_{DD}$  and  $2V_{DD}$  to increase the current density. When the input V<sub>in</sub> is at 0 V, V<sub>a</sub> is at  $V_{DD}$  and the output of the inverter is at  $V_{DD}$. Moreover, M<sub>N2</sub> and M<sub>N1b</sub> are off, while M<sub>P2</sub> and M<sub>P1b</sub> are on. Therefore, V<sub>2P</sub> is pre-charged to 0 V by M<sub>N2b</sub>, and the bootstrap capacitor C<sub>bp</sub> stores a potential of  $V_{DD}$. When V<sub>in</sub> transits from 0 V to  $V_{DD}$  (from L to H), V<sub>2P</sub> is boosted from 0 V to  $-V_{DD}$. The potential of  $-V_{DD}$  is then passed from V<sub>2P</sub> to V<sub>1P</sub>. Consequently,  $-V_{DD}$  appears at the gate of the driver M<sub>P2</sub>, which drives V<sub>out</sub> with  $V_{SG} = 2V_{DD}$. As V<sub>in</sub> transits from H to L, a similar mechanism pushes V<sub>1N</sub> to  $2V_{DD}$.



Fig. 2-6. Reported bootstrapped driver in [16].

The driver in [16] successfully enhances the driving capability by boosting the gate voltage, which also makes it suitable for a near-threshold supply. However, it has several drawbacks, such as reverse leakage current and non-ideal transient edges. Some researchers have proposed improvements based on [16]. Among them, Kil *et al.* proposed a sub-threshold bootstrapped repeater for a 9MHz distributed clock network at 0.4V [17]. The sub-threshold bootstrapped repeater, depicted in Fig. 2-7, is composed of two bootstrap circuits: one for pre-boosting and the other for driving. The pre-boosting circuit enhances the pre-charge current to increase the speed. In addition,  $M_{PS2}$  and  $M_{NS2}$  are switches that feed the boosted signal back to eliminate the reverse current. However, when this approach is applied to a data link, the kick-back disturbance through the boosting capacitors causes a large timing jitter. Furthermore, it consumes large static power and incurs a high capacitor cost.



#### 2.3.2. Dynamic Voltage and Frequency Scaling

*Dynamic Voltage and Frequency Scaling* (DVFS) is a popular power-saving scheme that is broadly used in microprocessors and DSP ASICs [18]. Since different functions need different execution times, a DVFS system can dynamically change the supply voltage or the data rate to meet the specification requirements; hence, the power consumption can be optimized for each computational task.

DVFS is also applied to lower the operating frequency in portable products when the battery runs low, keeping the system working on its basic functions in order to extend the battery lifetime or stand-by time. DVFS can be applied to compensate for PVT variation as well [19]. In fact, designs often retain a large redundant margin in practical chips. DVFS dynamically determines the supply voltage or the frequency appropriate for the task and therefore achieves higher power efficiency.

*Critical Path Monitors* (CPMs) [18, 20-21] remove a subset of these worst-case margins by using a delay chain that is a replica of the critical path of the actual design. The propagation delay through this replica path is monitored, and the voltage and frequency are scaled until the replica path just meets timing. The replica path tracks the critical-path delay across inter-die process variations and global fluctuations in supply voltage and temperature, thereby eliminating the margins due to global PVT variations.

#### 2.3.3. Multi-threshold MOS Control

When circuits operate in the near-threshold region, lowering the supply voltage decreases  $I_D$  according to Eqs. (2.2) and (2.3), which results in a drastic increase in gate delay. One way to overcome this speed degradation is to reduce the  $V_{th}$  of the MOSFET devices [22-23]. As  $V_{th}$  is reduced, however, another significant problem arises: the growing sub-threshold leakage current rapidly increases the stand-by current and degrades the power performance. To save stand-by power in sleep mode, a power management scheme that combines a small embedded processor with multi-threshold sleep control is reported in [24]. It utilizes high- $V_{th}$  MOSFET devices, resulting in low standby and dynamic power.

#### 2.3.4. Bulk-driven Technique

Similar to multi-threshold MOS control, the bulk-driven technique uses circuit techniques to shift  $V_{th}$  lower or higher by biasing the bulk voltage. The bulk-driven technique is sometimes called "adaptive body-biasing" as well [25]. Several works based on the bulk-driven technique are reported in [26-27]. The threshold voltage can be expressed as in Eq.(2.4) [28].

$$V_{th} = V_{th0} - \gamma \left[ \sqrt{2\phi_F - V_{SB}} - \sqrt{2\phi_F} \right].$$
(2.4)

This is the well-known equation relating the body voltage to the threshold voltage, where  $\gamma$  is the body effect coefficient. The bulk-driven technique has several important features. The obvious one is that it enhances the driving capability by modulating  $V_{th}$. The most important one is that it allows zero, negative, and even small positive bias voltages to achieve the desired DC currents, which makes it a good alternative for increasing the input common-mode voltage range. In normal circuit design, the bulk terminal of a PMOS (NMOS) device is always connected to the highest (lowest) potential to avoid the latch-up problem caused by forward biasing of the bulk–source junction.

#### 2.4. Summary

This chapter has briefly reviewed the background of the dissertation. Owing to non-ideal effects that result from shrinking the channel length and the gate-oxide thickness, current variation caused by the environment makes circuit design more challenging. Additionally, nano-scale circuit design with a near-threshold supply suffers several detrimental impacts, and the trade-off between performance and energy efficiency should be handled carefully. The last part of this chapter introduced some popular low-voltage design techniques. Based on the concept of the bootstrap technique, we will develop several bootstrap circuits in the following chapters.



# **Chapter 3** Near-threshold Clock Network

A clock network needs a driver with strong driving current and little skew. As shown in Fig. 3-1(a), the conventional bootstrapped driver consists of a pull-up and a pull-down control pair that drive the PMOS and NMOS transistors, respectively. As mentioned in Chapter 2, the gate voltages of the PMOS and NMOS driver transistors are kept at  $V_{DD}$  and 0 in the cut-off phase and are fed  $-V_{DD}$  and  $2V_{DD}$  in the driving phase to increase the current density. Despite a previous effort [35] to increase the boosting efficiency by rearranging the timing of the switching and boosting signals, reverse leakage current remains the main drawback of conventional bootstrapped drivers. Among other bootstrapped circuits, single-capacitor designs reduce the hardware overhead [36-37]. However, their complex circuitry suffers from serious charge sharing at the capacitor node, and their leakage current is problematic as well.



Fig. 3-1. (a) Conventional bootstrapped circuit (b) Proposed bootstrapped circuit.

In this chapter, we present a sub-threshold clock network with a bootstrapped CMOS inverter operated at a sub-threshold power supply. The bootstrapped CMOS inverter is introduced to achieve high boosting efficiency and improve the speed. It both increases the driving ability, by boosting signals into the super-threshold region, and reduces the leakage current. Fig. 3-1(b) illustrates the circuit diagram. Theoretically, the PN bootstrap circuit produces an output swing of  $-V_{DD}$  to  $2V_{DD}$.  $2V_{DD}$  ( $-V_{DD}$ ) enhances the driving capability of the NMOS (PMOS) driver and suppresses the leakage of the PMOS (NMOS) driver. The PN bootstrap circuit provides  $V_{SG}$  ( $V_{GS}$ ) =  $2V_{DD}$  to turn on the PMOS (NMOS) driver. In contrast, a negative  $V_{SG}$  ( $V_{GS}$ ) =  $-V_{DD}$  suppresses the leakage current while the PMOS (NMOS) driver is turned off. Moreover, compared with previous works, the proposed design scheme has fewer devices in the sub-threshold region, which explains why process variation affects it to a lesser extent.

#### **3.1. Overview of On-chip Interconnect**

Before introducing the proposed bootstrapped CMOS inverter, the fundamentals of on-chip interconnect are briefly reviewed. First, a linear model of the interconnect and repeater is adopted in accordance with VLSI parameter scaling. Then, the definitions of speed and power consumption for on-chip interconnect circuits are described. The parameters introduced by the linear models are used to define the *figure of merit* (FoM), the index for optimal global on-chip interconnect design.

#### 3.1.1. RC-Interconnect with Repeater Insertion



Fig. 3-2. Cross section of interconnect configurations.

In general, a global interconnect is assumed to be placed between two adjacent orthogonal metal layers and two coplanar wires, as shown in Fig. 3-2, where W and S are the interconnect width and spacing; T is the interconnect thickness and H is the dielectric height;  $C_f$  is the fringing-field capacitance;  $C_a$  is the parallel plate capacitance to the top and bottom layers of metal;  $C_c$  is the coupling capacitance between the neighboring interconnects. The interconnect resistance per unit length is denoted as (3-1).

$$r_w = \frac{\rho}{W \cdot T} \tag{3-1}$$

where  $\rho$  is the metal resistivity;  $r_w$  can be obtained from the sheet resistance given in the process data sheet.

With technology scaling and increasing global interconnect length, repeater insertion is broadly used to reduce delay and power consumption. Several works have addressed the optimization of global interconnect design with repeater insertion [29-33]. Since the interconnect parameters are determined by the width *W*, the spacing *S*, and so on, on-chip interconnects with repeater insertion can be analyzed with the Elmore RC delay model. According to the Elmore delay model, the time constant  $\tau$  of the whole interconnect can be obtained from the model described in [29-31].

When a global interconnect is separated into several segments, the small delay penalty of the repeaters can be tolerated on these critical segments, and the time constant  $\tau$  is dominated by the interconnect segments. However, if the segments are made too short, the driving capability of the repeaters decreases severely. Consequently, there is a trade-off between the time constant  $\tau$  and the power consumption.
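As a rough illustration of the Elmore model referred to above, the following sketch computes the Elmore time constant of one repeater-driven wire segment modeled as an RC ladder. It is a generic textbook formulation, not the specific model of [29-31]; the function name, the argument names, and every numerical value in the usage part are assumptions chosen only for illustration.

```python
def elmore_tau(r_drv, r_w, c_w, h, c_load, sections=100):
    """Elmore time constant of a driver, a uniform RC wire, and a load.

    r_drv  : driver output resistance in ohm (assumed value)
    r_w    : wire resistance per unit length in ohm/m
    c_w    : wire capacitance per unit length in F/m
    h      : segment length in m
    c_load : input capacitance of the next repeater in F (assumed value)
    The wire is split into `sections` equal sections (series R, then shunt C).
    """
    r_sec = r_w * h / sections
    c_sec = c_w * h / sections
    tau = 0.0
    r_upstream = r_drv
    for _ in range(sections):
        r_upstream += r_sec
        # Each capacitor is charged through all the resistance upstream of it.
        tau += r_upstream * c_sec
    tau += r_upstream * c_load
    return tau


if __name__ == "__main__":
    # Hypothetical numbers: 100 kohm/m, 200 pF/m, 10 kohm driver, 2 fF load.
    for h_mm in (1.0, 2.0, 5.0, 10.0):
        tau = elmore_tau(10e3, 1e5, 2e-10, h_mm * 1e-3, 2e-15)
        print(f"h = {h_mm:4.1f} mm  tau = {tau * 1e9:.3f} ns")
```

For a bare wire (no driver resistance, no load) the result approaches half the product of the total wire resistance and capacitance and grows quadratically with the segment length, which is why splitting a long wire into repeated segments shortens the overall delay.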

#### 3.1.2. Time Constant, Power Dissipation and Figure of Merit

The data rate is related to the time constant. The rise time and fall time can be estimated from the step response. The output rise time is defined from the 20% transition edge to the 80% transition edge, as shown in Eq.(3-2).

$$t_r = t_{80\%} - t_{20\%} \cong 1.386\tau \,. \tag{3-2}$$

where  $t_{80\%}$  and  $t_{20\%}$  are the times at which the output voltage exceeds 80% and 20% of  $V_{DD}$, respectively, during the rising edge. The minimum rise time is specified as 0.125 unit interval (UI) in the SATA standard [34].
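The factor 1.386 in Eq.(3-2) follows from the step response of a first-order RC network,  $1 - e^{-t/\tau}$, which gives  $t_{80\%} - t_{20\%} = \tau \ln 4$. The short sketch below checks this numerically; the function name is an assumption made for illustration, and the single-pole response is the only model assumed.

```python
import math


def rc_rise_time(tau, lo=0.2, hi=0.8):
    """Time a first-order RC step response needs to go from `lo` to `hi`
    of its final value, for a time constant `tau`."""
    t_lo = -tau * math.log(1.0 - lo)   # instant at which the output crosses lo
    t_hi = -tau * math.log(1.0 - hi)   # instant at which the output crosses hi
    return t_hi - t_lo


if __name__ == "__main__":
    # With tau normalized to 1, the result is ln(4), about 1.386.
    print(rc_rise_time(1.0))
```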

Besides speed, which is one of the most important factors in on-chip interconnect design, power consumption is another basic consideration. The total power consumption includes not only the switching power but also the short-circuit power and the leakage power, denoted  $P_{SW}$,  $P_{SC}$  and  $P_{Leakage}$, respectively. Detailed expressions and discussions are reported in [29-31]. The total power dissipation of each interconnect is written as in Eq.(3-3).

$$P_T = \left(\frac{L}{h}\right) \times \left(P_{SW} + P_{SC} + P_{Leakage}\right).$$
(3-3)

where *L* is the total length of the interconnect and *h* is the segment length. Since the switching power is a large portion of the total power,  $P_{SW}$  is expressed as in Eq.(3-4).

$$P_{SW} = \alpha f \cdot \left[ \frac{mL}{h} (c_{gs} + c_{db}) + c_{Wire} \right] \cdot V_{DD}^{2}.$$
(3-4)

where  $\alpha$  is the activity factor, which gives the probability of signal switching; *m* is the number of repeater fingers; and  $(c_{gs}+c_{db})$  is the parasitic capacitance of the repeater.

The performance of an interconnect is affected by many design parameters, most of which are discussed in the literature [32-33]. The FoM is used to compare performance. Here, FoM<sub>1</sub> in Eq.(3-5) is defined as the total energy per bit to express the energy efficiency.

$$FoM_1 = E_T = \frac{P_T}{f} \approx \alpha C_{Total} V_{DD}^2.$$
(3-5)

where  $E_T$  represents the total energy. Fig. 3-3 shows the energy per bit  $E_T$  as a function of the segment length *h* and the number of repeater fingers *m* for a total length *L* of 10 mm. The design is more energy-efficient when *h* is longer and *m* takes its minimum value, *m* = 1. Since the supply voltage  $V_{DD}$  is assigned by the system requirement, the only way to gain energy efficiency is to use a long segment length *h*, which, however, incurs a great speed penalty. Accordingly, the highest energy efficiency is obtained with the maximum *h* and the minimum driver size, and the choice becomes a trade-off that depends on the requirement.



Fig. 3-3. Effect of segment length and fingers of repeaters on the energy per bit.
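The trend described for Fig. 3-3 can be reproduced qualitatively from the switching term of Eqs.(3-4) and (3-5) alone. The sketch below is only an illustration: it ignores  $P_{SC}$  and  $P_{Leakage}$, and the capacitance values, the activity factor, and the function name are hypothetical placeholders rather than data from this work.

```python
def switching_energy_per_bit(alpha, m, length, h, c_rep, c_wire_total, vdd):
    """Switching energy per bit following Eqs.(3-4) and (3-5).

    alpha        : activity factor
    m            : number of repeater fingers
    length, h    : total interconnect length and segment length (same unit)
    c_rep        : (c_gs + c_db) of one repeater finger in F (assumed value)
    c_wire_total : total wire capacitance in F (assumed value)
    vdd          : supply voltage in V
    """
    c_total = m * (length / h) * c_rep + c_wire_total
    return alpha * c_total * vdd ** 2


if __name__ == "__main__":
    # Hypothetical 10 mm wire: 2 pF of wire capacitance, 1 fF per finger.
    for m in (1, 4):
        for h_mm in (1.0, 2.0, 5.0):
            e_bit = switching_energy_per_bit(0.5, m, 10e-3, h_mm * 1e-3,
                                             1e-15, 2e-12, 0.2)
            print(f"m = {m}, h = {h_mm} mm: {e_bit * 1e15:.2f} fJ/bit")
```

With these placeholder values the wire capacitance dominates, so the energy falls only slightly as *h* grows or *m* shrinks; the direction of the trend, not its magnitude, is the point of the sketch.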

#### **3.2. Active Leakage Reduction Bootstrapped Inverter**

Fig. 3-4 schematically depicts the proposed active leakage reduction bootstrapped inverter (ALBI), where  $C_{BP}$  and  $C_{BN}$  are the bootstrap capacitors;  $M_{P1}$  and  $M_{N1}$  are the transistors for  $C_{BP}$  pre-charge and  $C_{BN}$  pre-discharge; INV refers to the inverter that controls  $M_{P2}$  and  $M_{N2}$;  $M_{PD}$  and  $M_{ND}$  are the output drivers for  $C_L$; and  $N_P$  and  $N_N$  are the boosted nodes. The node  $N_B$  is boosted above  $V_{DD}$  and below ground to enhance the driving capability. Fig. 3-5 and Fig. 3-6 show the operations with the input switching from H to L and from L to H, respectively. Fig. 3-7 shows the simulated transient waveforms of the ALBI with an output load of 0.5pF under a power supply of 200mV. According to this figure, before  $V_{in}$  transits from H to L, node  $N_N$  has an initial voltage of 0V. After the H-to-L transition,  $N_N$  is boosted below ground to -188mV. Meanwhile,  $M_{P2}$  is turned off and  $M_{N2}$  is turned on. Therefore, the boosted signal at  $N_N$  passes through  $M_{N1}$  to  $N_B$  to drive  $M_{PD}$, which pulls up the capacitive load  $C_L$. At this moment,  $M_{P1}$  is turned on to pre-charge  $N_P$  to  $V_{DD}$  (0.2V). However,  $M_{N1}$  conducts in the reverse direction, so a reverse current flows to charge  $N_N$. At the end of the period during which  $V_{in}$  is L,  $N_N$  still holds -90mV. When  $V_{in}$  goes from L to H, the operation is similar:  $N_P$  is boosted above  $V_{DD}$  to 389mV and has discharged to 303mV by the end of the period during which  $V_{in}$  is H.



Fig. 3-4. Proposed bootstrapped inverter.



Fig. 3-5. Proposed bootstrapped inverter operations (input H-to-L).



Fig. 3-6. Proposed bootstrapped inverter operations (input L-to-H).



Fig. 3-7. Simulated timing waveforms at 5 MHz at 200 mV  $V_{DD}$ .

#### **3.3. Detail Evaluation and Discussion**

The proposed ALBI is superior to previous designs in terms of leakage power and switching speed. In low-voltage circuit design, a decreasing  $I_{on}/I_{off}$  ratio degrades the noise margin. In the proposed design, the boosted voltage is used in both the driving phase and the cut-off phase, and the active bootstrapped leakage reduction method improves the  $I_{on}/I_{off}$  ratio. Moreover, the smaller number of components increases the speed of the bootstrapped circuit. Because fewer components operate in the sub-threshold region, the proposed design scheme also performs better than previous works in Monte Carlo analysis.

To compare the proposed scheme with conventional ones fairly, the conventional inverter and the reported bootstrapped drivers were re-designed in the 90nm process. The conventional inverter and the bootstrapped drivers are sized to obtain the same rise/fall transient output waveforms. Their device sizes are listed in TABLE 3-1. A 30fF boost capacitor is used to ensure that the boosting efficiency exceeds 80%. These features are evaluated in detail as follows.

| Driver topology   | Sub-circuit                       | NMOS W/L<br>(nm/nm) | m<sub>n</sub> | PMOS W/L<br>(nm/nm) | m<sub>p</sub> |
|-------------------|-----------------------------------|---------------------|---------------|---------------------|---------------|
| Conventional INV  | inverter                          | 420 / 80            | 30            | 440 / 80            | 30            |
| Proposed inverter | inverter                          | 400 / 80            | 4             | 200 / 80            | 4             |
|                   | M<sub>P1</sub>, M<sub>N1</sub>    | 200 / 80            | 1             | 200 / 80            | 1             |
|                   | M<sub>P2</sub>, M<sub>N2</sub>    | 200 / 160           | 1             | 200 / 160           | 1             |
|                   | driver                            | 285 / 80            | 1             | 340 / 80            | 2             |
| Bootstrapped      | inverter                          | 400 / 80            | 4             | 200 / 80            | 4             |
|                   | switch                            | 200 / 80            | 3             | 200 / 80            | 3             |
|                   | driver                            | 250 / 80            | 1             | 340 / 80            | 2             |
| Bootstrapped      | inverter                          | 400 / 80            | 4             | 200 / 80            | 4             |
|                   | switch                            | 200 / 80            | 4             | 200 / 80            | 4             |
|                   | driver                            | 260 / 80            | 1             | 300 / 80            | 2             |

**TABLE 3-1 Device Sizing** 

#### **3.3.1.** Boosting Efficiency

Ideally, the boosted node N<sub>B</sub> generates a voltage swing from  $2V_{DD}$  to  $-V_{DD}$ . However, the parasitic capacitance at node N<sub>B</sub> exhibits the charge-sharing effect with the bootstrap capacitance [17]. For example, when N<sub>B</sub> transitions above  $V_{DD}$ , consider the equivalent circuit of the upper side shown in Fig. 3-4. V<sub>BP</sub> and C<sub>PTP</sub> are the voltage and the total parasitic capacitance at N<sub>B</sub>, respectively. Ideally, V<sub>BP</sub> transits from  $-V_{DD}$  to  $2V_{DD}$ . Thus,


$$V_{BP} = \frac{C_{BP}}{C_{BP} + C_{PTP}} \cdot 2V_{DD} - \frac{C_{PTP}}{C_{BP} + C_{PTP}} \cdot V_{DD} \quad .$$
(3-6)

To increase driving capability, the bootstrap capacitance is designed to be significantly larger than the parasitic capacitance at the node. As a result, (3-6) can be rewritten as (3-7),

$$V_{BP} \approx \frac{C_{BP}}{C_{BP} + C_{PTP}} \cdot 2V_{DD} \triangleq \beta_P \cdot 2V_{DD}.$$
(3-7)

 $\beta_{P}$  is the boosting efficiency factor, or simply the boosting efficiency. Similarly, as V<sub>BN</sub> transits from  $V_{DD}$  to below ground, the estimated V<sub>BN</sub> is

$$V_{BN} \approx \frac{C_{BN}}{C_{BN} + C_{PTN}} \cdot \left(-V_{DD}\right) \triangleq \beta_N \cdot \left(-V_{DD}\right).$$
(3-8)

The larger the bootstrap capacitance, the better the boosting efficiency. To observe the leakage power and the delay time in a more ideal case, we used a 100fF bootstrap capacitor. In our test chip, based on a trade-off between cost and performance, a 30fF boost capacitor is used to ensure that the boosting efficiency is at least 80%. As shown in Fig. 3-8, the boosting efficiency is 88% with a 30fF bootstrap capacitor.



Fig. 3-8. Boosting efficiency vs. bootstrap capacitor.
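The relations in Eqs.(3-6) and (3-7) can be explored with a few lines of code. The sketch below back-calculates the parasitic capacitance implied by the reported 88% efficiency at 30fF and then evaluates the boosted high level with and without the second term of Eq.(3-6). It assumes that the parasitic capacitance stays the same when the bootstrap capacitor changes, which is a simplification; the function names are illustrative only.

```python
def boosting_efficiency(c_boot, c_par):
    """Boosting efficiency beta = C_B / (C_B + C_PT), as defined in Eq.(3-7)."""
    return c_boot / (c_boot + c_par)


def parasitic_from_efficiency(c_boot, beta):
    """Parasitic capacitance implied by a given efficiency (Eq.(3-7) inverted)."""
    return c_boot * (1.0 - beta) / beta


def boosted_high_level(vdd, beta, exact=True):
    """Boosted high voltage: Eq.(3-6) if exact, otherwise the Eq.(3-7) estimate."""
    approx = beta * 2.0 * vdd
    # C_PTP / (C_BP + C_PTP) equals 1 - beta, which weights the -V_DD term.
    return approx - (1.0 - beta) * vdd if exact else approx


if __name__ == "__main__":
    c_par = parasitic_from_efficiency(30e-15, 0.88)   # implied by 88% at 30 fF
    print(f"implied parasitic: {c_par * 1e15:.2f} fF")
    # Assumption: the same parasitic applies to a 100 fF bootstrap capacitor.
    print(f"beta at 100 fF: {boosting_efficiency(100e-15, c_par):.3f}")
    print(f"V_BP exact:  {boosted_high_level(0.2, 0.88):.3f} V")
    print(f"V_BP approx: {boosted_high_level(0.2, 0.88, exact=False):.3f} V")
```

The gap between the two estimates shows how much the approximation in Eq.(3-7) overstates the boosted level when the parasitic capacitance is not negligible.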

#### **3.3.2. Reduction of Leakage Current**

In the proposed design scheme, the boosted high  $(2V_{DD})$  at N<sub>B</sub> enhances the driving capability of M<sub>ND</sub> and suppresses the leakage current of M<sub>PD</sub>. Similarly, the boosted low  $(-V_{DD})$  at N<sub>B</sub> enhances the driving of M<sub>PD</sub> and reduces the leakage of M<sub>ND</sub>.

The  $I_{off}$  current is primarily formed by the sub-threshold leakage current [38-39]. Hence, scaling the supply voltage lowers the  $I_{on}/I_{off}$  ratio. Previously reported bootstrapped drivers improve the  $I_{on}/I_{off}$  ratio in one direction only, by enhancing  $I_{on}$. The proposed design additionally suppresses the leakage current of the PMOS (NMOS) driver by applying  $-V_{DD}$  as its  $V_{SG}$  ( $V_{GS}$ ). According to the I-V formula in the sub-threshold region, our design reduces the leakage current exponentially.

Although HSPICE can simulate steady-state leakage power, characterizing the leakage power under dynamic operation is difficult. The leakage power of a periodic waveform can be estimated by separating it from the average total power. The total energy  $E_T$  over a period *T* is

$$\begin{aligned} E_T &= P_T \cdot T \approx \left( P_{SW} + P_{SC} + P_{Leakage} \right) \cdot T \\ &= E_{SW} + E_{SC} + P_{Leakage} \cdot T, \end{aligned}$$
(3-9)

where  $E_T$ ,  $E_{SW}$ ,  $E_{SC}$  and  $E_{Leakage}$  represents the total energy, the switching energy, the short-circuit energy, and the leakage energy. The switching energy, short circuit energy and leakage current are assumed to remain constant under the same power supply. A long wire can be regarded as large capacitive load is pF range. When a CMOS driver drives heavy capacitive loads, the energy contributions of the short-circuit current can be ignored.  $E_{Leakage}$  is proportional to *T*;  $E_{rep}$  is the total energy of the repeaters. Thus, we can rewrite Eq.(3-9) as

$$E_T \approx \left( E_{rep} + \frac{\alpha}{2} C_{wire} V_{DD}^2 \right) + P_{Leakage} \cdot T.$$
(3-10)

For two identical signals with different periods  $T_1$  and  $T_2$  and average total powers  $P_{T_1}$  and  $P_{T_2}$, the leakage power  $P_{Leakage}$  is derived as

$$P_{Leakage} = \frac{P_{T_1} \cdot T_1 - P_{T_2} \cdot T_2}{(T_1 - T_2)}.$$
(3-11)
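Eq.(3-11) follows from writing Eq.(3-10) for both periods and subtracting, since the dynamic energy per cycle cancels. The sketch below applies the formula to synthetic numbers to show that the leakage power is recovered; the function name and the example values (90 fJ of dynamic energy per cycle, 100 nW of leakage) are made up for illustration and are not measurement data.

```python
def leakage_power(p_t1, t1, p_t2, t2):
    """Leakage power from two average-power readings, following Eq.(3-11).

    p_t1, p_t2 : average total power at periods t1 and t2 (W)
    t1, t2     : the two clock periods (s); they must differ
    """
    if t1 == t2:
        raise ValueError("the two periods must be different")
    return (p_t1 * t1 - p_t2 * t2) / (t1 - t2)


if __name__ == "__main__":
    # Synthetic example: constant dynamic energy per cycle plus constant leakage.
    e_dyn, p_leak = 90e-15, 100e-9            # made-up values
    t1, t2 = 100e-9, 105e-9
    p1 = e_dyn / t1 + p_leak                  # average total power at period t1
    p2 = e_dyn / t2 + p_leak                  # average total power at period t2
    print(leakage_power(p1, t1, p2, t2))      # recovers p_leak up to rounding
```

Because the result is a small difference of two nearly equal energies, the method is sensitive to measurement error when the two periods are close to each other.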

Fig. 3-9 compares the leakage power as a function of frequency with a 0.2pF capacitive load at different temperature and process corners. The ratio of leakage power to total power is also shown in Fig. 3-9. Owing to the negative V<sub>GS</sub> control, the leakage power of the proposed bootstrapped inverter at 10MHz under 0.2V is 2pW, compared with 3.9nW for a conventional inverter, 0.15nW for [16], and 39nW for [17]. Although the PMOS (NMOS) transistor in [17] is turned off with the positive voltage  $V_{SG}$  ( $V_{GS}$ ) =  $V_{DD}$, the leakage power in [17] is more than three orders of magnitude higher than that of the proposed design scheme. When the operating frequency goes from 10MHz down to 100kHz, node leakage lets the boosted potential decay, which degrades the leakage performance. The potential of the boost node even returns to  $V_{DD}$  or 0 at 100kHz. Hence, the leakage power becomes very close to that of the design in [16].










Fig. 3-9. Leakage power as a function of frequency from 10 MHz to 100 kHz in corners.

#### 3.3.3. Delay Time Analysis

Delay time is another important feature of bootstrapped circuits. Although the driving transistors operate in the triode region under a sub-threshold supply, the other devices remain in the sub-threshold region. The total delay time is thus the sum of the propagation delays of the INV and the driver, which is denoted as

$$t_{P,BI} = t_{P,INV} + t_{P,Driver} .$$
(3-12)

Where  $t_{P,BI}$ ,  $t_{P,INV}$ , and  $t_{P,Driver}$  are the delays of the bootstrapped inverter, the INV, and the driver, respectively.

Assume that the boost efficiency is the same for all bootstrapped drivers. Delay time of the INV becomes a dominant factor. The sub-threshold logic delay is derived in [9] as

$$t_{p} = \frac{k_{f} \cdot C_{L} \cdot V_{DD}}{\mu C_{dep} \frac{W}{L} V_{T}^{2} \exp(\frac{V_{DD} - V_{th}}{n V_{T}})}.$$
(3-13)

where  $k_f$  is a fitting parameter. The circuit delay time is, however, also related to RC loading effects. The ALBI has the shortest delay time among the bootstrapped circuits since the INV is loaded only by the gate capacitance of  $M_{N2}$  and  $M_{P2}$.

Fig. 3-10 summarizes the comparison of the delay time (from H to L) and the power consumption as a function of  $C_L$  at 10 MHz with a supply of 200 mV. The proposed design has the lowest power consumption and the shortest delay time.



Fig. 3-10. Delay time and power consumption versus capacitive loads at 10 MHz.
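To give a feel for the exponential supply dependence in Eq.(3-13), the sketch below evaluates the expression directly. It assumes that  $V_T$  denotes the thermal voltage  $kT/q$; the slope factor, the  $\mu C_{dep}$  product, the W/L ratio, the fitting parameter, and the function names are placeholders chosen for illustration, so only ratios between results are meaningful.

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant over electron charge, in V/K


def thermal_voltage(temp_c):
    """Thermal voltage kT/q in volts at a temperature given in Celsius."""
    return K_OVER_Q * (temp_c + 273.15)


def subthreshold_delay(vdd, vth, c_load, k_f, mu_cdep, w_over_l, n, temp_c=25.0):
    """Sub-threshold gate delay following the form of Eq.(3-13).

    k_f, mu_cdep, w_over_l and n are placeholder device parameters.
    """
    v_t = thermal_voltage(temp_c)
    i_on = mu_cdep * w_over_l * v_t ** 2 * math.exp((vdd - vth) / (n * v_t))
    return k_f * c_load * vdd / i_on


if __name__ == "__main__":
    base = subthreshold_delay(0.20, 0.24, 200e-15, 1.0, 1e-4, 5.0, 1.5)
    for vdd in (0.10, 0.15, 0.20):
        t_p = subthreshold_delay(vdd, 0.24, 200e-15, 1.0, 1e-4, 5.0, 1.5)
        print(f"VDD = {vdd:.2f} V: delay is {t_p / base:.1f}x the 0.20 V delay")
```

Under these assumptions every 50mV reduction in supply lengthens the delay by a multiple, which is why boosting the gate drive of the critical devices pays off so strongly at these supply levels.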

The return of the boost-node potential to  $V_{DD}$  or 0 indeed degrades the leakage performance at low frequencies or in the fast process/temperature corners. On the other hand, the other boost node can easily pre-charge to  $V_{DD}$  or 0. As shown in Fig. 3-11, whether in the nominal TT corner at 25°C, the SS corner at -40°C, or the FF corner at 125°C, the delay times of all designs are almost the same at frequencies from 1 MHz to 100 kHz.



Fig. 3-11. Delay time as a function of frequency in corners.

#### 3.3.4. Delay Time Analysis of Process Variation

Sub-threshold operation limits the yield because of its serious process variations. Although the boosted control signal pushes the driver transistors into the triode region, the remaining devices still suffer from the same serious variation. With fewer devices in the sub-threshold region, the proposed design is less affected by process variation.

The delay time variability analysis is based on Monte Carlo simulations. Device mismatch, threshold voltage  $V_{th}$  and process corner variation are assumed to follow Gaussian distributions. To cover the most critical process and temperature corners, the Monte Carlo simulations are run under  $3\sigma$  process variation at 25°C, 125°C and -40°C, as shown in Fig. 3-12. The supply voltage is 200mV and the clock rate is 1MHz. The number of samples for each temperature corner is 1500, for a total of 4500 samples. For the worst case at -40°C, a conventional inverter has an average delay of 15.1ns with a standard deviation of 26.4ns. The proposed design reduces not only the average delay, to 6.9ns, but also the standard deviation, to 6.3ns, which is much better than [16] and [17]. The ALBI thus has higher immunity to process and temperature variation.



Fig. 3-12. Monte Carlo simulation results under a power supply of 200 mV.
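The large spread of sub-threshold delay can be illustrated with a toy Monte Carlo experiment that is much simpler than the circuit-level simulations reported above. The sketch assumes that only  $V_{th}$  varies, that it is Gaussian, and that the delay scales as  $\exp(\Delta V_{th} / nV_T)$  in line with Eq.(3-13); the slope factor, the sigma values, and the function names are hypothetical.

```python
import math
import random
import statistics

K_OVER_Q = 8.617333262e-5  # Boltzmann constant over electron charge, in V/K


def relative_delays(sigma_vth, n, temp_c, count, seed=0):
    """Delays normalized to the nominal device for Gaussian V_th shifts."""
    rng = random.Random(seed)                 # seeded for repeatability
    v_t = K_OVER_Q * (temp_c + 273.15)        # thermal voltage in volts
    return [math.exp(rng.gauss(0.0, sigma_vth) / (n * v_t)) for _ in range(count)]


def coefficient_of_variation(samples):
    """Standard deviation divided by mean."""
    return statistics.pstdev(samples) / statistics.mean(samples)


if __name__ == "__main__":
    for sigma_mv in (10, 20, 30):
        delays = relative_delays(sigma_mv * 1e-3, 1.5, 25.0, 1500)
        cv = coefficient_of_variation(delays)
        print(f"sigma(Vth) = {sigma_mv} mV: std/mean of delay = {cv:.2f}")
```

Because the delay is an exponential function of a Gaussian variable, its distribution is skewed and its standard deviation can approach or exceed its mean, which is qualitatively the behavior reported for the conventional inverter.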

#### **3.4. Implementation and Experimental Results**

#### 3.4.1. Implementation of the Bootstrap Capacitor

The value of the boost capacitor can be chosen to adjust the boosting efficiency. A large boost capacitor achieves high boosting efficiency. In addition, a larger boost capacitor stores more charge, which holds the node voltage against leakage even at low speed. However, area cost and power consumption are the design trade-offs. In our test chip, a 30fF boost capacitor is used to ensure that the boosting efficiency is at least 80% without occupying too much area.

MOSFET, MOM, and MIM capacitors are the three types of capacitors available in CMOS technology. Among them, the MOSFET capacitor has the highest capacitance density. However, it also has several drawbacks. First, when the MOSFET capacitor operates in the sub-threshold region, its capacitance changes abruptly with the control voltage, as shown in Fig. 3-13. Second, the leakage current of the nano-scale device becomes more serious. Third, the MOSFET capacitor has a large parasitic capacitance from the  $V_{ctrl}$  nodes to the bulk compared with the other capacitors, and a large parasitic capacitance lowers the boosting efficiency according to Eq.(3-7). The MIM capacitor occupies the largest area: a 30fF MIM capacitor occupies 5.1µm × 8.5µm. Besides, the MIM capacitor needs an extra mask, which means extra cost. As a result, we use a MOM capacitor, which requires no extra mask, as the boost capacitor. A 30fF MOM capacitor occupies 3.7µm × 8.6µm and has a 1fF parasitic capacitance load at both nodes.



Fig. 3-13. MOSFET capacitor changes due to the control voltage.

#### 3.4.2. Chip Implementation and Measurement

A test chip of bootstrapped CMOS inverters is implemented in 90nm 1P9M SPRVT process to demonstrate the effectiveness of the proposed design scheme. The test circuits include the reported bootstrapped circuits of [16], [17], and the proposed design. The circuits also contain test keys to verify the interconnection model. Each bootstrapped circuit is implemented as a 10-stage cascade driver chain. In each stage, two 30fF MOM capacitors serve as bootstrap capacitors and a 200fF MOM capacitor as  $C_L$ . Level shifters are used to boost the 200mV internal signal to 500mV chip I/O signal for the measurement. The total area is 958µm×776µm, and the core area is 566µm×102µm. Fig. 3-14 shows the die photograph. The layout area of the proposed bootstrapped inverter cell is 25.8µm×4.1µm.



Fig. 3-14. Die photograph and cell layout.



Fig. 3-15 Experimental environment.

Fig. 3-15 shows a photograph of our experimental environment. Fig. 3-16 shows the measured waveform. The cumulative clock peak-to-peak and RMS jitters are 3.6ns and 504ps, respectively. The measured average total power is $1.01\mu$W. Using the leakage-power estimate of Eq. (3-10) with periods of 100ns and 105ns, the derived leakage power is 107nW. TABLE 3-2 lists the summary of the chip. Since the threshold voltages $V_{thn}$ and $|V_{thp}|$ are 240mV and 180mV, respectively, we target 10MHz operation at 0.2V. TABLE 3-3 compares the measured results with other works at 0.2V. For a ten-stage driver chain operating at 10MHz, the ALBI has a delay time of 30.1$\mu$s, an energy efficiency of 0.1 pJ/cycle, and a leakage power of 107nW, which are the best among the compared designs [16], [17].



Fig. 3-16. Measured waveform at 0.2V core  $V_{DD}$  (0.5V I/O  $V_{DD}$ ).

| Item                          | Sub-item                             | Specification                 |
|-------------------------------|--------------------------------------|-------------------------------|
| Process                       |                                      | 90nm SPRVT Low-K CMOS Process |
| Supply Voltage                | Bootstrapped Circuits                | 0.2V                          |
|                               | Level Shift Buffer                   | 0.2V, 0.5V                    |
|                               | Digital Circuits                     | 0.5V                          |
| Power Dissipation (10 stages) | Leakage Power, Post-sim (FF Corner)  | 133nW                         |
|                               | Leakage Power, Measured              | 107nW                         |
|                               | Total Power, Post-sim (FF Corner)    | 1.13µW                        |
|                               | Total Power, Measured                | 1.01µW                        |
| Layout Area                   | Interconnect Test Circuits           | 575µm×307µm                   |
|                               | Bootstrapped Circuits                | 566µm×102µm                   |
|                               | Whole Chip                           | 958µm×776µm                   |

**TABLE 3-2 Chip Summary** 

|                       | JSSC1997<br>[16] | T.VLSI2008<br>[17] | Proposed |
|-----------------------|------------------|--------------------|----------|
| Supply voltage (V)    | 0.2              | 0.2                | 0.2      |
| Max frequency (MHz)   | 4                | 5                  | 10       |
| Delay time (us)       | 47.3             | 48.2               | 30.1     |
| Total Power (uW)      | 0.74             | 1.71               | 1.01     |
| Leakage Power (nW)    | 276              | 833                | 107      |
| Energy per cycle (pJ) | 0.19             | 0.34               | 0.10     |

**TABLE 3-3 Comparisons** 
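The energy-per-cycle row of TABLE 3-3 can be cross-checked from the other rows. The sketch below assumes that the tabulated total power was measured at the tabulated maximum frequency, so that energy per cycle is simply power divided by frequency; the dictionary layout and function name are illustrative.

```python
# Total power and maximum frequency as listed in TABLE 3-3 (0.2 V supply).
designs = {
    "[16]": {"power_uW": 0.74, "f_max_MHz": 4.0},
    "[17]": {"power_uW": 1.71, "f_max_MHz": 5.0},
    "Proposed": {"power_uW": 1.01, "f_max_MHz": 10.0},
}


def energy_per_cycle_pJ(power_uW, freq_MHz):
    """uW divided by MHz gives pJ per cycle."""
    return power_uW / freq_MHz


energy = {
    name: energy_per_cycle_pJ(d["power_uW"], d["f_max_MHz"])
    for name, d in designs.items()
}

for name, value in energy.items():
    print(f"{name:>8}: {value:.3f} pJ/cycle")
```

The results agree with the tabulated 0.19, 0.34 and 0.10 pJ/cycle after rounding to two decimals.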

# 3.5. Summary

This chapter describes an ALBI applied to a sub-threshold supply clock network. Based on 4500 Monte Carlo samples, the average delay time of the proposed design with a 200fF $C_L$ is 6.9ns with a standard deviation of 6.3ns; the standard deviation is 76% lower than that of the conventional inverter. Measured results verify that the test chip can achieve a clock rate of 10MHz at 200mV $V_{DD}$. Owing to the negative $V_{GS}$ suppression, the measured leakage power is more than 50% lower than that of the previously reported bootstrapped drivers. The power consumption is 1.01µW, the leakage power is 107nW, and the energy efficiency is 0.1pJ/cycle.

# Chapter 4

# **Near-threshold On-chip Bus**

In data communication, *inter-symbol interference* (ISI) critically limits the data rate. In this chapter, an on-chip bus design with an ISI-suppressed bootstrapped near-threshold repeater is proposed. Operating at a near-threshold supply voltage is the most effective means of power reduction. To overcome the resulting poor driving capability, the bootstrap technique is used. In addition, a pre-charge enhancement scheme and a leakage current reduction scheme are adopted. Together they achieve a beneficial speed-energy trade-off. Furthermore, the proposed repeater suppresses ISI noise in data link applications.

# 4.1. Proposed On-chip Bus Architecture





Fig. 4-1 shows the proposed 4-bit on-chip bus for data communication under the near-threshold power supply. A bus is divided into several segments, each of which is driven by a bootstrapped repeater. Ground shielding is used to eliminate the effective-loading uncertainty and decouple the noise from adjacent channels. The staggered repeaters on adjacent channels are misaligned to reduce the coupling noise and *simultaneous switching noise* (SSN).

#### **4.2. ISI-suppressed Bootstrapped Driver**

An *ISI-suppressed bootstrapped driver* (ISBD) as a repeater is composed of an inverter as the driver and a bootstrap control circuit. The bootstrap control circuit has many important features. First, a pre-charge enhancement scheme improves the pre-charge capability to achieve high-speed operation. Second, a leakage current elimination technique suppresses the ISI noise. Third, the bootstrap control circuit produces a boosted output swing from  $-V_{DD}$  to  $2V_{DD}$  to increase the driving current ( $2V_{DD}$ ) and turn off the transistor aggressively ( $-V_{DD}$ ). As a result, the  $I_{on}/I_{off}$  ratio is improved substantially.

Fig. 4-2 depicts the proposed ISBD.  $C_{BP}$  and  $C_{BN}$  are the bootstrap capacitors;  $M_{P1}$  and  $M_{N1}$  are the precharge transistors for  $C_{BP}$  and  $C_{BN}$ ;  $INV_P$  and  $INV_N$  are the pre-drivers to boost  $C_{BP}$  and  $C_{BN}$ ; and  $M_{PD}$  and  $M_{ND}$  are the output drivers.  $N_{BT}$  is boosted to  $2V_{DD}$  and  $-V_{DD}$  to enhance the driving capability of  $M_{PD}$  and  $M_{ND}$ .  $N_{BT}$  is also fed back to control  $M_{P1}$  and  $M_{N1}$  to enhance the precharge capability and eliminate the reverse leakage current simultaneously.



Fig. 4-2. Circuit of proposed bootstrapped repeater.

Figures 4-3 and 4-4 show the transient waveforms with input switching from H to L and from L to H. Assume that the bootstrap capacitors  $C_{BP}$  and  $C_{BN}$  had stored a voltage potential of  $V_{DD}$  before  $V_{in}$  has a transition from H to L; node  $N_{BP}$  has an initial voltage of  $V_{DD}$ , and node  $N_{BT}$ has an initial voltage of  $-V_{DD}$ , ideally. After  $V_{in}$  transits from H to L,  $N_{OP}$  transits from L to H and  $N_{BP}$  is boosted to  $2V_{DD}$ . At the same time,  $M_{P2}$  is turned on and  $M_{N2}$  is turned off.  $2V_{DD}$  at  $N_{BP}$  starts to charge  $N_{BT}$  through  $M_{P2}$  and pushes  $N_{BT}$  to  $2V_{DD}$ . After  $N_{BT}$  is charged above threshold voltage  $V_{th}$ ,  $M_{N1}$  is turned on to precharge  $N_{BN}$  to GND. Now,  $C_{BN}$  has a potential of  $-V_{DD}$ .



Fig. 4-3. Proposed bootstrapped repeater operation (input H-to-L).



Fig. 4-4. Proposed bootstrapped repeater operation (input L-to-H).

As  $V_{in}$  transits from L to H, a similar mechanism pushes N<sub>BT</sub> to  $-V_{DD}$ . Figure 4-5 shows the simulated transient waveforms with a 1mm wire load and a  $V_{DD}$  of 0.2V. Here, N<sub>BT</sub> swings from 384mV to -186mV instead of the ideal 400mV to -200mV owing to the charge sharing effect.
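The simulated swing can be converted into boosting efficiencies. The sketch below uses the convention adopted in Section 4.3.4, where the boosted voltage is written as $2\beta_P V_{DD}$ or $-\beta_N V_{DD}$; the variable names are illustrative.

```python
V_DD = 0.200        # supply voltage in volts
V_BT_HIGH = 0.384   # simulated upper level of N_BT in volts
V_BT_LOW = -0.186   # simulated lower level of N_BT in volts

# V_BT = 2 * beta_P * V_DD on the high side, V_BT = -beta_N * V_DD on the low side.
beta_p = V_BT_HIGH / (2.0 * V_DD)
beta_n = -V_BT_LOW / V_DD

print(f"beta_P = {beta_p:.2f}")
print(f"beta_N = {beta_n:.2f}")
```

With these values the charge-sharing loss is about 4% on the high side and 7% on the low side.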

Like all bootstrap circuits, the ISBD has start-up and stand-by problems. Before start-up, one of the bootstrap capacitors does not have charge stored. Similarly, during a long stand-by period, one of the bootstrap capacitors becomes depleted of charge by sub-threshold leakage. A transition of the data input is required to recharge the depleted bootstrap capacitor. The normal bootstrap function can then be regained at the next transition.

A CMOS transistor has parasitic diodes between its source/drain and the body. Although the body and the source can be shorted in a PMOS device in an N-well bulk-CMOS process, the parasitic diodes are retained for $M_{N2}$, as shown in Fig. 4-6. When a negative voltage ($-V_{DD}$) is generated at $N_{BN}$, the parasitic diode might be turned on if $V_{DD}$ exceeds 0.7V. Therefore, the proposed design is intended for near-threshold applications.



Fig. 4-6. Cross-section of proposed circuit.

# 4.3. Detailed Evaluation and Comparisons

The previous section briefly introduced the architecture of the on-chip bus and the basic operation of the ISBD. This section will discuss them in greater detail with reference to boosting efficiency, leakage power, ISI suppression, energy efficiency and Monte Carlo analysis.

#### 4.3.1. Boosting Efficiency

We have mentioned the boosting efficiency due to charge sharing in Chapter 3. In fact, the boosting efficiency factor is a time-variant function, according to the accumulation of leakage charge. When $V_{BTP}$ is boosted above $V_{DD}$, the leakage currents $I_{LMP1}$ and $I_{LMN2}$ discharge $C_{BT}$ through $M_{P1}$ and $M_{N2}$, respectively, as shown in Fig. 4-7. The time-variant boosting efficiency causes an ISI problem, which will be discussed in a later section.



Fig. 4-7. Equivalent circuit for evaluating boosting efficiency.

#### 4.3.2. Leakage Current Reduction

We have introduced the leakage current reduction of the ALBI in Chapter 3. Making $V_{GS}$ negative is an effective means of reducing $I_{off}$ and improving the $I_{on}/I_{off}$ ratio, consistent with Eq. (2-2). For example, Fig. 2-3 plots the $I_D$ of an NMOS with its drain voltage fixed at $V_{DD}$ as $V_{GS}$ is swept from -0.45V to 0.65V. $I_D$ varies exponentially with the gate voltage $V_G$ in the near-threshold region. Since HSPICE uses the BSIM4 model (level = 54), the simulated drain current is a good approximation of nano-scale effects such as DIBL and GIDL. Typically, the leakage current of the NMOS is 0.4nA at $V_{GS} = 0V$ and falls to 30pA at $V_{GS} = -0.22V$. However, the GIDL current induced by the high electric field between gate and drain becomes the major component of the leakage current when the gate voltage is lowered to -0.45V. With $I_{off}$ for a single transistor analyzed, $P_{Leakage}$ for a complete circuit is determined as follows.

#### 4.3.3. Leakage Power Analysis

Similar to the section in chapter 3, we have two identical signals with different periods  $T_1$  and  $T_2$ . Leakage power  $P_{Leakage}$  is then obtained by Eq.(3-11).

$$P_{Leakage} = \frac{P_{T_1} \cdot T_1 - P_{T_2} \cdot T_2}{(T_1 - T_2)} .$$
(3-11)
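The two-period extraction of Eq. (3-11) can be sketched as follows. The check assumes a simple power model in which the average total power is a fixed switching energy per cycle divided by the period, plus a constant leakage power; under that model the switching term cancels in the numerator. The switching energy `E_SW` is a hypothetical value chosen for the example, and the 100ns and 105ns periods mirror those used in the Chapter 3 measurement.

```python
def leakage_power(p_t1, t1, p_t2, t2):
    """Eq. (3-11): leakage power from total power measured at two periods."""
    if t1 == t2:
        raise ValueError("the two periods must differ")
    return (p_t1 * t1 - p_t2 * t2) / (t1 - t2)


# Synthetic check under the assumed model P_T(T) = E_SW / T + P_LEAK.
E_SW = 90e-15      # hypothetical switching energy per cycle, in joules
P_LEAK = 107e-9    # leakage power to recover, in watts


def total_power(period_s):
    return E_SW / period_s + P_LEAK


t1, t2 = 105e-9, 100e-9
estimate = leakage_power(total_power(t1), t1, total_power(t2), t2)
print(f"recovered leakage power: {estimate * 1e9:.1f} nW")
```

Because the two periods are close, the subtraction in the numerator is small compared to either term, so in a real measurement the accuracy of the power readings matters.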

Compared with the ALBI, the ISBD eliminates the reverse current to maintain the boosted voltage. As a result, the leakage power reduction of the ISBD remains effective even at very low operating frequencies.

To demonstrate the reduction of leakage current, the proposed design is compared with the conventional inverter and two reported works [16-17]. They are all designed to drive a 200fF C<sub>L</sub>. A 55nm SPRVT process is used. For all bootstrap drivers, $C_B = 50$fF and the widths of M<sub>PD</sub> and M<sub>ND</sub> are 288nm and 108nm, respectively, for a fair comparison. The conventional inverter is sized 50 times larger than the bootstrapped driver to obtain output t<sub>rise</sub> and t<sub>fall</sub> similar to those of the bootstrapped one at $V_{DD} = 0.2$V. Additionally, the case with m=150 is included to provide an iso-area comparison.

Figure 4-8 plots the total power as a function of the supply voltage for the five designs. As mentioned, the switching power and leakage power constitute almost all of the total power consumption. Figure 4-9 plots the leakage power as a function of the supply voltage. The operating frequencies are 0.5MHz, 3MHz, 10MHz, 25MHz and 66MHz for supply voltages from 0.1V to 0.3V, respectively. Owing to the negative $V_{GS}$, the leakage power of the proposed bootstrapped repeater is one order of magnitude less than those of the other designs. Figure 4-10 shows the $P_{Leakage}/P_T$ ratio as a function of the supply voltage. The proposed design has the lowest total power and a $P_{Leakage}/P_T$ ratio of 1.5% even at $V_{DD} = 0.1$V. This ratio is roughly one order of magnitude lower than those of the others.



Fig. 4-8. Comparisons of total power at different  $V_{DD}$ .



Fig. 4-9. Comparisons of leakage power at different  $V_{DD}$ .



Fig. 4-10. Comparisons of  $P_{Leakage}/P_T$  ratio at different  $V_{DD}$ .

Figure 4-11 shows the total power as a function of the activity factor. When the activity factor is small, the non-transient time is long, so the leakage power takes a larger portion of the total power. Figure 4-12 shows the $P_{Leakage}/P_T$ ratio as a function of the activity factor. The proposed design has a $P_{Leakage}/P_T$ ratio of 1% at an activity factor of 0.02, which is much smaller than those of all other designs.



Fig. 4-11. Comparisons of total power being swept by activity factors.



Fig. 4-12. Comparisons of  $P_{Leakage}/P_T$  ratio being swept by activity factors.

Figure 4-13 shows the total power as a function of the input clock rate at 0.2V. With the leakage reduction technique, the switching power of the proposed design is almost the same as the total power. Figure 4-14 shows the  $P_{Leakage}/P_T$  ratio as a function of the input clock rate. At 33kHz, the  $P_{Leakage}/P_T$  ratio of the proposed design is 25%, while other designs are more than 60%.



Fig. 4-13. Comparisons of total power at different clock rates.



Fig. 4-14. Comparisons of  $P_{Leakage}/P_T$  ratio at different clock rates.

#### 4.3.4. ISI Suppression

In data communication, ISI critically limits the data rate. The boosting efficiency of a bootstrapped inverter is closely related to the ISI, as follows. The driving capability of the output driver is controlled by the voltage $V_{BT}$ at $N_{BT}$, which is either $2\beta_P V_{DD}$ or $-\beta_N V_{DD}$. In the design herein, the fed-back $V_{BT} = 2V_{DD}$ ($V_{BT} = -V_{DD}$) eliminates the reverse current through $M_{P1}$ ($M_{N1}$) when $N_{BP}$ ($N_{BN}$) is boosted. Figure 4-15 shows a data string with *a* consecutive 0s followed by *b* consecutive 1s. According to the circuit model in Fig. 4-7, the bootstrapped voltage can be derived as

$$\begin{aligned} V_{BT}(a+b) &\approx \frac{2}{C_{BP} + C_{PT}} \cdot Q(a+b) \\ &= \frac{2}{C_{BP} + C_{PT}} \cdot \left(Q(0) - \int_{0}^{aT} (I_{LMP1} + I_{LMN2})\, dt + \int_{aT}^{bT} I_{DMP1}\, dt\right). \end{aligned} \tag{4-1}$$

Here, T is the period, Q(0) is the initial charge in C<sub>BP</sub>, and $I_{DMP1}$ is the pre-charge current of M<sub>P1</sub>. As a result, $\beta_P$ depends on the input data. To minimize the variation of $\beta_P$, according to (4-1), the leakage currents $I_{LMP1}$ and $I_{LMN2}$ must be minimized. The proposed design employs a special mechanism to suppress the sub-threshold leakage currents $I_{LMP1}$ and $I_{LMN2}$, as stated earlier; in addition, the pre-charge current $I_{DMP1}$ is enhanced by the boosted signal. Therefore, the proposed design has better immunity to ISI. Fig. 4-16 shows the boosted and the output waveforms for data with 4, 16 and 64 consecutive 0s followed by only one "1". The ISI is suppressed successfully in all cases.



Fig. 4-15. Timing diagram for various numbers of consecutive 1s and 0s.



Fig. 4-16. Waveforms at nodes for various numbers of consecutive 0s.

Fig. 4-17(a) compares the proposed design with reported repeaters in the clock link. The total length of the interconnect is fixed at 10-mm with minimum wire spacing for coplanar ground shielding. The 10-mm interconnect is segmented for various interconnect lengths along the X axis. The drivers are designed to yield  $t_{rise}$  and  $t_{fall}$  equal to 7.5% of a clock period. Fig. 4-17(b) compares the data links of the designs and demonstrates data rate as a function of segment length. The parameters  $t_{rise}$  and  $t_{fall}$  are designed to be 15% of a UI in data links. Notably, only one transition occurs per clock period in data links while two occur in clock links. The jitter tolerance is defined as 0.3 UI peak-peak jitter of the output signal. Both Fig. 4-17(a) and Fig. 4-17(b) indicate that our design can simultaneously achieve the highest data rate and energy efficiency.







Fig. 4-17. Comparison of (a) clock links, (b) data links as function of segment length.

#### 4.3.5. Energy Efficiency

The proposed design has a significant speed improvement and high energy efficiency. Bootstrap techniques improve the driving capability exponentially by boosting the gate voltage of the driver. However, the bootstrap circuit consumes extra power. The average power of the bootstrap circuit can be represented as

$$P_{T,BT} = P_{SW,BT} + P_{SC,BT} + P_{Leak,BT}.$$
(4-2)

Where  $P_{T,BT}$ ,  $P_{SW,BT}$ ,  $P_{SC,BT}$ , and  $P_{Leak,BT}$  are the average, switching, short-circuit and leakage power of the bootstrap circuit, respectively. For the proposed bootstrapped circuit in Fig. 4-2, the switching power is

$$P_{SW,BT} \approx \alpha f (2C_{INV} + 9\beta C_{PT}) V_{DD}^2.$$
(4-3)

Where  $C_{INV}$  is the total input and output capacitance of INV<sub>P</sub> and INV<sub>N</sub>;  $\beta$  is the boosting efficiency. Assume that  $\beta_P = \beta_N = 0.9$ , and  $C_{INV} \approx C_{PT}$ , (4-3) can be rewritten as

$$P_{SW,BT} \approx 10.1 \cdot \alpha f C_{PT} V_{DD}^2 \,. \tag{4-4}$$
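For reference, the coefficient in (4-4) follows from (4-3) by direct substitution of the stated assumptions $\beta = 0.9$ and $C_{INV} \approx C_{PT}$:

$$2C_{INV} + 9\beta C_{PT} \approx 2C_{PT} + 9(0.9)\,C_{PT} = (2 + 8.1)\,C_{PT} = 10.1\,C_{PT}.$$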

When a CMOS driver is applied to drive heavy capacitive loads, the energy contributions of the short-circuit current can be ignored [40]. Combined with the switching power for the wire, the total energy consumption is

$$E_T \approx \frac{\alpha}{2} \left( 10.1 \cdot C_{PT} V_{DD}^2 + C_{wire} V_{DD}^2 \right) + P_{Leak,BT} \cdot T.$$
(4-5)

Here, $P_{Leak,BT}$ is the leakage power of the bootstrap circuit. The leakage energy of the driver can be ignored, as shown in Fig. 4-9. Figure 4-18 compares the proposed bootstrapped repeater and the conventional one driving a 0.5 pF capacitive load while $V_{DD}$ is swept from 0.1 V to 0.3 V. The bootstrapped repeater and the conventional one use the same output driver, and both circuits operate at their highest speed. The data rate of the proposed bootstrapped repeater is 7–13 times higher than that of the conventional one. When the two circuits are operated at 0.1–0.2 V, the energy of the proposed design is even lower than that of the conventional one, because the proposed design reduces the leakage power effectively.



Fig. 4-18. Comparison of driving capability and energy
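To make the relative size of the terms in Eq. (4-5) concrete, the sketch below evaluates it for one set of inputs. All numeric inputs except the 0.5 pF load are placeholders chosen for illustration and are not thesis measurements: `C_PT`, the activity factor, the leakage power and the bit period are hypothetical.

```python
def total_energy_J(alpha, c_pt_F, c_wire_F, v_dd, p_leak_W, period_s):
    """Eq. (4-5): switching energy of bootstrap circuit and wire plus leakage energy."""
    switching = 0.5 * alpha * (10.1 * c_pt_F + c_wire_F) * v_dd ** 2
    leakage = p_leak_W * period_s
    return switching + leakage


ALPHA = 0.5         # hypothetical activity factor
C_PT = 2e-15        # hypothetical parasitic capacitance at the boosted node, farads
C_WIRE = 0.5e-12    # 0.5 pF capacitive load, as in the comparison above
V_DD = 0.2          # supply voltage, volts
P_LEAK = 13e-12     # hypothetical leakage power of the bootstrap circuit, watts
PERIOD = 25e-9      # hypothetical period, seconds

e_total = total_energy_J(ALPHA, C_PT, C_WIRE, V_DD, P_LEAK, PERIOD)
e_leak = P_LEAK * PERIOD

print(f"total energy:  {e_total * 1e15:.3f} fJ")
print(f"leakage share: {e_leak / e_total:.2e}")
```

With these placeholder inputs the wire capacitance dominates the switching term and the leakage term is negligible; at much longer periods the leakage term grows linearly with the period.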

#### 4.3.6. Monte Carlo Simulations

Since sub-threshold circuits suffer from severe process variation, Monte Carlo simulations are used to investigate its effects. Four types of repeaters are discussed. A 10-mm interconnect is divided into 10 segments. Device mismatch, threshold voltage $V_{th}$ and process corner variation are assumed to follow Gaussian distributions.

The analysis is set up to find the distribution of the maximum clock rate and the variability ratio. The maximum clock rate is the highest operating speed of each Monte Carlo sample, and the variability ratio is defined as $f_{max}/f_{min}$. Under $3\sigma$ variation, we simulated the designs at 20 different clock rates spaced by powers of two. The number of samples at each clock rate is 1000. The PDFs of the maximum clock rate are shown in Fig. 4-19, in which the X axis is normalized to 10MHz and scaled in powers of two. Fig. 4-19 also shows the mean $\mu$, standard deviation $\sigma$, minimum clock rate $f_{min}$, and maximum clock rate $f_{max}$. Our design has the smallest $f_{max}/f_{min}$ ratio, 11.3, as compared to 16.9, 16.0 and 34.0 for the inverter, [16] and [17], respectively.
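The variability metrics described above can be sketched in a few lines. The sample data here are synthetic and hypothetical: maximum clock rates are drawn from a log-normal distribution centred on 10MHz purely to exercise the functions, and do not reproduce the thesis simulations.

```python
import math
import random
import statistics


def variability_ratio(max_clock_rates_hz):
    """f_max / f_min over all Monte Carlo samples."""
    return max(max_clock_rates_hz) / min(max_clock_rates_hz)


def log2_stats(max_clock_rates_hz, reference_hz=10e6):
    """Mean and standard deviation on an axis normalized to the reference
    frequency and scaled in powers of two."""
    logs = [math.log2(rate / reference_hz) for rate in max_clock_rates_hz]
    return statistics.mean(logs), statistics.pstdev(logs)


random.seed(0)
# Hypothetical samples: 1000 maximum clock rates, log-normal around 10 MHz.
samples = [10e6 * 2 ** random.gauss(0.0, 0.5) for _ in range(1000)]

ratio = variability_ratio(samples)
mu, sigma = log2_stats(samples)
print(f"f_max/f_min = {ratio:.1f}, mu = {mu:.3f}, sigma = {sigma:.3f} (log2 units)")
```

Because the ratio depends only on the two extreme samples, it is more sensitive to the number of samples than the mean and standard deviation are.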

Fig. 4-20 shows the Monte Carlo simulation of the leakage power at 1 MHz under a $V_{DD}$ of 0.2 V. Our design has an average of 13.0 pW and a standard deviation of 7.3 pW, which are two to three orders of magnitude better than the rest. Fig. 4-21 shows the $P_{Leakage}/P_T$ ratio at 0.2 V. The average of 0.16% is far better than the others, and the $\sigma$ of 0.09% indicates a more concentrated distribution as well.



Fig. 4-20. Monte Carlo simulation results of leakage power.



Fig. 4-21. Monte Carlo simulation results of  $P_{Leakage}/P_T$  ratio

# 4.4. Experimental Setup and Measurement

#### 4.4.1. Chip implementation

A test chip was designed and fabricated in a 55nm 1P10M SPRVT process. The test chip includes two on-chip buses: one using the proposed bootstrapped repeater and one using the conventional repeater. Fig. 4-22 shows the block diagram of both on-chip buses. Four-bit pseudo-random bit sequences (PRBS) are generated and passed through an H-to-L level shifter to adjust the voltage swing to 0.1–0.3 V. An extra input I/P enables the equipment to provide a tunable clock signal or random data. Each on-chip bus has four channels. Each channel is 10-mm long and is divided into 10 segments, with a wire spacing of 90 nm for ground shielding in Metal5. In each bootstrapped repeater, two 50 fF MOM capacitors serve as the bootstrap capacitors. Level shifters are used for the I/O. The total area is $821\mu$m×$820\mu$m and the core area is $637\mu$m×$206\mu$m. Fig. 4-23 shows a photograph of the die. The layout area of the proposed bootstrapped repeater is $16.7\mu$m×$11.8\mu$m. The measurement setup is shown in Fig. 4-24.



Fig. 4-22. Block diagram of test circuits.



Fig. 4-24. Measurement setup.

### 4.4.2. Measured Waveforms

The measured results are presented in this section. To operate and measure signals with a 0.1–0.4V voltage swing, H-to-L and L-to-H level shifters are included in the test chip. A calibration mode can be selected to measure the H-to-L and L-to-H level shifters without the 10mm wire. Figures 4-25(a)-(d) show the measured waveforms of the H-to-L and L-to-H level shifters. Figures 4-25(b) and 4-25(d) show that a $V_{IOH}$ of 1.25V yields better eye diagrams. Under core supply voltages of 0.11V, 0.2V, 0.3V and 0.4V, Figures 4-26(a)-(d) show the measured clock waveforms, Figures 4-27(a)-(d) show the data eye diagrams, and Figures 4-28(a)-(d) show the transient waveforms. TABLE 4-1 presents the timing performance. The random data are a $2^{10}-1$ bit PRBS, and the level shifters contribute an RMS jitter of 174ps and a peak-to-peak jitter of 982ps.



Fig. 4-25. Measured eye diagrams of the H-to-L and L-to-H level shifters.

- (a) Eye diagrams of $V_{IOL} = 0.3$ V and $V_{IOH} = 0.8$ V.
- (b) Eye diagrams of $V_{IOL} = 0.3$ V and $V_{IOH} = 1.25$ V.
- (c) Eye diagrams of $V_{IOL} = 0.4$ V and $V_{IOH} = 0.8$ V (data rate = 100Mbps).
- (d) Eye diagrams of $V_{IOL} = 0.4$ V and $V_{IOH} = 1.25$ V (data rate = 100Mbps).






Fig. 4-26 Measured clock waveforms with core  $V_{DD}$  = (a)0.11V, (b)0.2V, (c)0.3V and (d)0.4V (0.11–1.25V I/O  $V_{DD}$ ).






Fig. 4-27. Measured eye-diagrams with PRBS using core  $V_{DD}$  = (a)0.11V, (b)0.2V, (c)0.3V and (d)0.4V (0.11–1.25V I/O  $V_{DD}$ ).





Fig. 4-28. Measured transient waveforms with core  $V_{DD} = (a)0.11V$ , (b)0.2V, (c)0.3V and (d)0.4V (0.11–1.25V I/O  $V_{DD}$ ).

| Supply voltage     | 0.1V    | 0.11V    | 0.2V    | 0.3V    |
|--------------------|---------|----------|---------|---------|
| Clock rate         | 0.6MHz  | 1MHz     | 22.5MHz | 100MHz  |
| Clock jitter (RMS) | 22.4ns  | 12.0ns   | 0.58ns  | 132ps   |
| Clock jitter (p-p) | 206ns   | 87.3ns   | 5.15ns  | 954ps   |
| Data rate          | 0.8Mbps | 1.25Mbps | 40Mbps  | 100Mbps |
| Data jitter (RMS)  | 81.0ns  | 48.5ns   | 0.95ns  | 0.43ns  |
| Data jitter (p-p)  | 395ns   | 271ns    | 5.72ns  | 2.65ns  |
| Data latency       | 2.93µs  | 1.99µs   | 166µs   | 36.0µs  |

**TABLE 4-1. Measured Timing Performance**

Fig. 4-29 shows the simulated and measured power and energy efficiencies of both the bootstrapped and the conventional buses. The FF process corner is used in the post-layout simulation to ensure consistency with the measurements. In general, the measured results coincide with the simulated ones, except in the extreme case of $V_{DD} = 0.1$V. The proposed design can operate at 0.6MHz (100MHz) under 0.1V (0.3V) with an energy efficiency of 40fJ/bit (123fJ/bit). The conventional repeater bus achieves 4MHz (20MHz) with 98fJ/bit (182fJ/bit) at 0.2V (0.3V). These results show that the proposed design achieves a higher speed, a wider operating range and better energy efficiency.



Fig. 4-29. Comparisons of measured and post-simulation results.

#### 4.4.3. Leakage Power Measurement

A distinguishing feature of the proposed design is the reduction in leakage current. Fig. 4-30 plots the measured and simulated leakage power. The measured leakage powers are 30nW, 140nW, 575nW and 2.75$\mu$W at $V_{DD}$ = 0.1V, 0.2V, 0.3V and 0.4V, respectively, which are closer to the FF corner than to the TT corner.

TABLE 4-2 summarizes the performance of the on-chip bus test chip. TABLE 4-3 compares the results with some previous works. Most other relevant investigations have focused on low-power on-chip data communication in the Gbps range. The FoMs are used to compare the performance of the data link. The FoM<sub>1</sub> is defined as the energy per bit. The proposed design can operate in the sub-threshold region under a supply voltage of 0.1–0.3V. The energy per bit is 40fJ/bit at 0.1V, 59fJ/bit at 0.2V, and 123fJ/bit at 0.3V, indicating that the proposed design is more power-efficient than the others. The definition of the FoM<sub>2</sub> is the data rate normalized to pitch-power product. It shows that the proposed one can achieve higher normalized data rate than the rest.



Fig. 4-30. Measured and post-simulation leakage power versus supply voltage.

| Item                                  | Specification                   |                                 |                           |
|---------------------------------------|---------------------------------|---------------------------------|---------------------------|
| Process                               | 55nm 1P10M SPRVT Low-K CMOS     |                                 |                           |
| V<sub>th</sub>                        | NMOS: 300mV; PMOS: –310mV       |                                 |                           |
| Core Supply                           | 0.1–0.3V                        |                                 |                           |
| Supply Voltage of Level Shift Buffers | V<sub>IOL</sub>: 0.1–0.3V       | V<sub>IOM</sub>: 0.2–0.8V       | V<sub>IOH</sub>: 0.4–1.0V |
| Supply Voltage of Digital Circuit     | 0.4–1.0V                        |                                 |                           |
| Max. Clock Link                       | 0.6MHz @ 0.1V                   | 22.5MHz @ 0.2V                  | 100MHz @ 0.3V             |
| Max. Data Link                        | 0.8Mbps @ 0.1V                  | 40Mbps @ 0.2V                   | 100Mbps @ 0.3V            |
| Energy (fJ/bit)                       | 40 (0.1V @ 0.6MHz)              | 59 (0.2V @ 22.5MHz)             | 123 (0.3V @ 100MHz)       |
| Leakage Power                         | 0.03µW @ 0.1V                   | 0.14µW @ 0.2V                   | 0.57µW @ 0.3V             |
| Layout Area                           | Conventional bus: 637µm × 183µm | Bootstrapped bus: 637µm × 206µm | Whole chip: 821µm × 820µm |

**TABLE 4-2. Chip Summary**

|                                               | TVLSI08[17]     | TCASI08[41]     | JSSC08[42]      | JSSC10[43]      | Conv.           | Proposed    |       |       |
|-----------------------------------------------|-----------------|-----------------|-----------------|-----------------|-----------------|-------------|-------|-------|
| Technology                                    | 180nm           | 180nm           | 180nm           | 90nm            | 55nm            | 55nm        |       |       |
| Topology                                      | BT<br>repeaters | INV<br>repeater | Cap<br>coupling | Cap<br>coupling | INV<br>repeater | BT repeater |       |       |
| Single/ Differential                          | Single          | Diff            | Diff            | Diff            | Single          | Single      |       |       |
| Supply voltage (V)                            | 0.4             | 1.0             | 1.8             | 1.2             | 0.2             | 0.1         | 0.2   | 0.3   |
| Total length (mm)                             | 80              | 10              | N/A             | 10              | 10              | 10          |       |       |
| Width (nm)                                    | N/A             | 1000            | 2 x 300         | 2 x 540         | 90              | 90          |       |       |
| Spacing (nm)                                  | N/A             | 1500            | 2 x 300         | 2 x 320         | 90              | 90          |       |       |
| Data rate (Mbps)                              | ★9 MHz          | 1500            | 1000            | 2000            | 8               | 0.8         | 40    | 100   |
| *FoM <sub>1</sub> (pJ/bit)                    | N/A             | 1.74            | 2.24            | 0.28            | 0.098           | 0.04        | 0.059 | 0.123 |
| <sup>*</sup> FoM <sub>2</sub><br>(Mbps/μW·μm) | N/A             | 0.23            | 0.37            | 2.08            | 28.34           | 69.44       | 47.08 | 22.58 |

#### **TABLE 4-3.** Comparisons

★ only shows clock rate.

\* FoM<sub>1</sub> =  $\frac{\text{Power}(\mu W)}{\text{Data rate (Mbps)}}$  = Energy (pJ/bit); FoM<sub>2</sub> =  $\frac{\text{Data rate (Mbps)}}{\text{Power}(\mu W) \cdot \text{Pitch}(\mu m)}$ 
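The two figures of merit can be cross-checked against TABLE 4-3 with a short sketch. One input is an assumption: the wire pitch is taken as 0.36µm, a value inferred here because it reproduces the tabulated FoM<sub>2</sub> entries and is consistent with two 90nm wires plus two 90nm spaces (signal and ground shield); the pitch itself is not stated in this section. Power is recovered from the tabulated energy per bit and data rate.

```python
PITCH_UM = 0.36  # assumed pitch in micrometres (inferred, see lead-in)


def fom1_pJ_per_bit(power_uW, data_rate_Mbps):
    """FoM1 = power / data rate, i.e. energy per bit in pJ."""
    return power_uW / data_rate_Mbps


def fom2(power_uW, data_rate_Mbps, pitch_um):
    """FoM2 = data rate / (power * pitch), in Mbps / (uW * um)."""
    return data_rate_Mbps / (power_uW * pitch_um)


# (energy per bit in pJ, data rate in Mbps) from TABLE 4-3.
rows = {
    "Conv. @ 0.2V": (0.098, 8.0),
    "Proposed @ 0.1V": (0.04, 0.8),
    "Proposed @ 0.2V": (0.059, 40.0),
    "Proposed @ 0.3V": (0.123, 100.0),
}

fom2_values = {}
for name, (energy_pJ, rate_Mbps) in rows.items():
    power_uW = energy_pJ * rate_Mbps  # pJ/bit * Mbps = uW
    fom2_values[name] = fom2(power_uW, rate_Mbps, PITCH_UM)
    print(f"{name:>16}: power = {power_uW:.3f} uW, FoM2 = {fom2_values[name]:.2f}")
```

Since the data rate cancels, FoM<sub>2</sub> reduces to $1/(\text{FoM}_1 \cdot \text{Pitch})$, which is why the lowest energy per bit also gives the highest FoM<sub>2</sub> at a fixed pitch.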

# 4.5. Summary

This work successfully explores on-chip bus design under a supply voltage of 0.1–0.3V. The proposed insertion of a bootstrapped CMOS repeater to suppress ISI yields low accumulated ISI jitter and a high clock/data rate even at a sub-threshold supply voltage. Additionally, the proposed bootstrapped repeater improves energy efficiency and has a $P_{Leakage}/P_T$ ratio of 1% even at $V_{DD} = 0.1V$. This ratio is one order of magnitude lower than those of the other designs. According to the Monte Carlo analysis, the proposed design has small variability under device mismatch and process variation. Measured results verify that the proposed design achieves a 100MHz (0.6MHz) clock link and a 100Mbps (0.8Mbps) data link at 0.3V (0.1V) $V_{DD}$. It is energy-efficient, consuming only 123fJ (40fJ) per bit.


# Chapter 5 High-Boosting Pre-driver

This chapter discusses near-threshold interconnects with a high-boosting pre-driver. Compared with the bootstrapped drivers in Chapters 3 and 4, the high-boosting pre-driver offers a better trade-off between speed and energy efficiency. The proposed technique provides 3X and 4X boosting to enhance the driving current. Moreover, the high-boosting pre-driver pushes the driver devices to operate in the linear region, far from the threshold voltage, which explains why process variation affects the proposed interconnects to a lesser extent.

# 5.1. Proposed High-boosting Pre-driver

Several boosting techniques enhance the driving and pre-charge capability, including the ALBI and ISBD proposed in the previous chapters. They use 2X boosting with an output swing from  $-V_{DD}$  to  $2V_{DD}$ . In this chapter, we propose two bootstrapped repeaters with 3X and 4X boosting ratios, shown in Fig. 5-1 and Fig. 5-2. Their outputs swing from  $-2V_{DD}$  to  $3V_{DD}$  and from  $-3V_{DD}$  to  $4V_{DD}$ , respectively. Because of the high boosting gain, a pre-charge enhancement scheme is included to improve the pre-charge capability. Furthermore, a leakage current elimination technique improves the energy efficiency substantially.



Fig. 5-1. The circuit diagram of the proposed 3X boosting pre-driver.

The proposed 3X boosting pre-driver is depicted in Fig. 5-1.  $C_{BP1, 2}$  and  $C_{BN1, 2}$  are the bootstrap capacitors;  $INV_P$  and  $INV_N$  are the inverters that boost  $C_{BP}$  and  $C_{BN}$ ; and  $INV_{DR}$  is the output driver.  $V_{3X}$  swings from  $-2V_{DD}$  to  $3V_{DD}$  to enhance the driving capability of  $INV_{DR}$ .  $V_{3X}$  is also fed back to control  $M_{P3, 4}$  and  $M_{N3, 4}$ , which enhances the precharge capability and eliminates the reverse leakage current simultaneously. Fig. 5-2 depicts the proposed 4X boosting pre-driver. BT2X<sub>P</sub> and BT2X<sub>N</sub> are pre-drivers that use the bootstrapped delay cells in [16], which provide a swing from  $-V_{DD}$  to  $2V_{DD}$ .



Fig. 5-2. The circuit diagram of the proposed 4X boosting pre-driver.

Fig. 5-3 shows the transient waveforms of the 3X boosting pre-driver with an input square wave at 0.15V  $V_{DD}$ . Assume that the bootstrap capacitors  $C_{BP1,2}$  and  $C_{BN1,2}$  have stored a voltage of  $V_{DD}$  before  $V_{in}$  transitions from H to L;  $N_{BP1}$  and  $N_{BP2}$  have an initial voltage of  $V_{DD}$ , and  $V_{3X}$  has an initial voltage of  $-2V_{DD}$ . After  $V_{in}$  transitions from H to L,  $M_{N2}$  is turned off and  $N_{BP1}$  is boosted to  $2V_{DD}$ . Then,  $N_{BP2}$  is boosted to  $3V_{DD}$ . At the same time,  $M_{P5}$  is turned on and  $M_{N5}$  is turned off. The  $3V_{DD}$  at  $N_{BP2}$  starts to charge  $V_{3X}$  through  $M_{P2}$  and pushes  $V_{3X}$  to  $3V_{DD}$ . After  $V_{3X}$  is charged above the threshold voltage  $V_{th}$ ,  $M_{N3}$  and  $M_{N4}$  are turned on to precharge  $N_{BN}$  to GND. Now,  $C_{BN1}$  and  $C_{BN2}$  have a potential of  $-V_{DD}$ .

Fig. 5-4 shows the transient waveforms of the 4X boosting pre-driver at 0.15V  $V_{DD}$ . The operation is very similar to that of the bootstrapped delay cells in [16]. The difference is that the 4X boosting pre-driver uses BT2X<sub>P</sub> and BT2X<sub>N</sub> instead of conventional inverters.

Ideally, the 3X and 4X pre-drivers have boosting gains of three and four. However, non-ideal boosting efficiency reduces the voltage swings, as shown in Fig. 5-3 and Fig. 5-4. The boosting efficiency, which is limited by charge sharing at the boosted node, is discussed in [16]. Although larger boosting capacitors achieve higher boosting efficiency, they cost more area. In addition, the parasitic capacitance of the bootstrap capacitors increases the delay time. Fig. 5-5 shows the relationship between boosting efficiency and supply voltage for the different boosting pre-drivers. The boosting factor is defined as the ratio of the boosted voltage to  $V_{DD}$ . The 3X and 4X pre-drivers pay a boosting efficiency penalty due to the parasitic loads on the boosting path. Although the boosting efficiency of the 3X and 4X circuits is lower than that of the 2X ones, they still have a higher boosting factor and therefore more driving capability.



Fig. 5-3. Simulated timing waveforms of the 3X pre-driver at 0.15V supply.



Fig. 5-4. Simulated timing waveforms of the 4X pre-driver at 0.15V supply.



Fig. 5-5. Boosting efficiency as a function of  $V_{DD}$ .

# 5.2. High-boosting Pre-driver in Long Interconnects

## 5.2.1. Leakage Current Reduction

For a subthreshold design, the leakage current  $I_{off}$  accounts for a significant portion of the total power consumption. In the subthreshold region,  $I_{sub}$  is expressed as in Eq. (2-2). Thus, the  $I_{on}/I_{off}$  ratio of the conventional inverter can be represented as in Eq. (5-1).

$$\frac{I_{on}}{I_{off}} = \exp(\frac{V_{DD}}{nV_T}).$$
(5-1)

As one can see, reducing the supply below the threshold voltage reduces  $I_{off}$ . However,  $I_{on}$  is reduced more significantly. As a result, the  $I_{on}/I_{off}$  ratio is reduced.

Making  $V_{GS}$  negative is an effective means of improving the  $I_{on}/I_{off}$  ratio, according to Eq. (2-2). In the proposed 3X and 4X pre-drivers, the gate-source voltages of the turned-off transistors are  $-2V_{DD}$  and  $-3V_{DD}$ , respectively. Hence, the subthreshold leakage  $I_{off}$  is reduced significantly. In addition, the gate-source voltages of the turned-on transistors are  $3V_{DD}$  and  $4V_{DD}$ , which enhances  $I_{on}$  and improves the  $I_{on}/I_{off}$  ratio simultaneously.
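To give a feel for the magnitudes involved, the sketch below evaluates Eq. (5-1) and the ideal exponential reduction of  $I_{off}$  for a negative off-state  $V_{GS}$ . It is only an illustration under stated assumptions: the slope factor `n = 1.5` and the temperature of 300 K are assumed values, and the ideal subthreshold model ignores junction leakage, GIDL, and the non-ideal boosting efficiency.

```python
import math

K_B_OVER_Q = 8.617333e-5  # Boltzmann constant over electron charge, in V/K


def thermal_voltage(temp_k: float = 300.0) -> float:
    """Thermal voltage V_T = kT/q in volts."""
    return K_B_OVER_Q * temp_k


def on_off_ratio(vdd: float, n: float = 1.5, temp_k: float = 300.0) -> float:
    """Eq. (5-1): Ion/Ioff = exp(VDD / (n * V_T)) for a conventional inverter."""
    return math.exp(vdd / (n * thermal_voltage(temp_k)))


def off_current_suppression(vgs_off: float, n: float = 1.5,
                            temp_k: float = 300.0) -> float:
    """Ideal factor by which Ioff scales when the off-state VGS moves
    from 0 V to vgs_off (negative values give a factor below 1)."""
    return math.exp(vgs_off / (n * thermal_voltage(temp_k)))


vdd = 0.15
print("conventional Ion/Ioff:", on_off_ratio(vdd))
print("Ioff factor, 3X (VGS = -2VDD):", off_current_suppression(-2 * vdd))
print("Ioff factor, 4X (VGS = -3VDD):", off_current_suppression(-3 * vdd))
```

Under these assumptions, each additional  $-V_{DD}$  of gate drive on an off transistor lowers the ideal subthreshold leakage by the same factor that Eq. (5-1) gives for the conventional  $I_{on}/I_{off}$  ratio.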

#### 5.2.2. Energy Efficiency

The proposed 3X and 4X pre-drivers provide a significant speed improvement and high energy efficiency. Since the pre-drivers consume extra power, the total energy per bit is represented as in Eq. (3-9). A long wire can be regarded as a large capacitive load in the pF range. When a CMOS driver drives heavy capacitive loads, the energy contribution of the short-circuit current can be ignored [40]. Assuming that the total length of the interconnect is L, we can rewrite (3-9) as

$$E_T \approx \frac{L}{h} E_{rep} + \frac{\alpha}{2} C_{wire} V_{DD}^2 + P_{Leakage} \cdot T.$$
(5-2)

where  $E_{rep}$  is the switching energy consumed by each repeater; *h* is the segment length; and  $\alpha$  is the activity factor. The leakage energy  $P_{Leakage} \cdot T$  is proportional to *T*. According to the reported works, using a longer segment length and a lower supply voltage is more energy efficient. However, the speed penalty is large when a repeater drives a long segment. Assume that all the inverters in Fig. 5-1 and Fig. 5-2 are identical and that the switches are the same size as the inverters. The capacitance of the inverter is  $C_{INV}$ . Taking the 3X pre-driver as an example, since the gate capacitance dominates  $C_{INV}$ , the equivalent capacitance is  $5C_{INV}$  at the input node and  $3C_{INV}$  at the output node. Ideally, the voltage swing of the input node is  $V_{DD}$ , and the 3X pre-driver produces a large voltage swing of  $5V_{DD}$  by using the bootstrap technique. We can represent the power of the 3X pre-driver as in (5-3).

$$\begin{aligned}
P_{T,3X} &\approx \frac{L}{h} \Big[ 5C_{INV} + (6-1)^2 \beta \cdot 3C_{INV} \Big] fV_{DD}^2 + fC_{wire} V_{DD}^2 + P_{Leakage} \\
&= \frac{L}{h} \Big[ (75\beta + 5)C_{INV} \Big] fV_{DD}^2 + fC_{wire} V_{DD}^2 + P_{Leakage}.
\end{aligned}$$
(5-3)

Similarly, we can derive the total power of the 4X one as in (5-4).

$$\begin{aligned}
P_{T,4X} &\approx \frac{L}{h} \Big[ 7C_{INV} + (8-1)^2 \beta \cdot 3C_{INV} \Big] fV_{DD}^2 + fC_{wire} V_{DD}^2 + P_{Leakage} \\
&= \frac{L}{h} \Big[ (147\beta + 7)C_{INV} \Big] fV_{DD}^2 + fC_{wire} V_{DD}^2 + P_{Leakage}.
\end{aligned}$$
(5-4)

Compared with the power contributed at the input nodes, the switching power due to the large voltage swing at the boosted nodes is the dominant term. As a result, the switching power contributed by the input parasitic capacitance can be ignored. Thus, a general form of the power of a single repeater can be represented as in (5-5).

$$P_{rep,kX} \approx \alpha f \beta (2k-1)^2 C_{BT} V_{DD}^2.$$
(5-5)

where  $\beta$  is the boosting efficiency;  $C_{BT}$  is the total capacitance at the boosted nodes; and k is the boosting gain. Combined with the switching energy of the wire, the total energy consumption is

$$E_{T,kX} \approx \frac{\alpha}{2} \left[ \frac{L}{h} \beta (2k-1)^2 C_{BT} + C_{wire} \right] V_{DD}^2 + P_{Leak,BT} \cdot T.$$
(5-6)
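The general energy expression above can be explored numerically with a short sketch such as the one below. It also checks that the boosted-node coefficient  $3(2k-1)^2$  reproduces the factors 75 and 147 in the 3X and 4X power expressions. The function names and all numeric values in the example call (capacitances, boosting efficiency) are hypothetical and chosen only for illustration.

```python
def boosted_cap_coeff(k: int) -> int:
    """Coefficient of beta*C_INV at the boosted node: 3 * (2k - 1)^2."""
    return 3 * (2 * k - 1) ** 2


def repeater_energy_per_bit(k, beta, c_bt, c_wire, vdd, length_mm, seg_mm,
                            alpha=1.0, p_leak=0.0, bit_time=0.0):
    """E = alpha/2 * [(L/h) * beta * (2k-1)^2 * C_BT + C_wire] * VDD^2 + P_leak * T."""
    n_rep = length_mm / seg_mm                       # number of repeaters, L/h
    c_eff = n_rep * beta * (2 * k - 1) ** 2 * c_bt + c_wire
    return 0.5 * alpha * c_eff * vdd ** 2 + p_leak * bit_time


print(boosted_cap_coeff(3), boosted_cap_coeff(4))    # 75 147

# Hypothetical example: 10 mm wire, 1.8 pF wire load, 1 fF boosted-node load.
for k, seg in ((2, 1.0), (3, 2.5), (4, 5.0)):
    e = repeater_energy_per_bit(k, beta=0.8, c_bt=1e-15, c_wire=1.8e-12,
                                vdd=0.15, length_mm=10.0, seg_mm=seg)
    print(f"{k}X, h = {seg} mm: {e * 1e15:.1f} fJ/bit")
```

The sketch makes the trade-off explicit: a higher boosting gain raises the per-repeater term quadratically through  $(2k-1)^2$ , while a longer segment length reduces the number of repeaters  $L/h$ .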

Fig. 5-6 shows the repeaters driving a 0.5 pF capacitive load under 0.1–0.3V  $V_{DD}$ . The proposed repeaters and the conventional repeater use the same output driver. All the circuits operate at their highest speed. The data rate of the 3X boosting pre-driver is almost 100 times higher than that of the conventional one. When these two circuits are operated at 0.15V, the energy of the proposed design is even lower than that of the conventional one, even though its data rate is 100X higher. This is because the proposed design reduces the leakage current effectively.



Fig. 5-6. Comparison of speed and energy with different repeaters.

Fig. 5-7 compares the proposed design with repeaters using different boosting pre-drivers in the clock link. The total length of the interconnect is fixed at 10mm with minimum wire spacing under coplanar ground shielding. The 10mm interconnect is divided into segments of various lengths, shown along the X-axis. Fig. 5-7 indicates that the 3X pre-driver can achieve the highest data rate and high energy efficiency simultaneously. The 2X pre-driver is the most energy efficient, but it is much slower than the 3X and 4X ones. The 4X pre-driver is more suitable for driving long segments, where it offers a good trade-off between speed and energy efficiency.

In order to compare the energy efficiency of the four different repeaters, they are designed to achieve 5Mbps under 0.15V, where the segment length of the inverter and the 2X repeater is 1mm and those of the 3X and the 4X ones are 2.5mm and 5mm, respectively. The simulation results are shown in Fig. 5-8. They show that the 3X and the 4X designs have good energy efficiency below 0.2V.


Fig. 5-7. Comparison of clock links as a function of segment length.



Fig. 5-8. Comparison of interconnect designs at different  $V_{DD}$ .

### 5.2.3. Boosting Efficiency

As discussed in Chapters 3 and 4, the boosting efficiency is the index of the boosting ability. Because of the charge sharing effect, the 3X and 4X pre-drivers pay a penalty due to the parasitic loads on the boosting path. Fig. 5-9 shows the boosting factor and the boosting efficiency in practical cases. Although the boosting efficiency of the 3X and the 4X pre-drivers is lower than that of the 2X pre-driver, they still have a higher boosting factor and therefore more driving capability.



Fig. 5-9. Comparison of boosting efficiency of proposed repeaters.

#### 5.2.4. Monte Carlo Simulations

The variability of  $I_D$  due to process and voltage fluctuation becomes worse as the supply voltage goes lower. With the boosting pre-driver, devices are operated in the triode region and are therefore less sensitive to process fluctuation. Since sub-threshold circuits suffer from severe process variation problems, Monte Carlo simulations are used to investigate the effects. Device mismatch, threshold voltage  $V_{th}$ , and process corner variation are assumed to have Gaussian random distributions.

Four types of repeaters are discussed. To satisfy the iso-area condition with the 3X pre-driver, the conventional inverter is designed to be 240 times the size of the bootstrapped driver.

The analysis is set up to find the distribution of the maximum clock rate and the variability ratio. The maximum clock rate is the highest speed achieved in each Monte Carlo sample, and the variability ratio is defined as  $f_{max}/f_{min}$ . Under  $3\sigma$  variation, we simulated the designs at 10 different clock rates. The number of samples at each clock rate is 1000. The CDFs and PDFs of the achievable maximum clock rate are shown in Fig. 5-10, in which the X-axis is the data rate on a logarithmic scale. Fig. 5-10 also shows the mean  $\mu$, the standard deviation  $\sigma$, the minimum clock rate  $f_{min}$ , and the maximum clock rate  $f_{max}$ . The 3X pre-driver has the smallest  $f_{max}/f_{min}$  ratio of 8, as compared to 32, 12.0 and 24.0 for the inverter, the 2X and the 4X pre-drivers, respectively. In addition, the 2X and 3X pre-drivers have a smaller standard deviation than the conventional one.
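The statistics reported for each design can be extracted from a set of Monte Carlo samples as sketched below. The samples here are synthetic, drawn from an assumed log-normal distribution purely to exercise the code; they are not the simulation data of this work, and the function name is illustrative.

```python
import math
import random
import statistics


def variability_summary(fmax_samples):
    """Return mean, standard deviation, f_min, f_max and the f_max/f_min ratio
    of the maximum clock rates collected over the Monte Carlo samples."""
    f_min = min(fmax_samples)
    f_max = max(fmax_samples)
    return {
        "mean": statistics.mean(fmax_samples),
        "sigma": statistics.stdev(fmax_samples),
        "f_min": f_min,
        "f_max": f_max,
        "ratio": f_max / f_min,
    }


# Synthetic stand-in for 1000 Monte Carlo samples (ASSUMED distribution).
rng = random.Random(0)
samples = [rng.lognormvariate(math.log(3e6), 0.3) for _ in range(1000)]
summary = variability_summary(samples)
print({name: f"{value:.3g}" for name, value in summary.items()})
```

Because the ratio depends only on the two extreme samples, it is a peak-to-peak measure and is more sensitive to outliers than  $\sigma$ .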



Fig. 5-10. Monte Carlo analysis of data rate.

The impact of temperature fluctuation is another important issue for the variation and reliability of a nano-scaled chip, especially under sub-threshold supply operation. The sub-threshold current depends strongly on the temperature owing to the thermal voltage  $V_T$ . In contrast to the super-threshold region,  $I_D$  increases as the temperature rises. The temperature sensitivity of the threshold voltage is about 0.8 mV/°C, as discussed in [6]. When the proposed pre-drivers are operated at a 0.15V supply, some of the devices operate in the super-threshold region and the others in the sub-threshold region. Because the two regions have opposite temperature dependences, the proposed pre-drivers can compensate for the variation due to temperature sensitivity. Fig. 5-11 shows the Monte Carlo simulation of the latency variation of a 10mm interconnect under temperature conditions of -40°C, 25°C and 125°C. The number of samples in each temperature corner is 300. The 3X and 4X boosting pre-drivers show a more concentrated latency distribution under temperature fluctuation.



Fig. 5-11. Monte Carlo analysis of the latency of a 10mm interconnect.

# **5.3. Experiment and Measurement Results**

#### 5.3.1. Chip Implementation

A test chip has been designed and fabricated in a 65nm 1P10M SPRVT process. The test chip includes four on-chip buses: the 2X, 3X, and 4X pre-driving repeaters and the conventional inverter, as shown in Fig. 5-12. Four-bit pseudo-random bit sequences (PRBS) are generated and passed through an H-to-L level shifter to adjust the voltage swing to 0.1-0.3 V. An extra input I/P is provided to switch between a tunable clock signal and random data. Each on-chip bus has three channels. Each channel is 10 mm long with a wire spacing of 100nm for ground shielding in Metal5. The buses using the 2X pre-drivers and the conventional repeaters are divided into 10 segments, and the buses using the 3X and 4X pre-drivers are divided into 4 and 2 segments, respectively. In each boosting pre-driver, 100fF MIM capacitors serve as the bootstrap capacitors. Level shifters are used for the I/O circuit. Fig. 5-13 shows the photograph of the die. The total area with I/O pads is 1400 $\mu$m×1400 $\mu$m.



Fig. 5-12. Block diagram of test circuits.



Fig. 5-13. Die photo and cell layout.

#### 5.3.2. Measured Waveforms

Fig. 5-14 shows the measured data eye diagrams under a 0.15V supply. A  $2^9-1$  PRBS is used as the input random data. Fig. 5-15(a) and (b) show the simulated and measured data rates and energy efficiencies of all the buses. The TT process corner is used in the post-layout simulation to ensure consistency with the measurements. In general, the measured results coincide with the simulated ones. The bus with the 2X boosting pre-driver can operate at a 1.5MHz clock or 2.5Mbps data under 0.15V with an energy efficiency of 32.4fJ/bit. For the bus with the 3X boosting pre-driver, the corresponding values are 3MHz, 5Mbps and 35.2fJ/bit. For the 4X bus, they are 1.1MHz, 1.5Mbps, and 32.8fJ/bit. According to the interconnect parameters from the datasheet, the energy dissipation of the wires is 20.3fJ/bit ( $0.5 \cdot C_{wire}V_{DD}^2$ ). This shows that the proposed buses achieve good energy efficiency, close to the limit set by the wire itself.
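As a rough cross-check of the wire-limit figure quoted above, the following sketch backs out the wire capacitance implied by 20.3fJ/bit at 0.15V and compares the measured energies per bit against that limit. It assumes an activity factor of 1 and the energy expression  $0.5 \cdot C_{wire}V_{DD}^2$ ; the function names are illustrative.

```python
def wire_energy_per_bit(c_wire: float, vdd: float, alpha: float = 1.0) -> float:
    """Switching energy of the wire per bit: alpha/2 * C_wire * VDD^2 (joules)."""
    return 0.5 * alpha * c_wire * vdd ** 2


def implied_wire_capacitance(energy_per_bit: float, vdd: float,
                             alpha: float = 1.0) -> float:
    """Invert the expression above to obtain C_wire in farads."""
    return 2.0 * energy_per_bit / (alpha * vdd ** 2)


E_WIRE = 20.3e-15                                  # quoted wire limit, J/bit
c_wire = implied_wire_capacitance(E_WIRE, 0.15)    # about 1.8 pF for 10 mm
print(f"implied C_wire = {c_wire * 1e12:.2f} pF")

measured = {"2X": 32.4e-15, "3X": 35.2e-15, "4X": 32.8e-15}
for name, energy in measured.items():
    print(f"{name}: {energy / E_WIRE:.2f} x wire limit")
```

Under this assumption the measured buses consume roughly 1.6 to 1.7 times the wire-only energy, so the repeaters and leakage account for the remaining 12 to 15 fJ/bit.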



Fig. 5-14. Measured waveform under 0.15 V core  $V_{DD}$  (600 mV~ 800 mV I/O  $V_{DD}$ ).



Fig. 5-15. Comparisons with measured and post-simulation results. (a) Data rate at different  $V_{DD}$ , (b) energy rate at different  $V_{DD}$ .

TABLE 5-1 summarizes the performance of the test chip, and TABLE 5-2 compares it with previous works. Two FoMs are used to compare the performance. FoM<sub>1</sub> is defined as the energy per bit. FoM<sub>2</sub> is the data rate normalized to the pitch-power product [44]. The proposed design can operate in the sub-threshold region under a supply voltage of 0.15V. The energy per bit is 35.2fJ/bit for the 3X pre-driver, and 32.8fJ/bit for the 4X pre-driver. This indicates that the proposed designs are more energy-efficient than the others. The comparison of FoM<sub>2</sub> shows that the proposed designs are also more area efficient than the others.

| Process                    | 65nm 1P10M SPRVT Low-K CMOS |              |            |
|----------------------------|-----------------------------|--------------|------------|
| V<sub>th</sub>             | NMOS: 230mV; PMOS: –190mV   |              |            |
| Core supply voltage        | 0.1 ~ 0.3V                  |              |            |
| Interconnect length        | 10mm                        |              |            |
| Segment length             | 2X (h=1mm)                  | 3X (h=2.5mm) | 4X (h=5mm) |
| Max. clock @0.15V          | 1.5MHz                      | 3MHz         | 1.1MHz     |
| Max. data rate @0.15V      | 2.5Mbps                     | 5Mbps        | 1.5Mbps    |
| Energy per bit (fJ/Ch)     | 32.4                        | 35.2         | 32.8       |
| Core layout area, 2X bus   | 758µm x 135µm               |              |            |
| Core layout area, 3X bus   | 732µm x 254µm               |              |            |
| Core layout area, 4X bus   | 717µm x 89µm                |              |            |
| Layout area, whole chip    | 1400µm x 1400µm             |              |            |

TABLE 5-1. Chip Summary

## 5.4. Summary

This chapter has successfully explored on-chip bus design under 0.15 V. The proposed 3X and 4X boosting pre-drivers improve the energy efficiency and the data rate simultaneously. According to Monte Carlo analysis, the proposed design has a smaller peak-to-peak variability under device mismatch and process variation. A test chip has been designed and fabricated in a 65 nm 1P10M SPRVT CMOS process. The measured results verify that the proposed 3X (4X) pre-driver achieves a 3 MHz (1.1 MHz) clock rate and a 5 Mbps (1.5 Mbps) data rate at 0.15V  $V_{DD}$ . The energy efficiency is 35.2 fJ/bit (32.8 fJ/bit). In addition, it has the highest data rate, normalized to the power-pitch product, among the compared designs.

|                               | TVLSI'08 [17] | TCAS2'12 [45] | Proposed | Proposed |
|-------------------------------|---------------|---------------|----------|----------|
| Technology                    | 180nm         | 90nm          | 65nm     | 65nm     |
| Supply voltage (V)            | 0.4           | 0.2           | 0.15     | 0.15     |
| Repeater topology             | 2X BT         | 2X BT         | 3X BT    | 4X BT    |
| Total length (mm)             | 80            | 10            | 10       | 10       |
| Segment length (mm)           | 10            | 1             | 2.5      | 5        |
| Single/Differential           | Single        | Single        | Single   | Single   |
| Cap. type                     | MOS Cap.      | MOM Cap.      | MIM Cap. | MIM Cap. |
| Width (µm)                    | N/A           | 0.14          | 0.1      | 0.1      |
| Spacing (µm)                  | N/A           | 0.14          | 0.1      | 0.1      |
| Data rate (Mbps)              | ★9MHz         |               | 5        | 1.5      |
| *FoM<sub>1</sub> (fJ/bit)     | 444           | 50            | 35.2     | 32.8     |
| *FoM<sub>2</sub> (Mbps/µW·µm) | N/A           | 35.7          | 71.4     | 75.8     |

TABLE 5-2. Comparisons

★ only shows clock rate.

\* FoM<sub>1</sub> =  $\frac{\text{Power}(\mu W)}{\text{Data rate (Mbps)}}$  = Energy per bit; FoM<sub>2</sub> =  $\frac{\text{Data rate (Mbps)}}{\text{Power}(\mu W) \cdot \text{Pitch}(\mu m)}$

# **Chapter 6**

# **Near-threshold ADPLL**

For sustainable electronic devices, ultra-low power design is essential to prolong battery life. According to  $P = fCV^2$, scaling down the supply voltage is the most effective way to reduce the power consumption. According to the forecast of the International Technology Roadmap for Semiconductors (ITRS), the supply voltage will be scaled to 0.5V for low-power applications within the next generation [46]. Recently, some 0.5V biomedical applications have been reported [47-48]. In addition, some important analog building blocks operating at the MHz level have been developed with a 0.5V supply [49-50].

*Phase-locked loops* (PLLs) are key building blocks in integrated circuits. Several clock circuits scaled to 0.5V have been reported using analog approaches [51-53]. *All-digital PLLs* (ADPLLs) are a popular alternative to analog PLLs because of their portability and scalability. Additionally, ADPLLs have no DC power dissipation. In a PLL, the oscillator is the most power-hungry building block, even in near-threshold operation. Although LC oscillators have superior phase noise, ring oscillators are often chosen due to power and area considerations. The *digitally-controlled oscillator* (DCO) presented in [54] is composed of a 12-bit DAC and a current-controlled oscillator using a 260 uA bias current. However, the high-resolution DAC requires extra power and area. In order to enhance the driving capability and the linear control range, a 0.5V 8-phase *voltage-controlled oscillator* (VCO) with a bulk-driven technique is reported in [53]. It successfully modulates the threshold voltage  $V_{th}$  at the cost of a slight increase in the leakage current. The design in [55] takes an all-digital approach. It uses a large number of digital delay cells and paths, whose parasitic loads make it difficult to reduce the power. Several DCOs are composed of a supply-regulated ring oscillator and a digitally-controlled resistance network (DRN) [56-57]. Here, linearity and complexity are the major design issues for DRNs.

In this chapter, we present a near-threshold supply ADPLL with a *bootstrapped digitally-controlled ring oscillator* (BDCO) that operates at 0.25-0.5V. The BDCO is composed of a *bootstrapped ring oscillator* (BTRO) and a *weighted thermometer-controlled resistance network* (WTRN). The proposed bootstrapped delay cell generates a large gate voltage swing to improve the driving capability. The boosted output swing keeps the transistors operating in the linear region, which provides high linearity under a near-threshold supply.

The rest of the chapter is organized as follows. Section 6.1 introduces the proposed ADPLL. The performance analyses are described in Section 6.2. In Section 6.3, the test chip and the experimental results are given. Finally, the comparisons and the conclusion are drawn in Section 6.4.

# 6.1. Architecture of Proposed All-Digital PLL

The proposed ADPLL, as shown in Fig. 6-1, consists of a *phase frequency detector* (PFD) to detect the phase error, a *phase selector* (PS) to reroute the signal path, a *time-to-digital converter* (TDC) to convert the phase error into digital code, a *digital loop filter* (DLF) to filter out the high frequency noise, a DCO to generate the required output frequency, and a *divider* (DIV) to divide and feed back the output frequency. To improve the resolution of the DCO, a 4-bit *sigma-delta modulator* (SDM) is used for the dithering.



Fig. 6-1. Block diagram of the proposed ADPLL

# 6.1.1. PFD, PS and TDC

The PFD, PS and TDC together can be regarded as a digital phase detector. The PFD produces UP and DN signals to indicate the phase error. The circuit diagram is shown in Fig. 6-2(a). It is designed as a dynamic circuit to operate at high frequency. In order to have the correct phase arrangement for the TDC, the two signals are rerouted by the PS, as illustrated in Fig. 6-2(b) [57].

The TDC is based on a Vernier delay line, as shown in Fig. 6-3 [59]. It requires the proper phase order for the conversion. As the LEAD and LAG signals propagate in their independent delay chains, the timing difference between the two signals decreases by  $\Delta T$  in each stage, where  $\Delta T$  is defined as the resolution of the Vernier TDC. In the proposed ADPLL, a 4-bit TDC is designed with 20ps resolution at 0.5V. The phase comparators compare the phases of the delayed LEAD and LAG signals and produce a thermometer code. Each comparator is composed of two cross-coupled latches, as depicted in Fig. 6-3. Finally, a *thermometer-to-binary* (T2B) decoder converts the thermometer code to a 4-bit binary one.
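The conversion described above can be modeled behaviorally in a few lines. The sketch below is an idealized model, not the circuit: it assumes 15 comparator stages for the 4-bit output, ideal delay cells with exactly  $\Delta T$  = 20 ps of delay difference per stage, and comparators without metastability. The function name is illustrative.

```python
def vernier_tdc(delta_ps: float, resolution_ps: float = 20.0, stages: int = 15):
    """Idealized Vernier TDC.

    delta_ps is the initial time by which LEAD precedes LAG. In every stage
    the gap shrinks by resolution_ps; a comparator outputs 1 while LEAD is
    still ahead. Returns (thermometer code, binary value)."""
    thermometer = []
    remaining = delta_ps
    for _ in range(stages):
        remaining -= resolution_ps          # LAG catches up by dT in this stage
        thermometer.append(1 if remaining > 0 else 0)
    binary = sum(thermometer)               # thermometer-to-binary (T2B) decoding
    return thermometer, binary


code, value = vernier_tdc(65.0)
print(code, value)   # three leading ones, value 3
```

A phase error larger than 15 ×  $\Delta T$  saturates the model at code 15, which reflects the limited input range of a 4-bit converter.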




Fig. 6-2. Circuit schematics of (a) PFD and (b) PS.



Fig. 6-3. Circuit schematic of the TDC.

# 6.1.2. DLF

The DLF is a  $2^{nd}$ -order digital filter whose parameters are obtained by a bilinear transformation from its analog counterpart, as depicted in Fig. 6-4. It contains two signal paths, the proportional path ( $K_P$ ) and the integral path ( $K_I$ ). The transfer function of the analog loop filter (ALF) is

$$H(s)_{ALF} = \frac{V(s)}{I(s)} = R + \frac{1}{sC}.$$
 (6-1)

The Z-domain transfer function of the DLF is given in (6-2).

$$H_{DLF}(z) = K_{P} + K_{I} \frac{1}{1 - z^{-1}} = \frac{(K_{P} + K_{I}) - K_{P} z^{-1}}{1 - z^{-1}}.$$
 (6-2)

The Z-domain equations can be converted to S-domain equations by the bilinear transformation [58], as written in (6-3).

$$s = \frac{2}{T_s} \frac{1 - z^{-1}}{1 + z^{-1}}.$$
 (6-3)

Here  $T_S$  is the sampling period of the reference clock in the ADPLL. The integrator is expressed as  $\frac{1}{1-z^{-1}}$  in the Z-domain. Thus, when converted to the Z-domain by the bilinear transformation, Eq. (6-1) can be rewritten as

$$H(z)_{ALF} = \frac{\left(\frac{T_{S}}{2C} + R\right) + z^{-1}\left(\frac{T_{S}}{2C} - R\right)}{1 - z^{-1}}.$$
(6-4)

According to equations (6-2) and (6-4), the parameters  $K_p$  and  $K_I$  of the DLF are expressed as

$$K_{\rm p} = R - \frac{T_{\rm s}}{2C}, \quad K_{\rm I} = \frac{T_{\rm s}}{C}.$$
 (6-5)
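The coefficient mapping in (6-5) can be verified numerically by comparing the analog filter, evaluated through the bilinear substitution, against the digital filter at an arbitrary point on the unit circle. The sketch below uses normalized, hypothetical component values; the helper `equivalent_rc` simply inverts (6-5) and is not part of the original design flow.

```python
import cmath


def dlf_coefficients(r: float, c: float, ts: float):
    """Eq. (6-5): K_P = R - Ts/(2C), K_I = Ts/C."""
    return r - ts / (2.0 * c), ts / c


def equivalent_rc(kp: float, ki: float, ts: float):
    """Inverse of Eq. (6-5): R = K_P + K_I/2, C = Ts/K_I."""
    return kp + ki / 2.0, ts / ki


def h_alf(s: complex, r: float, c: float) -> complex:
    """Analog loop filter, Eq. (6-1): R + 1/(sC)."""
    return r + 1.0 / (s * c)


def h_dlf(z: complex, kp: float, ki: float) -> complex:
    """Digital loop filter, Eq. (6-2): K_P + K_I / (1 - z^-1)."""
    return kp + ki / (1.0 - 1.0 / z)


def bilinear_s(z: complex, ts: float) -> complex:
    """Bilinear transformation, Eq. (6-3)."""
    return (2.0 / ts) * (1.0 - 1.0 / z) / (1.0 + 1.0 / z)


r, c, ts = 1.0, 2.0, 0.5                 # normalized, hypothetical values
kp, ki = dlf_coefficients(r, c, ts)
z = cmath.exp(0.3j)                      # arbitrary test point on the unit circle
print(abs(h_alf(bilinear_s(z, ts), r, c) - h_dlf(z, kp, ki)))   # close to zero

# Normalized R and C equivalent to K_P = 2^-1 and K_I = 2^-4 with Ts = 1.
print(equivalent_rc(2 ** -1, 2 ** -4, 1.0))
```

Choosing  $K_P$  and  $K_I$  as powers of two, as in the design parameters of this ADPLL, lets the multiplications be implemented as bit shifts.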

Following the mentioned steps, we can obtain the design parameters of the DLF in the proposed ADPLL.



Fig. 6-4. Circuit schematic of DLF.

# 6.1.3. Bootstrapped Digitally-Controlled Oscillator

Based on our previous work [58], the proposed monotonic *bootstrapped DCO* (BDCO) is composed of a 5-stage BTRO with its supply voltage  $V_C$  connected to a WTRN, as shown in Fig. 6-5. For near-threshold operation, linearity and variability are two major concerns. The techniques we use to overcome these two problems are detailed as follows.



Fig. 6-5. Circuit schematics of the BDCO and BTRO.

## 6.1.3.1. Bootstrapped Ring Oscillator

In order to operate in the near-threshold region, a *bootstrapped ring oscillator* (BTRO) has been proposed [58], as shown in Fig. 6-5. The bootstrapped delay cell ideally produces an output swing of  $-V_C$  to  $2V_C$ . The transient waveforms are illustrated in Fig. 6-6. When  $V_{in}=2V_C$ ,  $N_{OP}=0$  and  $N_{BP}$  is precharged to  $V_C$  by  $M_{P1}$ . After  $V_{in}$  transitions to  $-V_C$ ,  $N_{OP}$  rises to  $V_C$  and boosts  $N_{BP}$  to  $2V_C$ . The boosted  $2V_C$  at  $N_{BP}$  is transferred to  $V_{out}$  via  $M_{P2}$ . The  $2V_C$  ( $-V_C$ ) output voltage pushes the NMOS (PMOS) transistors of the next cell into the super-threshold region and increases their driving capability. It also suppresses the PMOS (NMOS) leakage current exponentially. As a result, we are able to increase the operating frequency by using large transistors without incurring a leakage problem. Since the transistors operate in the super-threshold region, they have better linearity and immunity against process variation.



Fig. 6-6. Simulated transient waveforms of a five-stage bootstrapped ring oscillator.

## 6.1.3.2. Weighted-Thermometer Code Control

The proposed WTRN is illustrated in Fig. 6-7. It controls  $V_C$  for the BTRO. In addition to the full thermometer code in [56], a weighted code is used to obtain better linearity. The resistance network consists of 9-bit PMOS transistor arrays, binary-to-thermometer (B2T) code converters and an SDM. Full thermometer control occupies a large area with complicated wiring. A hybrid architecture of binary and thermometer control is reported in [57] and costs less chip area. To obtain better linearity, the proposed PMOS arrays are no longer binary weighted; they are arranged in a segmented thermometer code with dedicated transistor sizing. There are a total of 13 control bits: two for coarse tuning, three for medium tuning, four for fine tuning, and four for dithering by an SDM to further improve the resolution. In order to improve the conductivity at sub-0.5V, only four PMOS transistors are stacked in each column. Fig. 6-8 shows the DCO output frequency versus the coarse and medium control codes. Compared with binary weighting, the proposed BDCO has better linearity, with a gain of 563 kHz/code in the TT corner.
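The segmentation of the 13 control bits can be sketched as follows. The bit ordering (coarse bits as the most significant, dithering bits as the least significant) is an assumption made for illustration, and the dedicated transistor weighting of the PMOS arrays is not modeled; only the field split and the binary-to-thermometer (B2T) conversion are shown.

```python
def split_control_word(code: int) -> dict:
    """Split a 13-bit control word into the four tuning fields.
    ASSUMED ordering: [coarse(2) | medium(3) | fine(4) | dither(4)]."""
    if not 0 <= code < 2 ** 13:
        raise ValueError("control word must fit in 13 bits")
    return {
        "coarse": (code >> 11) & 0x3,
        "medium": (code >> 8) & 0x7,
        "fine": (code >> 4) & 0xF,
        "dither": code & 0xF,      # fractional part handled by the SDM
    }


def binary_to_thermometer(value: int, bits: int) -> list:
    """B2T conversion: 'value' ones followed by zeros, 2^bits - 1 lines in total."""
    n_lines = 2 ** bits - 1
    if not 0 <= value <= n_lines:
        raise ValueError("value out of range")
    return [1] * value + [0] * (n_lines - value)


fields = split_control_word(0b10_011_0101_1100)
print(fields)
print(binary_to_thermometer(fields["medium"], 3))
```

With a thermometer code, incrementing a field turns on exactly one additional PMOS branch, which is why the tuning curve stays monotonic within a segment.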



Fig. 6-7. Detail circuit schematic of the BDCO with the WTRN



Fig. 6-8. DCO output frequency versus coarse codes in corners

# 6.1.4. SDM

To improve the resolution of the BDCO, a 4-bit 1<sup>st</sup>-order SDM is used to dither the least-significant bit (LSB). Fig. 6-9 shows its block diagram. It consists of a 4-bit adder and a register. With the SDM dithering, the resolution of the BDCO is effectively improved by 16 times. The parameters of the ADPLL are listed in Table 6-1 with a target of 400 MHz at 0.5V.



Fig. 6-9. Block diagram of the SDM.
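The behavior of such an adder-plus-register SDM can be illustrated with a short behavioral model: the 4-bit fractional word is accumulated every cycle, and the carry-out of the adder is the dithering bit. This is a minimal sketch of a generic first-order accumulator SDM under that assumption, not a gate-level description of the implemented circuit.

```python
def sdm_first_order(frac: int, n_cycles: int, bits: int = 4) -> list:
    """First-order SDM built from a 'bits'-wide adder and a register.

    Each cycle the fractional input is added to the register contents;
    the carry-out is the 1-bit dithering output and the sum (modulo
    2^bits) is stored back in the register."""
    mask = (1 << bits) - 1
    if not 0 <= frac <= mask:
        raise ValueError("fractional input must fit in the adder width")
    acc = 0
    carries = []
    for _ in range(n_cycles):
        total = acc + frac
        carries.append(total >> bits)   # carry-out toggles the LSB
        acc = total & mask              # register keeps the residue
    return carries


stream = sdm_first_order(frac=5, n_cycles=16)
print(stream, sum(stream))   # 5 ones in 16 cycles: average of 5/16 LSB
```

Over 16 cycles the number of carry pulses equals the fractional word, so the time-averaged control value gains 4 extra bits, i.e. the 16-times resolution improvement mentioned above.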

| Parameters                          |                                                                  |
|-------------------------------------|------------------------------------------------------------------|
| Loop bandwidth                      | 1.25MHz                                                          |
| DCO gain                            | 563kHz/code                                                      |
| Digital loop filter<br>coefficients | K<sub>P</sub>=2<sup>-1</sup>; K<sub>I</sub>=2<sup>-4</sup>       |
| TDC resolution                      | 20ps                                                             |
| Divider number                      | 16                                                               |

Table 6-1. Design parameters of the proposed ADPLL

# **6.2. Detailed Evaluation on BTRO**

# 6.2.1. Power Analysis of BTRO



Fig. 6-10. Power analysis of the BTRO

In a PLL, the oscillator consumes most of the power. Unlike an analog VCO, in which the constant biasing current dominates the power consumption, a DCO consumes no DC current. However, the dynamic power is a major concern, especially for the BTRO, due to its large output swing from  $-\beta V_C$  to  $\beta 2 V_C$ , where  $\beta$  is the boosting efficiency factor [15]. As shown in Fig. 6-10, the total capacitance at the node  $V_{out}$  is  $C_{OP}$  of this stage plus  $C_{IP}$  of the next stage. In addition,  $C_{INV}$  denotes the total capacitance at the output nodes of INV<sub>P</sub> and INV<sub>N</sub>, where the output swings from *GND* to  $V_C$ . As a result, the total dynamic power consumption of the 5-stage BTRO is

$$\begin{aligned}
P_{BTRO} &\approx 5f \left[ \left( C_{IP} + C_{OP} \right) \left( \beta 2V_C + \beta V_C \right)^2 + C_{INV} V_C^2 \right] \\
&\approx f \left[ 45\beta^2 \left( C_{IP} + C_{OP} \right) + 5C_{INV} \right] V_C^2.
\end{aligned}$$
(6-6)
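Eq. (6-6) is straightforward to evaluate numerically, as sketched below. The capacitance values and the boosting efficiency used in the example are hypothetical placeholders, not extracted values from the layout; the function name is illustrative.

```python
def btro_dynamic_power(f: float, v_c: float, beta: float,
                       c_ip: float, c_op: float, c_inv: float,
                       stages: int = 5) -> float:
    """Eq. (6-6): dynamic power of the BTRO in watts.

    The boosted nodes swing from -beta*V_C to 2*beta*V_C (3*beta*V_C in
    total), while the INV_P/INV_N outputs swing from GND to V_C."""
    swing = 3.0 * beta * v_c
    return stages * f * ((c_ip + c_op) * swing ** 2 + c_inv * v_c ** 2)


# Hypothetical example values (capacitances in farads).
p = btro_dynamic_power(f=480e6, v_c=0.5, beta=0.8,
                       c_ip=2e-15, c_op=2e-15, c_inv=2e-15)
print(f"{p * 1e6:.1f} uW")
```

The factor of  $45\beta^2$  on the boosted-node capacitance shows why minimizing  $C_{IP} + C_{OP}$  matters far more than minimizing  $C_{INV}$ .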

There are several leakage current paths in a bootstrapped delay cell. As shown in Fig. 6-10, taking  $V_{in}=\beta 2V_C$  as an example, one path is from the pre-charge node N<sub>BP</sub> to the output through M<sub>P2</sub>, and another is from ground to the boosted node through M<sub>N1</sub>. Since  $\beta 2V_C$  is applied to the gate of M<sub>P2</sub> and  $-V_C$  to that of M<sub>N1</sub>, all these transistors are biased with negative  $V_{GS}$. Similarly, the other two paths are in INV<sub>P</sub> and INV<sub>N</sub>. As a result, all leakage currents are reduced so significantly that they can be neglected.

### 6.2.2. Linearity Analysis of BTRO

For a VCO/DCO, tuning linearity is very important because it affects the tracking and locking behavior as well as the jitter performance. For the proposed 5-stage ring oscillator, the period is  $10T_D$ , where  $T_D$  is the single-stage delay. Since the rise and fall times are not exactly the same,  $T_D$  can be represented as

$$T_D = 0.5 \left( \tau_{PHL_C} + \tau_{PLH_C} \right).$$
(6-7)

Here  $\tau_{PHL_C}$  and  $\tau_{PLH_C}$  are the propagation delays measured from the time the input changes to the time the output makes the corresponding transition from H to L and from L to H, respectively. The linearity can be analyzed based on  $\tau_{PHL_C}$ . We take a 5-stage inverter-based VCO as an example. Assume that the characteristics of the PMOS and NMOS are very similar and that  $C_L$  is the effective load capacitance at the output node of a single stage. As shown in Fig. 6-11,  $C_L$  is discharged by the NMOS with  $V_{GS}=V_{DD}$ . Since the VCO operates in the near-threshold region, the maximum  $V_{DD}$  is 0.5V. According to the state equation,  $\tau_{PHL_C}$  can be expressed as the integral in (6-8).

$$\tau_{PHL_C} = \int_{0.5V_{DD}}^{V_{DD}} \frac{C_{L}}{I_{DN}} dV_{out} .$$
(6-8)



Fig. 6-11. Delay time calculation for an inverter-based ring oscillator.

According to the switching characteristics, the switching operation consists of two intervals separated by the threshold voltage  $V_{th}$  [61]. The switching operation at a near-threshold supply is either in saturation, with  $V_{DD}$  above the threshold voltage, or in sub-threshold, with  $V_{DD}$  below the threshold voltage. Thus, we can rewrite (6-8) as

$$\tau_{PHL_C} = \tau_{PHL_C,Sat} + \tau_{PHL_C,Sub} \,. \tag{6-9}$$

When the ring oscillator operates above the threshold voltage, the NMOS conducts a saturation current, as expressed in (6-10) [62]. Thus, we can derive  $\tau_{PHL_C,Sat}$  as in (6-11) according to the I-V equation in the saturation region.

$$I_{D,Sat} = \frac{1}{2} \mu C_{ox} \frac{W}{L} (V_{DD} - V_{th})^2 (1 + \lambda V_{DD}).$$
(6-10)

$$\tau_{PHL_C,Sat} = 2C_L \cdot \ln \left| \frac{1 + \lambda V_{DD}}{1 + \lambda V_{th}} \right| \cdot \left[ \mu C_{ox} \frac{W}{L} (V_{DD} - V_{th})^2 \right]^{-1}.$$
(6-11)

Here  $\mu$  is the effective mobility;  $C_{ox}$  is the gate oxide capacitance per unit area; W and L are the width and length of the device;  $V_{th}$  is the threshold voltage; and  $\lambda$  is the channel-length modulation factor. On the other hand, when the VCO operates below the threshold voltage, the sub-threshold current in (6-12) applies, and  $\tau_{PHL_C,Sub}$  is written as in (6-13).

$$I_{Sub} = \mu C_{dep} \frac{W}{L} V_T^2 \exp(\frac{V_{DD} - V_{th}}{nV_T}) \left( 1 - \exp(\frac{-V_{DD}}{V_T}) \right).$$
(6-12)

$$\tau_{PHL_C,Sub} = C_L \left[ \left( V_{th} - \frac{V_{DD}}{2} \right) + V_T \ln \left| \frac{1 - \exp(\frac{-V_{DD}}{2V_T})}{1 - \exp(\frac{-V_{th}}{V_T})} \right| \right]$$

$$\cdot \left[ \mu C_{dep} \frac{W}{L} V_T^2 \cdot \exp(\frac{V_{DD} - V_{th}}{nV_T}) \right]^{-1}.$$
(6-13)

Here  $C_{dep}$  is the depletion capacitance;  $V_T$  is the thermal voltage; and *n* is the sub-threshold slope factor. Clearly, the gate delay characteristics of the inverter-based ring oscillator fall into two different regions. According to (6-11) and (6-13), the delay in neither region is proportional to the reciprocal of  $V_{DD}$ . As a result, the inverter VCO is not a linear supply-regulated VCO.
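The difference between the two operating regions can be made concrete by evaluating the current expressions (6-10) and (6-12) over a range of supply voltages. The sketch below is only illustrative: `K`, `K_DEP`, `LAM`, and `N` are hypothetical device parameters, and only the NMOS threshold of 0.24V is taken from the test-chip summary.

```python
import math

V_T = 0.0259  # thermal voltage near room temperature, in volts


def i_sat(v_dd, k, v_th, lam):
    """Saturation current per (6-10); k stands for mu*C_ox*W/L."""
    return 0.5 * k * (v_dd - v_th) ** 2 * (1.0 + lam * v_dd)


def i_sub(v_dd, k_dep, v_th, n):
    """Sub-threshold current per (6-12); k_dep stands for mu*C_dep*W/L."""
    return (k_dep * V_T ** 2 * math.exp((v_dd - v_th) / (n * V_T))
            * (1.0 - math.exp(-v_dd / V_T)))


# Hypothetical device parameters, for illustration only.
K, K_DEP, V_TH, LAM, N = 200e-6, 100e-6, 0.24, 0.1, 1.4

for v_dd in (0.10, 0.15, 0.20, 0.30, 0.40, 0.50):
    below = v_dd < V_TH
    i_d = i_sub(v_dd, K_DEP, V_TH, N) if below else i_sat(v_dd, K, V_TH, LAM)
    region = "sub-threshold" if below else "saturation"
    print(f"V_DD = {v_dd:.2f} V  {region:13s}  I_D ~ {i_d:.3e} A")
```

Below the threshold the current grows exponentially with  $V_{DD}$ , while above it the growth is roughly quadratic, so a single supply-to-frequency slope cannot describe the inverter-based oscillator across the near-threshold range.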

Compared with the inverter VCO, the BTRO features boosted swings from  $-\beta V_{DD}$  to  $\beta 2V_{DD}$  that push INV<sub>P</sub> and INV<sub>N</sub> to operate in the triode region. The driving current is given in (6-14). The propagation delay of the falling edge,  $\tau_{PHL_BT}$ , is illustrated in Fig. 6-12. We can derive  $\tau_{PHL_BT}$  in (6-15) from (6-8) and (6-14).



Fig. 6-12. Delay time calculation for the BTRO.

$$I_{D,BT} = \mu C_{ox} \frac{W}{L} \left[ (\beta 2 V_{DD} - V_{th}) V_{DD} - \frac{1}{2} V_{DD}^2 \right]$$
(6-14)

$$\tau_{PHL_BT} = C_L \cdot \ln \left| \frac{(8\beta - 1)V_{DD} - V_{th}}{(4\beta - 1)V_{DD} - V_{th}} \right| \cdot \left[ \mu C_{ox} \frac{W}{L} (2\beta V_{DD} - V_{th}) \right]^{-1}$$
(6-15)

Thus, we can obtain the period of the BTRO from  $\tau_{PHL_BT}$  and  $\tau_{PLH_BT}$ . According to (6-15), the delay of the BTRO is approximately proportional to the reciprocal of  $(2\beta V_{DD} - V_{th})$ , so its frequency is nearly proportional to  $(2\beta V_{DD} - V_{th})$ . This makes the BTRO suitable for a supply-regulated VCO in the near-threshold region.
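To get a feel for the tuning curve implied by (6-15), the following minimal sketch evaluates the expression as written and converts it to an oscillation frequency. It assumes  $\tau_{PLH_BT}=\tau_{PHL_BT}$  so that the period is  $10\tau_{PHL_BT}$ ; the values of `beta`, `k` (standing for  $\mu C_{ox}W/L$ ), and `c_l` are hypothetical, and only the 0.24V NMOS threshold comes from the test-chip summary.

```python
import math


def tau_phl_bt(v_dd, beta=0.9, v_th=0.24, k=200e-6, c_l=10e-15):
    """Falling-edge delay of one BTRO stage, evaluating (6-15) as written."""
    num = (8 * beta - 1) * v_dd - v_th
    den = (4 * beta - 1) * v_dd - v_th
    return c_l * math.log(abs(num / den)) / (k * (2 * beta * v_dd - v_th))


def btro_frequency(v_dd, stages=5, **params):
    """Oscillation frequency, assuming tau_PLH = tau_PHL (period = 2*stages*tau)."""
    return 1.0 / (2 * stages * tau_phl_bt(v_dd, **params))


for v_dd in (0.2, 0.3, 0.4, 0.5, 0.6):
    print(f"V_DD = {v_dd:.1f} V -> f ~ {btro_frequency(v_dd) / 1e6:.0f} MHz")
```

With these placeholder parameters the frequency rises in nearly equal steps for equal steps in  $V_{DD}$ , because the logarithmic factor in (6-15) changes only slowly compared with the  $(2\beta V_{DD}-V_{th})$  term.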

As a design example of a 5-stage supply-regulated VCO, the VCO transfer curves at 25°C in different process corners are shown in Fig. 6-13. Compared with the inverter VCO, the BTRO has higher linearity in the near-threshold region and is less affected by process variation.



Fig. 6-13. Comparison of the VCO transfer curves under supply regulation.

### **6.3. Experimental Results and Comparisons**

# 6.3.1. Chip Implementation

The proposed ADPLL has been fabricated in a 90nm 1P9M SPRVT CMOS process. The test chip includes two test circuits, the proposed BTRO and the ADPLL. Figure 6-14 shows the block diagram of the test circuits. Multi-stage bootstrapped level shifters with an intermediate supply voltage  $V_{M\_VO}$  are used for driving open-drain devices. Figure 6-15 shows the chip micrograph. The active areas of the BTRO and the ADPLL are 31.5 µm×61.5 µm and 326 µm×175 µm, respectively. The test chip is mounted on an FR4 test board with SMA connectors, as shown in Fig. 6-16. An Agilent 81130A pulse generator provides the reference clock; an Agilent 54382D is used to measure the output waveforms and their jitter performance. A Keithley 2400 power meter provides DC power and measures the power consumption. Phase noise was measured using an Agilent E4440A spectrum analyzer.







Fig. 6-15. Micrograph of the test chip.



Fig. 6-16. Photo of the FR4 test board.

# 6.3.2. Measured Results

Figures 6-17(a) and 6-17(b) show the measured output waveforms of the BTRO at 0.6V and 0.2V, respectively. The detailed frequency and power of the BTRO versus  $V_{DD}$  from 0.2V to 0.6V are plotted in Fig. 6-18. These measured results match the simulated ones in the TT corner. In terms of oscillation frequency versus supply voltage, the BTRO behaves relatively linearly in the near-threshold region.




Fig. 6-17. Measured output waveforms of the BTRO at (a) 0.6V  $V_{DD}$ ; (b) 0.2V  $V_{DD}$ .



Fig. 6-18. Comparison of measured and simulated results.

A locked clock waveform at 400 MHz is illustrated in Fig. 6-19. The measured jitter histogram shows that the output rms jitter and peak-to-peak jitter are 9.37ps and 69.1ps, respectively. The output frequency range of the proposed ADPLL is from 36.8 MHz to 480 MHz under a supply voltage of 0.25 to 0.5V. Figures 6-20 and 6-21 show the measured output spectrum and phase noise at 0.5V and 0.25V  $V_{DD}$ , respectively. With a reference of 30 MHz (2.3 MHz), the measured spur at 480 MHz (36.8 MHz) under a 0.5V (0.25V)  $V_{DD}$  is 42.5dB (39.9dB) below the carrier. The phase noise is -96.2dBc/Hz (-91.6dBc/Hz) at 1 MHz offset and -79.9dBc/Hz (-78.1dBc/Hz) at 10kHz offset when the output frequency is 480 MHz (36.8 MHz). Table 6-2 summarizes the major characteristics of the test chip.
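The reported numbers can be cross-checked with simple arithmetic: the locked output frequency should equal the reference frequency times the divider number of 16 listed in Table 6-1, and the jitter can be expressed as a fraction of the output period. The short sketch below does both; the helper names are illustrative only.

```python
DIVIDER = 16  # divider number from Table 6-1


def locked_output_mhz(f_ref_mhz, divider=DIVIDER):
    """Output frequency of an integer-N loop in lock, in MHz."""
    return f_ref_mhz * divider


def jitter_in_ui(jitter_ps, f_out_mhz):
    """Jitter expressed as a fraction of one output period (unit interval)."""
    period_ps = 1e6 / f_out_mhz
    return jitter_ps / period_ps


for f_ref in (30.0, 2.3):
    print(f"f_ref = {f_ref} MHz -> f_out = {locked_output_mhz(f_ref):.1f} MHz")

# Jitter measured at the 400 MHz locked output (period = 2.5 ns).
print(f"rms jitter = {100 * jitter_in_ui(9.37, 400.0):.2f}% UI")
print(f"p-p jitter = {100 * jitter_in_ui(69.1, 400.0):.2f}% UI")
```

The two reference frequencies reproduce the 480 MHz and 36.8 MHz endpoints of the lock range, and the rms jitter stays well below 1% of the output period at 400 MHz.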



Fig. 6-19. Measured output waveform of the proposed ADPLL.

Fig. 6-20. Measured spectrum (reference spur) and phase noise of the proposed ADPLL at 0.5V.



Fig. 6-21. Measured spectrum and phase noise of the proposed ADPLL at 0.25V.

| Parameter                    | Value                                                            |
|------------------------------|------------------------------------------------------------------|
| Process                      | 90nm 1P9M SPRVT                                                  |
| V<sub>th</sub>               | NMOS: 240mV; PMOS: 180mV                                         |
| Core supply voltage          | 0.25V to 0.5V                                                    |
| Output frequency             | 36.8MHz to 480MHz                                                |
| Power                        | 2.8µW @44.8MHz, 0.25V; 78µW @480MHz, 0.5V                        |
| Phase noise<br>@1MHz offset  | -87.1dBc/Hz @44.8MHz, 0.25V; -96.2dBc/Hz @480MHz, 0.5V           |
| Jitter (RMS)                 | 7.8 to 21.5ps over all operating conditions                      |
| Layout area (BTRO)           | 31.5µm × 61.5µm                                                  |
| Layout area (ADPLL)          | 326µm × 175µm                                                    |

#### **TABLE 6-2 Test Chip Summary**

# 6.3.3. Comparisons

**TABLE 6-3 Performance Comparisons of Low-voltage oscillators** 

|                              | JSSC'05<br>[63]         | JSSC'08<br>[64]    | TCAS-II'09<br>[65]      | TCAS-I'10<br>[66]    | TCAS-I'11<br>[53]       | BTRO                   |
|------------------------------|-------------------------|--------------------|-------------------------|----------------------|-------------------------|------------------------|
| Process                      | 180 nm                  | 65 nm              | 130 nm                  | 180 nm               | 90 nm                   | 90 nm                  |
| Supply voltage<br>(V)        | 0.5                     | 0.5                | 0.5                     | 0.6                  | 0.5                     | 0.2-0.6                |
| OSC-type                     | LC-VCO                  | Ring-DCO           | Ring-VCO                | LC-VCO               | Ring-VCO                | Ring-VCO               |
| Output phase                 | 2                       | N/A                | 6                       | 4                    | 8                       | 10                     |
| Tuning range                 | 3.65-3.97 GHz           | 0.09-1.25 GHz      | 306-725 MHz             | 2.4-2.64 GHz         | 0.4-2.24 GHz            | 48-771 MHz             |
| Phase noise<br>@1 MHz offset | -119 dBc/Hz<br>@3.8 GHz | N/A                | -95 dBc/Hz<br>@550 MHz  | N/A                  | -87 dBc/Hz<br>@2.24 GHz | -89 dBc/Hz<br>@771 MHz |
| Power                        | 570 µW<br>@3.8 GHz      | 0.9 mW<br>@1.0 GHz | 210 µW<br>@550 MHz      | 10.8 mW<br>@2.64 GHz | 1.157 mW<br>@2.24 GHz   | 87.6 µW<br>@771 MHz    |
| Area                         | 0.23 mm<sup>2</sup>     | N/A                | 0.017 mm<sup>2</sup>    | N/A                  | 0.0017 mm<sup>2</sup>   | 0.0019 mm<sup>2</sup>  |
| *Figure of merit             | 0.15 pJ                 | 0.9 pJ             | 0.382 pJ                | 4.09 pJ              | 0.517 pJ                | 0.114 pJ               |

\* Figure of merit (FoM) =  $\frac{Power (\mu W)}{Freq. (MHz)}$  = Energy per cycle (pJ)
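The figure of merit is a direct ratio, since 1 µW divided by 1 MHz equals 1 pJ per cycle. As a minimal sketch, the snippet below recomputes the FoM row of TABLE 6-3 from the power and frequency entries listed in the table; the function name is illustrative.

```python
def energy_per_cycle_pj(power_uw, freq_mhz):
    """FoM of TABLE 6-3: power (uW) / frequency (MHz) = energy per cycle (pJ)."""
    return power_uw / freq_mhz


# (power in uW, frequency in MHz), taken from the Power row of TABLE 6-3.
entries = {
    "[63]": (570.0, 3800.0),
    "[64]": (900.0, 1000.0),
    "[65]": (210.0, 550.0),
    "[66]": (10800.0, 2640.0),
    "[53]": (1157.0, 2240.0),
    "BTRO": (87.6, 771.0),
}

for name, (p_uw, f_mhz) in entries.items():
    print(f"{name}: {energy_per_cycle_pj(p_uw, f_mhz):.3f} pJ")
```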

To compare the performance of the VCOs/DCOs, TABLE 6-3 lists the results together with those of some reported oscillators. The BTRO is able to operate at a supply voltage of only 0.2V. Additionally, the measured energy per cycle indicates that the BTRO is power efficient. TABLE 6-4 summarizes recent state-of-the-art PLLs using a near-threshold supply. The previous works [7, 21] achieve excellent phase noise with an LC-VCO. However, these designs occupy a large die area because of the passive resonant elements and provide only two or four output phases. In contrast, ring-VCO PLLs are area-efficient and provide more output phases, but have inherently inferior phase noise. The proposed ADPLL provides 10 output phases and consumes 78  $\mu$W at 480 MHz under a  $V_{DD}$  of 0.5V, of which the DCO accounts for 53.8%. The proposed design can work even at  $V_{DD} = 0.25$ V with a lock range of 36.8 to 44.8MHz. In terms of the *figure of merit* (FoM) in pJ/cycle, the proposed design achieves almost an order of magnitude improvement.

|                                         | ISSCC'07<br>[55]                 | JSSC'08<br>[64]     | TCAS-I'10<br>[66]   | JSSC'10<br>[67]    | TCAS-I'11<br>[53]   | This work           |                    |
|-----------------------------------------|----------------------------------|---------------------|---------------------|--------------------|---------------------|---------------------|--------------------|
| Process                                 | 90 nm                            | 65 nm               | 180 nm              | 130 nm             | 90 nm               | 90 nm               |                    |
| Supply<br>voltage (V)                   | Analog: 0.5<br>Digital: 0.65     | 0.5                 | 0.6                 | 0.6-1.6            | 0.5                 | 0.25                | 0.5                |
| Oscillator type                         | LC-VCO                           | Ring-DCO            | LC-VCO              | Ring-DCO           | Ring-VCO            | Ring-DCO            |                    |
| Output phase                            | 2                                | N/A                 | 4                   | N/A                | 8                   | 10                  |                    |
| Operating<br>frequency                  | 2.4-2.6 GHz                      | 0.09-1.25<br>GHz    | 2.4-2.64<br>GHz     | 10-500<br>MHz      | 0.4-2.24<br>GHz     | 36.8-44.8<br>MHz    | 0.176-0.48<br>GHz  |
| Power (mW)                              | 6                                | 1.65<br>@1.0 GHz    | 14.4<br>@2.5 GHz    | 7.2<br>@0.5 GHz    | 2.08<br>@2.24 GHz   | 0.0024<br>@36.8 MHz | 0.078<br>@0.48 GHz |
| RMS jitter<br>(ps)                      | N/A                              | 3<br>@1.0 GHz       | N/A                 | 39<br>@191 MHz     | 2.22<br>@2.24 GHz   | 7.8<br>@36.8 MHz    | 10.8<br>@0.48 GHz  |
| Reference<br>spur (dBc)                 | -52<br>@2.6 GHz                  | N/A                 | -39.83<br>@2.56 GHz | N/A                | -40.28<br>@2.24 GHz | -39.9<br>@36.8 MHz  | -42.5<br>@0.48 GHz |
| Area (mm <sup>2</sup> )                 | 0.14                             | 0.03                | 1.68<br>(w/i pads)  | 0.09               | 0.074               | 0.057               |                    |
| Phase noise<br>(dBc/Hz)<br>@1MHz offset | -121<br>@2.6 GHz<br>@3MHz offset | N/A                 | -105<br>@2.56 GHz   | N/A                | -87<br>@2.24 GHz    | -91.6<br>@36.8 MHz  | -96.2<br>@0.48 GHz |
| FoM (pJ)                                | 2.4                              | 1.65                | 5.76                | 14.4               | 0.93                | 0.065               | 0.163              |

TABLE 6-4 Comparisons of low-voltage PLLs.

# 6.4. Conclusions

Conventional PLLs face challenges when scaled to a near-threshold supply. The VCO (DCO) consumes most of the power in a PLL (ADPLL), and its performance degrades severely at a near-threshold supply. In this chapter, the proposed BTRO achieves high linearity and energy efficiency under a supply voltage of 0.2-0.6V. In addition, we present a near-threshold ADPLL with the BDCO, which operates at 36.8 to 480MHz under a 0.25-0.5V supply with a power consumption of only 2.4 to  $78\mu$W. Compared with reported low-voltage analog PLLs and ADPLLs, the proposed ADPLL provides 10 phases, consumes less power, and is more energy efficient.

# **Chapter 7**

# Conclusions

This dissertation completes a near-threshold on-chip data link, which is composed of an ALBI for the clock network, an ISBD for the repeaters of the on-chip bus, high-boosting pre-drivers, and an ADPLL serving as a local oscillator.

The first work presents an ALBI operated with a sub-threshold power supply. In addition to improving the driving ability, a large gate voltage swing from  $-V_{DD}$  to  $2V_{DD}$  suppresses the sub-threshold leakage current. Compared with other reported works, the proposed bootstrapped inverter has fewer transistors operating in the sub-threshold region and therefore a shorter delay time. The Monte Carlo analysis results indicate that the standard deviation (sigma) of the delay time is only 2.9ns under process variation at 0.2V operation. Additionally, a test chip is fabricated in a 90nm SPRVT Low-K CMOS process. Chip measurement results demonstrate the feasibility of operating 10-stage bootstrapped inverters with a 200fF load at each stage at a power supply of 0.2V. The test chip achieves 10MHz operation under 0.2V; the power consumption is 1.01 $\mu$W; and the leakage power is 107nW.

The second work presents a 40-123 fJ/bit/ch on-chip data link design under a 0.1-0.3V power supply. An ISBD is proposed to drive a 10mm on-chip bus. It features a  $-V_{DD}$  to  $2V_{DD}$  swing to enhance the driving capability and reduce the sub-threshold leakage current. Additionally, a pre-charge enhancement scheme increases the speed of the data transmission, and a leakage current reduction technique suppresses ISI jitter. A test chip is fabricated in a 55nm SPRVT Low-K CMOS process. The measured results demonstrate that for a 10mm on-chip bus, the achievable data rate is 0.8–100Mbps, and the energy consumption is 40–123fJ per bit under 0.1–0.3V.

The third work investigates the performance of interconnects with repeater insertion in the sub-threshold region. A CMOS repeater with a 3X or 4X pre-driver is proposed to enhance the driving capability. Compared with the conventional repeater, the proposed ones have higher energy efficiency. A test chip with 3X and 4X pre-drivers for a 10-mm on-chip bus has been fabricated in a 65nm SPRVT CMOS process. The measured results show that the 3X (4X) pre-driver achieves a data rate of 5Mbps (1.5Mbps) at 0.15V with an energy consumption of 35.2fJ (32.8fJ) per bit.

The last work presents a low-power bootstrapped ring oscillator (BTRO) and a near-threshold low-power all-digital PLL. Since the oscillator is the most power-hungry building block in a PLL, a BTRO is developed to operate at 0.2-0.6V. In addition, the BTRO provides high linearity in near-threshold operation. Owing to the boosted voltage swing, it achieves 771MHz under a supply of 0.6V and consumes only 87.6µW. Accordingly, a 9-bit bootstrapped DCO (BDCO) composed of a BTRO and a weighted thermometer-controlled resistance network is proposed. To improve the resolution of the BDCO, a 4-bit sigma-delta modulator is used for dithering. The BDCO is applied to a low-power ADPLL fabricated in a 90nm SPRVT Low-K CMOS process. The core area without output buffers is 0.057mm<sup>2</sup>. The measured results demonstrate that the proposed bootstrapped ring oscillator oscillates at 48 MHz (771MHz) with a power consumption of  $0.63\mu$W (87.6µW) under a supply voltage of 0.2V (0.6V). Furthermore, the measured results also demonstrate that the proposed ADPLL operates from 36.8 to 480MHz with a power consumption of 2.4-78µW under a supply voltage of 0.25-0.5V.



# References

- [1] A. Wang and A. P. Chandrakasan, "A 180-mV subthreshold FFT processor using a minimum energy design methodology," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 1, pp. 310-319, Jan. 2005.
- [2] J. Wang, J. Chen, Y. Wang, and C. Yeh, "A 230 mV-to-500 mV 375 KHz-to-16 MHz 32b RISC core in 0.18 μm CMOS," *in IEEE Int. Solid-State Circuits Conf. (ISSCC) Digest of Tech. Papers*, Feb. 2007, pp. 294-604.
- [3] M. H. Tu, J. Y. Lin, M. C. Tsai, S. J. Jou, and C. T. Chuang, "Single-Ended Subthreshold SRAM With Asymmetrical Write/Read-Assist," *IEEE Trans. on Circuits and Systems I: Regular Papers*, vol. 57, no. 12, pp. 3039-3047, Dec. 2010.
- [4] D.C. Daly, and A.P. Chandrakasan," A 6-bit, 0.2 V to 0.9 V highly digital flash ADC with comparator redundancy," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 11, pp. 3030-3038, Nov. 2009.
- [5] W. H. Ma, J. C. Kao, V. S. Sathe, and M. C. Papaefthymiou, "187 MHz sub-threshold-supply charge-recovery FIR," *IEEE Journal of Solid-State Circuits*, vol. 45, no. 4, pp. 793-803, Apr. 2010.
- [6] K. Roy, S. Mukhopadhyay, and H. M. Meimand, "Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits," *Proceeding of the IEEE*, vol. 91, no. 2, pp. 305-327, Feb. 2003.
- [7] C. Y. Lu, and J. M. Sung, "Reverse short-channel effects on threshold voltage in sub-micrometer salicide devices," *IEEE Trans. Electron Device Letters*, vol. 10, no. 10, pp. 446-448, Jan. 1989.
- [8] T. H. Kim, J. Keane, H. Eom, and C. H. Kim, "Utilizing reverse short-channel effect for optimal subthreshold circuit design," *IEEE Trans. Very Large Scale Integration (VLSI) Systems*, vol. 15, no. 7, pp. 821-829, Jul. 2004.
- [9] S. Hanson, M. Seok, D. Sylvester, and D. Blaauw, "Nanometer device scaling in sub-threshold logic and SRAM," *IEEE Trans. Electron Devices*, vol. 55, no. 1, pp. 175-185, Jan. 2008.
- [10] X. Yuan, J. E. Park, J. Wang, E. Zhao, D. C. Ahlgren, T. Hook, J. Yuan, V. W. C. Chan, H. Shang, C. H. Liang, R. Lindsay, S. Park, and H. Choo, "Gate-induced-drain leakage current in 45 nm CMOS technology," *IEEE Trans. Device and Materials Reliability*, vol. 8, no. 3, pp. 501-508, Sep. 2008.

- [11] D. Lee, D. Blaauw, and D. Sylvester, "Gate oxide leakage current analysis and reduction for VLSI circuits," *IEEE Trans. Very Large Scale Integration (VLSI) Systems*, vol. 12, no. 2, pp. 155-166, Feb. 2004.
- [12] N. Verma, and A. P. Chandrakasan, "A 256 kb 65 nm 8T subthreshold SRAM employing sense-amplifier redundancy," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 141-149, Jan. 2008.
- [13] D. Bol, R. Ambroise, D. Flandre, and J. D. Legat, "Interests and limitations of technology scaling for subthreshold logic," *IEEE Trans. Very Large Scale Integration (VLSI) Systems*, vol. 17, no. 10, pp. 1508-1519, Oct. 2009.
- [14] S. S. Sapatnekar, "Overcoming variations in nanometer-scale technologies," *IEEE J. Emerging and Selected Topics in Circuits and Systems*, vol. 1, no. 1, pp. 5-18, Mar. 2011.
- [15] S. R. Vemuru, "Effects of simultaneous switching noise on the tapered buffer design," *IEEE Trans. Very Large Scale Integration (VLSI) Systems*, vol. 5, no. 3, pp. 290-300, Sep. 1997.
- [16] J. H. Lou and J. B. Kuo, "A 1.5-V full-swing bootstrapped CMOS large capacitive-load driver circuit suitable for low-voltage CMOS VLSI," *IEEE J. Solid-State Circuits*, vol. 32, no. 1, pp. 119-121, Jan. 1997.
- [17] J. Kil, J. Gu, and C. H. Kim, "A high-speed variation-tolerant interconnect technique for sub-threshold circuits using capacitive boosting," *IEEE Trans. Very Large Scale Integration (VLSI) Systems*, vol. 16, no. 4, pp. 456-465, Apr. 2008.
- [18] T. D. Burd, T. A. Pering, A. J. Stratakos, and R. W. Brodersen, "A dynamic voltage scaled microprocessor system," *IEEE J. Solid-State Circuits*, vol. 35, no. 11, pp. 1571–1580, Nov. 2000.
- [19] S. Das, C. Tokunaga, S. Pant, W. Ma, S. Kalaiselvan, K. Lai, D. M. Bull, and D. Blaauw, "RazorII: in situ error detection and correction for pvt and ser tolerance," *IEEE J. Solid-State Circuits*, vol. 44, no. 1, pp. 32–48, Jan. 2009.
- [20] A. Drake, R. Senger, H. Deogun, G. Carpenter, S. Ghiasi, T. Ngyugen, N. James, and M. Floyd, "A distributed critical-path timing monitor for a 65 nm high-performance microprocessor," *in IEEE Int. Solid-State Circuits Conf. (ISSCC) Digest of Tech. Papers*, Feb. 2007, pp. 398–399.

- [21] M. Nakai, S. Akui, K. Seno, T. Meguro, T. Seki, T. Kondo, A. Hashiguchi, H. Kawahara, K. Kumano, and M. Shimura, "Dynamic voltage and frequency management for a low power embedded microprocessor," *IEEE J. Solid-State Circuits*, vol. 40, no. 1, pp. 28–35, Jan. 2005.
- [22] J. T. Kao and A. P. Chandrakasan, "Dual-threshold voltage techniques for low-power digital circuits," *IEEE J. Solid-State Circuits*, vol. 35, no. 7, pp. 1009-1018, Jul. 2000.
- [23] Y. Pu, J. P. Gyvez, H. Corporaal, and Y. Ha, "An ultra-low-energy multi-standard JPEG co-processor in 65 nm CMOS with sub/near threshold supply voltage," *IEEE J. Solid-State Circuits*, vol. 45, no. 3, pp. 668-680, Jan. 2010.
- [24] S. Mutoh, S. Shigematsu, Y. Matsuya, H. Fukuda, and J. Yamada, "A 1 V multi-threshold voltage CMOS DSP with an efficient power management technique for mobile phone application," *in IEEE Int. Solid-State Circuits Conf. (ISSCC) Digest of Tech. Papers*, Feb. 1996, pp. 168–171.
- [25] J. M. Carrillo, G. Torelli, R. Perez-Aloe, and J. F. Duque-Carrillo, "1-V rail-to-rail CMOS Opamp with improved bulk-driven input stage," *IEEE J. Solid-State Circuits*, vol. 42, no. 3, pp. 508-517, Mar. 2007.
- [26] J. T. Kao, M. Miyazaki, and A. P. Chandrakasan, "A 175-mV multiply-accumulate unit using an adaptive supply voltage and body bias architecture," *IEEE J. Solid-State Circuits*, vol. 37, no. 11, pp. 1545-1554, Nov. 2002.
- [27] Y. L. Lo, W. B. Yang, T. S. Chao, and K. H. Cheng, "Designing an Ultralow-Voltage Phase-Locked Loop Using a Bulk-Driven Technique," *IEEE Trans. Circuits and Syst. II*, vol. 56, no. 5, pp. 339-343, May 2009.
- [28] B. Razavi, Design of Analog CMOS Integrated Circuits. New York: McGraw-Hill, 2001.
- [29] K. Banerjee and A. Mehrotra, "A Power-Optimal Repeater Insertion Methodology for Global Interconnects in Nanometer Designs," IEEE Trans. Electron Devices, vol. 49, no. 11, pp. 2001-2007, Nov. 2002.
- [30] M.L. Mui, K. Banerjee and A. Mehrotra, "A Global Interconnect Optimization Scheme for Nanometer Scale VLSI with Implications for Latency, Bandwidth, and Power Dissipation," *IEEE Trans. Electron Devices*, vol. 51, no. 2, pp. 195-203, Feb. 2004.

- [31] L. Xiao-Chun, and et al., "Global Interconnect Width and Spacing Optimization for Latency, Bandwidth and Power Dissipation," IEEE Trans. Electron Devices, vol. 52, no. 10, pp. 2272-2279, Oct. 2005.
- [32] H. Kaul, D. Sylvester, D. Blaauw, T. Mudge, and T. Austin, "DVS for On-chip Bus Designs Based on Timing Error Correction," *in Proceeding of Design, Automation and Test in Europe*, vol. 1, pp. 80-85, Mar. 2005.
- [33] V.V. Deodhar and J.A. Davis, "Optimization of Throughput Performance for Low-Power VLSI Interconnects," IEEE Trans. Very Large Scale Integration Systems, vol. 13, no. 3, pp. 308-318, Mar. 2005.
- [34] Serial ATA International Organizations, "Serial ATA Revision 2.5," October 2005.
- [35] L. Chong-Fatt, Y. Kiat-Seng, and S. S. Rofail, "Sub-1V bootstrapped CMOS driver for giga-scale-integration era," *Electronics Letters*, vol. 35, no. 5, Mar. 1999.
- [36] J. C. Garcia, J. A. Montiel-Nelson, and S. Nooshabadi, "A single-capacitor bootstrapped power-efficient CMOS driver," *IEEE Trans. on Circuits and Systems II: Express Briefs*, vol. 53, no. 9, pp. 877-881, Sep. 2006.
- [37] J. W. Kim, and B. S. Kong, "Low-voltage bootstrapped CMOS drivers with efficient conditional bootstrapping," *IEEE Trans. on Circuits and Systems II: Express Briefs*, vol. 55, no. 6, pp. 556-560, Jun. 2008.
- [38] S. Hanson, M. Seok, D. Sylvester, and D. Blaauw, "Nanometer device scaling in sub-threshold logic and SRAM," *IEEE Trans. on Electron Devices*, vol. 55, pp. 175-185, no. 1, Jan. 2008.
- [39] B. H. Calhoun, A. Wang, and A. P. Chandrakasan, "Modeling and sizing for minimum energy operation in subthreshold circuits," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 9, pp. 1178-1186, Jan. 2005.
- [40] H. J. M. Veendrick, "Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits," *IEEE Journal of Solid-State Circuits*, vol. sc-19, no. 4, pp. 468-473, Aug. 1984.
- [41] V. V. Deodhar and J. A. Davis, "Optimal voltage scaling, repeater insertion, and wire sizing for wave-pipelined global interconnects," *IEEE Trans. on Circuits and Systems I: Regular Papers*, vol. 55, no. 4, pp. 1023-1030, May 2008.

- [42] R. Ho, T. Ono, R. D. Hopkins, A. Chow, J. Schauer, F. Y. Liu, and R. Drost, "High speed and low energy capacitively driven on-chip wires," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 1, pp. 52–60, Jan. 2008.
- [43] E. Mensink, D. Schinkel, E. A. M. Klumperink, E. van Tuijl, and B.Nauta, "Power Efficient Gigabit Communication Over Capacitively Driven RC-Limited On-Chip Interconnects," *IEEE Journal of Solid-State Circuits*, vol. 45, no. 2, pp. 447-457, Feb. 2010.
- [44] Y. Zhang, X. Hu, A. Deutsch, A. E. Engin, J. F. Buckwalter, and C. K. Cheng, "Prediction and Comparison of High-Performance On-Chip Global Interconnection," *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, vol. 19, no. 7, pp. 1154-1166, Jul. 2011.
- [45] Y. Ho, C. Chang and C. Su, "Design of a Sub-threshold-Supply Bootstrapped CMOS Inverter Based on an Active Leakage Current Reduction Technique," *IEEE Trans. on Circuits System. II*, vol. 59, no.1, pp. 55-59, Jan. 2012.
- [46] International Technology Roadmap for Semiconductors (2006). Available: http://public.itrs.net/
- [47] Y. T. Lin, Y. S. Lin, C. H. Chen, H. C. Chen, Y. C. Yang, and S. S. Lu, "A 0.5-V biomedical system-on-a-chip for intrabody communication system," *IEEE Trans. Industrial Electronics*, vol. 58, no. 2, pp. 690-699, Feb. 2011.
- [48] J. Kwong, and A. P. Chandrakasan, "An energy-efficient biomedical signal processing platform," *IEEE J. Solid-State Circuits*, vol. 46, no. 7, pp. 1742-1753, Jul. 2011.
- [49] J. Shen and P. R. Kinget, "A 0.5V 1.1MS/sec 6.3fJ/conversion-step SAR-ADC with tri-level comparator in 40nm CMOS," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 787-795, Apr. 2008.
- [50] A. Shikata, R. Sekimoto, T. Kuroda, and H. Ishikuro, "A 0.5V 1.1MS/sec 6.3fJ/conversion-step SAR-ADC with tri-level comparator in 40nm CMOS," *in Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2011, pp. 262–263.
- [51] H. H. Hsieh, C. T. Lu, and L. H. Lu, "A 0.5-V 1.9-GHz low-power phase-locked loop in 0.18-um CMOS," in Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2007, pp. 164–165.
- [52] S. A. Yu and P. Kinget, "A 0.65V 2.5 GHz fractional-N frequency synthesizer in 90 nm CMOS," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2007, pp. 304–306.

- [53] K. H. Cheng, Y. C. Tsai, Y. L. Lo, and J. S. Huang, "A 0.5-V 0.4-2.24-GHz inductorless phase-locked loop in a system-on-chip," *IEEE Trans. Circuits and Systs. I*, vol. 58, no. 5, pp.849-859, May. 2011.
- [54] M. C. Chen, J. Y. Yu, and C. Y. Lee, "A Sub-100µW area-efficient digitally-controlled oscillator based on hysteresis delay cell topologies," *in Asian Solid-State Circuits Conf. (ASSCC), Dig. Tech. Papers*, Nov. 2009, pp. 89–92.
- [55] W. Khalil, S. Shashidharan, T. Copani, S. Chakraborty, S. Kiaei, and B. Bakkaloglu, "A 700uA 405-MHz all-digital fractional-frequency-locked loop for ISM band applications," *IEEE Trans. Microwave Theory and Techniques*, vol. 59, no. 5, pp.1319-1326, May. 2011.
- [56] D. H. Oh, D. S. Kim, S. H. Kim, D. K. Jeong , and W. C. Kim, "A 2.8Gb/s all-digital CDR with a 10b monotonic DCO," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2007, pp. 222–224.
- [57] S. Lin, and S. Liu, "A 1.5GHz all-digital spread-spectrum clock generator," *IEEE J. Solid-State Circuits*, vol. 44, no. 11, pp.3111-3119, Nov. 2009.
- [58] V. Kratyuk, P. Hanumolu, U. K. Moon, and K. Mayaram, "A design procedure for all-digital phase-locked loops based on a charge-pump phase-locked loop analogy," *IEEE Trans. Circuits and Syst. II*, vol. 54, no. 3, pp.247-251, Mar. 2007.
- [59] P. Dudek, S. Szczepanski, and J. Hatfield, "A high-resolution CMOS time-to-digital converter utilizing a Vernier delay line," *IEEE J. Solid-State Circuits*, vol. 35, no. 2, pp.240-247, Feb. 2000.
- [60] Y. Ho, Y. S. Yang, and C. Su, "A 0.2-0.6 V ring oscillator design using bootstrap technique," *in Asian Solid-State Circuits Conf. (ASSCC), Dig. Tech. Papers*, Jeju, Nov. 2011, pp. 333-336.
- [61] N. Weste and D. Harris, CMOS VLSI Design. Boston, MA: Addison-Wesley, 2005
- [62] B. Razavi, Design of Analog CMOS Integrated Circuits. New York: McGraw-Hill, 2001
- [63] K. Kwok and H. C. Luong, "Ultra-low-voltage high-performance CMOS VCOs using transformer feedback," *IEEE J. Solid-State Circuits*, vol. 40, no. 3, pp. 652-660, Mar. 2005.
- [64] J. A. Tierno, A. V. Rylyakov, and D. J. Friedman, "A wide power supply range, wide tuning range, all static CMOS all digital PLL in 65 nm SOI," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 42-51, Jan. 2008.

- [65] Y. L. Lo, W. B. Yang, T. S. Chao, and K. H. Cheng, "Designing an ultralow-voltage phase-locked loop using a bulk-driven technique," *IEEE Trans. Circuits and Syst. II*, vol. 56, no. 5, pp. 339-343, May 2009.
- [66] C. T. Lu, H. H. Hsieh, and L. H. Lu, "A low-power quadrature VCO and its application to a 0.6-V 2.4-GHz PLL," *IEEE Trans. Circuits and Syst. I*, vol. 57, no. 4, pp. 793–802, Apr. 2010.
- [67] W. Liu, W. Li, P. Ren, C. Lin, S. Zhang, and Y. Wang, "A PVT tolerant 10 to 500 MHz all-digital phase-locked loop with coupled TDC and DCO," *IEEE J. Solid-State Circuits*, vol. 45, no. 2, pp. 314-321, Feb. 2010.



# VITA



Ph.D. candidate: Yingchieh Ho (何盈杰)

Advisor: Chauchin Su (蘇朝琴)

Dissertation title: Bootstrapped Circuit Techniques for Near-threshold On-chip Data Link (應用於近臨界電壓晶片資料傳輸之拔靴帶式電路技術)

# Education

1. Sep. 1995 to Jun. 1999
2. Sep. 1999 to Jun. 2001
3. Sep. 2005 to present


## **Publication List**

## **Journal Papers**

- [1] <u>Yingchieh Ho</u>, Hung-kai Chen and Chauchin Su, "Energy-effective Sub-threshold Interconnect Design Using High-Boosting Pre-drivers," *IEEE Journal on Emerging and Selected Topics in Circuits and Systems*. (to appear)
- [2] Yuhwai Tseng, <u>Yingchieh Ho</u>, Shuoting Kao and Chauchin Su, "A 0.09 μW Low Power Front-End Biopotential Amplifier for Biosignal Recording," *IEEE Trans. on Biomedical Circuits and Systems*, 2012. (to appear)
- [3] <u>Yingchieh Ho</u> and Chauchin Su, "A 0.1-0.3V 40-123 fJ/bit/ch On-chip Data Link with ISI-Suppressed Bootstrapped Repeaters," *IEEE Journal of Solid-State Circuits*, vol. 47, no. 5, May 2012. (to appear)
- [4] <u>Yingchieh Ho</u>, Chiachi Chang and Chauchin Su, "Design of a Sub-threshold-Supply Bootstrapped CMOS Inverter Based on an Active Leakage Current Reduction Technique," *IEEE Trans. on Circuits and Systems II*, vol. 59, no. 1, pp. 55-59, Jan. 2012.
- [5] <u>Yingchieh Ho</u>, Ya-Ting Chen and Chauchin Su, "A Power Efficient On-chip Bus Design with Dynamic Voltage and Frequency Scaling Scheme," *International Journal of Electrical Engineering*, vol. 17, no. 3, pp. 207-215, 2010.

## **Conference Papers**

- [1] Shu-Yu Hsu, <u>Yingchieh Ho</u>, Yuhwai Tseng, Ting-You Lin, Po-Yao Chang, Jen-Wei Lee, Ju-Hung Hsiao, Shiou-Ming Chuang, Tze-Zheng Yang, Chauchin Su, and Chen-Yi Lee, "A Sub-100µW Multi-Functional Cardiac Signal Processor for Mobile Healthcare Application," *in IEEE Symp. VLSI Circuits Digest of Tech. Papers*, 2012. (accepted)
- [2] <u>Yingchieh Ho</u>, Yu-Sheng Yang and Chauchin Su, "A 0.2-0.6 V Ring Oscillator Design Using Bootstrap Technique," *in IEEE Asian Solid-State Circuits Conference (ASSCC) Digest of Tech. Papers*, Jeju, Nov. 14<sup>th</sup>-16<sup>th</sup>, 2011, pp. 333-336.
- [3] Tsunhsin Wang, <u>Yingchieh Ho</u>, Yuhwai Tseng, and Chauchin Su, "A Hearing-Aid Front-End Circuit Based on Low Power and Low Area Mix Mode AGC," The 8th IASTED International Conference on Biomedical Engineering (Biomed 2011), Feb. 16-18, 2011, Innsbruck, Austria.
- [4] <u>Yingchieh Ho</u>, Chiachi Chang and Chauchin Su, "Bootstrapped Repeaters Design Using Precharge Enhancement Technique for Ultra-low-voltage Interconnect," 22nd VLSI Design and CAD Symposium, Aug. 2011. (Best Paper Candidate)
- [5] <u>Yingchieh Ho</u>, Yu-Sheng Yang and Chauchin Su, "Design of an Ultra-Low-Power Ring Oscillator Using Bootstrap Technique," 22nd VLSI Design and CAD Symposium, Aug. 2011. (Best Paper Candidate)
- [6] Che-Wei Wu, <u>Yingchieh Ho</u> and Chauchin Su, "An 80-dB SNDR, 160-nW, 0.4-V Delta-Sigma Modulator for Pacemaker Front-end Circuits," 22nd VLSI Design and CAD Symposium, Aug. 2011.

- [7] Hung-Wen Lin, <u>Ying-Chieh Ho</u>, YingLin Fa, and ChauChin Su, "A 5Gb/s Pulse Signaling Interface for Low Power On-Chip Data Communication," International Symposium on Circuits and Systems, Paris, May 30th-Jun. 2nd, 2010, A1-L, pp. 201-204.
- [8] Chou-Ming Kuo, <u>Ying-Chieh Ho</u> and Chauchin Su, "A 4-bit 5-GSample/s Low-Power Digitalized A/D Converter for Pulse Amplitude Modulation System," 21st VLSI Design and CAD Symposium, Aug. 2010.
- [9] Ya-Ting Chen, <u>Ying-Chieh Ho</u> and Chauchin Su, "A Power Efficient On-chip Bus Design with Dynamic Voltage and Frequency Scaling Scheme," 20th VLSI Design and CAD Symposium, Aug. 2009. (Best Paper Candidate)

