

# Department of Electronics Engineering & Institute of Electronics



Master's Thesis

Design of Multiphase Clocking and Level Conversion for Ultra-Low-Voltage DVFS Systems


Student: Mei-Wei Chen

Advisor: Prof. Wei Hwang

September 2012

# Design of Multiphase Clocking and Level Conversion for Ultra-Low-Voltage DVFS Systems

Student: Mei-Wei Chen

Advisor: Prof. Wei Hwang


A Thesis


Submitted to Department of Electronics Engineering & Institute of Electronics

College of Electrical Engineering and Computer Engineering

National Chiao Tung University

in Partial Fulfillment of the Requirements

for the Degree of

Master

in

**Electronics Engineering** 

September 2012

Hsinchu, Taiwan, Republic of China


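# Design of Multiphase Clocking and Level Conversion for Ultra-Low-Voltage DVFS Systems

Student: Mei-Wei Chen

Advisor: Prof. Wei Hwang

#### Department of Electronics Engineering & Institute of Electronics, National Chiao Tung University

Abstract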

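This thesis proposes a system for ultra-low-voltage dynamic voltage and frequency scaling (DVFS). Using multiple supply voltages is an emerging approach to reducing power consumption, and it requires voltage level converters to act as bridges between the different voltage domains. The proposed cross-coupled level converter achieves a smaller propagation delay, lower power consumption, and the lowest power-delay product. Because it exploits the reverse short-channel effect, its resistance to temperature variation is also greatly improved. The proposed cross-coupled level converter is designed in a TSMC 65nm CMOS process. It operates correctly across all process corners, with input voltages from 150mV to 1.0V.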

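In a DVFS system, level converters may add propagation delay and power consumption. To reduce this overhead, a process-, voltage-, and temperature-robust, explicit-pulsed, dual-edge-triggered level-converting flip-flop is proposed. It consists of a clock pulse generator and a differential cascode voltage switch latch. The proposed level-converting flip-flop operates from the near-threshold region (0.4V) to the super-threshold region (1.0V). It also has a negative setup time, which reduces the impact of clock skew and jitter.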

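A wide-operating-range multiphase clock generator based on a delay-locked loop is proposed. It extracts eight phases within one clock cycle and has two control modes. The first mode, successive-approximation control, accelerates locking. The second mode, counter mode, helps track environmental effects. In addition, a harmonic detector is proposed to prevent harmonic locking. To make the clock generator produce a 50% duty cycle, a process-, voltage-, and temperature-robust, all-digital duty cycle corrector is proposed.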


# Design of Multiphase Clocking and Level Conversion for Ultra-Low-Voltage DVFS Systems

Student: Mei-Wei Chen

Advisor: Prof. Wei Hwang

Department of Electronics Engineering & Institute of Electronics National Chiao-Tung University

# **ABSTRACT**

This thesis proposes an ultra-low-voltage (ULV) DVFS system. Using multiple supply voltages is an emerging approach to reducing power dissipation, and it requires level converters to bridge the different voltage domains. The proposed cross-coupled level converter achieves a small propagation delay, low power consumption, and the best power-delay product (PDP). The reverse short-channel effect is exploited to give the level converter better immunity to process and thermal variation. The proposed cross-coupled level converter is designed in TSMC 65nm bulk CMOS technology. It functions correctly across all process corners over a wide input voltage range, from 150mV to 1V.

Level converters add propagation delay and power consumption to a DVFS system. To eliminate this level-conversion overhead, a PVT-robust dual-edged triggered explicit-pulsed level converting flip-flop (DETEP-LCFF) is proposed. It is composed of a clock pulse generator and a modified differential cascode voltage switch with pass gate (DCVSPG) latch. The proposed LCFF operates from the near-threshold region (0.4V) to the super-threshold region (1.0V). It has a negative setup time, which reduces the effect of clock skew and jitter.

A wide-range DLL-based multiphase clock generator is proposed. It divides one clock cycle into eight phases and has two control modes. The first is the successive approximation register (SAR) controlled mode, which accelerates locking. The second is the counter mode, which keeps tracking environmental effects. A harmonic detector is proposed to avoid harmonic locking. To make the clock generator produce a clock signal with a 50% duty cycle, a PVT-robust all-digital duty cycle corrector (DCC) is proposed.

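#### Acknowledgments

These two years were full of stumbles, and there are so many people to thank for helping me complete this thesis. Without my family, teachers, and friends, I would not be who I am today! First, I am deeply grateful to my advisor, Prof. Wei Hwang. Throughout my two years of graduate study, he always encouraged me to keep moving forward. Besides giving me the most professional advice and guidance in research, he also set the best example of how to conduct myself and treat others. Thank you very much, Prof. Hwang!

I would also like to thank the senior students in the laboratory for their help: 張銘宏, 黃柏蒼, 謝維致, and 楊浩義. I especially thank my mentor, 張銘宏, who was always willing to discuss problems with me. He offered many ideas and suggestions that helped me think clearly and solve them. I also thank the other senior students for their assistance. With your help along the way, I was able to finish this thesis on time.

Finally, I thank my friends and family. Whenever I was feeling low, you were by my side, encouraging and cheering me on and giving me the strength to keep going. Your company has made my life richer and more colorful. Thank you so much! In the future, I will keep working hard and never give up my passion!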



# Contents

| Section   | Title |
|-----------|-------|
| Chapter 1 | Introduction |
| 1.1       | Background |
| 1.2       | Motivation |
| 1.3       | Organization |
| Chapter 2 | Overview on DLL-based Frequency Multiplier, Duty Cycle Corrector, and Level Conversion |
| 2.1       | An Overview on DLL-based Frequency Multiplier |
| 2.1.1     | Frequency multiplier scheme |
| 2.2       | An Overview on Duty Cycle Corrector |
| 2.2.1     | Digital |
| 2.2.2     | Analog |
| 2.2.3     | Mixed mode [2.30] |
| 2.3       | An Overview on Level Converter |
| 2.3.1     | Cross-coupled type |
| 2.3.2     | Current mirror [2.44] |
| 2.3.3     | Dynamic type [2.46] |
| 2.4       | An Overview on Level Converting Flip-Flop |
| 2.4.1     | Slave Latch Level Shifting [2.50] |
| 2.4.2     | Clock Level Shifted Sense Amplifier Flip-Flop [2.50] |
| 2.4.3     | Self-Precharging Flip-Flop [2.51] |
| 2.4.4     | Pulsed-Triggered Level Converting Flip-Flop |
| Chapter 3 | A Wide Range DLL-based Multiphase Clock Generator with Duty Cycle Correction in 65nm CMOS |
| 3.1       | Introduction |
| 3.2       | Multiphase clock applications |
| 3.2.1     | Frequency synchronizer [3.4] |
| 3.2.2     | Clock and data recovery [3.5] |
| 3.2.3     | DRAM interface [3.6] |
| 3.3       | System architecture |
| 3.4       | Circuit description |
| 3.4.1     | Delay blocks |
| 3.4.2     | Phase detector |
| 3.4.3     | Delay block controller |
| 3.4.4     | Harmonic detection |
| 3.5       | Duty cycle corrector with a PVT detection |
| 3.5.1     | System architecture |
| 3.5.2     | PVT detection |
| 3.5.3     | Performance summary |
| 3.6       | Conclusion |
| Chapter 4 | An Energy-Efficient Level Converter with High Thermal Variation Immunity for Sub-threshold to Super-threshold Operation |
| 4.1       | Introduction |
| 4.2       | Proposed Energy-Efficient Level Converter with High Thermal Variation Immunity |
| 4.2.1     | Diode-Connected PMOS Transistors |
| 4.2.2     | Multi-threshold-voltage CMOS (MTCMOS) |
| 4.2.3     | Stack Leakage Reduction Technique |
| 4.2.4     | Reverse Short Channel Effect [4.13] |
| 4.2.5     | Sub-threshold device sizing |
| 4.2.6     | Inner inverter device sizing |
| 4.2.7     | Proposed level converter performance |
| 4.3       | Simulation Results |
| 4.3.1     | Minimum input voltage |
| 4.3.2     | Propagation delay, Power, and PDP |
| 4.3.3     | Monte Carlo Simulation |
| 4.3.4     | Temperature-induced delay variation |
| 4.4       | Conclusions |
| Chapter 5 | A PVT Robust Dual-Edged Triggered Explicit-Pulsed Level Converting Flip-Flop |
| 5.1       | Introduction |
| 5.2       | A PVT robust dual-edged triggered explicit-pulsed LCFF with a wide operation range |
| 5.2.1     | Modified DCVSPG Latch |
| 5.2.2     | Pulse Generator |
| 5.2.3     | Optimal Operating Point |
| 5.2.4     | Clock Pulse Generator Sharing Technique |
| 5.3       | Performance Comparisons |
| 5.3.1     | Minimum Input Voltage |
| 5.3.2     | Minimum D-Q Delay, Power, and PDP |
| 5.3.3     | Power Analysis with Data Switching Activity |
| 5.3.4     | Monte Carlo Simulation: Data Error Rate |
| 5.4       | Conclusions |
| Chapter 6 | Conclusion and Future Work |
| 6.1       | Conclusion |
| 6.2       | Future Work |
|           | Bibliography |



# List of Tables

| Table     | Caption |
|-----------|---------|
| Table 3.1 | The truth table of the phase detector |
| Table 3.2 | The truth table of the SAR controller |
| Table 3.3 | The truth table of the anti-harmonic detection block |
| Table 3.4 | Performance summary of the proposed duty cycle corrector |
| Table 4.1 | Performance summary and comparisons |
| Table 5.1 | Performance comparisons among DETEP-LCFFs at VDDL = 0.7V, 25°C, TT corner |
| Table 5.2 | Performance summary of the proposed DETEP-LCFF |



# **List of Figures**

| Figure      | Caption |
|-------------|---------|
| Figure 2.1  | An all-digital clock generator for DVFS [2.11] |
| Figure 2.2  | Timing diagram [2.11] |
| Figure 2.3  | Cyclic clock multiplier [2.11] |
| Figure 2.4  | A fast-locking programmable DLL-based clock generator [2.12] |
| Figure 2.5  | Timing diagram [2.12] |
| Figure 2.6  | A clock generator with a high multiplication factor [2.4] |
| Figure 2.7  | Pulse generator and edge combiner [2.4] |
| Figure 2.8  | Process variation tolerant multiphase DLL [2.20] |
| Figure 2.9  | 90-degree phase shift block structure [2.20] |
| Figure 2.10 | Proposed all-digital DCC with a feedback loop [2.22] |
| Figure 2.11 | Signal paths [2.22]. (a) DCC path (b) Deskew path |
| Figure 2.12 | DCC without a feedback loop [2.26]. (a) Proposed topology (b) Timing diagram |
| Figure 2.13 | Analog DCC with an integrator [2.33] |
| Figure 2.14 | Analog DCC with a charge pump [2.33] |
| Figure 2.15 | Mixed mode DCC [2.30] |
| Figure 2.16 | Conventional level converters. (a) Cross-coupled type (b) Current mirror type (c) Dynamic type |
| Figure 2.17 | Voltage doubler [2.37] |
| Figure 2.18 | Cascade level converter [2.38] |
| Figure 2.19 | A cross-coupled level converter with two reduced swing inverters [2.39]. (a) A modified cross-coupled level converter. (b) A reduced swing inverter. |
| Figure 2.20 | Diode-connected PMOS transistors [2.40]. (a) A diode-connected cross-coupled level converter. (b) Operation principle |
| Figure 2.21 | A feedback loop [2.45] |
| Figure 2.22 | A Wilson current mirror [2.44] |
| Figure 2.23 | Modified dynamic level converter [2.46]. (a) A dynamic level converter with clock synchronization. (b) Schematic view of a clock synchronizer |
| Figure 2.24 | Slave latch level shifting [2.51] |
| Figure 2.25 | Clock Level Shifted Sense Amplifier Flip-Flop [2.51] |
| Figure 2.26 | Self-Precharging Level Converting Flip-Flop [2.52] |
| Figure 2.27 | Single-edged triggered flip-flop [2.55] |
| Figure 2.28 | Dual-edged triggered flip-flop [2.58]. (a) Pulsed-triggered LCFF (b) Dual-edge pulse trigger circuit |
| Figure 2.29 | Implicit-pulsed triggered LCFF [2.56] |
| Figure 2.30 | Explicit-pulsed triggered LCFF [2.59]. (a) Pulsed-triggered LCFF (b) 4T-XOR pulse generator |
| Figure 3.1  | Frequency synchronizer |
| Figure 3.2  | Clock and data recovery |
| Figure 3.3  | Multiphase DLL used in DRAM interface [3.6] |
| Figure 3.4  | Proposed DLL-based multiphase clocks with a wide operation range |
| Figure 3.5  | Finite state machine |
| Figure 3.6  | Delay blocks |
| Figure 3.7  | Coarse tune delay line: nest-lattice structure |
| Figure 3.8  | The relationship between the digital control code and the coarse delay |
| Figure 3.9  | Fine tune delay line: current-starved inverter |
| Figure 3.10 | The relationship between the digital control code and the fine tune delay |
| Figure 3.11 | (a) N-bit binary-to-thermometer decoder. (b) 2-bit binary-to-thermometer decoder. (c) 3-bit binary-to-thermometer decoder |
| Figure 3.12 | (a) Phase detector circuit block. (b) Operation diagram |
| Figure 3.13 | Dual mode of the delay block controller: SAR mode and counter mode |
| Figure 3.14 | (a) SAR mode. (b) SAR controller circuit block |
| Figure 3.15 | (a) Normal operation of the phase detector. (b) Anti-harmonic mechanism |
| Figure 3.16 | Anti-harmonic detection circuit block |
| Figure 3.17 | The proposed duty cycle corrector with a PVT detection |
| Figure 3.18 | Finite state machine |
| Figure 3.19 | PVT detection circuit block |
| Figure 3.20 | Output duty cycle error comparison |
| Figure 4.1  | (a) Conventional cross-coupled level converter. (b) Monte Carlo simulation of conduction current |
| Figure 4.2  | A level converter with two diode-connected PMOS transistors [4.2] |
| Figure 4.3  | A level converter with built-in short circuit current reduction [4.3] |
| Figure 4.4  | A level converter with two cascade cross-coupled level converters [4.4] |
| Figure 4.5  | Schematic view of the proposed level converter |
| Figure 4.6  | Monte Carlo simulation of conduction current of two diode-connected PMOS transistors |
| Figure 4.7  | Monte Carlo simulation of using HVT devices for pull-up PMOS transistors |
| Figure 4.8  | (a) Leakage reduction technique [4.11]. (b) Monte Carlo simulation of leakage current with/without leakage reduction technique |
| Figure 4.9  | Short channel effect. (a) Delay and power simulation (b) PDP value simulation |
| Figure 4.10 | Sub-threshold device sizing. (a) Delay and power simulation (b) PDP value simulation |
| Figure 4.11 | Inner inverter sizing. (a) Delay and power simulation. (b) PDP value simulation |
| Figure 4.12 | Performance comparison. A: two diode-connected PMOS. B: multi-V<sub>th</sub> devices. C: leakage reduction technique. D: reverse short channel effect. E: inner inverter sizing. (a) Implementation B reduces delay by up to 22%. (b) Implementation C reduces power by up to 26%. (c) Implementation D reduces PDP by up to 17%. Combining all implementations reduces overall PDP by up to 23%. |
| Figure 4.13 | Minimum input voltage comparison |
| Figure 4.14 | Performance comparison between the proposed level converter and the existing level converters. (a) Propagation delay comparison (b) Power comparison (c) PDP comparison |
| Figure 4.15 | Monte Carlo simulation of propagation delay. (a) Supply voltage is 200mV (b) Supply voltage is 500mV |
| Figure 4.16 | Temperature-induced variations on propagation delay |
| Figure 4.17 | Layout view of the proposed level converter |
| Figure 5.1  | A basic structure of a level converting flip-flop |
| Figure 5.2  | Separate work: a level converter is followed by a latch. (a) A level converter (b) A latch |
| Figure 5.3  | Combination work: a level converter is embedded into a latch |
| Figure 5.4  | Performance comparisons between separate work and combination work. (a) Minimum D-Q delay comparison (b) Power comparison (c) PDP value comparison |
| Figure 5.5  | Dual-edged triggered explicit-pulsed LCFF with a feedback signal [5.8]. (a) A clock pulse generator (b) LCFF with a feedback signal |
| Figure 5.6  | Dual-edged triggered explicit-pulsed LCFF based on dual V<sub>th</sub> [5.9]. (a) A clock pulse generator (b) LCFF employing dual V<sub>th</sub> |
| Figure 5.7  | Dual-edged triggered explicit LCFF with a self-precharged dynamic gate [5.10]. (a) A clock pulse generator with a dynamic self-precharged gate (b) LCFF |
| Figure 5.8  | Schematic view of the proposed dual-edged triggered explicit-pulsed LCFF. (a) A 4T-XOR clock pulse generator with symmetric setup time. (b) A modified DCVSPG latch providing a wide operation range |
| Figure 5.9  | (a) Conventional DCVSPG latch (b) Modified DCVSPG latch |
| Figure 5.10 | Monte Carlo simulation of conduction current. (a) Conventional DCVSPG latch (b) Modified DCVSPG latch |
| Figure 5.11 | Monte Carlo simulation of conduction current after using HVT PMOS transistors |
| Figure 5.12 | Waveform of the DCVSPG latch. (a) Without the two NMOS transistors (MP<sub>3</sub> and MP<sub>4</sub>). (b) With the two NMOS transistors (MP<sub>3</sub> and MP<sub>4</sub>) connected |
| Figure 5.13 | Timing diagram of clock pulse generator |
| Figure 5.14 | Proposed clock pulse generator with a balanced clock pulse at each clock edge. (a) Schematic view of the proposed clock pulse generator (b) Timing diagram of the proposed clock pulse generator |
| Figure 5.15 | Transmission gate logic [5.8] has different pulse holding times. (a) A clock pulse generator (b) Timing diagram of the clock pulse generator |
| Figure 5.16 | Pseudo-NMOS gate [5.9] has different pulse triggering times. (a) A clock pulse generator. (b) Timing diagram of the clock pulse generator |
| Figure 5.17 | Performance comparisons of the clock pulse generator at VDDL = 0.4V. (a) Difference of triggering time (b) Difference of hold period (c) Power consumption |
| Figure 5.18 | Minimum operation point. (a) Minimum D-Q delay and power consumption (b) PDP value |
| Figure 5.19 | Analysis of sharing technique. (a) Super-threshold region operation (b) Sub-threshold region operation |
| Figure 5.20 | Comparison of minimum input voltage |
| Figure 5.21 | Performance comparison at VDDL = 0.7V, 25°C, TT corner. (a) Minimum D-Q delay (b) Power consumption (c) PDP value |
| Figure 5.22 | Power analysis with data switching activity |
| Figure 5.23 | Monte Carlo simulation of data error rate |
| Figure 5.24 | Layout view of the proposed DETEP-LCFF |
| Figure 6.1  | 3DIC application |

# Chapter 1

# Introduction

# 1.1 Background

With the increasing demand for mobile applications and portable biomedical systems, power dissipation has become a critical issue in modern IC design. Reducing the supply voltage is considered the most promising approach to energy saving, because dynamic power dissipation depends quadratically on the supply voltage. Ultra-low power dissipation can be achieved by operating digital circuits at scaled supply voltages. Depending on the power and speed requirements of the system, the operating voltage is scaled down to the sub-threshold or near-threshold region. Several studies have demonstrated sub-threshold design optimizations from the device, circuit, and architecture perspectives, which differ from those of conventional super-threshold design [1.1]. However, lowering the supply voltage degrades performance, most notably by increasing delay.
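The quadratic relation between supply voltage and power can be made concrete with the standard first-order model of CMOS switching power. In this model, $\alpha$ is the switching activity factor, $C_L$ is the switched load capacitance, and $f$ is the clock frequency. These symbols are introduced here for illustration only. Scaling $V_{DD}$ from 1.0V to 0.4V while holding $\alpha$, $C_L$, and $f$ fixed gives:

$$
P_{dyn} = \alpha\, C_L\, V_{DD}^{2}\, f,
\qquad
\frac{P_{dyn}(0.4\,\mathrm{V})}{P_{dyn}(1.0\,\mathrm{V})} = \left(\frac{0.4}{1.0}\right)^{2} = 0.16
$$

This is an 84% reduction in switching power. The estimate does not yet account for the lower clock frequency such a voltage usually forces, or for leakage power, which this first-order model omits.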

Multiple supply voltage techniques have been presented for low-power design [1.2]. The critical parts of a digital system use a nominal supply voltage to meet performance requirements, while the other parts operate in the sub-threshold region to save power. Such multiple-voltage designs can run different blocks at different supply voltages, performing dynamic voltage and frequency scaling (DVFS) on each voltage domain [1.3]. Recently, DVFS has been widely used to save power. In addition, advances in ultra-low-voltage (ULV) circuit design have been shown to save substantial power. Therefore, combining DVFS and ULV design techniques has great potential for low-power design.
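Per-domain DVFS can be illustrated with a small operating-point selector. Each voltage domain picks the lowest supply voltage that still meets its required frequency. The operating-point table below is entirely hypothetical and is not measured 65nm data. The model also assumes energy per operation scales as $C \cdot V_{DD}^2$, so the lowest feasible voltage is the lowest-energy choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    vdd: float        # supply voltage in volts
    f_max_mhz: float  # highest frequency the domain supports at this voltage


# Hypothetical operating-point table for one voltage domain (illustrative only).
OPERATING_POINTS = [
    OperatingPoint(0.4, 20.0),
    OperatingPoint(0.6, 150.0),
    OperatingPoint(0.8, 400.0),
    OperatingPoint(1.0, 700.0),
]


def pick_operating_point(required_mhz, points=OPERATING_POINTS):
    """Return the lowest-voltage point whose f_max meets the requirement.

    Switching energy per operation scales roughly with C * Vdd**2, so in
    this simple model the lowest feasible voltage is also the lowest-energy one.
    """
    feasible = [p for p in points if p.f_max_mhz >= required_mhz]
    if not feasible:
        raise ValueError("no operating point meets the frequency requirement")
    return min(feasible, key=lambda p: p.vdd)


def relative_energy(point, reference):
    """Energy per operation relative to a reference point (E ~ C * Vdd**2)."""
    return (point.vdd / reference.vdd) ** 2


# A critical block needs 500 MHz; a background block only needs 10 MHz.
critical = pick_operating_point(500.0)
background = pick_operating_point(10.0)
print(critical.vdd, background.vdd)                      # 1.0 0.4
print(round(relative_energy(background, critical), 2))   # 0.16
```

Running the critical and non-critical domains at different points is exactly the situation that creates the voltage-domain boundaries discussed next.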

#### **1.2 Motivation**

At the boundary between two voltage domains, a gate supplied by a lower voltage (VDDL) may drive a gate supplied by a higher voltage (VDDH). The logic-high output of the VDDL gate is not high enough to fully turn off a PMOS transistor whose source is tied to VDDH. This creates a static (DC) current path from the supply to ground and increases power dissipation. In addition, the VDDH gate may fail to produce a full output swing, causing a functional error. To solve these problems, a level converter must be inserted at the interface between the two voltage domains.
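The failure condition can be checked with simple arithmetic. The PMOS source sits at VDDH and its gate at the VDDL logic-high level, so its source-gate voltage is VDDH - VDDL. The device stays off only while that voltage is below $|V_{tp}|$. The sketch below uses an assumed $|V_{tp}| = 0.35$V purely for illustration; this is not a TSMC 65nm value. It also ignores sub-threshold conduction, which is nonzero even when the device is nominally off.

```python
def pmos_fully_off(v_in_high, vddh, vtp_abs):
    """Return True if a VDDH-supplied PMOS is cut off with its gate at v_in_high.

    The PMOS source is tied to VDDH, so its source-gate voltage is
    VDDH - v_in_high. It is treated as off only while that voltage is
    below |Vtp| (sub-threshold leakage is ignored in this check).
    """
    v_sg = vddh - v_in_high
    return v_sg < vtp_abs


# Assumed |Vtp| for illustration only (not a process datum).
VTP_ABS = 0.35
for vddl in (1.0, 0.8, 0.6, 0.4):
    state = "off" if pmos_fully_off(vddl, 1.0, VTP_ABS) else "ON -> static current"
    print(f"VDDL={vddl:.1f} V, VDDH=1.0 V: PMOS {state}")
```

With these assumed numbers, the PMOS turns on once VDDL falls to 0.6V or below, which is well inside the ULV range targeted here.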

A level converter adds propagation delay and power dissipation. A low-voltage cluster is usually followed by pipeline flip-flops, so this overhead can be eliminated by merging the level converter into the flip-flop. Such a circuit is called a level-converting flip-flop (LCFF). An LCFF latches and level-converts simultaneously: it takes a VDDL data input (*D*) and a clock signal (*CLK*) and provides a stored VDDH output (*Q*).
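Behaviorally, an LCFF samples a VDDL-level input on a clock edge and re-drives it at VDDH. The following is an idealized model of that behavior, not a circuit model. It assumes a logic threshold at VDDL/2 and ignores setup time, delay, and analog effects. The `dual_edge` option, a hypothetical parameter of this sketch, samples on both clock edges as a dual-edge-triggered LCFF does.

```python
class LevelConvertingFlipFlop:
    """Idealized behavioral model of an LCFF (timing and analog effects ignored).

    D arrives in the VDDL domain; on each active clock edge the flip-flop
    samples it and drives Q at full VDDH swing. With dual_edge=True it
    samples on both rising and falling clock edges.
    """

    def __init__(self, vddl, vddh, dual_edge=False):
        self.vddl = vddl
        self.vddh = vddh
        self.dual_edge = dual_edge
        self.q = 0.0          # output voltage, in the VDDH domain
        self._prev_clk = 0

    def step(self, d_volts, clk):
        """Apply one simulation step with D in volts and CLK as 0/1; return Q."""
        rising = self._prev_clk == 0 and clk == 1
        falling = self._prev_clk == 1 and clk == 0
        self._prev_clk = clk
        if rising or (self.dual_edge and falling):
            # Assumed logic decision at VDDL/2, then re-drive at the VDDH level.
            self.q = self.vddh if d_volts > self.vddl / 2 else 0.0
        return self.q


ff = LevelConvertingFlipFlop(vddl=0.4, vddh=1.0, dual_edge=True)
clk = [0, 1, 1, 0, 0, 1]
d = [0.4, 0.4, 0.0, 0.0, 0.4, 0.4]
print([ff.step(dv, c) for dv, c in zip(d, clk)])  # [0.0, 1.0, 1.0, 0.0, 0.0, 1.0]
```

With `dual_edge=False`, the falling edge at step 4 is ignored and Q stays at 1.0V. The dual-edge version handles the same data rate at half the clock frequency.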

A clock generator is indispensable in IC design, and the quality of the clock synchronized across different voltage domains becomes important. Phase-locked loops (PLLs) and delay-locked loops (DLLs) are widely used to solve the clock synchronization problem. However, the DLL is better suited than the PLL to clock de-skewing. It is simpler to design, and as a first-order loop it is inherently stable and does not accumulate jitter. DLL-based clock generators are used in many high-performance applications, such as clock and data recovery (CDR) circuits, double data rate (DDR) SDRAM, and frequency multipliers, and multiphase clock generators have been exploited for a long time. The clock signal is distributed through a clock tree, and mismatched clock tree drivers distort its duty cycle. Duty cycle distortion can degrade performance, so duty cycle correctors have been proposed to solve this problem.
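How mismatched drivers distort the duty cycle can be estimated with a simple model. Assume a 50% input clock passes through N identical drivers. Each driver delays the rising edge by $t_r$ and the falling edge by $t_f$, so the high phase changes by $t_f - t_r$ per stage and the error accumulates linearly. The identical-stage, linear-accumulation assumption and the delay values below are illustrative, not measurements from this work.

```python
def output_duty_cycle(period_ps, rise_delay_ps, fall_delay_ps, stages=1):
    """Duty cycle after `stages` identical buffers with mismatched edge delays.

    A 50% input clock is high for period/2. Each stage delays the rising
    edge by rise_delay_ps and the falling edge by fall_delay_ps, so the
    high phase changes by (fall_delay_ps - rise_delay_ps) per stage.
    """
    high_ps = period_ps / 2 + stages * (fall_delay_ps - rise_delay_ps)
    high_ps = min(max(high_ps, 0.0), period_ps)  # pulse vanishes or saturates
    return high_ps / period_ps


# Hypothetical 1 GHz clock through drivers whose rising edge is 5 ps slower.
for n in (1, 4, 10):
    print(n, f"{output_duty_cycle(1000, 45, 40, stages=n):.1%}")
# 1 49.5%
# 4 48.0%
# 10 45.0%
```

Even a 5 ps per-stage mismatch pushes the duty cycle to 45% after ten stages. This is the kind of drift a duty cycle corrector is meant to remove.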

# **1.3 Organization**

This thesis consists of six chapters, focusing on level conversion for multiple supply voltage domains and on a wide-range DLL-based multiphase clock generator.

Chapter 2 gives an overview of the DLL-based frequency multiplier, duty cycle corrector, level converter, and level converting flip-flop.

Chapter 3 describes the proposed wide-range DLL-based multiphase clock generator with anti-harmonic detection. A DLL-based duty cycle corrector is also proposed.

Chapter 4 presents the proposed energy-efficient level converter for sub-threshold to super-threshold operation.

Chapter 5 demonstrates the proposed level converting flip-flop, which incorporates the level converter from Chapter 4. In the level converting flip-flop, data are latched and level-converted at the same time.

Chapter 6 gives the conclusion of this thesis and the future work.

# Chapter 2

# **Overview on DLL-based Frequency Multiplier, Duty Cycle Corrector, and Level Conversion**

# 2.1 An Overview on DLL-based frequency multiplier

With advances in CMOS technology, IC performance has improved rapidly. Many high-speed applications, such as microprocessors, memory ICs, and communication ICs, require high-frequency clocks. To increase the data bandwidth, the clock frequency can be raised with on-chip frequency multipliers. The high-frequency output clock must also be synchronized with the reference clock, so synchronization has become a critical design issue in clock generation. Clock generation and synchronization can be handled by a phase-locked loop (PLL) [2.1]-[2.3] or a delay-locked loop (DLL) [2.4]-[2.6].

PLL-based clock generators require a voltage-controlled oscillator (VCO), which is difficult to design and sensitive to process, voltage, and temperature (PVT) variations [2.7]-[2.8]. DLL-based clock generators replace the VCO with a delay block. DLLs are therefore much simpler to implement and more immune to PVT variations. However, because a DLL merely adds a controllable phase delay to the input clock to produce the output clock, any jitter on the input clock passes directly to the output. In contrast, a PLL can better filter out input clock jitter. Although DLLs are more stable than PLLs, it is difficult to build a frequency multiplier from a voltage-controlled or digitally controlled delay line. Several frequency multiplier schemes have been presented [2.9]-[2.21]. Some have a fixed multiplication factor [2.9]-[2.10], while others offer a programmable multiplication factor [2.11]-[2.13].
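The contrast in jitter transfer between a DLL and a PLL can be illustrated with a toy discrete-time model. The DLL is modeled as a fixed delay, so its output phase equals its input phase cycle by cycle. The PLL is stood in for by a first-order low-pass filter on the phase, with a hypothetical bandwidth parameter `alpha`. A real PLL's jitter transfer is usually second-order and may show peaking, so this is only a qualitative sketch. The jitter statistics are illustrative, not measured.

```python
import random
import statistics


def dll_output_phase(input_phase):
    """A DLL adds a fixed delay, so input phase jitter passes straight through."""
    return list(input_phase)


def pll_output_phase(input_phase, alpha=0.1):
    """First-order low-pass stand-in for a PLL's jitter transfer.

    alpha (0 < alpha <= 1) plays the role of the loop bandwidth: smaller
    values track the reference more slowly and reject more input jitter.
    """
    out, phase = [], input_phase[0]
    for x in input_phase:
        phase += alpha * (x - phase)
        out.append(phase)
    return out


random.seed(1)
# White input jitter: 1000 cycles, 10 ps rms (illustrative numbers).
jitter_in = [random.gauss(0.0, 10.0) for _ in range(1000)]
print("DLL rms jitter:", round(statistics.pstdev(dll_output_phase(jitter_in)), 1))
print("PLL rms jitter:", round(statistics.pstdev(pll_output_phase(jitter_in)), 1))
```

For white input jitter, this filter reduces the rms value by roughly $\sqrt{\alpha/(2-\alpha)}$, about 0.23 for `alpha = 0.1`. The DLL model reproduces the input jitter unchanged.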

Two kinds of DLL-based multiphase clock schemes have been presented. The first uses many delay cells to produce consecutive delayed phases and combines them with an edge combiner [2.14]-[2.15], [2.20]-[2.21] to multiply the frequency. Its multiplication ratio is usually tied to the number of delay cells in the delay line, so a higher factor costs more area. The second uses the delay cells as a ring oscillator that circulates a cyclic wave [2.11]-[2.12], [2.16]-[2.19] to multiply the clock frequency. The cyclic scheme has an initial locking constraint: it must start from the shortest delay. Consequently, this kind of clock generator cannot switch the clock frequency from low to high.

# 2.1.1 Frequency multiplier scheme

Advances in VLSI fabrication have increased circuit clock frequencies, and clock rates of hundreds of megahertz are easy to reach in current technologies. At such frequencies, distributing the clock through an entire system becomes a problem. Because an external clock cannot be used at these rates, an on-chip clock multiplier is essential for high-speed products. There are two ways to produce a multiple of the reference-clock frequency: circulating a cyclic wave, or delaying the reference clock to produce multiphase clocks.

#### 2.1.1.1 Cyclic circulating scheme

#### All-digital clock generator for dynamic frequency scaling [2.11]

In [2.11], an all-digital clock generator using a cyclic clock multiplier (CCM) was proposed for dynamic frequency scaling applications. It takes only four reference clock cycles to lock. In addition, the cyclic jitter caused by delay-cell mismatch is reduced because the output clock always passes through the same delay line. Notably, it can realize either a fractional or a multiplied clock.

Fig. 2.1 shows the multiple-frequency clock generator of [2.11]. It includes a CCM, a finite state machine (FSM), a conventional time-to-digital converter (TDC), a counter\_K, a programmable divider, and two multiplexers (MUXs). The divider ratio is programmable from 2 to 8 through N[2:0]. Take five times the reference clock as an example (multiplication factor M=5); a typical timing diagram is shown in Fig. 2.2. The operation is divided into four steps, each taking one reference clock cycle. First, C[4:0] is preset to M=5 and the CCM measures the reference clock period: five unit delay cells are selected to circulate a pulse, and Counter\_K counts this multiplied clock within one reference clock cycle. Next, the counted value is stored as K[4:0]=3. Then the FSM changes the number of delay cells in the CCM from 5 to 3, so the CCM produces five pulses using three units. Finally, the TDC measures the phase error between the multiplied signal PG and the reference clock, and the delay of the unit delay cell in the CCM is adjusted by the TDC output.
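The four-step sequence can be checked with a simplified numeric model. The sketch assumes (our assumption, not stated in [2.11]) that one trip of a pulse around a ring of M unit cells takes M·d. Under that assumption, Counter\_K's count within one reference period is K = floor(T<sub>ref</sub>/(M·d)). Switching the ring to K cells then gives a loop period K·d ≈ T<sub>ref</sub>/M, which means M pulses per reference cycle. The function name `ccm_plan` and the ideal linear delay model are hypothetical. Under this model, the example K=3 with M=5 corresponds to a reference period of roughly 15 unit delays.

```python
import math

def ccm_plan(t_ref: float, unit_delay: float, m: int) -> tuple[int, float, float]:
    """Simplified model of the CCM lock sequence (hypothetical, ideal delays).

    Returns (k, coarse_period, tuned_unit_delay).
    """
    # Steps 1-2: a pulse circulates through m cells; each trip takes m * unit_delay.
    # Counter_K counts the completed trips within one reference period.
    k = math.floor(t_ref / (m * unit_delay))
    if k < 1:
        raise ValueError("reference period shorter than one trip through m cells")
    # Step 3: with k cells in the ring, one trip takes k * unit_delay, so roughly
    # m pulses fit in one reference period.
    coarse_period = k * unit_delay
    # Step 4: the TDC loop trims the unit delay until m * k * d equals t_ref.
    tuned_unit_delay = t_ref / (m * k)
    return k, coarse_period, tuned_unit_delay

print(ccm_plan(15.0, 1.0, 5))
print(ccm_plan(16.0, 1.0, 5))  # K is still 3; the TDC stretches d to 16/15
```

The second call shows why the final TDC step is needed. The coarse cell count alone leaves a residual error whenever T<sub>ref</sub> is not an exact multiple of M·d.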



Figure 2.3. Cyclic clock multiplier [2.11].

#### All-digital fast-locking programmable DLL-based clock generator[2.12]

The cyclic clock multiplier has an initial delay constraint, and a new locking method was proposed to remove it. Moreover, a modified successive approximation register-controlled (MSAR) circuit is used to shorten the locking time and to track environmental variations.

Fig. 2.4 shows the DLL-based clock generator of [2.12]. It consists of the MSAR circuit, a timing control circuit, a digital phase-frequency detector (PFD), and a digitally-controlled delay line. The clock generator provides two operation modes, binary-search mode and sequential-search mode. In each mode, it alternates between two execution cycles: a refresh cycle and a compare cycle. The conventional cyclic circulating scheme is refreshed every reference cycle. This design instead refreshes the output clock over two execution cycles, which removes the initial delay constraint and achieves a fast locking time. However, compared with a conventional architecture with the same loop parameters, this architecture accumulates more jitter and has half the loop bandwidth. When the binary-search mode finishes, frequency acquisition is complete and the clock generator enters the sequential-search mode, in which the MSAR circuit acts as a counter. The clock generator then operates in a closed loop to track PVT variations and compensate for the phase error. Once it enters the sequential-search mode, it does not return to the binary-search mode until the system is reset. Fig. 2.5 shows the timing diagram of the clock generator.



Figure 2.4. A fast-locking programmable DLL-based clock generator [2.12].



#### 2.1.1.2 Multiphase clocks with an edge combiner

#### A low power and wide range programmable clock generator with a high multiplication factor [2.4]

In this case, the clock generator consists of a DLL, a pulse generator, and a pulse combiner. Each pulse is generated from one corresponding unit delay cell, so a high multiplication factor can be achieved with fewer delay stages. In addition, power dissipation is reduced because the pulse generator consists of D flip-flops and inverters that operate only when triggered in turn. No additional pulse selection process is needed because the pulse generator produces exactly the sub-pulses required for the target output frequency. A saturated-type differential delay cell is used so that the clock generator can operate at low frequencies without area overhead.

Fig. 2.6 shows the overall block diagram of the DLL-based clock generator, which produces 24 differential phase-shifted signals. The pulse generator detects the rising edges of the phase-shifted signals selected by the programmed 2-bit signal. The details of the pulse generator and the pulse combiner are illustrated in Fig. 2.7. The phase-shifted signal from the VCDL, $\Phi_k$, triggers the corresponding $k$th DFF. Because the reset takes two inverter delays, a short pulse of duration $\Delta \tau$ is generated at $Q_k$. Finally, the pulse combiner collects these pulses. To create the required pulses, each S is set to 0 or 1 according to the programmed 2-bit signals $C_0$ and $C_1$. As a result, one of four multiplication factors (4, 8, 12, or 24) can be chosen.
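The factor selection can be made concrete with a short sketch. It picks which of the 24 evenly spaced VCDL phases fire a pulse for each factor F. The selection takes every (24/F)-th phase, so F evenly spaced pulses fall in one reference period. Both the mapping from (C<sub>1</sub>, C<sub>0</sub>) to factor in `FACTOR_BY_CODE` and the stride-based selection are illustrative assumptions. The text only states that each S is set to 0 or 1.

```python
# Hypothetical encoding of the 2-bit control (C1, C0) -> multiplication factor.
FACTOR_BY_CODE = {(0, 0): 4, (0, 1): 8, (1, 0): 12, (1, 1): 24}

def select_bits(c1: int, c0: int, n_phases: int = 24) -> list[int]:
    """Return the S_k enable bits (1 = the pulse from phase k reaches the combiner)."""
    factor = FACTOR_BY_CODE[(c1, c0)]
    stride = n_phases // factor          # adjacent phases are T_ref / n_phases apart
    return [1 if k % stride == 0 else 0 for k in range(n_phases)]

for code in FACTOR_BY_CODE:
    bits = select_bits(*code)
    # The number of enabled phases equals the multiplication factor.
    print(code, sum(bits), [k for k, s in enumerate(bits) if s])
```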



Figure 2.6. A clock generator with a high multiplication factor [2.4].



Figure 2.7. Pulse generator and edge combiner [2.4].

#### Process variations tolerant all-digital multiphase DLL for DDR3 interface [2.21]

In this case, the clock generator uses four phase shifters to produce the multiplied frequency. A conventional digital DLL may suffer from harmonic locking and from the area overhead of its delay-line control logic. A time-to-digital converter (TDC) can prevent harmonic locking, but it adds area. Therefore, this design uses a ring oscillator and a counter to resolve the harmonic-locking problem. The maximum operating frequency of a digital DLL is determined by the minimum delay of its delay line. In this architecture the minimum delay is four times larger than in a conventional digital DLL, so a fine-tune delay line is needed to solve this problem.

Fig. 2.8 shows the DLL, which is composed of four 90-degree phase-shift blocks, a global delay-line controller, a phase selector, and an edge combiner. To eliminate delay mismatch among the delay lines, DLL operation is divided into a calibration mode and a locking mode. Each 90-degree block has its own controller so that it does not interfere with the other blocks. During the calibration mode, each block calibrates its own delay line to 90 degrees. An area-efficient binary-to-thermometer converter (BTC) controls the digital coarse delay line (DCDL). After calibration, the DLL enters the locking mode. In this mode the four delay lines operate not as a ring oscillator but as a delay line controlled by the global delay controller. Finally, the edge combiner combines the 90-degree shifted clocks into a 2x-multiplied output. Fig. 2.9 shows the structure of the 90-degree phase-shift block.
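One common way an edge combiner turns quadrature clocks into a 2x clock is to XOR the 0-degree and 90-degree outputs. The text does not state whether [2.21] uses exactly this gate, so treat the sketch as an illustration. Waveforms are sampled on an integer time grid, and the 40-sample period is an arbitrary choice.

```python
def square(period: int, delay: int, n: int) -> list[int]:
    """50%-duty square wave delayed by `delay` samples."""
    return [1 if (t - delay) % period < period // 2 else 0 for t in range(n)]

def rising_edges(wave: list[int]) -> int:
    # Count 0->1 transitions; wave[-1] wraps, treating the record as periodic.
    return sum(1 for i in range(len(wave)) if wave[i - 1] == 0 and wave[i] == 1)

PERIOD = 40
clk0 = square(PERIOD, 0, 2 * PERIOD)              # reference phase
clk90 = square(PERIOD, PERIOD // 4, 2 * PERIOD)   # delayed by a quarter period
clk2x = [a ^ b for a, b in zip(clk0, clk90)]

print(rising_edges(clk0), rising_edges(clk2x))    # edges over two reference periods
print(sum(clk2x) / len(clk2x))                    # duty cycle of the 2x clock
```

If `clk90` were offset by 9 samples instead of 10, the 2x output would no longer have a 50% duty cycle. This is one way to see why each 90-degree block is calibrated individually.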



# 2.2 An Overview on Duty Cycle Corrector

A clock signal is described by parameters such as clock frequency, clock period, and duty cycle. The duty cycle is the ratio of the high (on) time to the clock period. A 50% duty cycle is important in applications such as double-data-rate (DDR) SDRAM, clock and data recovery (CDR) circuits, analog-to-digital converters (ADCs), and dual-edge-triggered flip-flops. In double-data-rate systems, data is sampled at both clock edges, rising and falling. For high-speed systems, sampling on both edges greatly increases throughput compared with single-edge sampling. For low-power systems, the same throughput can be kept at half the clock frequency, so the clock network consumes less power. A 50% duty cycle is also critical for dynamic logic, which has two phases: precharge while the clock is low and evaluation while the clock is high. Any duty cycle distortion can degrade performance. However, the duty cycle of an off-chip clock tends to deviate from 50% at high frequencies. Moreover, even if the clock generator produces a 50% duty cycle, clock drivers with mismatched rising and falling edges can still distort it. To solve this problem, duty cycle correctors (DCCs) have been widely used [2.22]-[2.33] to adjust the duty cycle as close to 50% as possible. DCCs can be classified into three types: digital [2.22]-[2.29], analog [2.33]-[2.34], and mixed-mode [2.30]-[2.31]. Digital DCCs are further divided into feedback [2.22]-[2.23] and non-feedback [2.24]-[2.29] types. Analog DCCs are usually implemented as feedback types [2.32]-[2.33] to achieve better accuracy at the expense of a long locking time.
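A short calculation shows why duty cycle error matters for double-data-rate sampling. With duty cycle D and period T, the two half-cycle windows are D·T and (1 − D)·T, and the narrower one limits timing. The 500MHz clock and 45% duty cycle below are illustrative values, not measurements from this thesis.

```python
def half_cycle_windows(period_ns: float, duty: float) -> tuple[float, float]:
    """Return the (high-phase, low-phase) durations of a clock with the given duty cycle."""
    if not 0.0 < duty < 1.0:
        raise ValueError("duty cycle must be between 0 and 1")
    return duty * period_ns, (1.0 - duty) * period_ns

period = 1e3 / 500.0          # 500 MHz -> 2.0 ns
high, low = half_cycle_windows(period, 0.45)
ideal = period / 2
print(f"high={high:.2f} ns, low={low:.2f} ns, ideal={ideal:.2f} ns")
print(f"worst-case window lost: {100 * (ideal - min(high, low)) / ideal:.0f}%")
```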

# 2.2.1 Digital

# 2.2.1.1 Feedback [2.22]

The structure proposed in [2.22] is shown in Fig. 2.10. It provides both clock synchronization, through a deskew buffer, and duty cycle correction. It is composed of three half delay lines (HDLs), an edge combiner, an interpolator, two phase detectors, and the controllers. Three HDLs are used to reduce the mismatch between the half delay lines. The architecture is based on a cyclic time-to-digital converter to shorten the locking time. The two phase detectors provide lead/lag information during the duty cycle correction (DCC) and deskew operations. Fig. 2.11 shows the signal paths in the DCC phase and the deskew phase. In Fig. 2.11(a), *CLK*<sub>IN</sub> passes through PATH 1 to produce *CLK*<sub>DL</sub>, whose period equals one input clock period; PD1 is used in PATH 1. In PATH 2, with HDL1 fixed, HDL3 is duplicated as HDL2 by using PD2. Finally, *CLK*<sub>IN</sub> travels through the three HDLs to start the deskew phase, as shown in Fig. 2.11(b). *CLK*<sub>DL</sub> acts as the set signal of the edge combiner. *CLK*<sub>HDL</sub> and *CLK*<sub>RHDL</sub> are interpolated by averaging them to obtain half of the input clock period, and the interpolated signal acts as the reset signal of the edge combiner. A digital feedback DCC provides a shorter duty cycle correction time than the analog method. However, it needs more complicated duty cycle detector structures, such as





Figure 2.10. Proposed all-digital with feedback loop DCC [2.22].








Figure 2.11. Signal paths [2.22]. (a) DCC path (b) deskew path

# 2.2.1.2 Non-feedback [2.26]

The half cycle delay line (HCDL) is the key component of the DCC in [2.26], as shown in Fig. 2.12(a). *CKB* passes through the HCDL to produce *CKR*, which is delayed by half the clock period and acts as the reset signal of the SR latch. *CKB* also travels through a matching delay line (MDL) to generate *CKS*, which is slightly delayed relative to *CKB*; the MDL compensates for the inherent delay of the HCDL. Because the correction precision of a digital DCC depends on the delay of its unit delay cell, there is a quantization error. The timing diagram is drawn in Fig. 2.12(b). The digital non-feedback DCC corrects the duty cycle quickly. However, as an open-loop circuit it cannot track process, voltage, and temperature (PVT) variations, so it is not suitable for low-voltage designs.
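The quantization error and the role of the MDL can be checked numerically. The sketch rests on three assumptions:

- The HCDL is built from identical unit delays plus an intrinsic offset.
- The MDL reproduces only that offset.
- *CKS* serves as the set edge (our reading of Fig. 2.12) and *CKR* as the reset edge.

The 35 ps unit delay, 50 ps intrinsic delay, and 6 ns period are illustrative numbers, and the function name is hypothetical.

```python
def hcdl_output_duty(period_ps: float, unit_ps: float, intrinsic_ps: float) -> float:
    """Duty cycle set by CKS and CKR edges in a simplified HCDL model."""
    taps = round(period_ps / 2 / unit_ps)          # unit cells chosen to approximate T/2
    t_set = intrinsic_ps                           # CKS: MDL copies only the intrinsic delay
    t_reset = intrinsic_ps + taps * unit_ps        # CKR: intrinsic delay plus HCDL taps
    return (t_reset - t_set) / period_ps           # intrinsic delay cancels out

duty = hcdl_output_duty(6000.0, 35.0, 50.0)
bound = 35.0 / (2 * 6000.0)                        # half a unit delay, as a fraction of T
print(f"duty = {duty:.4f}, error = {abs(duty - 0.5):.4f}, bound = {bound:.4f}")
```

The residual error is bounded by half a unit delay divided by the period. This is the quantization error mentioned above, and it does not depend on the intrinsic delay once the MDL matches it.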





Figure 2.12. Without a feedback loop DCC [2.26]. (a) Proposed topology (b) Timing diagram.

# 2.2.2 Analog

Analog DCCs are usually implemented as feedback types to achieve higher duty cycle accuracy at the expense of a long correction time. They also need more complex designs to maintain stable operation. In [2.33], the DCC uses a negative feedback loop and a pulse shrinking/stretching mechanism to adjust the input clock duty cycle, as shown in Fig. 2.13. A differential low-pass filter serves as an integrator that generates a feedback voltage $V_{bias}$, which adjusts the delay line. In [2.33], a pulse control loop is used to control the duty cycle of the input clock, as shown in Fig. 2.14. By setting the current ratio of the charge pump, the DCC can generate a programmable output duty cycle.
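The integrating feedback loop can be illustrated with a discrete-time behavioral model. In each iteration, the duty error is accumulated into a normalized $V_{bias}$, which shifts the output duty cycle. The linear duty-versus-bias relation, the loop gains, and the function name are assumptions for illustration; a real analog DCC integrates continuously.

```python
def analog_dcc(duty_in: float, gain: float, steps: int) -> list[float]:
    """Behavioral model of an integrating duty-cycle correction loop."""
    v_bias = 0.0                       # integrator state (normalized units)
    trace = []
    for _ in range(steps):
        duty_out = duty_in + v_bias    # assumed: pulse width shifts linearly with V_bias
        error = duty_out - 0.5         # what the low-pass integrator senses
        v_bias -= gain * error         # negative feedback: integrate the error
        trace.append(duty_out)
    return trace

fast = analog_dcc(0.40, gain=0.5, steps=20)
slow = analog_dcc(0.40, gain=0.05, steps=20)
print(f"after 20 steps: gain 0.5 -> {fast[-1]:.4f}, gain 0.05 -> {slow[-1]:.4f}")
```

A small loop gain keeps the loop stable but converges slowly. This mirrors the long correction time attributed to analog DCCs.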



Figure 2.13. Analog DCC with an integrator [2.33].



# 2.2.3 Mixed mode [2.30]

The third type of DCC is the mixed-mode DCC, whose duty cycle detector is implemented with analog circuits such as amplifiers, integrators, or comparators. In [2.30], a comparator serves as the duty cycle detector, and a digital SAR controller drives the duty-cycle adjuster. The duty-cycle adjuster employs phase mixers to achieve finer correction resolution. The analog comparator detects the duty-cycle error precisely, while the other blocks are implemented with digital circuits to reduce the correction time.
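The benefit of phase mixers can be seen from an idealized linear interpolation model. Mixing two edges that are one unit delay apart with a weight of w/M places the output edge at a fraction of that delay. The linear model, the 8 weight levels, and the 40 ps spacing are assumptions; real phase mixers are only approximately linear.

```python
def mix_edge(t_early_ps: float, t_late_ps: float, weight: int, levels: int) -> float:
    """Output edge time of an ideal phase mixer with weight/levels on the late input."""
    if not 0 <= weight <= levels:
        raise ValueError("weight must be between 0 and levels")
    w = weight / levels
    return (1 - w) * t_early_ps + w * t_late_ps

UNIT = 40.0  # delay between the two input edges, e.g. one coarse delay cell
edges = [mix_edge(0.0, UNIT, w, 8) for w in range(9)]
print(edges)  # nine edges from 0 to 40 ps in 5 ps steps
```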



Figure 2.15. Mixed mode DCC [2.30].

# 2.3 An Overview on Level Converter

With the growth of portable devices, power consumption has become a critical issue. Voltage scaling, which lowers the supply voltage of CMOS gates, is an effective way to reduce power consumption, and ultra-low-voltage logic has been explored [2.34]. However, an ultra-low supply voltage degrades performance. In [2.35], a clustered voltage scaling scheme was developed in which critical paths are still supplied by a high voltage to meet performance demands, while non-critical paths use a low supply voltage to reduce power consumption. When a gate in the low-voltage domain drives a gate in the high-voltage domain, the high output level of the low-voltage gate cannot fully turn off the high-voltage gate. This creates a DC leakage path from the supply to ground. To overcome this problem, a level converter is inserted at the interface between the two voltage domains. Conventional level converters fall into three types, as shown in Fig. 2.16. Unfortunately, they are suitable only for converting a low super-threshold input into a high super-threshold output, and in the sub-threshold region they fail to work correctly. Recently, many methods have been proposed to convert a sub-threshold input into a super-threshold output [2.37]-[2.49].
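A first-order check makes the leakage argument concrete. When a logic-high from the low-voltage domain drives a PMOS in the $V_{DDH}$ domain, that PMOS sees $V_{SG} = V_{DDH} - V_{DDL}$. It stays on whenever this value exceeds $|V_{tp}|$. The 0.4V threshold below is an assumed value. Sub-threshold current never drops to zero, so the function gives only a coarse on/off verdict.

```python
def pmos_stays_on(vddh: float, vddl: float, vtp_abs: float) -> bool:
    """True if a VDDH-domain PMOS driven by a VDDL logic high is above threshold."""
    vsg = vddh - vddl          # source at VDDH, gate at the low-domain "1"
    return vsg > vtp_abs

VTP = 0.4  # assumed |Vtp| in volts
for vddl in (0.3, 0.5, 0.8):
    print(vddl, pmos_stays_on(1.0, vddl, VTP))
```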



Figure 2.16. Conventional level converters. (a) Cross-coupled type. (b) Current mirror type. (c) Dynamic type.

# 2.3.1 Cross-coupled type

The cross-coupled level converter is drawn in Fig. 2.16(a). Two cross-coupled PMOS transistors form a positive feedback loop. At ultra-low input voltages, the pull-down NMOS devices are too weak to overpower the pull-up PMOS devices. Balancing the driving strengths of the two networks solves this problem.
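The imbalance follows from the standard first-order sub-threshold current model shown below. The example evaluates it at equal $V_{DS}$ with assumed values that are not taken from this thesis: slope factor $n = 1.5$, thermal voltage $V_T \approx 26$ mV at room temperature, $V_{th} = 0.4$ V, and an input of 0.15 V.

$$
I_D \approx I_0 \exp\!\left(\frac{V_{GS}-V_{th}}{n V_T}\right)\left(1-e^{-V_{DS}/V_T}\right)
$$

$$
\frac{I_D(V_{GS}=0.15\,\text{V})}{I_D(V_{GS}=V_{th})}
\approx \exp\!\left(\frac{0.15-0.40}{1.5 \times 0.026}\right)
= e^{-6.41} \approx 1.6\times10^{-3}
$$

Under these assumptions, an NMOS driven at 0.15 V carries roughly 600 times less current than at threshold. Meanwhile, the cross-coupled PMOS it must overpower has the full $V_{DDH}$ available as gate drive.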

# 2.3.1.1 Voltage doubler [2.37]

A voltage doubler is inserted before the input signal is applied to the level converter. It boosts the input voltage to strengthen the pull-down devices, as shown in Fig. 2.17. However, its two large MOS capacitors incur area overhead. In addition, the circuit is susceptible to noise because one of the two capacitors has a floating node, so the capacitor loses charge over time.



Figure 2.17. Voltage doubler [2.37]

#### 2.3.1.2 Cascade level converter [2.38]

Fig. 2.18 presents a cascaded level converter. The difference between each stage's output and input voltages is reduced, so the pull-up and pull-down driving strengths are nearly equal and there is no driving-strength imbalance. However, the system must provide three intermediate voltages, 300mV, 400mV, and 600mV, which adds power management overhead.



# 2.3.1.3 Reduced swing inverter (RSI) [2.39]

In this cross-coupled latch, two reduced swing inverters are inserted into the positive feedback loop, as shown in Fig. 2.19(a), to weaken the pull-up devices. A reduced swing inverter is shown in Fig. 2.19(b). When the input *IN* is "0", MP3 is turned on and charges node *OUT*. When *IN* changes to "1", MN1 turns on and MP3 turns off. Because node *OUT* is not connected directly to ground, it drops by only a certain amount through charge redistribution between the output node and capacitance C1, which produces a reduced output swing. With the reduced swing inverters, the gate drive of the pull-up devices (MP1 and MP2 in Fig. 2.19(a)) is limited to twice the voltage drop of a PMOS diode.

The pull-up driving strength is therefore reduced. However, the pull-up strength of MP1 and MP2 does not scale: it is always limited to twice the PMOS diode drop, which makes the level converter slower at higher input voltages. In addition, the extra inverters limit the minimum acceptable input logic voltage, because the input must be large enough to flip them. The additional inverters and RSIs consume a significant amount of power, and the method requires 24 transistors, which also causes area overhead.



Figure 2.19. A cross-coupled level converter with two reduced swing inverters [2.39]. (a) A modified cross-coupled level converter. (b) A reduced swing inverter.

# 2.3.1.4 Diode-connected PMOS transistors [2.40]

Another way to weaken the pull-up devices is to place diode-connected PMOS transistors in the positive feedback loop, as shown in Fig. 2.20(a). In steady state, the gate-source voltage difference equals the diode drop, which is very small. In Fig. 2.20(b), when the input *INL* changes from "0" to "1", MN1 can easily sink the current of node A and turn on MP2, because the voltage drop across MP3 is small, and the positive feedback is triggered. Two NMOS transistors, MN3 and MN4, are added to help pull the output node to ground. The rail-to-rail structure increases the noise margin and reduces static power loss. In addition, a higher input voltage increases the gate-source voltage difference faster, so the output switches faster. The speed of this method therefore tracks the input voltage.



Figure 2.20. Diode-connected PMOS transistors [2.40]. (a) A diode-connected cross-coupled level converter. (b) Operation principle

# 2.3.1.5 Feedback loop [2.45]

In Fig. 2.21, two PMOS devices (MP3 and MP4) are added to the cross-coupled feedback loop and are driven by an extra feedback loop from the output. Among the PMOS transistors, only MP1 is used for the transition; MP2, MP3, and MP4 only hold the value of node M. Therefore, MP1 can be made stronger than the other PMOS devices. In addition, an inverter (MP5 and MN5) is added to speed up the transition. However, the driving-strength imbalance still exists in the sub-threshold region.



Figure 2.21. A feedback loop [2.45].

# 2.3.2 Current mirror [2.44]

One conventional level converter is based on a current mirror, as shown in Fig. 2.16(b). The current-mirror level converter suffers from a severe short-circuit current problem that increases power dissipation. In [2.44], NMOS M5 is connected below the current mirror. When A is low and AN is high, M3 is turned off and M4 is turned on. Node VI is charged through M6 until M6 and M7 turn off, and node Z is discharged. When A is high and AN is low, M3 turns on and current flows through M6, M5, and M3. Because M6 and M7 form a current mirror, a mirrored current also flows through M7, which turns off M5. Hence there is no static current through M6, M5, and M3.


Figure 2.22. A Wilson current mirror [2.44].

#### 2.3.3 Dynamic type [2.46]

The conventional dynamic level converter is shown in Fig. 2.16(c). A dynamic level converter is free from the driving-strength imbalance problem but has a synchronization problem between the two voltage domains. In Fig. 2.23(a), a clock synchronizer is added before the dynamic level converter to produce a high-voltage clock ($M_{CLK}$) synchronized with the low-voltage clock input (CLKL). The clock synchronizer is drawn in Fig. 2.23(b). RST acts as a keeper to protect the clock synchronizer from static leakage noise. However, in the sub-threshold region, a dynamic level converter may consume more power than a static level converter.



Figure 2.23. Modified dynamic level converter [2.46]. (a) A dynamic level converter with a clock synchronization. (b) Schematic view of a clock synchronizer.

# 2.4 An Overview on Level Converting Flip-Flop

A level converter must be inserted at the interface of two voltage domains, but it adds propagation delay. To maintain constant throughput, pipelining or parallelism is exploited [2.49]. Low-VDD clusters are therefore followed by pipeline flip-flops, and the level converter is merged into these flip-flops [2.50]-[2.52], [2.54]-[2.59]. Flip-flops that latch and level-convert at the same time are called level converting flip-flops (LCFFs). Because an LCFF can take a VDDL clock signal, it also helps reduce the power consumption of the clock tree in synchronous systems.

#### 2.4.1 Slave Latch Level Shifting[2.50]

Slave latch level shifting (SLLS) flip-flops are based on master-slave flip-flops, as shown in Fig. 2.24. The master latch operates at a low supply voltage, and the level converter is merged into the slave latch so that the flip-flop also performs level conversion. The level converter is of the cross-coupled type. However, the SLLS flip-flop has several drawbacks. During level conversion, the contention between the pull-up and pull-down devices worsens when $V_{DDL}$ is much lower than $V_{DDH}$, which can cause a large delay and extra power during transitions. Moreover, many gates lie on the critical path, so the data-to-output (D-Q) latency is long. The clock must also drive more gates, so the clock network consumes more power.



Figure 2.24. Slave latch level shifting [2.51]

#### 2.4.2 Clock Level Shifted Sense Amplifier Flip-Flop[2.50]

Another type of LCFF is based on the sense-amplifier flip-flop, as shown in Fig. 2.25, and is called the clock level shifted sense amplifier (CSSA) flip-flop. It consists of a sense-amplifier latch and a set-reset latch. Because a low-swing clock cannot efficiently drive PMOS transistors supplied by $V_{DDH}$, a clock level shifter raises the clock swing so that the CSSA flip-flop functions correctly. The CSSA uses a dynamic precharge style: nodes *sb* and *rb* are precharged every clock cycle even when the data is unchanged. This causes redundant internal switching power, which is a power penalty. In addition, when $V_{DDL}$ is much lower than $V_{DDH}$, there is also a crossover contention problem.



### 2.4.3 Self-Precharging Flip-Flop[2.51]

To reduce the redundant internal power consumption, a self-precharging flip-flop was proposed [2.51], as shown in Fig. 2.26. It adopts a conditional capture technique [2.52] to avoid redundant internal transitions when the data is unchanged. Assume that Q=1 and QB=0 in the previous state and that D remains 1 in the next state. Node *SB* is not discharged because there is no discharge path to ground: the NMOS transistor controlled by the feedback signal is turned off. However, when data switching activity is high, the conditional capture technique brings little benefit. In addition, the delay of the self-precharging circuit must be long enough for the input data to propagate to the output.
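The trade-off described above can be counted with a toy model. A precharge-every-cycle design toggles its internal node once per clock. Conditional capture toggles only when D differs from the stored Q. This event-counting model and its function name are simplifications, not a transistor-level description of [2.51].

```python
def internal_toggles(data: list[int], conditional: bool, q0: int = 0) -> int:
    """Count internal-node discharge events over a data sequence."""
    q = q0
    toggles = 0
    for d in data:
        if not conditional or d != q:
            toggles += 1   # the internal node discharges (and is later precharged)
        q = d              # the flip-flop holds d after this cycle either way
    return toggles

quiet = [1] * 8            # low activity: data never changes after the first bit
busy = [1, 0] * 4          # high activity: data toggles every cycle
for name, seq in (("quiet", quiet), ("busy", busy)):
    print(name, internal_toggles(seq, False), internal_toggles(seq, True))
```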



Figure 2.26. Self-Precharging Level Converting Flip-Flop[2.52]

### 2.4.4 Pulsed-Triggered Level Converting Flip-Flop

A pulse-triggered flip-flop is composed of a pulse generator and a latch, with the level converter implemented in the latch. Pulse-triggered flip-flops offer an attractive way to meet delay and energy requirements. They inherently have zero or negative setup time, so they can absorb clock skew and jitter in the timing budget of the critical path. In addition, a pulse-triggered LCFF provides a small D-Q delay with low logic complexity. Pulse-triggered LCFFs can be classified as single-edge or dual-edge triggered, and as implicit-pulse or explicit-pulse triggered.

### 2.4.4.1 Single-Edged /Dual-Edged

Depending on the number of triggering clock edges, pulse-triggered LCFFs are single-edge triggered or dual-edge triggered. A single-edge LCFF captures input data on only one clock edge, whereas a dual-edge LCFF captures data on both the rising and falling edges. A dual-edge LCFF therefore maintains the same throughput as a single-edge LCFF at half the clock frequency, which greatly reduces the power consumption of the clock tree. However, a dual-edge LCFF must account for timing constraints such as duty cycle variations [2.53]. The self-precharging flip-flop [2.51] and the dual-pass-transistor flip-flop [2.54] are examples of single-edge LCFFs. In [2.54], the pulse generator produces a pulse only at the rising clock edge, which turns on N1 and N2 to pass the signal, as shown in Figure 4. Recently, dual-edge LCFFs have become a promising way to reduce the delay and power overhead of level conversion in multiple-supply-voltage systems [2.55]-[2.60]. In [2.57], green blocks are supplied by the low voltage, thick lines denote high-threshold-voltage devices, and thin lines denote low-threshold-voltage devices. *Pulse 1* is produced at the positive clock edge to turn on M3 and M5, and *Pulse 2* is generated at the negative clock edge to turn on M2 and M4. The design also retains data when the flip-flop is in sleep mode.
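The clock-power saving can be estimated with the usual dynamic-power expression $P = C V^2 f$ for a clock net that charges and discharges once per cycle. The 10 pF load and 0.5V supply are assumed values. The comparison also assumes the same clock load for both flip-flop styles, which a real dual-edge design may not achieve.

```python
def clock_power_w(c_farad: float, vdd: float, f_hz: float) -> float:
    """Dynamic power of a clock net toggling once per cycle: P = C * V^2 * f."""
    return c_farad * vdd ** 2 * f_hz

C_CLK, VDD = 10e-12, 0.5            # assumed clock-tree capacitance and supply
throughput = 500e6                  # required data rate, captures per second
p_single = clock_power_w(C_CLK, VDD, throughput)      # one capture per cycle
p_dual = clock_power_w(C_CLK, VDD, throughput / 2)    # two captures per cycle
print(f"single-edge: {p_single * 1e3:.3f} mW, dual-edge: {p_dual * 1e3:.3f} mW")
```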



Figure 2.27. Single-edged triggered flip-flop[2.55]



Figure 2.28. Dual-edged triggered flip-flop[2.58]: (a) Pulsed-triggered LCFF (b)Dual-edge pulse trigger circuit

### 2.4.4.2 Implicit-Triggered/Explicit-Triggered

Pulse-triggered LCFFs can also be categorized by whether they have a distinct pulse generator. If the pulse generator is merged into the latch, the flip-flop is called an implicit-pulse triggered LCFF [2.55]-[2.56]. In [2.55], the four inverters in the dotted box form a pulse generator, as shown in Fig. 2.29. At the positive clock edge, N3, N7, N8, and N10 turn on to sample the input data, with a capture window of about three inverter delays. At the negative clock edge, N2, N4, N9, and N11 turn on to capture the input data. The architecture uses a conditional discharge technique to save redundant internal power. If the pulse generator is outside the latch, the flip-flop is called an explicit-pulse triggered LCFF [2.57]-[2.61]. In [2.58], a 4T-XOR gate generates a pulse at both the rising and falling clock edges, as shown in Fig. 2.30. Because of its separate pulse generator, an explicit-pulse triggered LCFF has a higher power overhead than an implicit-pulse triggered LCFF. However, its pulse generator can be shared among several latches, which reduces the power and area overhead.
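The dual-edge pulse generation attributed to the 4T-XOR in [2.58] can be modeled behaviorally as the XOR of the clock with a delayed copy of itself. This yields one pulse per clock edge, with a width equal to the delay. The 3-sample delay and 40-sample period are arbitrary, and the model ignores the transistor-level details of a 4T-XOR.

```python
PERIOD, DELAY = 40, 3
N = 2 * PERIOD  # two full clock periods; indices wrap because the record is periodic

clk = [1 if t % PERIOD < PERIOD // 2 else 0 for t in range(N)]
clk_delayed = [clk[t - DELAY] for t in range(N)]   # negative indices wrap around
pulse = [a ^ b for a, b in zip(clk, clk_delayed)]

pulse_starts = [t for t in range(N) if pulse[t] == 1 and pulse[t - 1] == 0]
print(pulse_starts)                        # one pulse at every rising and falling edge
print(sum(pulse) // len(pulse_starts))     # pulse width in samples, equal to DELAY
```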



# Chapter 3

# **A Wide Range DLL-based Multiphase Clock Generator with Duty Cycle Correction in 65nm CMOS**

To increase the data bandwidth of high-speed systems, multiphase clocks have been exploited. This chapter proposes a wide-range DLL-based multiphase clock generator that divides one clock cycle into eight phases. Two control modes close the loop. The first is the successive approximation register-controlled (SAR) mode, a binary search that accelerates locking. Once the output clock is locked, the generator switches to the second mode, the counter mode, in which the digital delay-block control word is incremented or decremented by 1. The proposed multiphase clock generator operates from 80MHz to 500MHz, and a harmonic detector is proposed to avoid harmonic locking. At a 1.0V supply and 500MHz, it consumes 0.29 mW.
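The two control modes can be sketched behaviorally. The sketch assumes a monotonic delay line whose delay grows by one unit per code step, and a phase detector that only reports whether the delayed clock still leads the reference. SAR mode resolves one bit per comparison, so an n-bit code locks in n comparisons. Counter mode then moves the code by ±1 per comparison to follow drift. All names, the 8-bit width, and the linear delay model are assumptions; this is not the circuit implemented in this work.

```python
from typing import Callable

def make_detector(period: float, unit_delay: float) -> Callable[[int], bool]:
    """Hypothetical phase detector: True while the delay line is shorter than one period."""
    return lambda code: code * unit_delay < period

def sar_search(is_leading: Callable[[int], bool], bits: int) -> int:
    """Binary search from the MSB down; keep a trial bit if the delay is still too short."""
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        if is_leading(trial):
            code = trial
    return code

def counter_track(code: int, is_leading: Callable[[int], bool],
                  cycles: int, max_code: int) -> list[int]:
    """Counter mode: add 1 while leading, subtract 1 while lagging."""
    trace = []
    for _ in range(cycles):
        code = min(code + 1, max_code) if is_leading(code) else max(code - 1, 0)
        trace.append(code)
    return trace

locked = sar_search(make_detector(100.5, 1.0), bits=8)
print(locked)                                                    # largest leading code
print(counter_track(locked, make_detector(100.5, 1.0), 4, 255))  # dithers around lock
print(counter_track(locked, make_detector(103.5, 1.0), 6, 255))  # follows a longer period
```

The split mirrors the trade-off described above. Binary search gives fast acquisition, while the ±1 counter reacts gradually to slow PVT drift.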

The clock signal is distributed through the clock tree, and mismatched clock drivers cause its duty cycle to deviate from 50%. A PVT-robust all-digital duty cycle corrector (DCC) based on the SAR DLL is therefore proposed. A PVT detector is adopted in this work to reduce the output duty cycle error. At a 0.5V supply and a 167MHz input frequency, the proposed DCC consumes 26.30 $\mu$W.

Section 3.1 introduces DLL-based frequency multipliers, and Section 3.2 discusses multiphase clock applications. The implementation of the multiphase clock generator is given in Sections 3.3 and 3.4, and the implementation of the duty cycle corrector is described in Section 3.5. Finally, Section 3.6 concludes this chapter.

### 3.1 Introduction

Phase-locked loops (PLLs) and delay-locked loops (DLLs) are widely used to eliminate clock skew and jitter in high-speed microprocessors, memory interfaces, and communication integrated circuits (ICs), and they can also produce multiphase clock signals. Many clock multiplication schemes have been proposed. PLLs are usually used as clock generators, but their locking takes hundreds of reference clock cycles. To make clock generation more flexible, an all-digital clock generator was presented in [3.1] that generates the output clock by dynamically delaying the reference clock according to a frequency control code. However, its output frequency can only be a fraction of the reference frequency. A DLL [3.2] was presented for DVFS systems, but it cannot generate a fractional clock. A cyclic clock multiplier (CCM) was presented for DVFS applications [3.3] and can create either fractional or multiplied clocks. However, it uses a TDC for phase error detection, which consumes considerable area and power. In general, a DLL has better jitter performance than a PLL because a DLL does not accumulate jitter.

In high-speed systems, data can be sampled on both the rising and falling clock edges, which doubles the throughput. In low-power systems, the same dual-edge sampling allows the clock frequency to be halved while maintaining the same throughput, and the clock network then consumes less power. Therefore, a clock signal with a 50% duty cycle is critical for these applications: any duty cycle distortion shortens one half-cycle and may degrade performance. However, the duty cycle of an off-chip clock signal is prone to deviate from 50% at high frequencies. In addition, even if the clock generator produces a 50% duty cycle clock, the duty cycle may still deviate because the clock drivers have unmatched rising and falling edges. To solve this problem, duty cycle correctors (DCCs) have been widely used to adjust the duty cycle as close to 50% as possible.
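To make the duty-cycle argument concrete, the minimal Python sketch below computes the shorter half-cycle of a clock for a given duty cycle. It assumes ideal edges and an illustrative 6 ns period (about 167 MHz); with dual-edge sampling, this shorter half-cycle bounds the time available between consecutive opposite edges.

```python
def min_half_cycle(period_ns: float, duty: float) -> float:
    """Return the shorter of the high and low phases of a clock (ns).

    With dual-edge sampling, logic launched on one edge must settle before
    the next (opposite) edge, so the shorter phase sets the timing budget.
    """
    if not 0.0 < duty < 1.0:
        raise ValueError("duty cycle must be strictly between 0 and 1")
    high = duty * period_ns
    low = (1.0 - duty) * period_ns
    return min(high, low)


# Illustrative 6 ns clock: the budget shrinks as the duty cycle moves off 50%
for d in (0.50, 0.40, 0.25):
    print(f"duty {d:.0%}: shortest half-cycle = {min_half_cycle(6.0, d):.2f} ns")
```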

### 3.2 Multiphase clock applications

#### 3.2.1 Frequency synthesizer [3.4]

A DLL can operate like a PLL, with a delay line replacing the VCO. Fig. 3.1 shows the simplified block diagram of a DLL-based frequency synthesizer. When the loop is locked, the outputs of the N delay stages evenly span one reference clock period T<sub>ref</sub>, so adjacent stage outputs are separated by T<sub>ref</sub>/N. The edge combiner generates an output transition for each phase transition; hence the output frequency is N times the reference frequency 1/T<sub>ref</sub>. A multiplying DLL overcomes the drawbacks of a PLL, such as jitter accumulation and high sensitivity to supply and substrate noise. For this reason, it offers good phase-noise performance.
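The tap spacing and frequency multiplication can be checked with a small idealized model. This Python sketch assumes perfectly matched stages and an ideal edge combiner that starts one output clock cycle at every tap edge (taps are numbered 0 to N-1, with tap 0 being the reference itself); the function names are illustrative, not taken from [3.4].

```python
from fractions import Fraction
from typing import List


def tap_edges(t_ref: Fraction, n_stages: int, n_cycles: int = 1) -> List[Fraction]:
    """Rising-edge times of taps 0..N-1 over n_cycles reference periods.

    When the DLL is locked, the N stages together span exactly one reference
    period, so tap k lags the reference by k * t_ref / n_stages.
    """
    step = t_ref / n_stages
    return [c * t_ref + k * step for c in range(n_cycles) for k in range(n_stages)]


def combiner_period(edges: List[Fraction]) -> Fraction:
    """Idealized edge combiner: one output clock cycle starts at every tap edge.

    Returns the spacing of consecutive output rising edges, which must be
    uniform for a clean multiplied clock.
    """
    gaps = {b - a for a, b in zip(edges, edges[1:])}
    if len(gaps) != 1:
        raise ValueError("tap edges are not evenly spaced")
    return gaps.pop()


t_ref = Fraction(10)                      # 10 ns reference (100 MHz), illustrative
edges = tap_edges(t_ref, n_stages=8, n_cycles=2)
t_out = combiner_period(edges)
print(t_out, t_ref / t_out)               # 5/4 8
```

With N = 8 and a 10 ns reference, the output period is 1.25 ns, i.e. an 8X multiplied clock.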



#### 3.2.2 Clock and data recovery [3.5]

A block diagram is shown in Fig. 3.2. The CDR has two main components: an analog PLL (which could be replaced by a digital DLL that provides the multiphase sampling clocks) and a digital CDR. The PLL generates evenly spaced multiphase clocks that drive the receiver samplers. There are eight such clock phases and samplers: four for clock recovery and four for data recovery. A bang-bang phase detector generates 3-level phase error information by performing early/late detection and a simple majority vote on the 32 incoming samples. This phase error is filtered by a digital loop filter consisting of a proportional path and an integral path to produce a 14-bit filter output. Given the difficulty of implementing a 14-bit phase interpolator with good linearity, a fully digital CDR controller that takes advantage of the phase filtering characteristics of the PLL is employed.



Figure 3.2. Clock and data recovery [3.5].
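The early/late decision, majority vote, and proportional-integral filtering described above can be sketched behaviorally as follows. This is a generic Alexander-style formulation written for illustration: the sign convention (+1 meaning the clock is late), the gain values, and the signed 14-bit saturation are assumptions, not details taken from [3.5].

```python
from typing import List


def early_late(d0: int, edge: int, d1: int) -> int:
    """Early/late decision for one bit interval (Alexander-style).

    d0, d1: consecutive data samples; edge: the sample taken between them.
    Returns +1 if the clock is late, -1 if early, 0 if there is no transition.
    """
    if d0 == d1:
        return 0                      # no data transition, no timing information
    # The edge sample already shows the new bit: the transition happened
    # before the edge sampling instant, so the clock is late.
    return 1 if edge == d1 else -1


def majority_vote(data: List[int], edges: List[int]) -> int:
    """Reduce a block of early/late decisions to a 3-level error (+1/0/-1)."""
    votes = sum(early_late(data[i], edges[i], data[i + 1])
                for i in range(len(data) - 1))
    return (votes > 0) - (votes < 0)


class PILoopFilter:
    """Digital loop filter with proportional and integral paths."""

    def __init__(self, kp: int, ki: int, out_bits: int = 14):
        self.kp, self.ki, self.acc = kp, ki, 0
        # Assumed: saturate to a signed out_bits-wide word.
        self.lo = -(1 << (out_bits - 1))
        self.hi = (1 << (out_bits - 1)) - 1

    def _sat(self, x: int) -> int:
        return max(self.lo, min(self.hi, x))

    def step(self, pd: int) -> int:
        self.acc = self._sat(self.acc + self.ki * pd)   # integral path
        return self._sat(self.kp * pd + self.acc)       # proportional + integral


pd = majority_vote([0, 1, 1, 0, 1], [1, 1, 0, 1])
lf = PILoopFilter(kp=4, ki=1)
print(pd, [lf.step(e) for e in (1, 1, -1)])   # 1 [5, 6, -3]
```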

#### 3.2.3 DRAM interface [3.6]

Timing budget calculations show that the optimal value of tSD, the phase shift from DQS to the delayed strobe DQSD, is approximately 20% of an input clock period. Since the input clock frequency ranges from 100 MHz to 200 MHz (DDR-200/266/333/400), tSD varies from 2 ns ($10\,\text{ns} \times 0.2$) to 1 ns ($5\,\text{ns} \times 0.2$). Therefore, a five-phase all-digital DLL was proposed in [3.6] to generate the desired tSD delay for the DQS signal. The block diagram of the five-phase all-digital DLL for the DDR SDRAM controller application is shown in Fig. 3.3. Like most DLL-based multiphase clock generators, the DLL has a multi-stage delay line with a common control word to generate equally spaced multiphase clock outputs, and it uses a time-to-digital converter (TDC) scheme to lock the whole loop. One design consideration is that it can be difficult to meet the minimum delay constraint when high-resolution delay cells are built from standard cells. Therefore, the DLL in this design is locked to two reference clock periods by using the TDC scheme. After the DLL is locked, the phase spacing of each delay stage is $2T_{FREF}/5$, where $T_{FREF}$ is the reference clock period. Hence, the minimum delay constraint for each delay stage is relaxed by a factor of two. The total delay from DQS to DQSD becomes $1.2\,T_{FREF}$ (three stages), which means the phase shift between DQS and DQSD is still $0.2\,T_{FREF}$. As a result, the desired tSD delay can be generated by the multiphase DLL.
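The tSD arithmetic and the tap choice can be reproduced with exact fractions. In this sketch, `tsd_ns` and `tap_for_tsd` are hypothetical helper names; delays in `tap_for_tsd` are expressed in units of $T_{FREF}$.

```python
from fractions import Fraction
from typing import Tuple


def tsd_ns(clock_mhz: int, share: Fraction = Fraction(1, 5)) -> Fraction:
    """Target DQS-to-DQSD shift: about 20% of the input clock period (in ns)."""
    return share * Fraction(1000, clock_mhz)


def tap_for_tsd(n_phases: int = 5, locked_periods: int = 2,
                target: Fraction = Fraction(1, 5)) -> Tuple[int, Fraction]:
    """First tap whose delay, modulo one period, equals the target phase shift.

    With the line locked to locked_periods * T, each of the n_phases stages
    contributes locked_periods * T / n_phases. All values are in units of T.
    """
    stage = Fraction(locked_periods, n_phases)
    for k in range(1, n_phases + 1):
        if (k * stage) % 1 == target:
            return k, k * stage
    raise ValueError("no tap realizes the target phase shift")


for mhz in (100, 200):
    print(mhz, "MHz ->", float(tsd_ns(mhz)), "ns")
print(tap_for_tsd())   # (3, Fraction(6, 5)): 1.2 T total, 0.2 T phase shift
```

The search returns tap 3: three stages of $2T_{FREF}/5$ give $1.2\,T_{FREF}$, which is $0.2\,T_{FREF}$ modulo one period.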



### 3.3 System architecture

The proposed all-digital DLL-based multiphase clock architecture is shown in Fig. 3.4. It consists of four major blocks: eight digitally controlled delay blocks, a phase detector (PD), a delay block controller, and an anti-harmonic detector. When the *Reset* signal is high, the eight delay blocks are cleared. When the *Reset* signal is low, the *CLK\_ref* signal passes through the eight delay blocks. The operation is divided into four steps, and the finite state machine is shown in Fig. 3.5. Initially, the multiphase clock generator is in the anti-harmonic detection step. Because this work provides a wide operating range, it may suffer from a harmonic locking problem. Ideally, the eight phases are spread over one clock period; with a wide delay range, however, the clock generator may lock with a total delay of two clock periods, so that the eight phases are spread over two clock periods and the data sampling rate is reduced. When the anti-harmonic detection is finished, the next step is the SAR mode, in which the delay blocks are controlled by a digital code produced by the SAR controller using a binary search algorithm. After the SAR mode, the multiphase clock generator enters the lock state. Because of the nature of SAR control, the clock generator becomes open-loop once it enters the lock state, and an open loop is easily affected by environmental variations, so the generator may drift out of lock. Therefore, once the clock generator is locked, the counter mode is triggered. The counter continues tracking by adding or subtracting 1 from the delay block control code at a time. With the counter mode, the whole clock generator always operates in closed loop and remains locked to the reference clock even under environmental variations.



Figure 3.5. Finite state machine
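The four-step control flow can be summarized as a small state machine. The following Python sketch is one reading of Fig. 3.5 under stated assumptions: the input names `reset`, `active`, and `sar_done` are hypothetical, and the generator is assumed to stay in anti-harmonic detection until `active` is asserted.

```python
from enum import Enum, auto


class State(Enum):
    RESET = auto()
    ANTI_HARMONIC = auto()
    SAR = auto()
    LOCK = auto()
    COUNTER = auto()


def next_state(state: State, reset: bool, active: bool, sar_done: bool) -> State:
    """One transition of the multiphase clock generator's control FSM.

    Hypothetical inputs: reset clears the delay blocks; active means the
    anti-harmonic check passed; sar_done means all SAR bits are resolved.
    """
    if reset:
        return State.RESET
    if state is State.RESET:
        return State.ANTI_HARMONIC
    if state is State.ANTI_HARMONIC:
        return State.SAR if active else State.ANTI_HARMONIC
    if state is State.SAR:
        return State.LOCK if sar_done else State.SAR
    if state is State.LOCK:
        return State.COUNTER      # close the loop again for PVT tracking
    return State.COUNTER          # counter mode is the steady state


state = State.RESET
for inputs in [(False, False, False), (False, True, False),
               (False, True, True), (False, True, True)]:
    state = next_state(state, *inputs)
    print(state.name)             # ANTI_HARMONIC, SAR, LOCK, COUNTER
```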

### 3.4 Circuit description

#### 3.4.1 Delay blocks

The *CLK\_ref* signal passes through the eight delay blocks. To support a wide operating range, each delay block includes a coarse-tune delay line and a fine-tune delay line, as shown in Fig. 3.6. The coarse-tune delay line enlarges the delay range and delay step so that the search is accelerated. The fine-tune delay line increases the delay resolution, and a high delay resolution helps reduce clock jitter.



#### 3.4.1.1 Coarse tune: nest-lattice

For the coarse-tune delay line, the nest-lattice structure [3.7] is adopted. The nest-lattice delay line is composed of cascaded lattice delay units. In a conventional delay line, increasing the tunable delay range by cascading delay units also increases the intrinsic delay, which limits the maximum operating frequency. The nest-lattice delay line avoids this problem: its intrinsic delay is four NAND gate delays, and each delay step is two NAND gate delays. The relationship between the input vector and the delay is shown in Fig. 3.8. The delay resolution is about 55 ps.

Figure 3.8. The relationship between the digital control code and the coarse delay.

#### 3.4.1.2 Fine tune: current-starved inverter

The fine-tune delay unit employs current-starved inverters, as shown in Fig. 3.9; two inverters are cascaded to form a buffer. Each coarse delay step is two NAND gate delays, so after eight delay blocks, one step of the coarse control code changes the total delay by sixteen NAND gate delays. The fine-tune delay line is therefore used to increase the delay resolution. The digital control word b[2:0] is fed into the fine-tune delay line, which also includes a 3-bit binary-to-thermometer decoder that outputs the current-starved inverter control word f[6:0]. The delay of the current-starved inverter is controlled by its conduction current: the more gates are turned on, the smaller the delay. Fig. 3.10 shows the relationship between the input vector and the delay. Since the coarse-tune resolution is about 55 ps and the 3-bit fine-tune code subdivides each coarse step, the fine-tune resolution is about 6 ps.



Figure 3.10. The relationship between the digital control code and the fine tune delay
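A first-order delay model ties the coarse and fine resolutions together. The sketch below uses the approximate step sizes quoted above (55 ps coarse, 6 ps fine); the single-NAND delay of 27.5 ps is merely implied by them, and the 64-entry coarse range and the zero intrinsic delay of the fine line are hypothetical simplifications. It searches for the shared code that makes the eight cascaded blocks span one reference period.

```python
from typing import Tuple

COARSE_STEP_PS = 55.0            # one coarse step = 2 NAND delays (~55 ps)
NAND_PS = COARSE_STEP_PS / 2     # implied single-NAND delay (assumption)
INTRINSIC_PS = 4 * NAND_PS       # nest-lattice intrinsic delay: 4 NAND delays
FINE_STEP_PS = 6.0               # approximate fine-tune resolution
N_BLOCKS = 8
COARSE_CODES = 64                # hypothetical coarse code range


def block_delay_ps(coarse: int, fine: int) -> float:
    """Delay of one block; a larger fine code turns on fewer legs (slower)."""
    if not 0 <= fine <= 7:
        raise ValueError("fine code b[2:0] must be 0..7")
    return INTRINSIC_PS + coarse * COARSE_STEP_PS + fine * FINE_STEP_PS


def codes_for_period(t_ref_ps: float) -> Tuple[int, int]:
    """Shared (coarse, fine) code bringing 8 cascaded blocks closest to T_ref."""
    target = t_ref_ps / N_BLOCKS
    candidates = ((c, f) for c in range(COARSE_CODES) for f in range(8))
    return min(candidates, key=lambda cf: abs(block_delay_ps(*cf) - target))


coarse, fine = codes_for_period(2000.0)              # 500 MHz reference
print(coarse, fine, block_delay_ps(coarse, fine))    # 2 5 250.0
print(N_BLOCKS * COARSE_STEP_PS)                     # 440.0 ps per coarse LSB
```

For a 2 ns (500 MHz) reference, each block must contribute 250 ps. The second printout shows why fine tuning matters: in this model one coarse LSB moves the eighth phase by 440 ps.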

#### 3.4.1.3 Binary-to-thermometer decoder

The binary-to-thermometer decoder is adopted in the delay block because the thermometer code provides monotonic behavior: for example, the current-starved inverters turn on more gates when the control word decreases. Only one bit of the thermometer code changes between two adjacent binary numbers, so the thermometer scheme also reduces glitches compared with the binary scheme. Fig. 3.11(a) shows an N-bit binary-to-thermometer decoder architecture. Fig. 3.11(b) illustrates the 2-bit binary-to-thermometer decoder structure; to make each signal path as equal as possible, a NAND gate is used as an inverter. Fig. 3.11(c) presents a 3-bit binary-to-thermometer decoder, which reuses the 2-bit decoder. This kind of decoder therefore follows a simple recursive rule, which generalizes to the N-bit decoder shown in Fig. 3.11(a).



Figure 3.11. (a)N-bit binary-to-thermometer decoder. (b) 2-bit binary-to-thermometer decoder. (c) 3-bit binary-to-thermometer decoder.
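The recursive rule behind Fig. 3.11 can be expressed at logic level. In this Python sketch, an N-bit decoder is built from an (N-1)-bit decoder on the lower bits: the lower half of the outputs is ORed with the MSB, the middle output is the MSB itself, and the upper half is ANDed with the MSB. This is a standard construction offered as an illustration; the NAND-based gate-level circuit in Fig. 3.11 may realize it differently.

```python
from typing import List


def thermometer(bits: List[int]) -> List[int]:
    """Recursive N-bit binary-to-thermometer decoder (bits given MSB first).

    Output t[i] = 1 iff value > i, for i = 0 .. 2**N - 2.
      lower half: msb OR  t'   (value >= 2**(N-1) whenever msb = 1)
      middle bit: msb
      upper half: msb AND t'
    where t' is the (N-1)-bit decoder output for the lower bits.
    """
    if len(bits) == 1:
        return [bits[0]]
    msb, lower = bits[0], thermometer(bits[1:])
    return [msb | t for t in lower] + [msb] + [msb & t for t in lower]


def to_bits(value: int, n: int) -> List[int]:
    """n-bit binary representation of value, MSB first."""
    return [(value >> k) & 1 for k in reversed(range(n))]


for v in range(8):
    print(v, thermometer(to_bits(v, 3)))
```

For example, `thermometer(to_bits(5, 3))` yields `[1, 1, 1, 1, 1, 0, 0]`, and stepping the binary input by one always changes exactly one output bit.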

#### 3.4.2 Phase detector

The phase detector consists of two D flip-flops, two XOR gates, and a delay cell, as shown in Fig. 3.12(a). The eighth phase clock is fed back into the phase detector as the clock of the first D flip-flop, and the *CLK\_ref* signal is used as the data input of both D flip-flops. The eighth phase clock delayed by two NAND gate delays serves as the clock of the second D flip-flop; these two NAND gate delays form a detection window. The two output signals of the phase detector are *Comp* and *Lock*. Fig. 3.12(b) shows how the output signals are determined. If the *CLK\_ref* edge is located in the detection window, the *Lock* signal is pulled high. When the *CLK\_ref* edge appears after the detection window, the delay blocks should provide a longer delay in the next clock cycle, and vice versa. The truth table of the phase detector is shown in Table 3.1.



Figure 3.12. (a) Phase detector circuit block. (b) Operation diagram

| Q1 | Q2 | Comp | Lock |
|----|----|------|------|
| 0  | 0  | 1    | 0    |
| 1  | 1  | 0    | 0    |
| 0  | 1  | X    | 1    |

Table 3.1. The truth table of the phase detector.
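Table 3.1 can be checked with a behavioral model of the two flip-flops. The Python sketch below assumes an ideal 50% duty *CLK\_ref* with rising edges at multiples of the period, and a detection window of 0.055 ns (two NAND delays at the roughly 27.5 ps per gate implied by the coarse step); `None` stands for the don't-care entry in the table.

```python
from typing import Optional, Tuple


def clk_ref_level(t: float, period: float) -> int:
    """Ideal 50% duty CLK_ref with rising edges at multiples of period."""
    return 1 if (t % period) < period / 2 else 0


def phase_detect(p8_edge: float, period: float,
                 window: float) -> Tuple[Optional[int], int]:
    """Behavioral model of the PD in Table 3.1; returns (Comp, Lock)."""
    q1 = clk_ref_level(p8_edge, period)            # DFF1 clocked by P8
    q2 = clk_ref_level(p8_edge + window, period)   # DFF2 clocked by delayed P8
    table = {(0, 0): (1, 0),      # reference edge after the window: add delay
             (1, 1): (0, 0),      # reference edge before the window: remove delay
             (0, 1): (None, 1)}   # reference edge inside the window: locked
    if (q1, q2) not in table:
        raise ValueError("Q1=1, Q2=0 is not covered by Table 3.1")
    return table[(q1, q2)]


for p8 in (5.90, 5.97, 6.20):     # P8 edge times (ns) for a 6 ns reference
    print(p8, phase_detect(p8, 6.0, 0.055))
```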

#### 3.4.3 Delay block controller

Digital DLLs fall into four categories. The first type is the register-controlled DLL (RDLL) [3.8], in which an n-bit shift register controlled by the phase detector output generates the control signals for the digitally controlled delay line. When the operating range is increased, additional delay stages must be added to the delay line, which increases the chip area. Because the control mechanism moves one stage at a time, more delay stages require more shift register bits, which also increases the locking time; in the worst case, an n-bit shift register needs n/2 locking cycles. The second type is the counter-controlled DLL (CDLL) [3.9]. Its operating principle is similar to that of the RDLL except that an up/down counter replaces the shift register to control the delay line, and the n-bit control word determines whether the input signal passes through or bypasses each delay path. The main difference between the RDLL and the CDLL is the area requirement; in the worst case, with an n-bit binary-weighted delay line, the locking time remains n/2 locking cycles. The third type is the time measurement DLL (TMDLL) [3.10], which measures the input clock period, converts it to a digital control word within two clock cycles, and transfers the word to the control block. The TMDLL searches quickly but has an area overhead. The final type is the successive approximation register-controlled DLL (SARDLL) [3.11], which replaces the search mechanism with a binary search algorithm combined with a binary-weighted delay line. This not only reduces the chip area but also shortens the locking time. The SAR controller determines each bit of the control word sequentially and irreversibly. Therefore, the SARDLL becomes an open-loop circuit after lock-in and cannot compensate for PVT variations.

The delay block controller has two modes, SAR mode and counter mode, as shown in Fig. 3.13. The SARDLL architecture provides a short lock-in time and low hardware complexity. The SAR mode is shown in Fig. 3.14(a), and Fig. 3.14(b) illustrates the SAR controller circuit block. In the proposed architecture, the SAR has an extra control signal, *active*, which comes from the anti-harmonic detection block; the SAR procedure operates normally only when *active* is high. The *CLK\_ref* signal is divided by 4 to produce the *clk\_sar* signal. Table 3.2 presents the truth table of the SAR controller. However, the SAR mode cannot track environmental variations after reaching the lock state, so a counter mode [3.12] is adopted in the delay block controller. The counter continues tracking by adding or subtracting 1 from the delay block control code at a time, so the whole clock generator always operates in closed loop.



Figure 3.13. Dual modes of the delay block controller: SAR mode and counter mode.



| active | enable | bit<sub>k</sub> | Operation         |
|--------|--------|------|-------------------|
| 1      | 1      | X    | Memorization (k)  |
| 1      | 0      | 1    | Load comp (a)     |
| 1      | 0      | 0    | Shift right (k+1) |
| 0      | X      | X    | Shift right (k+1) |

Table 3.2. The truth table of the SAR controller.
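The interplay of the two modes can be illustrated with a toy model. In this Python sketch, the phase detector and delay line are replaced by a hypothetical linear model `base + step * code` (arbitrary time units) with a lock window just below the reference period; `sar_mode` resolves one bit per iteration, MSB first, and only runs when *active* is high, while `counter_mode` applies the plus-or-minus-1 correction.

```python
from typing import Callable, Optional


def make_pd(t_ref: float, window: float, base: float,
            step: float) -> Callable[[int], int]:
    """Hypothetical PD + delay-line model: +1 need more delay, -1 less, 0 lock.

    The eighth-phase delay is base + step * code; lock means the reference
    edge at t_ref falls inside the window [delay, delay + window].
    """
    def pd(code: int) -> int:
        delay = base + step * code
        if delay < t_ref - window:
            return +1            # P8 early: reference edge after the window
        if delay > t_ref:
            return -1            # P8 late: reference edge before the window
        return 0                 # reference edge inside the window
    return pd


def sar_mode(pd: Callable[[int], int], n_bits: int,
             active: bool) -> Optional[int]:
    """Binary search, MSB first, one bit per clk_sar cycle; gated by active."""
    if not active:               # anti-harmonic check not yet passed
        return None
    code = 0
    for k in reversed(range(n_bits)):
        trial = code | (1 << k)
        if pd(trial) >= 0:       # delay not yet too long: keep this bit
            code = trial
    return code


def counter_mode(code: int, pd: Callable[[int], int], n_bits: int) -> int:
    """One counter-mode update: add or subtract 1 toward lock, then saturate."""
    return min(max(code + pd(code), 0), (1 << n_bits) - 1)


pd_nominal = make_pd(t_ref=1000.0, window=10.0, base=100.0, step=10.0)
code = sar_mode(pd_nominal, n_bits=7, active=True)
print(code)                                   # 90

pd_slow = make_pd(t_ref=1000.0, window=10.0, base=100.0, step=10.5)  # drift
updates = 0
while pd_slow(code) != 0:
    code = counter_mode(code, pd_slow, n_bits=7)
    updates += 1
print(code, updates)                          # 85 5
```

After the SAR search locks at code 90, a drift that slows each step by 5% pushes the eighth phase out of the window, and the counter mode recovers lock at code 85 after five single-step updates without restarting the search.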

#### 3.4.4 Harmonic detection

The SAR scheme encounters a harmonic problem when applied over a wide operating range. If the eighth phase (P8) is located in the detection window from $0.5T_{ref}$ to $1.5T_{ref}$, the phase detector sends correct information to the delay block controller, and the SAR scheme operates as a normal binary search. Fig. 3.15(a) shows the operation of the modified SAR controller, which has an extra signal, *over*. In the proposed multiphase clock architecture, the input clock and the final output clock are compared to generate information for the delay block controller. If the final phase clock is located in the detection window, the phase detector operates normally; otherwise, the harmonic problem occurs. Therefore, the first three phase clocks (P1, P2, and P3) are used to judge whether the eighth phase clock is in the detection window. The three phases clock three D flip-flops, and the *CLK\_ref* signal is the data input of the D flip-flops. The *active* signal is 1 only when all three D flip-flop outputs are 1. Fig. 3.16 shows the anti-harmonic detection circuit block, and Table 3.3 shows its truth table. Because of this strict condition (all three outputs must be 1), the design complexity is reduced significantly.



Figure 3.16. Anti-harmonic detection circuit block.

| Q2 Q3 \ Q1 | L    | H      |
|------------|------|--------|
| LL         | Over | Over   |
| LH         | Over | Over   |
| HL         | Over | Over   |
| HH         | Over | Active |

Table 3.3. The truth table of the anti-harmonic detection block
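The anti-harmonic condition can be modeled in the same behavioral style. This sketch assumes an ideal 50% duty *CLK\_ref* and equal stage delays d; under these assumptions, requiring P1 to P3 to sample *CLK\_ref* high means 3d < T<sub>ref</sub>/2, so the eighth phase lands before about 1.33 T<sub>ref</sub>, which rules out the two-period harmonic lock.

```python
def clk_ref_level(t: float, period: float) -> int:
    """Ideal 50% duty CLK_ref with rising edges at multiples of period."""
    return 1 if (t % period) < period / 2 else 0


def anti_harmonic(stage_delay: float, period: float) -> str:
    """Return "Active" only if P1, P2 and P3 all sample CLK_ref high.

    P_k is CLK_ref delayed by k * stage_delay, so the rising edge of P_k
    samples CLK_ref k * stage_delay after a CLK_ref rising edge.
    """
    q1, q2, q3 = (clk_ref_level(k * stage_delay, period) for k in (1, 2, 3))
    return "Active" if (q1 and q2 and q3) else "Over"


T = 8.0
print(anti_harmonic(T / 8, T))   # Active: eight phases span one period
print(anti_harmonic(T / 4, T))   # Over: eight phases would span two periods
```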

### 3.5 Duty cycle corrector with a PVT detection

#### 3.5.1 System architecture

Fig. 3.17 shows the proposed architecture of the duty cycle corrector (DCC) with PVT detection. It provides two functions: clock synchronization and duty cycle correction. Clock synchronization employs the SARDLL architecture, and there are three extra circuit blocks: a half delay line, an edge combiner, and a PVT detection block. The edge combiner acts as an SR latch: the set (S) signal comes from the output of the main delay line, and the reset (R) signal comes from the output of the half delay line. Because the R signal is delayed by half a clock period relative to the S signal, CLK<sub>OUT</sub> has a 50% duty cycle. The PVT detection block makes the half delay line provide a more accurate half-period delay under PVT variations. The whole operation has three steps. After the reset signal goes low, the first step is the SAR control step, in which the DCC output is fed back to the phase detector along with CLK<sub>IN</sub> so that the output clock is aligned with the input clock. When the output clock is locked, the DCC enters the Lock step, followed by the PVT detection step. In this step, the PVT detection circuit block sets the half delay line so that the R signal is delayed by half a clock period. Finally, the DCC provides a 50% duty cycle output signal. Fig. 3.18 shows the finite state machine of the proposed DCC.



Figure 3.17. The proposed duty cycle corrector with a PVT detection.
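The edge combiner's role can be seen in a small event-level model. The Python sketch below treats the combiner as an edge-triggered SR latch (an idealization) and uses illustrative delays; only rising edges of CLK<sub>IN</sub> reach S and R, so the input duty cycle does not appear anywhere in the computation.

```python
from typing import List, Tuple


def sr_latch_pulses(s_edges: List[float],
                    r_edges: List[float]) -> List[Tuple[float, float]]:
    """Edge-triggered SR behavior: Q rises on an S edge, falls on an R edge."""
    events = sorted([(t, "S") for t in s_edges] + [(t, "R") for t in r_edges])
    q, rise, pulses = 0, 0.0, []
    for t, kind in events:
        if kind == "S" and q == 0:
            q, rise = 1, t
        elif kind == "R" and q == 1:
            q = 0
            pulses.append((rise, t))
    return pulses


def duty(pulses: List[Tuple[float, float]], period: float) -> float:
    """Average high time divided by the period."""
    return sum(f - r for r, f in pulses) / (len(pulses) * period)


def dcc_output_duty(period: float, main_delay: float, half_delay: float,
                    cycles: int = 4) -> float:
    # S: CLK_IN rising edges after the main delay line.
    # R: the same edges after the additional half delay line.
    s = [k * period + main_delay for k in range(cycles)]
    r = [t + half_delay for t in s]
    return duty(sr_latch_pulses(s, r), period)


print(dcc_output_duty(6.0, 1.0, 3.0))             # 0.5
print(round(dcc_output_duty(6.0, 1.0, 2.7), 3))   # 0.45
```

With a 6 ns period, a 3 ns half delay yields 50%, whereas a half delay that is 10% short because of PVT variation yields 45%, which is the kind of error the PVT detection is meant to remove.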



#### 3.5.2 PVT detection

In the conventional DCC architecture, the half delay line is also controlled by the SAR controller. In the proposed DCC architecture, a PVT detection circuit block makes the half delay line more accurate. The PVT detection circuit block consists of a PVT sensing circuit, a counter, and a decoder. The PVT sensing circuit uses a ring oscillator that can be switched on or off. When the clock generator is in the PVT detection state, the ring oscillator is switched on for one reference clock cycle, and the counter records the number of oscillation cycles.



Figure 3.19. PVT detection circuit block
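One way the counted value could be turned into a half-period setting is sketched below. This is a hypothetical decoder, not the one in Fig. 3.19: it assumes the ring oscillator period equals eight unit delays of the half delay line (`RO_UNITS`) and that both track PVT identically, so the count expresses T<sub>ref</sub> in local delay units.

```python
import math

RO_UNITS = 8   # hypothetical: RO period = 8 unit delays of the half delay line


def ro_count(t_ref_ps: float, unit_ps: float) -> int:
    """Ring-oscillator cycles counted while enabled for one reference period."""
    return math.floor(t_ref_ps / (RO_UNITS * unit_ps))


def half_delay_code(count: int) -> int:
    """Hypothetical decoder: half the measured period, in unit delays."""
    return count * RO_UNITS // 2


def half_delay_ps(t_ref_ps: float, unit_ps: float) -> float:
    """Resulting half-line delay once the decoded code is applied."""
    return half_delay_code(ro_count(t_ref_ps, unit_ps)) * unit_ps


# Same 6 ns reference at three hypothetical corners (fast / typical / slow)
for unit in (25.0, 30.0, 40.0):
    print(unit, half_delay_ps(6000.0, unit))
```

The decoded delay stays at or near 3000 ps across the three hypothetical corners (2880 ps at the slowest, because the count is floored), whereas a code frozen at its typical-corner value of 100 would give 4000 ps, about a 67% duty cycle, at the slow corner.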

#### 3.5.3 Performance summary

With the PVT detection adopted in the proposed duty cycle corrector, the output duty cycle error is reduced by up to 17%, as shown in Fig. 3.20. Table 3.4 lists the performance comparison. This work provides a correction range from 25% to 75%, and the proposed duty cycle corrector is suitable for near-threshold operation.



Figure 3.20. Output duty cycle error comparison.

|                       | [3.13]            | [3.14]            | This work         |
|-----------------------|-------------------|-------------------|-------------------|
| Technology            | 0.18 µm           | 0.18 µm           | 65 nm             |
| Supply voltage        | 1.8 V             | 1.8 V             | 0.5 V             |
| Frequency range       | 0.8 GHz ~ 1.2 GHz | 440 MHz ~ 1.5 GHz | 100 MHz ~ 500 MHz |
| Correction range      | 40% ~ 60%         | 30% ~ 70%         | 25% ~ 75%         |
| Locking time (cycles) | < 10              | 15                | 32                |
| Power                 | 12 mW             | 43 mW             | 26.30 µW          |

Table 3.4. Performance summary of the proposed duty cycle corrector.

### 3.6 Conclusion

A wide-range DLL-based multiphase clock generator is proposed, with an operating range from 80 MHz to 500 MHz. With the proposed anti-harmonic detection circuit, the multiphase clock generator is free from the harmonic problem. The delay block controller has dual modes, SAR mode and counter mode: the SAR mode accelerates locking, and the counter mode keeps the generator tracking environmental variations after the SAR search is finished. When the supply voltage and input frequency are 1.0 V and 500 MHz, the power consumption is 0.29 mW. With the proposed duty cycle corrector, the clock signal has a 50% duty cycle. The proposed duty cycle corrector can operate at supply voltages as low as 0.5 V, with a correction range from 25% to 75% and an operating frequency range from 100 MHz to 500 MHz. With the PVT detection, the output duty cycle error is reduced by up to 17%. When the supply voltage and input frequency are 0.5 V and 167 MHz, the power consumption is 26.30 µW.

## **Chapter 4**

# An Energy-Efficient Level Converter with High Thermal Variation Immunity for Sub-threshold to Super-threshold Operation

A multiple supply voltage scheme is an emerging approach to reduce power dissipation. The scheme requires level converters as bridges between different voltage domains. Conventional level converters fail to work in the sub-threshold region because the pull-down devices operate in the sub-threshold region while the pull-up devices operate in the super-threshold region. By employing diode-connected PMOS transistors, multiple-threshold-voltage CMOS (MTCMOS), and stack leakage reduction techniques, the proposed cross-coupled level converter achieves small propagation delay, low power consumption, and the best power-delay-product (PDP) performance. In addition, the reverse short channel effect is utilized to give the level converter better process and thermal variation immunity. We also propose a dual-edge-triggered explicit-pulsed level-converting flip-flop (LCFF) that combines a DCVSPG latch with our level converter. The proposed cross-coupled level converter is designed in TSMC 65nm bulk CMOS technology. It functions correctly across all process corners over a wide input voltage range, from 150 mV to 1 V. At an input voltage of 150 mV, the level converter has a propagation delay of 52 ns and a power dissipation of 21 nW.

In this chapter, we propose a PDP-optimized and robust level converter with high thermal variation immunity for sub-threshold to super-threshold operation. This chapter is organized as follows. An introduction is given in Section 4.1. Section 4.2 discusses the diode-connected PMOS transistors, multiple-threshold-voltage CMOS, and stack leakage reduction techniques, and analyzes the reverse short channel effect, sub-threshold device sizing, and inner inverter device sizing. The simulation results of this work in TSMC 65nm CMOS technology are presented in Section 4.3. Finally, Section 4.4 concludes this chapter.

### 4.1 Introduction

Power dissipation has become a critical concern in emerging portable applications such as biological systems and wireless electronics. Constrained by a small form factor, battery lifetime is a critical challenge. Ultra-low voltage design has proven to be an effective solution because dynamic energy is a quadratic function of the supply voltage. However, the side effect of scaling down the supply voltage is degraded performance and robustness. Multiple supply voltage techniques have been presented for low-power design [4.1]: some parts of a digital system use a nominal supply voltage to meet performance needs, while other parts operate in the sub-threshold region to save power. Such multiple-voltage designs can run different blocks at different supply voltages to perform dynamic voltage and frequency scaling (DVFS) on different voltage domains. Between two voltage domains, a gate on a lower supply voltage may drive a gate on a higher supply voltage. When the high output of the lower-supply gate is not high enough to fully turn off a PMOS transistor supplied by the higher voltage, a DC leakage path forms from the supply to ground and increases power dissipation. In addition, the higher-supply gate driven by a lower-supply gate may not achieve a full output swing, causing a functional error. To solve these problems, a level converter must be inserted at the interface between two voltage domains. Nonetheless, the level converter itself consumes power and adds considerable delay, so it is crucial to design a high-speed and energy-efficient level converter for multiple supply voltage systems.
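As a worked illustration of the quadratic dependence, consider only the dynamic switching energy, written with an activity factor $\alpha$ and a switched capacitance $C$ (standard symbols introduced here for the example); leakage energy, which grows with the longer delay at low voltage, is ignored:

$$E_{dyn} = \alpha C V_{DD}^{2}, \qquad \frac{E_{dyn}(V_{DD}=0.5\,\text{V})}{E_{dyn}(V_{DD}=1.0\,\text{V})} = \left(\frac{0.5}{1.0}\right)^{2} = 0.25$$

Halving the supply voltage therefore cuts the switching energy per operation by about 4X, which is why non-critical blocks are moved to a lower voltage domain.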

Fig. 4.1(a) shows a conventional level converter, in which two cross-coupled PMOS transistors form a positive feedback loop to provide a full output swing. However, the cross-coupled level converter suffers from a driving strength imbalance that can prevent the positive feedback from being triggered. When a signal is converted from sub-threshold to super-threshold, some transistors of the level converter operate in the sub-threshold region while the others operate in the super-threshold region. A Monte Carlo simulation of the conduction current in 65nm CMOS technology is shown in Fig. 4.1(b). It shows that the driving ability of a super-threshold PMOS is much larger than that of a sub-threshold NMOS, and this difference leads to level converter failure.



Figure 4.1. (a) Conventional cross-coupled level converter. (b) Monte Carlo simulation of conduction current.

Recently, many level converters [4.2]-[4.10] have been designed to operate in the sub-threshold region. In [4.2], the cross-coupled PMOS transistors are connected through two diode-connected PMOS transistors, which reduce the driving ability of the pull-up devices to enable sub-threshold operation. However, [4.2] suffers from a short-circuit current problem that results in large power dissipation, as shown in Fig. 4.2. A level converter with a short-current limiting technique was presented in [4.3]; it inserts two NMOS transistors in the positive feedback loop between the latch PMOS transistors and the PMOS diodes, as shown in Fig. 4.3. The two NMOS transistors speed up the transition to avoid a short-circuit current path, but their reliability is susceptible to variations. In [4.4], two conventional cross-coupled level converters are cascaded to prevent the short-circuit current path, as shown in Fig. 4.4. Nevertheless, the cascaded architecture results in slow propagation at higher supply voltages.



Figure 4.4. A level converter with two cascaded cross-coupled level converters [4.4].

### 4.2 Proposed Energy-Efficient Level Converter with High Thermal Variation Immunity

The schematic of the proposed level converter is shown in Fig. 4.5. It is based on the cross-coupled level converter and adopts the two diode-connected PMOS transistors of [4.5]. The multiple-threshold-voltage CMOS design is also employed, and a stack leakage reduction technique is used to reduce power consumption. The reverse short channel effect is also exploited to make the proposed level converter more reliable and robust across all process corners and temperature variations.



Figure 4.5. Schematic view of the proposed level converter

#### 4.2.1 Diode-Connected PMOS Transistors

From Fig. 4.1(b), the conventional level converter has an imbalanced driving ability when converting a signal from the sub-threshold region to the super-threshold region. In TSMC 65nm CMOS technology, the sizing ratio of the pull-down device (NMOS) to the pull-up device (PMOS) must be larger than 200X for the conventional level converter to barely operate at an input supply voltage of 200 mV, and the resulting NMOS width causes a huge area overhead. In Fig. 4.5, the proposed level converter uses the diode-connected PMOS transistors MP<sub>3</sub> and MP<sub>4</sub> as current limiters to reduce the pull-up driving ability [4.5]. The two PMOS diodes maintain their initial value during the transition; this initial value equals a small diode voltage drop and limits the PMOS strength. As a result, the pull-down devices MN<sub>1</sub> and MN<sub>2</sub> can sink the current I<sub>1</sub> even when the circuit operates in the sub-threshold region. Comparing Fig. 4.1(b) with Fig. 4.6, the PMOS conduction current decreases dramatically and approaches the NMOS conduction current, so the two have comparable driving ability. With the two PMOS diodes, the modified cross-coupled level converter overcomes the conduction current imbalance at low voltage, and the proposed level converter can convert signals from the sub-threshold region to the super-threshold region.



Figure 4.6. Monte Carlo simulation of conduction current of two diode-connected PMOS transistors
#### 4.2.2 Multi-threshold-voltage CMOS (MTCMOS)

Modern technologies usually provide MTCMOS options. Low-threshold-voltage (LVT) devices are fast but suffer from severe leakage current, whereas high-threshold-voltage (HVT) devices have less leakage current but sacrifice speed. Because of this trade-off between propagation delay and power consumption, the power-delay product (PDP) is used as the figure of merit for level converter analysis. To further weaken the PMOS strength, MP<sub>1</sub>, MP<sub>2</sub>, MP<sub>3</sub>, and MP<sub>4</sub> use HVT devices. To enhance the NMOS strength, LVT devices are considered for MN<sub>1</sub>, MN<sub>2</sub>, MN<sub>3</sub>, and MN<sub>4</sub>. However, if all the NMOS transistors used LVT devices, the power consumption would increase significantly. In the proposed level converter, only MN<sub>3</sub> and MN<sub>4</sub> use LVT devices. This configuration speeds up the high-to-low output transition and improves the total propagation delay. Fig. 4.7 shows that the pull-up current distribution shifts to the left and becomes narrower.



Figure 4.7. Monte Carlo simulation of using HVT devices for pull up PMOS transistors
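Because PDP is the figure of merit used throughout this chapter, a small helper is handy for evaluating it. The Python sketch below computes PDP from the operating point reported at a 150 mV input (52 ns, 21 nW) and provides a hypothetical `best_by_pdp` helper that picks the minimum-PDP entry from a sizing sweep such as those in Figs. 4.9 to 4.11.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]   # (design parameter, delay_s, power_w)


def pdp(delay_s: float, power_w: float) -> float:
    """Power-delay product in joules (energy per transition)."""
    return delay_s * power_w


def best_by_pdp(sweep: Sequence[Point]) -> Point:
    """Pick the sweep point with the smallest power-delay product."""
    return min(sweep, key=lambda p: pdp(p[1], p[2]))


# Reported operating point of the proposed converter at a 150 mV input
print(f"{pdp(52e-9, 21e-9) * 1e15:.3f} fJ")   # 1.092 fJ
```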

#### 4.2.3 Stack Leakage Reduction Technique

Leakage current causes static power consumption, and with technology scaling this problem becomes severe in LVT logic blocks. In Fig. 4.8(a), the LVT block is connected to ground through an HVT NMOS transistor [4.11]. When the LVT block is in sleep mode (i.e., its function is turned off), the HVT transistor is also turned off by a sleep signal to cut the leakage path to ground. The Monte Carlo simulation in Fig. 4.8(b) shows that the leakage current is reduced significantly with this leakage reduction technique. The technique is adopted in the proposed level converter: MN<sub>3</sub> and MN<sub>4</sub> are LVT devices connected in series with the HVT NMOS transistors MN<sub>5</sub> and MN<sub>6</sub>, as shown in Fig. 4.5. The *Vout* and *Voutb* signals are fed back to control MN<sub>5</sub> and MN<sub>6</sub> adaptively. When the input is "1", MN<sub>3</sub> is in active mode and MN<sub>4</sub> is in sleep mode: *Vout* is charged by MP<sub>2</sub> and turns on MN<sub>5</sub>, so the left branch works as usual, while *Voutb* is discharged by MN<sub>1</sub> and turns off MN<sub>6</sub>, putting the right branch in sleep mode. When the input is "0", *Vout* turns off MN<sub>5</sub> and *Voutb* turns on MN<sub>6</sub>, so the left branch is in sleep mode and the right branch works as usual. In both cases, no leakage path exists.



Figure 4.8. (a) Leakage reduction technique [4.11]. (b) Monte Carlo simulation of leakage current with/without leakage reduction technique

#### 4.2.4 Reverse Short Channel Effect [4.13]

In super-threshold operation, the minimum channel length is typically selected for optimal speed and power because the short channel effect dominates. The scenario is different in the sub-threshold region: because drain-induced barrier lowering (DIBL) is significantly reduced, the reverse short channel effect becomes a major factor. Due to the reverse short channel effect, as the channel length increases, the threshold voltage decreases monotonically and the conduction current increases exponentially. Thus, the best PMOS channel length for the proposed level converter is not the minimum length. Fig. 4.9(a) shows that the propagation delay increases with channel length. Based on the simulation data, the optimal channel length in this work is 85 nm, as shown in Fig. 4.9(b). When the channel length exceeds the optimal length, the reverse short channel effect becomes weak.



Figure 4.9. Short channel effect. (a) Delay and power simulation (b) PDP value simulation

### 4.2.5 Sub-threshold device sizing

Besides reducing the pull-up driving ability, enhancing the pull-down driving current is another way to solve the current imbalance problem. The pull-down devices operate in the sub-threshold region, where the drain current scales linearly with transistor width, so increasing the width of the NMOS transistors strengthens the pull-down. As Fig. 4.10(a) shows, the propagation delay decreases as the NMOS width increases. However, this also increases the power consumption. Fig. 4.10(b) shows that the optimal point is at a width of 160nm. Compared with the other techniques, sub-threshold device sizing yields only a small performance improvement.
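
Finding the optimal width is a PDP minimization over a sizing sweep. The sketch below uses a hypothetical first-order model that is not fitted to the thesis data: delay has a fixed part plus a part inversely proportional to the NMOS width (sub-threshold drive scales linearly with W), and power has a fixed part plus a part that grows linearly with W. With these made-up coefficients the optimum lands at 200nm rather than the simulated 160nm. The same procedure applies to the inner inverter sizing in Section 4.2.6.

```python
import math

# Hypothetical model coefficients (illustrative only, not fitted to Fig. 4.10).
D0, K = 20.0, 4000.0   # delay(W) = D0 + K / W   [ns, W in nm]
P0, C = 10.0, 0.05     # power(W) = P0 + C * W   [nW, W in nm]


def delay_ns(width_nm: float) -> float:
    """Fixed delay plus a term inversely proportional to the pull-down width."""
    return D0 + K / width_nm


def power_nw(width_nm: float) -> float:
    """Fixed power plus a term that grows linearly with the pull-down width."""
    return P0 + C * width_nm


def pdp_fj(width_nm: float) -> float:
    """Power-delay product; ns * nW = 1e-18 J = 1e-3 fJ."""
    return delay_ns(width_nm) * power_nw(width_nm) * 1e-3


def best_width(widths) -> float:
    """Width from the sweep that minimizes the PDP."""
    return min(widths, key=pdp_fj)


sweep = range(100, 401, 10)                 # 100 nm to 400 nm in 10 nm steps
w_grid = best_width(sweep)
w_analytic = math.sqrt(K * P0 / (D0 * C))   # from d(PDP)/dW = 0
print(w_grid, w_analytic, round(pdp_fj(w_grid), 3))
```

For this model the optimum also follows analytically from $d(\text{PDP})/dW = 0$, giving $W^{*} = \sqrt{K P_0/(D_0 C)}$, which the grid sweep reproduces.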




Figure 4.10. Sub-threshold device sizing. (a) Delay and power simulation (b) PDP value simulation

### 4.2.6 Inner inverter device sizing

The proposed level converter contains an inner inverter that provides the differential input. Its supply voltage is the same as the input signal voltage level, so the inner inverter also operates in the sub-threshold region and contributes to the propagation delay. As shown in Fig. 4.5, $MN_2$ and $MN_4$ are controlled by the output of the inner inverter, so the positive feedback loop cannot be triggered until $MN_2$ and $MN_4$ have settled. A wider inner inverter settles faster, but the higher speed comes with larger power consumption, as shown in Fig. 4.11. There is therefore a trade-off between delay and power; Fig. 4.11(b) shows that the optimal point occurs at an inner inverter width of 400nm.




Figure 4.11. Inner inverter sizing. (a) Delay and power simulation. (b) PDP value simulation

### **4.2.7** Proposed level converter performance

Combining the techniques above reduces the overall PDP value by up to 23%, as shown in Fig. 4.12(c). By adding two diode-connected PMOS transistors, the conventional cross-coupled level converter can successfully convert signals from the sub-threshold region to the super-threshold region. Using multi-$V_{th}$ devices improves the propagation delay by up to 22%, as shown in Fig. 4.12(a). The leakage reduction technique compensates for the LVT leakage problem and reduces the power by up to 26%, as shown in Fig. 4.12(b). Exploiting the reverse short channel effect reduces the PDP value by up to approximately 17%. Together, these techniques make the proposed level converter more robust and reliable.






Figure 4.12. Performance comparison. A: two diode-connected PMOS. B: multi-V<sub>th</sub> devices. C: leakage reduction technique. D: reverse short channel effect. E: inner inverter sizing. (a) With B, delay is reduced by up to 22%. (b) With C, power is reduced by up to 26%. (c) With D, PDP is reduced by up to 17%; combining all techniques reduces the overall PDP by up to 23%.

### 4.3 Simulation Results

For comparison, we implemented the following three level converters: the conventional cross-coupled type, the short current reduction type in [4.3], and the cascaded type with two cross-coupled stages in [4.4]. Iso-area analysis is used for a fair comparison.

### 4.3.1 Minimum input voltage

Compared with the other three level converters, the proposed level converter achieves a minimum input voltage below the 200mV target, as shown in Fig. 4.13. The minimum input voltage is defined as the lowest input voltage at which the level converter functions correctly at all five process corners. We set VDDH to 1.0V at room temperature and swept the input voltage from 100mV to 1.0V at the five process corners. In a sub-threshold level converter, the NMOS transistor must overpower the corresponding PMOS transistor for the output to switch. Therefore, the worst case is the slow NMOS-slow PMOS corner, where the minimum input voltage is 150mV. The typical NMOS-typical PMOS and fast NMOS-fast PMOS corners are the best cases, with a minimum input voltage as low as 100mV. The simulation shows that the proposed level converter has a wide operating range.
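
The minimum-input-voltage criterion (correct operation at all five process corners) can be expressed as a scan over pass/fail results. The sketch below walks down from the top of the sweep and stops at the first voltage where any corner fails, so it reports the bottom of the contiguous working range. The per-corner limits are hypothetical placeholders: SS at 150mV and TT/FF at 100mV echo the text, while the SF and FS values are invented for illustration.

```python
from typing import Dict, Iterable, Optional

# Hypothetical lowest working input voltage per corner (mV); SF/FS values are invented.
CORNER_LIMIT_MV = {"TT": 100, "FF": 100, "SS": 150, "SF": 125, "FS": 125}


def pass_table(sweep_mv: Iterable[int]) -> Dict[str, Dict[int, bool]]:
    """Pass/fail flags per corner and input voltage, built from the placeholder limits."""
    sweep = list(sweep_mv)
    return {corner: {v: v >= limit for v in sweep} for corner, limit in CORNER_LIMIT_MV.items()}


def min_input_voltage(passes: Dict[str, Dict[int, bool]], sweep_mv: Iterable[int]) -> Optional[int]:
    """Lowest swept voltage at which all corners pass, here and at every higher swept voltage."""
    lowest = None
    for v in sorted(sweep_mv, reverse=True):   # walk down from VDDH
        if all(results[v] for results in passes.values()):
            lowest = v
        else:
            break   # the first failing voltage ends the contiguous operating range
    return lowest


sweep = range(100, 1001, 25)   # 100 mV to 1.0 V in 25 mV steps
print(min_input_voltage(pass_table(sweep), sweep))
```

With these placeholder limits the worst corner (SS) sets the result, mirroring the reasoning in the text that the slowest corner determines the minimum input voltage.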



### 4.3.2 Propagation delay, Power, and PDP

The propagation delay comparison is shown in Fig. 4.14(a). Our design is slightly slower than the level converter in [4.3] when the supply voltage is higher than 0.5V. The level converter in [4.4] is slow because of its cascaded architecture. In the sub-threshold region, the proposed level converter achieves better delay performance. The power consumption comparison is shown in Fig. 4.14(b). The leakage reduction technique compensates for the LVT device leakage problem, and device sizing that exploits the reverse short channel effect further helps the proposed level converter consume less power. The PDP comparison is shown in Fig. 4.14(c). Our design achieves a small PDP value, especially at higher supply voltages, which makes the proposed level converter more energy-efficient.



Figure 4.14. Performance comparison between the proposed level converter and existing level converters. (a) Propagation delay comparison (b) Power comparison (c) PDP comparison

### 4.3.3 Monte Carlo Simulation

Fig. 4.15 presents 5000-point Monte Carlo simulations of the propagation delay, which show how process variations affect the level converter characteristics. Fig. 4.15(a) shows the sub-threshold case (200mV supply) and Fig. 4.15(b) the near-threshold case (500mV supply). In the near-threshold region, the proposed level converter has a normalized standard deviation $\sigma/\mu$ of 0.07, which is smaller than those of the other two level converters. Our design is therefore less sensitive to process variations and more robust than the level converters in [4.3] and [4.4].
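
The normalized spread $\sigma/\mu$ quoted above is the coefficient of variation of the Monte Carlo delay samples. The sketch below computes it with Python's `statistics` module. The 5000-point Gaussian population (mean 10 ns, standard deviation 0.7 ns) is a synthetic placeholder chosen only so that $\sigma/\mu$ lands near the reported 0.07; it is not data from Fig. 4.15.

```python
import random
import statistics
from typing import Sequence


def normalized_sigma(samples: Sequence[float]) -> float:
    """Coefficient of variation sigma/mu (population standard deviation over mean)."""
    mu = statistics.fmean(samples)
    return statistics.pstdev(samples) / mu


random.seed(0)
# Synthetic 5000-point delay population in ns; mean and spread are placeholders.
delays = [random.gauss(10.0, 0.7) for _ in range(5000)]
print(round(normalized_sigma(delays), 3))
```

Because $\sigma/\mu$ is dimensionless, it allows designs with different mean delays to be compared on the same scale.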





Figure 4.15. Monte Carlo simulation of propagation delay. (a) supply voltage is 200mV (b) supply voltage is 500mV

### 4.3.4 Temperature-induced delay variation

MOSFET mobility and threshold voltage both change with temperature, so the drain current depends on temperature. We swept the temperature from 0°C to 125°C and measured its effect on the propagation delay. For simplicity, the absolute difference in propagation delay over this temperature range is taken as the temperature-induced delay variation. Compared with the level converters in [4.3] and [4.4], Fig. 4.16 shows that the propagation delay of the proposed level converter is less affected by temperature; the temperature-induced delay variation is reduced by up to 99% at higher supply voltages. The proposed level converter is therefore more robust against process, voltage, and temperature (PVT) variations.
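
The temperature metric above and the percentage reduction against a reference design reduce to two short formulas. In the sketch below the delay values are hypothetical placeholders, and the variation is taken as the absolute delay difference between the coldest and hottest swept temperatures, which is one reading of the definition in the text. Taking the absolute value makes the metric indifferent to whether delay rises or falls with temperature.

```python
from typing import Dict


def temp_induced_variation(delay_by_temp_c: Dict[float, float]) -> float:
    """|delay(T_max) - delay(T_min)| over the swept temperature range."""
    t_min, t_max = min(delay_by_temp_c), max(delay_by_temp_c)
    return abs(delay_by_temp_c[t_max] - delay_by_temp_c[t_min])


def reduction_pct(ours: float, reference: float) -> float:
    """Percentage by which `ours` is smaller than `reference`."""
    return 100.0 * (1.0 - ours / reference)


# Hypothetical propagation delays (ns) at 0 °C, 25 °C and 125 °C.
proposed = {0.0: 5.00, 25.0: 5.02, 125.0: 5.05}
reference = {0.0: 6.00, 25.0: 5.60, 125.0: 4.00}

v_ours = temp_induced_variation(proposed)   # about 0.05 ns
v_ref = temp_induced_variation(reference)   # about 2.0 ns (delay falls with temperature here)
print(round(reduction_pct(v_ours, v_ref), 1))
```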



### 4.4 Conclusions

A power-delay-product-optimized and robust level converter is presented for converting signals from the sub-threshold region to the super-threshold region. By combining energy-efficient techniques, the PDP value of this work is decreased by 23%, and the temperature-induced variation in propagation delay is reduced by up to 99%. The performance summary of the proposed level converter is given in Table 4.1, together with comparisons with [4.2]-[4.4]. This work provides a wide operating range, from 150mV to 1V across five process corners. With a 150mV input, it achieves a propagation delay of 52ns while consuming only 21nW. It is suitable as the interface between two different voltage domains in emerging dynamic voltage frequency scaling wireless applications.



Figure 4.17 Layout view of the proposed level converter

|                       | [4.2]      | [4.3]       | [4.4]      | This work  |
|-----------------------|------------|-------------|------------|------------|
| Technology            | 0.18µm     | 0.18µm      | 130nm      | 65nm       |
| Propagation Delay     | 10µs@127mV | 6.3ns@400mV | 35ns@227mV | 52ns@150mV |
| Power Consumption     | 20µW       | 7.9µW       | N.A.       | 21nW       |
| VDDH                  | 1.8V       | 1.8V        | 1.2V       | 1.0V       |
| Minimum Input Voltage | 127mV      | 400mV       | 227mV      | 150mV      |
| Transistor Count      | 10         | 14          | 11         | 12         |

Table 4.1. Performance Summary and Comparisons

# **Chapter 5**

## **A PVT Robust Dual-Edged Triggered Explicit-Pulsed Level Converting Flip-Flop**

In a multiple supply voltage system, level converters are inserted between different voltage domains. However, these level converters add propagation delay and power consumption. To eliminate the overhead of level conversion, level converting flip-flops (LCFFs), which provide both a level converting function and a data latching function, have been adopted. A PVT robust dual-edged triggered explicit-pulsed level converting flip-flop (DETEP-LCFF) with a wide operation range is proposed. It is composed of a clock pulse generator and a modified differential cascode voltage switch with pass gate (DCVSPG) latch. The clock pulse generator provides a symmetric pulse triggering time and holding period, which helps shorten the D-Q delay. By employing diode-connected PMOS transistors and multiple-threshold devices in the DCVSPG latch, the proposed LCFF can operate from the near-threshold region to the super-threshold region. In addition, two NMOS transistors are stacked below the diode-connected PMOS transistors to prevent a sneak leakage current. The proposed DETEP-LCFF is designed using TSMC 65nm CMOS technology. It functions correctly across all process corners over a wide input voltage range, from 400mV to 1V. When the input voltage is 0.4V, the proposed LCFF has a minimum D-Q delay of 781ps, a setup time of -610ps, and a power dissipation of 2.3 µW.

Section 5.1 introduces LCFFs and discusses previous DETEP-LCFFs. The circuit implementation of the proposed DETEP-LCFF is described in Section 5.2. The simulation results and comparisons are given in Section 5.3. Finally, Section 5.4 concludes this work.

### 5.1 Introduction

With the increasing demand for mobile applications and portable biomedical systems, power dissipation has become a critical issue in modern IC design. Reducing the supply voltage is considered the most promising approach to energy saving because dynamic power dissipation depends quadratically on the supply voltage. However, lowering the supply voltage degrades performance, for example by incurring a large delay. Cluster voltage scaling techniques [5.1]-[5.2] have been proposed to reduce power consumption without sacrificing performance. A cluster voltage scaling scheme has two voltage domains: a low supply voltage is applied to noncritical paths to save power, and a high supply voltage is used on critical paths to meet performance demands. Recently, a similar philosophy has been applied to multiple supply voltage systems [5.3]-[5.4] and dynamic voltage and frequency scaling (DVFS) systems [5.5]. By providing different supply voltages to different circuit blocks, we can meet performance demands and save power simultaneously. However, when a gate in the low voltage domain (VDDL) drives a gate in the high voltage domain (VDDH), a high output from the VDDL gate cannot fully turn off a PMOS transistor in the VDDH domain, because the PMOS gate is driven only to VDDL while its source sits at VDDH. This may result in a functional error and a short current problem. Therefore, a level converter must be inserted between the two voltage domains, which adds propagation delay and power dissipation. Because a low voltage cluster is usually followed by pipeline flip-flops, this overhead can be removed by merging the level converter into the flip-flop. A flip-flop merged with a level converter is called a level converting flip-flop (LCFF). An LCFF performs latching and level conversion simultaneously: it takes a VDDL input (D) and clock signal (CLK) and provides a stored VDDH output (Q), as illustrated in Fig. 5.1.
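
The quadratic dependence mentioned above comes from the switching-power expression $P = \alpha C V_{DD}^2 f$. The sketch below evaluates it for an arbitrary illustrative load (activity 0.1, 1 pF switched capacitance, 50 MHz clock; all assumptions, not values from this work) to show that halving the supply cuts dynamic power by 4X at a fixed frequency.

```python
def dynamic_power(alpha: float, c_load: float, vdd: float, f_clk: float) -> float:
    """Switching power P = alpha * C * VDD^2 * f, in watts."""
    return alpha * c_load * vdd ** 2 * f_clk


# Illustrative values only: 10% activity, 1 pF switched capacitance, 50 MHz clock.
p_high = dynamic_power(0.1, 1e-12, 1.0, 50e6)   # VDD = 1.0 V
p_low = dynamic_power(0.1, 1e-12, 0.5, 50e6)    # VDD = 0.5 V
print(p_high, p_low, p_high / p_low)
```

In practice a lower VDD also lowers the achievable clock frequency, which is the delay penalty that motivates keeping a high supply on critical paths.
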
We ran a simulation to quantify the benefits of LCFFs. Fig. 5.2 shows the separate work, in which a standalone level converter is followed by a latch: the VDDL input D is first converted into the VDDH signal Dout, which is then captured while the pulse is high and stored in the latch. Fig. 5.3 shows the combination work, in which the level converter is embedded into the latch. The performance comparisons in 65nm CMOS technology are shown in Fig. 5.4. First, the combination work clearly uses fewer transistors. It also consumes less power: the power is reduced by approximately 68% when VDDL is 0.6V. The minimum D-Q delay of the combination work is slightly lower than that of the separate work, because the latch in the separate work performs no level conversion. Finally, the combination work has a smaller product of propagation delay and power consumption (PDP); the PDP value is decreased by up to 70% when VDDL is 0.6V, as shown in Fig. 5.4(c). According to these simulation results, the combination work, i.e., the LCFF, performs better.



Figure 5.1. A basic structure of a level converting flip-flop.



Figure 5.3. Combination work: a level converter is embedded into a latch



Figure 5.4. Performance comparisons between separate work and combination work. (a) Minimum D-Q delay comparison (b) power comparison (c) PDP value comparison

Many LCFF designs [5.8]-[5.16] have been proposed. However, those LCFFs are suitable only for super-threshold operation. At ultra-low voltage, they may encounter challenges such as imbalanced pull-up and pull-down strength, or a long transition time that results in a short current. In DVFS systems, a sub-threshold or near-threshold input signal may need to be converted into a VDDH output signal. Therefore, an LCFF that can operate in the ultra-low voltage region is crucial. This chapter focuses on the dual-edged triggered explicit-pulsed LCFF (DETEP-LCFF), because a DETEP-LCFF saves power without sacrificing operation speed. [5.8] proposed a DETEP-LCFF with a feedback signal, as shown in Fig. 5.5. The input data is sampled when *pulse* is high. Because the feedback signal Qb controls the charge/discharge path, there is no redundant internal power dissipation when the input data is unchanged. However, when the input signal voltage is lower than the output signal voltage, the input data D cannot fully turn off a PMOS transistor, which creates a short current path. [5.9] employed both high-threshold voltage (HVT) and low-threshold voltage (LVT) devices, as shown in Fig. 5.6. In its clock pulse generator, all transistors are LVT devices, which leads to a severe leakage current problem. In addition, a PMOS transistor in the clock generator is connected to ground, which causes a short current during data transitions. [5.10] used a self-precharged technique in the clock pulse generator to avoid a short current, as shown in Fig. 5.7. In particular, the clock pulse is level-shifted up in the clock pulse generator, and a high-voltage pulse signal leads to more power consumption. Moreover, in this architecture the clock pulse precharges the internal node every clock cycle, which results in redundant power consumption when the data remains unchanged.



Figure 5.6. Dual-edged triggered explicit-pulsed LCFF based on dual  $V_{th}$  [5.9]. (a) A clock pulse generator (b) LCFF employing dual  $V_{th}$ 



Figure 5.7. Dual-edged triggered explicit-pulsed LCFF with a self-precharged dynamic gate [5.10]. (a) A clock pulse generator with a dynamic self-precharged gate (b) LCFF

In this chapter, we propose a dual-edged triggered explicit-pulsed LCFF (DETEP-LCFF) that operates over a wide range, from the near-threshold region to the super-threshold region. It is composed of a 4T-XOR clock pulse generator and a modified DCVSPG latch. According to [5.6], a DCVSPG pulsed latch [5.7] performs better than other previous flip-flops. Therefore, we adopted a DCVSPG latch and combined it with the level converter proposed in Chapter 4 to provide a wide operation range.


### 5.2 A PVT robust dual-edged triggered explicit-pulsed LCFF with a wide operation range

The schematic of the proposed dual-edged triggered explicit-pulsed LCFF is shown in Fig. 5.8. The clock pulse generator is adapted from a 4T-XOR logic gate [5.11], as shown in Fig. 5.8(a). It provides a symmetric pulse triggering time and pulse holding period at both clock edges. Owing to this inherent property of the clock pulse generator, the proposed DETEP-LCFF has a negative setup time, which reduces the impact of clock skew and clock jitter. For the latch, a DCVSPG latch is adopted, as shown in Fig. 5.8(b). A conventional DCVSPG latch suffers from imbalanced current driving ability when operated in the near-threshold or sub-threshold region, and diode-connected PMOS transistors are used to solve this problem. Multiple-threshold voltage devices are also employed in the modified DCVSPG latch. Two NMOS transistors are stacked below the diode-connected PMOS transistors to provide a discharge path when the clock pulse window is closed. This prevents the storage node from floating and makes the proposed DETEP-LCFF more robust and reliable.



Figure 5.8. Schematic view of the proposed dual-edged triggered explicit-pulsed LCFF. (a) A 4T-XOR clock pulse generator with symmetric setup time. (b) A modified DCVSPG latch providing a wide operation range

### **5.2.1 Modified DCVSPG Latch**

A conventional DCVSPG latch is based on a cross-coupled latch, as shown in Fig. 5.9(a). In ultra-low voltage operation, the latch fails because of an imbalanced conduction current. To make the proposed DETEP-LCFF function correctly across the five process corners from 0°C to 125°C, two diode-connected PMOS transistors and multiple-threshold voltage CMOS devices are used. The DCVSPG latch captures the input data while the pulse window is open. When the clock pulse window closes, whichever storage node ( $n_1$  or  $n_2$ ) is at zero voltage is left floating. Two NMOS transistors ( $MN_3,MN_4$ ) are therefore connected below the diode-connected PMOS transistors on each branch to provide a discharge path, as illustrated in Fig. 5.9(b).



### 5.2.1.1 Diode-connected PMOS transistors

Based on the conventional DCVSPG latch, we ran a Monte Carlo simulation of the conduction current; the result is presented in Fig. 5.10. As Fig. 5.10(a) shows, the conduction current of a pull-up PMOS transistor is larger than that of a pull-down NMOS transistor, and the ratio of pull-up to pull-down driving ability exceeds 100X. Because of this large gap between pull-up and pull-down strength, the conventional DCVSPG latch fails to work at ultra-low input voltages. We use two diode-connected PMOS transistors, MP<sub>3</sub> and MP<sub>4</sub>, as current limiters to reduce the pull-up strength, as shown in Fig. 5.9(b). In Fig. 5.10(b), the PMOS conduction current distribution shifts to the left, so the gap between the pull-up devices (PMOS) and the pull-down devices (NMOS) is clearly reduced. The pull-down device can therefore sink the conduction current I, as shown in Fig. 5.9(b). With the two diode-connected PMOS transistors, the DCVSPG latch can operate at ultra-low voltage.



The cross-coupled PMOS latch is shown in Fig. 5.9(a). When node  $d_1$  changes from "1" to "0", the PMOS transistor MP<sub>2</sub> turns on and charges node  $d_2$ ; finally, the PMOS transistor MP<sub>1</sub> turns off. After these three steps, the circuit has completed the latching procedure. The critical step for triggering the latch is the first one, discharging one of the drain nodes, and the imbalanced conduction current of the conventional DCVSPG latch makes exactly this step difficult. Therefore, in addition to the diode-connected PMOS transistors, high-threshold voltage (HVT) devices are employed. HVT devices conduct less current, which weakens the pull-up driving ability and makes the latch easier to trigger. Fig. 5.11 shows that the conduction current of the pull-up PMOS shifts further to the left compared with Fig. 5.10(b). Additionally,  $\sigma/\mu$  is smaller when HVT devices are used. According to the simulation results, the pull-down devices can overpower the pull-up devices in ultra-low voltage operation.



Figure 5.11. Monte Carlo simulation of conduction current after using HVT PMOS transistors

### 5.2.1.2 Storage node floating

When the pulse window is open, the input data is sampled and stored in nodes  $n_1$  and  $n_2$ , as shown in Fig. 5.9(b). When the pulse window closes, one of the storage nodes becomes floating. Assume that the input data (*D*) is "1" and its complement (*Db*) is "0". When the data is captured, storage node  $n_1$  is "0" and storage node  $n_2$  is "1". After the pulse window closes, node  $n_1$  is floating. As Fig. 5.12(a) shows, the floating node  $n_1$  is charged and then turns off MP<sub>2</sub>, so storage node  $n_2$  may be discharged by leakage current. This results in a serious functional error; in addition, a floating node may cause a short current in the following inverter. To prevent the storage nodes from floating, we connect two NMOS transistors, MN<sub>3</sub> and MN<sub>4</sub>, to the storage nodes, as shown in Fig. 5.9(b). MN<sub>3</sub> is driven by storage node  $n_2$ , and MN<sub>4</sub> is controlled by storage node  $n_1$ . Storage node  $n_1$  is therefore kept at "0", because a discharge path exists even when the pulse window is closed. MP<sub>2</sub> stays on, and storage node  $n_2$  is not discharged.

However, MP<sub>3</sub> and MP<sub>4</sub> must be weak enough to maintain correct operation, so they are implemented with HVT devices. As Fig. 5.12(b) shows, storage node  $n_1$  stays at "0" and the latch works correctly.




Figure 5.12. Waveform of the DCVSPG latch. (a) Without the two NMOS transistors (MN<sub>3</sub> and MN<sub>4</sub>). (b) With the two NMOS transistors (MN<sub>3</sub> and MN<sub>4</sub>).

### **5.2.2 Pulse Generator**

The clock pulse is generated after the clock edge because of the propagation delay of the clock pulse generator,  $\Delta t$ , as shown in Fig. 5.13. This characteristic gives the pulse-triggered LCFF a zero or negative setup time, which helps reduce the impact of clock skew and clock jitter. In our work, the input data is sampled at both the positive and the negative clock edges. To maintain the same throughput as other types of LCFFs, the clock frequency of a dual-edged triggered LCFF can be halved, which reduces the power of the clock network by approximately 40% [5.17]. The clock pulse generator is based on an XOR logic gate and is described in the next section.
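
The throughput argument for dual-edged triggering can be checked with a discrete-time clock model. The sketch below counts sampling instants for a single-edge flip-flop clocked at full rate and a dual-edged one clocked at half rate over the same window; the waveform helper and step counts are hypothetical. A trailing low sample closes the final high phase so that its falling edge falls inside the window.

```python
from typing import List


def square_wave(half_period: int, cycles: int) -> List[int]:
    """Clock that starts low with 2 * half_period samples per period,
    plus one trailing low sample that closes the last high phase."""
    return ([0] * half_period + [1] * half_period) * cycles + [0]


def sample_points(clock: List[int], dual_edge: bool) -> List[int]:
    """Sample indices: rising edges only, or both edges when dual_edge is True."""
    points = []
    for i in range(1, len(clock)):
        rising = clock[i - 1] == 0 and clock[i] == 1
        falling = clock[i - 1] == 1 and clock[i] == 0
        if rising or (dual_edge and falling):
            points.append(i)
    return points


full_rate = square_wave(half_period=5, cycles=8)    # 8 periods of 10 samples
half_rate = square_wave(half_period=10, cycles=4)   # 4 periods of 20 samples

single = sample_points(full_rate, dual_edge=False)
dual = sample_points(half_rate, dual_edge=True)
print(len(single), len(dual))
```

Both flip-flops capture the same number of data samples, but the half-rate clock makes only half as many transitions (8 instead of 16 in this window), which is the source of the clock-network power saving. The roughly 40% figure itself comes from [5.17] and is not reproduced by this toy model.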



### 5.2.2.1 XOR logic gate

From [5.11], we can identify the relation between the symmetric setup time and the D-Q delay of a DETEP-LCFF. If the clock pulse generator provides symmetric clock pulses at both clock edges, the D-Q delay can be decreased. If the pulses at the two clock edges are produced with different propagation delays, the LCFF may capture wrong input data; using almost the same propagation path for each clock edge resolves this problem. The clock pulse window must also stay open long enough for the DCVSPG latch to be triggered and store the input data, and the pulse holding time should also be symmetric. To provide sufficient time for triggering the DCVSPG latch, we use four inverters to create four delayed phases of the clock. <u>*CLK*</u> and <u>*CLK3*</u> are taken as the two inputs of an XOR gate, as shown in Fig. 5.14(a). The proposed clock pulse generator is adapted from [5.11], and the produced <u>*pulse*</u> is a low-swing signal. At the positive clock edge,  $MN_2$  turns on and passes the <u>*CLK4*</u> signal to the output node. When <u>*CLK3*</u> falls, MP<sub>1</sub> turns on and passes the <u>*CLK4*</u> signal. However, <u>*CLK4*</u> changes one inverter delay after <u>*CLK3*</u>, so MP<sub>1</sub> and MN<sub>2</sub> may be on at the same time; to close the pulse window, MP<sub>1</sub> must be stronger than MN<sub>2</sub>. At the negative clock edge, MN<sub>1</sub> passes <u>*CLK*</u> to produce the clock pulse. When <u>*CLK3*</u> rises, MP<sub>2</sub> passes it to close the window. Here too there is contention, between MP<sub>2</sub> and MN<sub>1</sub>, so MP<sub>2</sub> is designed to be stronger than MN<sub>1</sub>. At both clock edges, the clock pulse window is opened by <u>*CLK*</u> and closed by <u>*CLK3*</u>, so the proposed clock pulse generator has a symmetric pulse triggering time and holding period.
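
The window timing can be sketched with an idealized discrete-time model. CLK3 is CLK passed through three inverters (inverted and delayed by three inverter delays), and the window is open from a CLK edge until CLK3 responds, i.e. while CLK and CLK3 are momentarily equal. This abstracts away the 4T pass-transistor implementation, the low-swing output, the MP/MN contention, and the output polarity convention (whether the gate is read as XOR or XNOR); the unit inverter delay is an assumption.

```python
from typing import List


def square_wave(half_period: int, cycles: int) -> List[int]:
    """Clock that starts low, with 2 * half_period samples per period."""
    return ([0] * half_period + [1] * half_period) * cycles


def delayed(signal: List[int], steps: int) -> List[int]:
    """Delay a signal by `steps` samples, holding its initial value at the start."""
    return [signal[0]] * steps + signal[: len(signal) - steps]


def pulse_window(clk: List[int], inv_delay: int = 1) -> List[int]:
    """1 while CLK and CLK3 are equal, i.e. for three inverter delays after each CLK edge."""
    clk3 = [1 - v for v in delayed(clk, 3 * inv_delay)]   # three inversions = one net inversion
    return [1 if a == b else 0 for a, b in zip(clk, clk3)]


def window_widths(pulse: List[int]) -> List[int]:
    """Lengths of consecutive runs of 1s."""
    widths, run = [], 0
    for v in pulse:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


clk = square_wave(half_period=10, cycles=3)
print(window_widths(pulse_window(clk)))
```

The windows after rising and falling edges come out the same width because both are opened by CLK and closed by CLK3, which is the symmetry property the proposed generator relies on.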



Figure 5.14. Proposed clock pulse generator with a balanced clock pulse at each clock edge. (a) Schematic view of the proposed clock generator (b) Timing diagram of the proposed clock pulse generator.

Fig. 5.15 illustrates a transmission-gate-based clock pulse generator from [5.8]. In this architecture, <u>CLK</u> and <u>CLK2</u> are the two inputs of an XOR logic gate. However, the pulse window is closed through different paths at the two clock edges, so the pulse holding time differs considerably between them. Fig. 5.16 shows a clock pulse generator formed by a pseudo-NMOS gate, which uses LVT devices to speed up propagation. Its triggering time clearly differs between the edges: at the positive clock edge the pulse is triggered right after the clock edge, whereas at the negative clock edge it is triggered one inverter delay after the clock edge. Fig. 5.17 compares the clock generators of [5.8], [5.9], and our work. As Fig. 5.17(a) shows, the pseudo-NMOS type [5.9] has a larger difference in pulse triggering time between the two clock edges, about one inverter delay; our work reduces this difference by up to 94%. As Fig. 5.17(b) shows, the transmission gate type [5.8] has a larger difference in pulse holding time between the two clock edges because of its different propagation paths; the proposed clock pulse generator reduces this difference by up to 74%. In addition, the proposed clock pulse generator consumes approximately 42% less power, as shown in Fig. 5.17(c), whereas the clock pulse generator in [5.9] consumes more power because of a short current problem. According to the simulation results in Fig. 5.17, the proposed clock generator provides symmetric clock pulses.





Figure 5.16. Pseudo-NMOS gate [5.9] with different pulse triggering times. (a) A clock pulse generator. (b) Timing diagram of the clock pulse generator.



Figure 5.17. Performance comparisons of the clock pulse generator at VDDL=0.4V. (a) difference of triggering time (b) difference of hold period (c) power consumption

### **5.2.3 Optimal Operating Point**

The proposed DETEP-LCFF has a minimum input voltage as low as 0.4V, which is in the near-threshold region. The product of delay and power (PDP) is used to find an optimal operating point. We sweep VDDL from 0.4V to 0.7V. The simulation shows a trade-off between delay and power consumption, as shown in Fig. 5.18: at lower voltages the power consumption is smaller but the propagation delay is longer, and vice versa. From Fig. 5.18(b), the optimal operating point appears at 0.5V. The smallest PDP value does not occur at the lowest input voltage (0.4V) because of the penalty of a larger propagation delay.



Figure 5.18. Minimum operation point. (a) Minimum D-Q delay and power consumption (b) PDP value

### **5.2.4 Clock Pulse Generator Sharing Technique**


One advantage of a DETEP-LCFF is that the clock pulse generator can be shared among many latches, which reduces the area and power overhead. However, as the sharing number increases, the minimum D-Q delay grows because of the heavier load on the clock pulse generator. Power consumption behaves differently. As Fig. 5.19(a) shows, for super-threshold operation the optimal sharing number is two, which gives the smallest power consumption. As Fig. 5.19(b) shows, for near-threshold operation the latches consume the least power when the clock pulse generator is shared among three latches, so the optimal sharing number is three. With this sharing technique, one clock pulse generator serves several latches, reducing the area overhead.




Figure 5.19 Analysis of sharing technique. (a) in the super-threshold region operation (b) in the near-threshold region operation



### **5.3** Performance Comparisons

For comparison, we implemented the following three dual-edged triggered explicit-pulsed LCFFs: the feedback type in [5.8], the dual  $V_{th}$  type in [5.9], and the self-precharged dynamic type in [5.10]. Iso-area analysis is used for a fair comparison.

### 5.3.1 Minimum Input Voltage

Compared with the other three DETEP-LCFFs, the proposed DETEP-LCFF has the lowest minimum input voltage, as shown in Fig. 5.20. We set VDDH to 1.0V at room temperature and swept the input voltage from 100mV to 1.0V. In a pulse-triggered LCFF, the latch has difficulty operating at ultra-low voltage. Using two diode-connected PMOS transistors in the DCVSPG latch solves the imbalanced current problem, so the proposed DETEP-LCFF can operate in the near-threshold region across five corners from 0°C to  $125^{\circ}$ C. The simulation shows that the proposed DETEP-LCFF has a wide operating range, from 0.4V to 1.0V.



Figure 5.20. Comparison of minimum input voltage
#### 5.3.2 Minimum D-Q Delay, Power, and PDP

The propagation delay of a flip-flop is defined as the delay from data to output (D-Q), which includes the setup time and the clock-to-output (CLK-Q) delay. The minimum D-Q delay corresponds to the optimum setup time. The simulation results are obtained with supplies VDDH=1.0V and VDDL=0.7V, a clock frequency of 50MHz, a data switching activity of 50%, and a fan-out-of-four inverter load at Q. We chose VDDL=0.7V because all the DETEP-LCFFs can operate at this supply voltage. The performance comparisons are shown in Fig. 5.21. The DETEP-LCFF with feedback [5.8] has the smallest minimum D-Q delay but consumes more power. As Fig. 5.21(b) shows, the proposed DETEP-LCFF reduces power consumption by about 52% relative to [5.8]. Table 5.1 lists the numerical results for the different DETEP-LCFFs; the PDP ratio is normalized so that the largest PDP value equals 1. All of the DETEP-LCFFs have a negative setup time. The proposed DETEP-LCFF has the smallest PDP ratio and is the most energy-efficient.



Figure 5.21. Performance comparison at VDDL=0.7V,  $25^{\circ}$ C, TT corner. (a) Minimum D-Q delay (b) Power consumption (c) PDP value

| LCFF                | [5.8] | [5.9] | [5.10] | This work |
|---------------------|-------|-------|--------|-----------|
| Transistor count    | 20    | 27    | 33     | 28        |
| Min. D-Q delay (ps) | 213   | 570   | 421    | 407       |
| Setup time (ps)     | -10   | -70   | -40    | -40       |
| Power (µW)          | 9.85  | 6.87  | 7.04   | 4.68      |
| PDP (fJ)            | 2.10  | 3.92  | 2.96   | 1.91      |
| PDP ratio           | 0.54  | 1     | 0.76   | 0.49      |

Table 5.1. Performance comparisons among DETEP-LCFFs at VDDL=0.7V, 25°C, TT corner
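The PDP and PDP-ratio rows follow directly from the delay and power rows: PDP = min. D-Q delay × power, and each PDP is then divided by the largest one. A minimal sketch of that arithmetic, using the Table 5.1 values (the helper names are illustrative):

```python
from typing import Dict, Tuple

# (min. D-Q delay in ps, power in uW), copied from Table 5.1.
TABLE_5_1: Dict[str, Tuple[float, float]] = {
    "[5.8]": (213, 9.85),
    "[5.9]": (570, 6.87),
    "[5.10]": (421, 7.04),
    "This work": (407, 4.68),
}


def pdp_fj(delay_ps: float, power_uw: float) -> float:
    """Power-delay product in fJ (ps * uW = 1e-18 J = 1e-3 fJ)."""
    return delay_ps * power_uw * 1e-3


def normalized_pdp(table: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Divide every PDP by the largest one, so the worst design maps to 1."""
    pdp = {name: pdp_fj(d, p) for name, (d, p) in table.items()}
    worst = max(pdp.values())
    return {name: value / worst for name, value in pdp.items()}


if __name__ == "__main__":
    for name, ratio in normalized_pdp(TABLE_5_1).items():
        print(f"{name:>9}: PDP ratio {ratio:.2f}")
```

Rounded to two decimals, the ratios come out at about 0.54, 1, 0.76 and 0.49 for [5.8], [5.9], [5.10] and this work, so the proposed DETEP-LCFF has the smallest normalized PDP.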

#### 5.3.3 Power Analysis with Data Switching Activity

The DETEP-LCFF in [5.8] uses a conditional-capture technique to prevent redundant internal switching; at high data switching activity, however, conditional capture provides less benefit. Fig. 5.22 shows that the DETEP-LCFF in [5.8] consumes more power when the data switching activity is high. The DETEP-LCFF in [5.9] suffers from short-circuit current during transitions, so it also consumes more power when the data switching activity is high. The DETEP-LCFF in [5.10] precharges its internal node at every clock edge, so it consumes comparatively more power when the data activity is low. The proposed DETEP-LCFF consumes less power than the other three DETEP-LCFFs regardless of the data switching activity.



Figure 5.22. Power analysis with data switching activity
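The trends in Fig. 5.22 can be read through a first-order energy model: each capture event costs a fixed energy (for example, precharging on every clock edge) plus a data-dependent energy paid only when D toggles. The sketch below is an illustrative toy model with hypothetical energy values, not extracted from any of the compared circuits; it assumes two capture events per clock period for a dual-edge-triggered flip-flop.

```python
from typing import Tuple


def avg_power_uw(f_clk_mhz: float, e_fixed_fj: float, e_toggle_fj: float,
                 activity: float, edges_per_cycle: int = 2) -> float:
    """Average power in uW for a first-order flip-flop energy model.

    Each capture event costs e_fixed_fj, plus e_toggle_fj with probability
    `activity` (the data switching activity). fJ * MHz = nW, hence the 1e-3.
    """
    energy_per_event_fj = e_fixed_fj + activity * e_toggle_fj
    return edges_per_cycle * f_clk_mhz * energy_per_event_fj * 1e-3


def crossover_activity(a_fixed: float, a_toggle: float,
                       b_fixed: float, b_toggle: float) -> float:
    """Activity at which designs A and B consume equal power."""
    return (a_fixed - b_fixed) / (b_toggle - a_toggle)


# Hypothetical designs (fixed fJ/event, toggle fJ/event):
# A precharges every edge (high fixed cost),
# B captures conditionally (low fixed cost, higher cost per data toggle).
A: Tuple[float, float] = (60.0, 10.0)
B: Tuple[float, float] = (20.0, 90.0)

if __name__ == "__main__":
    for alpha in (0.1, 0.5, 0.9):
        print(alpha, avg_power_uw(50, *A, alpha), avg_power_uw(50, *B, alpha))
    print(crossover_activity(*A, *B))  # 0.5
```

Below the crossover the conditional-capture style wins and above it the precharge style does, matching the qualitative behavior described for [5.8] and [5.10]; a design that keeps both energy terms small stays lowest over the whole activity range.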

#### **5.3.4 Monte Carlo Simulation: Data Error Rate**

Fig. 5.23 presents 5000-point Monte Carlo simulations of the data error rate at different input voltages. Monte Carlo simulation shows how process variations affect the LCFF characteristics. For flip-flops, storing the correct input data is critical. According to the simulation results, our design has a data error rate of zero when the input voltage is above 0.4V. This indicates that the proposed DETEP-LCFF is more stable than the other three DETEP-LCFFs at input voltages as low as the near-threshold region, whereas the other three LCFFs are suited to super-threshold operation.
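A zero error count from a finite Monte Carlo run bounds the true error probability rather than showing it is exactly zero. Assuming the 5000 samples are independent, the exact one-sided (Clopper-Pearson) upper bound for zero observed failures solves (1 - p)^n = 1 - confidence; the sketch below computes it (function names are illustrative).

```python
import math


def error_rate(failures: int, samples: int) -> float:
    """Point estimate of the data error rate from a Monte Carlo run."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    return failures / samples


def zero_failure_upper_bound(samples: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on the error probability when no failures
    are observed in `samples` independent trials.

    Solves (1 - p) ** samples == 1 - confidence for p; expm1 keeps the
    result accurate when p is very small.
    """
    return -math.expm1(math.log(1.0 - confidence) / samples)


if __name__ == "__main__":
    n = 5000
    print(error_rate(0, n))             # 0.0
    print(zero_failure_upper_bound(n))  # roughly 6e-4, i.e. about 0.06%
    print(3 / n)                        # "rule of three" approximation
```

For n = 5000 the bound is roughly 0.06%, close to the familiar 3/n rule of thumb, so a zero-error result reads as an error rate below about 0.06% at 95% confidence under the simulated variation model.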



#### 5.4 Conclusions

A power-delay-product-optimized and robust dual-edge-triggered explicit-pulsed level-converting flip-flop (DETEP-LCFF) is presented. By combining energy-efficient techniques, its power dissipation is reduced by up to 52%. The clock pulse generator produces symmetric clock pulses at both clock edges and supports a sharing technique. The performance summary of the proposed DETEP-LCFF is given in Table 5.2, and the performance comparisons with [5.8]-[5.10] are listed in Table 5.1. This work provides a wide operating range, from 0.4V to 1V, across five process corners. At an input voltage of 0.4V, it achieves a minimum D-Q delay of 781ps and a setup time of -610ps while consuming only 2.3µW. It is suitable as the interface between two voltage domains in emerging dynamic voltage and frequency scaling (DVFS) wireless applications.



Figure 5.24. Layout view of the proposed DETEP-LCFF.

| VDDL | Setup time (ps) | Hold time (ns) | Min. D-Q delay (ps) | Power (µW) | PDP (fJ) |
|------|-----------------|----------------|---------------------|------------|----------|
| 0.4V | -610            | 1.1            | 781                 | 2.30       | 1.80     |
| 0.5V | -140            | 0.34           | 424                 | 2.80       | 1.19     |
| 0.6V | -70             | 0.18           | 393                 | 3.58       | 1.41     |
| 0.7V | -40             | 0.12           | 407                 | 4.68       | 1.91     |

Table 5.2. Performance summary of the proposed DETEP-LCFF
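The PDP column of Table 5.2 is the product of the minimum D-Q delay and the power (ps × µW = 10⁻³ fJ). Recomputing it across VDDL, as in the sketch below (helper names are illustrative), also shows which tabulated supply point gives the best energy-delay trade-off.

```python
from typing import Dict, List, Tuple

Row = Tuple[float, float, float]

# (VDDL in V, min. D-Q delay in ps, power in uW), copied from Table 5.2.
TABLE_5_2: List[Row] = [
    (0.4, 781, 2.30),
    (0.5, 424, 2.80),
    (0.6, 393, 3.58),
    (0.7, 407, 4.68),
]


def pdp_fj(delay_ps: float, power_uw: float) -> float:
    """Power-delay product in fJ (ps * uW = 1e-18 J = 1e-3 fJ)."""
    return delay_ps * power_uw * 1e-3


def pdp_by_vddl(rows: List[Row]) -> Dict[float, float]:
    """Map each supply voltage to its recomputed PDP."""
    return {vddl: pdp_fj(delay, power) for vddl, delay, power in rows}


def min_pdp_vddl(rows: List[Row]) -> float:
    """Supply voltage with the lowest PDP among the given rows."""
    pdp = pdp_by_vddl(rows)
    return min(pdp, key=pdp.__getitem__)


if __name__ == "__main__":
    for vddl, pdp in pdp_by_vddl(TABLE_5_2).items():
        print(f"VDDL={vddl:.1f} V  PDP={pdp:.2f} fJ")
    print(min_pdp_vddl(TABLE_5_2))  # 0.5
```

Among the four tabulated points, the PDP is lowest at VDDL = 0.5V; at 0.4V the longer D-Q delay outweighs the power saving. The recomputed values agree with the PDP column to within rounding of the tabulated delay and power.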

## **Chapter 6**

## **Conclusion and Future Work**

#### **6.1 Conclusion**

A wide-range DLL-based multiphase clock generator is proposed, with an operating range from 80MHz to 500MHz. With the proposed harmonic detection circuit, the multiphase clock generator is free from harmonic locking. The delay block controller has two modes: SAR mode and counter mode. SAR mode accelerates locking, and counter mode keeps the generator tracking environmental variations after the SAR search finishes. At an input voltage of 1.0V and a frequency of 500MHz, the power consumption is 0.29mW. With the proposed duty cycle corrector, the clock signal has a 50% duty cycle. The proposed duty cycle corrector can operate at supply voltages as low as 0.5V, with a correction range from 25% to 75% and an operating range from 100MHz to 500MHz. With PVT detection, the output duty-cycle error can be reduced by up to 17%. At an input voltage of 0.5V and a frequency of 167MHz, the power consumption is 26.30µW.
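The division of labor between the two controller modes can be illustrated with a behavioral sketch. It assumes a linear, monotonic delay line (delay = code × unit delay) and an idealized bang-bang phase detector with a ±half-LSB dead zone; the names `make_linear_pd`, `sar_lock` and `counter_step`, the 8-bit code width and the delay values are hypothetical, and the harmonic detection and multiphase taps of the actual design are not modeled.

```python
from typing import Callable

# Hypothetical phase-detector model: +1 if the delay line is too short
# (more delay needed), -1 if it is too long, 0 inside the dead zone.
PhaseDetector = Callable[[int], int]


def make_linear_pd(period_ps: float, unit_ps: float) -> PhaseDetector:
    """Idealized PD for a linear delay line: delay(code) = code * unit_ps."""
    def pd(code: int) -> int:
        err = period_ps - code * unit_ps  # > 0 means the line is too short
        if err > unit_ps / 2:
            return 1
        if err < -unit_ps / 2:
            return -1
        return 0
    return pd


def sar_lock(pd: PhaseDetector, bits: int) -> int:
    """SAR mode: resolve one code bit per comparison, MSB first."""
    code = 0
    for b in range(bits - 1, -1, -1):
        trial = code | (1 << b)
        if pd(trial) >= 0:  # trial delay not too long: keep this bit
            code = trial
    return code


def counter_step(pd: PhaseDetector, code: int, bits: int) -> int:
    """Counter mode: move the code by at most one LSB per update (saturating)."""
    return min(max(code + pd(code), 0), (1 << bits) - 1)


if __name__ == "__main__":
    # 500 MHz reference (2000 ps period), 10 ps per code, 8-bit controller.
    code = sar_lock(make_linear_pd(2000.0, 10.0), bits=8)
    print(code)  # 200, found in 8 comparisons
    # Environmental drift lengthens each delay step to 10.5 ps; counter mode tracks it.
    drifted = make_linear_pd(2000.0, 10.5)
    for _ in range(20):
        code = counter_step(drifted, code, bits=8)
    print(code)  # 190
```

The SAR search needs only `bits` comparisons to reach lock, while counter mode changes the code by at most one LSB per update; combining them gives a fast initial lock followed by gradual tracking of PVT drift.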

A power-delay-product-optimized and robust level converter is presented for sub-threshold to super-threshold signal conversion. By combining energy-efficient techniques, the PDP of this work is reduced by 23%. Temperature-induced variation in propagation delay is reduced by up to 99%. This work provides a wide operating range, from 150mV to 1V, across five process corners. At an input voltage of 150mV, it achieves a propagation delay of 52ns while consuming only 21nW.

A PVT-robust dual-edge-triggered explicit-pulsed level-converting flip-flop is presented. By combining energy-efficient techniques, its power dissipation is reduced by up to 52%. The clock pulse generator produces symmetric clock pulses at both clock edges and supports a sharing technique. At an input voltage of 0.4V, it achieves a minimum D-Q delay of 781ps and a setup time of -610ps while consuming only 2.3µW. It is suitable as the interface between two voltage domains in emerging dynamic voltage and frequency scaling wireless applications.

#### **6.2 Future Work**

System heterogeneity offered by 3D integration usually requires different supply voltages for different functional blocks, ranging from high (3.3V or higher) to ultra-low (sub-threshold) voltages. This multiple-voltage requirement can be met by adopting the proposed DVFS system, shown in Fig. 6.1. The signal TSVs in the first layer are connected to the clock source, and the clock signal passes through TSVs to the other layers. In each layer, the proposed DLL-based clock generator produces the clock frequencies that the layer requires. The clock drivers are followed by the proposed duty cycle corrector, and the proposed level converters or level-converting flip-flops are inserted between the different voltage domains.



# **Bibliography**

Chapter 1.

- [1.1] S. K. Gupta, A. Raychowdhury, and K. Roy, "Digital computation in subthreshold region for ultralow-power operation: a device-circuit-architecture codesign perspective," *Proceedings of the IEEE*, vol.98, no.2, pp.160-190, Feb. 2010.
- [1.2] J.-C. Chi, H.-H. Lee, S.-H. Tsai, and M.-C. Chi, "Gate Level Multiple Supply Voltage Assignment Algorithm for Power Optimization Under Timing Constraint," *IEEE Transactions on Very Large Scale Integration Systems*, vol.15, no.6, pp.637-648, June 2007.
- [1.3] M. E. Salehi, M. Samadi, M. Najibi, A. Afzali-Kusha, M. Pedram, and S.M. Fakhraie, "Dynamic Voltage and Frequency Scheduling for Embedded Processors Considering Power/Performance Tradeoffs," *IEEE Transactions on Very Large Scale Integration Systems*, vol.19, no.10, pp.1931-1935, Oct. 2011.


Chapter 2.

- [2.1] J. Masuch and M. Delgado-Restituto, "A 350 µW 2.3 GHz integer-N frequency synthesizer for body area network applications," *IEEE Topical Meeting on Silicon Monolithic Integrated Circuits in RF Systems*, pp.105-108, Jan. 2011.
- [2.2] K.-H. Cheng, Y.-C. Tsai, Y.-L. Lo, and J.-S. Huang, "A 0.5-V 0.4–2.24-GHz Inductorless Phase-Locked Loop in a System-on-Chip," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol.58, no.5, pp.849-859, May 2011.
- [2.3] D. Tasca, M. Zanuso, G. Marzin, S. Levantino, C. Samori, and A. L. Lacaita, "A 2.9–4.0-GHz Fractional-N Digital PLL With Bang-Bang Phase Detector and 560fsrms Integrated Jitter at 4.5-mW Power," *IEEE Journal of Solid-State Circuits*, vol.46, no.12, pp.1-14, Dec. 2011.
- [2.4] J. Choi, S. T. Kim, W. Kim, K.-W. Kim, K. Lim, and J. Laskar, "A Low Power and Wide Range Programmable Clock Generator With a High Multiplication Factor," *IEEE Transactions on Very Large Scale Integration Systems*, vol.19, no.4, pp.701-705, April 2011.
- [2.5] J. Moon and H.-Y. Lee, "A dual-loop delay locked loop with multi digital delay lines for GHz DRAMs," *IEEE International Symposium on Circuits and Systems*, pp.313-316, May 2011.
- [2.6] J. Sungchun, H. Song, S. Ye, and D.-K. Jeong, "A 13.8mW 3.0Gb/s clock-embedded video interface with DLL-based data-recovery circuit," *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp.450-452, Feb. 2011.
- [2.7] C. Ling, M. Luo, and M. Cheng, "A Low Noise CMOS Phase Locked Loop," *IEEE International Conference on Anti-counterfeiting, Security, and Identification in Communication*, pp.343-346, Aug. 2009.
- [2.8] J. Huang, L. Tao, and Z. Li, "A Low-Jitter and Low-Power Clock Generator," *IEEE International Conference on Solid-State and Integrated Circuit Technology*, pp.385-387, Nov. 2010.
- [2.9] D. J. Foley and M. P. Flynn, "CMOS DLL-based 2-V 3.2-ps jitter 1-GHz clock synthesizer and temperature-compensated tunable oscillator," *IEEE Journal of Solid-State Circuits*, vol.36, no.3, pp.417-423, March 2001.

- [2.10] G. Chien and P. R. Gray, "A 900-MHz local oscillator using a DLL-based frequency multiplier technique for PCS applications," *IEEE Journal of Solid-State Circuits*, vol.35, no.12, pp.1996-1999, Dec. 2000.
- [2.11] W.-M. Lin, C.-C. Chen, and S.-I. Liu, "An all-digital clock generator for dynamic frequency scaling," *International Symposium on VLSI Design, Automation and Test*, pp.251-254, April 2009.
- [2.12] C.-K. Liang, R.-J. Yang, and S.-I. Liu, "An All-Digital Fast-Locking Programmable DLL-Based Clock Generator," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol.55, no.1, pp.361-369, Feb. 2008.
- [2.13] M. Faisal and M. A. Bayoumi, "A low-area, low-power programmable frequency multiplier for DLL based clock synthesizers," *IEEE International Symposium on Circuits and Systems*, pp.1460-1463, May 2008.
- [2.14] J. Koo, S. Ok, and C. Kim, "A Low-Power Programmable DLL-Based Clock Generator With Wide-Range Antiharmonic Lock," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol.56, no.1, pp.21-25, Jan. 2009.
- [2.15] P.-C. Huang, C.-J. Shih, Y.-C. Tsai, and K.-H. Cheng, "A phase error calibration DLL with edge combiner for wide-range operation," *IEEE International New Circuits and Systems Conference*, pp.1-4, June 2011.
- [2.16] C.-S. Hwang, P. Chen, and H.-W. Tsao, "A wide-range and fast-locking clock synthesizer IP based on delay-locked loop," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp.785-788, May 2004.
- [2.18] R. Farjad-Rad, W. Dally, H.-T. Ng, R. Senthinathan, M.-J. E. Lee, R. Rathi, and J. Poulton, "A low-power multiplying DLL for low-jitter multigigahertz clock generation in highly integrated digital chips," *IEEE Journal of Solid-State Circuits*, vol.37, no.12, pp.1804-1812, Dec. 2002.
- [2.19] M. Combes, K. Dioury, and A. Greiner, "A portable clock multiplier generator using digital CMOS standard cells," *IEEE Journal of Solid-State Circuits*, vol.31, no.7, pp.958-965, July 1996.

- [2.20] H.-C. Kang, K.-H. Ryu, D.-H. Lee, W. Lee, S.-H. Kim, J.-R. Choi, and S.-O. Jung, "Process variation tolerant all-digital multiphase DLL for DDR3 interface," *IEEE Custom Integrated Circuits Conference*, pp.1-4, Sept. 2010.
- [2.21] H. Kang, K. Ryu, D.-H. Jung, D. Lee, W. Lee, S. Kim, J. Choi, and S.-O. Jung, "Process Variation Tolerant All-Digital 90 degree Phase Shift DLL for DDR3 Interface," *IEEE Transactions on Circuits and Systems I: Regular Papers*, no.99, pp.1-11, 2012.
- [2.22] Y.-G. Chen, H.-W. Tsao, and C.-S. Hwang, "A Fast-Locking All-Digital Deskew Buffer With Duty-Cycle Correction," *IEEE Transactions on Very Large Scale Integration Systems*, no.99, pp.1-11, Feb. 2012.
- [2.23] J.-W. Ke, S.-Y. Huang, and D.-M. Kwai, "A high-resolution all-digital duty-cycle corrector with a new pulse-width detector," *IEEE International Conference of Electron Devices and Solid-State Circuits*, pp.1-4, Dec. 2010.
- [2.24] S.-K. Kao and S.-I. Liu, "A Wide-Range All-Digital Duty Cycle Corrector with a Period Monitor," *IEEE Conference on Electron Devices and Solid-State Circuits*, pp.349-352, Dec. 2007.
- [2.25] Y.-M. Wang, J.-T. Yu, Y. Surya, and C.-H. Huang, "A compact delay-recycled clock skew-compensation and/or duty-cycle-correction circuit," *IEEE International SOC Conference*, pp.42-47, Sept. 2011.
- [2.26] J. Gu, J. Wu, D. Gu, M. Zhang, and L. Shi, "All-Digital Wide Range Precharge Logic 50% Duty Cycle Corrector," *IEEE Transactions on Very Large Scale Integration Systems*, vol.20, no.4, pp.760-764, April 2012.
- [2.27] D. Shin, J. Song, H. Chae, K.-W. Kim, Y.-J. Choi, and C. Kim, "A 7ps-Jitter 0.053mm² Fast-Lock ADDLL with Wide-Range and High-Resolution All-Digital DCC," *IEEE Journal of Solid-State Circuits*, vol.44, no.9, pp.2437-2451, Sep. 2009.
- [2.28] S.-K. Kao and S.-I. Liu, "All-Digital Fast-Locked Synchronous Duty-Cycle Corrector," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol.53, no.12, pp.1363-1367, Dec. 2006.
- [2.29] B.-J. Chen, S.-K. Kao, and S.-I. Liu, "An All-Digital Duty Cycle Corrector," *International Symposium on VLSI Design, Automation and Test*, pp.1-4, April 2006.

- [2.30] Y.-J. Min, C.-H. Jeong, K.-Y. Kim, W.-H. Choi, J.-P. Son, C. Kim, and S.-W. Kim, "A 0.31–1 GHz Fast-Corrected Duty-Cycle Corrector With Successive Approximation Register for DDR DRAM Applications," *IEEE Transactions on Very Large Scale Integration Systems*, vol.20, no.8, pp.1524-1528, Aug. 2012.
- [2.31] K.-S. Song, C.-H. Koo, N.-K. Park, K.-W. Kim, Y.-J. Choi, J.-H. Ahn, and B.-T. Chung, "A single-loop DLL using an OR-AND duty-cycle correction technique," *IEEE Asian Solid-State Circuits Conference*, pp.245-248, Nov. 2008.
- [2.32] P. Chen, S.-W. Chen, and J.-S. Lai, "A low power wide range duty cycle corrector based on pulse shrinking/stretching mechanism," *IEEE Asian Solid-State Circuits Conference*, pp.460-463, Nov. 2007.
- [2.33] K.-H. Cheng, C.-W. Su, and K.-F. Chang, "A High Linearity, Fast-Locking Pulsewidth Control Loop With Digitally Programmable Duty Cycle Correction for Wide Range Operation," *IEEE Journal of Solid-State Circuits*, vol.43, no.2, pp.399-413, Feb. 2008.
- [2.34] H. Soeleman and K. Roy, "Ultra-low power digital subthreshold logic circuits," *Proceedings of the International Symposium on Low Power Electronics and Design*, pp.94-96, 1999.
- [2.35] M. Hamada, M. Takahashi, H. Arakida, A. Chiba, T. Terazawa, T. Ishikawa, M. Kanazawa, M. Igarashi, K. Usami, and T. Kuroda, "A Top-Down Low Power Design Technique Using Clustered Voltage Scaling With Variable Supply-Voltage Scheme," *Proceedings of the IEEE Custom Integrated Circuits Conference*, pp.495-498, May 1998.
- [2.36] A. Chavan and E. MacDonald, "Ultra Low Voltage Level Shifters to Interface Sub and Super Threshold Reconfigurable Logic Cells," *IEEE Aerospace Conference*, pp.1-6, March 2008.
- [2.37] B. Zhai, S. Pant, L. Nazhandali, S. Hanson, J. Olson, A. Reeves, M. Minuth, R. Helfand, T. Austin, D. Sylvester, and D. Blaauw, "Energy-Efficient Subthreshold Processor Design," *IEEE Transactions on Very Large Scale Integration Systems*, vol.17, no.8, pp.1127-1137, Aug. 2009.

- [2.38] I.-J. Chang, J.-J. Kim, and K. Roy, "Robust Level Converter Design for Sub-threshold Logic," *Proceedings of the 2006 International Symposium on Low Power Electronics and Design*, pp.14-19, Oct. 2006.
- [2.39] H. Shao and C.-Y. Tsui, "A robust, input voltage adaptive and low energy consumption level converter for sub-threshold logic," *European Solid State Circuits Conference*, pp.312-315, Sept. 2007.
- [2.40] S. N. Wooters, B. H. Calhoun, and T. N. Blalock, "An Energy-Efficient Subthreshold Level Converter in 130-nm CMOS," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol.57, no.4, pp.290-294, April 2010.
- [2.41] S. Ali, S. Tanner, and P. A. Farine, "A robust, low power, high speed voltage level shifter with built-in short circuit current reduction," *European Conference on Circuit Theory and Design*, pp.142-145, Aug. 2011.
- [2.42] A. Hasanbegovic and S. Aunet, "Low-power subthreshold to above threshold level shifter in 90 nm process," *NORCHIP*, pp.1-4, Nov. 2009.
- [2.43] S. Lütkemeier and U. Rückert, "A Subthreshold to Above-Threshold Level Shifter Comprising a Wilson Current Mirror," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol.57, no.9, pp.721-724, Sept. 2010.
- [2.44] M. Ashouei, H. Luijmes, J. Stuijt, and J. Huisken, "Novel wide voltage range level shifter for near-threshold designs," *IEEE International Conference on Electronics, Circuits, and Systems*, pp.285-288, Dec. 2010.
- [2.45] I.-J. Chang, J.-J. Kim, K.-J. Kim, and K. Roy, "Robust Level Converter for Sub-Threshold/Super-Threshold Operation: 100 mV to 2.5 V," *IEEE Transactions on Very Large Scale Integration Systems*, vol.19, no.8, pp.1429-1437, Aug. 2011.
- [2.46] B. Zhang, L. Liang, and X. Wang, "A New Level Shifter with Low Power in Multi-Voltage System," *International Conference on Solid-State and Integrated Circuit Technology*, pp.1857-1859, 2006.
- [2.47] K.-H. Koo, J.-H. Seo, M.-L. Ko, and J.-W. Kim, "A new level-up shifter for high speed and wide range interface in ultra deep sub-micron," *IEEE International Symposium on Circuits and Systems*, vol.2, pp.1063-1065, May 2005.

- [2.48] Y.-S. Lin and D. M. Sylvester, "Single stage static level shifter design for subthreshold to I/O voltage conversion," *IEEE International Symposium on Low Power Electronics and Design*, pp.197-200, Aug. 2008.
- [2.49] A. Chandrakasan, S. Sheng, and R.W. Brodersen, "Low-Power CMOS Digital Design," *IEEE Journal of Solid-State Circuits*, vol. 27, pp.473-484, Apr. 1992.
- [2.50] F. Ishihara, F. Sheikh, and B. Nikolic, "Level Conversion for Dual-Supply Systems," *IEEE Transactions on Very Large Scale Integration Systems*, vol.12, no.2, pp. 185-195, Feb. 2004.
- [2.51] H. Mahmoodi-Meimand and K. Roy, "Self-Precharging Flip-Flop: A New Level Converting Flip-Flop," *Proceedings of European Solid-State Circuits Conference*, pp.407-410, Sept. 2002.
- [2.52] B.-S. Kong, S.-S. Kim, and Y.-H. Jun, "Conditional-Capture Flip-Flop for Statistical Power Reduction," *IEEE Journal of Solid-State Circuits*, vol.36, no.8, pp.1263-1271, Aug. 2001.
- [2.53] N. Nedovic and V. G. Oklobdzija, "Dual-Edged Triggered Storage Elements and Clocking Strategy for Low-Power Systems," *IEEE Transactions on Very Large Scale Integration Systems*, vol.13, no.5, pp. 577-590, May 2005.
- [2.54] H. S. Park, H. B. Che, W. Kim, and Y. H. Kim, "High Performance Level-Converting Flip-Flop with a Simple Pulse Generator and a Fast Latch," *International Technical Conference on Circuits/Systems, Computer and Communications*, pp.561-564, 2008.
- [2.55] P. Zhao, G. P. Kumar, C. Archana, and M. Bayoumi, "A Double-Edge Implicit-Pulsed Level Convert Flip-Flop," *Proceedings of the IEEE Computer Society Annual Symposium on VLSI*, pp.141-144, Feb. 2004.
- [2.56] P. Zhao, J. B. McNeely, P. K. Golconda, S. Venigalla, N. Wang, M. A. Bayoumi, W. Kuang, and L. Downey, "Low-Power Clocked-Pseudo-NMOS Flip-Flop for Level Conversion in Dual Supply Systems," *IEEE Transactions on Very Large Scale Integration Systems*, vol.17, no.9, pp.1196-1202, Sept. 2009.
- [2.57] L.-Y. Chiou and S.-C. Luo, "An Energy-Efficient Dual-Edge Triggered Level-Converting Flip-Flop," *IEEE International Symposium on Circuits and Systems*, pp.1157-1160, May 2007.
- [2.58] L.-Y. Chiou and S.-C. Luo, "Energy-Efficient Dual-Edge-Triggered Level Converting Flip Flops With Symmetry in Setup Times and Insensitivity to Output Parasitics," *IEEE Transactions on Very Large Scale Integration Systems*, vol.17, no.11, pp.1659-1663, Nov. 2009.

- [2.59] H. Mahmoodi-Meimand and K. Roy, "Dual-edge triggered level converting flip-flops," *Proceedings of International Symposium on Circuits and Systems*, vol.2, pp. 661-664, May 2004.
- [2.60] A.-S. Seyedi and A. Afzali-Kusha, "Double-edge Triggered Level Converter Flip-Flop with Feedback," *International Conference on Microelectronics*, pp.44-47, Dec. 2006.
- [2.61] Q.-X. Wang, Y.-S. Xia, and L.-Y. Wang, "Dual-Vth based double-edge explicit-pulsed level-converting flip-flops," *International Conference on Electronics, Communications and Control*, pp.837-840, Sept. 2011.



Chapter 3.

- [3.1] A. Shibayama, K. Nose, S. Torii, M. Mizuno, and M. Edahiro, "Skew-Tolerant global synchronization based on periodically all-in-phase clocking for Multi-Core SOC platforms," *IEEE Symposium on VLSI Circuits*, pp.158-159, June 2007.
- [3.2] J.-H. Kim, Y.-H. Kwak, M.-Y. Kim, S.-W. Kim, and C. Kim, "A 120MHz-1.8GHz CMOS DLL-Based clock generator for dynamic frequency scaling," *IEEE Journal of Solid-State Circuits*, vol.41, no.9, pp.2077-2082, Sept. 2006.
- [3.3] W.-M. Lin, C.-C. Chen, and S.-I. Liu, "An All-Digital Clock Generator for Dynamic Frequency Scaling," *International Symposium on VLSI Design, Automation and Test*, pp.251-254, April 2009.
- [3.4] S. Ok, K. Chung, J. Koo, and C. Kim, "An Antiharmonic, Programmable, DLL-Based Frequency Multiplier for Dynamic Frequency Scaling," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol.18, no.7, pp.1130-1134, July 2010.
- [3.5] P.-K. Hanumolu, G.-Y. Wei, and U.-K. Moon, "A Wide-Tracking Range Clock and Data Recovery Circuit," *IEEE Journal of Solid-State Circuits*, vol.43, no.2, pp.425-439, Feb. 2008.
- [3.6] C.-C. Chung, P.-L. Chen, and C.-Y. Lee, "An All-Digital Delay-Locked Loop for DDR SDRAM Controller Applications," *International Symposium on VLSI Design, Automation and Test*, pp.1-4, April 2006.
- [3.7] R.-J. Yang and S.-I. Liu, "A 40–550 MHz Harmonic-Free All-Digital Delay-Locked Loop Using a Variable SAR Algorithm," *IEEE Journal of Solid-State Circuits*, vol.42, no.2, pp.361-373, Feb. 2007.
- [3.8] Y.-J. Jeon, J.-H. Lee, H.-C. Lee, K.-W. Jin, K.-S. Min, J.-Y. Chung, and H.-J. Park, "A 66-333-MHz 12-mW register-controlled DLL with a single delay line and adaptive-duty-cycle clock dividers for production DDR SDRAMs," *IEEE Journal of Solid-State Circuits*, vol.39, no.11, pp.2087-2092, Nov. 2004.

- [3.9] H. Sutoh, K. Yamakoshi, and M. Ino, "Circuit technique for skew-free clock distribution," *Proceedings of the IEEE Custom Integrated Circuits Conference*, pp.163-166, May 1995.
- [3.10] C.-C. Chung and C.-Y. Lee, "A New DLL-Based Approach for All-Digital Multiphase Clock Generation," *IEEE Journal of Solid-State Circuits*, vol. 39, no.3, pp. 469–475, March 2004.
- [3.11] G.-K. Dehng, J.-M. Hsu, C.-Y. Yang, and S.-I. Liu, "Clock-deskew buffer using a SAR-controlled delay-locked loop," *IEEE Journal of Solid-State Circuits*, vol.35, no.8, pp.1128-1136, Aug. 2000.
- [3.12] C.-K. Liang, R.-J. Yang, and S.-I. Liu, "An All-Digital Fast-Locking Programmable DLL-Based Clock Generator," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol.55, no.1, pp.361-369, Feb. 2008.
- [3.13] S.-K. Kao and S.-I. Liu, "All-Digital Fast-Locked Synchronous Duty-Cycle Corrector," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol.53, no.12, pp.1363-1367, Dec. 2006.
- [3.14] D. Shin, J. Song, H. Chae, and C. Kim, "A 7 ps jitter 0.053 mm² fast lock all-digital DLL with a wide range and high resolution DCC," *IEEE Journal of Solid-State Circuits*, vol.44, no.9, pp.2437-2451, Sept. 2009.

Chapter 4.

- [4.1] J.-C. Chi, H.-H. Lee, S.-H. Tsai, and M.-C. Chi, "Gate Level Multiple Supply Voltage Assignment Algorithm for Power Optimization Under Timing Constraint," *IEEE Transactions on Very Large Scale Integration Systems*, vol.15, no.6, pp.637-648, June 2007.
- [4.2] H. Shao and C.-Y. Tsui, "A Robust, Input Voltage Adaptive and Low Energy Consumption Level Converter for Sub-threshold Logic," *European Solid State Circuits Conference*, pp.312-315, Sept. 2007.
- [4.3] S. Ali, S. Tanner, and P. A. Farine, "A Robust, Low Power, High Speed Voltage Level Shifter with Built-in Short Circuit Current Reduction," *IEEE European Conference on Circuit Theory and Design*, pp.142-145, Aug. 2011.
- [4.4] S.N. Wooters, B.H. Calhoun, and T.N. Blalock, "An Energy-Efficient Subthreshold Level Converter in 130-nm CMOS," *IEEE Transactions on Circuits and Systems II: Express Briefs*, Vol.57, No. 4, pp. 290-294, April 2010.
- [4.5] S. Lütkemeier and U. Rückert, "A Subthreshold to Above-Threshold Level Shifter Comprising a Wilson Current Mirror," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol.57, no.9, pp.721-724, Sept. 2010.
- [4.6] A. Hasanbegovic and S. Aunet, "Low-Power Subthreshold to Above Threshold Level Shifter in 90 nm Process," *NORCHIP*, pp.1-4, Nov. 2009.
- [4.7] I.-J. Chang, J.-J. Kim, K. Kim, and K. Roy, "Robust Level Converter for Sub-Threshold/Super-Threshold Operation: 100 mV to 2.5 V," *IEEE Transactions on Very Large Scale Integration Systems*, vol.19, no.8, pp.1429-1437, Aug. 2011.
- [4.8] Y.-S. Lin and D. Sylvester, "Single Stage Static Level Shifter Design for Subthreshold to I/O Voltage Conversion," *IEEE International Symposium on Low Power Electronics and Design*, pp.197-200, Aug. 2008.

- [4.9] Y. Chavan and E. MacDonald, "Ultra Low Voltage Level Shifter to Interface Sub and Super Threshold Reconfigurable Logic Cells," *IEEE Aerospace Conference*, pp. 1-6, March 2008.
- [4.10] K.-H. Koo, J.-H. Seo, M.-L. Ko, and J.-W. Kim, "A New Level-Up Shifter for High Speed and Wide Range Interface in Ultra Deep Sub-Micron," *IEEE International Symposium on Circuits and Systems*, Vol. 2, pp. 1063-1065, May 2005.
- [4.11] K. Agarwal, V. Venkateswarlu, and D. Anvekar, "A Level Shifter Deep-Submicron node using Multi-Threshold Technique," *IEEE Recent Advances in Intelligent Computational Systems*, pp.925-929, Sept. 2011.
- [4.12] F.-S. Lai and W. Hwang, "Design and implementation of differential cascode voltage switch with pass-gate (DCVSPG) logic for high-performance digital systems," *IEEE Journal of Solid-State Circuits*, vol.32, no.4, pp.563-573, Apr. 1997.
- [4.13] T.-H. Kim, H. Eom, J. Keane, and C. Kim, "Utilizing Reverse Short Channel Effect for Optimal Subthreshold Circuit Design," *IEEE International Symposium on Low Power Electronics and Design*, pp.127-130, Oct. 2006.

Chapter 5.

- [5.1] M. Hamada, M. Takahashi, H. Arakida, A. Chiba, T. Terazawa, T. Ishikawa, M. Kanazawa, M. Igarashi, K. Usami, and T. Kuroda, "A top-down low power design technique using clustered voltage scaling with variable supply-voltage scheme," *Proceedings of the IEEE Custom Integrated Circuits Conference*, pp.495-498, May 1998.
- [5.2] B. Amelifard, A. Afzali-Kusha, and A. Khademzadeh, "Enhancing the efficiency of cluster voltage scaling technique for low-power application," *IEEE International Symposium on Circuits and Systems*, vol.2, pp.1666-1669, May 2005.
- [5.3] Y.-J. Yeh and S.-Y. Kuo, "An optimization-based low-power voltage scaling technique using multiple supply voltages," *IEEE International Symposium on Circuits and Systems*, vol.5, pp.535-538, May 2001.
- [5.4] K.-L. Tsai, J.-Y. Lee, S.-J. Ruan, and F. Lai, "Low power scheduling method using multiple supply voltages," *Proceedings of IEEE International Symposium on Circuits and Systems*, pp.5295-5298, May 2006.
- [5.5] M. E. Salehi, M. Samadi, M. Najibi, A. Afzali-Kusha, M. Pedram, and S. M. Fakhraie, "Dynamic Voltage and Frequency Scheduling for Embedded Processors Considering Power/Performance Tradeoffs," *IEEE Transactions on Very Large Scale Integration Systems*, vol.19, no.10, pp.1931-1935, Oct. 2011.
- [5.6] P.-T. Huang, X.-R. Lee, H.-C. Chang, C.-Y. Lee, and W. Hwang, "A Low Power Differential Cascode Voltage Switch with Pass Gate Pulsed Latch for Viterbi Decoder," *Journal of Low Power Electronics*, vol.6, no.4, pp.551-562, 2010.

- [5.7] F.-S. Lai and W. Hwang, "Design and implementation of differential cascode voltage switch with pass-gate (DCVSPG) logic for high-performance digital systems," *IEEE Journal of Solid-State Circuits*, vol.32, no.4, pp.563-573, April 1997.
- [5.8] A.-S. Seyedi and A. Afzali-Kusha, "Double-edge Triggered Level Converter Flip-Flop with Feedback," *International Conference on Microelectronics*, pp.44-47, Dec. 2006.
- [5.9] Q.-X. Wang, Y.-S. Xia, and L.-Y. Wang, "Dual-Vth based double-edge explicit-pulsed level-converting flip-flops," *International Conference on Electronics, Communications and Control*, pp.837-840, Sept. 2011.
- [5.10] H. Mahmoodi-Meimand and K. Roy, "Dual-edge triggered level converting flip-flops," *Proceedings of the International Symposium on Circuits and Systems*, vol.2, pp.661-664, May 2004.
- [5.11] L.-Y. Chiou and S.-C. Luo, "An Energy-Efficient Dual-Edge Triggered Level-Converting Flip-Flop," *IEEE International Symposium on Circuits and Systems*, pp.1157-1160, May 2007.
- [5.12] H. Mahmoodi-Meimand and K. Roy, "Self-Precharging Flip-Flop: A New Level Converting Flip-Flop," *Proceedings of European Solid-State Circuits Conference*, pp.407-410, Sept. 2002.
- [5.13] H. S. Park, H. B. Che, W. Kim, and Y. H. Kim, "High Performance Level-Converting Flip-Flop with a Simple Pulse Generator and a Fast Latch," *International Technical Conference on Circuits/Systems, Computer and Communications*, pp.561-564, 2008.
- [5.14] P. Zhao, G. P. Kumar, C. Archana, and M. Bayoumi, "A Double-Edge Implicit-Pulsed Level Convert Flip-Flop," *Proceedings of the IEEE Computer Society Annual Symposium on VLSI*, pp.141-144, Feb. 2004.

- [5.15] P. Zhao, J. B. McNeely, P. K. Golconda, S. Venigalla, N. Wang, M. A. Bayoumi, W. Kuang, and L. Downey, "Low-Power Clocked-Pseudo-NMOS Flip-Flop for Level Conversion in Dual Supply Systems," *IEEE Transactions on Very Large Scale Integration Systems*, vol.17, no.9, pp.1196-1202, Sept. 2009.
- [5.16] F. Ishihara, F. Sheikh, and B. Nikolic, "Level Conversion for Dual-Supply Systems," *IEEE Transactions on Very Large Scale Integration Systems*, vol.12, no.2, pp. 185-195, Feb. 2004.
- [5.17] P.E. Gronowski, W.J. Bowhill, R.P. Preston, M.K. Gowan, and R.L. Allmon, "High-performance microprocessor design," *IEEE Journal of Solid-State Circuits*, vol.33, no.5, pp.676-686, May 1998.



# Vita

### 陳美維 Mei-Wei Chen

#### PERSONAL INFORMATION

Birth Date: January 13, 1986

Birth Place: Tainan, TAIWAN

E-Mail Address: waverly.ee95@gamil.com

#### **EDUCATION**

09/2010 – 09/2012 M.S. in Electronics Engineering, National Chiao Tung University Thesis: Design of multiphase clocking and level conversion for ULV DVFS systems

09/2006 – 06/2010 B.S. in Electronics Engineering, National Chiao Tung University

#### **PUBLICATIONS**

- Ming-Hung Chang, Chung-Ying Hsieh, Mei-Wei Chen, and Wei Hwang, "Logical Effort models with voltage and temperature extensions in super-/near-/sub-threshold regions," *IEEE International Symposium on VLSI Design, Automation and Test (VLSI-DAT)*, pp.1-4, April 2011.
- Ming-Hung Chang, Chung-Ying Hsieh, Mei-Wei Chen, and Wei Hwang, "Near-/sub-threshold DLL-based clock generator with PVT-aware locking range compensation," *IEEE International Symposium on Low Power Electronics and Design (ISLPED)*, pp.15-20, Aug. 2011.
- Mei-Wei Chen, Ming-Hung Chang, Yuan-Hua Chu, and Wei Hwang, "An Energy-Efficient Level Converter with High Thermal Variation Immunity for Sub-threshold to Super-threshold Operation," *IEEE International System-on-Chip Conference (SOCC)*, Sep. 2012. (accepted)