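## National Chiao Tung University

Department of Electronics Engineering and Institute of Electronics, Doctoral Dissertation

# Low-Power All-Digital Clock Generators for SoC Applications

Student: Duo Sheng (盛鐸)

Advisor: Dr. Chen-Yi Lee (李鎮宜)

June 2010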

# Low-Power All-Digital Clock Generators for SoC Applications

Student: Duo Sheng (盛鐸)

Advisor: Dr. Chen-Yi Lee (李鎮宜)

National Chiao Tung University



#### A Dissertation

Submitted to the Department of Electronics Engineering & Institute of Electronics

College of Electrical and Computer Engineering

National Chiao Tung University in Partial Fulfillment of Requirements for the Degree of Doctor of Philosophy

in

Electronics Engineering
June 2010

Hsinchu, Taiwan, Republic of China

June, Year 99 of the Republic of China

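## Low-Power All-Digital Clock Generators for SoC Applications

Student: Duo Sheng (盛鐸) Advisor: Prof. Chen-Yi Lee (李鎮宜)

Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University

### Abstract (Chinese version)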

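With advances in process technology and the growing functional demands on electronic products, system-on-chip (SoC) designs are becoming increasingly complex. A complex SoC needs many different kinds of clock signals to serve different functional requirements, so designing clock generators suited to SoCs has become an important issue. Traditionally, clock generators have been implemented with analog circuits, but analog clock generators face severe design challenges at low supply voltages and also suffer from poor system integration and high area cost. By contrast, an all-digital implementation offers high system integration and low area cost, which makes it well suited to SoC applications. In addition, power and performance are the main problems to overcome when designing clock generators for SoCs. This dissertation therefore proposes all-digital design approaches for several kinds of SoC clock generators that effectively reduce power consumption and improve circuit performance.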

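In an all-digital clock generator, the core circuit modules are the digitally controlled oscillator (DCO) and the delay cell, whose performance and power consumption strongly affect the overall performance of the clock generator. This dissertation therefore first proposes a low-power, high-performance DCO and delay cell; the DCO design can be used in many kinds of all-digital clock generators. The DCO uses a cascaded coarse-tuning and fine-tuning architecture to widen the operating frequency range while maintaining high delay resolution. The coarse-tuning stage uses a segmental delay line to avoid unnecessary power consumption, and the fine-tuning stage uses hysteresis delay cells to reduce circuit loading and complexity and thereby power consumption. As a result, the overall power consumption of the DCO is greatly reduced while high performance is maintained.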


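The phase-locked loop (PLL) is the most common and most basic type of clock generator. In systems with power management, the PLL must deliver a locked clock signal quickly, so this dissertation next proposes an all-digital PLL with fast lock-in. The proposed 2-level flash time-to-digital converter greatly shortens the lock time at a small hardware cost. In addition, the all-digital spread-spectrum clock generator is another circuit commonly used in SoCs; its purpose is to reduce the electromagnetic interference that the clock signal causes in the system. This dissertation proposes a rescheduling division triangular modulation algorithm that provides a programmable spreading ratio while preserving the ability to track the phase of the input clock.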

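Memory is an indispensable basic component of an SoC, and double data rate (DDR) memory is widely used for its high performance. Because a DDR memory controller needs special clock and control signals for the DDR memory to operate correctly, this dissertation proposes a tunable clock generator based on an all-digital delay-locked loop and a digitally controlled phase shifter, which overcomes the delay mismatch caused by long-distance routing. The memory itself needs a synchronous mirror delay circuit to resolve the internal clock skew caused by differing wire lengths. The proposed all-digital synchronous mirror delay circuit uses edge-trigger mirror delay cells and a fine-tuning delay line built from high-resolution delay cells to widen the acceptable duty cycle range and reduce the static phase error.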

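In addition to using the proposed DCO and various design techniques to improve performance and reduce power consumption, all of the proposed all-digital clock generators are implemented with standard-cell library components. Because of this portability, the designs can be migrated to different processes as easily as soft intellectual property (IP). The proposed all-digital clock generators are therefore very well suited to SoC applications and system-level integration.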

## Low-Power All-Digital Clock Generators for SoC Applications

Student: Duo Sheng Advisor: Chen-Yi Lee
Department of Electronics Engineering and Institute of Electronics,
National Chiao-Tung University

#### Abstract

As IC technology enters the nanoscale era and the functional demands on electronic products increase, system-on-chip (SoC) design becomes more complex. A complex SoC needs many different clock signals to meet different functional requirements, so designing clock generators for SoC applications has become an important topic. Traditional clock generators use an analog approach. However, an analog clock generator not only becomes harder to design as the supply voltage decreases, but is also hard to integrate into a system design because of its large area. In contrast, an all-digital design approach is well suited to SoC applications because of its high portability and low design cost. In addition, power consumption and performance are the major design considerations for clock generators in SoC applications. This work therefore proposes a systematic all-digital design approach for implementing various high-performance, low-power clock generators for SoC applications.

The kernel modules of all-digital clock generators are the digitally controlled oscillator (DCO) and the delay cell. Because the DCO and delay cell dominate the overall performance and power consumption of an all-digital clock generator, this work proposes a high-performance, low-power DCO and delay cell that can be applied to all kinds of all-digital clock generators for SoC applications. The proposed DCO employs a cascaded structure with coarse-tuning and fine-tuning stages to achieve high resolution and a wide frequency range at the same time. The coarse-tuning stage uses a segmental delay line (SDL) to eliminate redundant power consumption, and the proposed hysteresis delay cell (HDC) reduces the circuit complexity and loading of the fine-tuning stage to lower the power consumption further. As a result, the power consumption of the proposed DCO is reduced significantly while high performance is maintained.

The phase-locked loop (PLL) is the most essential type of clock generator. In systems with power management, the PLL should provide a locked clock signal in a short time. This work therefore proposes a fast-lock-in all-digital PLL (ADPLL) that employs a novel 2-level flash time-to-digital converter (TDC) to reduce the lock-in time at low hardware cost. In addition, the all-digital spread-spectrum clock generator (ADSSCG), which reduces electromagnetic interference (EMI), is another important design in SoC applications. The proposed rescheduling division triangular modulation (RDTM) scheme enhances the phase tracking capability and provides a wide programmable spreading ratio at the same time.

Memory is an essential component of SoC design. Double data rate (DDR) memories are widely used in high-performance systems in modern SoC designs to meet the required data bandwidth. Because a DDR memory controller needs specific clock and control signals to ensure the functionality and performance of data accesses, this work proposes a tunable phase shift scheme based on an all-digital delay-locked loop (ADDLL) and a digitally controlled phase shifter (DCPS) to solve the delay mismatch problem. In addition, memory designs use the synchronous mirror delay (SMD) to eliminate the clock skew caused by wire delay mismatch. The proposed all-digital SMD (ADSMD) uses edge-trigger mirror delay cells to enlarge the input duty cycle range and a fine-tuning delay line with high-resolution delay cells to reduce the static phase error.

The proposed all-digital clock generators not only use the proposed DCO/delay cell and several design techniques to enhance performance and reduce power consumption, but can also be realized with standard cells in standard CMOS processes, making them easily portable to different processes as soft intellectual property (IP). As a result, the proposed all-digital clock generators are very suitable for SoC applications as well as system-level integration.



#### Acknowledgments

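For the completion of my doctoral research, I must first thank my advisor, Dr. Chen-Yi Lee (李鎮宜). He taught me to keep innovating in research with an optimistic and proactive spirit, to look at problems from a broad system perspective, and to face research challenges with rigor. In both scholarship and conduct he will always be a model for me. I also thank my oral defense committee, Professors 王進賢, 劉深淵, 李泰成, 黃錫瑜, 黃威, and 許騰尹, for taking time from their busy schedules to attend my defense, for their many valuable suggestions, for showing me many different facets of research, and for inspiring my future research directions.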

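I especially thank Professor 鍾菁哲, who never hesitated to offer sincere advice and guidance whenever I ran into problems in circuit design or paper writing. His help allowed me to keep making breakthroughs and growing in my research, and I benefited greatly from it. I also thank my good partners in the Si2 laboratory: my senior labmates 黎峰, 軒字, 建青, and 瑞元, as well as 子明兄, 元哥, 志龍, 曜哥, 義澤, 芳年, and 琇茹, for sharing research insights and for offering many valuable suggestions on SoC design and applications.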

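I also thank my supervisors at work, 柯裕豐 and 曾友信, for their encouragement and tolerance, which let me balance work and study and gave me the chance to apply my research results to real products. Nor can I forget my colleagues Andy, 志文, 阿寬, 詠松, 世一, 雄哥, 思蔚, Steven, Mark, 江龍, and 小芳, who not only helped and encouraged me at work but also brought much enjoyment to my life and leisure.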

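Finally, I thank my parents; without your nurturing and upbringing, this dissertation would not have been completed. I am even more grateful to my wife, who has walked this road with me, stayed with me through every difficulty and low point, and shared with me all the deepest feelings and joys.

My deepest gratitude is beyond words.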

## **Contents**

- Chapter 1 Introduction
  - 1.1 Motivation
  - 1.2 Goal and Contribution
  - 1.3 Dissertation Organization
- Chapter 2 Low-Power Digitally Controlled Oscillator Design with Hysteresis Delay Cell
  - 2.1 Introduction
  - 2.2 Hysteresis Delay Cell
  - 2.3 The Proposed DCO Architecture
    - 2.3.1 Segmental Coarse-Tuning Stage
    - 2.3.2 Fine-Tuning Stage
  - 2.4 DCO Performance Comparisons
    - 2.4.1 Coarse-Tuning Stage Performance Comparisons
    - 2.4.2 Fine-Tuning Stage Performance Comparisons
  - 2.5 Experimental Results and Comparisons
  - 2.6 Summary
- Chapter 3 Fast Lock-In All-Digital Phase-Locked Loop Design
  - 3.1 Introduction
  - 3.2 Binary Search ADPLL Overview
    - 3.2.1 Binary Search ADPLL Architecture
    - 3.2.2 Binary Search Algorithm
  - 3.3 The Proposed TDC-Based ADPLL
  - 3.4 Time-to-Digital Converter
    - 3.4.1 TDC Overview
    - 3.4.2 The Proposed 2-level flash TDC
  - 3.5 Experimental Results
  - 3.6 Summary
- Chapter 4 All-Digital Spread Spectrum Clock Generator Design
  - 4.1 Introduction
  - 4.2 The Proposed ADSSCG Design
    - 4.2.1 ADSSCG Architecture Overview
    - 4.2.2 Spread Spectrum Algorithm
  - 4.3 DCO Design
    - 4.3.1 DCO Architecture
    - 4.3.2 Auto-Adjustment Algorithm for Monotonic DCO
  - 4.4 Experimental Results and Comparisons
  - 4.5 Summary
- Chapter 5 All-Digital Delay-Locked Loop Design
  - 5.1 Introduction
  - 5.2 The Proposed Clock Generator Architecture
    - 5.2.1 Tunable Phase Shift Scheme
    - 5.2.2 ADDLL and DCPS Design
  - 5.3 ADDLL Circuit Design
    - 5.3.1 Digitally Controlled Delay Line
    - 5.3.2 Time-to-Digital Converter
  - 5.4 Experimental Results and Comparisons
  - 5.5 Summary
- Chapter 6 All-Digital Synchronous Mirror Delay Design
  - 6.1 Introduction
  - 6.2 SMD Overview
  - 6.3 The Proposed ADSMD Design
  - 6.4 Experimental Results
  - 6.5 Summary
- Chapter 7 Conclusions and Future Works
  - 7.1 Conclusions
  - 7.2 Future Works
- References



## List of Figures

- Fig. 2.1: Power profiling of ADPLL
- Fig. 2.2: (a) Proposed HDC (b) Equivalent circuit of HDC for analysis
- Fig. 2.3: Hysteresis phenomenon of HDC
- Fig. 2.4: The relation among input voltage of TINV, effective driving current, and INV1 delay
- Fig. 2.5: Architecture of the proposed DCO
- Fig. 2.6: Proposed segmental coarse-tuning stage with SDL
- Fig. 2.7: Proposed fine-tuning stage with HDC and DCV
- Fig. 2.8: Power comparisons of different coarse-tuning designs
- Fig. 2.9: Power and resolution comparisons of different fine-tuning designs
- Fig. 2.10: Microphotograph and layout of DCO test chip
- Fig. 2.11: Comparisons of measurement and post-layout simulation results
- Fig. 2.12: Jitter histogram of DCO at 952 MHz
- Fig. 3.1: Binary search ADPLL architecture
- Fig. 3.2: Binary search algorithm
- Fig. 3.3: Flowchart of phase tracking mode
- Fig. 3.4: TDC-based ADPLL architecture
- Fig. 3.5: Counter-based TDC
- Fig. 3.6: (a) Single delay chain flash TDC (b) Operation of single delay chain flash TDC
- Fig. 3.7: Vernier delay line TDC
- Fig. 3.8: The proposed 2-level flash TDC architecture
- Fig. 3.9: Simulation of 2-level flash TDC
- Fig. 3.10: Transient response of binary search ADPLL
- Fig. 3.11: Transient response of TDC-based ADPLL
- Fig. 4.1: Architecture of the proposed ADSSCG
- Fig. 4.2: (a) Conventional triangular modulation (b) Division triangular modulation (c) Rescheduling division triangular modulation
- Fig. 4.3: (a) Architecture of the proposed DCO (b) Fine-tuning cells of DCO
- Fig. 4.4: Flowchart of auto-adjustment algorithm
- Fig. 4.5: Comparison between original and adjusted timing
- Fig. 4.6: Microphotograph of ADSSCG test chip
- Fig. 4.7: Measurement spectrum of 54 MHz (a) Without frequency spreading (b) With 1% frequency spreading
- Fig. 4.8: Measurement spectrum of 27 MHz (a) Without frequency spreading (b) With 10% frequency spreading
- Fig. 5.1: (a) Interconnection of DDR memory and core system (b) Waveform of read operation (c) Waveform of write operation
- Fig. 5.2: Architecture of the proposed tunable phase shift scheme for DDR controller
- Fig. 5.3: Flowchart of the proposed tunable phase shift scheme
- Fig. 5.4: Architecture of (a) ADDLL (b) DCPS
- Fig. 5.5: (a) Proposed DCDL (b) Coarse-tuning stage (c) Fine-tuning stage
- Fig. 5.6: (a) Proposed TDC (b) Waveform of TDC
- Fig. 5.7: Layout of ADDLL and DCPS
- Fig. 5.8: (a) Transient response of ADDLL (b) ADDLL at steady state
- Fig. 5.9: Tunable signal phase scheme in read operation when (a) DQS leads DQ (b) DQS lags DQ
- Fig. 5.10: Phase shift between CLOCK1 and CLOCK2 at 400 MHz
- Fig. 5.11: Jitter and phase shift of ADDLL under different PVT
- Fig. 6.1: Architecture of the conventional SMD
- Fig. 6.2: (a) Architecture of the proposed SMD (b) Circuit of EMDC
- Fig. 6.3: Block diagram and equivalent circuit of DCV
- Fig. 6.4: Timing waveform (a) without blocking scheme (b) with blocking scheme
- Fig. 6.5: Microphotograph of SMD test chip
- Fig. 6.6: (a) Timing diagram of the proposed SMD (b) Acceptable input duty cycle under different frequencies



## List of Tables

- Table 2.1: Comparisons of Different DCO Approaches
- Table 2.2: Performance Comparisons with Different Fine-Tuning Stages
- Table 2.3: Measurement Results of Step/Range of Tuning Stage
- Table 2.4: DCO Performance Comparisons
- Table 4.1: Jitter and Timing Comparisons of DTM and RDTM
- Table 4.2: Simulation Results of Delay of Tuning Stage
- Table 4.3: SSCG Performance Comparisons
- Table 5.1: ADDLL Performance Comparisons
- Table 6.1: Comparisons of Different SMD Approaches
- Table 6.2: Phase Error Under Different PVT Conditions
- Table 6.3: ADSMD Performance Summary



## Chapter 1

## Introduction

#### 1.1 Motivation

As IC technology advances rapidly, the focus of modern VLSI design moves from single functional blocks to system-level integration and single-chip solutions. Because the functional demands on electronic products increase, many different functional blocks are integrated into a single chip, which increases the design complexity of the system-on-chip (SoC). A complex SoC needs various clock signals to meet the requirements of its different functional blocks. Hence, designing clock generators that provide suitable clock signals for SoC applications becomes an important topic.

Clock generators can be realized with either analog or all-digital design approaches. Traditionally, clock generators are realized with the analog approach. However, as the supply voltage decreases, gain and frequency range must be traded off in the voltage-controlled oscillator (VCO), which is the most important block in an analog clock generator. In addition, because of the serious leakage current in more advanced process technologies, it is hard to design the charge-pump circuit, which is another essential block of an analog clock generator. Integrating analog clock generators into SoCs with lower supply voltages and advanced processes therefore takes more design effort. Moreover, because the analog clock generator uses passive components such as resistors and capacitors to form the loop filter, it incurs large area and cost. Furthermore, as technology migrates, the analog blocks in the clock generator need to be redesigned, which lengthens the design turnaround time.

In contrast to the analog clock generator, the all-digital design approach does not use any passive components and relies on digital design methods, so it is easily integrated into digital, low-supply-voltage systems. Because an all-digital clock generator is reusable as soft intellectual property (IP), it can greatly shorten the time-to-market of a design and is very suitable for SoC applications as well as system-level integration. This motivates the focus of this dissertation on all-digital clock generator design for SoC applications.

Performance and power are always the most important design considerations in SoC design. Because an all-digital clock generator controls timing in discrete steps, its minimum controllable delay step must be fine enough to achieve low steady-state jitter. In addition, because a large number of clock generators are integrated into a single chip, each clock generator should be low-power to reduce the overall power consumption of the system. Among the functional blocks of an all-digital clock generator, the digitally controlled oscillator (DCO) is the kernel module, because it dominates the overall performance and power consumption. For example, the DCO accounts for over 50% of the power consumption of an all-digital clock generator [1], and its delay resolution and operating range determine the jitter performance and output frequency range of the clock generator, respectively. Given these design requirements, all-digital clock generators require a high-performance and low-jitter DCO. Thus, before studying and designing all-digital clock generators, a high-performance, low-power delay cell and DCO that can be applied to all kinds of all-digital clock generators for SoC applications should be proposed first.

With the low-power DCO in place, the follow-up work focuses on all-digital clock generators. Four types of clock generators are important in SoC applications: the phase-locked loop (PLL), the spread-spectrum clock generator (SSCG), the delay-locked loop (DLL), and the synchronous mirror delay (SMD). Their functions and applications are described as follows:

- PLL: It is widely used in microprocessor (μP)-based and digital systems [2]-[4]. It receives a reference clock from an external component, for example a quartz crystal, and generates a set of frequency-multiplied system clock signals for system operation.
- SSCG: In SoC applications, the radiated emissions of the system should be kept below an acceptable level to ensure the functionality and performance of the system and adjacent devices, especially in high-speed serial links and video/display systems [5]. By spreading the clock frequency, the SSCG reduces electromagnetic interference (EMI) significantly while maintaining system performance [6].
- DLL: In high-speed serial link and data transmission applications, a DLL-based multiphase clock generator produces multiphase clocks that can be used to find a better sampling point and to process data streams at a bit rate higher than the internal clock frequency, improving overall system performance [7], [8]. In addition, the DLL can eliminate the clock skew that large wire loading causes among different functional blocks in a single chip or among multiple chips.
- SMD: Memory is an essential component of SoC design. To eliminate the internal clock skew caused by wire delay mismatch, a memory design needs a synchronous mirror delay (SMD) with low complexity and small area that quickly provides a clock with a small static phase error relative to the external clock [9].
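The frequency spreading mentioned in the SSCG item can be illustrated with a conventional triangular modulation profile. The sketch below is not the RDTM scheme proposed later in this dissertation; it assumes a down-spread profile, and the function name, the step count, and the choice of down-spread are illustrative assumptions.

```python
def triangular_profile(f_nominal_mhz: float, spreading_ratio: float,
                       steps_per_half: int) -> list[float]:
    """Return one modulation period of output frequencies in MHz.

    Assumption: down-spread, so the frequency sweeps linearly from the
    nominal value down to nominal * (1 - spreading_ratio) and back up.
    """
    if steps_per_half < 1 or not 0.0 <= spreading_ratio < 1.0:
        raise ValueError("invalid modulation parameters")
    # Falling half of the triangle, including both end points.
    falling = [f_nominal_mhz * (1.0 - spreading_ratio * k / steps_per_half)
               for k in range(steps_per_half + 1)]
    # Rising half mirrors the falling half without repeating the end points.
    rising = falling[-2:0:-1]
    return falling + rising


# Example: a 54 MHz clock with a 1% spreading ratio, 4 steps per half period.
profile = triangular_profile(54.0, 0.01, 4)
```

Because the instantaneous frequency keeps moving across the spread band, the clock energy is distributed over that band rather than concentrated at a single frequency, which is the mechanism behind the EMI reduction.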

A design for SoC applications must not only achieve high performance, low power, and low complexity, but also be highly portable so that it migrates easily to other processes with a short design turnaround time. Hence, this work implements the proposed all-digital clock generators with standard cells only, which makes them easily portable to different processes and very suitable for SoC applications.

## 1.2 Goal and Contribution

As the previous section shows, low-power all-digital clock generators are in high demand in SoC applications. At the same time, the performance requirements of clock generators grow rapidly as design complexity increases, so designing low-power all-digital clock generators becomes more challenging than before. This dissertation proposes an ultra-low-power DCO for all-digital clock generators. Using the proposed DCO and delay cell reduces the overall power consumption significantly. The proposed all-digital clock generators not only use the proposed DCO and delay cell to raise performance, but also improve performance at the algorithmic and architectural levels. Furthermore, the proposed clock generators are truly portable because they are realized with standard cells only. The goals and contributions of this dissertation are summarized as follows.

#### 1. Digitally controlled oscillator (DCO)

- Goal: Propose a low-power and high-resolution DCO for all-digital clock generators.
- Contribution:
  - Based on the proposed segmental delay line (SDL) and hysteresis delay cell (HDC), the total power consumption of the proposed DCO is reduced to 140 µW at 200 MHz. Compared with conventional approaches, power consumption is reduced by 70% in the coarse-tuning stage and by 86.2% in the fine-tuning stage.
  - The proposed DCO employs a cascade-stage structure to achieve a high resolution of 1.47 ps and a wide frequency range at the same time.
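The coarse/fine cascade can be pictured with a simple behavioral model in which each stage adds delay in proportion to its control code. This is a minimal sketch under stated assumptions: only the 1.47 ps fine step comes from the text, while the base period, the coarse step, the number of fine codes, and the purely linear behavior are hypothetical.

```python
FINE_STEP_PS = 1.47       # fine-tuning resolution quoted in the text
COARSE_STEP_PS = 90.0     # assumed coarse-tuning step (hypothetical)
BASE_PERIOD_PS = 1000.0   # assumed minimum oscillation period (hypothetical)
FINE_CODES = 64           # assumed number of fine-tuning codes (hypothetical)


def dco_period_ps(coarse: int, fine: int) -> float:
    """Idealized DCO period for a coarse/fine control code pair."""
    if coarse < 0 or not 0 <= fine < FINE_CODES:
        raise ValueError("control code out of range")
    return BASE_PERIOD_PS + coarse * COARSE_STEP_PS + fine * FINE_STEP_PS


def dco_freq_mhz(coarse: int, fine: int) -> float:
    """Output frequency in MHz for the same code pair."""
    return 1e6 / dco_period_ps(coarse, fine)


# The fine stage must span at least one coarse step so that no period in the
# tuning range is unreachable.
fine_range_ps = (FINE_CODES - 1) * FINE_STEP_PS
```

In this model the coarse code sets the frequency range and the fine code sets the resolution, which is how a cascade can offer both a wide range and a small step.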

#### 2. All-digital phase-locked loop (ADPLL)

- Goal: Propose a fast-lock-in and low-power ADPLL for power management schemes.
- Contribution:
  - The proposed ADPLL uses a novel 2-level flash time-to-digital converter (TDC) to lock in within 2 reference clock cycles. In contrast to the single-level type, the proposed design takes only 12 D flip-flops, which reduces hardware complexity and power consumption.
  - The proposed ADPLL employs the proposed low-power DCO to reduce the overall power consumption.
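For contrast with the 2-cycle lock-in above, the sketch below shows why a binary-search acquisition needs one comparison per control bit. It is a generic binary search over a DCO control code, not the algorithm of Chapter 3; the linear `period_of_code` model and the 10-bit code width are hypothetical.

```python
from typing import Callable, Tuple


def binary_search_lock(target_period_ps: float,
                       period_of_code: Callable[[int], float],
                       code_bits: int) -> Tuple[int, int]:
    """Find the largest code whose period does not exceed the target.

    Assumes the period increases monotonically with the control code.
    Returns (code, number_of_comparisons).
    """
    code = 0
    comparisons = 0
    for bit in reversed(range(code_bits)):
        trial = code | (1 << bit)
        comparisons += 1
        if period_of_code(trial) <= target_period_ps:
            code = trial  # keep this bit set
    return code, comparisons


# Hypothetical linear DCO: 1000 ps base period, 1.5 ps per code step.
code, comparisons = binary_search_lock(1300.0, lambda c: 1000.0 + 1.5 * c, 10)
```

If each comparison costs at least one reference cycle, a 10-bit search takes at least 10 cycles, whereas a TDC measures the period difference directly and can set the code in far fewer cycles.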
#### 3. All-digital spread-spectrum clock generator (ADSSCG)

- Goal: Propose a low-power ADSSCG with a programmable spreading ratio for EMI reduction in liquid crystal display (LCD) systems.
- Contribution:
  - The proposed ADSSCG employs a novel rescheduling division triangular modulation (RDTM) to enhance the phase tracking capability and provide a wide programmable spreading ratio. The peak power is reduced by 9.5 dB at 54 MHz with a 1% spreading ratio and by 15 dB at 27 MHz with a 10% spreading ratio.
  - The proposed ADSSCG employs the proposed low-power DCO with an auto-adjustment algorithm, which saves power while keeping the delay characteristic monotonic.
  - The total power consumption is 1.2 mW at 54 MHz, and the power index is 22.2 µW/MHz, the most favorable power-to-frequency ratio among the compared state-of-the-art designs, implying that the proposed ADSSCG is more effective in power saving at a given operating frequency.
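The power index quoted above is the total power divided by the operating frequency. A short check of the arithmetic, with a hypothetical helper name:

```python
def power_index_uw_per_mhz(power_mw: float, freq_mhz: float) -> float:
    """Power index in microwatts per megahertz."""
    return power_mw * 1000.0 / freq_mhz


# 1.2 mW at 54 MHz, the operating point quoted in the text.
index = power_index_uw_per_mhz(1.2, 54.0)
```

Rounded to one decimal place this gives 22.2 µW/MHz; a smaller index means less power spent per megahertz of output frequency.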

#### 4. All-digital delay-locked loop (ADDLL)

- Goal: Propose a tunable phase shift scheme based on the ADDLL for the DDR memory interface.
- Contribution:
  - The proposed phase shift scheme provides a suitable all-digital solution that eliminates the non-ideal effects of data transmission across multi-chip interconnections, especially in high-data-rate applications.
  - The proposed ADDLL, which employs a high-performance digitally controlled delay line (DCDL) with HDC and a TDC, achieves a phase-shift error as small as 1.3° at 400 MHz and a locking time of less than 13 clock cycles. Compared with conventional ADDLLs, it achieves the fastest phase lock and the smallest phase-shift error.
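Phase errors in degrees and in picoseconds are related through the clock period, which helps when comparing the 1.3° figure above with the 18 ps figure quoted for the ADSMD. The helper names below are hypothetical.

```python
def phase_error_ps(error_deg: float, freq_mhz: float) -> float:
    """Convert a phase error in degrees to picoseconds at a clock frequency."""
    period_ps = 1e6 / freq_mhz
    return error_deg / 360.0 * period_ps


def phase_error_deg(error_ps: float, freq_mhz: float) -> float:
    """Convert a timing error in picoseconds to degrees at a clock frequency."""
    period_ps = 1e6 / freq_mhz
    return error_ps / period_ps * 360.0


addll_error_ps = phase_error_ps(1.3, 400.0)     # about 9.03 ps
adsmd_error_deg = phase_error_deg(18.0, 400.0)  # about 2.59 degrees
```

At 400 MHz the period is 2.5 ns, so 1° corresponds to roughly 6.94 ps.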

#### 5. All-digital synchronous mirror delay (ADSMD)

- Goal: Propose an ADSMD with a wide input duty cycle range and a small static phase error for clock synchronization in SoC applications.
- Contribution:
  - The proposed SMD uses the edge-trigger mirror delay cell to enlarge the input duty cycle range (from 20% to 80%) and the blocking edge-trigger scheme to ensure functionality and performance.
  - The phase error is reduced to 18 ps at 400 MHz by the proposed delay-matching structure and a fine-tuning delay line with high-resolution delay cells.

## 1.3 Dissertation Organization

This dissertation is organized as follows. Chapter 2 describes the architecture and circuits of the proposed high-resolution, ultra-low-power DCO. The proposed DCO and HDC are applied in the clock generators of the following chapters. Chapter 3 discusses the general binary-search-based ADPLL and presents the proposed TDC-based ADPLL for fast lock-in. Chapter 4 focuses on the proposed ADSSCG, which employs a novel rescheduling division triangular modulation (RDTM) to enhance the phase tracking capability and provide a wide programmable spreading ratio; it also presents the auto-adjustment algorithm that keeps the delay characteristic monotonic. Chapter 5 presents the proposed tunable phase shift scheme based on the ADDLL for DDR controller applications. Chapter 6 describes the proposed ADSMD, which employs a delay-matching structure and a high-resolution delay cell to achieve a small static phase error and an edge-trigger mirror delay cell to extend the input duty cycle range. Finally, Chapter 7 gives the conclusions and future work.



## Chapter 2

# Low-Power Digitally Controlled Oscillator with Hysteresis Delay Cell

#### 2.1 Introduction

The digitally controlled oscillator (DCO) and the digitally controlled delay line (DCDL) are the most important modules in the ADPLL/ADSSCG and the ADDLL/ADSMD, respectively. The delay cells are used to construct a ring oscillator in the ADPLL/ADSSCG and a delay line in the ADDLL/ADSMD. In this chapter, the high-performance, low-power delay cell is described first, followed by the DCO architecture built on the proposed delay cell.

The DCO dominates the major performance indices of all-digital clock generators, such as power consumption and jitter, and is hence the most important component of such clocking circuits [1], [10]-[14]. In terms of power, the DCO accounts for over 50% of the power consumption of an all-digital clock generator [1]. For example, the DCO accounts for 59% of the power consumption of an all-digital phase-locked loop (ADPLL), as shown in Fig. 2.1. The power consumption of the DCO should therefore be reduced further to lower the overall power dissipation and meet the low-power demands of SoC designs. In addition, the resolution of the DCO strongly influences the jitter performance and the frequency or phase error of the output clock.



Fig. 2.1: Power profiling of ADPLL

Furthermore, if the DCO provides a wide operating frequency range, it extends the output frequency range of the all-digital clock generator and broadens its applications.

Recently, different architectural solutions have been proposed to implement the DCO. The current-starved DCO [15] controls the supply current of the delay cell to obtain different delay values. Although it has high resolution, it needs a static current source, which adds static power dissipation. The LC-tank DCO [16] can also achieve high delay resolution; however, it needs an advanced process and intensive circuit layout. These approaches demand high circuit-level complexity, resulting in a long design cycle and low portability.

To shorten the design cycle when the process or specification changes, many DCOs implemented with standard cells have been proposed to enhance portability [1], [11], [17], [18]. Driving capability modulation (DCM) changes the driving current of each delay cell by controlling the number of enabled tri-state buffers/inverters [1], [17]. The design concept of this approach is straightforward, but it performs poorly in linearity and power consumption, and its resolution is insufficient.

Table 2.1: Comparisons of Different DCO Approaches

| Performance Indices | Driving capability modulation (DCM) [1], [17] | Or-and-inverter (OAI) cell [11] | Digitally controlled varactor (DCV) [18] |
|---------------------|-----------------------------------------------|---------------------------------|------------------------------------------|
| Resolution          | Poor                                          | High                            | High                                     |
| Power               | High                                          | Medium                          | High                                     |
| Linearity           | Poor                                          | Poor                            | Good                                     |
| Operation Range     | Wide                                          | Narrow                          | Narrow                                   |

The or-and-inverter (OAI) cells are proposed to enhance resolution through different input pattern combinations; however, the linearity problem remains unsolved [11]. Although the digitally controlled varactor (DCV) performs well in resolution and linearity [18], a few DCV cells cannot provide a wide operation range. As a result, many DCV cells are needed to maintain an acceptable operation range, which leads to large power consumption. A brief summary of the different DCO approaches is given in Table 2.1.

We therefore propose a low-power, high-resolution, and wide-range DCO with high portability. Because our research targets general microprocessor-based systems and communication baseband processors, the operating frequency range of the proposed DCO should be easy to extend, and its maximum operating frequency need not exceed 1GHz. In addition, the power-saving target is an order-of-magnitude reduction relative to conventional works while keeping high delay resolution. However, because we propose a cell-based DCO design, overcoming the limitations of standard cells to build such a low-power, high-resolution, and wide-range DCO is the main design challenge of our research.

This chapter is organized as follows. Section 2.2 describes the proposed hysteresis delay cell. Section 2.3 describes the architecture and circuits of the proposed DCO, together with the techniques used to reduce its power consumption. Section 2.4 analyzes the performance comparison results of the different DCO structures. In Section 2.5, the implementation and measurement results of the fabricated DCO chip are presented, and an overall performance comparison with state-of-the-art DCOs is discussed. Finally, a brief summary is given in Section 2.6.

## 2.2 Hysteresis Delay Cell

Fig. 2.2: (a) Proposed HDC. (b) Equivalent circuit of HDC for analysis.

Because a DCO/DCDL usually uses many delay cells to generate the desired clock output, designing a low-power delay cell is an important issue in all-digital clock generator design. The delay cell should provide a suitable and controllable delay with low power and hardware overhead. The proposed hysteresis delay cell (HDC), which reduces the gate count and loading, is therefore well suited to all-digital clock generator applications. Fig. 2.2(a) illustrates the proposed HDCs used in the DCO, each of which contains one inverter (INV2) and one tri-state inverter (TINV). Different delays are obtained as the control signals (F1ON [0] ~ F1ON [P-1]) of the TINVs change. The operating concept of the HDC is to control the driving current to obtain different propagation delays. When the TINV of an HDC is enabled, its output exhibits hysteresis during the transition, which changes the delay of the delay chain. Fig. 2.2(b) illustrates the equivalent circuit of the HDC for analysis. The propagation delay $T_p$ from $N_1$ to $N_2$ is a function of the loading capacitance and the equivalent resistance of the turned-on MOS transistors [19] and is given by

$$T_{p} = 0.69 C_{L} \left( \frac{R_{eqp} + R_{eqn}}{2} \right) \tag{2.1}$$

where $C_L$ is the loading capacitance of $N_2$, and $R_{eqn}$ and $R_{eqp}$ are the equivalent resistances of the NMOS and PMOS in the driving inverter (INV1), respectively. In general operation, $C_L$ remains constant, but the equivalent resistance of the turned-on MOS in INV1 varies with the saturation current and drain-source voltage and is expressed by

$$R_{eq} = \frac{1}{-V_{DD}/2} \int_{V_{DD}}^{V_{DD}/2} \frac{V}{I_{DSAT} (1 + \lambda V)} \, dV \tag{2.2}$$

where $I_{DSAT}$ is the saturation current of the transistor. When TINV is enabled, its input ($N_3$) does not follow the input of INV1 ($N_1$) instantaneously, so it sinks the inverse current $I_2$ and reduces the effective driving current from $I_1$ to $I_3$, which enlarges the delay of the delay chain. Fig. 2.3 shows the hysteresis phenomenon of this HDC, where the input signal transition is observed from SPICE simulation. Initially, $N_1$ and $N_3$ remain at high level and $N_2$ is at low level. As $N_1$ changes from high to low, $N_2$ attempts to rise from low to high. However, because $N_3$ remains at high level for a while (delayed by INV2), TINV sinks the inverse current and slows down the pull-up of $N_2$. Thus, (2.2) should be rewritten as follows

$$R_{eq} = \frac{1}{-V_{DD}/2} \int_{V_{DD}}^{V_{DD}/2} \frac{V}{(I_{1DSAT} - I_{2DSAT})(1 + \lambda V)} \, dV \tag{2.3}$$

Fig. 2.3: Hysteresis phenomenon of HDC.

Fig. 2.4: The relation among input voltage of TINV, effective driving current, and INV1 delay.

The effective driving current changes from $I_{1DSAT}$ to $I_{1DSAT} - I_{2DSAT}$ when TINV is enabled. The relation among the input voltage of TINV, the effective driving current, and the INV1 delay is shown in Fig. 2.4. As the input voltage of TINV increases, the effective driving current of INV1 decreases, which enlarges the delay of the inverter chain. In addition, by using tri-state inverters of different driving capability from a given cell library, a set of HDC delay steps can be constructed for a specified DCO requirement.
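The effect of the inverse current on the delay can be illustrated numerically. The following is a minimal sketch of (2.1) to (2.3), not the SPICE analysis used in this work: the supply voltage, $\lambda$, load capacitance, and saturation currents are hypothetical values chosen only for illustration, and the NMOS and PMOS are assumed to have the same equivalent resistance.

```python
import math

def r_eq(vdd, i_dsat, lam):
    """Equivalent resistance of Eq. (2.2), averaged over the swing VDD -> VDD/2."""
    def antiderivative(v):
        # Integral of V / (1 + lam*V) dV; reduces to V^2 / 2 when lam == 0.
        if lam == 0.0:
            return v * v / 2.0
        return v / lam - math.log(1.0 + lam * v) / (lam * lam)

    integral = antiderivative(vdd) - antiderivative(vdd / 2.0)
    return integral / (i_dsat * vdd / 2.0)

def t_p(c_load, r_eqp, r_eqn):
    """Propagation delay of Eq. (2.1)."""
    return 0.69 * c_load * (r_eqp + r_eqn) / 2.0

# Hypothetical operating point (assumption, not taken from the dissertation).
VDD, LAMBDA, C_LOAD = 1.0, 0.1, 5e-15
I1_DSAT, I2_DSAT = 100e-6, 30e-6

r_off = r_eq(VDD, I1_DSAT, LAMBDA)            # TINV disabled, Eq. (2.2)
r_on = r_eq(VDD, I1_DSAT - I2_DSAT, LAMBDA)   # TINV enabled, Eq. (2.3)
t_off = t_p(C_LOAD, r_off, r_off)             # assumes R_eqp == R_eqn
t_on = t_p(C_LOAD, r_on, r_on)
print(f"delay ratio (enabled / disabled) = {t_on / t_off:.3f}")
```

In this simplified model the resistance is inversely proportional to the effective driving current, so enabling TINV scales the delay by $I_{1DSAT} / (I_{1DSAT} - I_{2DSAT})$.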

## 2.3 The Proposed DCO Architecture

Fig. 2.5 illustrates the architecture of the proposed ultra-low-power DCO. Built from standard cells, it saves power while maintaining resolution. To preserve the control-code resolution and operation range, the proposed DCO cascades a coarse-tuning stage and a fine-tuning stage, which maintains control code-to-delay linearity and makes the operation range easy to extend. Two low-power circuit techniques are proposed here. First, the proposed segmental delay line (SDL) disables the transitions of the redundant segmental delay cells, each of which is a two-input AND gate, in the coarse-tuning stage at the target operation frequency. Second, the hysteresis delay cell (HDC) is used in the fine-tuning stage to reduce the number of short-delay cells.

Fig. 2.5: Architecture of the proposed DCO.

#### 2.3.1 Coarse-Tuning Stage

Fig. 2.6: Proposed segmental coarse-tuning stage with SDL.

Fig. 2.7: Proposed fine-tuning stage with HDC and DCV.

Fig. 2.6 shows the proposed segmental coarse-tuning stage, which is composed of $2^M-1$ two-input AND gates that form an SDL and a path-selection multiplexer. It provides $2^M$ different delay values by selecting different delay paths formed by these $2^M-1$ two-input AND gates. In the conventional path-selection delay line [11], [12], [18], each delay cell is composed of two inverters. When the delay line must provide a higher operation frequency, a shorter delay path is selected and the remaining delay cells are not used; however, these unused delay cells are not disabled. To reduce power consumption as the operating frequency changes, the corresponding enable signals (EN [$2^M-2$:0]) are set to low level to disable the redundant two-input AND gates.
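The enable pattern of the SDL can be sketched as follows. This is only an illustrative model: the function name `sdl_enable_mask` and the indexing convention (path `k` passes through the first `k` AND gates, so gates `k` and above are redundant) are assumptions and are not specified in the text.

```python
def sdl_enable_mask(m_bits, path_select):
    """Return EN[0 .. 2^M-2] for the selected delay path.

    Assumption: path k uses AND gates 0 .. k-1, so every gate with
    index >= k is redundant and its enable input is driven low.
    """
    n_gates = 2 ** m_bits - 1          # 2^M - 1 two-input AND gates
    if not 0 <= path_select <= n_gates:
        raise ValueError("path_select must be in 0 .. 2^M - 1")
    return [1 if i < path_select else 0 for i in range(n_gates)]

# Example: M = 3 gives 7 AND gates and 8 selectable paths.
mask = sdl_enable_mask(3, 2)
print(mask, "active gates:", sum(mask))
```

A shorter path (higher frequency) leaves more gates disabled, which matches the observation that the power saving of the segmental scheme depends on the operating frequency.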

#### 2.3.2 Fine-Tuning Stage

Because the resolution of the coarse-tuning stage is not sufficient for typical DCO applications, a fine-tuning stage is added. To achieve better resolution and lower power consumption, this fine-tuning stage is divided into three sub-stages, as shown in Fig. 2.7. Note that the controllable range of each stage is larger than the delay step of the previous stage. As a result, the cascaded DCO structure has no dead zone larger than the LSB resolution of the DCO. The delay steps of the fine-tuning sub-stages differ: the delay cells of the 1<sup>st</sup> and 3<sup>rd</sup> stages have the largest and smallest delay steps, respectively. Therefore, the delay cell of the 3<sup>rd</sup> fine-tuning stage determines the DCO LSB resolution, and the controllable range of the 1<sup>st</sup> fine-tuning stage can easily cover the delay step of the coarse-tuning stage. Since the proposed HDC provides a larger delay step than a DCV cell, the 1<sup>st</sup> fine-tuning stage employs P HDCs in place of many DCV cells, which saves power. Because of their better resolution, different DCVs are used in the 2<sup>nd</sup> and 3<sup>rd</sup> fine-tuning stages to improve the overall resolution of the DCO. A DCV adjusts the delay by controlling the gate capacitance of a logic gate through its input state [12], [18]. The 2<sup>nd</sup> and 3<sup>rd</sup> fine-tuning stages employ Q long-delay DCV cells (two-input NAND) and R short-delay DCV cells (tri-state inverter), respectively.

To optimize both power consumption and resolution, a strategy for allocating the proportions of the sub-stages in the proposed fine-tuning stage is introduced. First, to achieve a high operation frequency, P should be kept small to limit the length of the total delay line in the fine-tuning stage. A suitable HDC delay step can then be determined from P. Second, because the delay resolution is determined only by the delay step of the DCV in the 3<sup>rd</sup> fine-tuning stage, a short-delay DCV must be selected from the cell library to meet the resolution requirement. Once this delay step has been determined, R is chosen according to the required range of the 3<sup>rd</sup> fine-tuning stage and the loading capacitance. Finally, with the delay steps of the HDC and the short-delay DCV fixed, the delay step of the long-delay DCV and Q in the 2<sup>nd</sup> fine-tuning stage can be determined. Note that Q can be reduced significantly by exploiting the HDC, which saves power. For example, if the required delay range is 260ps, 4 HDCs are used to cover this range and 8 short-delay DCV cells to achieve high resolution. In the final step, 32 long-delay DCV cells form the 2<sup>nd</sup> fine-tuning stage. As a result, the total power consumption and resolution of the proposed fine-tuning stage are $40.28\mu W$ and 0.97ps, respectively, at 200MHz and 0.8V in a $0.13\mu m$ CMOS process.

### 2.4 DCO Performance Comparisons

#### 2.4.1 Coarse-Tuning Stage Performance Comparisons

For performance comparison, we rebuild the published approaches with an in-house 0.13µm CMOS standard cell library and compare them with our proposal. Because a DCO generally consists of coarse-tuning and fine-tuning stages, the performance comparisons are divided into two parts as well.

In the coarse-tuning stage, we reconstruct the conventional path-selection delay line with two-inverter delay cells for power comparison. For a fair comparison, the conventional and the proposed segmental coarse-tuning stages have the same operation range. The simulated power consumption at different operation frequencies is shown in Fig. 2.8. Compared with the conventional approach, the proposed segmental coarse-tuning stage reduces the power consumption by 70% and 25% at 500MHz and 200MHz, respectively. Because the number of disabled redundant delay cells varies with the operation frequency, the power reduction ratio of the segmental scheme differs from one operation frequency to another.

Fig. 2.8: Power comparisons of different coarse-tuning designs.

Table 2.2: Performance Comparisons with Different Fine-Tuning Stages

|              | Resolution (ps) | Total Power (µW) | Partial Power* (µW) | Gate Count | Range (ps) |
|--------------|-----------------|------------------|---------------------|------------|------------|
| Proposed     | 0.97            | 40.28            | 36.31               | 48         | 261.34     |
| Approach I   | 4.28            | 291.59           | -                   | 256        | 263.66     |
| Approach II  | 1.07            | 233.61           | 228.77              | 128        | 266.9      |
| Approach III | 0.97            | 105.29           | 98.89               | 80         | 260.38     |

<sup>\*</sup> Power consumption of long-delay stage

#### 2.4.2 Fine-Tuning Stage Performance Comparisons

Fig. 2.9: Power and resolution comparisons of different fine-tuning designs.

The fine-tuning stage determines many major performance indices of the DCO, such as LSB resolution, delay linearity, and power consumption. Therefore, the performance comparisons of the fine-tuning stage focus on these indices. In the cell-based design approach, many designs exploit DCM or DCV to construct the fine-tuning stage [1], [12], [17], [18]. For a fair comparison, these designs are rebuilt with similar operation range, delay resolution, and number of control bits. To ensure correct functionality, the operation range of the fine-tuning stage in all comparison candidates should be larger than the minimum delay step of the two-input AND gate, which is 200ps in the in-house 0.13µm standard cell library. The rebuilt fine-tuning stages are: DCM type (Approach I) [1], [17], DCV type (Approach II) [18], and combined DCM and DCV type (Approach III) [12]. Keeping the operation range similar for a fair comparison results in a different number of delay cells in each structure. For example, Approach I, Approach II, and Approach III use 256, 128, and 80 tri-state inverters, respectively. In contrast, the proposed structure needs only 12 tri-state inverters, 4 inverters, and 32 two-input NAND gates (based on the strategy described in Subsection 2.3.2, with P, Q, and R set to 4, 32, and 8, respectively).
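The cell counts quoted for the proposed structure follow directly from the cell composition given in Sections 2.2 and 2.3.2. A small sketch of that arithmetic is shown below; the function name `fine_stage_cells` is hypothetical.

```python
def fine_stage_cells(p, q, r):
    """Count standard cells in the three-sub-stage fine-tuning stage.

    Each HDC contributes one inverter (INV2) and one tri-state inverter
    (TINV); each long-delay DCV is a two-input NAND; each short-delay DCV
    is a tri-state inverter.
    """
    return {
        "tri_state_inverter": p + r,
        "inverter": p,
        "nand2": q,
    }

cells = fine_stage_cells(p=4, q=32, r=8)
print(cells, "total:", sum(cells.values()))
```

With P=4, Q=32, and R=8 the total is 48 cells, which agrees with the gate count listed for the proposed design in Table 2.2.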

The performance comparisons, simulated at 200MHz and 0.8V in the typical corner, are summarized in Table 2.2. All designs have similar LSB resolution except Approach I. In terms of power consumption and area, however, the proposed design shows a significant improvement. Since the proposed HDC replaces many DCV cells to obtain a wider operation range, the number of delay cells connected to each driving inverter, and hence the loading capacitance, is reduced, which saves both power and gate count. The power reduction ratios are 86.2%, 82.8%, and 61.7% compared with Approach I, Approach II, and Approach III, respectively. Fig. 2.9 also shows that our proposal offers high LSB resolution and low power compared with the other designs.

Fig. 2.10: Microphotograph and layout of DCO test chip.

Table 2.3: Measurement Results of Step/Range of Tuning Stage

|            | Coarse-Tuning | 1<sup>st</sup> Fine-Tuning | 2<sup>nd</sup> Fine-Tuning | 3<sup>rd</sup> Fine-Tuning |
|------------|---------------|----------------------------|----------------------------|----------------------------|
| Range (ps) | 3726.36       | 296.74                     | 116.02                     | 10.26                      |
| Step (ps)  | 120.21        | 98.91                      | 3.74                       | 1.47                       |
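The no-dead-zone condition stated in Subsection 2.3.2 (the controllable range of each stage exceeds the delay step of the previous stage) can be checked against the measured values in Table 2.3. The sketch below does this; the additional estimate of the number of settings per stage assumes that range = (settings - 1) × step, which is an interpretation and not stated in the text.

```python
# (stage name, range in ps, step in ps), copied from Table 2.3.
STAGES = [
    ("coarse", 3726.36, 120.21),
    ("fine_1", 296.74, 98.91),
    ("fine_2", 116.02, 3.74),
    ("fine_3", 10.26, 1.47),
]

def covers_previous_step(stages):
    """True if every stage's range is larger than the previous stage's step."""
    return all(nxt[1] > prev[2] for prev, nxt in zip(stages, stages[1:]))

def settings_per_stage(stages):
    """Estimated number of delay settings, assuming range = (n - 1) * step."""
    return [round(rng / step) + 1 for _, rng, step in stages]

print("coverage holds:", covers_previous_step(STAGES))
print("settings per stage:", settings_per_stage(STAGES))
```

Under that assumption the estimated settings are 32, 4, 32, and 8, which line up with $2^M$ for M=5 and with P=4, Q=32, and R=8.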

Fig. 2.11: Comparisons of measurement and post-layout simulation results.

Except for Approach I, all comparison candidates employ a short-delay DCV cell as the finest delay cell; however, they use different types of long-delay stages. We therefore focus on the power of the long-delay stage in the different approaches. In contrast to Approach II, whose long-delay stage uses only long-delay DCV cells, our proposal exploits the HDC and hence needs fewer long-delay DCV cells. As a result, the power-to-delay ratios of the long-delay stage of our proposal and of Approach II are 0.14μW/ps (36.31μW/261.34ps) and 0.86μW/ps (228.77μW/266.9ps), respectively. This comparison shows that the HDC-based structure provides a better power-to-delay ratio than the pure DCV structure, which implies that the HDC saves more power for a given delay.
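The reduction ratios and power-to-delay ratios quoted above can be reproduced from the entries of Table 2.2. The following short sketch only restates that arithmetic; the dictionary layout and function names are illustrative.

```python
# Values copied from Table 2.2 (power in µW, range in ps).
TABLE_2_2 = {
    "Proposed":     {"total": 40.28,  "partial": 36.31,  "range": 261.34},
    "Approach I":   {"total": 291.59, "partial": None,   "range": 263.66},
    "Approach II":  {"total": 233.61, "partial": 228.77, "range": 266.9},
    "Approach III": {"total": 105.29, "partial": 98.89,  "range": 260.38},
}

def power_reduction_percent(reference):
    """Total-power reduction of the proposed design relative to `reference`."""
    proposed = TABLE_2_2["Proposed"]["total"]
    return round((1.0 - proposed / TABLE_2_2[reference]["total"]) * 100.0, 1)

def power_to_delay(name):
    """Long-delay-stage power divided by delay range, in µW/ps."""
    row = TABLE_2_2[name]
    return round(row["partial"] / row["range"], 2)

for ref in ("Approach I", "Approach II", "Approach III"):
    print(ref, power_reduction_percent(ref), "%")
print(power_to_delay("Proposed"), power_to_delay("Approach II"))
```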

### 2.5 Experimental Results and Comparisons

Based on the frequency range and resolution required by our application, the design parameters of the proposed DCO are set as follows: N=10, M=5, P=4, Q=32, and R=8. To verify the feasibility and performance of the proposed DCO in an advanced process, a test chip has been fabricated in a 90nm 1P9M CMOS process; the chip microphotograph and layout are shown in Fig. 2.10. The DCO output signal is measured with a LeCroy SDA4000A at 1V/25°C (the I/O pad supply is 2.5V). Because of the speed limitation of the I/O pad, the DCO output frequency is divided by 2 when the DCO operates at high frequency. Table 2.3 shows the delay step and operation range of the different tuning stages in the proposed DCO. It shows that the controllable range of each stage is larger than the step of the previous stage, and that the average DCO resolution is 1.47ps. Fig. 2.11 compares the measurement results with post-layout simulation to illustrate the linearity of the proposed DCO. The rms and peak-to-peak phase jitter at 417MHz are 8.18ps and 49.05ps, respectively. Fig. 2.12 shows that the rms and peak-to-peak phase jitter are 8.24ps and 49.95ps, respectively, over 150,000 sweeps at 952MHz under a 1V supply with 60mV supply noise.

Fig. 2.12: Jitter histogram of DCO at 952MHz.

Table 2.4: DCO Performance Comparisons

| Performance Indices     | Proposed DCO    | JSSC'05 [15]        | TCAS2'05 [18]  | JSSC'04 [1]     | JSSC'03 [11]     |
|-------------------------|-----------------|---------------------|----------------|-----------------|------------------|
| Process                 | 90nm CMOS       | 0.18μm CMOS         | 0.35μm CMOS    | 0.35μm CMOS     | 0.35µm CMOS      |
| Supply Voltage (V)      | 1               | 1.8                 | 3.3            | 3               | 3.3              |
| DCO Control Word Length | 15              | 5                   | 15             | 7               | 12               |
| Operation Range (MHz)   | 191 ~ 952       | 413 ~ 485           | 18 ~ 214       | 152 ~ 366       | 45 ~ 510         |
| LSB Resolution (ps)     | 1.47            | 2                   | 1.55           | 10 ~ 150        | 5                |
| Power Consumption       | 140μW (@200MHz) | 340μW (Static only) | 18mW (@200MHz) | 12mW (@366MHz)* | 50mW (@500MHz) * |
| Portability             | Yes             | No                  | Yes            | Yes             | Yes              |

<sup>\*</sup> Power consumption calculated from 50% of ADPLL [2].

Table 2.4 lists comparison results with state-of-the-art DCOs. The proposed DCO has the lowest power consumption among the compared DCO designs. Furthermore, the proposed low-power solution does not incur any performance loss. Additionally, since the proposed DCO is implemented with standard cells, it has good portability. As a result, the proposed DCO offers better resolution, operation range, linearity, and portability.

### 2.6 Summary

In this chapter, we have proposed a hysteresis delay cell and an ultra-low-power, cell-based DCO for SoC applications. The proposed HDC can be used not only in a low-power DCO but also to reduce the power consumption of a DCDL. With the proposed segmental tuning structure and the HDC, the power consumption of the coarse-tuning and fine-tuning stages is reduced by up to 70% and 86.2%, respectively, compared with conventional designs. Measurement results show that the proposed DCO achieves 1.47ps resolution and 140μW at 200MHz, which is more than an order-of-magnitude power reduction relative to conventional works. As a result, our proposal achieves not only lower power consumption but also better LSB resolution and delay linearity. Moreover, because the proposed DCO has good portability as a soft intellectual property (IP), it is very suitable for SoC applications as well as system-level integration.

## Chapter 3

# Fast Lock-In All-Digital Phase-Locked Loop Design

### 3.1 Introduction

In this chapter, a fast lock-in all-digital phase-locked loop design is presented. As mentioned in Chapter 1, many applications, such as microprocessors, communication baseband processors, and multimedia systems, require a clock synthesizer or clock multiplier. Hence, the PLL has become an essential component in SoC design. To reduce the overall power consumption of an SoC, especially in portable and mobile applications, systems commonly use power management to eliminate redundant power dissipation. To support this low-power technique, the PLL should provide fast entry to and exit from power-management modes [10]. As a result, the locking time of the PLL is a very important design specification for low-power SoC applications. In addition, for fast-locking frequency synthesizer applications, such as frequency-hopping multiple-access systems, locking time is also the most critical design issue.

For fast acquisition, a traditional analog PLL either tunes the free-running frequency of the voltage-controlled oscillator (VCO) close to the desired frequency in advance or increases the loop bandwidth. However, tuning the VCO precisely is difficult because of process, voltage, and temperature (PVT) variations, and an increased loop bandwidth degrades jitter performance [20]. Many researchers have focused on overcoming these structural handicaps. A digital frequency-difference detector (DFDD) is proposed in [20] to convert the frequency difference directly into a digital code and then control the VCO gain adaptively. An adaptive loop bandwidth scheme is proposed in [21] to reduce the locking time, but the adaptive loop bandwidth architecture increases circuit complexity.

In contrast to analog approaches, all-digital phase-locked loops (ADPLLs) using a binary search algorithm have been proposed that achieve locking within 50 [10] and 46 [11] cycles, respectively. The binary search ADPLL not only achieves fast lock but also performs well compared with the analog PLL. To further reduce locking time, a time-to-digital converter (TDC) based ADPLL is proposed in [22]. This ADPLL uses a TDC to quantize the reference clock period into multiples of the inverter delay. Because the TDC and DCO are subject to the same PVT variations, the code measured by the TDC is more accurate and can cope with PVT variations. However, the TDC digital processing unit increases power consumption and design complexity.

As a result, the research target of the proposed ADPLL is to achieve fast lock-in using TDC with small hardware penalty. In addition to locking time, power consumption is another important design specification of ADPLL, thus the proposed fast lock-in ADPLL employs the high-resolution and low-power DCO as described in Chapter 2 to save overall power and enhance performance.



Fig. 3.1: Binary search ADPLL architecture.

This chapter is organized as follows. Section 3.2 introduces the proposed design of the binary search ADPLL. The proposed TDC-based ADPLL for fast locking is described in Section 3.3. In Section 3.4, the proposed low-complexity 2-level flash TDC is presented, and previous work on TDCs is also reviewed. In Section 3.5, the simulation results of the proposed ADPLLs are presented and discussed. Finally, a brief summary is given in Section 3.6.

### 3.2 Binary Search ADPLL Overview

### 3.2.1 Binary Search ADPLL Architecture

Fig. 3.1 illustrates the proposed binary search ADPLL architecture. It consists of seven major functional blocks: a phase/frequency detector (PFD), two digitally controlled oscillators (the tracking DCO and the average DCO), an ADPLL controller, and three frequency dividers (pre-divider, DCO divider, and output divider). N, M, and K are the inputs that program the pre-divider, DCO divider, and output divider, respectively. Of the two DCOs in the ADPLL, the tracking DCO tracks the reference clock, and the average DCO generates the output clock with small jitter by means of the averaging mechanism.

Fig. 3.2: Binary search algorithm.

The PFD detects the frequency difference and phase error between the divided reference clock (Ref\_N) and the divided DCO output clock (DCO\_M), and generates LEAD/LAG signals to slow down or speed up the DCO. When the controller receives LEAD from the PFD, it increases the DCO control code (DCO code [16:0]) to decrease the output frequency of the tracking DCO. Conversely, when the controller receives LAG from the PFD, it decreases the DCO control code to increase the output frequency of the tracking DCO. These blocks form a closed loop to achieve the "phase-lock" function. For frequency synthesis applications, the controller filters the variation of the DCO control code and controls the average DCO to provide a low-jitter clock output (OUTPUT CLK). For clock multiplier applications, the in-phase clock is generated directly from the tracking DCO.



Fig. 3.3: Flowchart of phase tracking mode.

### 3.2.2 Binary Search Algorithm

Fig. 3.4: TDC-based ADPLL architecture.

The locking procedure of the binary search ADPLL is divided into two modes: frequency acquisition and phase tracking. Phase lock starts in frequency acquisition mode, which employs the binary search algorithm to find the target frequency of the input clock. Fig. 3.2 illustrates the binary search algorithm for frequency acquisition. In the beginning, the DCO oscillates at the middle of its frequency band, and the search step is one fourth of the DCO frequency band. If the output frequency is higher than the target frequency, the ADPLL controller adds the current search step to the DCO control code to lower the output frequency. Conversely, if the output frequency is lower than the target frequency, the ADPLL controller subtracts the current search step from the DCO control code to raise the output frequency. Whenever the PFD output changes from LAG to LEAD or vice versa, the search step is divided by 2. When the search step has been reduced to 1, frequency acquisition is complete.
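The frequency acquisition procedure can be sketched in a few lines. This is a behavioral illustration under stated assumptions, not the controller implemented in this work: the PFD is modeled as an ideal comparison between the current control code and a hypothetical `target_code` (a larger code means a lower frequency), the code is clamped to the valid range, and the function name is illustrative.

```python
def frequency_acquisition(target_code, n_bits):
    """Binary-search a DCO control code; returns (final_code, cycles)."""
    max_code = 2 ** n_bits - 1
    code = 2 ** (n_bits - 1)       # start at the middle of the band
    step = 2 ** (n_bits - 2)       # initial step: one fourth of the band
    prev_direction = 0
    cycles = 0
    while code != target_code:
        # Code too small -> DCO too fast (LEAD) -> add the step.
        # Code too large -> DCO too slow (LAG) -> subtract the step.
        direction = 1 if code < target_code else -1
        if prev_direction != 0 and direction != prev_direction:
            step = max(step // 2, 1)   # PFD polarity changed: halve the step
        code = min(max(code + direction * step, 0), max_code)
        prev_direction = direction
        cycles += 1
        if step == 1:
            break                      # acquisition completes at step 1
    return code, cycles

print(frequency_acquisition(target_code=3, n_bits=4))
```

In this idealized model each comparison takes one cycle; a real loop compares divided clocks and is subject to the PFD dead zone and noise.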

After frequency acquisition is complete, the locking procedure enters phase tracking mode. Fig. 3.3 shows the flowchart of phase tracking mode. At the beginning of this mode, the speed-up count (SPEEDUP\_COUNT) is set to zero. When the PFD output changes from LAG to LEAD or vice versa, which means that the phase polarity has changed, the search step is reduced to half of the previous step. If the search direction stays the same, the speed-up count is incremented by one. When the speed-up count equals the boundary value, the search step is doubled to accelerate phase tracking. If the boundary value is too large, the PLL may fail to track the input phase. Conversely, a boundary value that is too small may cause instability. Based on simulation, the boundary value is set to eight.
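The step adaptation in phase tracking mode can be sketched as a small update function. The text does not say when the speed-up count is cleared or what the minimum step is; the sketch assumes the count restarts after a polarity change and after each doubling, and that the step never falls below 1. The function name is hypothetical.

```python
def update_tracking_step(step, speedup_count, polarity_changed, boundary=8):
    """Return the next (step, speedup_count) for phase tracking mode."""
    if polarity_changed:
        # Phase polarity changed: halve the step.
        # Assumption: the speed-up count restarts and the step floor is 1.
        return max(step // 2, 1), 0
    speedup_count += 1                 # same search direction as before
    if speedup_count == boundary:
        # Direction unchanged for `boundary` updates: double the step.
        # Assumption: the speed-up count restarts after doubling.
        return step * 2, 0
    return step, speedup_count

step, count = 4, 0
for _ in range(8):                     # eight updates in the same direction
    step, count = update_tracking_step(step, count, polarity_changed=False)
print(step, count)
```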

Fig. 3.5: Counter-based TDC.

Due to the PFD dead zone and reference clock noise, the DCO control code varies slightly even after the frequency and phase have been locked. To reduce jitter, the proposed ADPLL uses an averaging mechanism to suppress these non-ideal effects. The ADPLL controller first detects the maximum and minimum of the DCO control code within 256 reference clock cycles and then takes the average of these two values. This average becomes the average DCO control code (avg\_code [16:0]) for the average DCO. Free of the tracking noise, the ADPLL generates a more stable, low-jitter output clock.
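The averaging mechanism amounts to taking the midpoint of the extreme control codes seen in an observation window. A minimal sketch follows; truncating the midpoint to an integer is an assumption, since the rounding behavior is not stated in the text.

```python
def average_code(tracking_codes):
    """Midpoint of the max and min tracking-DCO codes in one window.

    `tracking_codes` holds the codes observed over the window
    (256 reference clock cycles in the proposed ADPLL).
    Assumption: the midpoint is truncated to an integer code.
    """
    return (max(tracking_codes) + min(tracking_codes)) // 2

# Hypothetical locked-state codes that dither around a center value.
window = [100, 103, 98, 101, 99, 102]
print(average_code(window))
```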

# 3.3 The Proposed TDC-Based ADPLL

Locking time is the most critical design specification for fast-locking applications. To achieve fast locking, a TDC-based ADPLL is proposed and described in this section. In the proposed architecture, the locking procedure is divided into two modes: coarse locking and fine locking. Phase locking starts in coarse locking mode, in which the TDC quickly calculates the control code nearest to the one that makes the DCO produce the desired frequency. Because the TDC converts the input clock period into a multiple of the delay time of a delay cell, the ADPLL controller can use this period information to jump to the desired frequency quickly. After coarse locking completes, the ADPLL enters fine locking mode to reduce the residual frequency and phase error with the binary search algorithm described in the previous section. As a result, adding the TDC module significantly reduces the overall lock-in time.

Fig. 3.4 illustrates the proposed TDC-based ADPLL architecture. It comprises several functional blocks: a TDC, a phase/frequency detector (PFD), an ADPLL controller, a DCO, and two frequency dividers (a pre-divider and a DCO divider). The signal DCO\_M is the DCO output divided by M by the DCO divider, and Ref\_N is the reference clock divided by N. Once the ADPLL is enabled, the TDC provides the coarse DCO control code (TDC\_code [5:0]) to the ADPLL controller after two reference clock cycles, and the DCO then generates the desired output frequency from this coarse control code. After the TDC operation is completed, the PFD generates a "lead" or "lag" signal depending on the phase and frequency difference between Ref\_N and DCO\_M. If DCO\_M leads Ref\_N, the PFD generates a "lead" signal to slow down the DCO. Conversely, when DCO\_M lags Ref\_N, the PFD generates a "lag" signal to speed up the DCO. When the ADPLL controller receives "lead" or "lag" from the PFD, it changes the DCO control code (DCO code [13:0]), which in turn controls the DCO to generate the output clock (DCO\_CLK). These blocks form a closed loop that achieves the phase-locked function.

Because the proposed TDC-based ADPLL uses the novel 2-level flash TDC, coarse locking takes only two input clock cycles. In fine locking mode, the worst-case lock time of the binary search algorithm [11], in terms of input clock cycles, is

$$T_{L} = \left(2 \times \log_2 2^{N}\right) - 1 \tag{3.1}$$

where $T_L$ is the lock time of fine tuning and N is the number of bits of the binary search control code. In the proposed ADPLL, the DCO control code is 14 bits. As a result, the entire phase locking procedure takes 29 clock cycles: 2 cycles for the TDC operation and 27 cycles (N=14) for fine-tuning phase locking.

Fig. 3.6: (a) Single delay chain flash TDC. (b) Operation of single delay chain flash TDC.

Fig. 3.7: Vernier delay line TDC.
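The lock-time arithmetic of Eq. (3.1) can be checked with a few lines of Python. The function names are hypothetical; the two-cycle TDC time and the 14-bit code width are taken from the text.

```python
TDC_CYCLES = 2  # coarse locking time of the 2-level flash TDC, in input clock cycles


def fine_lock_cycles(n_bits):
    """Worst-case fine-locking time from Eq. (3.1), in input clock cycles."""
    # log2(2**N) is simply N, so Eq. (3.1) reduces to 2N - 1.
    return 2 * n_bits - 1


def total_lock_cycles(n_bits):
    """Coarse (TDC) plus worst-case fine (binary search) locking time."""
    return TDC_CYCLES + fine_lock_cycles(n_bits)


if __name__ == "__main__":
    print(fine_lock_cycles(14), total_lock_cycles(14))
```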

### 3.4 Time-to-Digital Converter

#### 3.4.1 TDC Overview

Time-to-digital converters (TDCs) have been widely used in measurement systems, temperature sensors, and communication systems [23]-[25]. Because a TDC converts time information into a digital code, it is an essential component at the interface between analog and digital signals. Many approaches have been proposed to implement a TDC [1], [23]-[25]. The counter-based TDC uses a high-frequency clock or a multi-phase clock to sample the time interval and converts it into a multiple of the period of the high-frequency sampling clock, as shown in Fig. 3.5 [1]. The design concept of the counter-based TDC is straightforward, but its power consumption is very high because of the high-frequency counter.

Another approach is the flash TDC, which is analogous to the flash analog-to-digital converter used for voltage amplitude encoding and operates by comparing a signal edge to various reference edges that are all displaced in time [23], [24]. The elements that compare the input signal to the reference are usually flip-flops. In the single delay chain flash TDC shown in Fig. 3.6(a), each buffer produces a delay equal to t. Suppose the period of the input clock is to be determined with the eight-buffer converter in Fig. 3.6(b). Each flip-flop compares the time displacement between the delayed first rising edge and the first falling edge of the input clock. Assuming the flip-flops are given sufficient time to resolve, the thermometer-encoded output indicates the measured interval as a multiple of the buffer delay. The drawback of this implementation is that the resolution cannot be finer than a single gate delay. In addition, when the frequency of the input clock is low, a large number of flip-flops and buffers are required to cover the long clock period, which leads to high power consumption and hardware cost.
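The thermometer-code behavior of the single delay chain flash TDC can be sketched with an idealized Python model. The sketch assumes identical buffer delays and ideal flip-flops with no metastability; the function names and the numeric values in the usage part are hypothetical.

```python
def flash_tdc(interval_ps, buffer_delay_ps, n_buffers=8):
    """Idealized single delay chain flash TDC.

    Flip-flop i samples 1 if the rising edge, delayed by (i + 1) buffers,
    has already arrived when the falling edge samples the chain.
    Returns the thermometer code as a list of 0/1 values.
    """
    return [
        1 if (i + 1) * buffer_delay_ps <= interval_ps else 0
        for i in range(n_buffers)
    ]


def thermometer_to_count(code):
    """Number of buffer delays contained in the measured interval."""
    return sum(code)


if __name__ == "__main__":
    code = flash_tdc(interval_ps=520, buffer_delay_ps=100)
    print(code, thermometer_to_count(code))
    # An interval longer than the chain saturates the code, which is why a
    # low-frequency input needs many buffers and flip-flops.
    print(thermometer_to_count(flash_tdc(2000, 100)))
```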

To enhance the resolution, the flash converter can be constructed with a Vernier delay line, as shown in Fig. 3.7 [25]. This architecture achieves a resolution of t1 - t2, where t1 > t2. However, the power and area issues remain when the sampled clock has a low frequency.

Because the proposed TDC-based ADPLL uses the TDC only to lock the input clock frequency coarsely, high resolution is not the design target of the TDC. Instead, lowering the power and circuit complexity of the TDC is the more important design issue for fast lock-in ADPLL applications.

Fig. 3.8: The proposed 2-level flash TDC architecture.

#### 3.4.2 The Proposed 2-Level Flash TDC

As mentioned in the previous subsection, the single-level flash TDC needs a large number of flip-flops, which increases power consumption and design cost. In contrast, the proposed 2-level flash TDC uses only 12 D-flip-flops (8+4), as shown in Fig. 3.8, and thus has lower hardware complexity and power consumption. It consists of four functional blocks: a 1<sup>st</sup> level flash TDC, a 2<sup>nd</sup> level flash TDC, a delay selection multiplexer, and a period calculator. The 1<sup>st</sup> level flash TDC consists of 4 large delay cells, each with a delay eight times that of a small delay cell (8t), and 4 D-flip-flops. The 2<sup>nd</sup> level flash TDC, in contrast, has 8 small delay cells and 8 D-flip-flops. The small delay cells used in the 1<sup>st</sup> and 2<sup>nd</sup> level flash TDCs are the same as those in the DCO coarse-tuning stage.



Fig. 3.9: Simulation of 2-level flash TDC.

When the TDC is enabled, Ref\_N is sent to the 1<sup>st</sup> level flash TDC and propagates through the 4 large delay cells. When the first falling edge of Ref\_N arrives, the outputs of the large delay cells are sampled by the D-flip-flops, and the sampled result selects one of the large delay cell outputs as the input of the 2<sup>nd</sup> level flash TDC. All D-flip-flop outputs (Q1 [3:0]) are also sent to the thermometer-to-binary converter to generate the 1<sup>st</sup> level flash TDC output (L1\_SEL). The 2<sup>nd</sup> level flash TDC then generates the delay selection signal (L2\_SEL) from its sampled delay outputs (Q2 [7:0]). The outputs of the 1<sup>st</sup> and 2<sup>nd</sup> level flash TDC sections are thermometer codes, from which the selection signals can be generated easily. After both L1\_SEL and L2\_SEL have been generated, the period calculator estimates the period of Ref\_N from these values. The conversion equation is

$$Tr = (L1\_SEL \times 8 + L2\_SEL) \times 2 \tag{3.2}$$

where Tr is the period of Ref\_N, expressed in units of the delay time of one small delay cell. For example, as shown in Fig. 3.9, if the period equals 36 times the delay time of a delay cell, L1\_SEL and L2\_SEL are both 2. To reduce the lock-in time, the TDC measures only half a period of Ref\_N, so the calculated value is shifted left by one bit to obtain the full period of Ref\_N. The TDC takes only two reference clock cycles to complete the coarse lock-in operation. Simulation results with a 0.13µm CMOS standard cell library show that the TDC resolution equals the delay time of one delay cell (165ps) and that the frequency error is 3.3% at 200MHz in the lock-in state.

Fig. 3.10: Transient response of binary search ADPLL.
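The conversion of Eq. (3.2) can be illustrated with an idealized behavioral model in Python. The model assumes ideal delay cells (each large cell exactly 8t) and ideal sampling, and the function name `two_level_flash_tdc` is hypothetical; it is a sketch of the 2-level principle, not the implemented circuit.

```python
LARGE_CELLS = 4   # 1st level: 4 large delay cells
SMALL_CELLS = 8   # 2nd level: 8 small delay cells
LARGE_RATIO = 8   # one large cell delays 8t, one small cell delays t


def two_level_flash_tdc(half_period_t):
    """Return (L1_SEL, L2_SEL, Tr) for a half period given in units of t."""
    # 1st level: a large cell output samples 1 once the edge has passed it.
    q1 = [1 if (i + 1) * LARGE_RATIO <= half_period_t else 0
          for i in range(LARGE_CELLS)]
    l1_sel = sum(q1)  # thermometer-to-binary conversion
    # 2nd level: resolve the residue after the selected large cells.
    residue = half_period_t - l1_sel * LARGE_RATIO
    q2 = [1 if (j + 1) <= residue else 0 for j in range(SMALL_CELLS)]
    l2_sel = sum(q2)
    # Eq. (3.2): the half-period count is shifted left by one bit.
    tr = (l1_sel * LARGE_RATIO + l2_sel) << 1
    return l1_sel, l2_sel, tr


if __name__ == "__main__":
    # Example from the text: a period of 36t, so a half period of 18t.
    print(two_level_flash_tdc(18))
```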

In the proposed TDC-based ADPLL architecture, the frequency of Ref\_N equals the frequency of the DCO divided by M (DCO\_M) once the frequency is locked. The delay time of the DCO coarse-tuning stage therefore equals Tr divided by M. To reduce the hardware complexity of this division, we propose a novel method that approximates the division result in two steps. First, if the division ratio (M) is a power of two, the division is simply a shift-right operation. If not, we extract the exponent of the most significant bit of M (MS) and the next larger exponent (ML = MS+1), so that 2<sup>MS</sup> < M < 2<sup>ML</sup>. Second, Tr is shifted right by MS and by ML, and the TDC output is the average of the two shifted values (TL and TS). For example, if M=6, MS and ML are 2 and 3, respectively. Taking Tr=36 from the example above gives shifted values of 9 and 4, whose average, truncated to an integer, is 6. As a result, the division is completed approximately at small hardware cost.

Fig. 3.11: Transient response of TDC-based ADPLL.
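The shift-and-average approximation of the division can be sketched in Python as follows. The function name `approx_divide` and the truncation of the average are assumptions; the sketch also reads the shifted quantity as the measured period Tr, which is the reading consistent with the values 4 and 9 in the example.

```python
def approx_divide(tr, m):
    """Approximate tr / m using only shifts and one addition (m >= 1)."""
    if m < 1:
        raise ValueError("division ratio must be at least 1")
    ms = m.bit_length() - 1           # exponent of the MSB of m
    if m == 1 << ms:
        return tr >> ms               # power of two: a plain right shift
    ml = ms + 1                       # next larger exponent
    shifted_by_ms = tr >> ms          # tr / 2**ms, above tr / m
    shifted_by_ml = tr >> ml          # tr / 2**ml, below tr / m
    # Average of the two bounds; truncation is an assumption.
    return (shifted_by_ms + shifted_by_ml) >> 1


if __name__ == "__main__":
    print(approx_divide(36, 6))   # example from the text
    print(approx_divide(36, 4))   # power-of-two ratio
```

Only two shifters, one adder, and an MSB detector are needed, which is the hardware saving the approximation aims for; the price is an approximation error for ratios that are not powers of two.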

### 3.5 Experimental Results

The proposed ADPLLs are designed with a 0.13µm CMOS standard cell library and a cell-based design flow. The proposed architecture is modeled in a hardware description language (HDL) and functionally verified using the NC-Verilog simulator. Moreover, the transistor-level simulator Hspice is used to verify the performance of the timing-critical circuits, including the DCO, PFD, and TDC. To achieve high performance and low power, both the binary search ADPLL and the TDC-based ADPLL use the ultra-low-power DCO described in Chapter 2.

The simulation of the binary search ADPLL is shown in Fig. 3.10. The frequency of the reference clock is 20MHz and the division ratio is 10, so the frequency of the ADPLL output clock is 200MHz (=20MHz \* 10). When the ADPLL controller receives the "lead" or "lag" signal from the PFD, the DCO control code is decreased or increased, respectively, and the DCO frequency changes accordingly. Fig. 3.10 shows that both the tracking DCO control code and the average DCO control code converge to a stable value, which completes the lock function.

Fig. 3.11 shows the transient response of the proposed TDC-based ADPLL, where the reference clock is 20MHz and the division ratio (M) is 10, so the output frequency is 200MHz (=20MHz \* 10). The TDC takes 2 reference clock cycles to complete the coarse lock-in operation, and fine locking takes 27 cycles to align the phase. After the TDC operation is completed, the DCO control code is adjusted according to the PFD output to generate the desired DCO output frequency. As shown in Fig. 3.11, the DCO control code converges to a stable value, which completes the lock function. The simulation results show that the power consumption is 250µW at 200MHz and 1.2V.

### 3.6 Summary

In this chapter, the binary search algorithm and the proposed TDC-based ADPLL have been presented. Because the novel 2-level flash TDC reduces the locking time of the TDC-based ADPLL to 29 input clock cycles, the design is well suited to fast lock-in applications. The 2-level architecture also significantly reduces the hardware cost of the TDC. In addition, since all parts of the proposed ADPLL are described in HDL, the design can be ported to different processes, which makes it very suitable for system-level and SoC applications.

## Chapter 4

# All Digital Spread Spectrum Clock Generator Design

### 4.1 Introduction

As the operating frequency of electronic systems increases, electromagnetic interference (EMI) becomes a serious problem, especially in consumer electronics, microprocessor (µP) based systems, and data transmission circuits [26]. The radiated emissions of a system should be kept below an acceptable level to ensure the functionality and performance of the system and adjacent devices [26], [27]. Many approaches have been proposed to reduce EMI, such as shielding boxes, slew-rate control, and the spread spectrum clock generator (SSCG). Among these, the SSCG has a lower hardware cost than the other approaches. As a result, the SSCG has become the most popular EMI reduction technique for System-on-Chip (SoC) applications [6], [27]-[28].

Recently, different architectural solutions have been developed to implement the SSCG. In [28], [29], a triangular modulation scheme that modulates the control voltage of a voltage-controlled oscillator (VCO) is proposed to provide good EMI reduction. However, it requires a large loop filter capacitor to pass the modulated signal in the phase-locked loop (PLL), which increases the chip area or requires an off-chip capacitor. Modulation of the PLL loop divider is another important SSCG type; it uses a fractional-N PLL with a delta-sigma modulator to spread the output frequency by changing the divider ratio in the PLL [30], [31]. However, the fractional-N SSCG not only needs a large loop capacitor to filter the quantization noise from the divider, but also raises stability issues in applications with a wide frequency spreading ratio, especially PC-related applications [31].

In contrast, the all-digital SSCG (ADSSCG) [32], [33] does not use any passive components and relies on digital design approaches, so it can be integrated easily into digital systems. However, the delay-line-type ADSSCG [32] does not provide a programmable spreading ratio and needs an extra PLL for frequency multiplication. The triangular modulation ADSSCG [33] has poor phase tracking capability, which results in loss of lock and stability issues. Moreover, it uses a digitally controlled oscillator (DCO) with a non-monotonic delay characteristic, which is not suitable for SSCG applications. Thus, this chapter presents a portable, low-power ADSSCG with a programmable spreading ratio and a monotonic DCO.

The proposed ADSSCG employs a novel rescheduling division triangular modulation (RDTM) to enhance the phase tracking capability and provide a wide programmable spreading ratio. The proposed low-power DCO with an auto-adjustment algorithm reduces power consumption while keeping the delay characteristic monotonic. This chapter is organized as follows. Section 4.2 describes the proposed architecture and the spread spectrum algorithm of the ADSSCG. Section 4.3 focuses on the low-power DCO design and the auto-adjustment algorithm for a monotonic delay characteristic. Section 4.4 presents the implementation and measurement results of the fabricated ADSSCG chip. Finally, Section 4.5 gives a brief summary.

### 4.2 The Proposed ADSSCG Design

#### 4.2.1 ADSSCG Architecture Overview

Fig. 4.1 illustrates the architecture of the proposed ADSSCG. It consists of five major functional blocks: a phase/frequency detector (PFD), an ADSSCG controller, a DCO, and two frequency dividers. The ADSSCG controller consists of a modulation controller, a loop filter, and a DCO code generator (DCG). The ADSSCG provides the clock signal with or without the spread-spectrum function, depending on the operation mode signal (MODE).

In the normal operation mode, the bang-bang PFD detects the phase and frequency difference between FIN\_M and DCO\_N. When the loop filter receives LEAD from the PFD, the DCG adds the current search step (S\_N[15:0]) to the DCO control code, which decreases the output frequency of the DCO. Conversely, when the loop filter receives LAG from the PFD, the DCG subtracts the search step from the DCO control code to increase the output frequency of the DCO. When the PFD output changes from LEAD to LAG or vice versa, the loop filter sends the code-loading signal (LOAD) to the DCG to load the baseline code (BASELINE\_CODE [17:0]), which is the DCO control code averaged by the loop filter.

Before the ADSSCG enters the spread spectrum operation mode, the baseline frequency is stored as the center frequency. In the spread spectrum operation mode, the modulation controller uses two spreading control signals (SEC\_SEL[2:0] and STEP[2:0]) to generate the add/subtract signal (+/-\_SS) and the spreading step (S\_SS[15:0]) for the DCG, and the DCO control code is then modulated to spread the DCO output frequency evenly around the center frequency.



Fig. 4.1: Architecture of the proposed ADSSCG.

The system clock of the ADSSCG controller is FIN\_M, whose operating frequency is limited by the closed-loop response time of the ADSSCG. This response time is determined by the response time of the DCO, the delay time of the ADSSCG controller, and the frequency divider. Therefore, the period of FIN\_M should not be shorter than the shortest response time, to ensure the functionality and performance of the ADSSCG. In addition, because the frequency of DCO\_N must equal that of FIN\_M after the system locks, the frequency of FIN\_M cannot be higher than the maximum frequency or lower than the minimum frequency of DCO\_N. As a result, the frequency range of FIN\_M is also limited by the DCO operating range and the divider ratio (N).

#### 4.2.2 Spread Spectrum Algorithm

Since triangular modulation is easy to implement and performs well in reducing radiated emissions, it has become the major modulation method for SSCGs [6], [28]. In triangular modulation, the EMI attenuation depends on the frequency-spreading ratio and the center frequency, and it can be formulated as

$$A_{dB} = I + J \log(SR/100) + K \log(F_C) \tag{4.1}$$

where $A_{dB}$ is the EMI attenuation, SR is the frequency spreading ratio, $F_C$ is the center frequency, and I, J, K are modulation parameters [27]. Based on (4.1), at the same center frequency, EMI can be reduced further by increasing the spreading ratio. In addition, at the same spreading ratio, a higher center frequency gives better EMI attenuation.

Fig. 4.2: (a) Conventional triangular modulation. (b) Division triangular modulation. (c) Rescheduling division triangular modulation.

Fig. 4.2(a) illustrates the conventional triangular modulation implemented with a digital approach [33]. Since the output frequency is set by the DCO control code, the output clock frequency can be spread by tuning the DCO control code with triangular modulation within one modulation cycle. The conventional spread spectrum starts at the center frequency (period $T_C$) and takes one-fourth of the modulation cycle to reach the minimum frequency (period $T_{max}$). It then takes half of the modulation cycle to reach the maximum frequency (period $T_{min}$). Finally, it returns to the center frequency in the last one-fourth of the modulation cycle.

Because the upper and lower halves of the triangle have the same area, as shown in Fig. 4.2(a), the mean frequency of the spread clock equals the center frequency, and the phase drift is zero at the end of each modulation cycle. However, in the conventional triangular modulation, the ADSSCG controller can perform phase and frequency maintenance based on the PFD output only at the end of each modulation cycle. Hence, because of the frequency error between the reference clock and the output clock, reference clock jitter, and supply noise, the phase error accumulates within one modulation cycle, which can lead to loss of lock and stability problems.

Thus, to enhance the phase tracking ability, the division triangular modulation (DTM) is proposed, as shown in Fig. 4.2(b). DTM divides one modulation cycle into many sub-sections (16 in the example of Fig. 4.2(b)) and updates the DCO control code for phase tracking every 4 sub-sections. As a result, the ADSSCG controller can perform phase and frequency maintenance four times in one modulation cycle when the modulation cycle is divided into 16 sub-sections. Because DTM provides frequency spreading and keeps phase tracking at the same time, it is well suited to ADSSCGs in µP-based system applications. However, the disadvantage of DTM is that moving between sub-sections induces large DCO control code fluctuations (7S), as shown in Fig. 4.2(b), where S is the spreading step of the DCO control code in spreading modulation.

To reduce the peak-to-peak change of the DCO control code in DTM, the rescheduling DTM (RDTM) is proposed, as shown in Fig. 4.2(c). By reordering the sub-sections of DTM, the peak-to-peak change of the DCO control code is reduced to 5S. As a result, the peak-to-peak cycle-to-cycle jitter is reduced while the period jitter stays the same. Compared with DTM, the reduction ratio of the peak-to-peak jitter achieved by RDTM depends on the number of sub-sections and can be formulated as

$$JR = \frac{((COUNT/2)-1)-((COUNT/4)+1)}{(COUNT/2)-1} \times 100\% \tag{4.2}$$

where JR is the jitter reduction ratio and COUNT is the number of sub-sections. For example, with 16 sub-sections, the jitter reduction ratio is 29% ((7-5)/7), and with 32 sub-sections, it is 40% ((15-9)/15). Although RDTM reschedules the order of the DTM sub-sections to reduce the peak cycle-to-cycle jitter, the average cycle-to-cycle jitter stays the same as in DTM. Besides, because the phase drifts in the two opposite directions remain equal in both DTM and RDTM, the equivalent phase drift is zero. As a result, RDTM does not induce extra phase drift, and the mean frequency remains the same. The frequency spread of DTM and RDTM is the same as that of the conventional triangular modulation. Table 4.1 summarizes the jitter and timing comparisons of DTM and RDTM with 16 sub-sections within one modulation cycle.

Table 4.1: Jitter and Timing Comparisons of DTM and RDTM

|                                           | DTM                                                | RDTM                                               |
|-------------------------------------------|----------------------------------------------------|----------------------------------------------------|
| Positive Phase Drift: Upper Half Area (*) | 1+3+5+7=16                                         | 1+3+5+7=16                                         |
| Negative Phase Drift: Lower Half Area (*) | 1+3+5+7=16                                         | 7+5+3+1=16                                         |
| Average Cycle-to-Cycle Jitter (S)         | (1+1+1+2+1+3+1+4+1+5+1+6+1+7+1+4)/16 = 40/16 = 2.5 | (1+4+1+5+1+4+1+5+1+4+1+5+1+4+1+1)/16 = 40/16 = 2.5 |
| Peak Cycle-to-Cycle Jitter (S)            | 7                                                  | 5                                                  |

<sup>\*:</sup> S times Period of Sub-Section
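Eq. (4.2) and the jitter entries of Table 4.1 can be reproduced with a short Python sketch. The function names are hypothetical; the two lists are the per-sub-section code changes (in units of S) copied from the Average Cycle-to-Cycle Jitter row of Table 4.1.

```python
def jitter_reduction_ratio(count):
    """Eq. (4.2): peak jitter reduction of RDTM relative to DTM, in percent."""
    dtm_peak = count / 2 - 1    # peak code change of DTM, in units of S
    rdtm_peak = count / 4 + 1   # peak code change of RDTM, in units of S
    return (dtm_peak - rdtm_peak) / dtm_peak * 100.0


# Code changes between consecutive sub-sections for COUNT = 16 (Table 4.1).
DTM_STEPS = [1, 1, 1, 2, 1, 3, 1, 4, 1, 5, 1, 6, 1, 7, 1, 4]
RDTM_STEPS = [1, 4, 1, 5, 1, 4, 1, 5, 1, 4, 1, 5, 1, 4, 1, 1]


def jitter_stats(steps):
    """Return (average, peak) cycle-to-cycle jitter in units of S."""
    return sum(steps) / len(steps), max(steps)


if __name__ == "__main__":
    print(round(jitter_reduction_ratio(16)), round(jitter_reduction_ratio(32)))
    print(jitter_stats(DTM_STEPS), jitter_stats(RDTM_STEPS))
```

Both sequences sum to 40, so the rescheduling changes only the peak value (7S against 5S) and leaves the average at 2.5S.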

With two control signals, the spreading step (S) and the number of sub-sections (COUNT), the proposed RDTM provides a flexible spreading ratio for different system requirements. The spreading step is the difference in DCO control code between two consecutive sub-sections. The number of sub-sections determines how many sub-sections one modulation cycle contains. *COUNT* and *S* are decoded from *SEC\_SEL* and *STEP*, respectively, by the modulation controller. Based on these definitions, the frequency-spreading ratio is given by

$$SR = (S \times RES \times COUNT / 2) / T_C \times 100\% \tag{4.3}$$

where SR is the spreading ratio, RES is the finest time resolution of DCO, and  $T_C$  is the center period of DCO output clock. As a result, the frequency-spreading ratio of the proposed ADSSCG can be specified by the control signals easily.
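As a numerical illustration of Eq. (4.3), the following Python sketch evaluates the spreading ratio for one setting. The function name is hypothetical, and the values S=21 and COUNT=16 are chosen only for illustration; RES=1.1ps is the finest DCO delay step reported in Section 4.3, and the center period corresponds to a 54MHz output clock.

```python
def spreading_ratio(step, res_ps, count, center_period_ps):
    """Eq. (4.3): frequency-spreading ratio in percent."""
    return (step * res_ps * count / 2) / center_period_ps * 100.0


if __name__ == "__main__":
    center_period_ps = 1e6 / 54.0   # period of a 54 MHz clock, in ps
    # Hypothetical control setting: S = 21, COUNT = 16.
    sr = spreading_ratio(step=21, res_ps=1.1, count=16,
                         center_period_ps=center_period_ps)
    print(f"{sr:.3f} %")
```

Because SR scales linearly with both S and COUNT, doubling either control value doubles the spreading ratio at a fixed center period.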

### 4.3 DCO Design

#### 4.3.1 DCO Architecture

Because the digitally controlled oscillator (DCO) accounts for over 50% of the power consumption in all-digital clocking circuits, the proposed ADSSCG uses the low-power DCO structure described in Chapter 2 to reduce the overall power consumption [34]. To achieve high portability, all components of the ADSSCG, including the DCO, are implemented with standard cells.



Fig. 4.3: (a) Architecture of the proposed DCO. (b) Fine-tuning cells of DCO

Table 4.2: Simulation Results of Delay of Tuning Stage

|                               | Coarse-Tuning | 1<sup>st</sup> Fine-Tuning | 2<sup>nd</sup> Fine-Tuning | 3<sup>rd</sup> Fine-Tuning |
|-------------------------------|---------------|----------------------------|----------------------------|----------------------------|
| Controllable Delay Range (ps) | 61812         | 308.45                     | 121.58                     | 7.73                       |
| Finest Delay Step (ps)        | 242.44        | 102.82                     | 3.92                       | 1.1                        |

Fig. 4.3(a) illustrates the architecture of the proposed low-power DCO, which cascades one coarse-tuning stage and three fine-tuning stages to achieve a fine frequency resolution and a wide operating range. As the number of delay cells selected in the coarse-tuning stage increases, the propagation delay becomes longer and the operating frequency of the DCO becomes lower. The shortest delay path, which consists of one NAND gate, one path MUX of the coarse-tuning stage, and the fine-tuning stages at their minimum delay, determines the highest operating frequency of the DCO. There are 2<sup>C</sup> different delay paths in the coarse-tuning stage, and only one path is selected by the 2<sup>C</sup>-to-1 path selector MUX controlled by the C-bit DCO control code. The coarse-tuning delay cell uses a two-input AND gate, which can be disabled to save power when the DCO operates at high frequency.

To increase the frequency resolution of the DCO, three fine-tuning stages controlled by the F-bit DCO control code are added. The 1<sup>st</sup> fine-tuning stage is composed of X hysteresis delay cells (HDCs), each of which contains one inverter and one tri-state inverter, as shown in Fig. 4.3(b). When the tri-state inverter in an HDC is enabled, its output exhibits hysteresis, which increases the delay [34]. Different digitally controlled varactors (DCVs) are used in the 2<sup>nd</sup> and 3<sup>rd</sup> fine-tuning stages to further improve the overall resolution of the DCO, as shown in Fig. 4.3(b). A DCV adjusts the delay time by using the state of an enable signal to control the gate capacitance of a logic gate. The 2<sup>nd</sup> and 3<sup>rd</sup> fine-tuning stages employ Y long-delay DCV cells and Z short-delay DCV cells, respectively. Since one HDC can replace many DCV cells to obtain a wider operating range, the number of delay cells connected to each driving buffer, and hence the loading capacitance, can be reduced, which saves both power and gate count.

Based on an in-house μP-based system for liquid crystal display (LCD) controller applications [35], the required operating frequency ranges from 27MHz to 54MHz. The design parameters of the proposed DCO are therefore chosen as C=8, F=10, X=4, Y=32, and Z=8. Table 4.2 shows the controllable delay range and the finest delay step of each tuning stage in the proposed DCO under the typical case (typical corner, 1.8V, 25°C). Note that the controllable delay range of each stage is larger than the finest delay step of the previous stage. As a result, the cascaded DCO structure has no dead zone larger than the LSB resolution of the DCO. Since the finest delay step of the 3<sup>rd</sup> fine-tuning stage determines the overall resolution, the proposed DCO achieves a resolution of 1.1ps.
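The no-dead-zone condition stated above can be checked against the Table 4.2 values with a short Python sketch. The stage labels and function names are hypothetical, and the delay values are transcribed from Table 4.2 for the typical case only, so the check says nothing about other PVT corners.

```python
# (stage label, controllable delay range in ps, finest delay step in ps)
STAGES = [
    ("coarse", 61812.0, 242.44),
    ("fine1", 308.45, 102.82),
    ("fine2", 121.58, 3.92),
    ("fine3", 7.73, 1.1),
]


def coverage_margins(stages):
    """Margin (ps) by which each stage's range exceeds the previous stage's step."""
    margins = []
    for (prev_name, _, prev_step), (name, rng, _) in zip(stages, stages[1:]):
        margins.append((prev_name, name, rng - prev_step))
    return margins


def has_no_dead_zone(stages):
    """True if every stage can cover one step of the stage before it."""
    return all(margin > 0 for _, _, margin in coverage_margins(stages))


if __name__ == "__main__":
    for prev_name, name, margin in coverage_margins(STAGES):
        print(f"{name} range exceeds {prev_name} step by {margin:.2f} ps")
    print(has_no_dead_zone(STAGES))
```

The positive margins are what remove dead zones, and they are also the overlap that makes a code crossing a stage boundary non-monotonic without compensation.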

#### 4.3.2 Auto-Adjustment Algorithm for Monotonic DCO

As mentioned in the previous section, the DCO control code is changed to obtain different output periods in spread spectrum applications, so the monotonic characteristic of the DCO is very important. Because the controllable delay range of each stage must be larger than the finest delay step of the previous stage, a non-monotonic problem occurs when the DCO code switches across the boundary between tuning stages. To eliminate this non-ideal effect, an auto-adjustment algorithm for boundary code switching is proposed. Fig. 4.4 shows the flowchart of the proposed algorithm. When the DCO code crosses the boundary between tuning stages, the ADSSCG controller automatically adds an extra compensation code to, or subtracts it from, the original code to reduce the delay difference caused by the stage switching, which eliminates the non-monotonic behavior. According to the simulation results of the proposed DCO under different process-voltage-temperature (PVT) conditions, the extra compensation codes for crossing the coarse/1<sup>st</sup> fine, 1<sup>st</sup>/2<sup>nd</sup> fine, and 2<sup>nd</sup>/3<sup>rd</sup> fine-tuning stage boundaries are defined as 320, 48, and 4, respectively. For example, when the last four bits of the DCO code (one bit for the 2<sup>nd</sup> fine-tuning stage and three bits for the 3<sup>rd</sup> fine-tuning stage) change from (0111)<sub>2</sub> to (1000)<sub>2</sub>, the delay should ideally increase by 1.1ps, but it instead decreases by 3.78ps (from 7.7ps to 3.92ps, the delay of one 2<sup>nd</sup> fine-tuning cell). With the auto-adjustment algorithm, the code is adjusted from (1000)<sub>2</sub> to (1100)<sub>2</sub>. As a result, the delay increases by 0.62ps, and the DCO operates monotonically, as shown in Fig. 4.5.

Fig. 4.4: Flowchart of auto-adjustment algorithm.

Fig. 4.5: Comparison between original and adjusted timing.

Fig. 4.6: Microphotograph of ADSSCG test chip.
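The worked example at the 2<sup>nd</sup>/3<sup>rd</sup> fine-tuning boundary can be reproduced with a small Python sketch. The linear delay model, the function names, and the restriction to an increasing code are assumptions made for illustration; the step sizes come from Table 4.2 and the compensation code of 4 from the text. The flowchart of Fig. 4.4 is not reproduced, so this is not the controller's actual logic.

```python
STEP_FINE2_PS = 3.92    # finest delay step of the 2nd fine-tuning stage
STEP_FINE3_PS = 1.1     # finest delay step of the 3rd fine-tuning stage
COMP_FINE2_FINE3 = 4    # compensation code for the 2nd/3rd boundary


def delay_ps(code4):
    """Delay of the last four code bits under a linear model (assumption)."""
    fine2_bit = (code4 >> 3) & 0x1   # one bit of the 2nd fine-tuning stage
    fine3_bits = code4 & 0x7         # three bits of the 3rd fine-tuning stage
    return fine2_bit * STEP_FINE2_PS + fine3_bits * STEP_FINE3_PS


def adjust_up(prev_code4, new_code4):
    """Add the compensation code when an increasing code crosses the boundary."""
    crossed = (prev_code4 >> 3) != (new_code4 >> 3)
    if crossed and new_code4 > prev_code4:
        return new_code4 + COMP_FINE2_FINE3
    return new_code4


if __name__ == "__main__":
    before = delay_ps(0b0111)
    raw = delay_ps(0b1000)
    adjusted_code = adjust_up(0b0111, 0b1000)
    adjusted = delay_ps(adjusted_code)
    print(f"unadjusted change: {raw - before:+.2f} ps")
    print(f"adjusted code {adjusted_code:04b}, change: {adjusted - before:+.2f} ps")
```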

### 4.4 Experimental Results and Comparisons

Based on the operating frequency required by an in-house µP-based system and LCD controller [35] applications, the proposed ADSSCG must generate an output clock ranging from 27MHz to 54MHz. The proposed ADSSCG is designed with a cell-based design flow: the proposed architecture and spread spectrum algorithm are modeled in a hardware description language (HDL) and functionally verified using the NC-Verilog simulator. Moreover, the transistor-level simulator Hspice is used to verify the DCO performance. Because the proposed ADSSCG is implemented with standard cells, the physical layout is generated by an automatic placement and routing (APR) tool.

Fig. 4.7: Measurement spectrum of 54MHz (a) Without frequency spreading (b) With 1% frequency spreading.

Fig. 4.8: Measurement spectrum of 27MHz (a) Without frequency spreading (b) With 10% frequency spreading.

A test chip has been fabricated in a 0.18µm 1P6M CMOS process with an area of 0.156mm<sup>2</sup>; the chip microphotograph is shown in Fig. 4.6. The ADSSCG output signal is measured with an Agilent E4440A spectrum analyzer at 1.8V/25°C. The input clock frequency ranges from 13.5MHz to 27MHz. The total current consumption is 0.69mA at 54MHz. Fig. 4.7 shows that the peak power is reduced by 9.5dB at 54MHz with a spreading ratio of 1%, and Fig. 4.8 shows that it is reduced by 15dB at 27MHz with a spreading ratio of 10%. Figs. 4.7 and 4.8 thus show that EMI can be reduced at the maximum and minimum operating frequencies of the proposed design, respectively. Because RDTM is a kind of triangular modulation, some peaks appear in the spectrum [27].

Because of the complex digital circuitry in our system chip, the ADSSCG operates in a noisy power supply environment in the spread-spectrum operation mode, which raises the noise floor in this mode, as shown in Fig. 4.7(b) and 4.8(b). The measured rms jitter is 94ps at 54MHz with frequency spreading. Besides, because the discrete modulation has a wide frequency distribution, it also induces large jitter and a high noise floor.

There are several ways to reduce the noise floor and the jitter. First, in system integration, the power supply of the ADSSCG should be separated from that of the other modules to maintain a clean environment for the timing-critical circuits. In addition, the ADSSCG should have higher immunity to a noisy power supply. Second, the modulation algorithm should change the frequency smoothly to avoid large frequency jumps and provide a purer output clock frequency. Third, the resolution and monotonicity of the DCO should be further improved to enhance performance and reduce jitter.

Table 4.3: SSCG Performance Comparisons

| Performance Indices    | Proposed          | JSSC'03 [28]         | TCASI'08 [29]            | ISSCC'05 [30]         | JSSC'07 [32]         |
|------------------------|-------------------|----------------------|--------------------------|-----------------------|----------------------|
| Process                | 0.18μm CMOS       | 0.35μm CMOS          | $0.35\mu\mathrm{m}$ CMOS | $0.18\mu$ m CMOS      | 0.15µm CMOS          |
| Design Approach        | All-Digital       | Analog               | Analog                   | Analog                | All-Digital          |
| Modulation Type        | Modulation on DCO | Modulation on VCO    | Modulation on VCO        | Modulation on Divider | Delay Line (2)       |
| Application            | µP-based system/<br>LCD Controller | µP-based system | µP-based system | SATA I | DVD Player |
| Output Frequency (MHz) | 27 ~ 54           | 66/133/266           | 50 ~ 480                 | 1500                  | 27                   |
| Spreading Ratio (%)    | User-Defined (1)  | 0.5, 1, 1.5, 2, 2.5  | 0.5 ~ 2                  | 0.5                   | 3                    |
| EMI Reduction (dB)     | 15 @10%, 27MHz<br>9.5 @1%, 54MHz | 4 @2.5%, 266MHz | 16.6 @1.5%, 400MHz | 9.8 | 13 |
| Power Consumption (mW) | 1.2 (@54MHz)      | 300 (@266MHz)        | 27.5 (@400MHz)           | 77 (@1.5GHz)          | 7.1 (@27MHz)         |
| Power Index (µW/MHz)   | 22.2              | 1127.8               | 68.8                     | 51.3                  | 263                  |
| Area (mm²)             | 0.156             | 2.01 (Excluding loop | 0.55                     | 0.21                  | 0.06 (Excluding PLL) |
|                        | 0.156             | filter)              | 0.66                     | 0.31                  |                      |
| Portability            | Yes               | No                   | No                       | No                    | No                   |

(1) Based on timing constraint of system application. (2) Needs an extra PLL.

Table 4.3 compares the proposed design with state-of-the-art SSCGs for clock generation applications. The power index comparison shows that the proposed ADSSCG provides a better power-to-frequency ratio, implying that it saves more power at a given operating frequency. In addition, since the proposed architecture is very simple and uses no passive components, it achieves low complexity and small area compared with the other SSCG designs. Although [32] occupies a smaller area, it needs an extra PLL to provide frequency multiplication, and it can only provide a fixed frequency spreading ratio. Furthermore, since the proposed ADSSCG can be implemented with standard cells, it has good portability and is very suitable for SoC integration compared with [28]-[30]. As a result, the proposed ADSSCG offers lower power consumption, a programmable spreading ratio, small area, and good portability.
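The power index in Table 4.3 is the reported power consumption divided by the output frequency at which it was reported. As a quick cross-check, the following minimal Python sketch recomputes it from the tabulated values; the function name and data layout are illustrative only and are not part of the original design flow.

```python
# Recompute the power index (uW/MHz) of Table 4.3 from the tabulated
# power consumption (mW) and the frequency (MHz) at which it was reported.

def power_index_uw_per_mhz(power_mw: float, freq_mhz: float) -> float:
    """Return the power normalized to the operating frequency, in uW/MHz."""
    return power_mw * 1000.0 / freq_mhz


# (power in mW, frequency in MHz), copied from Table 4.3
designs = {
    "Proposed": (1.2, 54.0),
    "JSSC'03 [28]": (300.0, 266.0),
    "TCASI'08 [29]": (27.5, 400.0),
    "ISSCC'05 [30]": (77.0, 1500.0),
    "JSSC'07 [32]": (7.1, 27.0),
}

power_index = {
    name: power_index_uw_per_mhz(p, f) for name, (p, f) in designs.items()
}

for name, value in power_index.items():
    print(f"{name}: {value:.1f} uW/MHz")

# Measured supply current times supply voltage gives the reported power:
# 0.69 mA * 1.8 V is roughly 1.2 mW at 54 MHz.
measured_power_mw = 0.69 * 1.8
```

The recomputed values agree with the Power Index row of the table to within rounding, and the proposed design has the smallest index of the five.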

## 4.5 Summary

In this chapter, we proposed a portable, low-power, and area-efficient ADSSCG with a programmable spreading ratio for SoC applications. Based on the proposed RDTM, the spreading ratio can be specified flexibly according to application demands while keeping the phase tracking capability. The proposed low-power DCO reduces the overall power consumption, and the proposed auto-adjustment algorithm maintains the monotonic characteristic of the DCO. Measurement results show that the proposed ADSSCG achieves 9.5dB EMI reduction with a 1% frequency-spreading ratio while consuming 1.2mW at 54MHz. As a result, our proposal achieves lower power consumption and smaller area with competitive EMI reduction. Moreover, because the proposed ADSSCG is portable as a soft intellectual property (IP), it is very suitable for SoC applications as well as system-level integration.

# Chapter 5

# All Digital Delay-Locked Loop Design

## 5.1 Introduction

In this chapter, a fast-lock and portable all-digital delay-locked loop (ADDLL) with 90° phase shift and a digitally-controlled phase shifter (DCPS) for DDR interface applications are presented. As the operating frequency of electronic systems increases, double data rate (DDR) memories have been widely used to enhance memory performance and provide high-speed data transmission between microprocessors and memory devices. Fig. 5.1(a) illustrates the interconnection of the DDR memory and the core system. Data transfers are based on a bidirectional differential or single-ended data strobe (DQS) that is transmitted along with the data (DQ) for capture [7]. In the read operation, DQS is transmitted edge-aligned with DQ by the DDR memory and then delayed by a 90° phase shift to the center of the data period to enlarge the effective data capture window in the DDR controller. However, the effective data valid window is reduced by the delay mismatch between DQS and DQ introduced by the multi-chip interconnection, as shown in Fig. 5.1(b). In the write operation, by contrast, DQS is center-aligned with DQ by the controller and transmitted to the memory. Even though DQS has been delayed by a 90° phase shift in the controller before transmission, the delay mismatch of the multi-chip interconnection still reduces the effective data valid window and further limits the maximum attainable frequency, as shown in Fig. 5.1(c). As a result, the







Fig. 5.1: (a) Interconnection of DDR memory and core system. (b) Waveform of read operation. (c) Waveform of write operation.

phase shift of DQS should be a suitable value rather than the fixed 90° applied by the DDR controller, so that DQS reaches the center of the DQ period in both the read and write operations. Thus, the DDR controller should have a tunable phase-shift capability to eliminate the non-ideal effects of data transmission across multi-chip interconnections, especially in high-data-rate applications.

Many delay-locked loops (DLLs) and phase shifters have been proposed as clock generators that provide the fixed 90° phase-shift clock or control signal required to transfer data correctly in high-speed DDR memory controllers [36]-[40]. The DLL generates an output clock aligned with the input clock and provides the control signal for the DQS phase shifter. In the physical implementation, the phase shifters may be located far from the DLL. A digitally-controlled phase shifter (DCPS), driven by a digital control signal, is more suitable for high-performance DDR controller applications because a digital control signal is more robust over a long propagation path. Thus, many all-digital DLLs (ADDLLs) that provide the digital control code for the DCPS have been proposed [37]-[39]. However, the phase of these DCPS outputs is not tuned after the ADDLL is locked, so these designs have low immunity to the non-ideal effects of data transmission across multi-chip interconnections. In addition, these ADDLLs have a long locking time, which makes them unsuitable for low-power DDR controllers whose clock signals must be generated quickly when the controller switches from power-down to active mode. Moreover, because of the speed limitation of the delay line, a multi-cycle shifting scheme has been proposed [37] to generate the phase-shift clock signal; however, it is not suitable for the non-periodic DQS.

In this chapter, a tunable phase shift scheme based on a fast lock-in ADDLL and a tunable digitally-controlled phase shifter (DCPS) for high-data-rate interconnection applications is presented. The proposed ADDLL uses the reference clock to establish the timing information, and the DCPSs provide suitable phase adjustment of the non-periodic control signals to obtain a large data capture window. The proposed ADDLL utilizes a time-to-digital converter (TDC) to reduce the locking time and avoid the harmonic lock problem. A high-performance digitally-controlled delay line (DCDL) is also included to achieve high speed while keeping high delay resolution, so that the 90° phase-shift clock signal is generated with a small phase-shift error using a single-cycle shifting scheme. The proposed DCPS provides tunable phase adjustment of DQS for the DDR interface, where precise control is the key to reliable high-performance operation. Furthermore, the proposed ADDLL and DCPS use a cell-based design approach, so they can easily be integrated into digital systems and ported to different processes as soft IP.

This chapter is organized as follows. Section 5.2 describes the proposed tunable phase shift scheme based on a fast-lock and portable ADDLL and a DCPS for DDR interface applications. Section 5.3 focuses on the proposed DCDL and TDC circuit design. In Section 5.4, the experimental results and performance comparisons of the proposed design are presented. Finally, a brief summary is given in Section 5.5.

## 5.2 The Proposed Clock Generator Architecture

### 5.2.1 Tunable Phase Shift Scheme

Fig. 5.2 illustrates the architecture of the proposed tunable phase shift scheme for the DDR controller, which consists of four major functional blocks: a phase controller, an ADDLL, and two DCPSs. After the ADDLL is locked, it provides two clock signals, CLOCK1 (phase-aligned with the input clock) and CLOCK2 (delayed by 90° relative to the input clock), and the DLL control code (DLL\_CTRL) for the phase controller [40]. If the DCPS uses DLL\_CTRL without any adjustment, it generates a delayed DQS with a 90°



Fig. 5.2: Architecture of the proposed tunable phase shift scheme for DDR controller.

phase shift, which is the same as that of CLOCK2 in the ADDLL. At the beginning of the tunable phase shift scheme, the phase adjustment codes of the read/write DQS (DQS\_R\_ADJ/DQS\_W\_ADJ) are set to zero, so that the phase shift of DQS is 90°. The core system then enters the test mode and accesses the DDR memory through the DDR controller to verify the functionality and performance of the clock and signal generators in the DDR controller. If the core system detects that the DDR memory system fails to meet the performance specification, the control code of the read/write DQS (DQS\_R\_CTRL/DQS\_W\_CTRL) is increased or decreased sequentially by the phase adjustment codes to generate a suitable phase shift of the delayed read/write DQS (DQSD\_R/DQSD\_W), compensating for the delay mismatch introduced by the interconnection between the DDR memory and the core system. The flowchart of the tunable phase shift scheme is shown in Fig. 5.3.
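The adjustment procedure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not specify the order in which positive and negative adjustments are tried, so the alternating search order, the `memory_test_passes` hook, and the `max_adjust` bound are all hypothetical.

```python
from typing import Callable, Optional


def calibrate_dqs_ctrl(
    dll_ctrl: int,
    memory_test_passes: Callable[[int], bool],
    max_adjust: int,
) -> Optional[int]:
    """Search for a DCPS control code for which the memory test passes.

    dll_ctrl           -- control code from the locked ADDLL (90-degree shift)
    memory_test_passes -- hypothetical hook: runs the test-mode access with
                          the given DCPS control code and reports pass/fail
    max_adjust         -- hypothetical bound on the adjustment magnitude
    Returns the first passing control code, or None if none is found.
    """
    # Adjustment 0 reproduces the unadjusted 90-degree shift (ADJ code = 0).
    candidates = [0]
    # Assumed search order: +1, -1, +2, -2, ... (not specified in the text).
    for magnitude in range(1, max_adjust + 1):
        candidates.extend((magnitude, -magnitude))

    for adjust in candidates:
        ctrl = dll_ctrl + adjust
        if ctrl < 0:
            continue  # control codes are assumed to be non-negative
        if memory_test_passes(ctrl):
            return ctrl
    return None


# Example with a made-up pass window: only control codes 37..40 pass.
found = calibrate_dqs_ctrl(32, lambda ctrl: 37 <= ctrl <= 40, max_adjust=15)
print(found)
```

In this made-up example the search starts from the ADDLL code 32 and stops at the first passing code on the nearest side of the pass window.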

### 5.2.2 The Proposed ADDLL and DCPS



Fig. 5.3: Flowchart of the proposed tunable phase shift scheme.

The proposed ADDLL consists of five major functional blocks: a TDC, a DCDL, a phase detector (PD), an ADDLL controller, and a control code decoder, as shown in Fig. 5.4(a). The locking procedure is divided into two steps: coarse locking by the TDC and fine locking by the binary search algorithm. At the beginning, the ADDLL is reset and the TDC takes four clock cycles to generate the TDC control code, which determines the coarse control code of the DCDL so that the output clock signal (P360) is delayed by approximately one clock period. After coarse locking, the DCDL control code is fine-tuned by the ADDLL controller based on the UP/DN signals from the PD, adjusting the DCDL delay to align the phases of CLK\_IN and P360. The worst-case lock time of the binary search algorithm [11], in input clock cycles, is

$$T_F = (2 \times \log_2 2^N) - 1 \tag{5.1}$$

where $T_F$ is the lock time of fine tuning and $N$ is the number of bits of the binary search control code. Because the total number of bits of the fine-tuning control code is 5, the






Fig. 5.4: Architecture of (a) ADDLL (b) DCPS.

entire phase locking procedure takes 13 clock cycles, including 4 cycles for ADDLL reset and TDC operation and 9 cycles (N=5) for fine-tuning phase locking. In addition, the control code decoder converts the DCDL control code from binary to thermometer format, as required by the high-resolution DCDL structure.
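The lock-time arithmetic of Eq. (5.1) and the conversion performed by the control code decoder can be illustrated with a short Python sketch. The function names are hypothetical, and the bit ordering of the thermometer output (lowest bits set first) is an assumption, since the decoder's exact output format is not detailed here.

```python
from typing import List


def fine_lock_cycles(n_bits: int) -> int:
    """Worst-case binary-search lock time in clock cycles, Eq. (5.1)."""
    # 2 * log2(2**N) - 1 simplifies to 2*N - 1.
    return 2 * n_bits - 1


def total_lock_cycles(n_bits: int, coarse_cycles: int = 4) -> int:
    """Cycles for reset and TDC operation plus worst-case fine locking."""
    return coarse_cycles + fine_lock_cycles(n_bits)


def binary_to_thermometer(code: int, width: int) -> List[int]:
    """Return `width` bits, index 0 first, with the lowest `code` bits set.

    Assumed convention: a binary value k maps to k ones followed by zeros.
    """
    if not 0 <= code <= width:
        raise ValueError("code must be between 0 and width")
    return [1 if i < code else 0 for i in range(width)]


print(fine_lock_cycles(5))           # fine-tuning cycles for N = 5
print(total_lock_cycles(5))          # entire locking procedure
print(binary_to_thermometer(3, 8))   # three low bits set
```

With N = 5 this reproduces the 9 fine-tuning cycles and the 13-cycle total quoted in the text.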

Fig. 5.4(b) illustrates the structure of the proposed DCPS, which includes one decoder and one DCDL identical to those in the ADDLL. Because the delay of the proposed DCPS is tunable with a high-resolution delay step, DQS can be delayed by more or less than 90°, depending on the phase adjustment set according to the system timing demand.





Fig. 5.5: (a) Proposed DCDL. (b) Coarse-tuning stage. (c) Fine-tuning stage.

## 5.3 ADDLL Circuit Design

### 5.3.1 Digitally Controlled Delay Line

According to the requirements of the ADDLL, the delay line has to provide 4-phase clock signals with equal delay spacing within a single input cycle. Thus, the design challenge of the



Fig. 5.6: (a) Proposed TDC. (b) Waveform of TDC.

delay line in the ADDLL is to achieve high delay resolution and high speed at the same time [37]. The proposed DCDL has four identical delay stages, each of which has one coarse-delay stage (CDS) and one fine-delay stage (FDS), as shown in Fig. 5.5(a). The minimum delay of each delay stage should be shorter than 1/4 of the clock period to provide the 90° phase-shift signal within the same clock cycle. The proposed DCDL employs this cascade-stage structure to achieve high delay resolution and high speed at the same time [34]. Each CDS has 16 coarse-delay cells (CDCs), each consisting of one buffer and one multiplexer, and the coarse-tuning control code (C[15:0])





Fig. 5.7: Layout of ADDLL and DCPS.

selects the propagation path through the CDCs [41]. The intrinsic delay of the CDS is only the gate delay of one multiplexer plus the interconnect delay, as shown in Fig. 5.5(b).

In order to achieve better delay resolution, a hysteresis delay cell (HDC) and 16 digitally controlled varactors (DCVs) are added, as shown in Fig. 5.5(c). When the tri-state inverter of the HDC is enabled (F[0] is high), its output exhibits hysteresis during transitions, which produces a different delay time. The gate capacitance of a DCV can be changed slightly by the fine-tuning control code (F[16:1]) to obtain high delay resolution in the FDS. Because a tri-state holder cell provides a larger delay than a DCV, it can replace many DCVs to reduce the power consumption and the intrinsic delay, while ensuring that the delay range of the FDS covers the minimum delay time of the CDC so that the dead zone stays below the delay resolution of the FDS. As a result, the overall intrinsic delay of the DCDL is reduced by the CDC and the tri-state holder. The simulation results show that the minimum delay resolution of one FDS is 4ps; hence, the total delay resolution of the DCDL, with its four stages, is 16ps. In order to enlarge the phase-shift range of the DCPS, the gain of the DCPS control code is four; thus, the minimum tuning delay of the DCPS is 16ps.
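The two numerical constraints on the DCDL discussed above, the quarter-period speed requirement on each stage and the overall delay resolution, can be checked with a minimal Python sketch. The function names are hypothetical, and the resolution model assumes that one fine-tuning step changes all four stages together, which is how the 4ps per-FDS step becomes 16ps for the whole line.

```python
def quarter_period_ps(freq_mhz: float) -> float:
    """Delay corresponding to a 90-degree phase shift, in picoseconds."""
    period_ps = 1e6 / freq_mhz
    return period_ps / 4.0


def stage_is_fast_enough(min_stage_delay_ps: float, freq_mhz: float) -> bool:
    """Each stage's minimum delay must be shorter than 1/4 clock period."""
    return min_stage_delay_ps < quarter_period_ps(freq_mhz)


def dcdl_resolution_ps(fds_step_ps: float, n_stages: int = 4) -> float:
    """Total delay step when every stage moves by one FDS step (assumed)."""
    return fds_step_ps * n_stages


print(quarter_period_ps(400.0))           # quarter period at 400 MHz
print(dcdl_resolution_ps(4.0))            # total step for a 4 ps FDS step
print(stage_is_fast_enough(600.0, 400.0)) # hypothetical 600 ps stage delay
```

At the 400MHz upper operating frequency the quarter period is 625ps, so each stage's minimum delay must stay below that value; the 600ps stage delay in the example is a made-up number used only to exercise the check.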



Fig. 5.8: (a) Transient response of ADDLL. (b) ADDLL at steady state.

### 5.3.2 Time-to-Digital Converter

Fig. 5.6(a) illustrates the architecture of the proposed TDC. The period of the input clock is quantized by 4 CDCs and converted to the TDC control code (TDC\_CODE), as shown in Fig. 5.6(b). Pulse\_Start and Pulse\_End rise at the first and second rising edges of the input clock, respectively. The dummy intrinsic delay chain, which contains 4 FDSs at minimum delay and one multiplexer, matches the minimum delay path of the DCDL. Because the total delay of the DCDL consists of the intrinsic delay and the tunable delay, Pulse\_Start first passes through the dummy intrinsic delay chain in front of the CDC chain; the delay between the delayed Pulse\_Start (Pulse\_Start\_D) and Pulse\_End is then quantized by 4 CDCs and converted to the TDC






Fig. 5.9: Tunable signal phase scheme in read operation when (a) DQS leads DQ. (b) DQS lags DQ.

control code. As a result, the effect of the intrinsic delay is removed, which improves the precision of quantization and conversion. Additionally, Pulse\_Start and Pulse\_End toggle only once after the system is reset.
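The quantization performed by the TDC can be modeled behaviorally as below. This is a sketch under stated assumptions rather than the actual circuit: the quantization step `step_ps` is treated as a generic parameter (the delay added by one coarse control code step), `max_code` is a hypothetical saturation limit, and all numeric values in the example are made up.

```python
def tdc_code(
    period_ps: float,
    intrinsic_delay_ps: float,
    step_ps: float,
    max_code: int,
) -> int:
    """Quantize (clock period - intrinsic delay) into coarse code steps.

    Pulse_Start is first delayed by the dummy intrinsic delay chain, so only
    the remaining interval up to Pulse_End is measured by the CDC chain.
    """
    remaining_ps = period_ps - intrinsic_delay_ps
    if remaining_ps <= 0:
        return 0  # the intrinsic delay alone already exceeds the period
    code = int(remaining_ps // step_ps)
    return min(code, max_code)  # saturate at the end of the delay chain


# Made-up example: 2500 ps period, 1000 ps intrinsic delay, 200 ps per step.
print(tdc_code(2500.0, 1000.0, 200.0, max_code=15))
```

Subtracting the intrinsic delay before counting steps is what lets the resulting code be applied directly as the coarse control code of the DCDL.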

## 5.4 Experimental Results and Comparisons

The proposed design is implemented in a 0.13µm CMOS standard-cell library; the layout of the ADDLL and DCPS is shown in Fig. 5.7, and the area of the ADDLL and DCPS



Fig. 5.10: Phase shift between CLOCK1 and CLOCK2 at 400MHz.

is 0.026mm<sup>2</sup> and 0.01mm<sup>2</sup>, respectively. The proposed ADDLL and DCPS are designed and implemented with a cell-based design flow: the proposed architecture and lock-in algorithm are modeled in a hardware description language (HDL) and functionally verified using the NC-Verilog simulator. Fig. 5.8(a) shows the locking procedure of the ADDLL after the system is reset; the entire phase locking procedure takes 13 clock cycles. Fig. 5.8(b) shows the proposed ADDLL at steady state. When the ADDLL is locked, the generated 4-phase clock signals are equally spaced within one input clock period. Thus, the phase shift between P90 and P360 is 1/4 of the clock period.

The proposed designs have been verified by HSPICE post-layout simulation at 1.2V. The simulation results of the proposed tunable phase shift scheme show that the delayed DQS (DQS\_D) can be adjusted toward the center of the DQ period whether DQS leads or lags DQ; as a result, the scheme can eliminate the delay mismatch from the multi-chip interconnection, as shown in Fig. 5.9. The tunable range of the phase shift is from -600ps to +400ps. For DDR2 400/800 applications, the operation range of the proposed ADDLL is from 200MHz to 400MHz, and the simulation results show that



Fig. 5.11: Jitter and phase shift of ADDLL under different PVT.

the total power consumption is 5.5mW and the peak-to-peak period jitter is 20ps at 400MHz. The phase difference between CLOCK1 (P360) and CLOCK2 (P90) is 634ps at 400MHz; hence, the phase-shift error is 1.3° (compared with 90°), as shown in Fig. 5.10. Fig. 5.11 shows the phase shift and peak-to-peak period jitter of the ADDLL under different PVT conditions and input clock frequencies. Table 5.1 compares the proposed design with state-of-the-art ADDLLs for clock generation in DDR controller applications. The proposed ADDLL has the shortest locking time, the smallest phase-shift error, and the lowest power consumption among these ADDLL designs. Furthermore, the proposed ADDLL not only has good portability, but also provides the 90° phase-shift clock within the same clock cycle.
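The phase-shift error quoted above follows from converting the simulated 634ps delay at 400MHz into degrees and comparing it with the 90° target. A minimal Python sketch of that conversion is given below; the function names are illustrative.

```python
def phase_shift_deg(delay_ps: float, freq_mhz: float) -> float:
    """Convert a delay in picoseconds to a phase shift in degrees."""
    period_ps = 1e6 / freq_mhz
    return 360.0 * delay_ps / period_ps


def phase_shift_error_deg(
    delay_ps: float, freq_mhz: float, target_deg: float = 90.0
) -> float:
    """Signed deviation of the phase shift from the target, in degrees."""
    return phase_shift_deg(delay_ps, freq_mhz) - target_deg


error = phase_shift_error_deg(634.0, 400.0)
print(f"{error:.1f} degrees")
```

At 400MHz the ideal 90° delay is 625ps, so the 9ps excess corresponds to an error of about 1.3°.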

## 5.5 Summary

In this chapter, a tunable phase shift scheme based on a fast-lock portable ADDLL and a tunable DCPS for the timing block of DDR interface solution is presented. The proposed ADDLL that employs the high-performance DCDL and

Table 5.1: ADDLL Performance Comparisons

| Performance Indices             | Proposed ADDLL [40] | VLSI-DAT'06 [37] | CICC'07 [38] | E.LETTERS'08 [39] |
|---------------------------------|---------------------|------------------|--------------|-------------------|
| Process                         | 0.13µm CMOS         | 0.13µm CMOS      | 0.13µm CMOS  | 0.18µm CMOS       |
| Supply Voltage (V)              | 1.2                 | 1.2              | 1.2          | 1.8               |
| Lock Time (clock cycles)        | 13                  | NA               | 40           | < 80              |
| Operation Range (MHz)           | 200 ~ 400           | 100 ~ 200        | 333.5 ~ 800  | 510 ~ 1100        |
| P2P Jitter (ps)                 | 20 @400MHz          | 950 @100MHz      | 40 @800MHz   | 20.4 @800MHz      |
| Phase Error (degrees)           | 1.3                 | 5.47 (7.6%)      | 2            | NA                |
| Power Consumption (mW)          | 5.5 @400MHz         | 9 @200MHz        | 19.2 @800MHz | 12 @800MHz        |
| Phase Shift within Single Cycle | Yes                 | No               | Yes          | Yes               |
| Portability                     | Yes                 | Yes              | No           | No                |

TDC can achieve fast phase lock and keep a small phase-shift error compared with other ADDLLs. The proposed phase shift scheme provides an all-digital solution that eliminates the non-ideal effects of data transmission across multi-chip interconnections, especially in high-data-rate interconnection applications.

# Chapter 6

# All Digital Synchronous Mirror Delay Design

## 6.1 Introduction

As the operating frequency of electronic systems increases, de-skew clock circuits have been widely used for clock synchronization in System-on-Chip (SoC) applications. A synchronous mirror delay (SMD) is composed of a clock driver, which drives the large clock load on the chip, and a skew-compensation circuit, which compensates for the clock skew induced by the clock driver. In contrast to the phase-locked loop (PLL) and the delay-locked loop (DLL), the SMD is more suitable for applications that require fast locking and low power consumption because of its simple circuit structure [9], [42]-[52]. However, the static phase error between the input and output clocks is hard to reduce in the conventional SMD owing to its low delay resolution.

Many SMDs have been proposed to reduce the static phase error. The interleaved type utilizes an interleaving scheme to reduce the static phase error, at the cost of increased circuit complexity and power consumption [44], [45]. The successive approximation register (SAR) SMD utilizes a phase blender to improve the delay resolution; however, it has a long lock-in time [47]. Besides, the conventional SMD accepts only a pulsed clock signal to ensure the functionality,

Table 6.1: Comparisons of Different SMD Approaches

| Performance Indices | Interleaved SMD<br>[44], [45] | SAR SMD [47] | Arbitrary Duty Cycle<br>SMD [48], [49] |
|---------------------|-------------------------------|--------------|----------------------------------------|
| Static Phase Error  | Large                         | Small        | Large                                  |
| Lock-In Time        | Short                         | Long         | Short                                  |
| Duty Cycle Range    | Narrow                        | Narrow       | Wide                                   |
| Power/Complexity    | High                          | Medium       | Medium                                 |

implying that the input clock needs to be modulated if its duty cycle is not suitable. An arbitrary duty cycle SMD [48], [49] can accept a wide input duty cycle range, but signal conflicts may occur when a high-frequency clock propagates through the long delay line. A brief summary of the different SMD approaches is given in Table 6.1.

In this chapter, the proposed all-digital SMD (ADSMD) utilizes the edge-trigger mirror delay cell (EMDC) and a blocking edge-trigger scheme to increase the input duty cycle range and avoid signal conflicts. Furthermore, the proposed fine-tuning delay line (FTDL) and delay-matching structure reduce the overall static phase error. As a result, the proposed ADSMD achieves a wide input duty cycle range while keeping a small phase error [52].

This chapter is organized as follows. Section 6.2 describes the basic concept and operation of the conventional SMD. The proposed ADSMD architecture and circuit design including delay-matching structure, blocking edge-trigger scheme, EMDC, and



Fig. 6.1: Architecture of the conventional SMD.

FTDL are described in Section 6.3. In Section 6.4, the experimental results of the proposed design are presented. Finally, a brief summary is given in Section 6.5.

## 6.2 SMD Overview

The schematic diagram of the conventional SMD is shown in Fig. 6.1. It consists of an input buffer (IB) with delay Td1, a clock driver (CD) with delay Td2, a forward delay line (FDL), a backward delay line (BDL), and a mirror control circuit (MCC). A pulsed clock propagates forward through the FDL for a time of Tck - Td1 - Td2, and then propagates backward through the BDL in the direction opposite to the FDL, where Tck is the input clock cycle time. As a result, the total delay time is Td1 + (Td1 + Td2) + (Tck - Td1 - Td2) + (Tck - Td1 - Td2) + Td2 = 2Tck. For the NAND-type mirror delay cell (MDC) in the MCC to operate correctly, the input clock should be modulated to a narrow-pulse clock to ensure that the two inputs of the MDC will not be



Fig. 6.2: (a) Architecture of the proposed SMD (b) Circuit of EMDC

overlapped at logic high within the first input clock cycle. The accuracy of the phase alignment of the SMD is dominated by the delay resolution of the delay cells in the FDL and BDL. Besides, because the gate delay of the MDC is neglected in the delay formula, it further increases the phase error of the SMD after two clock cycles.
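The delay budget of the conventional SMD can be verified numerically with the short Python sketch below, which sums the five terms of the delay formula in the order they appear. The function name and the example delay values are illustrative; the comment on the (Td1 + Td2) term reflects the usual replica-delay interpretation, which is an assumption since the block is not named in the text.

```python
def conventional_smd_total_delay(tck: int, td1: int, td2: int) -> int:
    """Sum the terms of the conventional SMD delay formula (all in ps)."""
    if tck <= td1 + td2:
        raise ValueError("clock period must exceed Td1 + Td2")
    segments = [
        td1,              # input buffer (IB)
        td1 + td2,        # (Td1 + Td2) term; assumed replica of IB and CD
        tck - td1 - td2,  # forward propagation through the FDL
        tck - td1 - td2,  # mirrored backward propagation through the BDL
        td2,              # clock driver (CD)
    ]
    return sum(segments)


# Made-up example: 2500 ps period, 120 ps input buffer, 300 ps clock driver.
print(conventional_smd_total_delay(2500, 120, 300))
```

Whatever values of Td1 and Td2 are chosen, the terms cancel so that the output edge lands exactly two clock periods after the input edge, provided the MDC gate delay is ignored as in the formula.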



Fig. 6.3: Block diagram and equivalent circuit of DCV.

## 6.3 The Proposed ADSMD Design

Fig. 6.2(a) illustrates the architecture of the proposed ADSMD, which consists of several major functional blocks: a dummy delay line (DDL), an FDL, an MCC, a BDL, an FTDL, a phase detector, and a timing controller. The circuit of the EMDC is shown in Fig. 6.2(b) [52]. Compared with the conventional SMD, the DDL of the proposed delay-matching structure additionally contains an EMDC and an FTDL to compensate for the delays of the EMDC and FTDL. As a result, the total delay time is Td1 + (Td1 + Td2 + Td3 + Td4) + (Tck - Td1 - Td2 - Td3 - Td4) + Td2 + (Tck - Td1 - Td2 - Td3 - Td4) + Td3 + Td4 = 2Tck. The locking procedure is divided into coarse and fine locking. Coarse locking takes two clock cycles, the same as in the conventional design, and leaves a maximum phase error equal to the delay resolution of the FDL and BDL. The remaining phase error is further reduced by the FTDL, which is controlled by a 3-bit fine-tuning control code (FTC). In fine locking, the FTC is changed every two clock cycles by the timing controller based on the UP/DN signals from the phase detector to control the delay of the FTDL to align



Fig. 6.4: Timing waveform (a) without blocking scheme (b) with blocking scheme.

phase between the external clock (EXT\_CLK) and the internal clock (INT\_CLK). As a result, the entire locking procedure takes 10 clock cycles $(2 + 2 \times 4)$.
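The effect of the delay-matching structure and the lock-time arithmetic can be illustrated with the following Python sketch. It is a hypothetical model: the `matched` flag, which removes the Td3 and Td4 replica from the DDL, is introduced here only to show what the compensation buys, and reading the factor 4 in $(2 + 2 \times 4)$ as the number of FTC updates is an assumption.

```python
def adsmd_total_delay(
    tck: int, td1: int, td2: int, td3: int, td4: int, matched: bool = True
) -> int:
    """Sum the terms of the ADSMD delay formula (all values in ps).

    With matched=False the DDL replicates only Td1 + Td2, which models a
    hypothetical design without the delay-matching structure.
    """
    ddl = td1 + td2 + (td3 + td4 if matched else 0)
    fdl = tck - ddl   # forward propagation time
    bdl = tck - ddl   # mirrored backward propagation time
    return td1 + ddl + fdl + td2 + bdl + td3 + td4


def adsmd_lock_cycles(
    coarse_cycles: int = 2, cycles_per_ftc_update: int = 2, ftc_updates: int = 4
) -> int:
    """Coarse locking plus fine locking with one FTC update per two cycles."""
    return coarse_cycles + cycles_per_ftc_update * ftc_updates


tck = 2500  # made-up example values, in ps
print(adsmd_total_delay(tck, 100, 300, 50, 60) - 2 * tck)         # matched
print(adsmd_total_delay(tck, 100, 300, 50, 60, False) - 2 * tck)  # unmatched
print(adsmd_lock_cycles())
```

In this model the matched structure yields exactly 2Tck, whereas leaving Td3 and Td4 out of the DDL would leave a residual skew of Td3 + Td4.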

Typically, the delay resolution of FDL is one AND gate delay which is about several hundred picoseconds depending on the technology. In order to achieve high delay resolution, the proposed FTDL employs a digitally-controlled varactor (DCV)



Fig. 6.5: Microphotograph of SMD test chip.

whose gate capacitance can be changed slightly by the FTC to change the delay of FTDL under different output loading of the driving buffer as shown in Fig. 6.3 [34]. As a result, the overall delay resolution of SMD can be improved from several hundred picoseconds to ten picoseconds.

To increase the input duty cycle range, the proposed SMD utilizes the EMDC to detect level changes at the outputs of successive delay cells in the FDL [49]. Depending on the system requirements, the FDL and BDL may need to be lengthened to achieve a wide operating frequency range. However, when a high-frequency clock propagates through a long FDL, more than one EMDC output can be at logic low, which makes the SMD operation unstable, as shown in Fig. 6.4(a). The proposed blocking edge-trigger scheme uses a blocking signal (BLK), which is set low at the second rising edge of IB\_OUT, to block clock propagation in the FDL; this avoids the signal conflict in the MCC and ensures the SMD functionality, as shown in Fig. 6.4(b).



Fig. 6.6: (a) Timing diagram of the proposed SMD (b) Acceptable Input duty cycle under different frequencies.

## 6.4 Experimental Results

A test chip of the proposed SMD has been fabricated in a 0.18µm CMOS process; the chip microphotograph is shown in Fig. 6.5. The proposed design is verified by post-layout simulation using HSPICE. Fig. 6.6(a) shows that the entire locking process takes ten clock cycles and that the total propagation delay of the SMD is adjusted by the FTC every two clock cycles, reducing the phase error to 15ps at 400MHz. Table 6.2 lists the verification results of phase error under different PVT conditions and input

clock frequencies. The proposed SMD can accept a wide input duty cycle range, from 20% to 80%, at different input clock frequencies, as shown in Fig. 6.6(b). The performance characteristics of the proposed SMD are summarized in Table 6.3.

## 6.5 Summary

The performance and application scope of the conventional SMD are limited by

Table 6.2: Phase Error Under Different PVT Conditions

| Input Clock Frequency | SS, 1.62V, 125°C | TT, 1.8V, 25°C | FF, 1.98V, -40°C |
|--------|-----------------|---------------|-----------------|
| 200MHz | 6ps             | 11ps          | 16ps            |
| 400MHz | 16ps            | 15ps          | 18ps            |

Table 6.3: ADSMD Performance Summary

| Process                    | 0.18µm CMOS |
|----------------------------|-------------|
| Supply Voltage (V)         | 1.8         |
| Operation Range (MHz)      | 200 ~ 400   |
| Input Duty Cycle Range (%) | 20 ~ 80     |
| Delay Resolution (ps)      | 10          |
| Phase Error (ps)           | 18          |
| Lock Time (clock cycles)   | 10          |
| Power Consumption (mW)     | 8.7 @400MHz |
| Area (mm²)                 | 0.08        |

the low-accuracy phase alignment and the narrow-pulse clock demand. In this chapter, three design concepts are introduced for the proposed SMD: a high-resolution delay line, a delay-matching structure, and a blocking edge-trigger scheme. The proposed high-resolution delay line and delay-matching structure reduce the phase error between the external and internal clocks, and the proposed blocking edge-trigger scheme extends the input duty cycle range without limiting the delay line length. As a result, the proposed SMD achieves a wide duty cycle range and a small static phase error compared with conventional designs, making it suitable for clock synchronization in SoC applications.



# Chapter 7

# Conclusions and Future Works

## 7.1 Conclusions

In this dissertation, a systematic all-digital design approach for implementing various high-performance, low-power clock generators for SoC applications, including the ADPLL, ADSSCG, ADDLL, and ADSMD, has been presented. The proposed DCO, which is the kernel module of the all-digital clock generators, employs a cascadable structure with coarse- and fine-tuning stages to achieve high resolution and a wide frequency range at the same time. The coarse-tuning stage utilizes a segmental delay line (SDL) to eliminate redundant power consumption, and the proposed hysteresis delay cell (HDC) reduces the circuit complexity and loading of the fine-tuning stage to further lower the power consumption.

For power management system applications, the proposed ADPLL employs a novel 2-level flash TDC to reduce the lock-in time at low hardware cost. In consumer electronics, microprocessor (µP) based systems, and data transmission circuits, reducing electromagnetic interference (EMI) is an important design topic. Based on the proposed RDTM, the spreading ratio of the proposed ADSSCG can be specified flexibly according to application demands while keeping the phase tracking capability. With the proposed low-power DCO and auto-adjustment algorithm, the overall power consumption is reduced while the delay characteristic remains monotonic.

Double data rate (DDR) memories have been widely used in high-performance systems in modern SoC designs to meet the required data bandwidth. Because the DDR memory controller needs specific clock and control signals to ensure the functionality and performance of data accesses, a tunable phase shift scheme based on an all-digital delay-locked loop (ADDLL) and a digitally-controlled phase shifter (DCPS) has been proposed in this work to solve the delay mismatch issue. In addition, memory designs utilize the synchronous mirror delay (SMD) to eliminate the clock skew caused by wire delay mismatch. The proposed all-digital SMD (ADSMD) uses edge-trigger mirror delay cells to enlarge the input duty cycle range and fine-tuning delay lines with high-resolution delay cells to reduce the static phase error.

The proposed all-digital clock generators not only use the proposed DCO/delay cell and several design techniques to enhance performance and reduce power consumption, but can also be realized with standard cells in standard CMOS processes, making them easily portable to different processes as soft intellectual property (IP). As a result, the proposed all-digital clock generators are very suitable for SoC applications as well as system-level integration.

### 7.2 Future Work

The proposed DCO employs a cascadable structure with coarse- and fine-tuning stages to achieve high resolution and a wide frequency range at the same time. However, this structure has several drawbacks. First, the controllable range of each stage must be larger than the delay step of the previous stage so that the DCO has no dead zone larger than its LSB resolution. Meeting this constraint requires over-design, which increases power and area. Second, a non-monotonic response occurs when the DCO control code crosses the boundary between tuning stages. This non-monotonicity may cause stability problems and large jitter. Recently, many researchers have proposed phase interpolation approaches to implement a monotonic DCO [53]-[56]. However, with a phase interpolator it is hard to obtain precise timing, and its power consumption is large. As a result, a new DCO structure should be proposed to overcome these design issues.
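The trade-off between the dead zone and monotonicity can be illustrated with a simple behavioral model. The sketch below is not the proposed DCO; it assumes an idealized two-stage delay line with perfectly linear stages, and all function names and numeric values (delays in picoseconds) are hypothetical.

```python
def dco_delay(coarse, fine, coarse_step, fine_step):
    """Total delay of an idealized two-stage delay line."""
    return coarse * coarse_step + fine * fine_step


def sweep(coarse_codes, fine_codes, coarse_step, fine_step):
    """Delays listed in control-code order (the fine code is the LSB field)."""
    return [dco_delay(c, f, coarse_step, fine_step)
            for c in range(coarse_codes) for f in range(fine_codes)]


def largest_gap(delays):
    """Largest spacing between adjacent achievable delays (worst dead zone)."""
    ordered = sorted(set(delays))
    return max(b - a for a, b in zip(ordered, ordered[1:]))


def is_monotonic(delays):
    """True if the delay never decreases as the control code increases."""
    return all(b >= a for a, b in zip(delays, delays[1:]))


# Assumed values in ps: 4 coarse codes, 8 fine codes, 10 ps fine step.
# Fine range is 7 * 10 = 70 ps in both cases.
under = sweep(4, 8, coarse_step=100, fine_step=10)  # fine range < coarse step
over = sweep(4, 8, coarse_step=60, fine_step=10)    # fine range > coarse step

print(largest_gap(under), is_monotonic(under))  # 30 True
print(largest_gap(over), is_monotonic(over))    # 10 False
```

With the fine range smaller than the coarse step, the model stays monotonic but leaves a 30 ps dead zone, three times the 10 ps LSB. Over-designing the fine range removes the dead zone, but the delay then drops when the code crosses a coarse boundary, which is the non-monotonic behavior described above.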

Furthermore, as the operating frequency of the clock generator increases, several design considerations need more attention to ensure performance and functionality. First, because the tolerance to duty cycle variation becomes small, the clock generator should embed a duty cycle corrector (DCC) to maintain the duty cycle of its output. Second, to reach a high operating frequency, the clock generator may be implemented in an advanced process. It will then encounter many non-ideal effects, such as large leakage current and heavy wire loading as the chip area increases. Thus, designing a clock generator in nanometer processes will be a great challenge. Third, because SoC designs are becoming more complex, the clock generator needs high immunity to PVT variations to ensure performance and functionality. Previous work has only proposed a compensation scheme for supply voltage variation [53]. To obtain a more robust clock generator for high-frequency SoC applications, increasing the immunity to PVT variations is an important topic for future research. In addition to these design issues, low-power design techniques such as voltage-domain partitioning and dynamic voltage scaling can be applied to the all-digital clock generator to further reduce power consumption.

As IC technology advances rapidly, computing systems and high-speed serial links require very high communication bandwidth. Currently, the data rate of serial links is higher than 5 Gb/s [5], [57]. In serial-data-transmission systems, only the data signal is transmitted to save cost, so the receiver must be capable of recovering both the clock and the data from the received serial-data stream. Thus, high-data-rate clock and data recovery (CDR) circuits, which recover the received data and clock, are essential for such applications.

Many high-speed CDRs based on PLL/DLL architectures have been proposed [58]-[60]. However, designing and implementing a high-speed CDR with an all-digital approach is still a challenge. Thus, our future research will focus on high-speed all-digital CDR design.

# References

- [1] T. Olsson and P. Nilsson, "A digitally controlled PLL for SoC applications," *IEEE J. Solid-State Circuits*, vol. 39, no. 5, pp. 751-760, May 2004.
- [2] H. -T. Ahn and D. J. Allstot, "A low-jitter 1.9-V CMOS PLL for UltraSPARC microprocessor applications," *IEEE J. Solid-State Circuits*, vol. 35, no. 3, pp. 450-454, Mar. 2000.
- [3] T. Fischer, J. Desai, B. Doyle, S. Naffziger, and B. Patella, "A 90-nm variable frequency clock system for a power-managed Itanium architecture processor," *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 218-228, Jan. 2006.
- [4] N. Kurd, P. Mosalikanti, M. Neidengard, J. Douglas, and R. Kumar, "Next generation Intel® Core<sup>TM</sup> micro-architecture (Nehalem) clocking," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1121-1129, Apr. 2009.
- [5] Serial ATA III Electrical Specification Revision 1.0, May 2009.
- [6] K. B. Hardin, J. T. Fessler, and D. R. Bush, "Spread-spectrum clock generation for the reduction of radiated emissions," in *Proc. IEEE Int. Symp. Electromagnetic Compatibility*, pp. 227–231, 1994.
- [7] JEDEC, "DDR2 SDRAM Specification," JESD79-2E, Apr. 2008.
- [8] C. -C. Chung and C. -Y. Lee, "A new DLL-based approach for all-digital multiphase clock generation," *IEEE J. Solid-State Circuits*, vol. 39, no. 3, pp. 469-475, Mar. 2004.
- [9] T. Saeki et al., "A 2.5-ns clock access, 250-MHz, 256-Mb SDRAM with synchronous mirror delay," *IEEE J. Solid-State Circuits*, vol. 31, no. 12, pp. 1656-1668, Dec. 1996.
- [10] J. Dunning, G. Garcia, J. Lundberg, and E. Nuckolls, "An all-digital phase-locked loop with 50-cycle lock time suitable for high-performance microprocessors," *IEEE J. Solid-State Circuits*, vol. 30, pp. 412-422, Apr. 1995.
- [11] C. -C. Chung and C. -Y. Lee, "An all digital phase-locked loop for high-speed clock generation," *IEEE J. Solid-State Circuits*, vol. 38, no. 2, pp. 347-351, Feb. 2003.

- [12] D. Sheng, C. -C. Chung and C. -Y. Lee, "An all-digital phase-locked loop with high-resolution for SoC applications," *IEEE VLSI-DAT*, pp. 207-210, Apr. 2006.
- [13] D. Sheng, C. -C. Chung and C. -Y. Lee, "A fast-lock-in ADPLL with high-resolution and low-power DCO for SoC applications," *IEEE Asia Pacific Conf. on Circuits and Systems*, pp. 105-108, Dec. 2006.
- [14] R. B. Staszewski, D. Leipold, K. Muhammad, and P. T. Balsara, "All-digital PLL with ultra fast settling," *IEEE Trans. Circuits and Syst. II*, Express Briefs, vol. 54, no. 2, pp. 181-185, Jan. 2007.
- [15] M. Maymandi-Nejad and M. Sachdev, "A monotonic digitally controlled delay element," *IEEE J. Solid-State Circuits*, vol. 40, no. 11, pp. 2212-2219, Nov. 2005.
- [16] R. B. Staszewski, D. Leipold, K. Muhammad, and P. T. Balsara, "Digitally controlled oscillator (DCO)-based architecture for RF frequency synthesis in a deep-submicrometer CMOS process," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process*, vol. 50, no. 11, pp. 815-828, Nov. 2003.
- [17] E. Roth, M. Thalmann, N. Felber, and W. Fichtner, "A delay-line based DCO for multimedia applications using digital standard cells only," *IEEE International Solid-State Circuits Conference*, 2003, Digest of Technical Papers, pp. 432-433, Feb. 2003.
- [18] P. -L. Chen, C. -C. Chung and C. -Y. Lee, "A portable digitally controlled oscillator using novel varactors," *IEEE Trans. Circuits and Syst. II*, Express Briefs, vol. 52, no. 5, pp. 233-237, May 2005.
- [19] J. M. Rabaey, *Digital Integrated Circuits—A Design Perspective*, 2nd ed. NJ: Prentice-Hall, 2003.
- [20] I. Hwang, S. Lee, and S. Kim, "A digitally controlled phase-locked loop with fast locking scheme for clock synthesis applications," *IEEE International Solid-State Circuits Conference*, 2000, Digest of Technical Papers, pp.168-169, Feb. 2000.
- [21] J. Lee and B. Kim, "A low-noise fast-lock phase-locked loop with adaptive bandwidth control," *IEEE J. Solid-State Circuits*, vol. 35, no. 8, pp. 1137-1145, Aug. 2000.

- [22] T. Watanabe and S. Yamauchi, "An all-digital PLL for frequency multiplication by 4 to 1022 with seven-cycle lock time," *IEEE J. Solid-State Circuits*, vol. 38, no. 2, pp. 198-204, Feb. 2003.
- [23] C. T. Gray, W. Liu, W. A. M. Van Noije, T. A. Hughes, Jr., and R. K. Cavin, III, "A sampling technique and its CMOS implementation with 1 Gb/s bandwidth and 25 ps resolution," *IEEE J. Solid-State Circuits*, vol. 29, no. 3, pp. 340-349, Mar. 1994.
- [24] P. M. Levine and G. W. Roberts, "High-resolution flash time-to-digital conversion and calibration for system-on-chip testing," *IEE Proc. Comput. Digit. Tech.*, vol. 152, no. 3, pp. 415-426, May 2005.
- [25] P. Dudek, S. Szczepański, and J. V. Hatfield, "A high-resolution CMOS time-to-digital converter utilizing a Vernier delay line," *IEEE J. Solid-State Circuits*, vol. 35, no. 2, pp. 240-247, Feb. 2000.
- [26] T. Yoshikawa, T. Hirata, T. Ebuchi, T. Iwata, Y. Arima, and H. Yamauchi, "An over-1-Gb/s transceiver core for integration into large system-on-chips for consumer electronics," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 16, no. 9, pp. 1187-1198, Sep. 2008.
- [27] K. B. Hardin, J. T. Fessler, and D. R. Bush, "A study of the interference potential of spread-spectrum clock generation techniques," *IEEE Int. Symp. Electromagnetic Compatibility*, pp. 624-639, 1995.
- [28] H. -H. Chang, I. -H. Hua, and S. -I. Liu, "A spread-spectrum clock generator with triangular modulation," *IEEE J. Solid-State Circuits*, vol. 38, no. 4, pp. 673-677, Apr. 2003.
- [29] Y. -B. Hsieh and Y. -H. Kao, "A fully integrated spread-spectrum clock generator by using direct VCO modulation," *IEEE Trans. Circuits and Syst. I, Reg. Papers*, vol. 55, no. 7, pp. 1845-1853, Aug. 2008.
- [30] H. R. Lee, O. Kim, G. Ahn, and D. K. Jeong, "A low-jitter 5000 ppm spread spectrum clock generator for multi-channel SATA transceiver in 0.18 μm CMOS," *IEEE International Solid-State Circuits Conference*, 2005, Digest of Technical Papers, pp. 162–163, Feb. 2005.
- [31] Y. -B. Hsieh and Y. -H. Kao, "A spread-spectrum clock generator using fractional-N PLL with an extended range $\Sigma\Delta$ modulator," *IEICE Trans. Electron.*, vol. E89-C, no. 6, pp. 581-857, Jun. 2006.

- [32] S. Damphousse, K. Ouici, A. Rizki, and M. Mallinson, "All digital spread spectrum clock generator for EMI reduction," *IEEE J. Solid-State Circuits*, vol. 42, no. 1, pp. 145-150, Jan. 2007.
- [33] D. Sheng, C. -C. Chung and C. -Y. Lee, "An all digital spread spectrum clock generator with programmable spread ratio for SoC applications," *IEEE Asia Pacific Conf. on Circuits and Systems*, pp. 850-853, Nov. 2008.
- [34] D. Sheng, C. -C. Chung and C. -Y. Lee, "An ultra-low-power and portable digitally controlled oscillator for SoC applications," *IEEE Trans. Circuits and Syst. II, Exp. Briefs*, vol. 54, no. 11, pp. 954-958, Nov. 2007.
- [35] Encoding Parameters of Digital Television for Studios. ITU-R Recommendation BT.601, Recommendations of the ITU, Radiocommunication Sector.
- [36] T. Yoshimura, Y. Nakase, N. Watanabe, Y. Morooka, Y. Matsuda, M. Kumanoya, and H. Hamano, "A delay-locked loop and 90-degree phase shifter for 100 Mbps double data rate memories," *Symposium on VLSI Circuits Dig. Tech. Papers*, pp. 66–67, June 1998.
- [37] C. -C Chung, P. -L, C. -Y Lee, "An all-digital delay-locked loop for DDR SDRAM controller applications," *IEEE VLSI-DAT*, pp. 199-202, Apr. 2006.
- [38] J. -H Bae, J. -H Seo, H. -S Yeo, J. -W Kim, J. -Y Sim, and H. -J Park, "An all-digital 90-degree phase-shift DLL with loop-embedded DCC for 1.6Gbps DDR interface," *CICC Dig. Tech. Papers*, pp. 373–376, September 2007.
- [39] K. -I. Oh, L. -S. Kim, K. -I. Park, Y. -H. Jun and K. Kim, "Low-jitter multi-phase digital DLL with closest edge selection scheme for DDR memory interface," *Electron. Lett.*, vol. 44, no. 19, pp. 1121–1123, Sep. 2008.
- [40] D. Sheng, C. -C. Chung and C. -Y. Lee, "Fast-lock all-digital DLL and digitally-controlled phase shifter for DDR controller applications," *IEICE Electronics Express*, vol. 7, no. 9, pp. 634-639, May 2010.
- [41] C. -T. Wu, W. Wang, I. -C. Wey, and A. -Y. Wu, "A scalable DCO design for portable ADPLL designs," *IEEE International Symposium on Circuits and Systems*, pp. 5449-5452, May 2005.
- [42] D. Shim, D.-Y. Lee, S. Jung, C.-H. Kim, and W. Kim, "An analog synchronous mirror delay for high-speed DRAM application," *IEEE J. Solid-State Circuits*, vol. 34, no. 4, pp. 484-493, Apr. 1999.

- [43] J. -J. Kim, S. -B. Lee, T. -S. Jung, C. -H. Kim, S. -I. Cho, and B. Kim, "A low-jitter mixed-mode DLL for high-speed DRAM applications," *IEEE J. Solid-State Circuits*, vol. 35, no. 10, pp. 1430-1436, Oct. 2000.
- [44] T. Saeki, H. Nakamura, and J. Shimizu, "A 10 ps jitter 2 clock cycle lock time CMOS digital clock generator based on an interleaved synchronous mirror delay scheme," *IEEE Symposium on VLSI Circuits, Digest of Technical Papers*, pp. 109–110, 1997.
- [45] K. Sung, B. -D. Yang, and L. -S. Kim, "Low power clock generator based on an area-reduced interleaved synchronous mirror delay scheme," *IEEE International Symposium on Circuits and Systems*, pp. 671–674, May 2002.
- [46] T. Saeki, K. Minami, H. Yoshida, and H. Suzuki, "A direct-skew-detect synchronous mirror delay for application-specific integrated circuits," *IEEE J. Solid-State Circuits*, vol. 34, no. 3, pp. 372–379, Mar. 1999.
- [47] K. Sung, and L. -S. Kim, "A high-resolution synchronous mirror delay using successive approximation register," *IEEE J. Solid-State Circuits*, vol. 39, no. 11, pp. 1997-2004, Nov. 2004.
- [48] C. -L. Hung, C. -L. Wu, and K. -H. Cheng, "Arbitrary duty cycle synchronous mirror delay circuits design," *IEEE Asian Solid-State Circuits Conference*, pp. 283-286, Nov. 2006.
- [49] K.-H. Cheng, C.-W. Su and S.-W. Lu, "Wide-range synchronous mirror delay with arbitrary input duty cycle," *Electronics Letters*, vol. 44, no. 11, pp. 665-667, May 2008.
- [50] Y. -M. Wang and J. -S. Wang, "A low-power half-delay-line fast skew-compensation circuit," *IEEE J. Solid-State Circuits*, vol. 39, no. 6, pp. 906-918, Jun. 2004.
- [51] J. -S. Wang, C. -Y. Cheng, J. -C. Liu, Y. -C. Liu, and Y. -M. Wang, "A duty-cycle-distortion-tolerant half-delay-line low-power fast-lock-in all-digital delay-locked loop," *IEEE J. Solid-State Circuits*, vol. 45, no. 5, pp. 1036-1047, May 2010.
- [52] D. Sheng, C. -C. Chung, and C. -Y. Lee, "Wide duty cycle range synchronous mirror delay designs," *Electronics Letters*, vol. 46, no. 5, pp. 338-340, Mar. 2010.

- [53] B. -M. Moon, Y. -J. Park and D. -K. Jeong, "Monotonic wide-range digitally controlled oscillator compensated for supply voltage variation," *IEEE Trans. Circuits and Syst. II, Exp. Briefs*, vol. 55, no. 10, pp. 1036-1040, Oct. 2008.
- [54] K. -H. Choi, J. -B. Shin, J. -Y. Sim, and H. -J. Park, "An interpolating digitally controlled oscillator for a wide-range all-digital PLL," *IEEE Trans. Circuits and Syst. I, Reg. Papers*, vol. 56, no. 9, pp. 2055-2063, Sep. 2009.
- [55] M. Combes, K. Dioury, and A. Greiner, "A portable clock multiplier generator using digital CMOS standard cells," *IEEE J. Solid-State Circuits*, vol. 31, no. 7, pp. 958–965, Jul. 1996.
- [56] B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang, C. V. Tran, C. L. Portmann, D. Stark, Y. -F. Chan, T. H. Lee, and M. A. Horowitz, "A portable digital DLL for high-speed CMOS interface circuits," *IEEE J. Solid-State Circuits*, vol. 34, no. 5, pp. 632–644, May 1999.
- [57] Universal Serial Bus Specification Revision 3.0, Nov. 2008.
- [58] P. K. Hanumolu, G. -Y. Wei, and U. -K. Moon, "A wide-tracking range clock and data recovery circuit," *IEEE J. Solid-State Circuits*, vol. 43, no. 2, pp. 425-439, Feb. 2008.
- [59] Y. -S. Seo, J. -W. Lee, H. -J. Kim, C. Yoo, J. -J. Lee, and C. -S. J, "A 5-Gbit/s clock- and data-recovery circuit with 1/8-rate linear phase detector in 0.18-um CMOS technology," *IEEE Trans. Circuits and Syst. II, Exp. Briefs*, vol. 56, no. 1, pp. 6-10, Jan. 2009.
- [60] T. Toifl, C. Menolfi, P. Buchmann, M. Kossel, T. Morf, R. Reutmann, M. Ruegg, M. Schmatz, and J. Weiss, "0.94 ps-rms-jitter 0.016 mm<sup>2</sup> 2.5 GHz multi-phase generator PLL with 360° digitally programmable phase shift for 10 Gb/s serial links," *IEEE International Solid-State Circuits Conference, Digest of Technical Papers*, pp. 410-411, Feb. 2005.