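## National Chiao-Tung University

# Department of Electronics Engineering and Institute of Electronics

Doctoral Dissertation

Student: Ju-Ming Chou

Advisor: Jieh-Tsorng Wu

June 2007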

## **High-Precision Multi-Phase Clock Generation**

| Student | : | Ju-Ming Chou   |
|---------|---|----------------|
| Advisor | : | Jieh-Tsorng Wu |

Submitted to the Department of Electronics Engineering and Institute of Electronics, College of Electrical and Computer Engineering, National Chiao-Tung University, in partial fulfillment of the requirements for the degree of **Doctor of Philosophy** in **Electronics Engineering**

June 2007

Hsin-Chu, Taiwan, Republic of China

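## High-Precision Multi-Phase Clock Generation

Student: Ju-Ming Chou

Advisor: Jieh-Tsorng Wu

#### National Chiao-Tung University

College of Electrical and Computer Engineering, Department of Electronics Engineering and Institute of Electronics

#### Chinese Abstract

Multi-phase clocks are commonly used in timing recovery circuits, phase/frequency modulation circuits, and time-interleaved circuits. The performance of these circuits is mainly determined by the resolution of the multi-phase clocks. In other words, the number and the accuracy of the clock phases determine system performance, because these phases are all used to control the data processing sequence. As the operating frequency increases, the time period available for data processing shrinks accordingly. If circuit mismatch occurs, the timing margin of the clock signals is reduced, which makes data processing more difficult. As process technology continues to advance, circuit mismatch becomes more severe because circuit elements shrink.

This thesis describes a multi-phase clock generation technique that uses resistor strings and resistor rings for phase averaging and phase interpolation. Phase averaging reduces phase errors, and phase interpolation increases the number of available phases. Apart from the waveform shape, the performance of phase averaging and phase interpolation is determined by the clock frequency normalized by the time constant of the phase-averaging circuit. Higher phase accuracy requires a smaller time constant. If the system can generate accurate multi-phase clock signals, folders can be used to generate frequency-multiplied clock signals. To verify the phase averaging and phase interpolation capability of the resistor ring, a digital-to-phase converter was designed in a standard 0.35 µm CMOS technology. Measurement results show that the digital-to-phase converter using the phase averaging and phase interpolation techniques achieves 8-bit resolution.

A circuit technique that uses pre-charged delay units to adjust the delay time of the output phases of a clock generator is also described in this thesis. The delay tuning mechanism works by changing the charging and discharging behavior at the internal node of the delay unit. The linearity of this mechanism under various conditions is described in detail. The delay unit requires digital-to-analog converters to generate the pre-charging voltage levels, and combinational logic gates to generate timing control signals from the clock generator outputs. To verify the function of the delay unit, a phase-locked loop was designed in a standard 0.18 µm CMOS technology. Measurement results show that the delay unit has a resolution of 0.145 psec and a total controllable delay range of 69.78 psec.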



## **High-Precision Multi-Phase Clock Generation**

Student : Ju-Ming Chou

Advisor : Jieh-Tsorng Wu

#### National Chiao-Tung University

Department of Electronics Engineering and Institute of Electronics

#### Abstract



Multi-phase clocks are used in applications such as timing recovery, phase/frequency modulation and demodulation, and time-interleaved systems. The performance of those systems is mainly determined by the resolution of the available clock phases, i.e., how many phases are available and how accurate they are, because the output phases of a multi-phase clock generator drive external circuits that control the data processing sequence. As the processing frequency increases, the period available for data processing becomes shorter. In addition, if the circuits in the clock generator are mismatched because of external or internal conditions, the timing margin of the clock signals narrows and data processing becomes considerably more difficult. Furthermore, as fabrication processes continue to advance, circuit mismatch worsens because circuit elements shrink.

This thesis describes circuit techniques using resistor strings (R-strings) and resistor rings (R-rings) for phase averaging and interpolation. Phase averaging can reduce phase errors, and phase interpolation can increase the number of available phases. In addition to the waveform shape, the averaging and interpolation performance of the R-strings and R-rings is determined by the clock frequency normalized by an RC time constant of the circuits. To attain better phase accuracy, a smaller RC time constant is required, but at the expense of larger power dissipation. If multi-phase clocks are available, folders can be used for frequency multiplication. To demonstrate the resistor ring's capability of phase averaging and interpolation, an 8b 125 MHz digital-to-phase converter (DPC) was designed and fabricated using a standard 0.35 $\mu$m SPQM CMOS technology. Measurement results show that the DPC attains 8-bit resolution using the proposed phase averaging and interpolation technique.

Circuit techniques using variable pre-charged delay units (VPDUs) for adjusting the time delay of the output phases of clock generators are also described in this thesis. The delay tuning mechanism is realized by changing the charging and discharging behavior at the internal node of the VPDU. The linearity of delay tuning under different conditions is described. VPDUs require digital-to-analog converters to provide the pre-charging voltages, and combinational logic gates to generate timing control signals from the clock generator outputs. To demonstrate the VPDU's capability, an 8-channel 1 GHz phase-locked loop was fabricated using a standard 0.18 $\mu$m CMOS technology. The digitally-controlled VPDU has a delay control resolution of 0.145 psec and a total control range of 69.78 psec.


## Acknowledgements

First, I would like to thank my advisor, Prof. Jieh-Tsorng Wu, for his support and guidance in my research. Whenever I encountered difficulties or problems, he was always patient in giving me direction and encouragement. I would also like to thank Prof. Chung-Yu Wu, Prof. Jiin-Chuan Wu, Prof. Ming-Dou Ker, and Prof. Wei-Zen Chen for their courses and their encouragement.

I wish to thank Mr. Cheng-Chung Hsu and Mr. Hung-Chih Liu for discussions on design techniques. I wish to extend my gratitude to my classmates, Mr. Kuo-Chun Hsu, Miss Zwei-Mei Lee, Mr. Chung-Yun Chou, Miss Li-Ju Lin, Mr. Kuan-Hsun Huang, and Mr. Feng-Fei Ma, for discussions on coursework. I also thank Mr. Chang-Tsung Fu, Mr. Jen-Lin Fan, Miss Chai Yun, Mr. Chi-Wei Fan, Mr. Chun-Cheng Huang, Mr. Chung-Yi Wang, Mr. Wei-Hsin Tseng, Mr. Yung-Hui Chung, Mr. Tzu-Chang Wang, Mr. Bing-Nang Fang, Mr. Su-Hao Wu, Mr. Jih-Ming Chang, and Mr. Cheng-Chan Tien for the wonderful research life in 307 Lab. I thank the Chip Implementation Center for chip fabrication.

Finally, I would like to express my greatest appreciation to my mother, my brother, and my girlfriend, Miss Hsin-Yi Lin, for their unconditional support and encouragement.

JU-MING CHOU

National Chiao-Tung University, June 2007



# Contents

| No.        | Title                                           |
|------------|-------------------------------------------------|
|            | Chinese Abstract                                |
|            | English Abstract                                |
|            | Acknowledgements                                |
|            | List of Tables                                  |
|            | List of Figures                                 |
| 1          | Introduction                                    |
| 1.1        | Motivation                                      |
| 1.2        | Organization                                    |
| 2          | Multi-Phase Clock Generation                    |
| 2.1        | Introduction                                    |
| 2.2        | Phase-Locked Loop and Delay-Locked Loop         |
| 2.2.1      | Performance Limitation                          |
| 2.3        | Extra Clock Phase Generation                    |
| 2.3.1      | Two-Dimensional Array Oscillator                |
| 2.3.2      | Phase Interpolation                             |
| 2.3.3      | Delay-Locked Loop Array                         |
| 2.4        | Phase Accuracy Enhancement Technique            |
| 2.4.1      | Self-Calibrated Phase-Locked Loop               |
| 2.4.2      | Self-Calibrated Delay-Locked Loop               |
| 2.4.3      | Shifted-Averaging Voltage-Controlled Delay Line |
| 2.5        | Jitter in Clock Generator                       |
| 2.6        | Summary                                         |
| 3          | Phase Processing Using Resistor Strings         |
| 3.1        | Introduction                                    |
| 3.2        | Phase Averaging Using R-String                  |
| 3.3        | Phase Interpolation Using R-String              |
| 3.4        | Resistor Rings                                  |
| 3.5        | Frequency Multipliers                           |
| 3.6        | Summary                                         |
| 4          | An 8b 125MHz Digital-to-Phase Converter         |
| 4.1        | Introduction                                    |
| 4.2        | Architecture of Digital-to-Phase Converter      |
| 4.2.1      | Fully-Differential Delay Cell                   |
| 4.2.2      | Isolation Buffer                                |
| 4.3        | Experimental Results                            |
| 4.4        | Summary                                         |
| 5          | High-Resolution Phase Adjusting Technique       |
| 5.1        | Introduction                                    |
| 5.2        | Phase Adjusting Techniques                      |
| 5.3        | Variable Pre-Charged Delay Unit                 |
| 5.4        | An 8-Channels 1GHz Phase-Locked Loop            |
| 5.5        | Experimental Results                            |
| 5.6        | Summary                                         |
| 6          | Conclusions                                     |
| 6.1        | Summary                                         |
| 6.2        | Recommendations for Future Investigation        |
| Appendix A | Linear Models of PLLs/DLLs                      |
| Appendix B | Resistor String's Frequency Response            |
| Appendix C | Sub-Harmonic Phase Interpolation Technique      |
|            | Bibliography                                    |
|            | Vita                                            |





# **List of Tables**

| Table | Caption                                                  |
|-------|----------------------------------------------------------|
| 4.1   | Performance summary of the 8b digital-to-phase converter |
| 5.1   | Performance summary of the 8-channels phase-locked loop  |
| 5.2   | Performance comparison                                   |





# List of Figures

| Figure | Caption |
|--------|---------|
| 1.1  | Time-interleaved system. |
| 2.1  | Block diagrams of delay cell based multi-phase clock generators. (a) phase-locked loop, (b) delay-locked loop. |
| 2.2  | A basic phase-locked loop with eighteen output phases. |
| 2.3  | Output phases of a nine-stages ring oscillator. (a) ideal environment, (b) real environment. |
| 2.4  | Output phases of a delay line. |
| 2.5  | A 3x3 coupled ring oscillator array. |
| 2.6  | Operation modes of a 3x3 coupled ring oscillator array. |
| 2.7  | A phase interpolator with 360° tuning range. |
| 2.8  | 4X phase interpolation with phase blender circuits. |
| 2.9  | Phase blender for phase-resolution improvement. (a) simple model of phase-blending inverters, (b) phase-blender output with $w = 0.5$, (c) phase-blender output with optimal $w$. |
| 2.10 | A delay-locked loop array. |
| 2.11 | A self-calibrated phase-locked loop. |
| 2.12 | A self-calibrated delay-locked loop. |
| 2.13 | A eight-stage shifted-averaging voltage-controlled delay line. |
| 3.1  | A simple delay line. |
| 3.2  | A delay line with a R-string. |
| 3.3  | A delay line with a R-string and isolation buffers. |
| 3.4  | A simplified model for analyzing a delay line with a R-string. |
| 3.5  | R-string frequency response at locations $x = 0$, $x = \pm 1$, and $x = \pm 2$. |
| 3.6  | R-string's space response of magnitude for $\beta = 1, 10^{-1}, 10^{-2}$. |
| 3.7  | R-string's space response of phase for $\beta = 1, 10^{-1}, 10^{-2}$. |
| 3.8  | R-string's INL reduction factor, $\mathcal{R}_{INL}$, versus $\beta$. |
| 3.9  | R-string's DNL reduction factor, $\mathcal{R}_{DNL}$, versus $\beta$. |
| 3.10 | R-string voltage response for different input phase spacing. |
| 3.11 | Phase interpolation using R-string. |
| 3.12 | Phase error of a 16X R-string phase interpolator. |
| 3.13 | Phase interpolation and averaging using R-ring. |
| 3.14 | A frequency tripler using 3X folders. |
| 3.15 | A three-input folder for the frequency tripler. |
| 4.1  | An 8b digital-to-phase converter. |
| 4.2  | The effect of duty cycle. (a) 50%, (b) 62.5%. |
| 4.3  | A fully-differential delay cell. |
| 4.4  | A self-biased replica-feedback bias circuit. |
| 4.5  | A fully-differential amplifier with cross-coupled and diode-connected loads. |
| 4.6  | A isolation buffer. |
| 4.7  | Phase error of the R-Ring 2 phase interpolator. |
| 4.8  | Chip micrograph of the digital-to-phase converter. |
| 4.9  | Measured output jitter of the digital-to-phase converter. |
| 4.10 | (a) Measured transfer characteristics of the digital-to-phase converter. (b) Measured INL. (c) Measured DNL. |
| 4.11 | Measurement setup of the digital-to-phase converter. |
| 5.1  | A phase interpolator. |
| 5.2  | A simple digitally controlled delay element with $RC$ time constant variation techniques. |
| 5.3  | A simple digitally controlled delay element with current variation techniques. |
| 5.4  | Simulated transfer curves of 6-bits digitally controlled delay elements. |
| 5.5  | (a) Schematic of variable pre-charged delay units. (b) Timing diagram of the VPDU's operation. |
| 5.6  | A multi-phase clock generator with variable pre-charged delay units. |
| 5.7  | $\phi_1$ - $V_{o,1}$ signal path. |
| 5.8  | Chip micrograph of the 8-channels phase-locked loop. |
| 5.9  | Measured jitter performance of the 8-channels phase-locked loop. |
| 5.10 | Measured transfer curves of fine VPDU. |
| 5.11 | Measured DNL and INL of a fine VPDU with $0\ \mathrm{V} \le V_L \le 0.5\ \mathrm{V}$ and $1.3\ \mathrm{V} \le V_H \le 1.8\ \mathrm{V}$. |
| 5.12 | Measured DNL and INL of a fine VPDU with $0\ \mathrm{V} \le V_L \le 0.6\ \mathrm{V}$ and $1.2\ \mathrm{V} \le V_H \le 1.8\ \mathrm{V}$. |
| 5.13 | Simulated transfer curves of 6-bits digitally controlled delay elements. |
| 5.14 | Measurement setup of the 8-channels phase-locked loop. |
| A.1  | Linear models of PLLs and DLLs. (a) PLL without charge pump, (b) PLL with charge pump, (c) DLL without charge pump, (d) DLL with charge pump. |
| C.1  | A simple mixer. |
| C.2  | Simulated transfer curves of normalized output phase versus input amplitude difference. |
| C.3  | Simulated transfer curves of normalized output amplitude versus input amplitude difference. |

# **Chapter 1**

## Introduction


## **1.1 Motivation**

With the development of communication technology, a growing number of publications address the generation of multi-phase clocks in complementary metal-oxide-semiconductor (CMOS) integrated circuit design [1] [2] [3], i.e., the generation of multiple periodic clock waveforms with different phases that equally divide the period of an input reference clock. Multi-phase clocks are used in applications such as timing recovery [4] [5] [6], phase/frequency modulation and demodulation [7] [8] [9], and time-interleaved systems [10] [11] [12] [13] [14]. The performance of those systems is mainly determined by the resolution of the available clock phases, i.e., how many phases are available and how accurate they are, because the output phases of a multi-phase clock generator drive external circuits that control the data processing sequence.

Figure 1.1: Time-interleaved system.

Fig. 1.1 shows the block diagram of a time-interleaved system. The system uses all of the output phases of a multi-phase clock generator to sample the input data and then combines the samples through a multiplexer. The number of output phases and the operating frequency of the clock generator determine the processing frequency of the time-interleaved system. As the processing frequency increases, the period available for data processing becomes shorter. In addition, if the circuits in the clock generator are mismatched because of external or internal conditions, the timing margin of the clock signals narrows and data processing becomes considerably more difficult. Furthermore, as fabrication processes continue to advance, circuit mismatch worsens because circuit elements shrink.
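The effect of phase mismatch on timing margin can be illustrated with a minimal sketch. It assumes an N-phase clock whose ideal edges are spaced T/N apart and treats each phase's error as a fixed time offset. The function name `min_edge_spacing` and the numeric values are hypothetical and chosen only for illustration.

```python
def min_edge_spacing(period, phase_errors):
    """Return the smallest spacing between adjacent sampling edges.

    period       -- clock period T in seconds
    phase_errors -- per-phase timing errors in seconds; its length is N
    """
    n = len(phase_errors)
    ideal_step = period / n
    # Actual edge time of phase k = ideal position + its timing error.
    edges = [k * ideal_step + e for k, e in enumerate(phase_errors)]
    # Include the wrap-around from the last phase to phase 0 of the next period.
    edges.append(edges[0] + period)
    return min(b - a for a, b in zip(edges, edges[1:]))


# Illustrative values: four phases of a 1 GHz clock (ideal spacing 250 ps).
ideal_margin = min_edge_spacing(1e-9, [0.0, 0.0, 0.0, 0.0])
skewed_margin = min_edge_spacing(1e-9, [0.0, 20e-12, -10e-12, 0.0])
```

In this example the skewed phases reduce the worst-case spacing from 250 ps to 220 ps, and the same absolute errors consume a larger share of the spacing as the clock frequency or the number of phases grows.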

Multi-phase clocks are usually generated using a delay line consisting of cascaded delay cells, whose delay time is controlled by either a delay-locked loop or a phase-locked loop. If the delay cells are identical, then their outputs have identical waveform shapes but different phases. The number of available phases is the number of delay cells that constitute one (or half) clock period. The accuracy of the output phases is determined by the matching properties of the delay cells. At high clock frequencies, the available phases are limited by the minimum delay of the delay cells.
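As a rough worked example of the last point, the number of phases a delay line can provide is bounded by how many minimum-delay cells fit into the span it covers. The sketch below uses a hypothetical helper `max_phases` and illustrative delay values that do not come from this thesis.

```python
import math


def max_phases(clock_freq_hz, min_cell_delay_s, span_fraction=1.0):
    """Upper bound on the number of delay-cell output phases.

    span_fraction -- 1.0 if the cells span a full clock period,
                     0.5 if they span half a period.
    """
    span = span_fraction / clock_freq_hz
    # Small tolerance so an exact fit (e.g. ten 100 ps cells in 1 ns)
    # is not lost to floating-point rounding.
    return math.floor(span / min_cell_delay_s + 1e-9)
```

Under these assumptions, a cell with a 100 ps minimum delay allows at most 10 phases across one period at 1 GHz, and only 5 at 2 GHz.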

It is possible to attain phase resolution beyond the phase quantization step set by the delay cells. One novel scheme uses two-dimensional array oscillators. The phase resolution is increased by the number of coupled rings at the expense of larger chip area and power dissipation. An alternative is to use phase interpolators, which combine two clock waveforms of different phases to generate a new one. The resulting phase is determined by the combination weighting of the two inputs. In CMOS technologies, phase interpolators are usually realized using two source-coupled pairs sharing the same output port, and the ratio of their tail currents sets the combination weighting. However, the relationship between the output phase and the current ratio is nonlinear and sensitive to other factors such as the source-coupled pair's transconductance characteristics, the waveform shape of the inputs, and the pole frequency of the output port. Phase accuracy can be improved by using cascaded arrays of identical phase interpolators with fixed combination weighting. In this scheme, each phase interpolator is optimized to produce an output whose phase is located at the center of the two input phases.
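The nonlinearity appears even in an idealized case. The sketch below assumes purely sinusoidal inputs and an ideal weighted sum, which is a simplification because real interpolators also depend on the factors listed above. It computes the output phase lag as a function of the weight `w`. The function names and the 90° input spacing are illustrative assumptions.

```python
import math


def interpolated_phase_deg(w, spacing_deg):
    """Phase lag in degrees of w*sin(x) + (1 - w)*sin(x - spacing).

    Each input is treated as a phasor, so the sum is
    w + (1 - w) * exp(-j * spacing).
    """
    d = math.radians(spacing_deg)
    real = w + (1.0 - w) * math.cos(d)
    imag = -(1.0 - w) * math.sin(d)
    return -math.degrees(math.atan2(imag, real))


def linear_phase_deg(w, spacing_deg):
    """Phase lag that an ideally linear interpolator would produce."""
    return (1.0 - w) * spacing_deg
```

With inputs 90° apart, a weight of 0.75 gives a lag of about 18.4° rather than the 22.5° a linear relationship would predict, while a weight of 0.5 lands exactly at the midpoint by symmetry. This is consistent with optimizing each fixed-weight interpolator for the center phase.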

Phase accuracy is often improved by using phase accuracy enhancement techniques. These techniques often use phase detectors or statistical analysis to determine the phase errors, and insert delay cells into each clock output path to fine-tune the output phases. The resolution of the delay cells dominates the performance of the phase accuracy enhancement techniques. The major penalties of these techniques are complex calibration loops and control circuits, and additional noise sources in the clock output paths.

This thesis describes a circuit technique that uses resistor strings (R-strings) or resistor rings (R-rings) for phase interpolation. Due to the symmetric nature of the circuit topology, the phases of the newly generated clocks can be uniformly spaced. The R-strings and R-rings also exhibit an averaging capability that can reduce the phase errors caused by mismatches among the delay cells. Another circuit technique, which inserts variable pre-charged delay units (VPDUs) in the clock paths to digitally control the delay time of each clock path, is also described. It can be used to fine-tune the output phases for better phase accuracy.

## 1.2 Organization

The rest of this thesis is organized as follows:

Chapter 2 describes multi-phase clock generation. The basic clock generator architectures are introduced, and their behavior and performance limitations are discussed. Several methods that can increase the number of available output phases and improve the quality of the output phases are introduced.

Chapter 3 describes the phase averaging and phase interpolation techniques using R-strings. A simplified model with capacitive loading is used to analyze the phase averaging effect. The impact of capacitive loading on phase interpolation is also discussed. The conditions for and benefits of using R-rings are discussed. The technique of frequency multiplication using R-rings and folders is also presented.

In Chapter 4, an 8b 125MHz CMOS digital-to-phase converter is described to demonstrate the proposed phase averaging and interpolation techniques. The digital-to-phase converter chip was fabricated in a standard 0.35  $\mu$ m CMOS technology. The designs of fully-differential delay cells and isolation buffers for the phase averaging technique are also described.

Chapter 5 describes the high-resolution phase adjusting techniques. The linearity of the delay units and phase shifters dominates the performance of the phase adjusting techniques. Therefore, a novel delay unit called VPDU is introduced. The operation principle and timing diagram of the VPDUs are introduced. A design example is also presented. The chip was fabricated in a standard 0.18  $\mu$ m CMOS technology.

Chapter 6 draws the conclusions of this thesis and makes recommendations for future work.



# **Chapter 2**

# **Multi-Phase Clock Generation**

## 2.1 Introduction

Multi-phase clocks are usually generated by multi-phase clock generators, which fall roughly into two classes. The first is the phase-locked loop, which generates multi-phase clocks with a voltage-controlled oscillator. The second is the delay-locked loop, which generates multi-phase clocks with a voltage-controlled delay line. The voltage-controlled oscillator and the voltage-controlled delay line can be built from many different components using different methods. For example, the traveling-wave principle can be used in voltage-controlled oscillator design [15] [16] [17]; its advantages are accurate high-frequency signal generation, adiabatic operation, and good phase noise properties. In CMOS technologies, the most popular way to build the voltage-controlled oscillator and the voltage-controlled delay line is to cascade delay cells as a ring or a string [18] [19] [20].

In this chapter, the architectures, operation principles, analysis, and limitations of the phase-locked loops and the delay-locked loops with delay cell based voltage-controlled oscillators and voltage-controlled delay lines are introduced. The methods which could increase the number of output phases of the clock generators and improve the output phase quality are collected and discussed. Finally, the cause of jitter in clock generators is introduced.

## 2.2 Phase-Locked Loop and Delay-Locked Loop

Fig. 2.1 shows the two different architectures of delay cell based multi-phase clock generators, phase-locked loop and delay-locked loop. The basic operation concepts of the two architectures are very similar. They adjust their output phases to follow the input reference clocks.

In the phase-locked loop shown in Fig. 2.1(a), the voltage-controlled oscillator generates multi-phase clocks at the output nodes of its delay cells, each of which has the same delay time. The loop controls the oscillation frequency of the voltage-controlled oscillator. The phase detector compares the phase of the oscillator output with that of the input reference clock and outputs an error signal representing the phase difference. The charge-pump circuit receives the error signal and adjusts its output, the control voltage, to raise or lower the operating frequency of the voltage-controlled oscillator. The charge-pump output is passed through a low-pass filter to remove high-frequency noise. Because the voltage-controlled oscillator produces phase by integrating frequency, the phase-locked loop is a high-order control system whose transfer function includes two poles at the origin. The first pole comes from the voltage-controlled oscillator, and the second comes from the loop filter. To counteract the two poles and keep the loop stable, the transfer function must include a zero [21]. This zero can be produced by connecting a resistor and a capacitor in series in the loop filter. However, because of the zero, the loop filter does not attenuate high-frequency noise sufficiently. To solve this problem, another capacitor is placed in parallel. The loop then becomes a third-order system, so its phase margin and stability become critical issues. If the capacitance of the additional capacitor is set to 1/10 to 1/15 of that of the series capacitor, the loop can be approximated as a second-order system.
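To make the role of the two capacitors concrete, the sketch below computes the zero and the additional pole of a passive loop filter. It assumes the common topology of a series resistor R and capacitor C1 in parallel with a second capacitor C2, driven by the charge-pump current. The component names and values are illustrative assumptions, not values from this thesis.

```python
import math


def loop_filter_corners(r_ohm, c1_farad, c2_farad):
    """Return (zero_hz, pole_hz) of the assumed filter impedance

    Z(s) = (1 + s*R*C1) / (s*(C1 + C2)*(1 + s*R*C1*C2/(C1 + C2))).
    """
    zero_hz = 1.0 / (2.0 * math.pi * r_ohm * c1_farad)
    # The extra pole is set by R and the series combination of C1 and C2.
    c_series = c1_farad * c2_farad / (c1_farad + c2_farad)
    pole_hz = 1.0 / (2.0 * math.pi * r_ohm * c_series)
    return zero_hz, pole_hz


# Illustrative values: R = 10 kOhm, C1 = 100 pF, C2 = C1/10.
zero_hz, pole_hz = loop_filter_corners(10e3, 100e-12, 10e-12)
```

In this model the pole-to-zero ratio equals 1 + C1/C2, so C2 = C1/10 places the extra pole 11 times above the zero and C2 = C1/15 places it 16 times above. That separation is what makes the second-order approximation plausible near the loop bandwidth.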

Figure 2.1: Block diagrams of delay cell based multi-phase clock generators. (a) phase-locked loop, (b) delay-locked loop.

The high-order transfer function creates more complex problems. For example, process variation shifts the position of the zero, so the stability of the system can be lost [22] [23]. However, the voltage-controlled oscillator provides several advantages. First, because the whole loop acts as a low-pass filter, jitter on the input reference clock does not directly affect the output clocks of the system. Second, the output frequency can be a multiple of the input reference frequency by using a frequency divider in the loop. This is why the phase-locked loop is widely used in clock generation systems.

The delay-locked loop shown in Fig. 2.1(b) is used only in fixed-frequency systems. The voltage-controlled delay line generates multi-phase clocks at the output nodes of its delay cells, each of which has the same delay time. The loop controls the delay time of the voltage-controlled delay line. The phase detector compares the phase of the delay line output with that of the input reference clock and outputs an error signal representing the phase difference. The charge-pump circuit receives the error signal and adjusts its output, the control voltage, to increase or decrease the delay time of the voltage-controlled delay line. The charge-pump output is passed through a low-pass filter to remove high-frequency noise. The voltage-controlled delay line is a simple delay gain element and, unlike a voltage-controlled oscillator, does not integrate frequency. It is therefore not necessary to add a zero to the loop for stability, and the loop filter can be implemented as a simple integrator. The loop is very stable because fewer factors affect it, which also means that delay-locked loops are easier to design than phase-locked loops. A detailed analysis of phase-locked loops and delay-locked loops is given in Appendix A.

The major difference between phase-locked loops and delay-locked loops is their response to jitter. In general, phase-locked loops are more sensitive to power supply noise and substrate noise when the two systems operate at the same frequency [24].

The operating frequency of a voltage-controlled oscillator varies when the power supply and substrate voltages are disturbed by noise. The frequency error becomes a phase error through frequency integration. The accumulated phase error is removed slowly by the loop: the wider the loop bandwidth, the faster it is removed, and the narrower the bandwidth, the slower. If the supply voltage varies in a delay-locked loop, the delay time is affected only at the moment the noise occurs, and the delay-locked loop does not accumulate the phase error. Therefore, its loop bandwidth can be made narrower to reduce the effect of jitter on the input reference clock.

Figure 2.2: A basic phase-locked loop with eighteen output phases.

From the above discussion, delay-locked loops are easier to design than phase-locked loops, they are stable systems, and they do not accumulate phase error. Therefore, if frequency multiplication is not required, the delay-locked loop is a better choice than the phase-locked loop.


#### 2.2.1 Performance Limitation

Multi-phase clock generators built from delay cells can be roughly divided into two classes. The first is the phase-locked loop, which includes a voltage-controlled oscillator constructed by cascading delay cells in a ring. The second is the delay-locked loop, which includes a voltage-controlled delay line constructed by cascading delay cells in a string. The phase-locked loop with a ring oscillator is used here as an example to introduce the operating principle of the ring oscillator, to define the related terms, and to describe the factors that limit the performance of multi-phase clock generators.

Fig. 2.2 shows the block diagram of a basic phase-locked loop with eighteen output phases. The ring oscillator is constructed from fully-differential delay cells and oscillates by itself through its feedback loop. Assume the phase-locked loop is in lock. If there is a positive voltage difference at the input nodes of the first delay cell, it becomes a negative voltage difference after propagating once around the loop. This takes $T_s/2$, where $T_s$ is the oscillation period. After another $T_s/2$, the voltage difference becomes positive again because of the feedback of the loop. The oscillation is thus established. The delay time of each delay cell of the ring oscillator equals the oscillation period divided by twice the number of delay cells. If the ring oscillator is constructed from N delay cells, the oscillation period and the oscillation frequency, $f_{osc}$, are given by

$$T_s = 2 \cdot N \cdot t_d \tag{2.1}$$

$$f_{osc} = \frac{1}{T_s} = \frac{1}{2 \cdot N \cdot t_d} \tag{2.2}$$

where $t_d$ is the delay time of a delay cell. Equation 2.2 shows that changing the oscillation frequency of the ring oscillator requires changing either the delay time or the number of delay cells. To meet the oscillation frequency required by the system specification, the preferred method is to use a control voltage to change the delay time of the delay cells.

The maximum number of the output phases of the ring oscillator,  $2N_{Max}$ , is limited by the required operation frequency and the minimum delay time of a delay cell,  $t_{d,Min}$ .

$$2N_{Max} = \frac{T_s}{t_{d,Min}} = \frac{1}{f_{osc} \cdot t_{d,Min}} \tag{2.3}$$
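As a quick numerical illustration of Equations 2.1 to 2.3, the following Python sketch computes the period of an N-stage ring and the largest even number of phases that fits in a given period. The function names, the integer-picosecond units, the example values, and the rounding down to a whole number of differential cells are assumptions made for this illustration only.

```python
def ring_period_ps(n_cells, t_d_ps):
    """Equation 2.1: oscillation period of a ring of n_cells delay cells."""
    return 2 * n_cells * t_d_ps


def ring_frequency_ghz(n_cells, t_d_ps):
    """Equation 2.2: oscillation frequency in GHz (1 / ps = 1000 GHz)."""
    return 1000.0 / ring_period_ps(n_cells, t_d_ps)


def max_output_phases(period_ps, t_d_min_ps):
    """Equation 2.3, rounded down to a whole number of differential cells.

    Each fully-differential cell contributes two phases, so the result is
    2 * N_max with N_max the largest integer satisfying 2 * N_max * t_d_min <= T_s.
    """
    n_max = period_ps // (2 * t_d_min_ps)
    return 2 * n_max


# Hypothetical example: nine cells with 50 ps delay each.
print(ring_period_ps(9, 50), "ps period")
print(round(ring_frequency_ghz(9, 50), 3), "GHz")
# Hypothetical example: 1 GHz clock (1000 ps) with a 50 ps minimum cell delay.
print(max_output_phases(1000, 50), "phases at most")
```

Working in integer picoseconds avoids floating-point rounding when the period is an exact multiple of the cell delay.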

Another performance limitation of the multi-phase clock generator is phase accuracy. When a phase-locked loop is in lock and its ring oscillator is constructed from nine fully-differential delay cells, the ring oscillator should produce eighteen phases equally spaced over one oscillation period, so that the phase difference between any two adjacent phases is identical. Fig. 2.3(a) shows this ideal output. However, because of process variation, substrate defects, and so on, the delay cells and their metal-oxide-semiconductor field-effect transistors (MOSFETs) are mismatched, and the phase differences between adjacent phases deviate from the ideal value. In addition, MOSFETs with minimum channel length are often chosen to reduce chip area. As processes advance and the channel length shrinks, both the likelihood and the effect of mismatch increase. Therefore, in a real environment, the ring oscillator produces eighteen unequally spaced phases, as shown in Fig. 2.3(b).

Figure 2.3: Output phases of a nine-stage ring oscillator. (a) Ideal environment, (b) real environment.

To quantify how the phases produced by a multi-phase clock generator in a real environment deviate from those in the ideal environment, two quality factors are defined.

- Integral nonlinearity (INL).
- Differential nonlinearity (DNL).

INL is the phase error, obtained as the difference between the real output phases and the ideal output phases. DNL is the phase-difference error, obtained as the difference between the real and the ideal phase differences of adjacent outputs. Fig. 2.4 shows the output phases of a delay line. In a real environment, the delay times of the delay cells are not identical, so phase errors occur. The INL of the output phases of the delay line is $\phi_x^e$, where $x = \cdots, -2, -1, 1, 2, \cdots$. The DNL of the output phases is $|\phi_{x-1} - \phi_x| - \Delta \phi$, where $x = \cdots, -2, -1, 0, 1, 2, \cdots$ and $\Delta \phi$ is the ideal output phase difference. The smaller the INL and DNL, the better the quality of the output phases of a multi-phase clock generator.

Figure 2.4: Output phases of a delay line.
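The two definitions can be made concrete with a short Python sketch. It assumes, for illustration only, that the phases are measured in degrees around one full period, that the first phase is the reference, and that the last phase difference wraps around to the first phase. The function name and the example values are hypothetical.

```python
def inl_dnl(phases_deg, period_deg=360.0):
    """Return (INL, DNL) lists for measured phases spread over one period.

    INL[k] is the measured phase minus the ideal phase k * step.
    DNL[k] is the measured spacing from phase k to phase k + 1 (wrapping
    around at the end) minus the ideal spacing `step`.
    """
    n = len(phases_deg)
    step = period_deg / n
    inl = [phases_deg[k] - k * step for k in range(n)]
    dnl = [
        (phases_deg[(k + 1) % n] - phases_deg[k]) % period_deg - step
        for k in range(n)
    ]
    return inl, dnl


# Hypothetical four-phase example with errors of +2 and -2 degrees.
inl, dnl = inl_dnl([0.0, 92.0, 180.0, 268.0])
print("INL:", inl)
print("DNL:", dnl)
```

A single misplaced phase appears once in the INL but twice in the DNL, with opposite signs, because it changes the spacing on both of its sides.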


## 2.3 Extra Clock Phase Generation

The maximum number of output phases of a multi-phase clock generator is limited by the required operating frequency of the clock generator and the minimum delay time of a delay cell in the clock generator. To increase the number of available output phases, many techniques have been reported [25] [26] [27]. They can be roughly divided into two classes: phase interpolation using phase interpolators, and array structures that combine several voltage-controlled oscillators or voltage-controlled delay lines.

This section introduces three methods: the two-dimensional array oscillator, phase interpolation, and the delay-locked loop array. They are the major methods used in recent years to increase the number of output phases of multi-phase clock generators.

#### 2.3.1 Two-Dimensional Array Oscillator

Two-dimensional array oscillators can overcome the limit set by the minimum delay time of the delay cells and provide more output phases [28] [29]. They are built by combining several ring oscillators into an oscillator array. Each delay cell of the ring oscillators has two input stages. One input stage connects to the output of another delay cell to form a ring oscillator. The other connects to the output of the upper ring oscillator to couple the rings together. Both inputs affect the delay-cell output equally. Each ring therefore oscillates by itself, while its output phases are determined both by the ring itself and by the upper ring. This is the basic design concept of two-dimensional array oscillators. The phase resolution increases with the number of coupled rings, at the expense of larger chip area and power dissipation. If each ring oscillator has N delay cells, there are M ring oscillators, and the delay cells are single-ended, the total number of available output phases is

$$\text{Available Output Phases} = M \cdot N \tag{2.4}$$

The phase resolution is thus improved by a factor of M.

The most important characteristic of two-dimensional array oscillators is symmetry. Every delay cell in the array sees the same circuit configuration at its output port. Therefore, when the array oscillator operates in a stable state, the input signals of all delay cells must be very similar. This means that all ring oscillators operate at the same frequency and all delay cells have the same delay time. Two operating states satisfy these conditions. In one, there is no phase difference between the input signals of a delay cell. In the other, the input signals have equally spaced phase differences.

The number of stable operating modes of a two-dimensional array oscillator is an important issue. If the stable operating mode is not unique, the ordering of the output phases is not unique either, and the phase of each output cannot be predicted. If the output phases drive external circuits to control their timing in a system, this uncertainty may disrupt the normal operation of the system. Therefore, the stable operating mode must be unique in a two-dimensional array oscillator design.

Figure 2.5: A 3x3 coupled ring oscillator array.

Figure 2.6: Operation modes of a 3x3 coupled ring oscillator array.

To avoid the ordering problem of the output phases in a two-dimensional array oscillator design, delay cells with three input stages are used [30] [31] [32] [33]. The additional input stage connects to the output of the lower ring oscillator and enlarges the input phase difference of the delay cells. Fig. 2.5 shows a 3x3 coupled ring oscillator array built from fully-differential delay cells with three input stages. In this architecture, $T_0$ connects to $B_1$, $T_1$ connects to $B_2$, and $T_2$ connects to $B_0$. The phase of each output can be calculated easily. Fig. 2.6 shows the possible operating modes of the 3x3 coupled ring oscillator array. In Mode 0, the input phase difference is 40°, and in Mode 1, it is 280°. Because of the cancellation effect, Mode 1 cannot be a stable operating mode. The array therefore has a unique stable operating mode, Mode 0, and produces eighteen output phases with a predictable order.



Figure 2.8: 4X phase interpolation with phase blender circuits.



Figure 2.9: Phase blender for phase-resolution improvement. (a) Simple model of phase-blending inverters, (b) phase-blender output with w = 0.5, (c) phase-blender output with optimal w.

#### 2.3.2 Phase Interpolation

Phase interpolation using phase interpolators is the most popular method to increase the number of available output phases in multi-phase clock generator design [34] [35] [36]. A phase interpolator combines two clock waveforms of different phases to generate a new one. The resulting phase is determined by the combination weighting of the two inputs. In CMOS technologies, phase interpolators are usually realized using two source-coupled pairs sharing the same output port, and the ratio of their tail currents sets the combination weighting.

Fig. 2.7 shows the schematic of a phase interpolator with a 360° tuning range. The phase difference between $\phi_I$ and $\phi_Q$ is 90°. The resulting output phase, $\phi_{out}$, is given by

$$\phi_{out} = \alpha_1 \cdot \phi_I + \alpha_2 \cdot \phi_Q \tag{2.5}$$

where $\alpha_1$ depends on $I_1 - I_2$, and $\alpha_2$ depends on $I_3 - I_4$. In order to maintain a constant output amplitude, the weighting factors in Equation 2.5 should satisfy the following equations.

$$\alpha_1^2 + \alpha_2^2 = \text{constant} \tag{2.6}$$

$$I_1 + I_2 = I_3 + I_4 = \text{constant} \tag{2.7}$$

However, the relationship between the output phase and the current ratio is not linear, and it is sensitive to other factors such as the transconductance characteristics of the source-coupled pairs, the waveform shape of the inputs, and the pole frequency of the output port.
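The nonlinearity can be illustrated with a minimal Python sketch. It assumes ideal sinusoidal inputs, with the in-phase input modeled as sin(ωt) and the quadrature input as cos(ωt), and it ignores the transconductance and pole effects mentioned above. The function name and the linear weighting `alpha1 = 1 - k`, `alpha2 = k` are hypothetical choices made only for this illustration.

```python
import math


def interpolate(alpha1, alpha2):
    """Amplitude and phase (degrees) of alpha1*sin(wt) + alpha2*cos(wt).

    The sum equals A*sin(wt + phase) with A*cos(phase) = alpha1 and
    A*sin(phase) = alpha2.
    """
    amplitude = math.hypot(alpha1, alpha2)
    phase_deg = math.degrees(math.atan2(alpha2, alpha1))
    return amplitude, phase_deg


# Linear weights: the phase does not move in equal steps and the amplitude dips.
for k in (0.0, 0.25, 0.5, 0.75, 1.0):
    amp, phase = interpolate(1.0 - k, k)
    print(f"k={k:.2f}  amplitude={amp:.3f}  phase={phase:.2f} deg")

# Weights on a circle satisfy Equation 2.6 and give a linear phase.
for theta_deg in (0.0, 22.5, 45.0, 67.5, 90.0):
    theta = math.radians(theta_deg)
    amp, phase = interpolate(math.cos(theta), math.sin(theta))
    print(f"theta={theta_deg:5.1f}  amplitude={amp:.3f}  phase={phase:.2f} deg")
```

Under these idealized assumptions, linear weights reach the midpoint phase correctly but land short of the quarter points, which is one source of the nonlinearity described above.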

Phase accuracy could also be improved by using cascaded arrays of identical phase interpolators with fixed combination weighting [37] [38] [39]. In this scheme, each phase interpolator is optimized to produce an output whose phase is located at the center of the two input phases.

Fig. 2.8 shows the block diagram of 4x phase interpolation with phase blender circuits. The phase blender circuit is basically an equally weighted phase interpolator. Ideally, because of the equally weighted sum, each new phase is half the sum of its two input phases. Fig. 2.9(a) shows a simple model of phase-blending inverters. Because of the pole frequency of the output port and the input waveform shapes, the output phase of the phase blender shown in Fig. 2.9(b) is not ideal. The output of the phase blender can be expressed as

$$V_o(t) = V_{DD} + R \cdot I \cdot \left[ w \cdot u(t) \cdot (e^{-\frac{t}{RC}} - 1) + (1 - w) \cdot u(t - t_d) \cdot (e^{-\frac{t - t_d}{RC}} - 1) \right] \tag{2.8}$$

where  $t_d$  is the timing difference of the two inputs,  $\phi_A$  and  $\phi_B$ . The position of the output phase could be optimized by varying *w*.
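The effect of w in Equation 2.8 can be explored numerically. The Python sketch below evaluates the normalized output drop, finds the time at which the output crosses half of its swing, and searches for the weight that places this crossing midway between the crossings obtained with each input alone. The half-swing threshold, the bisection search, the function names, and the normalized values t_d = RC = 1 are assumptions made for this illustration.

```python
import math


def drop_fraction(t, w, t_d, rc):
    """Normalized drop (V_DD - V_o) / (R * I) from Equation 2.8."""
    first = w * (1.0 - math.exp(-t / rc)) if t >= 0.0 else 0.0
    second = (1.0 - w) * (1.0 - math.exp(-(t - t_d) / rc)) if t >= t_d else 0.0
    return first + second


def crossing_time(w, t_d, rc, level=0.5):
    """Time at which the output has fallen by `level` of its swing (bisection)."""
    lo, hi = 0.0, t_d + 50.0 * rc
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if drop_fraction(mid, w, t_d, rc) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def optimal_weight(t_d, rc, level=0.5):
    """Weight w whose crossing lies midway between the single-input crossings."""
    early = crossing_time(1.0, t_d, rc, level)
    late = crossing_time(0.0, t_d, rc, level)
    target = 0.5 * (early + late)
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        # A larger w favours the earlier input, so the crossing moves earlier.
        if crossing_time(mid, t_d, rc, level) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


w_opt = optimal_weight(t_d=1.0, rc=1.0)
print("crossing with w = 0.5:", crossing_time(0.5, 1.0, 1.0))
print("optimal w:", w_opt, "crossing:", crossing_time(w_opt, 1.0, 1.0))
```

With these normalized values, the equally weighted blender crosses later than the midpoint, so the search returns a weight above 0.5, consistent with the need to optimize w.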

Ignoring the delay of the phase blender circuits and buffers, the ideal phase of each output is obtained as follows.

$$\phi_{A,0}^{''} = \phi_{A,0}^{'} = \phi_A \tag{2.9}$$

$$\phi_{A,1}^{''} = \frac{\phi_{A,0}^{'} + \phi_{A,1}^{'}}{2} = \frac{3}{4}\phi_A + \frac{1}{4}\phi_B \tag{2.10}$$

$$\phi_{A,2}^{''} = \phi_{A,1}^{'} = \frac{1}{2}\phi_A + \frac{1}{2}\phi_B \tag{2.11}$$

$$\phi_{A,3}^{''} = \frac{\phi_{A,1}^{'} + \phi_{B,0}^{'}}{2} = \frac{1}{4}\phi_A + \frac{3}{4}\phi_B \tag{2.12}$$

$$\phi_{B,0}^{''} = \phi_{B,0}^{'} = \phi_B \tag{2.13}$$

Inserting more phase blenders into Fig. 2.8 generates more output phases. However, more phase blenders make the clock paths longer, which means more noise sources in each clock path. Moreover, if the output phase of a first-stage phase blender is inaccurate, the output phases of the following phase blenders are inaccurate too, even if those blenders are ideal.
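The ideal two-stage interpolation of Equations 2.9 to 2.13 can be reproduced with a few lines of Python. This sketch assumes ideal, delay-free blenders that output the exact midpoint of their two inputs. The function names are hypothetical, and exact fractions are used so that the 3/4 and 1/4 weights appear without rounding.

```python
from fractions import Fraction


def blend_stage(phases):
    """Pass each phase through and insert the midpoint of each adjacent pair."""
    out = []
    for a, b in zip(phases, phases[1:]):
        out.append(a)
        out.append((a + b) / 2)
    out.append(phases[-1])
    return out


def interpolate_4x(phi_a, phi_b):
    """Two cascaded blending stages between phi_a and phi_b (in degrees)."""
    first_stage = blend_stage([Fraction(phi_a), Fraction(phi_b)])
    return blend_stage(first_stage)


# Hypothetical inputs 90 degrees apart.
print([float(p) for p in interpolate_4x(0, 90)])
```

Each additional call to `blend_stage` doubles the number of intervals, which mirrors the trade-off noted above: more phases, but a longer path through more blenders.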

#### 2.3.3 Delay-Locked Loop Array

Another method that can overcome the limit set by the minimum delay time of the delay cells and provide more output phases is the delay-locked loop array [40] [41] [42]. Fig. 2.10 shows the block diagram of a delay-locked loop array. It consists of a master delay-locked loop and F slave delay-locked loops. The slave delay-locked loops all have the same configuration. There are M delay cells in the master delay-locked loop and N delay cells in each slave delay-locked loop. F is a submultiple of M, which means that the input reference clocks of the slave delay-locked loops are equally spaced. The total number of output phases of the delay-locked loop array is

$$\text{Total Output Phases} = F \cdot N \tag{2.14}$$

The delay step in the master delay-locked loop is

$$t_{d,master} = \frac{T_{ref}}{M} \tag{2.15}$$

where  $t_{d,master}$  is the delay time of the delay cells in the master delay-locked loop, and  $T_{ref}$  is the period of input reference clock. The delay step in the slave delay-locked loops is

$$t_{d,slave} = \frac{T_{ref}}{N} \tag{2.16}$$

where  $t_{d,slave}$  is the delay time of the delay cells in the slave delay-locked loop. The delay resolution,  $t_{bin}$ , could be improved as

$$t_{bin} = \left| t_{d,master} \times \frac{M}{F} - t_{d,slave} \right| = \frac{|N - F|}{M \cdot N} \times T_{ref} \tag{2.17}$$

However, the number of available output phases is not the same as the total number of output phases, because this method produces many phases with the same value.



Figure 2.10: A delay-locked loop array.

For example, if F = 2 and N = 4, DLL<sub>1</sub> produces four phases, 0°, 90°, 180°, and 270°, and DLL<sub>2</sub> also produces four phases, 180°, 270°, 0°, and 90°. The total number of output phases is eight, but the number of available output phases is four. To avoid wasting phases with the same value, the values of M and N should be selected carefully.
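The duplicate-phase problem in this example can be checked by enumeration. The Python sketch below assumes that the reference of slave i is offset by i · 360°/F and that each slave divides one period into N equal steps, which matches the F = 2, N = 4 example. The function names and the second parameter set are hypothetical.

```python
from fractions import Fraction


def dll_array_phases(f_slaves, n_cells):
    """All F * N output phases (degrees) of the slave delay-locked loops."""
    phases = []
    for i in range(f_slaves):
        offset = Fraction(360 * i, f_slaves)   # reference phase of slave i
        for k in range(n_cells):
            phases.append((offset + Fraction(360 * k, n_cells)) % 360)
    return phases


def count_phases(f_slaves, n_cells):
    """Return (total, distinct) numbers of output phases."""
    phases = dll_array_phases(f_slaves, n_cells)
    return len(phases), len(set(phases))


print(count_phases(2, 4))   # the example from the text
print(count_phases(4, 5))   # a hypothetical choice without duplicates
```

Under this model, duplicates appear whenever a slave reference offset coincides with a multiple of the slave delay step, so the loop counts have to be chosen with that in mind.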

### 2.4 Phase Accuracy Enhancement Technique

Because of process variation, substrate defects, and so on, the delay cells and MOSFETs are mismatched. The phase differences between adjacent output phases of a multi-phase clock generator are then unequal, and phase errors occur. To improve the quality of the output phases of clock generators, many techniques have been reported [43] [44] [45]. They often use a calibration loop so that the clock generator calibrates its own output phases.

This section introduces three methods for improving the quality of the output phases of multi-phase clock generators: the self-calibrated phase-locked loop, the self-calibrated delay-locked loop, and the shifted-averaging voltage-controlled delay line.


#### 2.4.1 Self-Calibrated Phase-Locked Loop

Fig. 2.11 shows the block diagram of a self-calibrated phase-locked loop [46] [47]. It is basically a traditional phase-locked loop. The major difference is that the delay time of each delay cell in the self-calibrated phase-locked loop is controlled not only by the output of the major charge-pump circuit, CP1, but also by that of the minor charge-pump circuit, CP2. Each delay cell has two adjustable output loads. The load controlled by CP1 maintains the frequency control of a traditional phase-locked loop. The other load is used to fine-tune the output phase. The divide ratio of the frequency divider is M + k/8. The output of the voltage-controlled oscillator is first divided by M and then re-sampled by the eight output phases. If k = 1, the switch box outputs the re-sampled output phases in proper sequence. At the same time, the control signals of the switch box also control the output switches of CP2 to fine-tune each delay cell. In this way, the phase errors are calibrated.

Figure 2.11: A self-calibrated phase-locked loop.

This analog fine-tuning method can eliminate phase errors completely. However, if k is a submultiple of the number of output phases of the voltage-controlled oscillator, several phases are never selected, and the unselected phases are never calibrated. In addition, if the phase selection paths are mismatched, systematic phase errors occur. Every time a delay cell is fine-tuned, the calibration loop must halt and wait for the phase-locked loop to lock again, so calibration takes a long time. If the operating frequency of the system is low, the leakage current of the capacitors at the output nodes of the operational amplifiers degrades the performance of the calibration loop.


#### 2.4.2 Self-Calibrated Delay-Locked Loop

By inserting additional delay cells into the output phase paths of a delay-locked loop, the quality of the output phases can be improved by fine-tuning the delay time of each additional delay cell [48] [49] [50]. The available calibration methods fall into two classes: analog control and digital control. Calibration methods using analog control are usually based on the design concept of delay-locked loops. Their major issues are the matching of the calibration paths and the complex layout. Calibration methods using digital control are usually based on statistical concepts. Their major issue is that the resolution and linearity of the delay cells dominate the performance and cost of the method.

Figure 2.12: A self-calibrated delay-locked loop.

Figure 2.13: An eight-stage shifted-averaging voltage-controlled delay line.

Fig. 2.12 shows a self-calibrated delay-locked loop. Each output phase of the delay-locked loop connects to a delay cell whose delay time can be adjusted individually by control signals. Each delay comparator judges whether the phase differences among its three input phases are equal. If they are not, the delay comparator changes its output to increase or decrease the delay time of the delay cells. Eventually, each clock path is calibrated and the phase differences between adjacent phases are equal.

$$\phi_1 - \phi_2 = \phi_n - \phi_1 \tag{2.18}$$

$$\phi_2 - \phi_3 = \phi_1 - \phi_2 \tag{2.19}$$

$$\phi_3 - \phi_4 = \phi_2 - \phi_3 \tag{2.20}$$

$$\phi_4 - \phi_5 = \phi_3 - \phi_4 \tag{2.21}$$

$$\phi_n - \phi_1 = \phi_{n-1} - \phi_n \tag{2.22}$$

The major issue of this method is the stability of the calibration loops. Because tuning the delay of one delay cell affects the results of the other delay comparators, the overall calibration loop may not converge.


#### 2.4.3 Shifted-Averaging Voltage-Controlled Delay Line

Shifted-averaging voltage-controlled delay lines can also be used to reduce phase errors [51] [52] [53]. The architecture is attractive because it requires no additional circuits to improve the quality of its output phases. It is basically a traditional delay line with a different connection scheme. If the delay line is in lock, its total delay is one clock period, and the reference clock has a 50% duty cycle, then the delay line produces two sets of output phases with the same values because fully-differential delay cells are used. By exchanging the connections of the phases with the same value, the INL of the output phases of the delay line can be improved by a factor of $\sqrt{2}$.

Fig. 2.13 shows an eight-stage shifted-averaging voltage-controlled delay line. If the above conditions are satisfied, its output phases are related as follows.

$$\phi_{1+} = \phi_{5-} \quad ; \quad \phi_{5+} = \phi_{1-} \tag{2.23}$$

$$\phi_{2+} = \phi_{6-} \quad ; \quad \phi_{6+} = \phi_{2-} \tag{2.24}$$

$$\phi_{3+} = \phi_{7-} \quad ; \quad \phi_{7+} = \phi_{3-} \tag{2.25}$$

$$\phi_{4+} = \phi_{8-} \qquad ; \qquad \phi_{8+} = \phi_{4-} \tag{2.26}$$

By exchanging the connections of $\phi_{1-}$, $\phi_{2-}$, $\phi_{3-}$, $\phi_{4-}$, $\phi_{5+}$, $\phi_{6+}$, $\phi_{7+}$, and $\phi_{8+}$, the delay line gains the ability to reduce phase errors.

This method smooths phase errors by a fixed improvement factor of $\sqrt{2}$. To achieve the maximum improvement, symmetry in the layout is very important.
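The origin of the $\sqrt{2}$ factor can be illustrated with a small Monte Carlo experiment in Python. The sketch assumes that the two phases being averaged carry independent Gaussian errors of equal variance, which is a simplification and not a circuit-level model of the delay line. The function name, sample count, and seed are hypothetical.

```python
import math
import random
import statistics


def averaging_improvement(n_trials=50_000, sigma=1.0, seed=0):
    """Ratio of the error spread before and after averaging two phases."""
    rng = random.Random(seed)
    single = [rng.gauss(0.0, sigma) for _ in range(n_trials)]
    # Average each error with an independent error of the same spread.
    averaged = [(a + rng.gauss(0.0, sigma)) / 2.0 for a in single]
    return statistics.pstdev(single) / statistics.pstdev(averaged)


ratio = averaging_improvement()
print("measured improvement:", ratio)
print("sqrt(2):", math.sqrt(2.0))
```

If the two errors were correlated, for example through a layout gradient, the measured ratio would fall below $\sqrt{2}$, which is consistent with the emphasis on layout symmetry.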

### 2.5 Jitter in Clock Generator

In clock generators, circuit operation is unavoidably affected by noise, and jitter results. Jitter degrades the quality of the output clock phases produced by clock generators. Therefore, jitter is the most important performance index of clock generators.

Jitter has two causes. One is a poor-quality input reference clock. The other is the noise sources of the MOSFETs in the clock generator. These include internal noise, namely thermal noise, shot noise, and flicker noise, and external noise, namely supply-noise coupling and substrate-noise coupling. In clock generator design, delay cells with high noise immunity and a good clock generator architecture should be chosen to reduce jitter.

The architectures of clock generators were introduced in the previous sections. The causes of jitter are the jitter of the input reference clock and the noise sources in the clock generator [54]. Whichever architecture is chosen, it must include a low-pass filter. Therefore, the closed-loop transfer function from the input reference clock to the output clocks is a low-pass function. If the bandwidth of the phase-locked loop is narrow, the jitter of the input reference clock is easily suppressed, but the quality of the output clocks is limited by the noise sources in the clock generator: if the output phases shift slightly, the loop cannot correct them immediately. In a wide-bandwidth phase-locked loop, a phase shift is corrected faster. However, if the bandwidth is very wide, the jitter of the input reference clock can no longer be ignored, and the system performance is limited by it. Considering the stability of the loop and these two factors, the bandwidth of a phase-locked loop should be chosen between 1/20 and 1/10 of the frequency of the input reference clock. The delay-locked loop does not have the problem of phase error accumulation, so its bandwidth can be narrower.

Detailed analyses of the jitter produced by delay cells have been published, and they point out the following design guidelines for delay cells [55] [56] [57] [58] [59] [60].

- 1. Jitter is inversely proportional to the $V_{GS} - V_t$ of the input MOSFETs.
- 2. With a fixed delay time, the larger the power consumption of the delay cells, the smaller the jitter.
- 3. The charging and discharging paths at the output of a delay cell should be symmetric to suppress the up-conversion of flicker noise to high frequencies.
- 4. With fixed power consumption and operating frequency, the more delay cells are cascaded, the worse the jitter produced by fully-differential delay cells, whereas the jitter produced by single-ended delay cells stays the same.

With these rules, a delay cell with good noise immunity can be designed.

In several circuit environments, the jitter of the output clocks is dominated by power-supply and substrate noise, because the jitter produced by the internal noise sources is a few femtoseconds, whereas the jitter produced by power-supply noise and substrate noise is a few picoseconds. Therefore, it is very important to design delay cells that are insensitive to external noise sources.

The effects of the external noise sources fall into two classes. When the DC level of the supply voltage changes, the delay time of the delay cells changes. If the change in DC level is caused by noise, the phases of the output clocks shift. This is called static supply noise sensitivity. It is measured as the variation of the delay time when the DC level of the supply voltage changes: the smaller the variation, the lower the sensitivity. The second class is called dynamic supply noise sensitivity. When the supply voltage changes temporarily, the delay time of the delay cells also changes temporarily. It is measured as the variation of the delay time when the supply voltage is an AC function. Because variations of the substrate voltage have the same effect as variations of the supply voltage, the same measurement methods can be used to evaluate the sensitivity to substrate noise.

## 2.6 Summary

Multi-phase clocks are usually generated by phase-locked loops and delay-locked loops. The voltage-controlled oscillator and the voltage-controlled delay line are the most important components of these clock generators. They can be built from many different components in different ways. In CMOS technologies, the most popular approach is to cascade delay cells in a ring, for the voltage-controlled oscillator, or in a string, for the voltage-controlled delay line.

The maximum number of output phases of a multi-phase clock generator is limited by the required operating frequency of the clock generator and the minimum delay time of a delay cell. The methods that overcome this limit fall into two classes: phase interpolation using phase interpolators, and array structures that combine several voltage-controlled oscillators or voltage-controlled delay lines. The three major methods for increasing the number of output phases are the two-dimensional array oscillator, phase interpolation, and the delay-locked loop array.

Because of process variation, substrate defects, and so on, the delay cells and MOSFETs are mismatched. The phase differences between adjacent output phases of a multi-phase clock generator are then unequal, and phase errors occur. Methods that correct these phase errors often use a calibration loop so that the clock generator calibrates its own output phases. The three major methods for improving the quality of the output phases are the self-calibrated phase-locked loop, the self-calibrated delay-locked loop, and the shifted-averaging voltage-controlled delay line.



## **Chapter 3**

## **Phase Processing Using Resistor Strings**


### 3.1 Introduction

In this chapter, a novel multi-phase clock generation technique is introduced that smooths the phase errors caused by mismatch among the delay cells of multi-phase clock generators. This phase averaging technique requires a multi-phase clock generator and resistors. Each resistor connects two adjacent phases produced by the clock generator, forming a resistor string (R-string) or a resistor ring (R-ring). The R-string and R-ring provide a current path that distributes the error currents induced by the phase errors among the output nodes of the clock generator, which improves the quality of the output phases. The R-string and R-ring can also be used to generate more output phases by phase interpolation. The following sections describe how this technique improves the quality of the output phases and the penalties of using resistors. A frequency multiplication technique using R-rings and folders is also presented.

## 3.2 Phase Averaging Using R-String

The quality factors INL and DNL were introduced in the previous chapter. To achieve better quality in a multi-phase clock generator, conventional designs either require perfectly symmetric placement of the MOSFETs in the layout or use additional control circuits to fine-tune the output phases. However, these methods make the system more complex and the layout more difficult. Therefore, a novel phase averaging technique is developed to reduce the phase errors and phase-difference errors caused by mismatch of the delay cells and MOSFETs.

Figure 3.1: A simple delay line.

Fig. 3.1 shows a traditional delay line consisting of multiple delay cells. If all the delay cells were identical, their outputs would be equally spaced in phase, that is, the phase difference between any two adjacent output phases would be the same. However, because of mismatch among the delay cells and MOSFETs, the phase differences between adjacent delay cells are not the same along the delay line.

The phase error, $\phi_{e,n}$, can be visualized as an error current flowing into the output load of the *n*-th delay cell, shifting the output phase from its correct position. If the error current could be eliminated, reduced, or equally distributed among the output loads of all delay cells along the delay line, the performance of the delay line would improve. A feasible method is to insert an additional current path that lets the error current flow not only into its own output load but also into the other output loads.

Fig. 3.2 shows the schematic of a delay line coupled with an R-string whose resistors all have the same resistance R. The R-string introduces a spatial filtering effect on the outputs [61]. When R approaches infinity, the interconnection between adjacent delay cells is broken and each output of the delay line is determined solely by its own delay cell. As R decreases, each output begins to be affected by its neighbors, because the output current of each delay cell flows not only into its own load but also into the neighboring loads via the R-string. This interaction between the outputs is the basic concept of phase averaging.

Figure 3.2: A delay line with an R-string.

For most delay cell designs, the delay time is controlled by varying the equivalent output resistance $R_o$ to change the $R_oC_o$ product at the output node, where $C_o$ is the total capacitance at the output node. Adding the R-string changes this $R_oC_o$ product. In addition, R must be small compared with $R_o$ to achieve good phase averaging, which reduces the controllable range of the delay time. Thus, it is necessary to separate the delay control function from the phase averaging function of the R-string.

Fig. 3.3 shows a schematic consisting of a delay line, R-string, and isolation buffers. The buffers isolate the delay line from large resistive and capacitive loadings, and inherit the phase information from their corresponding delay-cell output. The output resistance of the buffers should be high to attain the strongest averaging effect offered by the R-string [61] [62] [63]. The control of the delay line is separated from the R-string. The output phase errors due to the delay-cell mismatches and buffer mismatches can also be reduced by the R-string.

Figure 3.4: A simplified model for analyzing a delay line with an R-string.

The voltage waveform at each output node in Fig. 3.3 is determined not only by the output currents of the neighboring isolation buffers, but also by the resistive and capacitive loadings at the output nodes. To analyze the circuit's behavior quantitatively, the simplified model shown in Fig. 3.4 is used. The capacitance at each output node is C. The buffers are modeled as ideal current sources with sinusoidal output currents expressed as:

$$I_{i,x} = I_A \sin(\omega_i t + \phi_{i,x}), \qquad x = 0, \pm 1, \pm 2, \cdots \tag{3.1}$$

where  $I_A$  is the current amplitude,  $\omega_i$  is the clock frequency, and  $\phi_{i,x}$  is the clock phase. The output current flowing into the output capacitor is  $I_{o,x}$ . As derived in Appendix B, with  $I_{i,x} = 0$  for  $x \neq 0$ , the frequency response of the single buffer current  $I_{i,0}$  to the output current  $I_{o,x}$  can be expressed as:

$$A_{I}(\beta, x) = \frac{I_{o,x}}{I_{i,0}} = \frac{(-4j/\beta)^{|x|}}{\sqrt{1 - 4j/\beta} \left(1 + \sqrt{1 - 4j/\beta}\right)^{2|x|}} \tag{3.2}$$

where

$$\beta = \omega_i \times RC \tag{3.3}$$

is the input frequency normalized by the RC product of the R-string.

Fig. 3.5 shows the frequency response of  $|A_I(\beta, x)|$  at different locations, i.e., x = 0,  $\pm 1$ , and  $\pm 2$ . Data from both calculation using Equation 3.2 and simulation using SPICE are shown, thus verifying the validity of Equation 3.2. At x = 0, the transfer gain  $|A_I(\beta, 0)|$  is increased for larger  $\beta$ , i.e., at a higher clock frequency, more current flows into the capacitor directly connected to the signal source. For  $\beta > 10$ ,  $|A_I(\beta, 0)|$  approaches 1 and the R-string loses its phase averaging capability. Therefore, a smaller  $\beta$  is good for phase averaging.

Fig. 3.6 and Fig. 3.7 show the space response of  $A_I(\beta, x)$ , i.e., magnitude and phase responses at different locations, for  $\beta = 1$ , 1/10, and 1/100. For smaller  $\beta$ , the buffer current is distributed more evenly to the neighboring output capacitors, resulting in a stronger phase averaging effect.
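As an illustrative numerical sketch (not part of the original analysis flow), Equation 3.2 can be evaluated directly in Python. The function names `a_i` and `summed_response` are hypothetical. The sketch rewrites Equation 3.2 as a per-node attenuation ratio raised to the power $|x|$, which is algebraically the same expression, and checks one property that the model implies: since the only paths to ground are the node capacitors, the capacitor currents summed over all nodes should equal the injected current.

```python
import cmath
import math


def a_i(beta: float, x: int) -> complex:
    """Current transfer A_I(beta, x) of Equation 3.2 for an infinite R-string."""
    u = cmath.sqrt(1 - 4j / beta)            # principal root, real part > 0
    ratio = (-4j / beta) / (1 + u) ** 2      # attenuation per node, |ratio| < 1
    return ratio ** abs(x) / u


def summed_response(beta: float, tail_exponent: float = 40.0) -> complex:
    """Sum of A_I over enough nodes that the neglected tail is ~exp(-tail_exponent)."""
    u = cmath.sqrt(1 - 4j / beta)
    decay = abs((-4j / beta) / (1 + u) ** 2)
    n = math.ceil(tail_exponent / -math.log(decay))
    return sum(a_i(beta, x) for x in range(-n, n + 1))


if __name__ == "__main__":
    for beta in (0.01, 0.1, 1.0, 10.0):
        mags = [round(abs(a_i(beta, x)), 4) for x in range(3)]
        print(f"beta={beta}: |A_I| at x=0,1,2 -> {mags}, "
              f"|sum over x| = {abs(summed_response(beta)):.6f}")
```

For small $\beta$ the magnitudes at neighboring nodes are nearly equal, which is the even current distribution described above; for large $\beta$ almost all of the current stays at $x = 0$.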

Figure 3.6: R-string's space response of magnitude for $\beta = 1, 10^{-1}, 10^{-2}$.

By neglecting the R-string's boundary conditions and assuming that all sinusoidal current inputs have an identical amplitude of $I_A$, the voltage on the output nodes can be computed as:

$$V_{o,x} = \frac{1}{j\omega_i C} \times \sum_{k=-\infty}^{+\infty} I_{i,k} \times A_I(\beta, x-k) \tag{3.4}$$

$$= \frac{I_A}{j\omega_i C} \times \sum_{k=-\infty}^{+\infty} \left[ |A_I(\beta, x-k)| \times \sin(\omega_i t + \phi_{i,k} + \measuredangle A_I(\beta, x-k)) \right] \tag{3.5}$$

As expressed in Equation 3.5, each output voltage on the R-string, $V_{o,x}$, is a summation of sine waves with different amplitudes and phases. If the current inputs, $I_{i,x}$ for all x, are sine waves with identical frequency, the resulting $V_{o,x}$ is still a pure sine wave, but with a different phase at each x location. The phase of $V_{o,x}$ can be defined as the relative position of its zero crossing in the clock period.

The spatial convolution of Equation 3.5 provides the necessary mechanism for phase averaging. Assume the input phases in Equation 3.1 are uniformly spaced and can be expressed as:

$$\phi_{i,x} = -x \cdot \Delta \phi + \phi^e_{i,x} \tag{3.6}$$



where $\Delta \phi$ is the nominal phase difference between two adjacent inputs, and $\phi_{i,x}^{e}$ is the phase deviation at the *x*-th buffer's input. Then, the phase error, $\phi_{o,x}^{e}$, at the *x*-th buffer's output can be approximated by:

$$\phi_{o,x}^{e} \approx \frac{\sum_{k=-\infty}^{\infty} \left( M_{k} \times \phi_{i,x-k}^{e} \right)}{\sum_{k=-\infty}^{\infty} M_{k}} \qquad \text{where } M_{k} = |A_{I}(\beta, k)| \tag{3.7}$$

Equation 3.7 is obtained by letting  $V_{o,x} = 0$  in Equation 3.5. The sine functions are expanded in Taylor's series, and only the first-order terms are kept in the derivation of Equation 3.7. Due to symmetric circuit topology, we have  $M_{+k} = M_{-k}$ . The phase response of  $A_I(\beta, x)$  is not included in Equation 3.7, since it causes only a constant phase shift for  $V_{o,x}$  at all x locations in this first-order approximation. From Equation 3.7, it is necessary to have  $M_k \simeq M_0$  so that the inputs of neighboring  $\pm k$ -buffer can reduce the phase error,  $\phi_{o,x}^e$ , more effectively. Thus, the  $\beta$  of Equation 3.3 needs to be small enough for the R-string averaging to be effective.

Let the input phase errors, $\phi_{i,x}^e$ for all x, be independent Gaussian variables with a mean of zero and a standard deviation of $\sigma(\phi_i^e)$. Then, the output phase errors, $\phi_{o,x}^e$ for all x, are also Gaussian, and their standard deviation, $\sigma(\phi_o^e)$, can be expressed as:

$$\sigma(\phi_o^e) = \sigma(\phi_i^e) \times \left[\frac{\sum_{k=-\infty}^{\infty} M_k^2}{\left(\sum_{k=-\infty}^{\infty} M_k\right)^2}\right]^{1/2} = \sigma(\phi_i^e) \times \mathcal{R}_{\text{INL}} \tag{3.8}$$

The ratio, $\mathcal{R}_{INL} = \sigma(\phi_o^e)/\sigma(\phi_i^e)$, is the R-string's reduction factor for the output phase's integral nonlinearity (INL) due to the averaging effect. Fig. 3.8 shows the plot of the $\mathcal{R}_{INL}$ ratio versus $\beta$. Data from both calculation using Equation 3.8 and simulation using SPICE are shown. An R-string with a $\beta$ of 1/100 is required to obtain an INL reduction factor of 1/10. The deviation between calculation and simulation visible in Fig. 3.8 is due to the first-order approximation used in Equation 3.7.
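The INL reduction factor of Equation 3.8 can be sketched numerically as follows. This is a minimal illustration under the same first-order model, with $M_k = |A_I(\beta, k)|$ taken from Equation 3.2; the helper names `magnitudes` and `r_inl` and the truncation of the infinite sums are assumptions of this sketch, not part of the original work.

```python
import cmath
import math


def magnitudes(beta: float, tail_exponent: float = 40.0) -> list:
    """M_k = |A_I(beta, k)| for k = -K..K, truncated where the tail is ~exp(-tail_exponent)."""
    u = cmath.sqrt(1 - 4j / beta)
    decay = abs((-4j / beta) / (1 + u) ** 2)   # |A_I| shrinks by this factor per node
    k_max = math.ceil(tail_exponent / -math.log(decay))
    m0 = 1 / abs(u)                            # M_0 = |A_I(beta, 0)|
    side = [m0 * decay ** k for k in range(1, k_max + 1)]
    return side[::-1] + [m0] + side            # symmetric: M_{+k} = M_{-k}


def r_inl(beta: float) -> float:
    """INL reduction factor of Equation 3.8."""
    m = magnitudes(beta)
    return math.sqrt(sum(v * v for v in m)) / sum(m)


if __name__ == "__main__":
    for beta in (1.0, 0.1, 0.01):
        print(f"beta = {beta}: R_INL = {r_inl(beta):.4f}")
```

Under this model the factor decreases monotonically as $\beta$ is reduced and is on the order of 0.1 for $\beta = 1/100$, in line with the first-order calculation discussed above.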

The differential nonlinearity (DNL) for the input can be defined as $\Delta \phi_{i,x}^e = \phi_{i,x}^e - \phi_{i,x+1}^e$. The DNL for the output can be defined as $\Delta \phi_{o,x}^e = \phi_{o,x}^e - \phi_{o,x+1}^e$. Again, let the input phase errors, $\phi_{i,x}^e$ for all *x*, be independent Gaussian variables with a mean of zero and a standard deviation of $\sigma(\phi_i^e)$. Then both $\Delta \phi_{i,x}^e$ and $\Delta \phi_{o,x}^e$ are Gaussian for all *x*. Their standard deviations, $\sigma(\Delta \phi_i^e)$ and $\sigma(\Delta \phi_o^e)$, can be calculated using Equation 3.7. The R-string's reduction factor for the output phase's DNL can be expressed as:

$$\mathcal{R}_{\text{DNL}} = \frac{\sigma(\Delta \phi_o^e)}{\sigma(\Delta \phi_i^e)} = \frac{\sqrt{2}}{2} \times \left[ \frac{\sum_{k=-\infty}^{\infty} (M_k - M_{k-1})^2}{\left(\sum_{k=-\infty}^{\infty} M_k\right)^2} \right]^{1/2} \tag{3.9}$$

Fig. 3.9 shows the plot of  $\mathcal{R}_{DNL}$  ratio versus  $\beta$ . Data from both calculation using Equation 3.9 and simulation using SPICE are shown. Comparing Fig. 3.9 to Fig. 3.8, the R-string is more effective in improving DNL than improving INL. This is expected from the spatial convolution function.
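Equation 3.9 can be cross-checked against the weighted average of Equation 3.7 with a small Monte Carlo experiment. The sketch below is only an illustration: the function names, the truncated kernel length, the sample count, and the random seed are all assumptions chosen for this example, and the experiment stays entirely within the first-order model (it is not a substitute for the SPICE data of Fig. 3.9).

```python
import cmath
import math
import random


def kernel(beta: float, k_max: int) -> list:
    """Truncated weights M_k = |A_I(beta, k)| for k = -k_max..k_max (Equation 3.2)."""
    u = cmath.sqrt(1 - 4j / beta)
    decay = abs((-4j / beta) / (1 + u) ** 2)
    return [decay ** abs(k) / abs(u) for k in range(-k_max, k_max + 1)]


def r_dnl_formula(m: list) -> float:
    """DNL reduction factor of Equation 3.9; M_k is taken as zero outside the kernel."""
    padded = [0.0] + list(m) + [0.0]
    steps = [padded[i] - padded[i - 1] for i in range(1, len(padded))]
    return (math.sqrt(2) / 2) * math.sqrt(sum(s * s for s in steps)) / sum(m)


def std(values: list) -> float:
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def r_dnl_monte_carlo(m: list, n_phases: int = 20_000, seed: int = 1) -> float:
    """Apply Equation 3.7 to random input phase errors and measure the DNL ratio."""
    rng = random.Random(seed)
    phi_in = [rng.gauss(0.0, 1.0) for _ in range(n_phases)]
    total, span = sum(m), len(m)
    # The kernel is symmetric, so a sliding weighted average equals the convolution.
    phi_out = [sum(w * p for w, p in zip(m, phi_in[i:i + span])) / total
               for i in range(n_phases - span + 1)]
    dnl_in = [a - b for a, b in zip(phi_in, phi_in[1:])]      # phi_x - phi_{x+1}
    dnl_out = [a - b for a, b in zip(phi_out, phi_out[1:])]
    return std(dnl_out) / std(dnl_in)


if __name__ == "__main__":
    weights = kernel(beta=0.1, k_max=60)
    print("Equation 3.9 :", r_dnl_formula(weights))
    print("Monte Carlo  :", r_dnl_monte_carlo(weights))
```

The two numbers should agree to within the statistical scatter of the random sample, and both are well below the corresponding INL reduction factor, consistent with the observation that the R-string improves DNL more than INL.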

In addition to averaging, it is also necessary to consider the magnitude of the voltage swing on the R-string, which must be sufficiently large to drive the succeeding circuitry. The peak-to-peak value of $V_{o,x}$, defined as $V_{o,pp}$, can be approximated by using Equation 3.5 with $\omega_i t + \phi_{i,x} + \measuredangle A_I(\beta, 0) = \pm \pi/2$. Fig. 3.10 shows the results for different values of $\beta$ and input phase spacing, $\Delta \phi$. Data from both calculation using Equation 3.5 and simulation using SPICE are shown. In Fig. 3.10, $V_{o,pp}$ is normalized by $V_{o,\infty}$, the peak-to-peak output voltage when $R = \infty$, which can be simply expressed as:

$$V_{o,\infty} = \left| \frac{I_A}{j\omega_i C} \right| \tag{3.10}$$



where  $I_A$  is the current amplitude of all  $I_{i,x}$  inputs. For a diminishing value of  $\beta$ , the response from any current input,  $I_{i,x}$ , to any output node becomes identical, i.e.,  $A_I(\beta, x) \approx A_I(\beta, x - k)$  for all k. As a result, every sine term in Equation 3.5 is canceled by another sine term with almost identical magnitude and  $\pm \pi$  phase difference, thus resulting in a reduced  $V_{o,pp}$ .

For a given  $\omega_i$  and C,  $\beta$  can be reduced by using a smaller R to enhance the averaging effect. But a smaller  $\beta$  also results in decreasing voltage swing on the R-string,  $V_{o,pp}$ . Then, it is necessary to increase the buffer current,  $I_A$ , to restore  $V_{o,pp}$ . In other words, averaging effect can be enhanced by reducing R, but at the expense of more power dissipation so as to maintain the voltage swing on the R-string.

Using the curve fitting function of MATLAB, the results of Fig. 3.10 can be approximated by the following equation:

$$\frac{V_o(\beta)}{V_o(\infty)} \approx \left| \frac{1 + j/\beta}{(1 - j/\beta)[1 - j/(\beta \cdot \frac{1}{40} \cdot (\frac{2\pi}{\Delta\phi})^2 + 0.1)]} \right| \tag{3.11}$$



The maximum error of this approximation, -1.9 dB, occurs when $\Delta \phi = 2\pi/2$.

## 3.3 Phase Interpolation Using R-String

As illustrated in Fig. 3.11, the R-string can also be used for phase interpolation. The outputs of the buffers are connected using an R-string. There are N identical resistors between the B1 and B2 buffers. Thus, N-1 additional clock phases are generated from the two original periodic waveforms with different phases of $\phi_{i,a}$ and $\phi_{i,b}$. The desired interpolated phases are:

$$\phi_{o,x} = \phi_{o,0} - \delta \phi \times x \qquad 1 \le x \le N - 1 \tag{3.12}$$

where $\delta \phi = (\phi_{o,0} - \phi_{o,N})/N$. Ideal phase interpolation can be achieved only if the time-domain waveforms of $V_{o,0}$ and $V_{o,N}$ are two parallel lines. A larger phase difference between $V_{o,0}$ and $V_{o,N}$, together with sharp rising/falling edges, leads to poor accuracy in phase interpolation. Once the interpolation errors due to waveform shape are minimized by choosing a smaller phase difference between the input buffers and increasing the rise/fall times of the voltage waveforms on the R-string, the RC delay of the R-string ultimately dominates the error in phase interpolation.

Consider only the phase interpolation error due to the RC effect. The voltage on the R-string's internal nodes can be approximated by:

$$V_{o,x} = \sum_{k=-\infty}^{\infty} [V_{o,-kN} \cdot H(x+kN)] \tag{3.13}$$

where

$$H(x) = e^{-|x|\sqrt{\beta/2}} \exp\left(-j|x|\sqrt{\frac{\beta}{2}}\right) \tag{3.14}$$

Figure 3.12: Phase error of a 16X R-string phase interpolator.

Equation 3.14 represents the magnitude and phase responses of an infinite RC ladder network with a voltage signal source connected to the x = 0 node. Equation 3.14 was obtained by curve fitting the data from SPICE simulations of an RC ladder network. Assuming sinusoidal inputs and using a first-order approximation similar to the one described in the previous section, the phase at node x can be expressed as:

$$\hat{\phi}_{o,x} \approx \frac{\sum_{k=-\infty}^{\infty} \{ |H(x+kN)| \cdot [\phi_{o,-kN} + \measuredangle H(x+kN)] \}}{\sum_{k=-\infty}^{\infty} |H(x+kN)|} \tag{3.15}$$

where  $\beta = \omega_i \times RC$  as defined in Equation 3.3. From Equation 3.12 and Equation 3.15, the phase error,  $\phi_{o,x}^e = \hat{\phi}_{o,x} - \phi_{o,x}$ , can then be obtained by:

$$\phi_{o,x}^{e} = \hat{\phi}_{o,x} - \hat{\phi}_{o,0} + \delta\phi \times x \tag{3.16}$$

Fig. 3.12 compares the calculation results using Equation 3.16 with the simulation results. The R-string phase interpolator in Fig. 3.11 with N = 16 is used as an example. The current buffers, B1, B2, ..., output sinusoidal currents, and the phase difference between  $\phi_{o,0}$  and  $\phi_{o,16}$  is  $2\pi/16$ . Notably, the maximum phase error occurs around x = N/2. For a larger value of  $\beta$ , the phase error caused by the RC delay is more noticeable. In this case, a  $\beta$  on the order of  $10^{-4}$  is required to obtain a maximum phase error less than  $0.5 \times \delta \phi$ .
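The calculation behind Equation 3.16 can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions: the driven nodes are taken to sit at every multiple of N with ideal phases $\phi_{o,-kN} = \phi_{o,0} + kN\,\delta\phi$ (Equation 3.12 extended periodically), $\phi_{o,0}$ is set to zero, and the infinite sums of Equation 3.15 are truncated at a hypothetical `k_max`. The function name and parameters are not from the original work.

```python
import math


def interpolation_error(beta: float, n: int = 16,
                        span: float = 2 * math.pi / 16,
                        k_max: int = 400) -> list:
    """Phase error of Equation 3.16 at nodes x = 0..n, in units of delta_phi.

    k_max must be large enough that exp(-sqrt(beta/2) * n * k_max) is negligible.
    """
    a = math.sqrt(beta / 2)      # Eq. 3.14: |H(d)| = exp(-a|d|), angle of H(d) = -a|d|
    delta_phi = span / n         # ideal phase step, with phi_o0 - phi_oN = span

    def phase_hat(x: int) -> float:
        """Equation 3.15 with phi_o0 = 0."""
        num = den = 0.0
        for k in range(-k_max, k_max + 1):
            d = abs(x + k * n)                       # distance to the source at node -kN
            w = math.exp(-a * d)                     # |H(x + kN)|
            num += w * (k * n * delta_phi - a * d)   # source phase plus angle of H
            den += w
        return num / den

    ref = phase_hat(0)
    return [(phase_hat(x) - ref + delta_phi * x) / delta_phi for x in range(n + 1)]


if __name__ == "__main__":
    for beta in (1e-2, 1e-3, 1e-4):
        err = interpolation_error(beta)
        worst = max(range(len(err)), key=lambda x: abs(err[x]))
        print(f"beta = {beta:g}: max |error| = {abs(err[worst]):.3f} LSB at x = {worst}")
```

In this model the error vanishes at the driven nodes, peaks near the middle of the interpolation segment, and shrinks as $\beta$ is reduced, which matches the qualitative behavior described for Fig. 3.12.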



In Fig. 3.3 and Fig. 3.11, phase averaging and interpolation are accomplished by using an R-string to distribute each buffer's output current to its neighboring output nodes. It is assumed that all buffers along the R-string see the same circuit configuration at their output ports, so that transfer functions such as Equation 3.2 are identical for all buffers. However, near both ends of an R-string this assumption is no longer valid, and systematic phase errors occur. This edge-distortion phenomenon can be eliminated by using an R-ring. As shown in Fig. 3.13, if the phase shift along the delay line spans a full clock period, the two ends of the R-string can be connected seamlessly to form a ring. Then all buffers (not shown in the figure) see the same output circuit configuration regardless of their locations. An oscillator can also be formed by shorting the $V_i$ and $V_o$ of the delay line, whose delay can be controlled using a phase-locked loop.



Figure 3.14: A frequency tripler using 3X folders.

## 3.5 Frequency Multipliers

Analogous to folding ADCs [63], folders can be used as frequency multipliers. Fig. 3.14 shows a simplified schematic of a frequency tripler using R-rings and folders. R-Ring 1 generates clock waveforms of 12 different phases, $\phi_0 - \phi_{11}$. R-Ring 2 is driven by four 3-input folders, F1–F4. The circuit schematic of an example folder is shown in Fig. 3.15. Each folder combines 3 clock waveforms with equally-spaced phases, e.g., $\phi_0$, $\phi_4$, and $\phi_8$, using 3 source-coupled pairs whose outputs drive R-Ring 2. A folder can be viewed as a current-mode exclusive-OR gate that changes its output state at every edge of its inputs. As a result, the voltage waveform on R-Ring 2 has three times the frequency of the waveform on R-Ring 1. More folders can be added to achieve better phase resolution on R-Ring 2. However, folders dissipate considerable power because of current cancellation, so the power consumption grows with the number of folders and with the frequency multiplication factor. The jitter of a folder's output is caused not only by noise but also by the phase precision of its input clocks: if the input phases are not equally spaced, or their duty cycle is not 50%, the folder output exhibits jitter. Therefore, folders should be used together with R-strings or R-rings to guarantee the quality of their input phases.
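The exclusive-OR view of a folder can be illustrated with an idealized logic-level model. The sketch below is an assumption-laden simplification: it replaces the current-mode circuit of Fig. 3.15 with ideal square waves sampled on an integer grid, and all function names and the grid size are hypothetical. It shows that XOR-ing three clocks spaced by one third of a period triples the frequency, and that a duty cycle other than 50% makes the output edges unevenly spaced.

```python
def square_wave(sample: int, period: int, delay: int, high: int) -> int:
    """Ideal clock: 1 for the first `high` samples of each period, shifted by `delay`."""
    return 1 if (sample - delay) % period < high else 0


def folder_output(period: int = 1200, n_inputs: int = 3, duty: float = 0.5) -> list:
    """XOR of n_inputs equally spaced clock phases over one input period."""
    step = period // n_inputs
    high = round(duty * period)
    wave = []
    for i in range(period):
        bit = 0
        for k in range(n_inputs):
            bit ^= square_wave(i, period, k * step, high)
        wave.append(bit)
    return wave


def transitions(wave: list) -> list:
    """Sample indices where the (periodic) waveform changes state."""
    return [i for i in range(len(wave)) if wave[i] != wave[i - 1]]


def rising_edges(wave: list) -> int:
    """Number of 0-to-1 transitions in one period of the (periodic) waveform."""
    return sum(1 for i in range(len(wave)) if wave[i - 1] == 0 and wave[i] == 1)


if __name__ == "__main__":
    ideal = folder_output(duty=0.5)
    skewed = folder_output(duty=0.625)
    print("rising edges per input period:", rising_edges(ideal))
    print("edge positions, 50% duty    :", transitions(ideal))
    print("edge positions, 62.5% duty  :", transitions(skewed))
```

With a 50% duty cycle the output edges are uniformly spaced at one sixth of the input period; with a 62.5% duty cycle the same three rising edges per period remain, but the spacing alternates between a long and a short interval, which appears as deterministic jitter on the multiplied clock.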



#### 3.6 Summary

Resistor strings can be used for phase averaging and phase interpolation. Phase averaging can reduce phase errors, and phase interpolation can increase the number of available output phases. When clock phases spanning a full period are available for driving a resistor string, a resistor ring is preferred to mitigate the edge-distortion phenomenon. Capacitors on the resistor strings (or rings) can degrade the effectiveness of both phase averaging and phase interpolation. The design parameter $\beta = \omega_i \times RC$ needs to be carefully chosen to optimize the trade-off between phase accuracy and power dissipation. Analogous to folding ADCs, multi-phase frequency multipliers can be constructed using R-rings and folders.

## **Chapter 4**

# An 8b 125MHz Digital-to-Phase Converter

## 4.1 Introduction



This chapter introduces a multi-phase, high-linearity CMOS digital-to-phase converter that uses the phase averaging and phase interpolation techniques presented in Chapter 3. The architecture, operation principle, analysis, and limitations of the digital-to-phase converter are described, followed by its major building blocks, the delay cells and the isolation buffers. Finally, a 125-MHz 8-bit digital-to-phase converter, designed and fabricated using a standard 0.35 $\mu$m SPQM CMOS technology, is presented to demonstrate the capability of the phase averaging and phase interpolation techniques.

## 4.2 Architecture of Digital-to-Phase Converter

Figure 4.1: An 8b digital-to-phase converter.

Fig. 4.1 shows the architecture of an 8b digital-to-phase converter, which operates at 125 MHz and provides 256 output phases. The digital-to-phase converter receives a reference clock at $V_i$ and generates a clock of the same frequency at $V_o$, with its phase controlled by the 8-bit digital control input Din[7:0]. The 256 adjustable phases are equally spaced within one clock period. The digital-to-phase converter includes two delay lines with delay cells D1–D16 and D17–D24. All delay cells are identical and exhibit the same time delay. The delay is controlled by a delay-locked loop, so that the total delay of the first delay line, D1–D16, is one clock period and the total delay of the second delay line, D17–D24, is half a clock period. At 125 MHz, one clock period $T_s$ is 8 nsec, and one delay-cell delay $t_d$ is 500 psec. The D1–D16 delay line produces 16 clocks with equally-spaced phases. The first ring, R-Ring 1, is added to reduce phase errors caused by mismatches among the delay cells as well as the isolation buffers. One of the clocks is selected by the MUX1 multiplexer to drive the D17–D24 delay line. The second ring, R-Ring 2, is used for phase interpolation. The MUX2 multiplexer selects one of 16 phases interpolated between the input and output signals of the D21 delay cell for the final clock output at $V_o$. The final timing resolution is $t_d/16 = 31.25$ psec, which is defined as 1 LSB for this 8-bit digital-to-phase converter.

The time delay $T$ of the output clock is set by the control code as:

$$T = N \cdot \frac{T_s}{16} + \frac{M}{16} \cdot \frac{T_s}{16} + T_c = N \cdot t_d + \frac{M}{16} \cdot t_d + T_c \tag{4.1}$$

$$N = Din[7:4] = 0 \cdots 15 \tag{4.2}$$

$$M = Din[3:0] = 0 \cdots 15 \tag{4.3}$$

$$T_c = \text{Constant Delay} \tag{4.4}$$
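The code-to-delay mapping of Equations 4.1 to 4.4 can be written out as a short Python sketch. The function name and its default arguments are hypothetical; the constant delay $T_c$ is not specified numerically in the text, so it is treated here as a parameter that defaults to zero.

```python
def dpc_delay_ps(din: int, clock_period_ps: float = 8000.0,
                 constant_delay_ps: float = 0.0) -> float:
    """Ideal output delay of the 8-bit digital-to-phase converter (Equation 4.1)."""
    if not 0 <= din <= 255:
        raise ValueError("din must be an 8-bit code (0..255)")
    t_d = clock_period_ps / 16          # one delay-cell delay, T_s / 16
    n = din >> 4                        # coarse code N = Din[7:4], selects via MUX1
    m = din & 0xF                       # fine code M = Din[3:0], selects via MUX2
    return n * t_d + (m / 16) * t_d + constant_delay_ps


if __name__ == "__main__":
    for code in (0x00, 0x01, 0x0F, 0x10, 0xFF):
        print(f"Din = 0x{code:02X}: T - T_c = {dpc_delay_ps(code):.2f} ps")
```

At 125 MHz the default period of 8000 psec gives a step of 31.25 psec per code, matching the 1 LSB timing resolution stated above.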

The coarse loop and the fine loop use different numbers of delay cells. The delay line of the coarse loop delays the input reference clock by one full period, whereas the delay line of the fine loop delays its input by half a period. The coarse loop is locked to a full period because the duty cycle of the input reference clock cannot be guaranteed to be 50%. If the coarse loop were instead locked to half a period, any duty-cycle error would make the total delay of its delay line deviate from half a period, and jitter would occur. It would also cause problems for phase averaging on the R-ring. As shown in Fig. 4.2(a), a coarse loop locked to half a period works correctly when the duty cycle is 50%. However, if the duty cycle is 62.5%, as shown in Fig. 4.2(b), the total delay of the coarse-loop delay line becomes 3 ns instead of 4 ns, and the resistor ring can no longer eliminate the edge-distortion phenomenon.

In the fine loop, only one phase segment is of interest, so the edge-distortion phenomenon can be ignored. The delay of the delay cells in the fine loop is set by the control voltage of the coarse loop. The total delay of the delay line of the fine loop



#### 4.2.1 Fully-Differential Delay Cell

Fig. 4.3 shows a fully differential delay cell with symmetric loads. The inputs drive the gates of two N-type MOSFETs that form a source-coupled pair. Each resistive load consists of two equally sized P-type MOSFETs connected in parallel. One of the P-type MOSFETs is diode-connected, and the gate of the other is connected to the control voltage, $V_{bp}$. A second control voltage, $V_{bn}$, is generated from $V_{bp}$ by the self-biased replica-feedback bias circuit, so that the MOSFETs of the delay cell are properly biased. $V_{bp}$ controls the delay time of the delay cell by varying the equivalent resistance of the symmetric loads.

For high immunity to dynamic supply noise, the I–V characteristics of the delay-cell loads should ideally be linear, because a linear differential-mode resistance is insensitive to the common-mode supply voltage. However, resistive loads built from MOSFETs can hardly be linear when the operating frequency range is very wide. The I–V characteristics of symmetric loads are not linear either. Nevertheless, because their I–V characteristic is symmetric, they reject small-signal common-mode supply noise as well as linear resistors do.

Figure 4.4: A self-biased replica-feedback bias circuit.

The voltage range of the symmetric load is between $V_{DD}$ and $V_{DD} - V_{bp}$. This means that the output of the delay cell swings between $V_{DD}$ and $V_{bp}$, and the midpoint of the swing is $(V_{DD} + V_{bp})/2$. The equivalent output resistance of the symmetric load can be estimated as $V_{DD} - V_{bp}$ divided by the tail current of the delay cell. As the control voltage $V_{bp}$ varies, the equivalent output resistance changes as well.

Fig. 4.4 shows the schematic of a self-biased replica-feedback bias circuit. The operational amplifier is built from a self-biased P-type MOSFET source-coupled pair. Its current source is not a cascode, so that the circuit can operate at a low supply voltage. Biasing the current source of the P-type source-coupled pair with $V_{bn}$ reduces the sensitivity of the operational amplifier to the supply voltage. However, when the bias circuit starts up, it is possible that no current flows through any of the MOSFETs; in other words, the bias circuit has two stable states. A start-up circuit is therefore required to keep the bias circuit out of the zero-current state. As shown in Fig. 4.4, if there is no current in the bias circuit, M25 of the start-up circuit is in the cutoff region. M26, whose drain is connected to ground, is always turned on. Because the W/L ratio of M26 is much smaller than one, it acts as a high-impedance load. $V_{init}$ therefore rises, M24 turns on, and current is forced to flow through the bias circuit. Once the bias circuit reaches its normal operating state, M25 turns on, $V_{init}$ falls, and the start-up circuit shuts down.

There is a control voltage buffer in the bias circuit of the delay cells. It copies the control voltage $V_{ctrl}$, generated by the loop filter of the clock generator, to $V_{bp}$. The buffer makes the capacitance of the loop filter easy to control and reduces the noise coupled onto $V_{ctrl}$ by the layout routing. The buffer is the same as the replica, except that all of its P-type MOSFET loads are diode-connected. Since the buffer and the replica are biased by the same circuit, their bias currents are equal and so are the voltage drops across their loads. Therefore, $V_{bp}$ is equal to $V_{ctrl}$.

#### 4.2.2 Isolation Buffer

The phase averaging technique requires an isolation buffer to isolate the delay line from large resistive and capacitive loadings, and to inherit the phase information from the corresponding delay-cell output. The output resistance of the buffers should be high to attain the strongest averaging effect offered by the R-string [61] [63]. A fully-differential amplifier with passive loads does not require a common-mode feedback circuit to fix its output common-mode voltage, so its schematic is simple and easy to design. However, in a digital CMOS process, achieving a high output resistance requires large resistors that consume a large layout area. This architecture is therefore not suitable for the isolation buffer.

Another option is a fully-differential amplifier with active loads. With normally sized MOSFETs, its output resistance is large enough and its layout area is small. However, MOSFET mismatch affects the output common-mode voltage, so a common-mode feedback circuit is required to fix it [64]. This is usually the hardest part of a fully-differential amplifier design.

Fig. 4.5 shows a fully-differential amplifier with cross-coupled and diode-connected loads [65] [66], which is one type of fully-differential amplifier with active loads. The cross-coupled P-type MOSFET loads provide local common-mode feedback that fixes the common-mode voltage without additional circuits, which reduces the design complexity. The output resistance can be analyzed in two parts: the differential-mode output resistance and the common-mode output resistance. Assume that the MOSFETs of the cross-coupled and diode-connected loads in Fig. 4.5 operate in the saturation region. If a small-signal voltage variation is applied at node X and an opposite variation at node Y, the differential-mode output resistance is:

$$R_{L,diff} \doteq \frac{1}{g_{m1,2} - g_{m3,4}} = \frac{1}{g_m - g_m} = \infty \text{ (if } g_{m1,2} = g_{m3,4} = g_m) \tag{4.5}$$

If all of the load MOSFETs have the same size, the differential-mode output resistance is infinite. The diode-connected MOSFETs provide a positive conductance, and the cross-coupled MOSFETs provide a negative conductance. When the two conductances are connected in parallel, they cancel and the resulting output resistance is infinite.

Figure 4.5: A fully-differential amplifier with cross-coupled and diode-connected loads.

On the other hand, if the same small-signal voltage variation is applied at both node X and node Y, the common-mode output resistance is:

$$R_{L,cm} \doteq \frac{1}{g_{m1,2} + g_{m3,4}} = \frac{1}{g_m + g_m} = \frac{1}{2g_m} \text{ (if } g_{m1,2} = g_{m3,4} = g_m) \tag{4.6}$$

Although the differential-mode output resistance is very high, the common-mode output resistance is very small.
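The sensitivity of Equations 4.5 and 4.6 to transconductance mismatch can be illustrated with a few lines of Python. This is a sketch only: the function name and the numerical transconductance values are hypothetical, and it is assumed here that $g_{m1,2}$ belongs to the diode-connected devices and $g_{m3,4}$ to the cross-coupled devices, following the signs of the two conductances described above.

```python
import math


def load_resistances(gm_diode: float, gm_cross: float) -> tuple:
    """Small-signal load resistances per Equations 4.5 and 4.6 (in ohms).

    gm_diode: transconductance of the diode-connected devices (assumed g_m1,2), in S
    gm_cross: transconductance of the cross-coupled devices (assumed g_m3,4), in S
    """
    g_diff = gm_diode - gm_cross                    # positive and negative conductance in parallel
    r_diff = math.inf if g_diff == 0 else 1.0 / g_diff
    r_cm = 1.0 / (gm_diode + gm_cross)              # both conductances add for common mode
    return r_diff, r_cm


if __name__ == "__main__":
    gm = 1e-3  # hypothetical 1 mS transconductance
    for mismatch in (0.0, 0.001, 0.01, 0.05):
        r_diff, r_cm = load_resistances(gm * (1 + mismatch), gm)
        print(f"mismatch = {mismatch:.1%}: R_diff = {r_diff:.3g} ohm, R_cm = {r_cm:.3g} ohm")
```

With perfect matching the differential resistance is infinite while the common-mode resistance stays at $1/(2g_m)$; a small mismatch leaves the common-mode value almost unchanged but makes the differential resistance finite, on the order of $1/g_m$ divided by the fractional mismatch.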

The fully-differential amplifier with cross-coupled and diode-connected loads appears well suited to the isolation buffer. However, the analysis above holds only if all of the load MOSFETs operate in the saturation region. To keep them in saturation, the upper boundary of the output swing cannot exceed $V_{DD} - |V_{GS,PMOS}|$. The isolation buffer sits between the delay cells and the resistor rings, so its input voltage is the output voltage of a delay cell. The output of the delay cell swings between $V_{bp}$ and $V_{DD}$, so the input common-mode voltage is $(V_{DD} + V_{bp})/2$. The lower boundary of the output swing of the isolation buffer is limited by this input common-mode voltage. Therefore, the fully-differential amplifier shown in Fig. 4.5 is not suitable for this work.

Figure 4.6: An isolation buffer.

Fig. 4.6 shows a fully-differential amplifier with an output stage, which is the amplifier chosen as the isolation buffer in this thesis. It includes a source-coupled pair as the input stage, a cascode tail current source, current mirrors with gain, cross-coupled loads, and diode-connected loads. Because of the current mirrors, the output swing of the isolation buffer is extended to the range between $V_{GS,NMOS}$ and $V_{DD} - |V_{DS,PMOS}|$, and the gain of the amplifier is increased by a factor of *K*.

If the output resistance of the isolation buffer is not infinite because of mismatch between the cross-coupled loads and the diode-connected loads, the distribution of the error current is degraded and the R-ring's phase averaging capability is reduced. However, this effect can be ignored when the output resistance of the isolation buffer is much larger


#### 4.3 Experimental Results

Fig. 4.1 shows the architecture of the digital-to-phase converter. At an input frequency of $\omega_i = 125$ MHz, the $\beta$ of R-Ring 1, as defined in Equation 3.3, is 1/18 with $R = 240 \Omega$. Each resistor in R-Ring 1 is a polysilicon resistor with a dimension of 31.4 $\mu$m by 1.6 $\mu$m. The random mismatch between the resistors is estimated to be $\sigma(\Delta R/R) = 0.63\%$. The differential voltage swing on R-Ring 1 is 1.12 V. The $\beta$ of R-Ring 2 is 1/8500 with $R = 15 \Omega$. Each resistor in R-Ring 2 is a polysilicon resistor with a dimension of 3.9 $\mu$m by 1.6 $\mu$m. The random mismatch between the resistors is estimated to be $\sigma(\Delta R/R) = 1.78\%$. The differential voltage swing on R-Ring 2 is 1 V. Fig. 4.7 shows the phase error of the R-Ring 2 phase interpolator obtained from both simulations and calculation using Equation 3.16. Two sets of simulations were performed: one used the circuit with devices and interconnects extracted from the layout, and the other used the simplified circuit model described in Chapter 3. The phase interpolator achieves a phase error of less than $0.5 \times \delta\phi$, where $\delta\phi = 31.25$ psec. The phase errors obtained from the post-layout simulation are larger than those predicted by Equation 3.16, mainly because the clock signals are no longer sine waves.

Figure 4.8: Chip micrograph of the digital-to-phase converter.

Figure 4.9: Measured output jitter of the digital-to-phase converter.

Figure 4.10: (a) Measured transfer characteristics of the digital-to-phase converter. (b) Measured INL. (c) Measured DNL.

Figure 4.11: Measurement setup of the digital-to-phase converter.
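As a rough consistency check on the quoted design values, the node capacitance implied by each $\beta$ can be back-calculated from Equation 3.3. The sketch below assumes that the angular frequency is $\omega_i = 2\pi \times 125$ MHz; the text quotes $\omega_i$ directly as 125 MHz, so this interpretation is an assumption, and the resulting capacitances are illustrative estimates rather than values reported in this work. The function name is hypothetical.

```python
import math


def implied_node_capacitance(beta: float, resistance_ohm: float, clock_hz: float) -> float:
    """Solve beta = (2*pi*f) * R * C for C, assuming omega_i = 2*pi*f (an assumption)."""
    return beta / (2 * math.pi * clock_hz * resistance_ohm)


if __name__ == "__main__":
    rings = {
        "R-Ring 1": (1 / 18, 240.0),     # beta and R quoted for the averaging ring
        "R-Ring 2": (1 / 8500, 15.0),    # beta and R quoted for the interpolation ring
    }
    for name, (beta, r) in rings.items():
        c = implied_node_capacitance(beta, r, 125e6)
        print(f"{name}: implied C per node = {c * 1e15:.1f} fF")
```

Under this assumption the implied per-node capacitance is a few hundred femtofarads for R-Ring 1 and on the order of ten femtofarads for R-Ring 2.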

Fig. 4.8 shows the chip micrograph. The chip area is  $0.98 \times 1.18 \text{ mm}^2$ . Fig. 4.9 shows a measured jitter histogram of the digital-to-phase converter's output at 125 MHz.

The digital-to-phase converter was fabricated using a standard 0.35  $\mu$ m single-poly quad-layer metal (SPQM) CMOS technology. Input frequency can be varied from 50 MHz to 250 MHz. Power dissipation is 110 mW from a 3.3 V supply. The measured peak-to-peak jitter of the output is 30 psec and the root-mean-square (RMS) jitter is 5.1 psec, while the RMS jitter of the input clock is 3.2 psec.

Define the digital-to-phase converter's normalized output phase with input  $D_{in} = k$  as:

$$\theta_o(k) = \frac{T(k) - T(0)}{\text{LSB}} \tag{4.7}$$

where T(k) is the relative time delay of the digital-to-phase converter's output. Fig. 4.10 shows the measured digital-to-phase converter's transfer characteristic, i.e.,  $\theta_o$  versus  $D_{in}$ . The digital-to-phase converter's differential nonlinearity (DNL), which is defined as  $\theta_o(k + 1) - \theta_o(k) - 1$  for k = 0, ..., 254, exhibits a similar pattern every 16 consecutive input codes, indicating some layout mismatches around the MUX2. The mismatches are mainly due to the parasitic capacitance of the interconnects. The DNL is within ±1 LSB

| Technology            | TSMC 0.35 $\mu$m SPQM CMOS            |
|-----------------------|---------------------------------------|
| Supply voltage        | 3.3 V                                 |
| Input frequency       | $50 \text{ MHz} \sim 250 \text{ MHz}$ |
| Resolution            | 8 bits                                |
| $\beta$ (coarse loop) | 1/18 @ 125 MHz                        |
| $\beta$ (fine loop)   | 1/8500 @ 125 MHz                      |
| DNL                   | ±1 LSB @ 125 MHz                      |
| INL                   | ±2 LSB @ 125 MHz                      |
| Power consumption     | 110 mW @ 125 MHz                      |
| Jitter                | 5.1 ps rms @ 125 MHz                  |
| Die area              | $0.98 \times 1.18 \text{ mm}^2$       |

Table 4.1: Performance summary of the 8b digital-to-phase converter.

at most input codes, except for the -1.8 LSB DNL errors that recur every 16 input codes. The integral nonlinearity (INL), which is defined as $\theta_o(k) - k$ for k = 0, ..., 255, is measured to be within $\pm 2$ LSB.
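The DNL and INL definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the four-point delay list are hypothetical, and the LSB is taken as 31.25 psec (one 8 nsec period at 125 MHz divided by 256 codes).

```python
from typing import List, Sequence, Tuple


def normalized_phase(delays: Sequence[float], lsb: float) -> List[float]:
    """Equation 4.7: theta_o(k) = (T(k) - T(0)) / LSB."""
    t0 = delays[0]
    return [(t - t0) / lsb for t in delays]


def dnl_inl(delays: Sequence[float], lsb: float) -> Tuple[List[float], List[float]]:
    """Return (DNL, INL) in LSB, following the definitions in the text."""
    theta = normalized_phase(delays, lsb)
    # DNL(k) = theta_o(k + 1) - theta_o(k) - 1
    dnl = [theta[k + 1] - theta[k] - 1.0 for k in range(len(theta) - 1)]
    # INL(k) = theta_o(k) - k
    inl = [theta[k] - k for k in range(len(theta))]
    return dnl, inl


# Hypothetical data: an ideal 31.25 ps staircase with one late edge at code 2.
lsb_ps = 31.25
delays_ps = [0.0, 31.25, 70.0, 93.75]
dnl, inl = dnl_inl(delays_ps, lsb_ps)
```

With real measurements, `delays_ps` would hold the 256 measured delays T(k), one per input code.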

Fig. 4.11 shows the measurement setup. The pulse generator generates a 125 MHz clock signal, which is connected both to the reference clock input of the digital-to-phase converter chip and to one port of the oscilloscope. The output clock of the chip is connected to the other port of the oscilloscope. The delay is obtained by measuring the timing difference between the selected phase and the reference clock.

#### 4.4 Summary

The architecture, operation principle, analysis, and limitation of a multi-phase high-linearity CMOS digital-to-phase converter with the phase averaging technique and the phase interpolation technique are introduced. In order to demonstrate the R-Ring's capability of phase averaging and interpolation, a 125-MHz 8-bit digital-to-phase converter was designed and fabricated using a standard 0.35  $\mu$ m SPQM CMOS technology. The digital-to-phase converter consists of two delay lines driving two R-rings respectively. Together, they generate 256 different clock phases. Measurement results show 8-bit resolution is possible using the R-ring technique.

## Chapter 5

# High-Resolution Phase Adjusting Technique

#### 5.1 Introduction



The phase averaging technique has been introduced in the previous chapters. It can be used to smooth the phase errors that occur in a multi-phase clock generator. To eliminate the phase errors completely, the design parameter $\beta = \omega_i \times RC$ needs to be very small, which costs considerable power. If chip area is not a constraint, the phase accuracy enhancement techniques introduced in Chapter 2 can be used to eliminate the phase errors completely. These techniques are usually implemented by using phase detectors or statistical analysis to determine the phase errors, and by inserting a phase shifter into each clock output path to fine-tune the output phases. The resolution of the phase shifters dominates the performance of the phase accuracy enhancement techniques.

This chapter discusses the architecture and linearity of digitally controlled phase shifters. Phase shifters can be roughly classified into three categories: delay cells, phase interpolators, and others. A novel high-linearity phase shifter that belongs to the delay-cell category is introduced in this chapter. Another novel phase shifter design concept, which belongs to the phase-interpolator category, is introduced in Appendix C.



#### 5.2 Phase Adjusting Techniques

There are many methods for varying the phase at a specific output node. The most intuitive one is to generate many phases with a multi-phase clock generator and then select one of them with a multiplexer. The output quality is dominated by the matching of the delay units and of the multiplexer selection paths. The phase resolution is determined by the phase quantization step set by the delay units. To attain a phase resolution beyond this limit, the resistor-string (R-string) phase interpolation method generates additional phases from R-strings connected between each pair of adjacent outputs of a clock generator [67] [68]. The output quality is dominated by the waveform shape and phase difference of the clock generator outputs, and by the *RC* product of the R-string. Attaining high-quality outputs requires more power dissipation. Another method uses cascaded symmetrical phase-blender circuits [37], which generate additional phases by combining two input phases with fixed weights. The output quality is dominated by the relative size ratio of the phase-blending inverters and by the matching of the cascaded phase-blender paths. Attaining high-quality outputs requires more chip area.



Figure 5.2: A simple digitally controlled delay element with *RC* time constant variation techniques.

To avoid the power dissipated by non-selected phases, the variable-weight phase interpolators shown in Fig. 5.1 are often chosen in practice. This phase interpolation method generates the selected phase by forming a weighted combination of two or more clock waveforms of different phases [69] [70]. In CMOS technologies, such interpolators are usually realized using source-coupled pairs that share the same output port, and the ratio of their tail currents sets the combination weighting. The linearity of the phase interpolator is dominated by the waveform shape of the inputs and the pole frequency of the output port.

Resistance and capacitance variation techniques are often used to implement digitally controlled delay elements [71] [72] [73] [74]. Fig. 5.2 shows a simple digitally controlled delay element based on *RC* variation. It consists of two cascaded inverters that form the main signal path, and a variable resistance and capacitance, implemented by controllable shunt MOSFETs, that provide the delay tuning mechanism. Varying the *RC* product of the M1–M2 inverter changes the waveform shape of $V_c$. Therefore, the time at which $V_c$ reaches the threshold voltage of the M3–M4 inverter can be



Figure 5.3: A simple digitally controlled delay element with current variation techniques.


controlled. Such digitally controlled delay elements provide monotonic delay tuning and acceptable linearity. However, the M3–M4 inverter and the following buffers introduce nonlinearity because the waveform shape at their input varies with the control setting.

Another method [75] [76] replaces the variable resistance with a variable current source, as shown in Fig. 5.3. Varying the current of the M1–M2 inverter changes the waveform shape of $V_c$, and hence the time at which $V_c$ reaches the threshold voltage of the M3–M4 inverter. However, the M3–M4 inverter and the following buffers again introduce nonlinearity due to the waveform shape variation.

Fig. 5.4 shows the simulated transfer curves of 6-bit digitally controlled delay elements. All of the designs operate at 1 GHz, are sized for maximum tuning range, and drive the same-sized M3–M4 inverter. The tuning ranges of the variable-C, variable-I, and variable-R designs are 15.8 psec, 153.7 psec, and 155.9 psec, respectively.

To overcome this drawback and improve the linearity, a variable pre-charged delay unit (VPDU), which maintains the same waveform shape at the input of the M3–M4 inverter, is proposed. The linearity of the VPDU is primarily determined by the selected control input range and the channel-length modulation parameter.



Figure 5.4: Simulated transfer curves of 6-bits digitally controlled delay elements.

#### 5.3 Variable Pre-Charged Delay Unit

Fig. 5.5(a) shows the schematic of the variable pre-charged delay unit. The delay unit consists of two cascaded inverters, M2–M3 and M7–M8. Its total delay is adjusted by changing the charging and discharging behavior at the internal $V_c$ node. The total capacitance at the $V_c$ node is C. As $V_c$ moves up and down between the $V_{DD}$ and $V_{SS}$ supply voltages, M1 and M2 form the pull-up path and M3 and M4 form the pull-down path. M5 and M6 are used as switches for pre-charging $V_c$ to either the $V_H$ or the $V_L$ dc voltage. The values of $V_H$ and $V_L$ are chosen such that $V_L < V_{TH} < V_H$, where $V_{TH}$ is the threshold voltage of the M7–M8 inverter. The control signals, $\phi_A$, $\phi_L$, and $\phi_H$, are generated from the outputs of a multi-phase clock generator using combinational logic gates.

Fig. 5.5(b) shows the timing diagram of the VPDU's operation. The entire operation cycle consists of four periods, $T_1 \cdots T_4$. In the $T_1$ period, $V_c$ is pre-charged to $V_L$. In the $T_2$ period, as the $CK_{in}$ input falls from $V_{DD}$ to $V_{SS}$, $V_c$ is pulled up toward $V_{DD}$, and the



Figure 5.5: (a) Schematic of variable pre-charged delay units. (b) Timing diagram of the VPDU's operation.

$CK_{out}$ output falls from $V_{DD}$ to $V_{SS}$. In the $T_3$ period, $V_c$ is pre-charged to $V_H$. In the $T_4$ period, as the $CK_{in}$ input rises from $V_{SS}$ to $V_{DD}$, $V_c$ is pulled down toward $V_{SS}$, and the $CK_{out}$ output rises from $V_{SS}$ to $V_{DD}$. The falling-edge delay of the clock can be varied by changing $V_L$, while the rising-edge delay can be varied by changing $V_H$.

The difference in the total delay of a VPDU between two pre-charging voltages equals the time the $V_c$ node takes to move from one pre-charging voltage to the other. If M2 and M3 operate in the saturation region during the $T_2$ and $T_4$ periods for the given pre-charging voltages, $V_{L,1}$, $V_{L,2}$, $V_{H,1}$, and $V_{H,2}$, the time delay, $t_d$, from $V_c = V_{L,1}$ to $V_c = V_{L,2}$ and from $V_c = V_{H,1}$ to $V_c = V_{H,2}$ is given by

$$t_{d,T2} = \frac{2C}{\lambda k V_{ov}^2} \times \ln \left[ \frac{1 + \lambda (V_{DD} - V_{L,1})}{1 + \lambda (V_{DD} - V_{L,2})} \right]$$
(5.1)

$$t_{d,T4} = \frac{2C}{\lambda k V_{ov}^2} \times \ln \left[ \frac{1 + \lambda (V_{H,1} - V_{SS})}{1 + \lambda (V_{H,2} - V_{SS})} \right]$$
(5.2)

where $k = \mu C_{ox} W/L$, $V_{ov} = |V_{GS} - V_t|$, and $\lambda$ is the channel-length modulation parameter. In the $T_2$ period these are the parameters of M2, and in the $T_4$ period they are those of M3. If the two MOSFETs operate in the triode region, the time delay is given by

$$t_{d,T2} = \frac{C}{kV_{ov}} \times \ln \left[ \frac{(V_{DD} - V_{L,1})(2V_{ov} - V_{DD} + V_{L,2})}{(V_{DD} - V_{L,2})(2V_{ov} - V_{DD} + V_{L,1})} \right]$$
(5.3)

$$t_{d,T4} = \frac{C}{kV_{ov}} \times \ln \left[ \frac{(V_{H,1} - V_{SS})(2V_{ov} + V_{SS} - V_{H,2})}{(V_{H,2} - V_{SS})(2V_{ov} + V_{SS} - V_{H,1})} \right]$$
(5.4)

Plotting the above equations shows that Equation 5.3 and Equation 5.4 are more nonlinear than Equation 5.1 and Equation 5.2, and there is no method to reduce this nonlinearity. Therefore, to keep the two MOSFETs in the saturation region and obtain high-quality outputs, the lower bound of $V_H$ should be $V_{DD} - V_{t,M3}$ and the upper bound of $V_L$ should be $V_{SS} + |V_{t,M2}|$, i.e., each pre-charging voltage should stay within one threshold voltage of its supply rail. With the ongoing advance of fabrication processes, $V_t$ becomes smaller and narrows the pre-charging voltage range. Therefore, a mixed-voltage process is recommended to enlarge the voltage range.
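The comparison between the two operating regions can be sketched numerically. The following Python snippet evaluates Equation 5.1 and Equation 5.3 for equal 0.1 V steps of the pre-charging voltage and compares how uneven the resulting delay steps are. All component values ($C$, $k$, $V_{ov}$, $\lambda$) and the helper names are hypothetical and chosen only for illustration; they are not taken from the fabricated design.

```python
import math


def delay_saturation(v1, v2, c, k, vov, lam, vdd):
    """Equation 5.1: pull-up delay from V_c = v1 to V_c = v2 in saturation."""
    return (2.0 * c / (lam * k * vov ** 2)) * math.log(
        (1.0 + lam * (vdd - v1)) / (1.0 + lam * (vdd - v2)))


def delay_triode(v1, v2, c, k, vov, vdd):
    """Equation 5.3: pull-up delay from V_c = v1 to V_c = v2 in triode."""
    num = (vdd - v1) * (2.0 * vov - vdd + v2)
    den = (vdd - v2) * (2.0 * vov - vdd + v1)
    return (c / (k * vov)) * math.log(num / den)


def step_spread(steps):
    """(max - min) / mean of the delay steps; 0 means perfectly uniform."""
    return (max(steps) - min(steps)) / (sum(steps) / len(steps))


# Hypothetical values, for illustration only.
C, K, VOV, LAM, VDD = 20e-15, 1e-3, 1.3, 0.1, 1.8

# Saturation: V_c swept from 0 V to 0.5 V (V_SD stays at or above V_ov).
sat_steps = [delay_saturation(0.1 * i, 0.1 * (i + 1), C, K, VOV, LAM, VDD)
             for i in range(5)]
# Triode: V_c swept from 0.6 V to 1.1 V (V_SD stays below V_ov).
tri_steps = [delay_triode(0.6 + 0.1 * i, 0.7 + 0.1 * i, C, K, VOV, VDD)
             for i in range(5)]

spread_sat = step_spread(sat_steps)
spread_tri = step_spread(tri_steps)
```

For these assumed values the triode-region steps are noticeably less uniform than the saturation-region steps, which is the behavior the text describes.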

If the channel-length modulation parameters of M2 and M3 are small enough to be ignored, Equation 5.1 and Equation 5.2 can be rewritten as

$$t_{d,T2} = \lim_{\lambda \to 0} \left\{ \frac{2C}{\lambda k V_{ov}^2} \times \ln \left[ \frac{1 + \lambda (V_{DD} - V_{L,1})}{1 + \lambda (V_{DD} - V_{L,2})} \right] \right\} = \frac{2C}{k V_{ov}^2} \times \left[ V_{L,2} - V_{L,1} \right]$$
(5.5)

$$t_{d,T4} = \lim_{\lambda \to 0} \left\{ \frac{2C}{\lambda k V_{ov}^2} \times \ln \left[ \frac{1 + \lambda (V_{H,1} - V_{SS})}{1 + \lambda (V_{H,2} - V_{SS})} \right] \right\} = \frac{2C}{k V_{ov}^2} \times \left[ V_{H,1} - V_{H,2} \right]$$
(5.6)

In this limit the delay is proportional to the difference between the pre-charging voltages, so the VPDUs provide a linear delay tuning mechanism.
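The limit can be checked with a series expansion. As a sketch, write $a = V_{DD} - V_{L,1}$ and $b = V_{DD} - V_{L,2}$ (shorthand introduced here only for this step) and apply $\ln(1+x) = x - x^2/2 + O(x^3)$ to Equation 5.1:

$$t_{d,T2} = \frac{2C}{\lambda k V_{ov}^2} \left[ \ln(1 + \lambda a) - \ln(1 + \lambda b) \right] = \frac{2C}{k V_{ov}^2} \times \left[ V_{L,2} - V_{L,1} \right] \times \left[ 1 - \frac{\lambda}{2}(a + b) + O(\lambda^2) \right]$$

The leading term reproduces Equation 5.5, and the first correction is proportional to $\lambda$, which is consistent with a smaller channel-length modulation parameter giving a more linear delay. The same expansion applies to Equation 5.2 with $a = V_{H,1} - V_{SS}$ and $b = V_{H,2} - V_{SS}$.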

A VPDU handles the falling and rising edges of $CK_{in}$ through different paths. Therefore, the matching of the pull-up and pull-down paths is important if the duty cycle of $CK_{out}$ is required to be the same as that of $CK_{in}$. The mismatches come from the differences in k, $V_{ov}$, and $\lambda$ between M2 and M3, and from asymmetric variation of $V_L$ and $V_H$. These mismatches can be overcome by simply cascading one VPDU, one INV gate, and a second VPDU of the same size.


#### 5.4 An 8-Channel 1-GHz Phase-Locked Loop

Fig. 5.6 shows the block diagram of a multi-phase clock generator prototype with the VPDUs. It includes a phase-locked loop that receives a 250 MHz reference clock at  $V_i$ , and generates 8 clock outputs,  $\phi_1 \cdots \phi_8$ , with frequency at 1 GHz. The  $\phi_1 \cdots \phi_8$  clocks have different phases that equally divide one clock period. Their phase accuracy is improved by tying them together with a resistor ring (R-Ring). The R-Ring provides a phase averaging function that can reduce the phase errors due to random device mismatches [67] [68].

Each $\phi_j$ clock is directed to a fine VPDU followed by a coarse VPDU to generate the $V_{o,j}$ output, where $j = 1, \dots, 8$. Each coarse VPDU is controlled by the $V_H$ and $V_L$ generated from 3-bit resistor-string DACs. Each fine VPDU is controlled by the $V_H$ and $V_L$ generated from 6-bit resistor-string DACs. Assume $V_{DD} = 1.8$ V and $V_{SS} = 0$ V. $V_L$ is varied between 0 V and 0.5 V, and $V_H$ is varied between 1.3 V and 1.8 V. The $V_{TH}$ of the M7–M8 inverter in each VPDU is 0.9 V. The delay control ranges of the coarse and fine VPDUs are set by using different device sizes for the M1–M4 transistors in the VPDUs. The use of resistor-string DACs ensures monotonic digital-to-delay transfer functions of the VPDUs.
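The mapping from digital code to pre-charging voltage can be illustrated with an idealized resistor-string DAC model. The sketch below assumes evenly spaced taps that include both end points of the voltage range and assumes that a larger code selects a higher tap; neither detail is stated in the text, and the function name is hypothetical.

```python
def dac_voltage(code: int, bits: int, v_min: float, v_max: float) -> float:
    """Ideal resistor-string DAC: evenly spaced taps from v_min to v_max."""
    levels = 2 ** bits
    if not 0 <= code < levels:
        raise ValueError("code out of range")
    return v_min + (v_max - v_min) * code / (levels - 1)


V_TH = 0.9  # threshold of the M7-M8 inverter, from the text

# 6-bit fine DACs over the ranges given in the text.
fine_vl = [dac_voltage(c, 6, 0.0, 0.5) for c in range(64)]
fine_vh = [dac_voltage(c, 6, 1.3, 1.8) for c in range(64)]

# 3-bit coarse DACs over the same ranges.
coarse_vl = [dac_voltage(c, 3, 0.0, 0.5) for c in range(8)]
coarse_vh = [dac_voltage(c, 3, 1.3, 1.8) for c in range(8)]
```

Because every tap is strictly higher than the previous one, the voltage, and therefore the delay, changes monotonically with the code, and every tap satisfies $V_L < V_{TH} < V_H$.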

For each VPDU, the required control signals,  $\phi_A$ ,  $\phi_H$ , and  $\phi_L$  can be generated from



Figure 5.6: A multi-phase clock generator with variable pre-charged delay units.



Figure 5.7:  $\phi_1$ - $V_{o,1}$  signal path.



Figure 5.8: Chip micrograph of the 8-channels phase-locked loop.

$\phi_1 \cdots \phi_8$ directly. Taking the $\phi_1$-$V_{o,1}$ signal path shown in Fig. 5.7 as an example, $\phi_A = \phi_3$, $\phi_H = \overline{\phi_4 \cdot \phi_6}$, and $\phi_L = \phi_2 \cdot \phi_8$. The two cascaded INV gates have the same size as the cascaded INV gates within the fine VPDUs to maintain precise timing control.
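The control-signal logic for the $\phi_1$-$V_{o,1}$ path can be sketched with ideal waveforms. The Python snippet below divides one clock period into eight equal time slots and assumes that each $\phi_j$ has a 50% duty cycle and rises at the start of slot $j - 1$; these waveform assumptions and the helper names are illustrative and are not taken from the thesis.

```python
N_PHASES = 8


def phase_level(j: int, slot: int) -> int:
    """Logic level of phi_j (j = 1..8) in time slot `slot` (0..7).

    Assumes ideal 50% duty-cycle clocks spaced by one slot (T/8).
    """
    return 1 if (slot - (j - 1)) % N_PHASES < N_PHASES // 2 else 0


def control_signals(slot: int):
    """Return (phi_A, phi_H, phi_L) for the phi_1 to V_o,1 signal path."""
    phi = {j: phase_level(j, slot) for j in range(1, N_PHASES + 1)}
    phi_a = phi[3]                  # phi_A = phi_3
    phi_h = 1 - (phi[4] & phi[6])   # phi_H = NOT(phi_4 AND phi_6)
    phi_l = phi[2] & phi[8]         # phi_L = phi_2 AND phi_8
    return phi_a, phi_h, phi_l


waves = [control_signals(s) for s in range(N_PHASES)]
phi_a_wave = [w[0] for w in waves]
phi_h_wave = [w[1] for w in waves]
phi_l_wave = [w[2] for w in waves]
```

Under these assumptions $\phi_L$ is high for two slots and $\phi_H$ is low for two slots, half a period apart, so each pre-charging window lasts a quarter of the clock period.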

#### 5.5 Experimental Results

Fig. 5.8 shows the chip micrograph of the clock generator prototype fabricated using a standard 0.18  $\mu$ m CMOS technology. It occupies an area of 1.1×1.3 mm<sup>2</sup> and dissipates 110 mW from a 1.8 V supply. The output frequency can be varied from 950 MHz to




Figure 5.9: Measured jitter performance of the 8-channels phase-locked loop.

1.05 GHz. As shown in Fig. 5.9, the measured peak-to-peak jitter is 17.5 psec and the root-mean-square jitter is 2.29 psec.

Fig. 5.10 shows the measured transfer characteristic of a fine VPDU with the corresponding coarse VPDU set at 4 different codes. The coarse VPDUs have a delay control resolution of 8.66 psec and a control range of 60.64 psec. The fine VPDUs have a delay control resolution of 0.145 psec and a control range of 9.14 psec. Fig. 5.11 shows the measured differential nonlinearity (DNL) and integral nonlinearity (INL) of the digital-to-delay transfer function of a fine VPDU. The fine VPDU accepts 64 different digital codes, and its LSB is equal to 0.145 psec. Its DNL is +0.93/-0.79 LSB and its INL is +0.57/-1.36 LSB. Fig. 5.12 shows the measured DNL and INL of a fine VPDU with a larger control voltage range. In this case the fine VPDU operates in the triode region when the input code is smaller than about 10 to 12, which implies that the $V_t$ of M2 and M3 is approximately 500 mV. Its LSB is equal to 0.187 psec, its DNL is +0.93/-0.68 LSB, and its INL is +3.56/-0.72 LSB.
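The measured resolutions and control ranges can be cross-checked with simple arithmetic. The sketch below assumes ideal, uniform delay steps (the measured DNL shows the real steps are not exactly uniform) and uses hypothetical names; it only illustrates how a coarse and a fine code combine.

```python
COARSE_LSB_PS = 8.66   # measured coarse resolution, from the text
FINE_LSB_PS = 0.145    # measured fine resolution, from the text
COARSE_BITS = 3
FINE_BITS = 6


def ideal_delay_ps(coarse_code: int, fine_code: int) -> float:
    """Delay added by one channel, assuming perfectly uniform steps."""
    if not 0 <= coarse_code < 2 ** COARSE_BITS:
        raise ValueError("coarse code out of range")
    if not 0 <= fine_code < 2 ** FINE_BITS:
        raise ValueError("fine code out of range")
    return coarse_code * COARSE_LSB_PS + fine_code * FINE_LSB_PS


coarse_range_ps = (2 ** COARSE_BITS - 1) * COARSE_LSB_PS  # about 60.62 ps
fine_range_ps = (2 ** FINE_BITS - 1) * FINE_LSB_PS        # about 9.14 ps
total_range_ps = ideal_delay_ps(2 ** COARSE_BITS - 1, 2 ** FINE_BITS - 1)
```

The ideal-step estimates agree with the measured 60.64 psec, 9.14 psec, and 69.78 psec to within a few hundredths of a picosecond. The fine range also exceeds one coarse step, so under this idealization the fine VPDU can cover the gap between adjacent coarse codes.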



Figure 5.11: Measured DNL and INL of a fine VPDU with  $0 V \le V_L \le 0.5 V$  and  $1.3 V \le V_H \le 1.8 V$ .



Figure 5.12: Measured DNL and INL of a fine VPDU with  $0 V \le V_L \le 0.6 V$  and  $1.2 V \le V_H \le 1.8 V$ .



Figure 5.13: Simulated transfer curves of 6-bits digitally controlled delay elements.



Figure 5.14: Measurement setup of the 8-channels phase-locked loop.

| Technology          | TSMC 0.18 $\mu$m 1P6M CMOS             |
|---------------------|----------------------------------------|
| Supply voltage      | 1.8 V                                  |
| Operation frequency | $950 \text{ MHz} \sim 1.05 \text{ GHz}$ |
| Tuning range        | 69.78 ps @ 1 GHz                       |
| Resolution          | 0.145 ps @ 1 GHz                       |
| DNL                 | ±1 LSB @ 1 GHz                         |
| INL                 | ±1.4 LSB @ 1 GHz                       |
| Power consumption   | 110 mW @ 1 GHz                         |
| Jitter              | 2.29 ps rms @ 1 GHz                    |
| Die area            | $1.1 \times 1.3 \text{ mm}^2$          |

Table 5.1: Performance summary of the 8-channels phase-locked loop.

Fig. 5.13 shows the simulated transfer curves of 6-bit digitally controlled delay elements. All of the designs operate at 1 GHz, are sized for maximum tuning range, and drive the same-sized M3–M4 inverter. The tuning ranges of the variable-C, variable-I, variable-R, and VPDU designs are 15.8 psec, 153.7 psec, 155.9 psec, and 103.1 psec, respectively.

Fig. 5.14 shows the measurement setup. The pulse generator generates a 250 MHz clock signal, which is connected both to the reference clock input of the 8-channel phase-locked loop chip and to one port of the oscilloscope. The output clock of the chip is connected to the other port of the oscilloscope. The delay is obtained by measuring the timing difference between the selected phase and the reference clock.

| Design     | [77]                   | [76]         | This Work          |
|------------|------------------------|--------------|--------------------|
| Process    | 0.18 μm CMOS           | 0.18 µm CMOS | 0.18 µm CMOS       |
| Supply     | 1.8 V                  | 1.8 V        | 1.8 V              |
| Frequency  | 440 MHz $\sim 1.5$ GHz | < 1 GHz      | 950 MHz ~ 1.05 GHz |
| Resolution | < 2.5 ps               | < 2 ps       | 0.145 ps @ 1 GHz   |
| Power      | 43 mW @ 1.5GHz         | Unknown      | 110 mW @ 1GHz      |
| RMS Jitter | 0.93 ps @ 1.5GHz       | Unknown      | 2.29 ps @ 1GHz     |

Table 5.2: Performance comparison.

#### 5.6 Summary

Variable pre-charged delay units can be used for adjusting the time delay of the output phases of clock generators. The delay tuning mechanism is realized by changing the charging and discharging behavior at the internal node,  $V_c$ , in VPDUs. The delay tuning of VPDUs can be more linear with a smaller channel-length modulation parameter. DACs are used for providing pre-charging voltages. Combinational logic gates are used for generating timing control signals.

To demonstrate the VPDU's capability, an 8-phase 1-GHz clock generator was fabricated using a standard 0.18  $\mu$ m CMOS technology. The digitally-controlled variable pre-charged delay unit has a 0.145 psec delay control resolution and a total control range of 69.78 psec. The chip occupies an area of 1.1×1.3 mm<sup>2</sup> and dissipates 110 mW from a 1.8 V supply.



## Chapter 6

# Conclusions

#### 6.1 Summary

Multi-phase clocks are usually generated by phase-locked loops and delay-locked loops. The voltage-controlled oscillator and the voltage-controlled delay line are the most important components of these clock generators. They can be constructed from many different components using different methods. In CMOS technologies, the most popular design method for the voltage-controlled oscillator and the voltage-controlled delay line is to cascade delay cells as a ring or a string, respectively.


The maximum number of output phases of a multi-phase clock generator is limited by the required operation frequency of the clock generator and the minimum delay time of a delay cell. The methods that overcome this limitation fall into two categories: phase interpolation using phase interpolators, and array structures that combine several voltage-controlled oscillators or voltage-controlled delay lines. Three major methods for increasing the number of output phases of multi-phase clock generators are the two-dimensional array oscillator, phase interpolation, and the delay-locked loop array.

Due to process variation, substrate defects, and so on, the delay cells and the MOSFETs are mismatched. The phase differences between adjacent output phases of a multi-phase clock generator are therefore not equal, and phase errors occur. The methods for correcting the phase errors usually employ a calibration loop so that the clock generator calibrates its own output phases. Three major methods for improving the quality of the output phases of multi-phase clock generators are the self-calibrated phase-locked loop, the self-calibrated delay-locked loop, and the shifted-averaging voltage-controlled delay line.

In this thesis, circuit techniques using resistor strings and resistor rings for phase averaging and interpolation are introduced. Resistor strings can be used for phase averaging and phase interpolation. Phase averaging can reduce phase errors, and phase interpolation can increase the number of available output phases. When clock phases spanning a full period are available for driving a resistor string, a resistor ring is preferred to mitigate the edge-distortion phenomenon. Capacitors on the resistor strings (or rings) can degrade the effectiveness of both phase averaging and phase interpolation. The design parameter $\beta = \omega_i \times RC$ needs to be carefully chosen to optimize the trade-off between phase accuracy and power dissipation. Analogous to folding ADCs, multi-phase frequency multipliers can be constructed using R-rings and folders.

The two basic building blocks, delay cells and isolation buffers, of the clock generators with phase averaging technique are introduced. Delay cells consist of symmetric loads and a self-biased replica-feedback current source. The current source can provide high output resistance without using the cascode current source for increasing the immunity to static supply noise. The symmetric load could eliminate the first-order term of the common-mode supply noise for increasing the immunity to dynamic supply noise. The isolation buffer which is a fully-differential amplifier with cross-coupled and diode-connected loads provides high output resistance. In order to demonstrate the R-ring's capability of phase averaging and interpolation, a 125-MHz 8-bit digital-to-phase converter was designed and fabricated using a standard 0.35  $\mu$ m SPQM CMOS technology. The digital-to-phase converter consists of two delay lines driving two R-rings respectively. Together, they generate 256 different clock phases. Measurement results show 8-bit resolution is possible using the R-ring technique.

To eliminate the phase errors completely, the design parameter $\beta = \omega_i \times RC$ needs to be very small, which costs considerable power. If chip area is not a constraint, the phase accuracy enhancement techniques can be used to eliminate the phase errors completely. These techniques are usually implemented by using phase detectors or statistical analysis to determine the phase errors, and by inserting a phase shifter into each clock output path to fine-tune the output phases. The resolution of the phase shifters dominates the performance of the phase accuracy enhancement techniques.

Variable pre-charged delay units can be used for adjusting the time delay of the output phases of clock generators. The delay tuning mechanism is realized by changing the charging and discharging behavior at the internal node,  $V_c$ , in VPDUs. The delay tuning of VPDUs can be more linear with a smaller channel-length modulation parameter. DACs are used for providing pre-charging voltages. Combinational logic gates are used for generating timing control signals. To demonstrate the VPDU's capability, an 8-phase 1-GHz clock generator was fabricated using a standard 0.18  $\mu$ m CMOS technology. The digitally-controlled variable pre-charged delay unit has a 0.145 psec delay control resolution and a total control range of 69.78 psec. The chip occupies an area of 1.1×1.3 mm<sup>2</sup> and dissipates 110 mW from a 1.8 V supply.

#### 6.2 Recommendations for Future Investigation

This section presents several suggestions for future investigations into the design of multiphase clock generators.


- The phase averaging technique uses an R-ring to provide a path for distributing the error currents produced by phase errors among the output nodes of the clock generator. The design parameter $\beta = \omega_i \times RC$ can be optimized for a specific operation frequency. Because RC is fixed, $\beta$ is not optimal at other operation frequencies. Therefore, the path could be constructed by inserting other frequency-dependent devices to vary $\beta$ with the operation frequency. For example, capacitors and inductors could be inserted into the R-ring. The design parameter $\beta = \omega_i \times RC$ shows that the R-ring's phase averaging capability is reduced at higher operation frequencies. The capacitors could be used to compensate the phase averaging capability of the R-ring.
- The current path could also be constructed from active devices. For example, MOSFETs could be used to replace the resistors in the R-ring. Varying the gate voltage of the MOSFETs varies their impedance, so the phase averaging capability of the ring could be changed dynamically. This could be used to improve the yield of multi-phase clock generators: if the process variation increases from one fabrication run to the next, the control signal, i.e., the gate voltage, can be increased to maintain an acceptable quality of the output phases.

- Connecting each resistor between two adjacent phases produced by a clock generator forms an R-string or an R-ring and establishes the current path for phase averaging. The topology of the resistors could differ from a string or a ring. For example, star connections could be inserted into the R-string or R-ring to provide additional current paths. The error currents could then be distributed more evenly, which should improve the performance of the phase averaging technique.
- VPDUs require DACs for providing the pre-charging voltages, and these DACs occupy a large chip area. Another circuit technique that replaces the DACs and provides the pre-charging voltages would make the VPDUs more practical.



### Appendix A

### **Linear Models of PLLs/DLLs**

Fig. A.1 shows the linear models of phase-locked loops and delay-locked loops. Although the transient responses of clock generators are usually nonlinear, the systems can be modeled as linear systems using a continuous-time approximation. Fig. A.1(a) shows the linear model of a phase-locked loop without a charge pump circuit. Its open-loop transfer function can be expressed as

$$G(s) = K_{pd}F(s)\frac{2\pi K_{vco}}{s}\frac{1}{N}$$
(A.1)

and its closed-loop transfer function could be derived as

$$H(s) = \frac{\phi_o}{\phi_i} = \frac{G(s)}{1 + G(s)} = \frac{2\pi K_{pd} K_{vco} F(s)/N}{s + 2\pi K_{pd} K_{vco} F(s)/N}$$
(A.2)

where $K_{pd}$ is the gain of the phase detector (V/rad), F(s) is the transfer function of the loop filter, $K_{vco}$ is the gain of the voltage-controlled oscillator (Hz/V), and N is the divide ratio of the frequency divider. If the transfer function of the loop filter is

$$F(s) = \frac{1}{1 + \frac{s}{\omega_{LPF}}}$$
(A.3)

Equation A.2 could be rewritten as

$$H(s) = \frac{\omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2}$$
(A.4)

$$\omega_n = \sqrt{\frac{2\pi\omega_{LPF}K_{pd}K_{vco}}{N}} \tag{A.5}$$

$$\zeta = \frac{1}{2} \sqrt{\frac{N\omega_{LPF}}{2\pi K_{pd} K_{vco}}}$$
(A.6)



Figure A.1: Linear models of PLLs and DLLs. (a) PLL without charge pump, (b) PLL with charge pump, (c) DLL without charge pump, (d) DLL with charge pump.

where $\zeta$ is the damping factor and $\omega_n$ is the natural frequency. In this kind of phase-locked loop, the phase difference, $\phi_o - \phi_i$, is a function of $K_{pd}K_{vco}$. If the phase difference is not equal to zero, a periodic voltage ripple appears at the output of the phase detector and systematic jitter occurs. To reduce the phase difference, $K_{pd}K_{vco}$ should approach infinity. Therefore, phase-locked loops with a charge pump circuit are often chosen.
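Equations A.5 and A.6 can be evaluated directly. The following Python sketch uses hypothetical loop values chosen only for illustration; it also shows that, with the first-order filter of Equation A.3, $\omega_n$ and $\zeta$ are tied together through $2\zeta\omega_n = \omega_{LPF}$.

```python
import math


def pll_params_no_charge_pump(k_pd: float, k_vco: float, n: float,
                              w_lpf: float):
    """Natural frequency (rad/s) and damping factor from Eq. A.5 and A.6.

    k_pd in V/rad, k_vco in Hz/V, n is the divide ratio, w_lpf in rad/s.
    """
    loop_gain = 2.0 * math.pi * k_pd * k_vco / n
    w_n = math.sqrt(w_lpf * loop_gain)
    zeta = 0.5 * math.sqrt(w_lpf / loop_gain)
    return w_n, zeta


# Hypothetical values, for illustration only.
K_PD = 0.5                    # V/rad
K_VCO = 100e6                 # Hz/V
N_DIV = 4
W_LPF = 2.0 * math.pi * 1e6   # rad/s

w_n, zeta = pll_params_no_charge_pump(K_PD, K_VCO, N_DIV, W_LPF)
```

Raising $K_{pd}K_{vco}$ to reduce the phase difference therefore also lowers $\zeta$, which illustrates why the two parameters cannot be set independently in this loop.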

Fig. A.1(b) shows the linear model of a phase-locked loop with a charge pump circuit. Its open-loop transfer function can be expressed as

$$G(s) = \frac{I_p K_{vco} F(s)}{sN}$$
(A.7)

and its closed-loop transfer function could be derived as

$$H(s) = \frac{\phi_o}{\phi_i} = \frac{G(s)}{1 + G(s)} = \frac{I_p K_{vco} F(s)/N}{s + I_p K_{vco} F(s)/N}$$
(A.8)

where $I_p/2\pi$ is the output current of the charge pump circuit (A/rad). If the loop filter is a second-order low-pass filter and the capacitor in parallel is small enough, the transfer function of the loop filter, F(s), can be approximated as

$$F(s) \approx R + \frac{1}{sC} \tag{A.9}$$

Equation A.8 could be rewritten as

$$H(s) = \frac{2\zeta\omega_n s + \omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2}$$
(A.10)

$$\omega_n = \sqrt{\frac{K_{vco}I_p}{NC}} \tag{A.11}$$

$$\zeta = \frac{RC}{2} \sqrt{\frac{K_{vco}I_p}{NC}}$$
(A.12)

The system parameters, $\omega_n$ and $\zeta$, are no longer tied together because of the inserted zero. This is why the second-order filter is often chosen in phase-locked loop design.
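The decoupling of $\omega_n$ and $\zeta$ can be seen by evaluating Equations A.11 and A.12 for two values of R. The Python sketch below uses hypothetical component values chosen only for illustration.

```python
import math


def pll_params_charge_pump(i_p: float, k_vco: float, n: float,
                           r: float, c: float):
    """Natural frequency (rad/s) and damping factor from Eq. A.11 and A.12.

    i_p in A, k_vco in Hz/V, n is the divide ratio, r in ohm, c in F.
    """
    w_n = math.sqrt(k_vco * i_p / (n * c))
    zeta = 0.5 * r * c * w_n
    return w_n, zeta


# Hypothetical values, for illustration only.
I_P = 100e-6     # A
K_VCO = 100e6    # Hz/V
N_DIV = 4
C_LF = 100e-12   # F

w_n_a, zeta_a = pll_params_charge_pump(I_P, K_VCO, N_DIV, 5e3, C_LF)
w_n_b, zeta_b = pll_params_charge_pump(I_P, K_VCO, N_DIV, 10e3, C_LF)
```

Doubling R doubles $\zeta$ while leaving $\omega_n$ unchanged, so the damping can be set with R after $\omega_n$ has been fixed by $I_p$, $K_{vco}$, N, and C.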

Fig. A.1(c) shows the linear model of a delay-locked loop without a charge pump circuit. The voltage-controlled delay line generates its output phases by delaying the input reference clock. Therefore, the input and output variables of the linear model can be changed to $D_i(s)$ and $D_o(s)$. The relationship between the input delay, $D_i(s)$, and the output delay, $D_o(s)$, of the voltage-controlled delay line can be written as

$$D_o(s) = [D_i(s) - D_o(s)] \omega_{ref} K_{pd} F(s) K_{vcdl}$$
(A.13)

where $\omega_{ref}$ is the frequency of the input reference clock (rad/s), and $K_{vcdl}$ is the gain of the voltage-controlled delay line (s/V). The delay error is converted to a phase error by $[D_i(s) - D_o(s)]\omega_{ref}$. Simplifying Equation A.13 with F(s) = 1/sC, the closed-loop transfer function of the delay-locked loop can be expressed as

$$H(s) = \frac{D_o(s)}{D_i(s)} = \frac{1}{1 + \frac{s}{\omega_p}}$$
(A.14)

$$\omega_p = \frac{\omega_{ref} K_{pd} K_{vcdl}}{C} \tag{A.15}$$

where $\omega_p$ is the bandwidth of the delay-locked loop. The above transfer function shows that the delay-locked loop is a first-order system whose only system parameter is the position of the pole. Therefore, the design of delay-locked loops is simpler than that of phase-locked loops. However, for the same reason as in phase-locked loops, delay-locked loops with a charge pump circuit are often chosen.
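The step from Equation A.13 to Equation A.14 can be written out explicitly. As a sketch, substitute F(s) = 1/sC into Equation A.13 and collect the $D_o(s)$ terms:

$$D_o(s) \left[ 1 + \frac{\omega_{ref} K_{pd} K_{vcdl}}{sC} \right] = D_i(s) \frac{\omega_{ref} K_{pd} K_{vcdl}}{sC}$$

$$H(s) = \frac{D_o(s)}{D_i(s)} = \frac{\omega_p / s}{1 + \omega_p / s} = \frac{1}{1 + \frac{s}{\omega_p}}$$

with $\omega_p$ as defined in Equation A.15. The single pole at $s = -\omega_p$ is the only parameter of the closed-loop response.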

Fig. A.1(d) shows the linear model of delay-locked loops with the charge pump circuit. When using a charge pump circuit in the delay-locked loop, Equation A.13 could be modified as

$$D_{o}(s) = [D_{i}(s) - D_{o}(s)] \omega_{ref} \frac{I_{p}}{2\pi} \frac{1}{sC} K_{vcdl} \tag{A.16}$$

because

$$K_{pd}F(s) = \frac{I_p}{2\pi} \cdot \frac{1}{sC} \tag{A.17}$$

The closed-loop transfer function of the delay-locked loop can be obtained by simplifying Equation A.16.

$$H(s) = \frac{D_o(s)}{D_i(s)} = \frac{1}{1 + \frac{s}{\omega_p}} \tag{A.18}$$

$$\omega_p = \omega_{ref} K_{vcdl} \frac{I_p}{2\pi C} \tag{A.19}$$

The delay-locked loop is still a first-order system.
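As a rough numerical illustration of the first-order behavior, the sketch below evaluates the pole frequency of Equation A.19 and the corresponding unit-step response $1 - e^{-\omega_p t}$ of $H(s) = 1/(1 + s/\omega_p)$. The function names and all numerical values are hypothetical and serve only as an example.

```python
import math


def dll_bandwidth(omega_ref, k_vcdl, i_p, c):
    """Pole frequency omega_p (rad/s) of the charge-pump DLL, Equation A.19."""
    return omega_ref * k_vcdl * i_p / (2 * math.pi * c)


def dll_step_response(omega_p, t):
    """Unit-step response of the first-order system H(s) = 1 / (1 + s/omega_p)."""
    return 1.0 - math.exp(-omega_p * t)


# Hypothetical loop values (assumptions, not taken from the thesis).
OMEGA_REF = 2 * math.pi * 125e6  # reference clock frequency (rad/s)
K_VCDL = 1e-9                    # delay line gain (s/V)
I_P = 20e-6                      # charge pump current (A)
C = 10e-12                       # loop capacitor (F)

omega_p = dll_bandwidth(OMEGA_REF, K_VCDL, I_P, C)
tau = 1.0 / omega_p
print(f"omega_p = {omega_p:.3e} rad/s, time constant = {tau:.3e} s")

# Settling of the output delay after a step in the input delay.
for k in (1, 3, 5):
    print(f"t = {k} tau: D_o / D_i = {dll_step_response(omega_p, k * tau):.4f}")
```

Because the pole position is the only parameter, the settling behavior is fully described by the single time constant $1/\omega_p$.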

### **Appendix B**

## **Resistor String's Frequency Response**

In Fig. 3.4, the node $V_{o,0}$ connects to a current source, a capacitor, and two RC ladder networks, one on the right and one on the left. If both networks are infinitely long, the input impedance of each network can be expressed as

$$Z_{ld} = \frac{1}{2}R + \frac{1}{2}\sqrt{R^2 + 4RZ_c} \tag{B.1}$$

where $Z_c = 1/(j\omega_i C)$. Let $I_{i,x} = 0$ for $x \neq 0$. By current division, the current flowing in the capacitor connected to the $V_{o,0}$ node is

$$I_{o,0} = I_{i,0} \times \frac{Z_{ld}}{2Z_c + Z_{ld}} \tag{B.2}$$

The current flowing in the capacitor connected to the  $V_{o,\pm 1}$  nodes can be computed from  $I_{o,0}$  as:

$$I_{o,\pm 1} = I_{o,0} \times \frac{Z_c}{Z_{ld}} \times \frac{Z_{ld}}{Z_c + Z_{ld}} \tag{B.3}$$

The current flowing in the capacitor connected to the  $V_{o,x}$  node, for |x| > 1, can be computed from  $I_{o,|x|-1}$  as:

$$I_{o,x} = I_{o,|x|-1} \times \frac{Z_c}{Z_{ld}} \times \frac{Z_{ld}}{Z_c + Z_{ld}} \tag{B.4}$$

Thus, the frequency response of the current gain from  $I_{i,0}$  to  $I_{o,x}$  can be written as:

$$\frac{I_{o,x}}{I_{i,0}} = \frac{R + \sqrt{R(R + 4Z_c)}}{R + \sqrt{R(R + 4Z_c)} + 4Z_c} \times \left[\frac{2Z_c}{2Z_c + R + \sqrt{R(R + 4Z_c)}}\right]^{|x|} \tag{B.5}$$

Replacing  $\omega_i \times RC$  with  $\beta$ , the above equation can be manipulated to obtain Equation 3.2.
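The chain of Equations B.1 to B.4 can be checked numerically. The sketch below is an illustration under stated assumptions, not the thesis's own procedure: it evaluates the current gain $I_{o,x}/I_{i,0}$ as a function of $\beta = \omega_i RC$ and confirms that the capacitor currents summed over all nodes add up to the injected current, as current conservation requires. The function names and the choice $\beta = 1$ are hypothetical.

```python
import cmath


def ladder_impedance(r, z_c):
    """Input impedance of an infinite RC ladder, Equation B.1."""
    return 0.5 * r + 0.5 * cmath.sqrt(r * r + 4 * r * z_c)


def current_gain(x, beta, r=1.0):
    """I_{o,x} / I_{i,0} from Equations B.2 to B.4, with beta = omega_i * R * C."""
    z_c = r / (1j * beta)                 # Z_c = 1/(j*omega_i*C) = R/(j*beta)
    z_ld = ladder_impedance(r, z_c)
    center = z_ld / (2 * z_c + z_ld)      # share taken by the capacitor at node 0 (B.2)
    per_node = z_c / (z_c + z_ld)         # attenuation per additional node (B.3, B.4)
    return center * per_node ** abs(x)


beta = 1.0  # hypothetical normalized frequency
for x in range(0, 4):
    g = current_gain(x, beta)
    print(f"x = {x}: |I_o,x / I_i,0| = {abs(g):.4f}")

# All injected current must end up in the capacitors, so the gains sum to 1.
total = sum(current_gain(x, beta) for x in range(-200, 201))
print(f"sum over nodes = {total.real:.6f} {total.imag:+.6f}j")
```

The geometric decay with $|x|$ shows how the injected current is spread over neighboring nodes, which is the averaging effect described by Equation B.5.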



### **Appendix C**

## **Sub-Harmonic Phase Interpolation Technique**

In transceiver applications, mixers are often used as upconverters and downconverters to shift an input frequency up or down. The mixing function is mainly provided by the second-order term of the MOSFET square law. By substituting well-designed parameters for some variables of the square law, a mixer can even be used as a phase interpolator. Fig. C.1 shows the schematic of a simple mixer. The gate and source of the N-type MOSFET receive the input signals, and the drain of the MOSFET provides the output. If the two inputs, $V_g$ and $V_s$, are the same signal with different phases, the drain current of the MOSFET, neglecting channel-length modulation, is given by

$$i_{d}(t) = \frac{1}{2}\mu C_{ox} \frac{W}{L} \times \left[ V_{gs}(t) - V_{t} \right]^{2} \tag{C.1}$$

$$i_{d}(t) = \frac{1}{2}\mu C_{ox} \frac{W}{L} \times \left[ V_{g,dc} + (V_{sw} + V_{off}) \cdot \cos(\omega t + \phi_{g}) - V_{s,dc} - (V_{sw} - V_{off}) \cdot \cos(\omega t + \phi_{s}) - V_{t} \right]^{2} \tag{C.2}$$

where $V_{g,dc}$ and $V_{s,dc}$ are the common-mode voltages of the two inputs, $V_{sw}$ is half of the maximum amplitude of the inputs, $V_{off}$ is half of the amplitude difference of the two inputs, $\omega$ is the input frequency, and $\phi_g$ and $\phi_s$ are the phases of the inputs. By expanding Equation C.2, filtering the output with a bandpass filter centered at $2\omega$, and ignoring the current gain, $\mu C_{ox}W/4L$, the final result can be simplified as

$$i_{d}(t) = (V_{off} + V_{sw})^{2} \cdot \cos(2\omega t + 2\phi_{g}) + 2(V_{off}^{2} - V_{sw}^{2}) \cdot \cos(2\omega t + \phi_{g} + \phi_{s}) + (V_{off} - V_{sw})^{2} \cdot \cos(2\omega t + 2\phi_{s}) \tag{C.3}$$

If  $V_{off}$  is equal to  $V_{sw}$ , the output current,  $i_d(t)$  would be

$$4V_{sw}^2 \cdot \cos(2\omega t + 2\phi_g) \tag{C.4}$$

The output phase of the mixer is  $2\phi_g$ , and the output amplitude is proportional to  $4V_{sw}^2$ . If  $V_{off}$  is equal to  $-V_{sw}$ , the output current,  $i_d(t)$  would be

$$4V_{sw}^2 \cdot \cos(2\omega t + 2\phi_s) \tag{C.5}$$

The output phase of the mixer is  $2\phi_s$ , and the output amplitude is also proportional to  $4V_{sw}^2$ . If  $V_{off}$  is equal to 0, the output current,  $i_d(t)$  would be

$$2V_{sw}^2 \cdot [1 - \cos(\phi_g - \phi_s)] \cdot \cos(2\omega t + \pi + \phi_g + \phi_s) \tag{C.6}$$




Figure C.2: Simulated transfer curves of normalized output phase versus input amplitude difference.

The output phase of the mixer is $\pi + \phi_g + \phi_s$, and the output amplitude varies with $|\phi_g - \phi_s|$. In this case, the output amplitude is 0 when $\phi_g - \phi_s = 0^\circ$, $2V_{sw}^2$ when $\phi_g - \phi_s = \pm 90^\circ$, and $4V_{sw}^2$ when $\phi_g - \phi_s = \pm 180^\circ$. From the above analysis, the mixer output shows the characteristics of a phase interpolator: varying $V_{off}$ varies the output phase. The output tuning range is given by the following equation.

$$360^{\circ} - 2 \times |\phi_g - \phi_s| \tag{C.7}$$

When $\phi_g - \phi_s = 0^\circ$ or $\phi_g - \phi_s = \pm 180^\circ$, the two end phases $2\phi_g$ and $2\phi_s$ coincide modulo $360^\circ$, so the output tuning range is $0^\circ$. Therefore, the input phase difference is limited to

$$0^{\circ} < |\phi_g - \phi_s| < 180^{\circ} \tag{C.8}$$

Fig. C.2 shows the transfer curves of the output phases normalized to their tuning ranges versus input amplitude difference with different input phase differences. When the input phase difference is larger, the linearity of the output phases is better.

Figure C.3: Simulated transfer curves of normalized output amplitude versus input amplitude difference.

Fig. C.3 shows the transfer curves of the output amplitudes normalized to their maximum output amplitude versus input amplitude difference with different input phase differences. When the input phase difference is larger, the linearity of the output amplitude is better.
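The three special cases of Equations C.4 to C.6 can be reproduced by treating each $2\omega$ term of Equation C.3 as a phasor and adding them. The following is a minimal sketch under that assumption. The function name `second_harmonic_phasor` and the chosen input phases are hypothetical, and the current gain $\mu C_{ox}W/4L$ is ignored as in the text.

```python
import cmath
import math


def second_harmonic_phasor(v_sw, v_off, phi_g, phi_s):
    """Phasor sum of the three 2*omega terms in Equation C.3."""
    return ((v_off + v_sw) ** 2 * cmath.exp(2j * phi_g)
            + 2 * (v_off ** 2 - v_sw ** 2) * cmath.exp(1j * (phi_g + phi_s))
            + (v_off - v_sw) ** 2 * cmath.exp(2j * phi_s))


V_SW = 1.0             # hypothetical amplitude (V)
PHI_G = 0.0            # hypothetical gate input phase (rad)
PHI_S = math.pi / 2    # hypothetical source input phase (rad)

# Sweep V_off from -V_sw to +V_sw and report the output amplitude and phase.
for step in range(-4, 5):
    v_off = V_SW * step / 4
    p = second_harmonic_phasor(V_SW, v_off, PHI_G, PHI_S)
    phase_deg = math.degrees(cmath.phase(p)) % 360
    print(f"V_off = {v_off:+.2f}  amplitude = {abs(p):.3f}  phase = {phase_deg:6.1f} deg")
```

With these inputs the phase moves between $2\phi_s$ and $2\phi_g$ through $\pi + \phi_g + \phi_s$ as $V_{off}$ is swept, while the amplitude dips at $V_{off} = 0$, consistent with Equations C.4 to C.7.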

### **Bibliography**

- [1] Y. Moon, J. Choi, K. Lee, D.-K. Jeong, and M.-K. Kim, "An all-analog multiphase delay-locked loop using a replica delay line for wide-range operation and low-jitter performance," *IEEE Journal of Solid-State Circuits*, vol. 35, no. 3, pp. 377–384, March 2000.
- [2] H. Mair and L. Xiu, "An architecture of high-performance frequency and phase synthesis," *IEEE Journal of Solid-State Circuits*, vol. 35, no. 6, pp. 835–846, June 2000.
- [3] L. Sun and T. A. Kwasniewski, "A 1.25-GHz 0.35-μm monolithic CMOS PLL based on a multiphase ring oscillator," *IEEE Journal of Solid-State Circuits*, vol. 36, no. 6, pp. 910–916, June 2001.
- [4] C.-K. K. Yang, R. Farjad-Rad, and M. A. Horowitz, "A 0.5-μm CMOS 4.0-Gbit/s serial link transceiver with data recovery using oversampling," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 5, pp. 713–722, May 1998.
- [5] Y. Moon, D.-K. Jeong, and G. Ahn, "A 0.6–2.5-GBaud CMOS tracked 3x oversampling transceiver with dead-zone phase detection for robust clock/data recovery," *IEEE Journal of Solid-State Circuits*, vol. 36, no. 12, pp. 1974–1983, December 2001.
- [6] S.-J. Song, S. M. Park, and H.-J. Yoo, "A 4-Gb/s CMOS clock and data recovery circuit using 1/8-rate clock technique," *IEEE Journal of Solid-State Circuits*, vol. 38, no. 7, pp. 1213–1219, July 2003.
- [7] I. Ghareeb, "Bit error rate performance and power spectral density of a noncoherent
hybrid frequency-phase modeulation system," *IEEE Journal on Selected Areas in Communications*, vol. 13, no. 2, pp. 276–284, February 1995.

- [8] J. W. Kelly, E. G. Strangas, and J. M. Miller, "Multiphase space vector pulse width modulation," *IEEE Transactions on Energy Conversion*, vol. 18, no. 2, pp. 259–264, June 2003.
- [9] H.-M. Ryu, J.-H. Kim, and S.-K. Sul, "Analysis of multiphase space vector pulsewidth modulation based on multiple d-q spaces concept," *IEEE Transactions on Power Electronics*, vol. 20, no. 6, pp. 1364–1371, November 2005.
- [10] H. Jin and E. K. F. Lee, "A digital technique for reducing clock jitter effects in time-interleaved A/D converter," in *IEEE International Symposium of Circuits and Systems*, May 1999, pp. 330–333.

ALL CONTRACT

- [11] S.-W. Sin, Seng-Pan, and R.P.Martins, "A generalized timing-skew-free, multi-phase clock generation platform for parallel sampled-data systems," in *IEEE International Symposium of Circuits and Systems*, May 2004, pp. 369–372.
- [12] J. Elbornsson, F. Gustafsson, and J.-E. Eklund, "Blind adaptive equalization of mismatch errors in a time-interleaved A/D converter system," *IEEE Transactions on Circuits and Systems—Part I: Fundamental Theory and Applications*, vol. 51, no. 1, pp. 151–158, January 2004.
- [13] M. Kossel, P. Buchmann, C. Menolfi, T. Morf, T. Toifl, and M. Schmatz, "A lowjitter wideband multiphase PLL in 90nm SOI CMOS technology," in *IEEE International Solid-State Circuits Conference*, February 2005, pp. 414–415.
- T. Toifl, C. Menolfi, P. Buchmann, M. Kossel, T. Morf, R. Reutemann, M. Ruegg, M. Schmatz, and J. Weiss, "0.94ps-rms-jitter 0.016mm<sup>2</sup> 2.5GHz multi-phase generator PLL with 360° digitally programmable phase shift for 10Gb/s serial links," in *IEEE International Solid-State Circuits Conference*, February 2005, pp. 410–411.
- [15] G. L. G. de Mercey, "A 18GHz rotary traveling wave VCO in CMOS with I/Q outputs," in *IEEE European Solid-State Circuits Conference*, September 2003, pp. 489– 492.

- [16] N. Tzartzanis and W. W. Walker, "A reversible poly-phase distributed VCO," in *IEEE International Solid-State Circuits Conference*, February 2006, pp. 2452–2461.
- [17] G. P. Bilionis, A. N. Birbas, and M. K. Birbas, "Fully integrated differential distributed VCO in 0.35-μm SiGe BiCMOS technology," *IEEE Transactions on Microwave Theory and Techniques*, vol. 55, no. 1, pp. 13–22, January 2007.
- [18] S.-J. Lee, B. Kim, and K. Lee, "A novel high-speed ring oscillator for multiphase clock generation using negative skewed delay scheme," *IEEE Journal of Solid-State Circuits*, vol. 32, no. 2, pp. 289–291, February 1997.
- [19] D. Guermandi, P. Tortori, E. Franchi, and A. Gnudi, "A 0.83–2.5-GHz continuously tunable quadrature VCO," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 12, pp. 2620–2627, December 2005.
- [20] S.-J. Bae, H.-J. Chi, Y.-S. Sohn, and H.-J. Park, "A VCDL-based 60–760-MHz dual-loop DLL with infinite phase-shift capability and adaptive-bandwidth scheme," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 5, pp. 1119–1129, May 2005.
- [21] B. Razavi, Ed., Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design. IEEE PRESS, 1996.
- [22] F. M. Gardner, *Phaselock Techniques*, 2nd ed. John Wiley & Sons, Inc., 1979.
- [23] —, "Charge-pump phase-lock loops," *IEEE Transactions on Communications*, vol. COM-28, no. 11, pp. 1849–1858, November 1980.
- [24] T. H. Lee, K. S. Donnelly, J. T. C. Ho, J. Zerbe, M. G. Johnson, and T. Ishikawa,
  "A 2.5 V CMOS delay-locked loop for an 18 Mbit, 500 Megabyte/s DRAM," *IEEE Journal of Solid-State Circuits*, vol. 29, no. 12, pp. 1491–1496, December 1994.
- [25] M. Mota and J. Christiansen, "A four channel, self-calibrating, high resolution, time to digital converter," in *IEEE International Conference on Electronics, Circuits and Systems*, September 1998, pp. 409–412.

- [26] R. Kreienkamp, U. Langmann, C. Zimmermann, and T. Aoyama, "A 10-Gb/s CMOS clock and data recovery circuit with an analog phase interpolator," in *IEEE Custom Integrated Circuits Conference*, September 2003, pp. 73–76.
- [27] U. Karthaus and S. Schabel, "Write pulse generator for 16x DVD recording with symmetric CMOS inverter ring oscillator," *IEEE Journal of Solid-State Circuits*, vol. 11, no. 11, pp. 2286–2295, Novemer 2005.
- [28] J. G. Maneatis and M. A. Horowitz, "Precise delay generation using coupled oscillators," *IEEE Journal of Solid-State Circuits*, vol. 28, no. 7, pp. 1273–1282, December 1993.
- [29] K. Ishibashi, K. Komiyaji, H. Toyoshima, M. Minami, N. Ooki, H. Ishida, T. Yamanaka, T. Nagano, and T. Nishida, "A 300MHz 4-Mb wave-pipeline CMOS SRAM using a multi-phase PLL," in *IEEE International Solid-State Circuits Conference*, February 1995, pp. 308–309.
- [30] P.-F. Chen, "A 2V, 110 MHz, 64-phase CMOS PLL," Master's thesis, National Chiao Tung University, June 1996.
- [31] H.-D. Chang, "A 2V 110MHz CMOS vector modulator," *Master's thesis, National Chiao Tung University*, June 1996.
- [32] J.-T. Wu, H.-D. Chang, and P.-F. Chen, "A 2 V 100 MHz CMOS vector modulator," in *IEEE International Solid-State Circuits Conference*, February 1997, pp. 80–81.
- [33] J.-M. Chou, "3V 9Gbps CMOS multiplexer," Master's thesis, National Chiao Tung University, June 2000.
- [34] A. Maxim, "A 160-2550MHz CMOS active clock deskewing PLL using analog phase interpolation," in *IEEE International Solid-State Circuits Conference*, February 2004, pp. 346–347.
- [35] R. Kreienkamp, U. Langmann, C. Zimmermann, T. Aoyama, and H. Siedhoff, "A 10-Gb/s CMOS clock and data recovery circuit with an analog phase interpolator,"

*IEEE Transactions on Circuits and Systems—Part I: Fundamental Theory and Applications*, vol. 40, no. 3, pp. 736–743, March 2005.

- [36] X. Chen and J. Liu, "A delay compensation technique for N-phase clock generation with 2(N-1) delay units," in *IEEE International Symposium of Circuits and Systems*, May 2005, pp. 4887–4890.
- [37] B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang, C. V. Tran, C. L. Portmann, D. Stark, Y.-F. Chan, T. H. Lee, and M. A. Horowitz, "A portable digital DLL for high-speed CMOS interface circuits," *IEEE Journal of Solid-State Circuits*, vol. 34, no. 5, pp. 632–644, May 1999.
- [38] T. Saeki, M. Mitsuishi, H. Iwaki, and M. Tagishi, "A 1.3-cycle lock time, non-PLL/DLL clock multiplier based on direct clock cycle interpolation for "clock on demand"," *IEEE Journal of Solid-State Circuits*, vol. 11, no. 11, pp. 1581–1590, Novemer 2000.
- [39] T.-Y. Wang, S.-M. Lin, and H.-W. Tsao, "Multiple channel programmable timing generators with single cyclic delay line," *IEEE Transactions on Instrumentation and Measurement*, vol. 53, no. 4, pp. 1295–1303, August 2004.
- [40] J. Christiansen, "An integrated high resolution CMOS timing generator based on an array of delay locked loops," *IEEE Journal of Solid-State Circuits*, vol. 13, no. 7, pp. 952–957, July 1996.
- [41] C. Y. Lu, M. Viejo, and Calif, "Segmented dual delay-locked loop for precise variable-phase clock generation," U.S. Patent 6 100 735, August 8, 2000.
- [42] Y.-L. Tsao, M.-C. Chung, and S.-J. Jou, "Delay-difference DLL and its-application on skewed output buffer," in *IEEE Asia-Pacific Conference on ASICs*, August 2002, pp. 279–282.
- [43] C.-H. Park, J. W. Kim, and B. Kim, "A 1.8-GHz self-calibrated phase-locked loop with precise I/Q matching," in *IEEE Asia-Pacific Conference on ASICs*, August 2000, pp. 81–84.

- [44] F. Baronti, L. Fanucci, D. Lunardini, R. Roncella, and R. Saletti, "A high-resolution DLL-based digital-to-time converter for DDS applications," in *IEEE International Frequency Control Symposium and PDA Exhibition*, May 2002, pp. 649–653.
- [45] S. Tontisirin and R. Tielert, "A Gb/s one-fourth-rate CMOS CDR circuit without external reference clock," in *IEEE International Symposium of Circuits and Systems*, May 2006, pp. 1–4.
- [46] C.-H. Park, O. Kim, and B. Kim, "A 1.8-GHz self-calibrated phase-locked loop with precise I/Q matching," in *IEEE Symposium on VLSI Circuits Digest of Technical Papers*, June 2000, pp. 242–243.
- [47] —, "A 1.8-GHz self-calibrated phase-locked loop with precise I/Q matching," IEEE Journal of Solid-State Circuits, vol. 36, no. 5, pp. 777–783, May 2001.
- [48] M. Mota and J. Christiansen, "A high-resolution time interpolator based on a delay locked loop and an RC delay line," *IEEE Journal of Solid-State Circuits*, vol. 10, no. 11, pp. 1360–1366, October 1999.
- [49] F. Baronti, L. Fanucci, D. Lunardini, R. Roncella, and R. Saletti, "On-line calibration for non-linearity reduction of delay-locked delay-lines," in *IEEE International Conference on Electronics, Circuits and Systems*, September 2001, pp. 1001–1005.
- [50] L. Wu and W. C. B. Jr., "A low-jitter skew-calibrated multi-phase clock generator for time-interleaved applications," in *IEEE International Solid-State Circuits Conference*, February 2001, pp. 396–397.
- [51] H.-H. Chang, C.-H. Sun, and S.-I. Liu, "A low-jitter and precise multiphase delaylocked loop using shifted averaging VCDL," in *IEEE International Solid-State Circuits Conference*, February 2003, pp. 434–435.
- [52] H.-H. Chang, R.-J. Yang, and S.-I. Liu, "Low jitter and multirate clock and data recovery circuit using a MSADLL for chip-to-chip interconnection," *IEEE Transactions on Circuits and Systems—Part I: Fundamental Theory and Applications*, vol. 51, no. 12, pp. 2356–2364, December 2004.

- [53] H.-H. Chang, "Design and implementation of CMOS digital/analog delay-locked loop," *PH.D's thesis, National Taiwan University*, June 2004.
- [54] B. Kim, T. C. Weigandt, and P. R. Gray, "PLL/DLL system noise analysis for low jitter clock synthesizer design," in *IEEE International Symposium of Circuits and Systems*, May 1994, pp. 31–34.
- [55] T. C. Weigandt, B. Kim, and P. R. Gray, "Analysis of timing jitter in CMOS ring oscillators," in *IEEE International Symposium of Circuits and Systems*, May 1994, pp. 27–30.
- [56] W.-H. Chan, J. Lau, and A. Buchwald, "A 622-MHz interpolating ring VCO with temperature compensation and jitter analysis," in *IEEE International Symposium of Circuits and Systems*, June 1997, pp. 25–28.
- [57] J. A. McNeill, "Jitter in ring oscillators," *IEEE Journal of Solid-State Circuits*, vol. 32, no. 6, pp. 870–879, June 1997.

1896

- [58] A. Hajimiri and T. H. Lee, "A general theory of phase noise in electrical oscillators," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 2, pp. 179–194, February 1998.
- [59] A. Hajimiri, S. Limotyrakis, and T. H. Lee, "Jitter and phase noise in ring oscillators," *IEEE Journal of Solid-State Circuits*, vol. 34, no. 6, pp. 790–804, June 1999.
- [60] T. H. Lee and A. Hajimiri, "Oscillator phase noise: A tutorial," *IEEE Journal of Solid-State Circuits*, vol. 35, no. 3, pp. 326–336, March 2000.
- [61] H. Pan and A. A. Abidi, "Spatial filtering in flash A/D converters," IEEE Transactions on Circuits and Systems—Part II: Analog and Digital Signal Processing, vol. 50, no. 8, pp. 424–436, August 2003.
- [62] K. Kattmann and J. Barrow, "A technique for reducing differential non-linearity errors in flash A/D converters," in *IEEE International Solid-State Circuits Conference*, February 1991, pp. 170–171.

- [63] K. Bult and A. Buchwald, "An embedded 240-mw 10-b 50-MS/s CMOS ADC in 1-mm<sup>2</sup>," *IEEE Journal of Solid-State Circuits*, vol. 32, no. 12, pp. 1887–1895, December 1997.
- [64] B. Razavi, *Design of Analog CMOS Integrated Circuits*, preview ed. McGraw–Hill, 2000.
- [65] D. J. Allstot, "A precision variable-supply CMOS comparator," *IEEE Journal of Solid-State Circuits*, vol. SC-17, no. 6, pp. 1080–1087, December 1982.
- [66] B.-S. Song, S.-H. Lee, and M. F. Tompsett, "A 10-b 15-MHz CMOS recycling two-step A/D converter," *IEEE Journal of Solid-State Circuits*, vol. 25, no. 6, pp. 1328–1338, December 1990.
- [67] J.-M. Chou, Y.-T. Hsieh, and J.-T. Wu, "A 125 MHz 8b digital-to-phase converter," in *IEEE International Solid-State Circuits Conference*, February 2003, pp. 436–437.
- [68] —, "Phase averaging and interpolation using resistor strings or resistor rings for multi-phase clock generation," *IEEE Transactions on Circuits and Systems—Part I: Fundamental Theory and Applications*, vol. 53, no. 5, pp. 984–991, May 2006.
- [69] S. Sidiropoulos and M. A. Horowitz, "A semidigital dual delay-locked loop," *IEEE Journal of Solid-State Circuits*, vol. 32, no. 11, pp. 1683–1692, November 1997.
- [70] W. Rhee, B. Parker, and D. Friedman, "A semi-digital delay-locked loop using an analog-based finite state machine," *IEEE Transactions on Circuits and Systems— Part II: Analog and Digital Signal Processing*, vol. 51, no. 11, pp. 635–639, November 2004.
- [71] J.-S. Chiang and K.-Y. Chen, "The design of an all-digital phase-locked loop with small DCO hardware and fast phase lock," *IEEE Transactions on Circuits and Systems—Part II: Analog and Digital Signal Processing*, vol. 46, no. 7, pp. 945– 950, July 1999.

- [72] M. Saint-Laurent and M. Swaminathan, "A digitally adjustable resistor for path delay characterization in high-frequency microprocessors," in *IEEE Southwest Symposium on Mixed-Signal Design*, February 2001, pp. 61–64.
- [73] F. Baronti, L. Fanucci, D. Lunardini, R. Roncella, and R. Saletti, "A technique for nonlinearity self-calibration of DLLs," *IEEE Transactions on Instrumentation and Measurement*, vol. 52, no. 4, pp. 1255–1260, August 2003.
- [74] G. Torralba, V. Angelov, V. Gonzalez, V. Lindenstruth, and E. Sanchis, "A VLSI for deskewing and fault tolerance in LVDS links," *IEEE Transactions on Nuclear Science*, vol. 53, no. 3, pp. 801–809, June 2006.
- [75] M. Maymandi-Nejad and M. Sachdev, "A digitally programmable delay element: Design and analysis," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 11, no. 5, pp. 871–878, May 2003.
- [76] —, "A monotonic digitally controlled delay element," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 11, pp. 2212–2219, November 2005.

1896

[77] D. Shin, J. Song, K.-W. K. Hyunsoo Chae, Y. J. Choi, and C. Kim, "A 7ps-jitter 0.053mm<sup>2</sup> fast-lock ADDLL with wide-range and high-resolution all-digital DCC," in *IEEE International Solid-State Circuits Conference*, February 2007, pp. 184–185.

BIBLIOGRAPHY



## Vita



Ju-Ming Chou was born in Kaohsiung, Taiwan, in 1976. He received the B.S. degree in electronics engineering from National Central University, Chung-Li, Taiwan, in 1998, and the M.S. degree in electronics engineering from National Chiao-Tung University, Hsin-Chu, Taiwan, in 2000. He worked toward the Ph.D. degree at National Chiao-Tung University from 2000 to 2007.

His research interests include analog front-end circuits and mixed-signal circuits in data communication.

Address: 2F, No. 11, Lane 83, Juemin Road, Sanmin District, Kaohsiung City

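This thesis was typeset using the LaTeX<sup>1</sup> system.

<sup>1</sup>LaTeX is a set of macros built on TeX. TeX is a registered trademark of the American Mathematical Society. The original author of the macros used for this thesis is Dinesh Das, Department of Computer Sciences, The University of Texas at Austin. The author of the NCTU Chinese version is Jieh-Tsorng Wu, Department of Electronics Engineering, National Chiao-Tung University, Hsin-Chu, Taiwan.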

# **Publication List**

### • Journal Paper

 Ju-Ming Chou, Yu-Tang Hsieh, and Jieh-Tsorng Wu, "Phase Averaging and Interpolation Using Resistor Strings or Resistor Rings for Multi-Phase Clock Generation," *IEEE Transactions on Circuits and Systems - I: Regular Papers*, Vol. 53, No. 5, pp. 984–991, May 2006.

### • Conference Paper

 Ju-Ming Chou, Yu-Tang Hsieh, and Jieh-Tsorng Wu, "A 125MHz 8b Digital-to-Phase Converter," 2003 IEEE International Solid-State Circuits Conference, pp. 436–437, Feb. 2003.

### • Patent

- Yu-Tang Hsieh, Ju-Ming Chou, and Jieh-Tsorng Wu, "Clock Generator using Resistor Strings and Resistor Rings."
  - \* U.S.A. Patent 6,777,994B2 (2004/8-2022/10), August 17, 2004.
  - \* R.O.C. Patent 168448 (2002/12–2022/1), April 16, 2003.
