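### National Chiao-Tung University

## Department of Electronics Engineering and Institute of Electronics

### Doctoral Dissertation



#### Low-Power Analog-to-Digital Converters Design Techniques

Student: Yung-Hui Chung

Advisor: Jieh-Tsorng Wu

July 2010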

#### Low-Power Analog-to-Digital Converters Design Techniques

Student: Yung-Hui Chung

Advisor: Jieh-Tsorng Wu

> A Dissertation

Submitted to Department of Electronics Engineering and Institute of Electronics, National Chiao-Tung University, in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Electronics Engineering

July 2010

Hsin-Chu, Taiwan, Republic of China

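### Low-Power Analog-to-Digital Converters Design Techniques

Student: Yung-Hui Chung - Advisor: Jieh-Tsorng Wu

National Chiao-Tung University

Department of Electronics Engineering and Institute of Electronics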



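This thesis investigates how to design a high-performance, low-power analog-to-digital converter (ADC) in nanometer CMOS processes to meet system-on-chip requirements. In general, an ADC is built from comparators, amplifiers, analog switches, capacitors, resistors, and digital circuits. Comparators and amplifiers are the two more power-hungry analog circuits. In advanced nanometer processes, transistors keep getting smaller, faster, and lower in power, which greatly benefits digital circuits. Low-resolution (4 to 8 bits) comparator-based ADCs can operate at higher sampling frequencies. They benefit from process scaling, and their overall power consumption is determined mainly by the comparators. For medium- and high-resolution (10 to 15 bits) amplifier-based ADC architectures, nanometer processes offer no particular benefit, because the supply voltage drops and the intrinsic voltage gain of the transistors decreases. On the contrary, the precision amplifiers these ADCs require consume more power because of the low supply voltage and low intrinsic gain. The focus of this thesis is therefore how to design high-performance, low-power ADCs in nanometer processes without the power penalty caused by these two kinds of analog circuits.

In comparator design, conventional comparator circuits mostly use a pre-amplifier to reduce the input offset voltage. However, the static power consumption of the pre-amplifier greatly increases the power consumption of a comparator-based ADC. This thesis proposes a latch-based comparator circuit, which eliminates the static power consumption of the pre-amplifier. The input offset voltage of the latch is reduced by an offset calibration loop with very low power consumption. This new comparator can be widely used in comparator-based ADCs to greatly reduce their overall power consumption.

In amplifier design, to replace the high-precision amplifier, this thesis proposes amplifying the signal with only a simple, low-power, open-loop differential amplifier. Such a simple amplifier adapts to advances in CMOS processes and also reduces the design complexity of the analog circuits. However, it exhibits the non-ideal behaviors of gain error and nonlinearity. This thesis proposes a new digital background calibration technique that corrects these non-idealities without affecting the normal operation of the ADC. As process scaling continues, most amplifier-based ADCs can use the proposed digital background calibration technique to shorten design time and reduce overall power consumption.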


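We present a two-step ADC prototype to verify the techniques proposed in this thesis and to meet the requirements of low supply voltage and low power consumption. The design is a 10-bit 100-MS/s two-step ADC containing one residue amplifier and ninety-eight comparators. The ADC chip is fabricated in a 90 nm CMOS process with a 1.0 V supply. With a 1 MHz input signal, it achieves a spurious-free dynamic range (SFDR) of 75 dB and a signal-to-noise-and-distortion ratio (SNDR) of 58 dB. Using the simple amplifier circuit and the new comparator circuit described above, the converter itself consumes 6 mW in total and occupies a core area of 0.36 mm<sup>2</sup>. The proposed digital calibration processor consumes less than 1 mW.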


### Low-Power Analog-to-Digital Converters Design Techniques

Student : Yung-Hui Chung Advisor : Jieh-Tsorng Wu

Department of Electronics Engineering and Institute of Electronics National Chiao-Tung University

Abstract

This thesis describes how to design high-performance, low-power analog-to-digital converters (ADCs) that meet SOC requirements in nanoscale CMOS technologies. In general, an ADC is constructed from comparators, amplifiers, analog switches, capacitors, resistors, and digital circuits. Comparators and amplifiers are the two more power-hungry analog circuits. Digital circuits benefit from CMOS scaling, since transistors become smaller, faster, and lower in power. Low-resolution (4 to 8 bits) comparator-based ADCs can operate at higher sampling frequencies. They also benefit from CMOS scaling, and their power consumption is dominated by the comparators. Medium- and high-resolution (10 to 15 bits) amplifier-based ADCs do not benefit from nanoscale CMOS technologies. On the contrary, the accurate amplifiers in these ADCs need more power because of the lower supply voltage and lower transistor intrinsic gain. The research emphasis of this thesis is therefore how to design high-performance, low-power ADCs without the extra power dissipation that comparators and amplifiers would otherwise require.

For comparator design, traditional designs usually use a pre-amplifier to reduce the overall input offset voltage. However, the static power consumption of the pre-amplifier greatly increases the power dissipation of a comparator-based ADC. In this thesis, we use a latch-type comparator to eliminate the static power consumption of the pre-amplifier. To reduce the input offset voltage of the latch, we propose a very low-power offset calibration loop. The proposed comparator can be widely applied to comparator-based ADCs to reduce their overall power dissipation.

For amplifier design, instead of a highly accurate amplifier, we propose a simple low-power open-loop differential amplifier to amplify the residue signal. This amplifier adapts well to scaled CMOS technologies and also reduces the design complexity of the analog circuits. However, this simple amplifier has two non-idealities: gain error and nonlinearity. We propose a new digital background calibration technique that corrects these non-idealities without interrupting normal ADC operation. As CMOS scaling continues, most amplifier-based ADCs can use the proposed calibration technique to shorten design time and reduce overall power dissipation.

A two-step ADC prototype was fabricated to verify the proposed techniques and to meet the requirements of low supply voltage and low power consumption. It is a 10-bit 100-MS/s two-step ADC that includes one residue amplifier and ninety-eight comparators. The ADC is fabricated in a 90 nm CMOS technology with a 1.0 V supply voltage. At 1 MHz input frequency, it achieves 75 dB SFDR and 58 dB SNDR. Using a simple open-loop amplifier and the proposed comparator circuits, the ADC dissipates a total power of 6 mW and occupies a die area of 0.36 mm<sup>2</sup>. The power consumption of the proposed digital calibration processor is less than 1 mW.

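#### Acknowledgments

I would like to express my highest respect to my advisor, Professor Jieh-Tsorng Wu, and thank him for his guidance and assistance during my doctoral studies. His strict demands on research quality and methodology are the most important conviction and support for our future research. Only deeper thinking and broader learning can produce truly distinctive research results. I have benefited greatly from being steeped in these research ideals.

I thank my fellow group members 范啟威, 翟芸, 黃鈞正, 王仲益, 曾偉信, 王自強, and 方炳楠. The time spent with you and our discussions of problems brought much enjoyment to daily life and enriched my research. I also thank the junior students 吳書豪, 張志閔, 田政展, 張家綾, and 廖勝暉. Your help with all kinds of tasks allowed me to concentrate on my research.

I also thank Faraday Technology (智原科技) for providing the opportunity to tape out the chip, and its engineers for the assistance that allowed me to complete the chip fabrication smoothly.

In addition, I especially thank my wife, 一芬. She is not only my soulmate but also walks hand in hand with me on the path of research. I thank my precious daughter, 若昀; every little moment with her is a blessing from heaven.

Finally, I dedicate this dissertation to my beloved parents, in gratitude for decades of nurture and upbringing.

Yung-Hui Chung

National Chiao-Tung University

July 2010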













### Chapter 1

### Introduction

#### 1.1 Green Power Era

Electronic products can be broadly categorized as battery-powered or non-battery-powered. In recent years, driven by technology development, the number of portable products has increased dramatically. Their rich functionality brings people more convenience and fun. However, these portable products face a fundamental problem: how to extend battery life. Therefore, for a growing number of portable electronic products, achieving ultra-low power dissipation has become an important issue in the green power era. A low supply voltage reduces battery drain, and low-power components further extend battery life.

In this decade, following Moore's law, CMOS integrated circuits can operate at lower supply voltages with less power dissipation. SOC (System-on-a-Chip) integration is a straightforward way to exploit these characteristics. With SOC, the extra power consumed by the interfaces between chips is saved and the product can be smaller. Advanced CMOS technologies (e.g. 65 nm) give SOC designs lower power consumption, faster operating speed, and more flexible functionality. Combining advanced CMOS technologies with SOC integration allows portable products to better meet people's requirements: low cost, convenient usage, and small volume.

In 2003, [1] presented several viewpoints on SOC and how it helps to implement powerful personal Internet products (PIPs). These PIPs are communication, computing, and consumer products enabled by the Internet: cell phones, PDAs, WLANs, and Internet audio/video. They are based on digital signal processing (DSP) and analog functionalities, and they are made accessible to billions of people around the globe by an intense focus on cost through SOC integration. In the Internet age, Moore's law will continue to be a technology imperative for the semiconductor industry. Moreover, SOC integration will be an additional technology imperative that drives the cost of PIPs down to mass-market levels. SOC integration for PIPs requires integrating analog, power analog, RF, and memory functions onto the digital baseband processor, which is fabricated in a high-density, high-performance, low-cost digital CMOS technology.

Figure 1.1: Electrical components inside a representative cell phone.

The cell phone is a typical PIP. Figure 1.1 shows the electrical components in a cell phone. The digital, analog, RF, and memory components are on separate ICs. To reduce cost, SOC integration is applied using nanometer CMOS technologies. DSP functionality improves greatly, but analog and RF circuits become difficult to implement because of the low supply voltage and poor transistor linearity. To implement an SOC, the analog circuits must be built in the same digital CMOS process and integrated with the digital and memory circuits. However, nanometer CMOS devices suit digital circuits better than analog circuits: the transistors are faster but have lower intrinsic gain and linearity.

To meet the required performance, some analog designers use I/O devices with a much higher supply voltage (for example, 1.0 V for core devices in digital circuits and 2.5 V for I/O devices in analog circuits in a 90 nm CMOS process). Such an implementation not only increases power consumption but also limits future reduction of the I/O supply voltage. To address this issue, the analog circuits must be implemented with core devices. Another approach is to reduce the use of analog circuits: some system architectures use less analog signal processing and more digital signal processing. However, some analog circuits are still needed to link the physical and digital worlds. The analog-to-digital converter (ADC) is one of the most important of these. It samples an analog signal from the physical world and quantizes it into a digital code to be processed by the digital signal processor in the SOC. Generally speaking, an ADC can be viewed as the interface between the physical analog world and the virtual digital world.

#### 1.2 Motivation

In scaled CMOS technologies, ADC design becomes more difficult because of lower supply voltages, lower transistor intrinsic gain, severe gate leakage, and the demand for lower power consumption. Many ADC architectures have been proposed for different applications. The flash architecture is a comparator-based design that adapts easily to scaled CMOS technologies and is well suited to high-speed operation. However, its resolution is limited: higher resolution needs more comparators, which dissipate a large amount of power and occupy a large die area. In general, its resolution is limited to about 6 bits. The SAR ADC is another architecture that benefits from CMOS scaling. It uses only one comparator and repeats the quantization several times within one sampling period. This architecture offers a good trade-off between resolution and speed: a SAR ADC can operate at 1 Gsample/s with 6-bit resolution or at 100 Msample/s with 10-bit resolution. Unlike these two architectures, the subranging ADC uses two flash sub-ADCs to quantize the input in two steps. It can achieve higher resolution than a flash ADC and a higher sampling rate than a SAR ADC. However, in scaled CMOS technologies, its complicated switch matrix becomes the bottleneck that limits the ADC's speed.

The pipelined ADC is the most common architecture for achieving higher resolution at a given sampling rate. Unlike the three comparator-based architectures above, it uses an operational amplifier (opamp) in a feedback configuration as the residue amplifier. In scaled CMOS technologies, it is difficult to design such an opamp with low power consumption and a low supply voltage. Another amplifier-based architecture is the two-step ADC, which uses only one residue amplifier together with a number of comparators. However, in the traditional design approach, the opamp needs high dc gain and feedback compensation, which consume a large amount of power.

In this thesis, a simple open-loop single-stage residue amplifier is used, and its performance is improved by a proposed background calibration technique. The technique can be applied to most amplifier-based architectures and runs in the background without interrupting normal ADC operation. It corrects the non-idealities of the residue amplifier: gain error and nonlinearity. To benefit from scaled CMOS VLSI, the correction is performed in the digital domain. Only minor circuit changes are needed, so the calibration can be easily added to most amplifier-based ADCs. The calibration scheme is robust because its effectiveness does not rely on the amplitude distribution of the input. Besides the residue amplifier, the power consumption of the comparators is also reduced by the proposed offset compensation mechanism. All analog circuits are designed with relaxed accuracy requirements so that they adapt to scaled CMOS technologies. A 10-bit 100-Msample/s two-step ADC was fabricated in a 90 nm CMOS technology. Because the two-step ADC contains one residue amplifier and two flash sub-ADCs, it is the best ADC architecture for evaluating both the digitally calibrated residue amplifier and the latch-type comparators with an offset calibration loop. The measurement results demonstrate the feasibility of the calibration technique and the benefit of the low-power comparator.

#### 1.3 Organization



The thesis is organized as follows:

Chapter 2 gives an overview of several ADC architectures. A brief analysis of each architecture clarifies its features. Understanding the different ADC architectures gives readers a clear picture of ADC characteristics and of how to design high-performance, low-power ADCs.

Chapter 3 examines prior comparator designs based on traditional methods: averaging, interpolation, and offset storage. In view of the power consumption of these traditional methods, offset compensation techniques are then discussed. The design and analysis of the proposed low-power comparator show its strengths for comparator-based ADC architectures.

Chapter 4 first discusses the features of calibration techniques. Several nonlinear calibration techniques are then briefly described and analyzed. To benefit from scaled CMOS VLSI, a digital nonlinear background calibration scheme is proposed and analyzed.

Chapter 5 describes the implementation of the prototype ADC, including the comparator, the amplifier, and the digital calibration processor. The experimental results show the static and dynamic performance, verifying both the proposed calibration technique for the residue amplifier and the offset compensation for the comparators. A brief summary of the 10-bit ADC shows what the proposed two-step ADC achieves.

Finally, Chapter 6 draws conclusions and discusses future work.



### Chapter 2

### Overview of the ADCs

#### 2.1 Introduction

A Nyquist-rate ADC samples and digitizes an analog signal by using a combination of comparators, amplifiers, analog switches, and digital circuits. Many factors are considered in choosing an ADC architecture, including sampling rate, resolution, power consumption, input loading, chip area, and fabrication technology. Here we consider the following ADC architectures: flash ADC, successive-approximation register (SAR) ADC, subranging ADC, two-step ADC, and pipelined ADC. According to their key elements, these ADCs can be divided into two groups, comparator-based and amplifier-based, as shown in Figure 2.1. Flash, SAR, and subranging ADCs belong to the comparator-based group; pipelined and two-step ADCs belong to the amplifier-based group. Since the choice of ADC architecture depends strongly on the application, understanding these ADCs is essential. In recent years, more and more designs combine two or more architectures into a hybrid ADC. The better the architectures are understood, the easier it is to choose one that meets the target specification.

In recent years, many techniques have been proposed to reduce the power dissipation of these ADCs. Some reduce the power of the fundamental elements and are called 'element-based' power reduction techniques. These techniques, such as opamp-sharing and switched-opamp, modify the analog circuits to reduce their power consumption. They do reduce the power consumption of the analog circuits, but the ADCs still need sufficiently good opamps to achieve a given performance. In contrast, other designs develop a calibration mechanism to maintain the ADC's performance, which is called a 'calibration-based' power reduction technique. Most of them use digital calibration to improve power efficiency and relax the requirements on the analog circuits.

Figure 2.1: Two simple partitions of Nyquist-rate ADCs.

Besides power dissipation, design robustness is another important factor in evaluating ADC architectures. In SOC chips, to lower power consumption, the analog circuits must use the same type of transistors and the same supply voltage as the digital circuits. Design robustness can therefore be defined as how insensitive the ADC architecture is to process migration.

To evaluate an ADC's performance, a generic figure of merit (FOM) is defined as follows:

$$
FOM = \frac{Power}{\min(F_s,\ 2 \cdot ERBW) \times 2^{ENOB_{DC}}} \tag{2.1}
$$

where $F_s$ is the ADC's sampling frequency, ERBW is its effective resolution bandwidth, and $ENOB_{DC}$ is its effective number of bits at low input frequency. The FOM of each ADC architecture reflects its strengths and weaknesses, which helps in deciding which architecture to use or develop to achieve the target specification.
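As a rough illustration of Equation (2.1), the sketch below evaluates the FOM in Python. It reads the denominator as $\min(F_s, 2 \cdot ERBW) \times 2^{ENOB_{DC}}$. The function name `adc_fom` and the fallback used when ERBW is not reported (treating ERBW as at least $F_s/2$) are assumptions made for this example only.

```python
def adc_fom(power_w, fs_hz, enob_bits, erbw_hz=None):
    """Energy per conversion step in joules, following Equation (2.1)."""
    # Assumption: when ERBW is not reported, treat it as at least fs/2,
    # so that min(fs, 2*ERBW) reduces to fs.
    bandwidth_term = fs_hz if erbw_hz is None else min(fs_hz, 2.0 * erbw_hz)
    return power_w / (bandwidth_term * 2.0 ** enob_bits)


# Figures taken from the survey tables later in this chapter.
fom_flash = adc_fom(7.6e-3, 1.75e9, 4.8)   # flash design [19]
fom_sar = adc_fom(1.13e-3, 100e6, 9.5)     # SAR design [21]
print(f"flash [19]: {fom_flash * 1e12:.2f} pJ/conv.-step")
print(f"SAR   [21]: {fom_sar * 1e15:.1f} fJ/conv.-step")
```

Under the stated assumption, both results land close to the FOM values tabulated for [19] and [21]. Designs whose ERBW is below $F_s/2$ would give a larger (worse) FOM than this fallback suggests.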

In this chapter, track-and-hold (T/H) circuits are discussed first, in Section 2.2. For most ADCs, a T/H circuit is necessary to process the analog input signal. Section 2.3 describes the high-speed flash ADC architecture. The SAR ADC, the most energy-efficient architecture, is discussed in Section 2.4. With scaled CMOS technologies, the SAR ADC can operate at higher speed with ultra-low power consumption. Section 2.5 describes the subranging ADC, which extends comparator-based ADCs to 10-bit or higher resolution at medium operating speed. Section 2.6 presents the two-step ADC architecture, which removes the speed bottleneck of the subranging architecture. The most popular architecture for medium-to-high resolution is the pipelined ADC, which provides a good combination of resolution and speed and is described in Section 2.7. Section 2.8 summarizes the chapter.

#### 2.2 Track-and-Hold Amplifier

The track-and-hold (T/H) circuit is a key element of A/D conversion. The T/H determines the overall ADC accuracy, because the analog input signal and all perturbations are mixed together at the input of the ADC. In general, the ADC cannot separate the original input signal from the perturbations in the T/H output.

Figure 2.2 shows three fundamental single-ended T/H architectures. The simplest one, shown in Figure 2.2 (a), uses a sampling switch S1 and a hold capacitor $C_S$. The clock signal $\phi_1$ controls switch S1, which connects the input to the output. In the hold phase $(\phi_1=0)$, the input signal is stored on the capacitor $C_S$. The capacitance of $C_S$ is determined by either the thermal noise or the matching accuracy required for the target resolution. This T/H can operate at very high speed, limited by the on-resistance of the switch: the RC time constant must be small enough to meet the operating speed and resolution requirements. With zero static power consumption, this configuration is popular in high-speed flash ADCs. However, in the hold phase the held signal sits on a high-impedance node, so other disturbances can degrade its accuracy. A larger $C_S$ suppresses these disturbances, but it also slows down the T/H.
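The two sizing constraints mentioned above for the switched-capacitor T/H, thermal noise and RC settling, can be sketched numerically. This is only a first-order estimate under assumptions that are not from the thesis: the sampled thermal noise power is taken as $kT/C_S$ and is required to stay at or below the quantization noise power $LSB^2/12$, the switch and capacitor settle as a single-pole RC network, and the tracking error must fall below 1/2 LSB. All function names are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def min_hold_capacitance(n_bits, v_fullscale, temperature_k=300.0):
    """Smallest C_S (farads) for which kT/C noise power <= LSB^2 / 12 (assumed criterion)."""
    lsb = v_fullscale / 2 ** n_bits
    quantization_noise_power = lsb ** 2 / 12.0
    return BOLTZMANN * temperature_k / quantization_noise_power


def settling_time_constants(n_bits):
    """Number of RC time constants for the tracking error to fall below 1/2 LSB."""
    # exp(-t / RC) <= 2**-(n_bits + 1)  =>  t / RC >= (n_bits + 1) * ln 2
    return (n_bits + 1) * math.log(2.0)


def max_switch_resistance(n_bits, c_hold, t_track):
    """Largest switch on-resistance (ohms) that still settles within t_track seconds."""
    return t_track / (settling_time_constants(n_bits) * c_hold)


# Example: 10-bit resolution, 1 V full scale, 5 ns track time (half of a 100 MS/s period).
c_s = min_hold_capacitance(10, 1.0)
r_on = max_switch_resistance(10, c_s, 5e-9)
print(f"C_S >= {c_s * 1e15:.0f} fF, R_on <= {r_on / 1e3:.1f} kOhm")
```

In practice the capacitor is usually chosen well above this noise-limited minimum, since matching and disturbance rejection also push $C_S$ up, which in turn tightens the on-resistance requirement.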

To address this issue, an analog buffer is added to the switched-capacitor configuration. Figure 2.2 (b) shows a source-follower-based T/H. The source follower buffers the held signal and reduces the disturbances, but it also consumes power, and its buffering capability is proportional to its power consumption. The source follower has several issues.

The first issue is linearity. In nanometer CMOS technologies, the supply voltage is reduced to around 1 V, so the source follower must operate at a low supply voltage with transistors that have severe channel-length modulation. The current source is sensitive to the output signal range and is difficult to keep constant. The resulting current variation attenuates the output signal in an input-dependent way, which causes severe nonlinearity in the output signal. Another cause of nonlinearity is the dependence of the threshold voltage $V_{th}$ on a nonzero source-bulk voltage $V_{SB}$. This can be suppressed by using a pMOS transistor as M1 with replica-bias control to virtually connect the source and bulk terminals.

Figure 2.2: Simplified T/H configurations: (a) switched-capacitor, (b) source-follower and (c) flip-around.

The second issue is the level shift. In general, there is a level shift between the gate and source nodes of transistor M1, equal to the sum of the threshold voltage of M1 and the gate-overdrive voltage $V_{ov}$. Depending on the supply voltage, $V_{ov}$ is designed to be between several tens and several hundreds of mV. At a low supply voltage, the level shift degrades the linearity of the source follower. To avoid this, a zero-$V_{th}$ transistor can be used for M1. However, this increases production cost because of the extra masks, and gate leakage must be carefully considered when using a zero-$V_{th}$ transistor. In general, the source follower is used in ADCs with lower resolution ($\leq 10$-bit), and a low supply voltage strongly affects its operation.



To overcome these issues of the source follower, the flip-around track-and-hold amplifier (FA-THA), shown in Figure 2.2 (c), was proposed. This architecture is usually applied in high-speed applications [2]. During the track phase $(\phi_1=1)$, the input signal is sampled onto the capacitor $C_S$, and the operational amplifier (opamp) is held in reset mode by switches S2 and S4. During the hold phase $(\phi_2=1)$, switch S3 connects $C_S$ to the opamp's output node so that it acts as a feedback capacitor. The early clock $\phi_{1a}$ is used for bottom-plate sampling. In this closed-loop configuration, the held input signal is reproduced at the output node of the opamp. The FA-THA performed very well until the arrival of nanometer CMOS technologies. Other opamp-based THAs have also been proposed to provide good performance, such as the charge-transferred THA [3], the charge-redistribution THA [4], and the pre-charged THA [5]. These opamp-based THAs usually dissipate large power, about one third of the ADC's power consumption or more. Therefore, an opamp-based THA may not be a good choice for low-power ADCs. In recent years, many low-power ADC designs have used only the switched-capacitor configuration of Figure 2.2 (a), with some circuit modifications, to implement the T/H function.



Figure 2.3: A flash ADC architecture.

#### 2.3 Flash Architecture

The flash ADC, shown in Figure 2.3, is the most commonly known architecture for very high-speed, low-resolution applications. It contains an input sampler, a resistor ladder, a comparator array, and an encoder. In the sampling phase $(\phi_1 = 1)$, the input signal is sampled onto the capacitor $C_S$. In the comparison phase ($\phi_1 = 0$), all comparators compare the sampled input with their individual slice levels simultaneously, which enables high-speed operation. If the input signal is larger than a comparator's slice level, the comparator outputs '1'; if it is smaller, the comparator outputs '0'. The output of the comparator array therefore forms a thermometer code. The digital output, $D_{out}$, is obtained by encoding this thermometer code in the encoder. The slice levels are provided by a resistor string connected between the top and bottom reference voltages, $V_{RT}$ and $V_{RB}$. The difference between $V_{RT}$ and $V_{RB}$ defines the input range of the ADC. For N-bit resolution, at least $(2^N - 1)$ comparators are needed to cover the overall quantization range, so the power consumption of a flash ADC grows dramatically with resolution.
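The flash conversion described above can be written as a small behavioral model. This is an idealized sketch, not the thesis implementation: the resistor string is assumed perfectly uniform, the comparators have no offset, and the encoder is assumed to be a simple ones counter, since the encoder type is not specified here. All function names are hypothetical.

```python
def flash_reference_levels(n_bits, v_rb, v_rt):
    """Slice levels from an ideal resistor string: 2**n_bits - 1 equally spaced taps."""
    steps = 2 ** n_bits
    lsb = (v_rt - v_rb) / steps
    return [v_rb + k * lsb for k in range(1, steps)]


def thermometer_code(v_in, levels):
    """One comparator per slice level: '1' if the input is above the level."""
    return [1 if v_in > level else 0 for level in levels]


def encode_thermometer(code):
    """Ones-counting encoder (an assumption; other encoders are possible)."""
    return sum(code)


def flash_convert(v_in, n_bits, v_rb=0.0, v_rt=1.0):
    levels = flash_reference_levels(n_bits, v_rb, v_rt)
    return encode_thermometer(thermometer_code(v_in, levels))


# A 3-bit example needs 2**3 - 1 = 7 comparators.
levels_3bit = flash_reference_levels(3, 0.0, 1.0)
print(thermometer_code(0.4, levels_3bit), flash_convert(0.4, 3))
```

The list of levels makes the cost of resolution visible: each extra bit doubles the number of comparators that must fire in parallel.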

For a high-speed flash ADC, the linearity is dominated by the input sampler, the resistor ladder, and the comparators. Because of the low resolution requirement, the linearity of the input sampler and the resistor ladder can usually be maintained easily, so the comparators determine the overall ADC linearity. Figure 2.4 shows a simplified general comparator design. It contains a pre-amplifier, a regenerative latch, and digital circuits. The gain of the pre-amplifier reduces the large input offset of the latch when referred to the input. If the gain is high enough to suppress the latch's offset, the equivalent input offset voltage is dominated by the pre-amplifier alone. The output signals $D_p$ and $D_n$ are digital-like analog signals that indicate whether the input signal $V_{in}$ is larger than the reference voltage $V_R[k]$ of the k-th comparator. If $V_{in}$ is larger than $V_R[k]$, the comparator output $D_O[k]$ is '1'; otherwise, $D_O[k]$ is '0'. Two characteristics of the comparator must be considered: the input offset voltage and metastability. Both can produce 'bubbles' in the thermometer code, which cause wrong results at the encoder output and degrade the overall ADC performance.

Figure 2.4: A general comparator design.

The input offset voltage is generated by mismatch between the input-related and reference-related transistors, resistors, or capacitors (if they are used in the comparators). The mismatch is mainly due to imperfections in the semiconductor manufacturing process and temperature variation during operation. With an input offset voltage $V_{OS}$, the equivalent k-th reference voltage $V_{R,eq}[k]$ can be expressed as

$$
V_{R,eq}[k] = V_R[k] + V_{OS} \tag{2.2}
$$

The above equation shows that the input offset voltage must be removed to maintain the original reference level of each comparator. To lower the equivalent input offset voltage, the mismatch must be reduced by enlarging the transistors in the pre-amplifier. However, larger transistors usually result in higher power dissipation. For a flash ADC, several architecture-level techniques have been proposed to mitigate the effect of the comparators' input offset voltage, including spatial averaging [6, 7], interpolation [8, 9], offset storage [10, 11], calibrated redundancy [12, 13], fault-tolerant encoding [14], and feedback compensation [15, 16].
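To make the effect of Equation (2.2) concrete, the sketch below shifts each slice level by a per-comparator offset and counts the resulting bubbles in the thermometer code. It is an illustrative model only: the Gaussian offset distribution, its assumed standard deviation of 1 LSB, and the function names are choices made for this example, not data from the thesis.

```python
import random


def thermometer_with_offsets(v_in, levels, offsets):
    """Comparator array where comparator k sees V_R[k] + V_OS[k], as in Equation (2.2)."""
    return [1 if v_in > level + v_os else 0 for level, v_os in zip(levels, offsets)]


def count_bubbles(code):
    """Count places where a '0' sits directly below a '1' in the thermometer code."""
    return sum(1 for low, high in zip(code, code[1:]) if low == 0 and high == 1)


# Demo: 6-bit flash with random offsets (assumed sigma = 1 LSB).
random.seed(0)
n_bits, v_rb, v_rt = 6, 0.0, 1.0
lsb = (v_rt - v_rb) / 2 ** n_bits
levels = [v_rb + k * lsb for k in range(1, 2 ** n_bits)]
offsets = [random.gauss(0.0, lsb) for _ in levels]

ideal = thermometer_with_offsets(0.5, levels, [0.0] * len(levels))
actual = thermometer_with_offsets(0.5, levels, offsets)
print("bubbles without offset:", count_bubbles(ideal))
print("bubbles with offset:   ", count_bubbles(actual))
```

With zero offsets the code is always bubble-free. Once the offsets are comparable to one LSB, neighboring effective reference levels can swap order, which is what produces the bubbles.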

Metastability, an unwanted output state of the latch in the comparator, occurs when the input signal is very close to the reference level. If metastability occurs, the output of the comparator is decided by the threshold level of the following digital circuits. In general, metastability is caused by insufficient driving current at the output nodes or by a short comparison time, which means that more power is needed to avoid it at a given sampling period. If the comparators are used in a very high-speed flash ADC, metastability must be considered carefully, because the metastable condition usually does not show up at the design stage. In general, post-layout simulation reflects the real conditions because it includes the routing capacitance.

| Design | Process | Supply | Resolution | Speed     | ENOB     | Power  | FOM (/conv.-step) |
|--------|---------|--------|------------|-----------|----------|--------|-------------------|
| [17]   | 90 nm   | 1.0 V  | 8-bit      | 1.25 GS/s | 7-bit    | 207 mW | 1.3 pJ            |
| [18]   | 90 nm   | 0.9 V  | 6-bit      | 3.5 GS/s  | 5.3-bit  | 98 mW  | 0.95 pJ           |
| [19]   | 90 nm   | 1.0 V  | 5-bit      | 1.75 GS/s | 4.8-bit  | 7.6 mW | 0.15 pJ           |
| [20]   | 65 nm   | 1.2 V  | 6-bit      | 800 MS/s  | 5.63-bit | 12 mW  | 0.3 pJ            |

Table 2.1: Prior flash ADCs

The input capacitance of the comparators is another issue for higher-resolution flash ADCs. To reduce the loading of the comparator array on the input, a dedicated input sampler is usually placed before the comparator array. Its buffer is implemented with source-follower circuits. For high-speed A/D conversion, the input sampler can also improve the spurious-free dynamic range (SFDR). For operation above 1 GS/s, the flash architecture is the best choice for lower-resolution applications ($<$ 8-bit).

Table 2.1 lists several flash ADC designs. [17] and [18] follow the traditional approach of using multi-stage pre-amplifiers to reduce the input offset of the comparators, and their FOMs are still around 1 pJ/conv.-step. In contrast, [19] and [20] build their flash ADCs from offset-calibrated comparators, and their FOMs improve greatly to 0.15 and 0.3 pJ/conv.-step, respectively. This result demonstrates the benefit of using offset-calibrated comparators in flash ADCs.

#### 2.4 Successive-Approximation Architecture

In contrast to the flash architecture, the SAR ADC uses only one comparator and quantizes N times to achieve N-bit resolution. The offset voltage of the comparator is not critical, since it does not affect the conversion accuracy and only adds an offset to the overall SAR ADC. In general, a dedicated input sampler is needed. If the sampling time equals one quantization cycle, a conversion takes at least N+1 quantization cycles. The quantization cycle must therefore be (N+1) times shorter than the conversion period of a flash ADC to achieve the same Nyquist frequency.

Figure 2.5 shows a 4-bit SAR ADC and its quantization sequence. In general, a SAR ADC uses a binary search over the possible sub-regions. The speed bottleneck is the DAC, whose precision must be on the order of that of the converter itself. The DAC can be built from a resistor ladder with switches that select the reference voltage closest to the input signal. For higher resolution ($>$ 8-bit), the large number of switches limits the operating speed because of the wiring capacitance at the node voltage $V_{da}$ shown in Figure 2.5. Moreover, the power consumption of the dedicated input sampler (SHA) dominates the overall power dissipation of the SAR ADC. To avoid these two issues, the DAC can instead be built from capacitors that perform the subtraction by charge redistribution. However, capacitor matching must then be considered to maintain good performance.

Figure 2.5: A 4-bit SAR ADC and its quantization sequence.
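The binary search performed by a SAR ADC can be sketched as a short behavioral model. The DAC here is ideal and the comparator has no offset or noise; the function names and the returned list of DAC voltages (standing in for the $V_{da}$ sequence) are assumptions for illustration, not the circuit of Figure 2.5.

```python
def sar_convert(v_in, n_bits, v_rb=0.0, v_rt=1.0):
    """Return (code, DAC voltage sequence) for an ideal SAR binary search."""
    code = 0
    dac_history = []
    for bit in range(n_bits - 1, -1, -1):           # MSB first
        trial = code | (1 << bit)                   # tentatively set this bit
        v_da = v_rb + (v_rt - v_rb) * trial / 2 ** n_bits   # ideal DAC output
        dac_history.append(v_da)
        if v_in > v_da:                             # single comparator decision
            code = trial                            # keep the bit
    return code, dac_history


def conversion_cycles(n_bits, sampling_cycles=1):
    """Cycles per conversion when sampling takes `sampling_cycles` quantization cycles."""
    return n_bits + sampling_cycles


code, v_da_sequence = sar_convert(0.4, 4)
print(code, v_da_sequence, conversion_cycles(4))
```

Each loop iteration reuses the same comparator, which is why the architecture is so power-efficient, and also why a conversion needs N decisions plus the sampling time instead of the single comparison phase of a flash ADC.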

The capacitor matching requirement increases the total input capacitance, which in turn increases the input driving power. In recent years, several techniques have been proposed to improve SAR ADC performance. A segmented capacitor array reduces the required input capacitance and thus the input driving power. Redundant quantization cycles [21] relax the DAC settling time. Furthermore, asynchronous clocking [22] reduces the overall quantization time and raises the conversion rate of SAR ADCs. Generally speaking, the SAR architecture offers an excellent trade-off between accuracy and speed for low-power applications that need low-to-medium resolution ADCs.

| Design | Process | Supply | Resolution | Speed    | ENOB    | Power   | FOM (/conv.-step) |
|--------|---------|--------|------------|----------|---------|---------|-------------------|
| [23]   | 90 nm   | 1.0 V  | 7-bit      | 150 MS/s | 6.5-bit | 133 µW  | 10 fJ             |
| [24]   | 90 nm   | 1.0 V  | 9-bit      | 40 MS/s  | 8.6-bit | 820 µW  | 54 fJ             |
| [21]   | 65 nm   | 1.2 V  | 10-bit     | 100 MS/s | 9.5-bit | 1.13 mW | 15.5 fJ           |
| [25]   | 130 nm  | 1.2 V  | 12-bit     | 45 MS/s  | 11-bit  | 3.0 mW  | 31.4 fJ           |

Table 2.2: Prior SAR ADCs

Table 2.2 presents several SAR ADC designs with ultra-low FOMs. Benefiting from CMOS scaling, SAR ADCs achieve remarkably low power consumption. [23] demonstrates the advantage of the SAR architecture with an FOM of 10 fJ/conv.-step. [24] shows that a SAR ADC can achieve 9-bit or higher resolution using only comparators, without residue amplification. In 2010, two SAR ADC designs, [21] and [25], achieved 10-bit 100-MS/s and 12-bit 45-MS/s performance, respectively. This implies that SAR ADCs may extend to higher resolutions (> 10-bit) at sampling rates of around one hundred MS/s.
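The FOM column can be cross-checked from the other columns of Table 2.2. The sketch below assumes the commonly used energy-per-conversion-step definition, FOM = P / (2^ENOB · f_s); the helper name `walden_fom` is hypothetical, and the original papers may have used the effective resolution bandwidth instead of the sampling rate.

```python
def walden_fom(power_w, enob_bits, fs_hz):
    """Energy per conversion step in joules: P / (2**ENOB * fs)."""
    return power_w / (2.0 ** enob_bits * fs_hz)


# (power [W], ENOB [bit], sampling rate [S/s]) copied from Table 2.2
designs = {
    "[23]": (133e-6, 6.5, 150e6),
    "[24]": (820e-6, 8.6, 40e6),
    "[21]": (1.13e-3, 9.5, 100e6),
    "[25]": (3.0e-3, 11.0, 45e6),
}

if __name__ == "__main__":
    for ref, (p, enob, fs) in designs.items():
        print(ref, f"{walden_fom(p, enob, fs) * 1e15:.1f} fJ/conv.-step")
```

Under this assumption the computed values land close to the tabulated ones; small differences are to be expected from rounding of the ENOB and power figures.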

## 2.5 Subranging Architecture

A subranging ADC can be viewed as a trade-off between the flash and SAR architectures. Because it quantizes the analog input signal in two steps, it needs far fewer comparators than a flash ADC. For N-bit resolution, this ADC needs only $(2^{N/2+1} - 2)$ comparators. Therefore, this architecture is usually applied to achieve 8-bit or higher resolution. A general 10-bit subranging ADC architecture and its quantization curves are shown in Figure 2.6. The subranging ADC quantizes the input signal in two steps: coarse quantization and fine quantization. In general, a dedicated input sampler (SHA) is used to maintain the operation. The subranging ADC has a 5-bit coarse ADC consisting of 31 comparators and a 6-bit fine ADC consisting of 63 comparators. The coarse ADC makes the first quantization to yield the coarse code, $D_1$. It then drives a MUX to select, for the fine ADC, the reference voltages that are close to the input. The fine ADC makes the second quantization to yield the code $D_2$, which resolves the exact magnitude of the input. Finally, the encoding logic combines $D_1$ and $D_2$ to generate the 10-bit $D_{out}$.


Figure 2.6: A 10-bit subranging ADC architecture.

| Design | Process | Supply    | Resolution | Speed    | ENOB    | Power | FOM (/conv.-step) |
|--------|---------|-----------|------------|----------|---------|-------|-------------------|
| [16]   | 90 nm   | 1.2 V     | 6-bit      | 1 GS/s   | 5.3-bit | 55 mW | 1.4 pJ            |
| [26]   | 90 nm   | 1.0 V     | 10-bit     | 160 MS/s | 9.2-bit | 84 mW | 0.89 pJ           |
| [27]   | 90 nm   | 1.2/2.5 V | 8-bit      | 300 MS/s | 7.2-bit | 34 mW | 0.68 pJ           |
| [28]   | 90 nm   | 1.2 V     | 8-bit      | 770 MS/s | 9.3-bit | 70 mW | 0.94 pJ           |

Table 2.3: Prior subranging ADCs

One extra bit is designed into the fine ADC. The extra comparators relax the accuracy requirement of the coarse ADC: with this over-range protection, the coarse ADC needs only 5-bit accuracy, not 10-bit. There are two issues for the subranging ADC. One is that the comparators in the fine ADC still need 10-bit resolution, which is equal to the overall ADC accuracy. Such comparators are power-hungry circuits. The other is that the critical delay path is the MUX, which has to select 63 reference voltages out of the 1023 possible voltages generated by the resistor string. The MUX is constructed from MOS switches. In nanoscale CMOS technologies, the MOS switches usually have larger on-resistances if bootstrapping techniques are not applied. Higher resolution also complicates the wire routing for these reference voltages, which results in larger parasitic capacitance at the references of the fine ADC. Both the large parasitic capacitance and the on-resistance increase the settling time of the reference voltages, which determines the overall ADC accuracy.
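How the extra fine bit absorbs coarse-ADC errors can be illustrated with a behavioural model. The sketch below is an assumption-laden illustration rather than the circuit of Figure 2.6: it assumes a 0 to `vref` input range, a 64-LSB fine window centred on the selected 32-LSB coarse segment, and a single offset `coarse_error_lsb` applied to all coarse thresholds. All names are hypothetical.

```python
def subranging_convert(vin, coarse_error_lsb=0, vref=1.0):
    """10-bit subranging conversion: 5-bit coarse step, 6-bit fine step.

    coarse_error_lsb shifts the coarse thresholds (in 10-bit LSBs) to mimic
    a comparator offset in the coarse ADC.
    """
    lsb = vref / 1024
    # Coarse step: 31 thresholds, all displaced by the same offset.
    d1 = int((vin + coarse_error_lsb * lsb) / (32 * lsb))
    d1 = max(0, min(31, d1))
    # Fine step: 64-LSB window centred on the selected 32-LSB coarse segment.
    window_start = d1 * 32 - 16
    d2 = int(vin / lsb) - window_start        # position inside the window
    d2 = max(0, min(63, d2))                  # 63 comparators give codes 0..63
    dout = window_start + d2                  # encoder combines D1 and D2
    return max(0, min(1023, dout))


if __name__ == "__main__":
    vin = 300.5 / 1024
    print(subranging_convert(vin), subranging_convert(vin, coarse_error_lsb=12))
```

In this model a coarse error of up to about half a coarse segment (16 LSBs) leaves the output code unchanged, whereas a larger error pushes the input outside the fine window and the fine ADC clips.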

Due to the bottleneck of the MUX and the high-resolution comparators in the fine ADC, the subranging ADC has an obvious trade-off between resolution and sampling rate. In recent years, the resolution of subranging ADCs has been limited to 10 bits. Table 2.3 presents several subranging ADC designs from 2006 to 2009. Owing to CMOS scaling, the sampling rate can reach 1 GS/s for 6-bit resolution. The FOMs are between 0.68 and 1.4 pJ/conv.-step. Because the FOM differences among these measured results are not large, subranging ADCs trade resolution against operating speed at a roughly constant energy per conversion step. In general, the subranging architecture is applied for sampling rates of several hundred MS/s and medium resolutions of 8 to 10 bits.



Figure 2.7: A 10-bit two-step ADC architecture.

## 2.6 Two-step Architecture

To relax the fine ADC and simplify the MUX of the subranging ADC, the two-step ADC architecture was proposed more than twenty years ago. Similar to the subranging ADC, this architecture uses two quantization steps. Figure 2.7 shows a 10-bit two-step ADC architecture and its quantization curves. It still has a 5-bit coarse ADC and a 6-bit fine ADC with one-bit over-range protection. Unlike the subranging ADC, this ADC only needs the MUX to select one reference output out of 32 possible voltages to generate the input estimate, $V_{da}$. The reference voltages of the fine ADC are fixed, not selected by a complex MUX. This architecture requires a residue amplifier that amplifies the difference between the held input $V_1$ and the MUX output $V_{da}$.


In fact, the two-step ADC can be viewed as a hybrid of the comparator-based and amplifier-based architectures. In this example, there are 31 comparators in the coarse ADC and 63 comparators in the fine ADC. The comparator resolution required in the fine ADC is relaxed by the amplifier's gain, as shown in Figure 2.7. However, for this 10-bit two-step ADC, the residue amplifier still dissipates a large amount of power to achieve accurate gain and linearity. In scaled CMOS technologies, it is not easy to implement a high-precision amplifier with low power consumption.

Table 2.4 presents several two-step ADC designs from 2001 to 2009. With the traditional design approach, a two-step ADC can achieve 12-bit resolution at a sampling rate of several tens of MS/s, but it still consumes a large amount of power [29]. Given the difficulty of designing a highly accurate residue amplifier, two-step ADC design appears to face a bottleneck. In 2009, [30] presented a 6-bit 1-GS/s ADC with an FOM of 1.24 pJ/conv.-step. Departing from traditional design concepts, [31] shows a better FOM by using a digital calibration technique for the residue amplifier and offset-calibrated comparators for the coarse and fine ADCs. With these calibration techniques, the two-step ADC can be a good candidate for low-power, high-speed applications, with an FOM no larger than 100 fJ/conv.-step.

| Design | Process | Supply | Resolution | Speed    | ENOB     | Power  | FOM (/conv.-step) |
|--------|---------|--------|------------|----------|----------|--------|-------------------|
| [29]   | 250 nm  | 2.5 V  | 12-bit     | 54 MS/s  | 10.3-bit | 295 mW | 4.9 pJ            |
| [30]   | 130 nm  | 1.2 V  | 6-bit      | 1 GS/s   | 5.3-bit  | 49 mW  | 1.24 pJ           |
| [31]   | 90 nm   | 1.0 V  | 10-bit     | 100 MS/s | 9.32-bit | 6 mW   | 0.092 pJ          |

Table 2.4: Prior two-step ADCs



Figure 2.8: A 10-bit pipelined ADC architecture.

## 2.7 Pipelined Architecture

The pipelined ADC is a typical amplifier-based ADC architecture that uses several amplifiers to perform residue amplification. Unlike the subranging ADC, shown in Figure 2.1, it realizes another trade-off between the flash and SAR architectures, this time with an amplifier-based architecture. Figure 2.8 shows a conceptual N-bit pipelined ADC architecture with a 1-bit/stage configuration. It consists of N stages, each including a SHA, a sub-ADC, a sub-DAC, and an amplifier with a gain of 2. The basic operation is similar to that of the two-step architecture. The first stage samples the analog input signal. The sub-ADC quantizes the held input to yield the digital output $D_1$. $D_1$ drives the sub-DAC to generate the estimated input, $V_j^{da}(D_j)$. The estimated input is subtracted from the held input to generate the residue signal, which is then amplified and sampled by the next stage. All stages operate with two clock phases, sampling and amplification. The digital encoder collects all digital outputs, each delayed by a further half clock cycle from stage to stage, to generate the final digital output, $D_{out}$.

Because each stage contains a SHA, after the amplification phase the first stage can start sampling a new analog input signal while the second stage processes the previous sample. All stages operate in the same way as the first stage, as described above, which yields the so-called pipelined architecture. For example, while the first stage samples the k-th input, the second stage amplifies the $(k-1)$-th; while the third stage samples the $(k-1)$-th, the fourth stage amplifies the $(k-2)$-th, and so on. The operating speed of the pipelined ADC is therefore determined by the operating speed of its first stage. The simplest architecture is the 1-bit per stage configuration. Its transfer curve is shown in Figure 2.9(a). The sub-ADC consists of one comparator that decides whether the output $D_j$ is 0 or 1. The 1-bit sub-DAC generates an output of $V_R$ or $-V_R$ if $D_j$ is 0 or 1, respectively. This simplest configuration cannot tolerate comparator offset. For the example shown in Figure 2.9(a), if the comparator has an offset of $V_R/8$, the digital output will have missing codes due to the saturation of the transfer curve.

To solve this issue, the most common configuration is 1.5-bit per stage, as shown in Figure 2.9(b). The sub-ADC consists of two comparators with slice levels of $-V_R/4$ and $V_R/4$, respectively. The amplified residue signal is typically in the range of $\{-V_R/2, V_R/2\}$. In the presence of comparator offset, the residue exceeds this range but still stays within $\{-V_R, V_R\}$, which the next stage can process. With this comparator redundancy, the comparator's offset requirement is relaxed. In fact, the reserved output range tolerates the inaccuracy of the sub-ADC and sub-DAC and the non-idealities of the opamp. It is the most popular architecture for ADCs with over 10-bit resolution and sampling frequencies from several tens of MS/s to several hundred MS/s.

Figure 2.9: Transfer curves for (a) 1-bit per stage and (b) 1.5-bit per stage.
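The offset tolerance of the 1.5-bit per stage configuration can be checked with a short behavioural model. The sketch assumes one common convention that may differ from Figure 2.9(b): each stage outputs a digit d in {-1, 0, +1}, the residue is 2·v - d·V_R, and both comparators of a stage share the same offset. Function and parameter names are hypothetical.

```python
def pipeline_1p5bit(vin, offsets, vr=1.0):
    """Run vin through len(offsets) 1.5-bit stages; return (digits, residue)."""
    digits, v = [], vin
    for off in offsets:
        # Two comparators with nominal thresholds -VR/4 and +VR/4,
        # both displaced by the same offset for simplicity.
        if v > vr / 4 + off:
            d = 1
        elif v < -vr / 4 + off:
            d = -1
        else:
            d = 0
        digits.append(d)
        v = 2 * v - d * vr            # subtract the sub-DAC level, amplify by 2
    return digits, v


def reconstruct(digits, vr=1.0):
    """Digital estimate of vin: sum of d_j * VR / 2**j for j = 1..n."""
    return sum(d * vr / 2 ** (j + 1) for j, d in enumerate(digits))


if __name__ == "__main__":
    offsets = [0.2, -0.2] * 4 + [0.2]         # nine stages, |offset| < VR/4
    digits, residue = pipeline_1p5bit(0.3, offsets)
    print(digits, reconstruct(digits), residue)
```

As long as every offset magnitude stays below $V_R/4$, the residue of each stage remains within $\{-V_R, V_R\}$ and the reconstruction error is bounded by $V_R/2^n$ for n stages, regardless of the individual offsets.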

Besides the 1.5-bit per stage configuration, multi-bit per stage configurations are also popular. The 2.5-bit per stage and 3.5-bit per stage configurations are commonly used to reduce the overall ADC power consumption. For such multi-bit configurations, the accuracy requirement of the comparators is higher than in the 1.5-bit configuration. With pipelined operation, this architecture offers an excellent balance among power, speed, and accuracy specifications.

The drawback of the pipelined architecture is its longer latency. Moreover, in advanced CMOS technologies, the opamp implementation becomes a severe issue. Amplifier-based ADC architectures always face two issues: lower supply voltage (< 1.2 V) and lower intrinsic gain of the MOS transistors. A low supply voltage limits the input dynamic range of the ADC and makes the amplifier difficult to implement for high resolution. Lower intrinsic gain forces more complex amplifier architectures to maintain a given performance, at the cost of larger power consumption. In fact, in a complicated opamp circuit, extra power is spent on maintaining circuit stability rather than on operating speed.

To solve both issues, many digital calibration techniques have been proposed to reduce the power dissipation of pipelined ADCs. Linear calibration techniques can correct the gain error due to capacitor mismatch and the sub-DAC errors, but the opamp still needs enough gain. To relax the opamp's gain requirement, nonlinear calibration techniques have been proposed to correct the non-idealities generated by the opamp circuitry. To sum up, the pipelined architecture is very popular for implementing ADCs over a wide range of resolutions (between 6 and 16 bits) and operating speeds (from several MS/s to several GS/s).

Table 2.5 presents several pipelined ADC designs from the last two years. For the higher-resolution designs [32, 33] and the higher-sampling-rate design [34], the FOMs are between 0.3 and 0.5 pJ/conv.-step. In general, most higher-resolution ADCs are noise-limited rather than matching-limited, and their power consumption increases greatly because of the larger sampling capacitances. For 10-bit and 12-bit pipelined ADCs, however, the FOMs can be lower than 100 fJ/conv.-step. [35] presents a low-power pipelined ADC that replaces the traditional opamps in the pipelined stages with zero-crossing-based amplifiers. [36] shows a 10-bit pipelined ADC that uses a time-sharing technique to reduce the overall ADC power consumption.

| Design | Process | Supply | Resolution | Speed    | ENOB     | Power  | FOM (/conv.-step) |
|--------|---------|--------|------------|----------|----------|--------|-------------------|
| [34]   | 90 nm   | 1.2 V  | 10-bit     | 500 MS/s | 8.5-bit  | 55 mW  | 300 fJ            |
| [32]   | 90 nm   | 1.2 V  | 14-bit     | 100 MS/s | 11.2-bit | 130 mW | 520 fJ            |
| [33]   | 180 nm  | 1.8 V  | 16-bit     | 125 MS/s | 12.8-bit | 385 mW | 432 fJ            |
| [35]   | 90 nm   | 1.2 V  | 12-bit     | 50 MS/s  | 10-bit   | 4.5 mW | 88 fJ             |
| [36]   | 90 nm   | 1.0 V  | 10-bit     | 100 MS/s | 8.9-bit  | 4.5 mW | 98 fJ             |

Table 2.5: Prior pipelined ADCs

## 2.8 Summary



Several ADC architectures are reviewed in this chapter; the sigma-delta and time-interleaved ADC architectures are not covered. The reviewed ADCs fall into two categories: comparator-based and amplifier-based architectures.

Comparator-based ADCs tolerate CMOS scaling well and benefit from the faster operating speed. Generally speaking, the flash ADC architecture is used for high-speed applications (over GS/s). However, it is also the most power-consuming architecture and is not suitable for higher resolution (> 8 bits). The subranging ADC is a good choice for higher resolution (between 8 and 12 bits) with lower power consumption, but its operating speed is still limited by the complicated MUX (to several hundred MS/s). In recent years, the SAR architecture has become popular for medium- and high-speed ADCs. It is more energy-efficient than other ADC architectures. Its resolution is between 6 and 12 bits, with sampling rates from several tens of MS/s to several GS/s. Moreover, small area is another advantage of SAR ADCs. However, the multiple quantization steps slow down the overall operating speed at higher resolution.

Unlike comparator-based ADCs, amplifier-based ADCs are better suited to higher resolution. The pipelined architecture is the most common architecture for ADCs with between 8 and 16 bits of resolution. This architecture provides the best trade-off between resolution and speed. The amplifiers' power dissipation dominates the overall ADC power consumption. Another amplifier-based architecture is the two-step ADC, which can be viewed as a specialized pipelined architecture. Unlike the pipelined ADC, it needs only one amplifier with lower output accuracy, but more comparators in the coarse and fine ADCs. In general, the amplifier and the comparators are almost equally important in a two-step ADC. In principle, two-step ADC architectures have no resolution limitation, since the residue signal is amplified for the next quantization stage. For higher-resolution ADCs, the power consumption is noise-limited. However, for amplifier-based architectures, low supply voltage and low intrinsic transistor gain are two issues caused by CMOS scaling, which are briefly discussed in Appendix D.1.

In scaled CMOS technologies, besides low supply voltage and low intrinsic transistor gain, gate leakage is another severe issue that affects ADC performance. Gate leakage worsens the droop rate of MOS transistors that are used as hold capacitors or in charge-redistribution circuits. The gate leakage issue must therefore be carefully considered at the design stage. It is discussed in Appendix D.2.

Figure 2.10 summarizes the limitation boundaries of the various ADC architectures in terms of resolution and sampling frequency. Pipelined ADC architectures cover the widest range compared with the other ADCs. Flash ADCs have the advantage at very high operating speeds but are limited in resolution. Subranging and two-step ADCs are best suited to medium resolution and medium operating speed. SAR ADCs have a lower boundary than the other ADCs but benefit from smaller area and ultra-low power consumption. However, with digital calibration techniques, these boundaries will become less distinct.

Figure 2.11 summarizes the ADCs published at ISSCC and VLSI from 1997 to 2010 [37]. At ISSCC 2010, the figures of merit (FOMs) of the 10-bit ADCs are less than 100 fJ/conv.-step. Most of them are SAR ADCs that use scaled CMOS technologies to achieve 10-bit resolution at sampling rates of several tens of MS/s. In fact, SAR ADCs can almost reach an FOM of 10 fJ/conv.-step.



Figure 2.10: ADC limitation boundaries between resolution and sampling frequency.



Figure 2.11: ADC survey from 1997 to 2010 on two major conferences: ISSCC and VLSI.

# Chapter 3

# Comparator with Offset Compensation

## 3.1 Introduction

In advanced nanoscale CMOS technologies, it is increasingly difficult for analog circuits to achieve the required performance with traditional design concepts. Among analog circuits, the comparator copes best with CMOS scaling. In fact, an analog comparator behaves like a digital circuit and provides fast operating speed. A comparator-based ADC can therefore benefit from advanced CMOS technologies.

In this chapter, the traditional design concepts and techniques are discussed in Section 3.2. In contrast to traditional designs, a feedback compensation mechanism can be applied to reduce the equivalent input offset voltage of the comparators. Section 3.3 presents several feedback compensation schemes that cancel the comparator's input offset while achieving low power consumption. Based on the feedback compensation mechanism, a low-power comparator design is proposed in Section 3.4. Finally, Section 3.5 summarizes these comparator design techniques.

## 3.2 Traditional Comparator Design

In general, a comparator consists of a pre-amplifier and a regenerative latch, as shown in Figure 3.1(a). The pre-amplifier is placed before the latch so that the latch's offset is reduced by the amplifier's gain, defined as $A$. The latch's and pre-amplifier's offset voltages are defined as $V_{OS,L}$ and $V_{OS,P}$, respectively. The overall input-referred offset voltage of the comparator, $V_{OS}$, is

$$
V_{OS} = V_{OS,P} + \frac{V_{OS,L}}{A} \tag{3.1}
$$

If the gain of a single amplifier is not enough, several amplifiers are usually cascaded to provide enough gain to reduce the overall input offset voltage of the comparator, as shown in Figure 3.1(b).

$$
V_{OS} = V_{OS,P1} + \frac{V_{OS,P2}}{A_1} + \frac{V_{OS,P3}}{A_1 \times A_2} + \frac{V_{OS,L}}{A_1 \times A_2 \times A_3} \tag{3.2}
$$

Figures 3.1(c) and (d) show the schematics of a general pre-amplifier and a regenerative latch. Because the latch's offset is divided by the pre-amplifier's gain, its contribution to the input offset is greatly reduced. According to Equation (3.1), the pre-amplifier's offset then dominates the overall input offset voltage of the comparator.
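Equations (3.1) and (3.2) generalize to any number of cascaded pre-amplifiers. The following sketch evaluates that sum for a given set of stage offsets and gains; the function name and the example values are hypothetical, and the offsets are treated as fixed signed numbers (a statistical analysis would combine their variances instead).

```python
def input_referred_offset(preamp_offsets, preamp_gains, latch_offset):
    """Evaluate Equation (3.2) for any number of cascaded pre-amplifiers."""
    total, gain = 0.0, 1.0
    for vos, a in zip(preamp_offsets, preamp_gains):
        total += vos / gain           # stage offset divided by the gain ahead of it
        gain *= a
    return total + latch_offset / gain


if __name__ == "__main__":
    # One pre-amplifier: 5 mV offset, gain 10, latch offset 50 mV (assumed values)
    print(input_referred_offset([5e-3], [10.0], 50e-3))
```

With a single stage the function reduces to Equation (3.1): the latch contributes only its offset divided by $A$.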

For a source-coupled pair in the pre-amplifier, the variance of the input offset voltage, $V_{OS,P}$, is given by

$$
\sigma^{2}(V_{OS,P}) = \sigma^{2}(\Delta V_{t}) + \left(\frac{V_{ov}}{2}\right)^{2} \times \frac{\sigma^{2}(\Delta \beta)}{\beta^{2}} \tag{3.3}
$$

where $\Delta V_t$ is the threshold voltage difference between the two input transistors, $V_{ov}$ is the gate overdrive voltage of M1 and M2, and $\Delta\beta$ is the difference in $\beta$ between the two input transistors. With the following equations,

$$
\beta = \mu C_{ox} \frac{W}{L} \tag{3.4}
$$

$$
\sigma^2(\Delta V_t) = \frac{A_{V_t}^2}{W \cdot L} \tag{3.5}
$$

$$
\frac{\sigma^2(\Delta\beta)}{\beta^2} = \frac{A_\beta^2}{W \cdot L} \tag{3.6}
$$

the variance of the input offset voltage, $V_{OS,P}$, can be rewritten as

$$
\sigma^{2}(V_{OS,P}) = \frac{1}{W \cdot L} \left(A_{V_{t}}^{2} + \frac{V_{ov}^{2}}{4} \cdot A_{\beta}^{2}\right) \tag{3.7}
$$

where $W$ and $L$ are the width and length of the input transistors M1 and M2, and $A_{V_t}^2$ and $A_\beta^2$ are process-dependent matching parameters. According to Equation (3.7), the input offset of the pre-amplifier is reduced by using larger transistors and a smaller gate overdrive voltage for its input differential pair.

Figure 3.1: A traditional comparator design.
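Equation (3.7) is easy to evaluate numerically. The sketch below does so with placeholder matching coefficients; the default values `a_vt` and `a_beta` are assumptions chosen only for illustration and are not taken from this work or from any specific process.

```python
import math


def sigma_vos(w_um, l_um, vov, a_vt=3.0e-3, a_beta=0.01):
    """Standard deviation of the pair offset per Equation (3.7), in volts.

    a_vt   : threshold matching coefficient in V*um   (assumed value)
    a_beta : current-factor matching coefficient in um (assumed value, 1 %*um)
    """
    variance = (a_vt ** 2 + (vov ** 2 / 4.0) * a_beta ** 2) / (w_um * l_um)
    return math.sqrt(variance)


if __name__ == "__main__":
    for area in (1.0, 4.0, 16.0):             # gate area W*L in um^2
        print(area, f"{sigma_vos(area, 1.0, 0.2) * 1e3:.2f} mV")
```

Quadrupling the gate area halves the offset standard deviation, which illustrates why offset reduction by sizing alone is expensive in area and input capacitance.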

For a general pre-amplifier circuit, some key performance indices are defined as follows.

$$
Speed \propto \frac{g_m}{C_{GS}} \approx \frac{2I/V_{ov}}{(2/3) \cdot WL \cdot C_{ox}} \tag{3.8}
$$

$$
Power \propto I \cdot V_{DD} \tag{3.9}
$$

$$
Accuracy^2 \propto \frac{V_{DD}^2}{\sigma^2(V_{OS})} \approx \frac{WL \cdot V_{DD}^2}{A_{V_t}^2 + \frac{V_{ov}^2}{4} \cdot A_{\beta}^2} \tag{3.10}
$$

These indices cannot all be optimized at the same time; there are tradeoffs among them. A relationship between speed, power, and accuracy [38] is defined to describe the tradeoff,

$$
\frac{Speed \times Accuracy^2}{Power} \propto \frac{1}{C_{ox} \cdot (A_{V_t}^2 + \frac{V_{ov}^2}{4} \cdot A_{\beta}^2)} \times \frac{V_{DD}}{V_{ov}} \tag{3.11}
$$

If the gate overdrive voltage $V_{ov}$ is proportional to the supply voltage $V_{DD}$ and $A_{\beta}$ is small enough, this relationship can be written as

$$
\frac{Speed \times Accuracy^2}{Power} \propto \frac{1}{C_{ox} \cdot A_{V_t}^2} \tag{3.12}
$$

Under the above assumptions, Equation (3.12) reveals a physical limitation: the relationship depends only on the process. For a process with a thinner gate oxide (larger $C_{ox}$), the achievable value of the tradeoff is smaller. This means that advanced nanoscale CMOS processes actually degrade the performance of analog circuits. Since the tradeoff is constant for a given process, any improvement in speed or accuracy is paid for with larger power dissipation. For example, one more bit of accuracy requires four times the power consumption.
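As a check on Equation (3.11), the three indices of Equations (3.8) to (3.10) can be combined directly. The following worked step is a sketch that treats the proportionality constants as unity:

$$
\frac{Speed \times Accuracy^2}{Power} \propto \frac{2I/V_{ov}}{(2/3) \cdot WL \cdot C_{ox}} \times \frac{WL \cdot V_{DD}^2}{A_{V_t}^2 + \frac{V_{ov}^2}{4} \cdot A_{\beta}^2} \times \frac{1}{I \cdot V_{DD}} = \frac{3}{C_{ox} \cdot (A_{V_t}^2 + \frac{V_{ov}^2}{4} \cdot A_{\beta}^2)} \times \frac{V_{DD}}{V_{ov}}
$$

The bias current $I$ and the gate area $WL$ cancel, which is why only process parameters and the ratio $V_{DD}/V_{ov}$ remain. At fixed speed the result reads $Power \propto Accuracy^2$, so doubling the accuracy (one more bit) multiplies the power by $2^2 = 4$.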

Because Equation (3.12) relies on several assumptions [38], it may not describe scaled CMOS transistors precisely, but it still provides a qualitative trend. To break this degradation trend, several innovative techniques, mentioned in Section 2.3, were proposed to reduce the equivalent input offset of the comparator and achieve low power consumption. Some of them, including spatial averaging [6, 7], interpolation [8, 9], and offset storage [10, 11], are commonly applied to flash ADCs. They are analyzed qualitatively in the following sections.

### 3.2.1 Spatial Averaging Technique

The spatial averaging technique improves the overall linearity of the ADC by averaging the error of an individual pre-amplifier with those of several adjacent pre-amplifiers. Figure 3.2 shows the averaging technique with resistor connections and its effect on the ADC's linearity. There are three transfer curves in Figure 3.2. The black dash-dot line is the ideal transfer curve of the comparator array without any offset. The green dashed line and the blue solid line represent the real transfer curves before and after spatial averaging, respectively. With resistors connecting each comparator to its adjacent comparators, the equivalent input offset voltage is suppressed by the averaging mechanism.

The offset of each comparator can be treated as a random variable that is independent of the others. The averaging mechanism uses the resistor connections to perform a moving average in the analog domain. For each comparator, the moving average acts as a spatial low-pass filter that suppresses large deviations by tying the comparator to its neighbors. The ratio $R_1/R_0$ determines the degree of offset reduction. Generally speaking, the lower the ratio, the larger the offset reduction. This is because the analog moving average is formed by the resistors that connect a comparator to the others: if the connecting resistor $R_1$ is too large, the averaging path is too weak to act as a real moving average. The averaging technique easily reduces the differential non-linearity (DNL) but is less effective for the integral non-linearity (INL). Figure 3.3 shows the DNL and INL improvements with respect to the ratio $R_1/R_0$. The DNL and INL reduction factors are defined as $R_{DNL}$ and $R_{INL}$, respectively. $W_n$ is defined as the number of adjacent comparators that have significant influence. A detailed analysis can be found in [39, 40, 7].
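The different effect of averaging on DNL and INL can be illustrated with a purely numerical moving average over independent random offsets. This is a simplified sketch, not a model of the resistive network: the neighbour weight `w` is a hypothetical stand-in for the coupling strength set by $R_1/R_0$, only the two nearest neighbours are included, and the array ends simply reuse the edge value.

```python
import random
import statistics


def spatial_average(offsets, w=1.0):
    """Weighted 3-tap moving average; w is the weight of each neighbour."""
    n = len(offsets)
    out = []
    for k in range(n):
        left = offsets[k - 1] if k > 0 else offsets[k]        # edge: reuse self
        right = offsets[k + 1] if k < n - 1 else offsets[k]
        out.append((w * left + offsets[k] + w * right) / (1.0 + 2.0 * w))
    return out


def steps(values):
    """Adjacent differences, which set the step-size (DNL-like) error."""
    return [b - a for a, b in zip(values, values[1:])]


rng = random.Random(0)
raw = [rng.gauss(0.0, 1.0) for _ in range(4096)]      # independent offsets
avg = spatial_average(raw)

inl_ratio = statistics.pstdev(avg) / statistics.pstdev(raw)
dnl_ratio = statistics.pstdev(steps(avg)) / statistics.pstdev(steps(raw))
print(f"INL-like ratio {inl_ratio:.2f}, DNL-like ratio {dnl_ratio:.2f}")
```

For equal weights and independent offsets, the threshold spread is expected to shrink by roughly $1/\sqrt{3}$ while the spread of adjacent differences shrinks by roughly $1/3$, consistent with the statement that averaging helps DNL more than INL.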

By using resistors to connect the output nodes of adjacent pre-amplifiers, the input offset of each individual comparator is averaged, which improves the linearity of the flash ADC. For good averaging, extra comparators are necessary at both ends of the comparator array to provide enough neighbors, but they consume extra power. To mitigate this redundancy, the averaging termination [40] and triple-cross connection [41] techniques were proposed, but they still require extra power and area. Moreover, the equivalent gain of the pre-amplifier is reduced by the resistive network, so there is a tradeoff between the offset reduction and the power consumption. Another issue is the assumption that the offset voltage of every comparator is independent. In practice, because of the layout spacing, the input offset of a comparator is correlated with those of its neighbors through the process gradient. This is why the measured INL of a flash ADC cannot be improved as much as the DNL. To limit this gradient, the comparator array must be kept within a small layout area.

Figure 3.2: Averaging technique to reduce the comparator's offset.

Figure 3.3: DNL and INL reduction factors with respect to $R_1/R_0$.

### 3.2.2 Interpolation Technique

Although the averaging technique reduces the input offset of the comparator, the power consumption is still too large, since $(2^N - 1)$ comparators are used in an N-bit flash ADC. Within a comparator, the first-stage pre-amplifier dominates the overall power dissipation. The interpolation technique generates the necessary zero crossings with fewer pre-amplifiers.


Depending on the flash ADC architecture, the interpolation technique can be implemented with a resistive network [42] or a capacitive network [8]. In general, the interpolation technique is combined with the averaging technique by using the resistive network, as shown in Figure 3.4. With resistive interpolation, one extra differential signal is generated for the $(k+1)$-th comparator. By re-arranging the signal connections, two extra differential signals are generated for the k-th and $(k+2)$-th comparators. In Figure 3.4, three additional zero-crossing levels are produced by the interpolation technique. This is called four-times interpolation, that is, the interpolation factor is 4. With this technique, the equivalent power consumption per comparator can be further reduced. However, the interpolation factor is limited by the integral non-linearity (INL) requirement. A larger interpolation factor requires a sufficiently wide linear region of the pre-amplifier, which introduces extra power consumption in the pre-amplifier.
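Where the interpolated zero crossings fall can be worked out for an idealized case. The sketch below assumes two adjacent pre-amplifiers with equal gain and perfectly linear outputs, and resistor taps at equal fractions between their outputs; it does not model the signal re-arrangement of Figure 3.4, and the function name is hypothetical.

```python
def interpolated_crossings(v_k, v_k1, factor=4):
    """Zero crossings produced between two pre-amplifiers with references
    v_k and v_k1, assuming both outputs are linear in the input."""
    # A tap at fraction a of the resistor string carries
    # (1 - a) * A * (vin - v_k) + a * A * (vin - v_k1),
    # which crosses zero at vin = (1 - a) * v_k + a * v_k1.
    return [(1 - m / factor) * v_k + (m / factor) * v_k1
            for m in range(1, factor)]


if __name__ == "__main__":
    print(interpolated_crossings(0.0, 0.4))   # three evenly spaced levels
```

With an interpolation factor of 4, three new levels appear between the two references. Once the pre-amplifier outputs leave their linear region, the tap voltages are no longer a linear mix and the crossings shift, which is the INL limitation noted above.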

The capacitive interpolation technique uses capacitors to reduce the required number of pre-amplifiers [8]. Figure 3.5 shows the interpolation technique implemented with a capacitive network. Additional zero crossings are generated by charge redistribution between the outputs of adjacent amplifiers. Generally speaking, this technique is combined with the offset storage techniques, which already use capacitors, to achieve the charge redistribution. With switched-capacitor operation, more zero-crossing levels are generated.

Similar to the spatial averaging technique, the interpolation techniques require pre-amplifiers with a wide linear region at their output nodes. In scaled CMOS technologies, this requirement calls for more cascaded pre-amplifiers and introduces extra power consumption.

### 3.2.3 Offset Storage Techniques

Unlike the averaging and interpolation techniques, offset storage techniques use switched-capacitor operation to cancel the input offset of the comparator. The principle is to store the input offset of the amplifier on capacitors in the reset phase and then cancel it in the comparison phase [43, 44, 45, 46]. There are two common approaches to offset storage: output offset storage (OOS) and input offset storage (IOS).



Figure 3.5: Interpolation technique by capacitive network.


Figure 3.6 shows the offset cancellation based on the OOS technique. In Figure 3.6 (a), OOS is applied to the comparator, which consists of a pre-amplifier, a sampling capacitor and a regenerative latch. With the OOS technique, the offset is canceled by the capacitor at the output nodes of the pre-amplifier. At the reset phase  $(\phi_1 = 1)$ , the amplified offset is stored at the capacitor  $C_o$ . At the comparison phase ( $\phi_2 = 1$ ), the input signal and the input offset of pre-amplifier are amplified together. Since the stored amplified offset at reset phase has opposite polarity compared with the amplified offset at the comparison phase, the input offset of the pre-amplifier is canceled. With single-stage OOS, the equivalent input offset of the comparator is reduced to

$$
V_{OS} = -\frac{\Delta Q}{A \cdot C_o} + \frac{V_{OS,L}}{A} \times \frac{C_o + C_p}{C_o} \tag{3.13}
$$

where $A$ is the gain of the pre-amplifier, $\Delta Q$ represents the total charge injection and clock feed-through of the switch S3, $V_{OS,L}$ is the offset of the latch, and $C_p$ is the parasitic capacitance at the node $V_o$. Equation (3.13) contains no pre-amplifier offset term, so the input offset of the pre-amplifier is completely canceled. The S3-induced offset voltage is kept constant by switching off S3 before S2. A large $C_o$ reduces the equivalent input offset but slows down the amplifier.
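The trade-off in Equation (3.13) can be checked numerically. The following is a minimal sketch; all component values (gain, capacitances, injected charge, latch offset) are hypothetical and chosen only for illustration, not taken from this work.

```python
def oos_residual_offset(delta_q, gain, c_o, c_p, v_os_latch):
    """Equation (3.13): equivalent input offset with single-stage OOS."""
    charge_term = -delta_q / (gain * c_o)                    # S3 charge injection, divided by A
    latch_term = (v_os_latch / gain) * (c_o + c_p) / c_o     # latch offset, divided by A
    return charge_term + latch_term

# Hypothetical example values (assumptions, not from the thesis)
A = 10.0             # pre-amplifier gain
C_O = 100e-15        # storage capacitor, 100 fF
C_P = 20e-15         # parasitic capacitance, 20 fF
DELTA_Q = 0.5e-15    # injected charge, 0.5 fC
V_OS_LATCH = 30e-3   # latch offset, 30 mV

v_os = oos_residual_offset(DELTA_Q, A, C_O, C_P, V_OS_LATCH)
print(f"OOS residual offset: {v_os * 1e3:.2f} mV")
print(f"with doubled gain:   {oos_residual_offset(DELTA_Q, 2 * A, C_O, C_P, V_OS_LATCH) * 1e3:.2f} mV")
```

Both terms scale with $1/A$, so doubling the hypothetical gain halves the residual offset, which is why a multiple-stage configuration helps.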

Similar to the averaging technique, a multiple-stage pre-amplifier configuration, shown in Figure 3.6 (b), can be applied to produce enough gain to suppress the latch's offset voltage and the S3-induced offset. However, the amplifier design must cover the input common-mode range, which limits the input dynamic range.

Another offset storage technique uses an input capacitor to store and cancel the input offset of the pre-amplifier. Figure 3.7 (a) shows the offset cancellation based on the IOS technique. At the reset phase ($\phi_1 = 1$), the amplifier is connected in unity-gain feedback, and the offset information is stored on the capacitor $C_i$. At the comparison phase ($\phi_2 = 1$), the input signal is connected to one terminal of $C_i$, and the offset of the pre-amplifier is canceled by the pre-stored offset. With single-stage IOS, the equivalent input offset is reduced to

$$
V_{OS} = \frac{V_{OS,P}}{A+1} - \frac{\Delta Q}{C_i} - \frac{V_{OS,L}}{A} \tag{3.14}
$$

where $V_{OS,P}$ is the offset of the pre-amplifier. In Equation (3.14), the offset reduction is determined by the amplifier gain $A$ and the capacitance $C_i$. The gain $A$ only needs to be large enough to suppress the offset voltage of the latch. A multiple-stage configuration is



Figure 3.6: Output offset storage technique to cancel the comparator's input offset.



Figure 3.7: Input offset storage technique to cancel the comparator's input offset.

necessary if the single-stage gain is not enough. Figure 3.7 (b) shows the multiple-stage IOS technique to reduce the input offset. However, more pre-amplifiers mean more power consumption to achieve the offset reduction. Another drawback is the use of the unity-gain feedback amplifier: while it is connected in unity-gain feedback, the closed-loop stability of the pre-amplifier must be carefully considered.
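As a rough numerical illustration of Equation (3.14), the sketch below evaluates the three IOS terms separately. The component values are hypothetical assumptions; the signs follow Equation (3.14) as written.

```python
def ios_terms(v_os_pre, delta_q, gain, c_i, v_os_latch):
    """Return the three terms of Equation (3.14) for single-stage IOS."""
    pre_term = v_os_pre / (gain + 1.0)   # pre-amplifier offset, reduced by (A + 1)
    charge_term = -delta_q / c_i         # charge injection, NOT reduced by the gain
    latch_term = -v_os_latch / gain      # latch offset, reduced by A
    return pre_term, charge_term, latch_term

# Hypothetical example values (assumptions, not from the thesis)
A = 10.0
C_I = 100e-15        # input storage capacitor, 100 fF
DELTA_Q = 0.5e-15    # injected charge, 0.5 fC
V_OS_PRE = 20e-3     # pre-amplifier offset, 20 mV
V_OS_LATCH = 30e-3   # latch offset, 30 mV

terms = ios_terms(V_OS_PRE, DELTA_Q, A, C_I, V_OS_LATCH)
v_os_ios = sum(terms)
for name, value in zip(("pre-amplifier", "charge injection", "latch"), terms):
    print(f"{name:>16s}: {value * 1e3:+.2f} mV")
print(f"{'total':>16s}: {v_os_ios * 1e3:+.2f} mV")
```

With these assumed numbers the charge-injection term dominates, because it is set only by $C_i$ and is not divided by the gain.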

As with the averaging technique, pre-amplifiers are necessary for the offset storage techniques. In fact, because they all rely on pre-amplifiers, the averaging, interpolation and offset storage techniques have been combined to achieve a low-power design [45]. The power consumption can be reduced, but apparently not enough to meet the low-power specification.

## 3.3 Offset Feedback Compensation Schemes

Besides the averaging, interpolation and offset storage techniques, feedback compensation is another way to cancel the input offset of the comparator. Unlike the offset storage techniques, it does not use capacitors in the input or output paths to store and cancel the input offset voltage. Unlike the averaging or interpolation techniques, it does not use parallel resistors or capacitors connected to adjacent comparators to reduce the equivalent input offset voltage. The feedback compensation mechanism uses the comparator's output to judge whether the offset is positive or negative, and then adjusts the comparator's controlled elements to cancel the input offset. Figure 3.8 shows a latch comparator with a feedback compensation loop. In Figure 3.8, the latch is modeled as an ideal latch with an input offset $V_{OS}$. Based on the comparator's output $D_c$, the offset information is stored and processed by the offset estimation block. Using the results of the offset estimation, the offset compensation block adjusts $V_{adj}$ to cancel the input offset.

$$
V_{adj} = -V_{OS} \tag{3.15}
$$

Consider the regenerative latch schematic shown in Figure 3.9 (a). It can be viewed as a simplified version of the regenerative latch used in Figure 3.8, ignoring the coupling capacitance $C_c$ between the output nodes. Most regenerative latches can be simplified to such a circuit to analyze their behavior. The mathematical model, shown in Figure 3.9 (b), is simplified



Figure 3.8: A latch comparator with a feedback compensation loop.

by using the first-order parameters $G_m$ and $C$. According to [47], the offset is defined as

$$
V_{OS} = \sqrt{\frac{G_{m2}C_2}{G_{m1}C_1}} \times (V_{o2}(0) - V_{S2}) + (V_{S1} - V_{S2}) \tag{3.16}
$$

where $G_{m1,2}$ are the sums of the pMOS and nMOS trans-conductances, and $C_{1,2}$ are the load capacitances at nodes $V_{o1,2}$, respectively. $V_{S1,2}$ are the switching voltages of each inverter, i.e., the voltage at which the pMOS and nMOS currents are equal. The values of $V_{S1,2}$ depend on VDD and also on $G_m$ and $V_{th}$ of the constituent devices. In Equation (3.16), the input offset arises from the mismatch between $G_{m1}$ and $G_{m2}$, $C_1$ and $C_2$, or $V_{S1}$ and $V_{S2}$. The main contributor to the mismatch between $V_{S1}$ and $V_{S2}$ is typically the mismatch in the device threshold voltages, which is process-dependent. To obtain zero offset, we may adjust $G_{m1,2}$ or $C_{1,2}$ to satisfy the following equation.

$$
\sqrt{\frac{G_{m2}C_2}{G_{m1}C_1}} = \frac{V_{S2} - V_{S1}}{V_{o2}(0) - V_{S2}} \tag{3.17}
$$

When Equation (3.17) is satisfied, the equivalent input offset $V_{OS}$ is canceled to zero. However, according to the simulation results in [47], this model is not accurate enough to describe the offset voltage of the regenerative latch.

To improve the accuracy of the offset modeling, the effect of the coupling capacitor



Figure 3.9: Simplified regenerative latch (a) schematic and (b) mathematical model.

$C_c$ must be considered. The offset of the regenerative latch is now rewritten as

$$
V_{OS} = \beta \times (V_{o2}(0) - V_{S2}) + (V_{S1} - V_{S2}) \tag{3.18}
$$

where  $\beta$  is defined by the following equations.

$$
\beta = \alpha \sqrt{\frac{(C_2 + C_c)C_1 \sqrt{\kappa} + (G_{m1} - G_{m2})C_c}{(C_1 + C_c)C_2 \sqrt{\kappa} - (G_{m1} - G_{m2})C_c}} \tag{3.19}
$$

$$
\alpha = \sqrt{\frac{G_{m2}C_2}{G_{m1}C_1}} \tag{3.20}
$$

$$
\kappa = (G_{m1} + G_{m2})^2 C_c^2 + 4G_{m1} G_{m2} (C_1 C_2 + C_c C_1 + C_c C_2) \tag{3.21}
$$

By a similar procedure, the offset can be canceled by adjusting $G_{m1,2}$ or $C_{1,2}$ to satisfy the following equation,

$$
\beta = \frac{V_{S2} - V_{S1}}{V_{o2}(0) - V_{S2}}\tag{3.22}
$$

According to the simulation results in [47], the impact of the coupling capacitance $C_c$ between the output nodes cannot be neglected. In practice, $C_c$ stems mostly from the gate-drain capacitances of the four transistors and can usually be estimated with sufficient accuracy from the technology data. However, the added complexity of Equations (3.18)-(3.21) does not change the fact that adjusting $G_{m1,2}$ or $C_{1,2}$ can cancel the input offset.

In Equation (3.22), four parameters can be adjusted to cancel the input offset: $G_{m1,2}$ and $C_{1,2}$. If $G_{m1}$ and $G_{m2}$ are fixed, adjusting $C_1$ and $C_2$ can achieve zero offset [15, 23, 24, 48]. On the other hand, if $C_1$ and $C_2$ are fixed, adjusting $G_{m1}$ and $G_{m2}$ can also achieve zero offset [49, 50, 51, 52, 16, 19, 53, 54, 55, 56, 57].

### 3.3.1 The Analog Latches

There are many analog latch circuits; some of them are shown in Figure 3.10. For these analog latches, four issues must be considered: power consumption, input offset, kickback noise and input common-mode variation. The first is power consumption. Since these latches are used with offset compensation schemes, static power consumption should be avoided. The input offset voltage is a minor issue because it is handled by the compensation schemes. Kickback noise is a major issue since a pre-amplifier may not be placed before these latches; it is determined by the coupling path between the output and input nodes. Input common-mode variation is caused by a large input dynamic range, and it must be suppressed to reduce the resulting dynamic offset voltage.

In Figure 3.10 (a), the input devices act as voltage-controlled resistors. This latch does not consume static power and has lower kickback noise. Its disadvantage is a large input offset, since the voltage-controlled resistors are highly process-dependent. Its input common-mode variation depends directly on its input signals. In Figure 3.10 (b), the latch circuit is equalized at the reset phase. It provides a smaller input offset, faster regeneration and lower input common-mode variation, but it consumes static power at the reset phase and has larger kickback noise. Figure 3.10 (c) shows a commonly used analog latch with lower power consumption. It is usually used as a sense amplifier in RAM cells [58]. This latch has a small input offset, fast regeneration and no static power consumption. However, it has larger kickback noise and input common-mode variation.

### 3.3.2 Offset Compensation

In Figure 3.8, two sub-blocks are added to build an offset feedback compensation loop: one is the offset estimation, the other is the offset compensation. Because



Figure 3.10: The analog latches.

the offset varies continuously with temperature, the offset estimation and compensation should operate periodically, not just at power-on. In the following, the offset compensation is discussed first to describe various methods for a regenerative latch. The offset estimation methods are presented afterwards to complete the overall compensation loop.

Offset compensation is the way to suppress the equivalent input offset of a comparator. In Figure 3.8, using the results of the offset estimation, the input offset can be canceled by the compensation circuits. As described previously, there are two ways to cancel the input offset: $G_m$-adjustment or $C$-adjustment. The $G_m$-adjustment is discussed first.

In a physical comparator design, the $G_m$-adjustment compensation depends on the comparator's architecture. Here a regenerative latch without pre-amplifier, shown in Figure 3.11, is considered. It consists of an input differential pair and two cross-coupled inverters. Its input offset voltage can be represented as

$$
V_{OS} = V_{OS,in} + V_{OS,latch} \tag{3.23}
$$

where  $V_{OS,in}$  defines the equivalent input offset due to the differential pair,

$$
V_{OS,in} = \Delta V_{th} + \left(\frac{V_{ov}}{2}\right) \times \frac{\Delta \beta}{\beta} \tag{3.24}
$$

and $V_{OS,latch}$ defines the equivalent input offset due to the regenerative latch, as in Equation (3.18). In general, adjusting the trans-conductance is equivalent to adjusting the conducting currents ($I_1$ and $I_2$) in the output paths of the regenerative latch shown in Figure 3.11. There are three ways to control the conducting currents of the regenerative latch: input-injection, body-bias control and an auxiliary adjustment pair.

Input-injection injects an extra voltage at the input nodes [51, 52] to change the equivalent input offset of the comparator, as shown in Figure 3.12 (a). In each input path, a resistor is inserted between the input node and the gate of the transistor. To adjust the differential input offset, two K-bit current DACs control the adjustment voltage $V_{adj}$.

$$
V_{adj} = -(I_p - I_n) \cdot R = -V_{OS} \tag{3.25}
$$

This configuration directly changes the offset voltage of the input pair and generates different trans-conductances for the regenerative latch. The resulting voltage drop determines the



Figure 3.11: $G_m$-Adjustment Offset Compensation Methods

resolution of the adjustment. Without the pre-amplifier, the standard deviation of the input offset may be several tens of mV. Since the required input offset is less than several mV, high-precision DACs are necessary for all comparators. Such accurate analog circuits consume a large amount of power, which is contrary to the design concept of offset compensation for a low-power comparator.
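To give a feeling for why the DAC precision becomes demanding, the sketch below estimates the number of DAC bits needed to cover a given adjustment range with a given step. The specific numbers (30 mV standard deviation, ±3σ coverage, 1 mV step) are hypothetical assumptions consistent with "several tens of mV" and "several mV", not values from this work.

```python
import math

def dac_bits(adjust_range, step):
    """Smallest number of bits whose 2**bits levels cover adjust_range with the given step."""
    return math.ceil(math.log2(adjust_range / step))

# Hypothetical example values (assumptions, not from the thesis)
SIGMA = 30e-3          # standard deviation of the uncompensated offset
RANGE = 6 * SIGMA      # cover +/- 3 sigma
STEP = 1e-3            # target adjustment step

bits = dac_bits(RANGE, STEP)
print(f"adjustment range {RANGE * 1e3:.0f} mV, step {STEP * 1e3:.0f} mV -> {bits}-bit DAC per comparator")
```

Since every comparator needs its own pair of such DACs, the area and power overhead grows with the number of comparators in the ADC.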

The second way is to adjust the body-bias voltages of the input pair to change the conducting currents [56, 57], as shown in Figure 3.12 (b). Adjusting the body-bias voltage changes the body voltage of the input transistors. Based on the body effect, the threshold voltage ($V_{th}$) of the transistor is adjusted according to the following equation.

$$
V_{th} = V_{th0} + \gamma \cdot \left( \sqrt{|2\phi_F + V_{SB}|} - \sqrt{|2\phi_F|} \right) \tag{3.26}
$$

Without adding elements in the input paths, this configuration provides faster operation. However, it has two major disadvantages. One is that isolated body terminals for the MOS transistors need extra production cost. Although pMOS transistors avoid this issue, they slow down the operation due to the pMOS characteristics. The other is the tuning range of $V_{adj}$. Since the threshold voltage $V_{th}$ is affected by the adjustment









Figure 3.12:  $G_m$ -Adjustment: (a) input-injection (b) body-bias control (c) auxiliary compensation pair.

voltage $V_{SB}$ through a square-root function and the coefficient $\gamma$, it is difficult to obtain a large enough tuning range. Furthermore, a large $V_{th}$ also slows down the overall ADC comparison speed.
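The limited tuning range can be illustrated by evaluating Equation (3.26) directly. This is a minimal sketch; the device parameters ($V_{th0}$, $\gamma$, $\phi_F$) are hypothetical round numbers, not extracted from the 90 nm process used in this work.

```python
import math

def threshold_voltage(v_sb, v_th0, gamma, phi_f):
    """Equation (3.26): threshold voltage as a function of source-body voltage."""
    return v_th0 + gamma * (math.sqrt(abs(2 * phi_f + v_sb)) - math.sqrt(abs(2 * phi_f)))

# Hypothetical device parameters (assumptions, not from the thesis)
V_TH0 = 0.30    # zero-bias threshold voltage, V
GAMMA = 0.20    # body-effect coefficient, V^0.5
PHI_F = 0.40    # Fermi potential, V

for v_sb in (0.0, 0.25, 0.5):
    shift = threshold_voltage(v_sb, V_TH0, GAMMA, PHI_F) - V_TH0
    print(f"V_SB = {v_sb:.2f} V -> threshold shift = {shift * 1e3:.1f} mV")
```

Under these assumptions, a 0.5 V change of $V_{SB}$ moves the threshold by only about 49 mV, and the shift grows sub-linearly because of the square root.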

The third way is to add an auxiliary adjustment pair to change the output currents [16, 19, 53, 54, 55], as shown in Figure 3.12 (c). With this configuration, the equivalent input offset becomes

$$
V_{OS} = V_{OS,in} + V_{OS,compen} + V_{OS,latch} \tag{3.27}
$$

where $V_{OS,compen}$ defines the input offset due to the auxiliary compensation pair. This method may induce a larger input offset because of the auxiliary compensation transistors, but the induced offset is canceled after compensation. This configuration also needs accurate DACs to compensate for the input offset.

Figure 3.13(a) shows the methods to achieve the $C$-adjustment [15, 23, 24, 48]. Two configurations are usually used to cancel the input offset of the regenerative latch. One is the output capacitance adjustment shown in Figure 3.13 (a), and the other is the X-node capacitance adjustment shown in Figure 3.13 (b). Both configurations can adjust the input offset, but with different gain. The output configuration has a higher gain for adjusting the offset, but it also slows down the operation. In contrast, the X-node configuration has a lower gain for adjusting the offset, but it allows faster, high-speed operation.

For the implementation of the adjustable capacitance, a linear capacitance implemented with a MOS transistor is not suitable for two reasons. One is that the capacitance of a MOS transistor is not accurate enough due to process variation. The other is that the control range of a MOS capacitor is too small, because the capacitance is very sensitive to the voltage across its two terminals. Unlike the linear capacitance provided by analog control, a capacitor-DAC provides good controllability and better insensitivity to process variation. Figure 3.13(b) shows an implementation example of a capacitor-DAC. In general, this configuration is well suited to digital compensation schemes without static power consumption. However, the capacitance added at the output nodes or X-nodes also slows down the operation. To cover both a large adjustable range and a small step size, the DAC needs a higher resolution, which makes it a complicated analog circuit. To solve this issue, a combination of $G_m$ and $C$ adjustments provides a better configuration.



Figure 3.13: (a) *C*-adjustment offset compensation methods and (b) Digitally controlled capacitance implementation.

The $G_m$-adjustment architecture provides a larger adjustable range, and the $C$-adjustment architecture generates a fine adjustment step with smaller loading overhead.

### 3.3.3 Offset Estimation

Offset estimation is the way to find out the comparator's input offset. The offset estimation can be based on deterministic or statistical methods. The deterministic method is discussed first. To extract the offset information, both input nodes of the input pair are connected together and a comparison is made, as shown in Figure 3.14 (a). Two switches are inserted to short the inputs and isolate the input signal. The comparison output $D_c$ indicates whether the offset is positive or negative. This configuration directly extracts the polarity of the offset, but not its amplitude. It also requires two comparisons, one for normal operation and one for compensation. Since the magnitude of the input offset cannot be detected, the adjustment voltage step must be small, e.g. $\frac{1}{4}V_{LSB}$, to avoid a large fluctuation range. Because switch S2 is turned off at the comparison phase, its charge injection must be considered.

$$
V_{OS} = V_{OS,latch} - V_{adj} + \Delta V_Q \tag{3.28}
$$

The equivalent input offset always includes $\Delta V_Q$, since this term cannot be estimated by the offset estimation mechanism. The input-referred noise also causes wrong decisions that enlarge the fluctuation range, as shown in Figure 3.14 (b). The noise-induced decision errors can be reduced by using an integrate-and-dump circuit, which acts similarly to a moving-average function.
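The behavior of such a polarity-only loop can be sketched with a simple behavioral model. This is an illustrative assumption, not the circuit of Figure 3.14: each calibration cycle senses only the sign of $V_{OS,latch} - V_{adj}$ (plus optional Gaussian noise) and moves $V_{adj}$ by one fixed step. The offset, step and noise values are hypothetical.

```python
import random

def run_sign_loop(v_os_latch, step, n_cycles, noise_rms=0.0, seed=0):
    """Bang-bang offset tracking: only the polarity of the sensed offset is used."""
    rng = random.Random(seed)
    v_adj = 0.0
    history = []
    for _ in range(n_cycles):
        sensed = v_os_latch - v_adj + rng.gauss(0.0, noise_rms)
        v_adj += step if sensed > 0 else -step   # polarity decides the direction
        history.append(v_adj)
    return history

# Hypothetical example values (assumptions, not from the thesis)
V_OS_LATCH = 20.2e-3   # latch offset to be tracked
STEP = 0.5e-3          # fixed adjustment step

history = run_sign_loop(V_OS_LATCH, STEP, n_cycles=200)
tail = history[-50:]
ripple = max(tail) - min(tail)
print(f"settled V_adj toggles within {ripple * 1e3:.2f} mV around {V_OS_LATCH * 1e3:.1f} mV")

noisy_tail = run_sign_loop(V_OS_LATCH, STEP, n_cycles=2000, noise_rms=1e-3)[-500:]
print(f"with 1 mV rms noise the range grows to {(max(noisy_tail) - min(noisy_tail)) * 1e3:.2f} mV")
```

In this model the noiseless loop settles into a limit cycle of one step, which is why a small step such as $\frac{1}{4}V_{LSB}$ is needed, while input-referred noise widens the fluctuation range. Any charge-injection term $\Delta V_Q$ that is not sensed during calibration would remain as a residual on top of this ripple.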

Unlike deterministic estimation, statistical estimation extracts the offset information without an extra comparison, by collecting the output data with an internal digital chopper [15], as shown in Figure 3.15. Two choppers are inserted in the input and output paths so that the normal comparison results are preserved. The chopper in the input path is analog and the chopper in the output path is digital. Both of them are controlled by a random signal $q$, which is uncorrelated with the input signal. If $q = +1$, the passed signal is unchanged; if $q = -1$, the passed signal is interchanged. Considering the behavior of the randomly chopped comparator, the output


Figure 3.14: Deterministic offset estimation: (a) operation and (b) voltage adjustment.



Figure 3.15: Offset estimation by using statistics based detector.

$D_c$ can be represented as

$$
D_c = \frac{1}{2} \left[ 1 + \frac{1}{q} \times sgn\left(q(V_i - V_R) - V_{OS,L}\right) \right] \tag{3.29}
$$

where $sgn(x)$ is a sign function that returns $+1$ if $x > 0$ and $-1$ if $x < 0$. Since $q$ is $+1$ or $-1$, Equation (3.29) can be rewritten as

$$
D_c = \frac{1}{2} \left[ 1 + sgn(V_i - V_R - qV_{OS,L}) \right] \tag{3.30}
$$

In the above equation, the input signal passes through two choppers, so the signal characteristics are unchanged. The offset of the latch, however, passes through only one chopper, which separates it from the input signal. From Equation (3.30), $D_c$ can be represented as the sum of $q$-unrelated and $q$-related terms.

$$
D_c = f(V_i - V_R) + q \times f(V_{OS,L}) \tag{3.31}
$$

A statistics-based offset detector is built to extract the offset information from the random chopping operation. In this detector, a $q$-controlled digital chopper processes $D_c$ again, and an averaging function then extracts the offset information.

$$
avg(q \times D_c) = avg(f(V_{OS,L})) + avg(q \times f(V_i - V_R)) \tag{3.32}
$$

$$
= avg(f(V_{OS,L})) \tag{3.33}
$$

In Equation (3.32), the first term carries the extracted offset information, and the second term is zero since the random signal $q$ is uncorrelated with the input signal $V_i$ and the average of $q$ is zero. With statistical offset estimation, the offset can be canceled during normal comparison, which can save half of the power consumption compared with deterministic estimation. However, the offset detector needs more transistors to implement the statistical functions, so the overall power consumption may not be lower than that of deterministic estimation. The analog chopper in the input path may limit the speed if high-speed operation is required. It also introduces a possible input-dependent error in the comparator operation. A more detailed analysis can be found in [15].
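The correlation in Equations (3.30)-(3.33) can be reproduced with a short Monte Carlo sketch. The model below is an illustrative assumption: $V_i - V_R$ is drawn uniformly from a normalized range of $[-1, 1]$, $q$ is an ideal random $\pm 1$ sequence, and the comparator follows Equation (3.30) exactly. The function names and sample counts are hypothetical.

```python
import random

def sgn(x):
    """Sign function as used in Equation (3.29)."""
    return 1 if x > 0 else -1

def chopped_correlation(v_os, n_samples=100_000, seed=1):
    """Estimate avg(q * D_c) for a randomly chopped comparator, Equation (3.32)."""
    rng = random.Random(seed)
    acc = 0
    for _ in range(n_samples):
        q = rng.choice((-1, 1))                  # random chopping signal
        v_in = rng.uniform(-1.0, 1.0)            # V_i - V_R, normalized (assumption)
        d_c = (1 + sgn(v_in - q * v_os)) // 2    # Equation (3.30), D_c in {0, 1}
        acc += q * d_c
    return acc / n_samples

for v_os in (-0.1, 0.0, 0.1):
    print(f"V_OS,L = {v_os:+.2f} -> avg(q * D_c) = {chopped_correlation(v_os):+.4f}")
```

With the sign convention of Equation (3.30), the averaged product in this model comes out negative for a positive offset and positive for a negative offset, and close to zero when the offset is zero. Only this polarity is needed to steer the compensation block.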

## 3.4 Proposed Comparator Design

For flash and subranging ADC architectures, comparators are the fundamental elements. Since many comparators are activated at the same time, reducing the power consumption of each comparator is necessary to achieve low-power ADC designs. The traditional comparator architecture, which uses a pre-amplifier to suppress the input offset, is effective but consumes a large amount of static power. Here a latch-type comparator with an offset compensation loop is proposed to improve its input offset.

### 3.4.1 Comparator Architecture

Figure 3.16 shows the proposed comparator architecture in a differential configuration. It consists of a differential input stage, a regenerative latch and an offset calibration charge-pump (OCCP). The input stage and the OCCP perform the deterministic offset estimation. The offset compensation uses the $G_m$-adjustment function in the latch. The equivalent input offset $V_{OS}$ is represented as

$$
V_{OS} = V_{OS,L} - V_b \tag{3.34}
$$

where $V_b$ is the compensation voltage generated by the OCCP. Two non-overlapping clock signals define the comparison phase ($\phi_1 = 1$) and the compensation phase ($\phi_2 = 1$). In the input stage, there are four switches to connect and isolate



Figure 3.16: Proposed offset-calibrated comparator architecture.

the input signal for the comparison and calibration phases, respectively. At the comparison phase, the input and reference signals are connected to the latch's input nodes. After the activation signal $\phi_c$ goes high, the latch determines whether the input signal is larger or smaller than the reference signal, and the output signal $D_c$ becomes '1' or '0' accordingly. At the compensation phase, only the reference signals are connected to the latch's input nodes. The latch's output then shows whether the equivalent input offset is positive or negative.

$$
D_c = \begin{cases} 1 & \text{if } V_{OS} > 0, \\ 0 & \text{if } V_{OS} < 0. \end{cases} \tag{3.35}
$$

The proposed latch comparator is shown in Figure 3.17. There are three source-coupled


pairs connected in parallel to a regenerative latch. The latch's outputs are processed by two cascaded inverters to produce digital outputs. These three parallel-connected pairs can be lumped into a trans-conductance amplifier. The first pair, consisting of transistors M1-M4, is for the positive input and reference signals. The second pair, consisting of transistors M5-M8, is for the negative input and reference signals. The third pair, consisting of transistors M9-M12, is for the adjustment and common-mode signals. If all elements are assumed to be ideal and matched, the current difference $\Delta I$ is defined as

$$
\begin{aligned}
\Delta I &= I_1 - I_2 \\
&= g_{m1}(V_{ap} - V_{rp}) - g_{m5}(V_{an} - V_{rn}) - g_{m9}(V_b - V_{cm}) \\
&= g_m \cdot \left( (V_{ap} - V_{an}) - (V_{rp} - V_{rn}) - (V_b - V_{cm}) \right)
\end{aligned} \tag{3.36}
$$

where $g_{m1}$, $g_{m5}$ and $g_{m9}$ are the trans-conductances of transistors M1, M5 and M9, respectively, and $g_m$ is the equivalent trans-conductance when $g_{m1}$, $g_{m5}$ and $g_{m9}$ are all assumed to be equal to $g_m$. Transistors M13-M16 form two inverters cross-coupled as a regenerative latch. The conducting currents of the regenerative latch are controlled by the lumped trans-conductance amplifier, which is activated by $\phi_c$. For a comparator, all mismatches between transistors can be lumped into the input offset.
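Equation (3.36) shows that the adjustment pair simply shifts the decision threshold by $V_b - V_{cm}$. A minimal sketch of this relation follows; the voltages and the trans-conductance value are hypothetical and only illustrate the sign convention.

```python
def delta_i(v_ap, v_an, v_rp, v_rn, v_b, v_cm, gm):
    """Equation (3.36) with matched pairs: current difference I1 - I2."""
    return gm * ((v_ap - v_an) - (v_rp - v_rn) - (v_b - v_cm))

# Hypothetical example values (assumptions, not from the thesis)
GM = 1e-3                 # equivalent trans-conductance, 1 mS
V_RP, V_RN = 0.54, 0.46   # differential reference of 80 mV
V_CM = 0.50               # common-mode voltage
V_B = 0.52                # OCCP output: shifts the threshold by +20 mV

# The comparator trips where delta_i changes sign: input = reference + (V_b - V_cm)
at_threshold = delta_i(0.55, 0.45, V_RP, V_RN, V_B, V_CM, GM)   # 100 mV differential input
above = delta_i(0.56, 0.44, V_RP, V_RN, V_B, V_CM, GM)          # 120 mV differential input
print(f"delta I at shifted threshold: {at_threshold:.2e} A")
print(f"delta I 20 mV above it:       {above:.2e} A")
```

In this sketch the effective threshold is the 80 mV reference plus the 20 mV contributed by $V_b - V_{cm}$, which is how the OCCP output can cancel a lumped input offset.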

For the proposed latch-type comparator, three issues are considered: input offset, kickback noise and input common-mode variation. In this thesis, the input offset is compensated by the proposed offset compensation loop. The kickback noise and input common-mode variation are reduced by the circuit design. In the following subsections, all of them are discussed qualitatively.

### 3.4.2 Offset Cancellation

In Figure 3.16, to cancel the input offset, a third source-coupled pair (M9-M12) is added to implement the $G_m$-adjustment function. At the compensation phase, the input switches S2 and S4 are closed. The OCCP receives the output signals ($D_c$ and $\overline{D}_c$), which represent the polarity of the input offset. According to $D_c$ and $\overline{D}_c$, the proposed charge-pump circuit charges or discharges the capacitor $C_b$ to change the voltage $V_b$. The voltage step is determined by the currents ($I_p$ and $I_n$), the capacitance $C_b$ and the switch-on time



($T_b$) of the switches (S5 and S6) in the OCCP. Roughly speaking, the voltage step $\Delta V$ is approximated as

$$
\Delta V \approx \frac{I \cdot T_b}{C_b} \tag{3.37}
$$

where $I_p = I_n = I$. To avoid a large fluctuation range of $V_b$, the voltage step should be small enough, for example 0.1 $V_{LSB}$ or smaller. However, a smaller voltage step also slows the tracking of input offset drift due to temperature variation. A variable voltage step for different offset deviations can be achieved by using a variable charging or discharging time $T_b$, as shown in Figure 3.18. With this mechanism, the voltage step is larger when the offset deviation is larger. For a smaller offset deviation, on the other hand, the voltage step is determined by the noise-induced error or metastability error.
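Equation (3.37) can be turned into a quick sizing check. The sketch below is only illustrative: the charge-pump current, capacitor value, on-time and LSB size are hypothetical numbers, not the values used in the proposed OCCP.

```python
def occp_step(current, t_on, c_b):
    """Equation (3.37): voltage step on C_b per calibration cycle."""
    return current * t_on / c_b

def on_time_for_step(target_step, current, c_b):
    """Switch-on time T_b needed to obtain a target step (Equation (3.37) rearranged)."""
    return target_step * c_b / current

# Hypothetical example values (assumptions, not from the thesis)
I_PUMP = 1e-6     # I_p = I_n = 1 uA
C_B = 1e-12       # 1 pF
T_B = 100e-12     # 100 ps on-time
V_LSB = 1e-3      # 1 mV LSB

step = occp_step(I_PUMP, T_B, C_B)
print(f"step = {step * 1e3:.2f} mV ({step / V_LSB:.2f} LSB)")
print(f"T_b for a 0.05 LSB step: {on_time_for_step(0.05 * V_LSB, I_PUMP, C_B) * 1e12:.0f} ps")
```

The step scales linearly with $T_b$, so lengthening or shortening the on-time is a direct way to obtain the variable step described above.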

The simulation result for a latch comparator with OCCP is shown in Figure 3.19. The initial condition of the offset-adjusting voltage $V_b$ is 0.5 V. Here a large deviation of the output capacitances is assumed, with $\Delta C_O / C_O$ of 33 percent. At the beginning, the voltage step on $V_b$ is large due to the larger input offset. As $V_b$ approaches the target value, the voltage



Figure 3.18: Variable adjustment voltage step to suppress the fluctuation range.



Figure 3.19: Simulation result for a latch comparator with OCCP.

step becomes smaller. The resultant input offset voltage is about 47 mV. After $V_b$ has settled, the peak-to-peak fluctuation range is only 2 mV.

### 3.4.3 Kickback Noise

For a comparator, the kickback noise is caused by large voltage variations within a short time interval on its internal nodes, which couple back to the comparator's input. The coupling path is the parasitic capacitance of the transistors. If the input nodes of the comparator have a high impedance, the node voltages are disturbed by the kickback noise, which degrades the accuracy of the reference voltages.

Several kickback noise reduction techniques have been proposed, as shown in Figure 3.20. The most common technique is to add a pre-amplifier before the latch, as mentioned in Section 3.2. However, this introduces static power consumption, contrary to the low-power design target. A neutralization technique [59], shown in Figure 3.20 (a), was proposed for a latch comparator, but it achieved only moderate improvement. This technique uses the equivalent capacitance $C_N$ to provide a negative path that cancels the coupling path through the gate-drain parasitic capacitance $C_{GD}$. However, for nanometer CMOS transistors, the mismatch between $C_{GD}$ and $C_N$ may cause a large deviation of the capacitance, especially for the much smaller transistors used in the regenerative latch to achieve low power consumption at high-speed operation.

In [60], two techniques were proposed to reduce the kickback noise. Both techniques use isolation switches at the input to eliminate the coupling paths between the input and regeneration nodes. Figure 3.20 (b) shows the isolation-1 configuration, which is applied to a class-AB comparator. The transistors M1 and M2 are used for the neutralization technique. The transistors M3-M6 provide the isolation and maintain the circuit operation. Figure 3.20 (c) shows the isolation-2 configuration, which can be applied to any latch comparator. Similar to isolation-1, it uses the transistors M1-M8 to provide the isolation and eliminate the memory effect due to the isolation switches. However, the isolation-1 configuration is not suitable for a latch-only comparator. The isolation-2 configuration uses more transistors to achieve the isolation, which slows down the operation. Moreover, combining it with the offset compensation loop complicates the clock timing requirements.









Figure 3.20: Kickback noise reduction techniques. (a) Neutralization. (b) Isolation-1. (c) Isolation-2.

Here a simple configuration is proposed to reduce the kickback noise. Figure 3.21 shows the proposed kickback noise reduction technique. A bias-controlled nMOS transistor is connected in series with the clock-controlled nMOS transistor. This configuration suppresses fast voltage variations at the drain nodes of the input transistors, which is equivalent to providing a low-pass filter for the kickback noise. Figure 3.22 shows the kickback noise on the reference voltage for different $V_{bx}$ conditions with a 1 k$\Omega$ resistor connection. Without the bias-controlled nMOS transistor, the kickback noise is about 4.5 mV. With the bias-controlled nMOS transistor and $V_{bx}$ of 0.5 V, the kickback noise is greatly reduced to 1.7 mV. Another advantage is that the bias-controlled transistor can suppress the input dynamic offset voltage caused by input common-mode voltage variation. The only drawback of this configuration is a smaller tail current, which causes a longer comparison time. However, this is not a concern for the ADC operation speed in this thesis.
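The improvement quoted above can be expressed as a ratio and in decibels. The short calculation below uses only the two kickback values stated in the text (4.5 mV and 1.7 mV); the helper name is hypothetical.

```python
import math

def reduction_db(before, after):
    """Amplitude reduction expressed in dB."""
    return 20.0 * math.log10(before / after)

KICKBACK_WITHOUT = 4.5e-3   # without the bias-controlled nMOS transistor
KICKBACK_WITH = 1.7e-3      # with the bias-controlled nMOS transistor, V_bx = 0.5 V

ratio = KICKBACK_WITHOUT / KICKBACK_WITH
print(f"kickback reduced by a factor of {ratio:.2f} ({reduction_db(KICKBACK_WITHOUT, KICKBACK_WITH):.1f} dB)")
```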

## 3.5 Summary



Offset feedback compensation techniques provide another way to compensate the input offset by using a feedback control loop. With these techniques, the pre-amplifiers can be removed, which saves a large amount of power. The feedback compensation technique consists of offset estimation and offset compensation. Analog estimation uses shorted inputs and an extra comparison cycle to sense the polarity of the input offset, which is represented by the digital output. In contrast, digital estimation applies a statistical method to estimate the input offset by using a random input chopper. The compensation can be applied by adjusting the difference of the capacitances on the output nodes, or the difference of the



Figure 3.21: (a) Traditional design and (b) proposed design for kickback noise.



Figure 3.22: Kickback noise for different $V_{bx}$ conditions with 1 k$\Omega$ resistor connection.


currents on the output nodes, or the body-bias voltage of the differential pair.

Without adding extra capacitive loading on the output nodes, the proposed offset compensation uses an extra differential pair to adjust the conducting currents. Input shorting switches provide a simple way to estimate the offset polarity. To reduce the kickback noise, a series-connected nMOS transistor is added, which acts as a low-pass filter for the kickback noise and suppresses its amplitude. The dynamic offset due to input common-mode variation is also reduced. With the proposed latch-type comparator and its offset compensation loop, low power consumption can be achieved. The proposed comparators are suitable for flash ADCs that must meet high-speed and low-power specifications. For two-step or subranging ADCs, the sub-ADCs can also use the proposed comparators to achieve the required performance with low power dissipation.

Besides input offset and kickback noise, the comparison time and input-referred noise are also important when comparators are used in ADC architectures. A simplified comparator model is presented in Appendix B. Appendix B.1 gives an analytical description of the comparison time of a regenerative latch. Understanding the comparison speed of the proposed comparator is important for determining its timing constraints in the ADC architectures. Appendix B.2 describes the noise analysis using stochastic differential equations [61]. This is important for high-resolution comparator-based ADCs with a smaller input dynamic range. From the input-referred noise of the comparator, the comparator resolution can be roughly estimated.



# Chapter 4

# Nonlinearity Calibration Techniques

## 4.1 Introduction

For the amplifier-based ADC architectures, such as the pipelined ADC and the two-step ADC, the residue amplifier plays an important role in maintaining the overall ADC linearity. In general, there are three ways to implement the residue amplifier: (a) opamp with feedback resistor, (b) opamp with feedback capacitor and (c) open-loop amplifier. Their simplified single-ended versions are shown in Figure 4.1. For configuration (a) or (b), the gain of the residue amplifier is determined by the matching of $R_1$ and $R_2$ or by the matching of $C_1$ and $C_2$ if the opamp is assumed to have infinite gain. For configuration (c), the gain is determined by the matching of the transconductances of transistors M1 and M2. With careful circuit design and layout techniques, the opamp can have enough gain and the matching can be maintained with a certain accuracy. However, with scaled CMOS technologies, the transistors have lower intrinsic gain, which makes the opamp requirements harder to meet. The opamp circuits become complicated in order to cope with low supply voltage and low intrinsic gain. Extra power is then spent on circuit stability, as in a multi-stage opamp design.

Some analog circuit techniques [62, 63] were proposed to treat these imperfections. The capacitor error-averaging technique can relax the matching requirement between the sampling capacitors, but the more complicated circuitry also slows down the conversion speed. To address these issues, considerable effort has been devoted to the



Figure 4.1: Residue amplifier by using (a) opamp with feedback resistor, (b) opamp with feedback capacitor and (c) open-loop amplifier.

low-power data converters that use calibration techniques. The analog calibration techniques require separate calibration DACs and high-precision analog components to compensate for the gain error of the residue amplifier [64]. They need high-performance analog circuits to correct the circuit imperfections, which still consumes a large amount of power. To reduce the power dissipation effectively by taking advantage of CMOS device scaling, the digital calibration techniques offer better power efficiency.

Unlike analog calibration techniques, digital calibration techniques do not attempt to fix the circuit imperfections but correct the resulting errors in the digital domain. Digital calibration can be further classified into two categories: (1) foreground calibration and (2) background calibration. The foreground calibration techniques operate during system power-up or standby. These techniques cannot track variations due to temperature changes, supply voltage drift and device aging, since they are not active during system operation. To mitigate this, it is possible to switch periodically between normal operation and calibration. However, this still interrupts the normal operation of the ADC. In contrast, background calibration techniques are always active and do not interrupt the normal operation. The variations can be tracked by the continuous calibration so that correct output data is generated. The power consumption of the calibration scheme must also be considered. In general, the extra power consumption should be small enough that the overall ADC power consumption is still reduced.

In this chapter, Section 4.2 introduces calibration techniques and their characteristics. These calibration techniques can relax the requirements on the analog circuits. Some nonlinear calibration techniques are briefly discussed in Section 4.3. These nonlinear calibration schemes can tolerate the circuit imperfections of scaled CMOS technologies. Section 4.4 describes the proposed nonlinear background calibration scheme and its detailed analysis. Section 4.5 summarizes the calibration techniques.

# 4.2 Calibration Techniques

Scaled CMOS transistors degrade the analog circuits used in the residue amplifier. For amplifier-based ADC architectures, the non-idealities of the residue amplifier result in poor ADC performance. In recent years, calibration techniques have been applied to



Figure 4.2: The categories of calibration techniques.

improving the ADCs. The calibration techniques need to observe these non-idealities and compensate for them to obtain the corrected output data. If the errors are hard to correct, a calibration technique may not have enough capability to achieve good results. Therefore, the capability of a calibration technique is also an important index of its strength. In general, most calibration techniques can be characterized by three aspects: estimation method, compensation method and capability, as shown in Figure 4.2.

## 4.2.1 Estimation

The estimation can be implemented by two methods: foreground estimation and background estimation. The foreground estimation operates at system power-up or in standby [64, 65]. A simple foreground estimation is illustrated in Figure 4.3. In the calibration phase (CAL=1), the input MUX is switched to connect the ADC input to the



Figure 4.3: Foreground estimation for A/D conversion.

calibration input. A high-resolution DAC is applied as the input source $V_c$. The calibration input waveform (e.g. a ramp function) is quantized by the first-stage sub-ADC to yield $D_1$. The residue signal, $V_c - V_{da}$, is then amplified by the residue amplifier to generate $V_2$ for the backend stages (Z-ADC). For a pipelined architecture, the Z-ADC means the remaining stages in the ADC, and an input MUX can be added to every stage to perform the calibration. For a two-step ADC, the Z-ADC means the fine ADC. The Z-ADC then quantizes the signal $V_2$ to yield $D_z$. In the DEC, $D_1$ and $D_z$ are encoded to yield the output code $D_o$. If the calibration input is a ramp function, the collected data is actually the transfer curve represented in digital format. All non-idealities of the residue amplifier, including the offset, gain error and nonlinearities, can be extracted from the collected data. After the extraction, these data can be stored in the RAM to compensate the output code $D_o$ during normal operation.
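
The foreground procedure described above can be illustrated with a minimal Python sketch. It is not the implementation of [64, 65]; the amplifier model `residue_amp`, the 6-bit Z-ADC, the residue range `X_MAX` and the table `lut` are hypothetical choices made only for illustration. A ramp from the calibration DAC is swept through the residue amplifier and the Z-ADC, and the mean ramp value observed for each $D_z$ code is stored as the RAM content.

```python
Z_BITS = 6            # assumed Z-ADC resolution
X_MAX = 1.0 / 16      # assumed residue range at the amplifier input


def residue_amp(x):
    """Hypothetical non-ideal amplifier: offset, gain error and cubic term."""
    return 0.01 + 7.6 * x - 20.0 * x ** 3      # an ideal amplifier would give 8 * x


def z_adc(v):
    """Ideal Z-ADC: quantize v in [-0.5, 0.5) to a code in 0 .. 2**Z_BITS - 1."""
    code = int((v + 0.5) * 2 ** Z_BITS)
    return max(0, min(2 ** Z_BITS - 1, code))


def build_lut(steps=4096):
    """Calibration phase (CAL=1): sweep a ramp and store the mean input per code."""
    total = [0.0] * 2 ** Z_BITS
    count = [0] * 2 ** Z_BITS
    for i in range(steps):
        x = -X_MAX + 2 * X_MAX * i / (steps - 1)   # ramp from the calibration DAC
        code = z_adc(residue_amp(x))
        total[code] += x
        count[code] += 1
    # Codes never reached by the ramp are left as None.
    return [t / c if c else None for t, c in zip(total, count)]


lut = build_lut()
```

In normal operation, a code $D_z$ would then be replaced by `lut[D_z]`, so the offset, gain error and nonlinearity of this assumed amplifier are removed together, to within the resolution of the Z-ADC.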

The foreground estimation can greatly improve the ADC performance if these non-idealities do not change. However, they actually vary with temperature, supply voltage drift and device aging. Foreground estimation cannot observe such time-varying non-idealities during normal ADC operation, so the corrected data is no longer exact once the non-idealities have changed.

In contrast to the foreground estimation, the background estimation can operate without interrupting ADC normal operation. Figure 4.4 shows a correlation-based background



Figure 4.4: Correlation-based background estimation for A/D conversion.

gain calibration to estimate the gain error of the ADC [66]. Correlation-based background estimation is widely applied in A/D converters because it uses simple digital hardware to observe the non-idealities of the residue amplifier. In Figure 4.4, the main blocks are the ADC under estimation, a 1-bit DAC, a pseudo-random number generator (PRNG), a digital multiplier with adjustable gain $G$ and a digital accumulator (ACC). The gains of the ADC and the 1-bit DAC are modeled as $G_a$ and $G_q$ respectively. The random sequence $q$ generated by the PRNG is binary and white. It has zero mean and is uncorrelated with the input $V_i$. The random signal $q$ is added to the input of the ADC with a gain of $G_q$. The ADC's output code is multiplied by $G$, and $q$ is then subtracted from the product to give the corrected output $D_o$.

$$
D_o = G_a G V_i + q \times (G_q G_a G - 1) \tag{4.1}
$$

The digital accumulator ACC generates the compensated gain $G$ by the following equation,

$$
G[k+1] = G[k] - q \times D_o \tag{4.2}
$$

$$
= G[k] + 1 - G_q G_a G[k] - q \times (G_a G \cdot V_i) \tag{4.3}
$$

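In Equation (4.3), which uses $q^2 = 1$ for a binary sequence $q = \pm 1$, the term containing the input signal $V_i$ is separated from the compensation coefficient $G[k]$. Since $q$ has zero mean and is uncorrelated with $V_i$, taking the expectation removes the $q$-related term.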

$$
E[G[k+1]] = E[G[k]] + 1 - G_q G_a \cdot E[G[k]] \tag{4.4}
$$


If the iteration converges, $E[G[k+1]] = E[G[k]]$ as $k \to \infty$, and Equation (4.4) reduces to

$$
E[G[k]] = \frac{1}{G_a G_q} \tag{4.5}
$$

Since the steady-state value of  $G[k]$  is constant,  $G[k] = E[G[k]]$ .

$$
G[k] = \frac{1}{G_a G_q} \tag{4.6}
$$

Substituting Equation (4.6) into Equation (4.1), the output code $D_o$ can be rewritten as

$$
D_o = G_a G V_i = \frac{V_i}{G_q} \tag{4.7}
$$

Since $G_q$ is a known parameter, the actual ADC gain $G_a$ can be estimated. The estimation is done in the background without any interruption to normal ADC operation. However, the correlation-based background estimation still suffers from two issues: one is the extra input range needed to accommodate the injected signal, and the other is the trade-off between convergence accuracy and calibration time.
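
The behavior of Equations (4.1) to (4.6) can be checked with a short simulation. The sketch below is only illustrative: the values of `g_a` and `g_q`, the uniformly distributed input, the choice $q \in \{-1, +1\}$ and in particular the step size `mu` are assumptions that do not appear in the equations above. The small `mu` is introduced so that the accumulator averages out the input-dependent term.

```python
import random


def estimate_gain(g_a=0.95, g_q=0.5, mu=5e-4, n=100_000, seed=1):
    """Correlation-based background gain estimation (illustrative sketch)."""
    rng = random.Random(seed)
    g = 1.0                                   # initial compensation gain G[0]
    for _ in range(n):
        v_i = rng.uniform(-0.5, 0.5)          # assumed input distribution
        q = rng.choice((-1, 1))               # binary, zero-mean PRNG output
        adc_out = g_a * (v_i + g_q * q)       # ADC with the injected 1-bit DAC signal
        d_o = g * adc_out - q                 # corrected output, Equation (4.1)
        g -= mu * q * d_o                     # Equation (4.2), scaled by the assumed mu
    return g


g_final = estimate_gain()
print(g_final, 1.0 / (0.95 * 0.5))            # estimate and the value from Equation (4.6)
```

With these assumed values, the accumulator output settles near $1/(G_a G_q)$, as Equation (4.6) predicts. A smaller `mu` reduces the fluctuation of $G$ but lengthens the convergence, which is the trade-off mentioned above.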

Another example of background estimation uses a high-resolution, low-speed reference ADC to estimate the ADC's non-idealities [67], as shown in Figure 4.5 (a). This is an example of a pipelined ADC architecture with a 'reference-based' estimation. The SHA samples the analog input at the sampling rate $f_s$. The ADC operates at the same rate as the SHA and dumps the outputs of all stages, called the 'raw code'. The reference ADC operates at a much lower sampling rate $f_s/M$ to generate the reference output. Both the raw code and the reference output are passed to a digital post processor to yield the compensated output $D_c^c$. The digital post processor includes a background estimation and a digital correction, shown in Figure 4.5 (b). The corrected output $D_c^c$ is the sum of $D_o$ and $D_e$, where $D_e$ is the output of the error estimation. In the background estimation, $D_o$ is down-sampled by $M$ and the reference output is subtracted from it to generate the error information, $err$. In the digital error estimation (DEE), this error is used to update the DEE parameters so as to minimize the mean-squared error (MSE), $E(err^2)$, which is the average value of $err^2$. The least mean square (LMS) algorithm is applied to perform this error estimation. Finally, the compensated output $D_c^c$ approaches the reference output in the steady state. Compared with the correlation-based estimation, this reference-based estimation does not



Figure 4.5: Background estimation with reference-ADC for A/D conversion (a) block diagram and (b) digital post processor.





Figure 4.6: Analog compensation methods for (a) circuit adjustment and (b) inverse function.

need extra input range. However, the high-resolution, low-speed reference ADC is an overhead that increases the design difficulty. Another issue is the need for a dedicated SHA. In scaled CMOS technologies, the dedicated SHA is a power-hungry circuit. It also limits the input dynamic range, which is a severe issue at low supply voltage.
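
A minimal sketch of such a reference-based LMS loop is given below. It is not the post processor of [67]: the error model (a gain error and an offset in the raw code), the two-parameter form `d_e = a * d_o + b` of the DEE, the ideal reference ADC and the values of `m` and `mu` are assumptions made for illustration. The error is formed here between the corrected output and the reference, which is one arrangement in which $err$ depends on the DEE parameters.

```python
import random


def train_dee(m=16, mu=0.01, n=200_000, seed=2):
    """LMS estimation of DEE parameters using a slow reference ADC (sketch)."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0                      # assumed DEE model: D_e = a * D_o + b
    for k in range(n):
        v = rng.uniform(-1.0, 1.0)       # sampled analog input
        d_o = 0.95 * v + 0.02            # raw output with assumed gain error and offset
        if k % m == 0:                   # the reference ADC runs at f_s / M
            d_ref = v                    # reference ADC assumed to be ideal
            err = (d_o + a * d_o + b) - d_ref
            a -= mu * err * d_o          # LMS updates that reduce E(err^2)
            b -= mu * err
    return a, b


def correct(d_o, a, b):
    """Digital correction: add the estimated error D_e to the raw output."""
    return d_o + (a * d_o + b)


a_hat, b_hat = train_dee()
```

Only every `m`-th sample updates the parameters, but `correct` is applied to every sample, so the main ADC keeps running at the full rate $f_s$.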

## 4.2.2 Compensation

As shown in Figure 4.2, the compensation can be implemented by analog or digital methods. Analog compensation is widely applied in comparator-based ADC architectures, such as flash ADCs [16] or SAR ADCs [57]. Ideally, analog compensation is the best way to generate the amplified residue signal without any sacrifice, such as the SNR loss due to a reduced output range. Figure 4.6 shows two ways to implement the analog compensation for the first stage of a pipelined ADC architecture. The Z-ADC represents all the succeeding stages.

Figure 4.6 (a) shows the compensation using circuit adjustment. Circuit adjustment uses adjustable elements, such as capacitance, resistance, current or transistor size, to compensate for the error terms. For example, in Figure 4.1 (a), the mismatch between $R_1$ and $R_2$ induces a gain error, and the resistance of $R_1$ or $R_2$ can be adjusted to reduce it. The analog inverse function is another way to compensate for the non-idealities of the residue amplifier, as shown in Figure 4.6 (b). If an ideal inverse function $f^{-1}()$ is applied after the residue amplifier, all non-idealities can be removed exactly. However, such an ideal inverse function is impossible to implement with analog circuits.

For nanoscale CMOS technologies, analog compensation is not a good choice. Because the characteristics of nanometer transistors are difficult to control, the extra overhead makes analog compensation less attractive. Digital compensation is commonly applied to A/D conversion together with digital estimation schemes [65, 68, 4]. Unlike analog compensation, it compensates for the non-idealities of the residue amplifier in the digital domain. In scaled CMOS technologies, digital compensation also benefits from the reduced power consumption of digital circuits.

Figure 4.7 shows the digital compensation method for the amplifier-based ADC architectures. The distorted residue signal $V_2$ is quantized by the Z-ADC to yield $D_z$, which includes the digital representation of the non-idealities. $D_z$ is then processed by the digital compensation block to yield the compensated output $D_z^c$. From the first-stage quantization result $D_1$ and the compensated output $D_z^c$ of the succeeding stages, the ADC's output $D_o$ is then generated by the digital error correction.

Digital compensation avoids modifying the analog circuits, which would increase the design difficulty. However, due to the distorted output of the residue amplifier, the SNR loss cannot be avoided. In most pipelined ADC architectures, extra stages and larger sampling capacitance are used to improve the SNR performance.



Considering the amplifier-based ADC architectures, the residue amplifier has three non-idealities: (1) input offset, (2) gain/sub-DAC error and (3) nonlinearity. In general, the input offset is not an issue since it is constant and can be removed in the digital domain. If offset reduction is necessary, the offset compensation techniques described in Chapter 3 can be applied to achieve a lower offset voltage. In some particular ADC architectures [29], the offset voltage of the residue amplifier is suppressed by the techniques proposed there. Therefore, the remaining non-idealities are gain/sub-DAC error and nonlinearity.

The capability of a calibration technique reflects how far it relaxes the dependence on high-precision analog circuits. It is one of the characteristics of a calibration technique shown in Figure 4.2, and it can be classified into two categories: (1) linear calibration and (2) nonlinear calibration.

Most prior designs [65, 68, 69, 4] are linear calibration techniques, which mainly correct the linearity errors caused by the gain/sub-DAC error. These linear calibration techniques require the amplifier itself to maintain a certain linearity. A pipelined ADC architecture with a 1.5-bit/stage configuration serves as an example, shown in Figure 4.8. During $\phi_1 = 1$, the capacitors $C_s$ and $C_f$ sample the output signal $V_j$ of the previous stage. During $\phi_2 = 1$, the MDAC amplifies the residue to generate the j-th stage output signal $V_{j+1}$. In Figure 4.8, the transfer function of the MDAC can be represented as

$$
V_{j+1} = \hat{G}_j \times \left( V_j - \hat{V}_j^{da}(D_j) - V_j^{os} \right) \tag{4.8}
$$

and

$$
\hat{G}_j = \left(\frac{C_s + C_f}{C_f}\right) \cdot \frac{1}{1 + \frac{1}{\beta \cdot A_0}} \tag{4.9}
$$

$$
\beta = \frac{C_f}{C_s + C_f + C_p} \tag{4.10}
$$

$$
V_j^{da}(D_j) = \left(\frac{C_s}{C_s + C_f}\right) V_r \times D_j \tag{4.11}
$$

where $A_0$ is the finite opamp gain, $V_r$ is the reference voltage and $D_j \in \{-1, 0, +1\}$ is the quantization result from the sub-ADC. The input offset $V_j^{os}$ is ignored in the following description. The ideal transfer curve is the black solid line shown in Figure 4.8, which is obtained if the following conditions are satisfied: (1) $C_s = C_f$, (2) $C_p = 0$ and (3) $A_0$ is infinite. If any of them is not satisfied, the gain error changes the transfer curve to the red dashed line in Figure 4.8. The magnitude of the j-th stage's transition height $R_j$ reflects the gain and sub-DAC errors if $R_j \neq V_r$. The stage gain/sub-DAC error causes a linearity problem in the ADC output after the post encoding logic, as mentioned in Section 2.7. If the transition height can be measured and compensated in the post encoding logic, the reduction in the number of available output codes caused by the gain error can be recovered by adding extra pipelined stages.
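
As a numerical illustration of Equations (4.8) to (4.11), the following sketch evaluates the stage gain and the transition height for given component values. The function names and the example values are hypothetical; the transition height is taken here as the output step obtained from Equation (4.8) when $D_j$ changes by one at a fixed $V_j$, with the offset ignored.

```python
def mdac_gain(c_s, c_f, c_p, a0):
    """Closed-loop MDAC gain, Equations (4.9) and (4.10)."""
    beta = c_f / (c_s + c_f + c_p)                           # feedback factor
    return (c_s + c_f) / c_f / (1.0 + 1.0 / (beta * a0))


def sub_dac(c_s, c_f, v_r, d_j):
    """Sub-DAC output for D_j in {-1, 0, +1}, Equation (4.11)."""
    return c_s / (c_s + c_f) * v_r * d_j


def mdac_output(v_j, d_j, c_s, c_f, c_p, a0, v_r=1.0):
    """Stage output, Equation (4.8), with the input offset ignored."""
    return mdac_gain(c_s, c_f, c_p, a0) * (v_j - sub_dac(c_s, c_f, v_r, d_j))


def transition_height(c_s, c_f, c_p, a0, v_r=1.0):
    """Output step when D_j changes from 0 to +1 at a fixed input."""
    return (mdac_output(0.0, 0, c_s, c_f, c_p, a0, v_r)
            - mdac_output(0.0, 1, c_s, c_f, c_p, a0, v_r))


# Ideal case: C_s = C_f, C_p = 0 and infinite opamp gain.
print(mdac_gain(1.0, 1.0, 0.0, float("inf")), transition_height(1.0, 1.0, 0.0, float("inf")))
# Assumed non-ideal case: 1 % capacitor mismatch, parasitic C_p and 60 dB gain.
print(mdac_gain(1.01, 1.0, 0.2, 1000.0), transition_height(1.01, 1.0, 0.2, 1000.0))
```

In the ideal case the gain is 2 and the transition height equals $V_r$; any deviation of the printed transition height from $V_r$ in the second case reflects the combined gain and sub-DAC error.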

One of the linear calibration techniques measures the transition height $R_j$ with the Z-ADC by using the correlation-based estimation [4]. To achieve background calibration without interrupting the normal A/D operation, the switched-capacitor network is modified from the conventional version, shown in Figure 4.9 (a), to a split-capacitor version, shown in Figure 4.9 (b). The capacitor $C_s$ is split into $N$ fragments such that

$$
C_s = C_{s,1} + C_{s,2} + \cdots + C_{s,N} \tag{4.12}
$$



Figure 4.8: Pipelined stage with 1.5-bit configuration and the transfer curve of MDAC.



Figure 4.9: Radix-2 1.5-b switched-capacitor network: (a) conventional, (b) background calibration and (c) q-control transfer curve for N=4.

At every conversion, only one of the $N$ capacitors is connected to $V_r \times q$, and the other capacitors are connected to $V_r \times D_j$. The $q$ signal is a binary-valued digital sequence generated by a pseudo-random number generator. To measure $R_j(+1)$, the value of $q$ alternates between +1 and 0. To measure $R_j(-1)$, the value of $q$ alternates between -1 and 0. The split-capacitor configuration avoids output saturation because it requires only a small extra output range and thus leaves enough headroom. By alternating the connection of $C_{s,i}$ to $V_r \times q$, the $D_j$ weight corresponding to $C_{s,i}$ can be obtained and stored for use in the encoding. After encoding, the gain and sub-DAC errors in the MDAC are corrected by this calibration technique.

Other linear calibration schemes also achieve high-resolution ADC performance. [68] proposed a 'DEC+GEC' background calibration to cancel the sub-DAC noise and correct the inter-stage gain error. To implement GEC, a small pseudo-random voltage is injected into the sub-DAC. The Z-ADC output code $D_z$ contains the amplified injected signal, which also carries the gain error. From the collected $D_z$, the gain error information can be extracted. The GEC technique is a correlation-based calibration scheme that achieves background calibration without interrupting the normal ADC operation.

[69] proposed an LMS adaptive background calibration that uses a slow but accurate ADC to remove the effect of component errors, including capacitor mismatch, finite opamp gain, opamp offset and sampling-switch-induced offset. This approach can be implemented without interrupting the normal ADC operation, but the reference ADC is not easy to realize in scaled CMOS technologies. Using an extra high-accuracy ADC resembles a time-interleaved architecture, in which the gain, offset and timing skew mismatches must be considered.

Although many calibration techniques have been proposed to solve the gain/sub-DAC error problem, scaled CMOS technologies bring a more severe issue: the linearity of the residue amplifier cannot be easily maintained with low power consumption. Digital calibration techniques therefore need more powerful functions to cope with future, further-scaled CMOS technologies. In the following sections, several prior nonlinear calibration techniques are discussed briefly, and then the proposed nonlinear calibration scheme is introduced with detailed analysis.

## 4.3 Prior Nonlinear Calibration Schemes

For the amplifier-based ADC architectures, the first-stage residue amplifier has the highest resolution requirement. The first-stage residue amplifier and its transfer curve are shown in Figure 4.10. If more than two stages are used, all stages after the first can be lumped into a 'Z-ADC'. If the residue amplifier is ideal with a gain of $A_R$, the transfer function can be represented as

$$
V_2 = A_R \cdot (V_1 - V_{da}) \tag{4.13}
$$

where  $V_{da}$  is determined by the output of sub-ADC,  $D_1$ .

With the rapid scaling of CMOS technologies, it is difficult for an opamp design to meet high-speed



Figure 4.10: The non-idealities of the residue amplifier.

and high-resolution requirements at low power consumption. The amplifier-based ADC architectures need to tolerate the non-idealities of the analog circuits. The non-idealities of the residue amplifier change Equation (4.13) to

$$
V_2 = V_{OS,R} + f(V_1 - V_{da}) \tag{4.14}
$$

where $V_{OS,R}$ is the offset of the residue amplifier and $f()$ represents the gain error and nonlinearity of the transfer curve. If the gain error and nonlinearity of the residue amplifier are not compensated, the ADC's output code $D_o$ after digital error encoding will be distorted, as shown in Figure 4.11. Figure 4.11 (a) shows the transfer curve between $V_2$ and $V_1$, which repeats for every $D_1$ code. After encoding, if no calibration is applied, the output code $D_o$ will have large DNL and INL errors, as shown in Figure 4.11 (b). If prior linear calibration techniques are applied, $D_o$ achieves better DNL, but the INL is still not good enough, as shown in Figure 4.11 (c). This degrades the SFDR of the ADC.
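
The limitation of linear calibration can be illustrated numerically. In the sketch below, the cubic amplifier model `f`, the nominal gain `A_R` and the residue range `X_MAX` are assumed values, and a least-squares gain fit stands in for an ideal linear calibration. The quantity compared is the largest input-referred residue error, which is only a proxy for the INL behavior of Figure 4.11, not a full DNL/INL computation.

```python
A_R = 8.0             # assumed nominal residue gain
X_MAX = 1.0 / 16      # assumed residue range at the amplifier input
STEPS = 2001


def f(x):
    """Hypothetical residue amplifier with gain error and cubic compression."""
    return 7.6 * x - 20.0 * x ** 3


def grid():
    """Evenly spaced residue values covering [-X_MAX, X_MAX]."""
    return [-X_MAX + 2 * X_MAX * i / (STEPS - 1) for i in range(STEPS)]


def max_error(gain_estimate):
    """Largest input-referred error when V2 is decoded as V2 / gain_estimate."""
    return max(abs(f(x) / gain_estimate - x) for x in grid())


def best_fit_gain():
    """Least-squares linear gain, standing in for an ideal linear calibration."""
    num = sum(f(x) * x for x in grid())
    den = sum(x * x for x in grid())
    return num / den


err_uncal = max_error(A_R)                 # no calibration
err_linear = max_error(best_fit_gain())    # gain error removed, nonlinearity left
print(err_uncal, err_linear)
```

For this assumed amplifier, removing the gain error reduces the worst-case error considerably, but a residual error caused by the cubic term remains. A nonlinear calibration is needed to remove it.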

Considering the non-idealities of a residue amplifier in ADC architectures, digital calibration techniques can provide powerful functions to improve ADC performance with low power dissipation. Several digital calibration schemes have been proposed to correct both the gain/sub-DAC error and the nonlinearity of a residue amplifier [70, 71, 72, 32]. All of these calibration techniques have been applied to pipelined ADC architectures.

[71] calibrates the gain/sub-DAC error and nonlinearity by using two different clock frequencies to provide time slots for quantizing the calibration input. [72] applies many calibration input levels to measure the characteristics of the residue amplifier's transfer curve, and the error is compensated according to the LMS algorithm. Both [71] and [72] are foreground calibration schemes. To enable background calibration, [71] requires a sample-and-hold circuit with two asynchronous sampling clock signals, and [72] requires an interpolation filter, which causes longer output latency and limits the bandwidth of the ADC input. [70] is a histogram-based background calibration scheme. It requires a busy input to be effective, and additional comparators in the stage under calibration are necessary to generate the calibration information. [32] is a correlation-based background calibration scheme. To calibrate the non-idealities of the residue amplifier, a randomized calibration input is injected into the sub-DAC. The addition of the random input reduces the



Figure 4.11: (a) Nonlinearity of residue amplifier and its induced errors (b) without calibration and (c) with linear calibration.

output dynamic range. In the following sections, these four calibration techniques are briefly discussed.

## 4.3.1 Redundant Residue Calibration Technique

To correct the nonlinearity of the residue amplifier, [70] applied a redundant residue mode to estimate the nonlinearity in the background. Figure 4.12 (a) shows the simplified ADC block diagram. A first stage with a 1.5-bit/stage MDAC configuration is used as an example to describe the calibration technique. The sub-ADC operates with one extra bit of resolution by using redundant comparators. To perform the estimation, a simple digital logic block is inserted to control the sub-DAC according to the sub-ADC output $D_1$ and the control signal MODE. The control signal MODE is generated by the post-processor, which collects the output $D_1$ and the Z-ADC output $D_z$ to do the estimation and compensation. There are two residues, one for MODE=0 and the other for MODE=1, as shown in Figure 4.12 (b). If the residue is ideal, without any nonlinearity, the distance ($h$) between the two residues is constant and independent of the value of $V_1$. However, if the residue amplifier is nonlinear, the distance $H_1$ at the input $V_1 = V_a$ differs from the distance $H_2$ at the input $V_1 = V_b$. Figure 4.12 (c) shows these two residues with the nonlinearity. The shaded area represents the information lost due to the distortion of the residue amplifier. If $H_1$ and $H_2$ can be obtained, the compensation factor $p_2$ can be applied to correct the Z-ADC output $D_z$ in the post-processor.

The above method applies if the input $V_1$ can be set to the central value $V_a$ and the boundary value $V_b$. However, assigning the input is not possible in the background. To avoid this, a histogram-based estimation technique was proposed in [70]. The distance estimation is based on evaluating cumulative histograms of the corrected Z-ADC output $D_z^c$ in the post-processor. The cumulative histogram count $CH(x)$ is the number of output codes $D_z^c$ that are less than or equal to the code $x$. The control signal MODE is generated from a binary random sequence with equal probability for zero and one. The cumulative histogram counts for the two different residues at the same input $V_1$ are denoted $CH(q)$ and $CH(r)$ for MODE=0 and MODE=1 respectively. One important feature of the redundant residue calibration is that the count for an arbitrary input $V_1$ does not change



Figure 4.12: Redundant residue calibration (a) block diagram, (b) two residue modes and (c) redundant residues with nonlinearity.

due to MODE switching. Without MODE switching, only $CH(q)$ is counted, giving $n$ counts. Ideally, if MODE switches with equal probability for zero and one, the count of $CH(q)$ equals that of $CH(r)$, and both equal $n/2$. Due to the randomness in the modulation, however, particular outcomes vary and most often do not result in a perfect $n/2$ split. Therefore, the cumulative histogram counts of the codes neighboring $r$ are also collected over a large number of samples to suppress the error due to randomness. From the closest match, the distance estimate $H_1$ is obtained. For the example shown in Figure 4.12 (c), the distance $H_1$ is equal to $(r - 1 - q)$. It can be shown that this estimate is an asymptotically unbiased estimate of the true residue distance, i.e., for increasingly large sample sizes, the estimate approaches the true value. The variance of the estimate is approximately inversely proportional to the total number of samples processed by the counter evaluation.

Similar to $H_1$, $H_2$ can also be obtained from the cumulative histogram counts. For the boundary input with MODE switching, the distance $H_2$ of the corrected code $D_z^c$ is estimated by the same statistics-based estimation technique. Figure 4.13 shows the post-processor proposed in [70]. The Z-ADC output $D_z$ is corrected by the following equation,

$$
D_z^c = D_z + e(D_z, p_2) \tag{4.15}
$$

where the error term $e(D_z, p_2)$ represents the amount of distortion and is generated by the look-up table. The distances $H_1$ and $H_2$ are estimated from the corrected data $D_z^c$. If the distance $H_1$ differs from $H_2$, the error caused by the nonlinearity has not yet been compensated with the correct coefficient $p_2$. With the LMS algorithm, the coefficient $p_2$ is adapted until the equality $H_1 = H_2$ is reached asymptotically. To correct the gain error, the coefficient $p_1$ is also estimated by the LMS algorithm to achieve the equality $H_2 = h_{ideal}$ [65]. If both equalities are achieved, the distances $H_1$ and $H_2$ are half a transition height, which is the digital representation of $V_r/2$.

The redundant residue calibration technique makes it possible to use a simple residue amplifier, such as the open-loop amplifier [70], in scaled CMOS technologies. However, two constraints must be considered. One is the assumption of a busy input. The calibration algorithm fails if the input is not sufficiently "busy" around the input voltages at which the distance estimates are taken. If the cumulative histogram of the corrected output $D_z^c$ is "flat" around the estimated input voltages, the



Figure 4.13: Digital post-processor for the redundant residue calibration.

neighborhood codes will extend widely and result in wrong distances. The other is the calibration time, which involves a trade-off between accuracy and tracking time constant in the LMS loops. Bounds on the tolerable variance of the correction parameters necessitate small loop coefficients $\mu_1$ and $\mu_2$, which in turn limit the achievable tracking speed. A minor issue is that the calibration technique needs one extra bit of resolution in the sub-ADC, which doubles the number of comparators used in the sub-ADC. For a pipelined architecture, this is not an issue since the resolution of the sub-ADC is low. For a two-step architecture, however, the extra sub-ADC resolution causes additional power dissipation.

## 4.3.2 Bootstrapped Digital Calibration Technique

The bootstrapped calibration technique inserts a calibration input in specific time slots to measure the necessary data. Since the calibration is not based on statistical computation, its calibration time is much shorter. Figure 4.14 (a) shows how the calibration input is inserted into the ADC [71]. There are two input signals selected by a MUX, which is controlled by the CAL signal. One input signal is the real input, which is processed by a


Figure 4.14: Bootstrapped calibration technique (a) block diagram, (b) gain error correction and (c) nonlinearity correction.

queue operating at the sampling rate $f_s$. The queue consists of one or more SHAs. The other signal is generated by a calibration DAC, which provides the output voltages required for calibration. The real input is selected if CAL=0; the calibration input is selected if CAL=1. After the MUX, the selected signal is passed to the ADC, which operates at the conversion rate $f_c$. For background calibration, $f_c > f_s$ is necessary to provide time slots (CAL=1) for the calibration without interrupting the normal ADC operation. If $f_c = f_s$, the calibration technique degenerates into a foreground calibration.

In [71], the first stage of a pipelined ADC architecture is used to verify the calibration technique. The general first-stage residue amplifier, shown in Figure 2.8, is modeled as

$$
V_2 = (V_1 - V_{da})(G + B \cdot V_2^2) \tag{4.16}
$$

where $V_{da}$ is the output of the sub-DAC, and $G$ and $B$ are two coefficients that describe the transfer function: $G$ represents the constant gain and $B$ the third-order nonlinearity. If $G$ and $B$ can be estimated exactly, the input can be correctly recovered from Equation (4.16).

Figure 4.14 (b) is a plot of the residue characteristic of a 1.5-bit stage. The constant-gain estimate $G$ is found by modifying a method presented in [65]. During the calibration period, the calibration DAC generates the input $x_c = -0.25V_r$, which is equal to the comparator's threshold. The comparator output is forced both high and low in successive calibration cycles, yielding the digital outputs $Z_1$ and $Z_0$ by combining the first-stage sub-ADC output $D_1$ and the Z-ADC output $D_z$. If $Z_1$ is larger than $Z_0$, the constant $G$ is less than one. Otherwise, the constant $G$ is larger than one. Therefore, the gain estimate is updated using

$$
G_e[k+1] = G_e[k] - \mu_g \cdot (Z_1 - Z_0) \tag{4.17}
$$

The same gain estimation is also applied at the other comparator threshold, $0.25V_r$. The two results are averaged to make a first-order correction for inaccuracies caused by common-mode to differential-mode conversion [67].

Figure 4.14 (c) shows the transfer curves of an ideal and a nonlinear residue amplifier. Here the ADC output $D_o$ can be viewed as the encoding of $D_1$ and $D_2$, since the nested calibration process in a pipelined architecture commonly lumps the preceding stages as the Z-ADC with a certain resolution. To avoid saturated output codes, the calibration test inputs are set to $\pm 0.875V_r$ rather than the full-scale values. The resulting output codes are $Z_3$ and $Z_2$, and their difference $Z_3 - Z_2$ is labeled in Figure 4.14 (c). If the residue amplifier (or MDAC) is ideal, the difference between $Z_3$ and $Z_2$ is called $BDIST$. If it is not ideal, the measured difference is compared with $BDIST$ to update the estimate of $B$ as follows

$$
B_e[k+1] = B_e[k] - \mu_b \cdot (Z_3 - Z_2 - BDIST) \tag{4.18}
$$

With the estimates of  $G_e$  and  $B_e$ , the input of this stage can be represented as

$$
V_{1,cal} = \frac{V_2}{G_e + B_e \cdot V_2^2} + V_{da} \tag{4.19}
$$
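The relationship between the residue model of Equation (4.16) and its inverse in Equation (4.19) can be illustrated with a minimal Python sketch. The numerical values of `g`, `b`, `v1` and `v_da` are hypothetical and chosen only for illustration, and the fixed-point solver for the implicit forward model is an assumption of this sketch, not part of [71].

```python
def residue_forward(v1, v_da, g, b, iterations=50):
    """Solve V2 = (V1 - Vda) * (G + B * V2**2) for V2 by fixed-point iteration."""
    v2 = (v1 - v_da) * g  # start from the purely linear solution
    for _ in range(iterations):
        v2 = (v1 - v_da) * (g + b * v2 ** 2)
    return v2


def recover_input(v2, v_da, g_e, b_e):
    """Equation (4.19): V1_cal = V2 / (Ge + Be * V2**2) + Vda."""
    return v2 / (g_e + b_e * v2 ** 2) + v_da


# Hypothetical values, normalized to the reference voltage.
v1, v_da = 0.30, 0.25
g, b = 0.98, -0.1

v2 = residue_forward(v1, v_da, g, b)
v1_exact = recover_input(v2, v_da, g, b)       # estimates equal the true G and B
v1_naive = recover_input(v2, v_da, 1.0, 0.0)   # no calibration at all
print(v1_exact, v1_naive)
```

With exact estimates the input is recovered to within rounding error, whereas the uncalibrated case keeps the gain and nonlinearity error.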

For the digital representation, the ADC output $D_o$ can be expressed as

$$
D_o = \frac{D_z}{G_e + B_e \cdot D_z^2} + D_1 \cdot 2^Z \tag{4.20}
$$

where $Z$ is the resolution of the Z-ADC. A problem with this approach is that errors in the calibration DAC affect the accuracy of $x_c$, which should take the exact values $\pm 0.875V_r$. This limits the accuracy of the $B$ estimate. To solve this problem, the calibration is bootstrapped: the DAC is used to calibrate the ADC, and the ADC is used to calibrate the DAC. Bootstrapping reduces the accuracy requirement on the calibration DAC. A detailed description is given in [71].

This calibration technique has a shorter calibration time than other statistics-based algorithms, but it still has some issues. First, two clock signals ($f_c$ and $f_s$) are used in the calibration. To achieve background calibration, $f_c$ must be larger than $f_s$. This means that a faster ADC is operated at a slower sampling frequency, which increases the ADC design difficulty. Second, there is a convergence issue for the $G_e$ and $B_e$ estimates. The accuracy of the calibration input $x_c$, the $G_e$ estimate and the $B_e$ estimate depend on one another, and this co-dependency may lengthen the total convergence time. Third, the digital implementation of the calibration technique is complex, which adds a power penalty to the overall ADC power consumption.



Figure 4.15: Pipelined ADC with blind LMS calibration technique.

### 4.3.3 Blind LMS Calibration Technique

Figure 4.15 shows a pipelined ADC architecture with 15 1.5-bit stages and a 1-bit sub-ADC that implements a 12-bit 200-MS/s ADC [72]. The ideal stage gain is set to approximately 1.72 to tolerate possible inaccuracies in the pipelined ADC, such as comparator offset, amplifier offset and stage-output saturation. To account for the non-idealities of the residue amplifier in every stage, the first and second stages are modeled with gain error and nonlinearity; the other 13 stages are modeled with gain error only. To compensate for the stages' non-idealities, the output codes $D_1 \cdots D_{16}$ must be recovered with the correct inverse function of each stage. For the first and second stages, the inverse functions are labeled $f_1^{-1}(x)$ and $f_2^{-1}(x)$ respectively.

$$
f_1^{-1}(D_2^c) = h_{1,1} \cdot D_2^c + h_{1,3} \cdot (D_2^c)^3 \tag{4.21}
$$

$$
f_2^{-1}(D_3^c) = h_{2,1} \cdot D_3^c + h_{2,3} \cdot (D_3^c)^3 \tag{4.22}
$$

where $h_{1,1}$ and $h_{1,3}$ are the first-order and third-order compensation coefficients for the first stage, and $h_{2,1}$ and $h_{2,3}$ are the first-order and third-order compensation coefficients for the second stage. From the third to the fifteenth stage, the inverse functions are labeled $h_3 \cdots h_{15}$ respectively. The corrected output codes for each stage are represented



Figure 4.16: (a) Calibration concept and (b) input-output characteristic of blind LMS calibration technique.

as follows,

$$
\begin{aligned}
D_{15}^c &= D_{15} + h_{15} \cdot D_{16}^c \\
D_{14}^c &= D_{14} + h_{14} \cdot D_{15}^c \\
&\;\;\vdots \\
D_3^c &= D_3 + h_3 \cdot D_4^c \\
D_2^c &= D_2 + h_{2,1} \cdot D_3^c + h_{2,3} \cdot (D_3^c)^3 \\
D_o &= D_1 + h_{1,1} \cdot D_2^c + h_{1,3} \cdot (D_2^c)^3
\end{aligned} \tag{4.23}
$$
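The backward recursion of Equation (4.23) can be sketched in a few lines of Python. The function name, the argument layout and the choice $D_{16}^c = D_{16}$ for the last stage are assumptions made for this illustration; the coefficient values in the usage lines are arbitrary.

```python
def correct_pipeline(codes, h_gain, h1, h2):
    """Backward recursion of Equation (4.23).

    codes  : list [D1, ..., D16] of raw stage codes (index 0 holds stage 1)
    h_gain : dict {3: h3, ..., 15: h15} of linear weights for stages 3..15
    h1, h2 : (first-order, third-order) coefficient pairs for stages 1 and 2
    """
    d = {i + 1: c for i, c in enumerate(codes)}
    corrected = d[16]  # assumption: the last stage code is used as is
    for stage in range(15, 2, -1):  # stages 15 down to 3, gain error only
        corrected = d[stage] + h_gain[stage] * corrected
    # Stage 2 and stage 1 also carry a third-order term.
    corrected = d[2] + h2[0] * corrected + h2[1] * corrected ** 3
    return d[1] + h1[0] * corrected + h1[1] * corrected ** 3


# Arbitrary example: every stage code is 1 and every weight is 0.5.
codes = [1] * 16
h_gain = {stage: 0.5 for stage in range(3, 16)}
d_o = correct_pipeline(codes, h_gain, (0.5, 0.0), (0.5, 0.0))
print(d_o)
```

With these weights the recursion reduces to a geometric series, which makes the result easy to check by hand.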

All compensations are implemented in the digital domain. The compensation coefficients are estimated concurrently by an LMS algorithm, rather than by using an ideal back-end stage as a Z-ADC.

Figure 4.16 (a) illustrates the calibration concept in the foreground. Three inputs, $\Delta V$, $V_{1,j}$ and $V_{1,j} + \Delta V$, are applied to the ADC, where $\Delta V$ denotes a relatively small increment, e.g., 64 LSB. After ADC quantization, the output codes are $D_{o,0}$, $D_{o,j}$ and $D'_{o,j}$ respectively. For an ideal transfer curve between $V_1$ and $D_o$, all three outputs lie on a straight line and $D'_{o,j} - D_{o,j} = D_{o,0}$, suggesting that the difference between $D'_{o,j} - D_{o,j}$ and $D_{o,0}$ can serve as the error to be minimized. If the cost function is defined as $e^2 = [(D'_{o,j} - D_{o,j}) - D_{o,0}]^2$ for different $j$, minimizing it is expected to move $D_{o,0}$ to its ideal value, and hence $D_{o,j}$ and $D'_{o,j}$ (for each $j$) to their ideal values as well, as shown in Figure 4.16 (b). However, three output codes alone are not enough to describe the transfer curve, since it is constructed by 16 pipelined stages. To achieve a small maximum integral nonlinearity, $INL_{max}$, the spacing between the inputs $V_{1,j}$ must be limited. In [72], the spacing is equal to 32 LSB. Therefore, for a 12-bit ADC, 256 input levels, $V_{1,j}$ and $V_{1,j} + \Delta V$ for $j = 1 \cdots 128$, are assigned to carry out the calibration scheme. The mean square error is then expressed as

$$
e_{MSE}^2 = \frac{1}{128} \sum_{j=1}^{128} \left[ \left( D'_{o,j} - D_{o,j} \right) - \left( D_{o,0} - D_{os} \right) \right]^2 \tag{4.24}
$$

where $D_{os}$ represents the offset of $D_o$, which can be easily extracted from the collected output codes.
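As a rough illustration of the cost function in Equation (4.24), the following Python sketch evaluates it for three toy transfer curves. The curves, the function names and the absence of quantization and clipping are simplifying assumptions of this sketch and are not taken from [72].

```python
def calibration_mse(adc, v_levels, delta_v, d_os=0.0):
    """Equation (4.24): one squared-error term per test level V_{1,j}."""
    d_o0 = adc(delta_v)
    errors = [(adc(v + delta_v) - adc(v)) - (d_o0 - d_os) for v in v_levels]
    return sum(e * e for e in errors) / len(errors)


def ideal_adc(v):
    return v  # straight line through the origin, in LSB


def offset_adc(v):
    return v + 5  # straight line with a 5-LSB output offset


def bowed_adc(v):
    return v - 1e-9 * v ** 3  # hypothetical third-order bowing


levels = [32 * j for j in range(1, 129)]  # 128 levels spaced 32 LSB apart
mse_ideal = calibration_mse(ideal_adc, levels, 64)
mse_offset = calibration_mse(offset_adc, levels, 64, d_os=5)
mse_bowed = calibration_mse(bowed_adc, levels, 64)
print(mse_ideal, mse_offset, mse_bowed)
```

The cost is zero for any straight line once the offset $D_{os}$ is removed, and becomes positive as soon as the curve bends.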

This calibration technique basically operates in the foreground, so it cannot track variations of the stage residue amplifiers due to temperature variation, device aging or supply voltage drift. To operate in the background, an input sample is occasionally skipped in order to apply one of the calibration inputs, $\Delta V$, $V_{1,j}$ and $V_{1,j} + \Delta V$ for $j = 1 \cdots 128$. The skipped sample can be reconstructed by a nonlinear interpolation filter if the maximum input frequency is slightly lower than half of the sampling rate $f_s$. The interpolation filter has 122 coefficients and produces an interpolated value with 9-bit accuracy, which degrades the average SNR by 0.1 dB. The nonlinear interpolation filter is expressed as follows:

$$
D_o[0] = \sum_{k=-122}^{-1} D_o[k]C(k) + \sum_{k=1}^{122} D_o[k]C(k) \tag{4.25}
$$

where $C(k)$ is formulated in [73]. However, the interpolation filter carries the penalty of complex digital hardware and longer latency (in this example, a latency of 122 clock cycles). The blind LMS algorithm is another issue, since it estimates all coefficients concurrently. The concurrent estimation may lengthen the calibration time and make convergence conditional, both of which must be carefully verified. For example, if $\Delta V < 32$ LSB, the calibration algorithm fails to converge.



Figure 4.17: Simplified representation of a 14-bit pipelined ADC with HDC applied to the first pipeline stage.

### 4.3.4 Harmonic Distortion Correction Technique

Figure 4.17 shows the simplified 14-bit pipelined ADC with HDC in the first stage. For simplicity, all stages other than the first are viewed as an ideal 12-bit Z-ADC. The residue amplifier is modeled as a memoryless, weakly nonlinear function of the amplifier's input voltage,

$$
f(V_1) = \sum_{i=1}^{N} a_i \cdot V_1^i \tag{4.26}
$$

where $a_1$ is the gain error coefficient and $a_i$, $i > 1$, are the nonlinearity coefficients. The offset of the residue amplifier is neglected here. The 9-level sub-ADC quantizes the input $V_1$ to yield the output code $D_1$. To estimate the non-idealities of the residue amplifier, a calibration sequence $q$ is added to the $D_1$ code in the background to yield the $D_1^q$ code. The $D_1^q$ code then drives the modified sub-DAC, which has 65 levels for calibration purposes [32]. The sub-DAC is modified with the dynamic element matching (DEM) method to cancel the DAC noise [74] and to generate the analog calibration voltage [32]. The inserted calibration signal $q$ perturbs the input estimate $V_{da}$ by a small voltage to avoid saturating the amplifier. The amplified residue $V_2$ is quantized by the Z-ADC to yield the $D_z$ code. After weighting the $D_z$ code, the $r_1$ code is the digital representation of the residue $V_1 - V_{da}$ with distortion. To correct the non-idealities of the residue amplifier, a harmonic distortion correction (HDC) block is added to generate a compensated code, $r_1^c$. Neglecting the quantization error and assuming an ideal Z-ADC, $r_1$ can be expressed as

$$
r_1 = V_{r,1} + a_1 \cdot V_{r,1} + a_3 \cdot V_{r,1}^3 \tag{4.27}
$$

The corrected output $r_1^c$ is produced by an approximate digital inverse function,

$$
r_1^c = b_1 \cdot r_1 + b_3 \cdot r_1^3 \tag{4.28}
$$

$$
\approx V_{r,1} + \text{fifth and higher order terms} \tag{4.29}
$$

where $b_1 = 1/(1 + a_1)$ is the gain compensation coefficient, and $b_3 = -a_3/(1 + a_1)^4$ is the third-order nonlinearity compensation coefficient. If the HDC compensates well, the added calibration sequence can be exactly canceled to yield the digital output $D_o$, which is the digital representation of the input $V_1$.

The digital calibration sequence $q$ is added to the output of the sub-ADC and then translated into analog form by the sub-DAC. The translated signal $q$ causes several extra terms in the digitized residue. Two of the extra terms, which are proportional to $a_1q$ and $a_3q^3$, are used to estimate the $a_1$ and $a_3$ coefficients. To avoid output saturation, the translated analog signal must have a relatively small magnitude. The simplest calibration sequence with these properties is a four-level sequence of the form $q[n] = q_1[n] + q_2[n] + q_3[n]$, where the three $q_i[n]$ sequences are 2-level, independent, zero-mean pseudo-random sequences that take on the values $\pm A$ (in [32], $A = \Delta/16$, where $\Delta$ is the step voltage of the sub-ADC). With this calibration sequence, the $a_3q^3$ term in the digitized residue contains the term $6a_3q_1q_2q_3$. Since $q_1q_2q_3$ is a known, 2-level, zero-mean pseudo-random sequence that takes on the values $\pm A^3$ and is uncorrelated with all the other signal components in the digitized residue, the average of the product of the digitized residue and $q_1q_2q_3$ converges to $6A^6a_3$ regardless of the input signal to the pipelined ADC.
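The two properties of the calibration sequence used above can be checked by exhaustive enumeration. Because $q_i^2 = A^2$, expanding the cube gives $q^3 = 7A^2 q + 6q_1q_2q_3$, so correlating $a_3q^3$ with $q_1q_2q_3$ leaves $6A^6a_3$. The Python sketch below enumerates all eight equally likely sign combinations instead of running a long pseudo-random sequence; the value of `a3` is hypothetical and $A$ is expressed in units of the sub-ADC step.

```python
from itertools import product

A = 1.0 / 16   # amplitude of each 2-level sequence, in units of the sub-ADC step
a3 = -0.05     # hypothetical third-order coefficient

levels = set()
acc = 0.0
for q1, q2, q3 in product((-A, A), repeat=3):
    q = q1 + q2 + q3
    levels.add(q)
    # Correlate the third-order term a3 * q**3 with the known product q1*q2*q3.
    acc += (a3 * q ** 3) * (q1 * q2 * q3)

mean_corr = acc / 8          # average over the 8 equally likely combinations
expected = 6 * A ** 6 * a3   # value stated in the text

print(sorted(levels))        # the four levels -3A, -A, +A, +3A
print(mean_corr, expected)
```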



Figure 4.18: Block diagram of the HDC logic in the first stage.

Figure 4.18 represents the HDC implementation in the first stage to correct the gain error and nonlinearity. In HDC, three parameters are estimated by calculating the following correlations:

$$
h_1 = -\frac{1}{A^2 M} \sum_{n=0}^{M-1} s_1[n]q_1[n] \tag{4.30}
$$

$$
h_3 = -\frac{1}{6A^6M} \sum_{n=0}^{M-1} s_1[n]q_1[n]q_2[n]q_3[n] \tag{4.31}
$$

$$
h_2 = \frac{1}{M} \sum_{n=0}^{M-1} s_1^2[n] \tag{4.32}
$$

where  $s_1[n] = r_1[n] + q[n]$  and *M* is the number of samples averaged (e.g.  $M = 2^{32}$  [32]). It can be verified that, if the residue amplifier is the only significant source of nonlinearity in the system, these correlations converge to

$$
h_1 = a_1 + \left(7A^2 + 3\overline{e_{ADC,1}^2[n]}\right)a_3\tag{4.33}
$$

$$
h_3 = a_3 \tag{4.34}
$$

$$
h_2 \approx \overline{e_{ADC,1}^2[n]} \tag{4.35}
$$

in the limit as $M \to \infty$ regardless of the input to the pipelined ADC, where $\bar{x}$ denotes the infinite-time average of $x$ and $e_{ADC,1}$ is the quantization error of the sub-ADC in the first stage. The HDC algorithm uses these correlation values to calculate the estimated coefficients required by Equation (4.28) as follows:

$$
\frac{1}{1+a_1} = \frac{1}{1+h_1 - (7A^2 + 3h_2) \cdot h_3} \tag{4.36}
$$

$$
\frac{a_3}{1+a_1} = \frac{h_3}{1+h_1 - (7A^2 + 3h_2) \cdot h_3} \tag{4.37}
$$
With the above two coefficient estimates, the digitized residue $r_1^c$ can be well compensated to yield the correct output code $D_o$.
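The step from the correlator outputs to the compensation coefficients can be sketched as follows. The sketch inverts Equations (4.33) and (4.34) and then uses the definitions $b_1 = 1/(1 + a_1)$ and $b_3 = -a_3/(1 + a_1)^4$ given after Equation (4.29). All numerical values and the function name are hypothetical, and the correlator outputs are taken at their ideal converged values rather than estimated from data.

```python
def hdc_coefficients(h1, h2, h3, amp):
    """Convert converged correlator outputs into b1 and b3 of Equation (4.28)."""
    a1_est = h1 - (7 * amp ** 2 + 3 * h2) * h3  # invert Equation (4.33)
    a3_est = h3                                  # Equation (4.34)
    b1 = 1.0 / (1.0 + a1_est)
    b3 = -a3_est / (1.0 + a1_est) ** 4
    return b1, b3


# Hypothetical amplifier and calibration parameters.
a1, a3 = 0.02, -0.03
amp, e2 = 1.0 / 16, 1e-3       # sequence amplitude A, mean-square sub-ADC error

h1 = a1 + (7 * amp ** 2 + 3 * e2) * a3   # converged value from Equation (4.33)
b1, b3 = hdc_coefficients(h1, e2, a3, amp)

v = 0.1                                   # true residue V_{r,1}
r1 = v + a1 * v + a3 * v ** 3             # Equation (4.27)
r1c = b1 * r1 + b3 * r1 ** 3              # Equation (4.28)
print(abs(r1 - v), abs(r1c - v))
```

After compensation only the fifth-order and higher terms of Equation (4.29) remain, so the residual error is several orders of magnitude below the uncorrected one.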

The HDC technique was proposed in [32] to correct the gain error and nonlinearity of the residue amplifier. It operates in the background without interrupting normal ADC operation. To achieve background calibration, a random sequence $q$ of small magnitude is applied to the input estimate $V_{da}$. However, the calibration time is 129 seconds when three stages are calibrated. Even when a foreground calibration has been executed first, the calibration time of the HDC technique is still longer than one second at a 100 MS/s sampling rate. Moreover, to apply the calibration sequence in analog form, the sub-DAC must be modified into a more complicated 65-level DAC. Such a complicated analog circuit is not well suited to scaled CMOS technologies.

### 4.4 Proposed Calibration Technique

Reviewing the above calibration techniques, several concerns must be considered.

- The calibration must be implemented in the background to track the non-idealities of the residue amplifier due to temperature variation, device aging and supply voltage drift.
- The convergence of the calibration technique must be guaranteed to avoid wrong compensation.
- The calibration technique must be independent of the input distribution.
- The hardware implementation should be simple and consume little power.
- The calibration time should be as short as possible to provide better tracking capability.
- The calibration technique should depend as little as possible on the accuracy of the analog circuits.

The proposed calibration technique must meet the above requirements.

To correct these non-idealities, three features of a calibration technique have been mentioned: capability, estimation and compensation. For scaled CMOS technologies, nonlinear calibration is a necessary capability to improve the overall ADC linearity. The estimation method is also an important issue. Foreground estimation is easy to implement but is only suitable for static errors. To track time-varying errors, background estimation is needed to maintain the performance. There are two ways to compensate the non-idealities of the residue amplifier: analog compensation and digital compensation. Although analog compensation can maintain better SNR performance, it is difficult to implement in nanometer CMOS technologies. In contrast, digital compensation is easy to implement and benefits from CMOS scaling.

In this section, a digital background nonlinear calibration technique is proposed. It relaxes the accuracy requirement of the residue amplifier and achieves low power consumption in scaled CMOS technologies.

### 4.4.1 Calibration Mechanism

Figure 4.19 shows a generic N-bit amplifier-based ADC with the proposed calibration processor, which mainly consists of a signal compensator and a coefficient estimator. Here the preceding stages are well calibrated and lumped as a Z-ADC. The held analog input $V_1$ is first quantized by the L-bit sub-ADC to yield $D_1$. The sub-DAC is controlled by $D_1$ and $q$, which is a random sequence in $\{-1, 0, +1\}$. The $q$ sequence has zero mean, and the probability of '$q = 0$' is equal to that of '$q = -1$ and $q = +1$'. The random signal $q$ is used to generate a small perturbation voltage on the estimated input signal $V_{da}$ for background estimation. After the amplification, the amplified residue $V_2$ is stored for the Z-ADC quantization to yield $D_z$. Before being passed to the DEC, the digital code $D_z$ is pre-processed by the proposed calibration processor to get the corrected output $D_z^c$. The codes $D_1$ and $D_z^c$ are then encoded to yield the final output code $D_o$. Because this calibration technique does not modify the input signal path of the ADC, it maintains maximum application flexibility for most amplifier-based ADC architectures.

Figure 4.19: Proposed digital calibration processor.

The proposed calibration concept is shown in Figure 4.20. Assume that the analog input $V_1$ is fixed and that the Z-ADC does not introduce quantization errors. As mentioned above, the random signal $q$ selects one of three perturbed values of $V_{da}$: $V_{da}$, $V_{da} - V_{LSB}$ and $V_{da} + V_{LSB}$. If the transfer function between $D_z^c$ and $V_1 - V_{da}$ is ideal, the differences between adjacent pairs of the three corresponding $D_z^c$ codes are both equal to $A$, the ideal gain of the residue amplifier. If the transfer function is not ideal, these differences carry the information needed to estimate the non-idealities. The two actual differences of the corresponding $D_z^c$ codes are defined as $H_1$ and $H_2$. If the transfer curve is nonlinear, $H_1 \neq H_2$. If the transfer curve has a gain error, $H_1 + H_2 \neq 2A$. Therefore, two error indices are defined as follows:

$$
E_1 = H_1 + H_2 - 2A \tag{4.38}
$$

$$
E_2 = H_1 - H_2 \tag{4.39}
$$
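A small numerical example may help to see how the two error indices separate gain error from nonlinearity. In the Python sketch below, the pairing of $H_1$ and $H_2$ with the upper and lower perturbation, the transfer curves and all numerical values are assumptions made for illustration; the residue is held at a fixed value, as in the explanation above.

```python
def error_indices(transfer, y0, v_lsb, gain_a):
    """Return (E1, E2) of Equations (4.38) and (4.39) for a fixed residue y0."""
    d_minus = transfer(y0 - v_lsb)
    d_zero = transfer(y0)
    d_plus = transfer(y0 + v_lsb)
    h1 = d_plus - d_zero    # assumed pairing: upper difference
    h2 = d_zero - d_minus   # assumed pairing: lower difference
    return h1 + h2 - 2 * gain_a, h1 - h2


A_IDEAL = 8.0   # hypothetical ideal gain, in codes per V_LSB
V_LSB = 1.0


def ideal(y):
    return A_IDEAL * y / V_LSB


def gain_error(y):
    return 0.95 * A_IDEAL * y / V_LSB


def cubic(y):
    return A_IDEAL * (y - 0.01 * y ** 3) / V_LSB


e_ideal = error_indices(ideal, 2.0, V_LSB, A_IDEAL)
e_gain = error_indices(gain_error, 2.0, V_LSB, A_IDEAL)
e_cubic = error_indices(cubic, 2.0, V_LSB, A_IDEAL)
print(e_ideal, e_gain, e_cubic)
```

A pure gain error shows up only in $E_1$, while the cubic term makes $E_2$ nonzero as well.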



Figure 4.20: Gain error and nonlinearity detection.

### 4.4.2 Signal Compensation

The signal compensation is implemented in the digital domain, shown in Figure 4.21. It can be represented as

$$
D_z^c = \sum_{1 \le i \le k} b_i D_z^i \tag{4.40}
$$

where $k$ is the highest compensator order. The above equation serves as a compensation function that corresponds to the transfer function of the residue amplifier. If $k$ is large enough and all coefficients are exact, the compensation function behaves like an ideal inverse function. However, higher-order compensation also implies more complicated digital hardware. In general, the value of $k$ can be determined from simulation results at the design stage. It increases with the resolution of the Z-ADC: if the resolution of the Z-ADC is lower, $k$ is smaller. The coefficients $b_1$, $b_2$, $\cdots$, $b_k$ are estimated by the proposed coefficient estimator, shown in Figure 4.19.
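The polynomial of Equation (4.40) has no constant term, so it can be evaluated with one multiplication and one addition per order using Horner's rule. The Python sketch below is only a software illustration with hypothetical coefficient values; it is not meant to describe the hardware of Figure 4.21.

```python
def compensate(d_z, coeffs):
    """Evaluate D_z^c = sum_{i=1..k} b_i * d_z**i by Horner's rule.

    coeffs = [b1, b2, ..., bk]
    """
    acc = 0.0
    for b in reversed(coeffs):
        acc = (acc + b) * d_z
    return acc


# Hypothetical third-order compensator: b1 = 1.0, b2 = 0.0, b3 = 0.5.
d_zc = compensate(2.0, [1.0, 0.0, 0.5])
print(d_zc)  # 1.0 * 2 + 0.5 * 2**3
```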

### 4.4.3 Coefficient Estimation

For an arbitrary analog input signal $V_1$, a statistical method is applied in the coefficient estimator, which is shown in Figure 4.22. Built-in averaging functions extract information from the $D_z^c$ codes. The random signal $q$ is applied in the background so that normal ADC operation is maintained. It is uncorrelated with the input signal $V_1$. The coefficient estimator first receives the $D_z^c$ codes from the signal compensator and sorts the data according to the associated $q$. The estimator then averages the sorted data and applies subtraction to acquire $H_1$ and $H_2$. A total of $2^{2(N-L+1)}$ samples are accumulated to compute the average. The estimator also acquires the averages of $D_z^c - qA$ and $(D_z^c - qA)^2$, denoted as $M$ and $S$ respectively. All averaging functions and the square function operate at high speed, at the sampling frequency. The acquired data, $H_1$, $H_2$, $M$ and $S$, are updated once every $2^{2(N-L+1)}$ samples. With $H_1$ and $H_2$, the error indices $E_1$ and $E_2$ can be obtained by using simple digital adders, as expressed in Equation (4.38) and Equation (4.39). Since the update rate is much lower than the sampling rate, all logic circuits after the averaging functions can operate at a lower clock frequency to further reduce power dissipation.
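The sort, average and subtract steps described above can be sketched in Python as follows. The function name, the pairing of $H_1$ and $H_2$ with $q = +1$ and $q = -1$, and the toy data are assumptions of this sketch. To keep the example deterministic, every residue value is paired with every value of $q$ instead of drawing $q$ at random.

```python
def estimate_statistics(samples, gain_a):
    """Accumulate H1, H2, M and S from (d_zc, q) pairs with q in {-1, 0, +1}."""
    sums = {-1: 0.0, 0: 0.0, 1: 0.0}
    counts = {-1: 0, 0: 0, 1: 0}
    m_acc = 0.0
    s_acc = 0.0
    n = 0
    for d_zc, q in samples:
        sums[q] += d_zc            # sort the codes by the associated q
        counts[q] += 1
        base = d_zc - q * gain_a   # code with the known perturbation removed
        m_acc += base
        s_acc += base * base
        n += 1
    mean = {q: sums[q] / counts[q] for q in sums}
    h1 = mean[1] - mean[0]         # assumed pairing of H1 with q = +1
    h2 = mean[0] - mean[-1]        # assumed pairing of H2 with q = -1
    return h1, h2, m_acc / n, s_acc / n


GAIN_A = 8
residues = [-3, -1, 0, 2, 4]   # toy compensated residue codes without perturbation
samples = [(y + q * GAIN_A, q) for q in (-1, 0, 1) for y in residues]
h1, h2, m, s = estimate_statistics(samples, GAIN_A)
print(h1, h2, m, s)
```

For an ideal transfer curve both differences equal the ideal gain, so $E_1$ and $E_2$ computed from them are zero.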



Figure 4.22: The digital coefficient estimator.

For the coefficient estimation for  $b_1$ ,  $b_2$ ,  $\dots$ ,  $b_k$ , a Lyapunov-based estimator is proposed. Here a mean square error (MSE) index is defined as

$$
L = \frac{1}{2}E_1^2 + \frac{1}{2}E_2^2\tag{4.41}
$$

Employing the Lyapunov second theorem on stability [75] to ensure that $L$ approaches zero asymptotically, the iterative equations for the coefficients $b_1, b_2, \cdots, b_k$ are

$$
b_1[n+1] = b_1[n] - \mu_1 \times \operatorname{sgn}(Z_1) \tag{4.42}
$$

$$
b_2[n+1] = b_2[n] - \mu_2 \times \operatorname{sgn}(Z_2) \tag{4.43}
$$

$$
\vdots
$$

$$
b_k[n+1] = b_k[n] - \mu_k \times \operatorname{sgn}(Z_k) \tag{4.44}
$$

where $\mu_i$ is the step number for coefficient $b_i$, $\forall i = 1, ..., k$, and $Z_i$ is defined as follows:

$$
Z_i = f_i(b_1[n], b_2[n], \cdots, b_k[n], E_1, E_2, M, S), \forall i = 1, ..., k. \tag{4.45}
$$

The value of $\operatorname{sgn}(x)$ is $+1$ if $x > 0$, $0$ if $x = 0$ and $-1$ if $x < 0$. Employing the sgn function simplifies the digital calibration processor (DCP) hardware and reduces its power consumption. A general analytic derivation is not provided here, since it is neither necessary nor easy. To verify the correctness of the approach, a simplified version with only $b_1$ and $b_3$ is described in Appendix A. If higher-order compensation is necessary, the reader can follow the same procedure.

Here the transfer function between  $D_z^c$  and ' $V_1 - V_{da}$ ' is represented as

$$
y_d = f(y) = a_0 + a_1y + a_2y^2 + a_3y^3 + a_4y^4 + \cdots \tag{4.46}
$$

where $a_i$, $\forall i = 0, \ldots, \infty$, are the coefficients of a generic polynomial expression for the residue amplifier. The symbol $y_d$ represents the amplified residue $V_2$, which is quantized by the Z-ADC. The symbol $y$ represents the residue '$V_1 - V_{da}$'. A general analysis of the above high-order compensation and estimation is complicated and not necessary. Based on the above calibration concept, the following analysis is simplified to consider only lower-order characteristics to verify convergence and stability.

To simplify the analysis, some assumptions are made: (1) even-order coefficients can be ignored since differential amplifier architectures are usually applied, (2) the higher-order (larger than 4th-order) coefficients are ignored, and (3) the offset $a_0$ can be removed by digital post-processing. Therefore, the approximated transfer function can be rewritten as

$$
y_d = f(y) \approx a_1 y + a_3 y^3 \tag{4.47}
$$

Similarly, for the signal compensator, only $b_1$ and $b_3$ are applied to simplify the qualitative analysis.

$$
y_c = b_1 \cdot y_d + b_3 \cdot y_d^3 \tag{4.48}
$$

where the symbol  $y_c$  represents the  $D_z^c$ . According to the description in Appendix A, the iterative equations for  $b_1$  and  $b_3$  are represented as

$$
b_1[k+1] = b_1[k] - \mu_1 \times \operatorname{sgn}(Z_1) \tag{4.49}
$$

$$
b_3[k+1] = b_3[k] - \mu_3 \times \operatorname{sgn}(Z_3) \tag{4.50}
$$

$$
Z_1 = E_1 \left( b_1^3[k] - (3S + 64)b_3[k] \right) - E_2(24Mb_3[k]) \tag{4.51}
$$

$$
Z_1 = E_1 \left( b_1^3[k] - (3S + 64)b_3[k] \right) - E_2(24Mb_3[k]) \tag{4.51}
$$

$$
Z_3 = E_1(3S + 64) + E_2(24M) \tag{4.52}
$$
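One iteration of Equations (4.49) to (4.52) can be written down directly. The Python sketch below copies the constants 64 and 24 from the equations as given; the step numbers and the input values in the usage lines are hypothetical, and the sketch covers a single update only, not the closed calibration loop.

```python
def sgn(x):
    """Return +1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def update_b1_b3(b1, b3, e1, e2, m, s, mu1, mu3):
    """One iteration of Equations (4.49) to (4.52)."""
    z1 = e1 * (b1 ** 3 - (3 * s + 64) * b3) - e2 * (24 * m * b3)
    z3 = e1 * (3 * s + 64) + e2 * (24 * m)
    return b1 - mu1 * sgn(z1), b3 - mu3 * sgn(z3)


MU1, MU3 = 2 ** -10, 2 ** -12   # hypothetical step numbers

# With zero error indices the coefficients stay where they are.
held = update_b1_b3(1.0, 0.0, 0.0, 0.0, 0.0, 1.0, MU1, MU3)

# A positive E1 alone pushes both coefficients down by one step.
stepped = update_b1_b3(1.0, 0.0, 0.5, 0.0, 0.0, 1.0, MU1, MU3)
print(held, stepped)
```

Because only the sign of $Z_i$ is used, each update is a fixed-size addition or subtraction, which is consistent with the hardware simplification noted above.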

Since the DCP includes a compensator and an estimator, it can be viewed as a feedback loop. A feedback control system raises three issues: (1) guaranteed convergence, (2) a global concave solution and (3) a small steady-state error. Convergence is considered first, since it determines whether the proposed calibration technique can reach a steady-state value. In Appendix A, the convergence is guaranteed by the Lyapunov second theorem on stability with two sufficient conditions:

$$
L \ge 0 \quad \text{and} \quad \frac{dL}{dt} < 0 \tag{4.53}
$$

Since the iterative equations of $b_1$ and $b_3$ always satisfy the above two conditions, the error index $L$ approaches zero asymptotically.

Section 4.4.4 addresses another important issue: how many concave solutions exist in the three-dimensional $(L, b_1, b_3)$ space. Finally, the steady-state error is discussed to confirm whether the calibration technique can achieve good performance.

### 4.4.4 Convergence Analysis

Since the convergence of $L$ is guaranteed, an optimum solution of $b_1$ and $b_3$, called $(b_1, b_3)_{opt}$, can eventually be obtained. In practice, several uncertainties affect this result and generate the so-called 'limit cycle' problem found in most nonlinear control systems. When a limit cycle occurs, $L$ only stays close to zero without reaching it, which introduces larger steady-state errors in the corrected code $D_z^c$. Nonlinear feedback systems usually have the limit cycle problem. For a background-calibrated ADC, the quantization error of the Z-ADC and the truncation error in the calibration engine usually introduce a certain amount of nonlinearity into the feedback loop. Figure 4.23 (a) shows the trajectory of $L$ and a possible limit cycle. To avoid this, the truncation error in the calibration engine should be reduced by using wider word lengths for the digital variables and coefficients. In the proposed iterative equations for $b_1$ and $b_3$, the sign function also induces extra fluctuation. Using smaller step numbers in these iterative equations is an easy way to reduce the fluctuation range. These effects combine into a more complicated system, which is difficult to analyze with simple mathematical models. In this thesis, a detailed analysis is not provided beyond the above qualitative description.

Figure 4.23: Convergent behavior for the estimation with (a) limit cycle issue, (b) global concave solution and (c) local concave solutions.

For a nonlinear feedback system, another issue is whether the DCP provides a global or a local concave solution for $b_1$ and $b_3$. Figure 4.23 (b) and (c) show the global and local concave conditions respectively. Although the limit cycle problem means that the values of $b_1$ and $b_3$ are not fixed, they still wander around $(b_1, b_3)_{opt}$. If $(b_1, b_3)_{opt}$ is a concave point of $L$ over the global space, as shown in Figure 4.23 (b), the estimation result is still good. However, as shown in Figure 4.23 (c), there are two possible paths, each converging to a singular point. The estimation may then have more than one solution that makes $L$ approach zero, so the final solution can be $(b_1, b_3)_A$ or $(b_1, b_3)_B$. Only one of them is the desired result; the other is just a local concave solution. If $L$ converges to a local concave solution, the linearity performance becomes worse.

To verify this, a behavioral simulation is performed before the circuit implementation stage. In this simulation, $b_1$ and $b_3$ are swept to check how many concave solutions exist. The following transfer function is used to represent the residue amplifier,

$$
y_d = a_1 \cdot y + a_3 \cdot y^3 + a_5 \cdot y^5 + a_7 \cdot y^7 \tag{4.54}
$$

where $a_1 = 1.004$, $a_3 = -0.04$, $a_5 = -0.1$ and $a_7 = 0.03$. The symbols $y_d$ and $y$ are defined in Section 4.4.3. Figure 4.24 shows whether the solution for $b_1$ and $b_3$ is a global or a local concave one. In this simulation, all possible values of $b_1$ and $b_3$ are applied to check the value of $L$. The left plot is the large-scale sweep, used to check the number of concave solutions. The right plot shows the result around the concave solution; only one exact solution $(b_1, b_3)_{opt}$ exists, around (1.0, 0.1). The simulation results show that, for this nonlinear feedback system, there exists only one concave solution that makes $L$ approach zero.

Figure 4.25: Linearity before and after calibration.

After the global concave solution is confirmed, the steady-state correction error is the next concern. Since the calibration technique is proposed to suppress the effect of the residue amplifier's non-idealities on the output code $D_z^c$, the steady-state error should be less than one $V_{LSB}$ so that there are no missing codes. Here the ADC linearity performance is used to check whether the correction is good. Figure 4.25 shows the DNL and INL simulation results. Before calibration, the DNL is between -1 LSB and 0.5 LSB, and there are many missing codes. The INL is between -4 LSB and 4 LSB, caused by the gain error and nonlinearity of the residue amplifier. After calibration, the DNL is improved to between -0.7 LSB and 0.4 LSB, and there is no missing code. The INL is improved to between -0.5 LSB and 0.5 LSB. This result shows that the calibration processor can effectively correct the gain error and nonlinearity of the residue amplifier.
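For reference, the DNL and INL figures quoted above can be computed from the code transition levels of a transfer curve. The Python sketch below uses the common endpoint-fit definition, in which the LSB is the average step between the first and last transition; whether the simulation behind Figure 4.25 uses exactly this definition is an assumption. The transition levels in the usage lines are invented for illustration.

```python
def dnl_inl(transitions):
    """Return (dnl, inl) in LSB from sorted code-transition levels.

    The LSB is the average step between the first and last transition
    (endpoint fit). INL is the running sum of DNL.
    """
    n_steps = len(transitions) - 1
    lsb = (transitions[-1] - transitions[0]) / n_steps
    dnl = [(transitions[k + 1] - transitions[k]) / lsb - 1.0
           for k in range(n_steps)]
    inl = []
    acc = 0.0
    for d in dnl:
        acc += d
        inl.append(acc)
    return dnl, inl


def has_missing_code(dnl):
    """A code of zero width (DNL = -1 LSB) never appears at the output."""
    return any(d <= -1.0 for d in dnl)


dnl_ok, inl_ok = dnl_inl([0.0, 1.0, 2.5, 3.0, 4.0])    # one wide, one narrow code
dnl_bad, inl_bad = dnl_inl([0.0, 1.0, 1.0, 3.0, 4.0])  # one code has zero width
print(dnl_ok, inl_ok, has_missing_code(dnl_bad))
```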

With the convergence analysis, including concave solution and steady-state error check, the proposed calibration technique can effectively correct the output code  $D_z^c$  to improve the overall ADC performance.

### 4.5 Summary

In this chapter, three characteristics of calibration techniques are considered: estimation, compensation and capability. How to estimate the coefficients needed to compensate the Z-ADC output code is the first step in analyzing a calibration technique. Foreground estimation is simple, but only useful for non-idealities that do not change. Background estimation provides better error-tracking capability to follow changes due to temperature variation, supply voltage drift and device aging. The compensation method is also important. Analog compensation can directly repair the non-idealities, but it is not easy to implement, especially in scaled CMOS technologies. Digital compensation is easier to implement and exploits the benefits of scaled CMOS technologies.

In recent years, more and more nonlinear calibration techniques have been proposed to correct the non-idealities of the residue amplifier in nanometer CMOS technologies. Nonlinear calibration relaxes the requirements on complicated analog circuits, which need more design effort. Several digital calibration techniques have been proposed to correct both the gain error and the nonlinearity of a residue amplifier [70, 71, 72, 32]. These techniques are all applied to pipelined ADC architectures. Both [71] and [72] are foreground calibration schemes. To enable background calibration, [71] requires a sample-and-hold with two different sampling rates, and [72] requires an interpolation filter, which limits the bandwidth of the ADC input and has a longer output latency. [70] is a histogram-based background calibration scheme that requires a busy input to be effective. [32] is a correlation-based background calibration scheme with a longer calibration time and a complicated sub-DAC. When applied to a two-step ADC, all the above schemes require substantial modification of the analog signal path.

The proposed calibration technique operates in the background to track non-idealities due to temperature variation, device aging and supply voltage drift. Its convergence is guaranteed by the Lyapunov second theorem on stability, as described in Appendix A. It is independent of the input distribution and can be applied to most amplifier-based ADC architectures. The necessary coefficients are estimated by iterative equations with a simple sign function, which greatly reduces the hardware requirement. Its calibration time is short, since only $2^{2Z+2}$ samples need to be collected. To shorten the convergence time, a switching step number algorithm is proposed in Section 5.3.5. The original analog signal paths are not modified for the calibration; only the switch matrix is modified.

# Chapter 5

## A 10-bit 100-MS/s Two-Step ADC

### 5.1 Introduction

For wireless communication, a 10-bit 100-MSPS ADC is a good design example. Pipelined ADC architectures, which are often considered for this specification, are generally implemented with operational amplifiers. In nanometer CMOS, however, amplifier design becomes more difficult as the channel length shrinks and the supply voltage drops. The operational amplifier must meet demanding gain and bandwidth requirements: for example, a 10-bit ADC needs an operational amplifier with an open-loop gain of over 60 dB, but the intrinsic gain of nanometer CMOS transistors is generally lower than 20 dB. Maintaining enough output dynamic range complicates the amplifier design further. Moreover, the amplifier design is highly sensitive to the CMOS technology, so analog designers may have to devise a new amplifier architecture when moving to the next process node. To solve this issue, the analog circuits must be simplified so that they adapt to scaled CMOS technologies.

A flash ADC consists only of comparators, which are easily implemented with nanometer CMOS devices. Unfortunately, a flash ADC is not suitable for 10-bit resolution because it requires 1024 comparators. The subranging ADC architecture is similar to the flash ADC but uses fewer comparators. For medium-speed operation (between 20 MSPS and 200 MSPS), subranging ADCs provide another choice with lower power consumption. A subranging ADC does not require a high-linearity amplifier; in general, it needs only comparators, a MUX, and a resistor string. The resolution of the comparator is one of the key issues. Another issue is the complex MUX, which is always the bottleneck of the ADC operating speed, and the switches perform worse in nanometer CMOS. The two-step ADC architecture was proposed to address these issues of the subranging ADC. It contains a coarse ADC, a fine ADC, a resistor-string DAC, and a residue amplifier. Combining the considerations of the flash and subranging architectures, we examine the two-step ADC architecture and demonstrate its performance in nanoscale CMOS technology. Our design concept is to simplify the necessary analog circuits and to enhance the analog circuitry digitally with the proposed background calibration technique.

In this chapter, Section 5.2 presents the proposed ADC architecture. Section 5.3 describes the building blocks and their circuit designs in detail. Section 5.4 shows the measurement results. Section 5.5 gives a brief summary and compares the results with other 10-bit ADCs.

### 5.2 Architecture

The proposed 10-bit two-step ADC architecture is shown in Figure 5.1. The ADC operates with two non-overlapping clocks, $\phi_1$ and $\phi_2$, whose duty ratios are 25% and 75% respectively. The clock $\phi_{1a}$ is an advanced version of $\phi_1$ and is used for bottom-plate sampling. The major clock timing is shown in Figure 5.2. $\phi_1$, $\phi_{1a}$, and $\phi_2$ are global clocks produced by the clock generator. The clocks $\phi_c$ and $\phi_p$ are local clocks generated in the coarse ADC: $\phi_c$ drives the latch circuits and $\phi_p$ controls their offset compensation. The clocks $\phi_f$ and $\phi_x$ are local clocks generated in the fine ADC: $\phi_f$ drives the latch circuits and $\phi_x$ controls their offset compensation. The clock $\phi_d$ is used by the digital circuits, including the calibration processor and the DEC.

At the beginning of $\phi_2 = 1$, the coarse ADC (CADC) compares the analog input $V_1$ with 33 coarse references $V_{RC}$ to estimate the magnitude of $V_1$, yielding the 5-bit digital output $D_1$. The $V_{RC}$ references are generated from a resistor string. The $D_1$ code and *q* drive the resistor-DAC (RDAC) to select one voltage from 96 possible voltages,



Figure 5.2: Major clock signals used in the two-step ADC.

generated by the resistor string. Its output,  $V_{da}$ , is an estimation of the input  $V_1$ .

During $\phi_1 = 1$, the analog input $V_1$ is also sampled onto the sampling capacitor $C_s$. During $\phi_2 = 1$, the residue amplifier (RAMP) amplifies the difference between $V_1$ and $V_{da}$, yielding the amplified residue signal $V_2$. The RAMP is an open-loop amplifier with a nominal voltage gain of 8. The fine ADC (FADC) then compares the residue $V_2$ with 65 fine references $V_{RF}$ to estimate the magnitude of $V_2$, yielding the 6-bit digital output $D_2$. The FADC has an input range of 64 steps. In an ideal two-step ADC, the FADC needs only an input range of 32 steps. The 1-bit redundancy is added to tolerate the gain error and offset of the RAMP, and comparator offset in the CADC. It is also used to accommodate the extra signal range required by the RAMP digital calibration. The RAMP voltage gain mitigates the FADC resolution requirement. To reduce power consumption, the RAMP uses an open-loop single-stage amplifier. Its gain error and nonlinearity are corrected by the digital calibration processor (DCP) shown in Figure 5.1. The DCP receives the $D_2$ code from the FADC and generates a corrected $D_2^c$ code. The digital error correction (DEC) then combines $D_1$ and $D_2^c$ to produce the final ADC digital output $D_0$. The DCP also generates a digital random sequence $q \in \{-1, 0, +1\}$. The *q* sequence also drives the RDAC so that a random signal is injected into the RAMP. The DCP uses this random signal to calibrate the RAMP in the background.

The analog signal path of the ADC is fully differential. The top and bottom reference voltages are set to VDD and VSS respectively. Using the supply and ground voltages as the references saves the power consumption of reference buffers. The ADC differential input range is $2 \times$ VDD (2 V in this design). One LSB is 1.95 mV at 10-bit resolution. To suit the output range and gain of the residue amplifier, the FADC has a differential input range of 1 V and a step size of 8 LSB. The gain of the residue amplifier thus relaxes the resolution requirement of the FADC.
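The ranges quoted above follow from simple arithmetic. The short Python sketch below reproduces them from the values given in the text; the variable names are ours and are used only for illustration.

```python
# Arithmetic check of the signal ranges quoted in Section 5.2.
VDD = 1.0                 # supply voltage in volts (from the text)
N_BITS = 10               # overall ADC resolution
RAMP_GAIN = 8             # nominal residue-amplifier gain
FADC_STEPS = 64           # FADC input range in steps (includes 1-bit redundancy)

full_scale = 2 * VDD                       # differential input range, 2 V
lsb = full_scale / 2**N_BITS               # one 10-bit LSB at the ADC input
fadc_step = RAMP_GAIN * lsb                # FADC step size = 8 LSB
fadc_range = FADC_STEPS * fadc_step        # FADC differential input range

print(f"LSB        = {lsb * 1e3:.3f} mV")
print(f"FADC step  = {fadc_step * 1e3:.3f} mV")
print(f"FADC range = {fadc_range:.3f} V")
```

The FADC step of 8 LSB at the amplifier output corresponds to 1 LSB at the ADC input, which is why a 6-bit FADC is sufficient after the gain-of-8 residue amplification.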



Figure 5.3: The 5-bit flash type CADC.

### 5.3 Circuits Description

#### 5.3.1 Comparator

The CADC is a 5-bit flash ADC, shown in Figure 5.3, which consists of 33 comparators, de-bubble logic, clock buffers, and a dynamic ROM encoder. Each comparator is a latch-type comparator with an offset compensation loop, as described in Section 3.4. The analog input $V_1$ is connected directly to the ADC input. The reference voltages $V_{RC}[n]$, $n = 0, \ldots, 32$, are provided by the resistor string in the RDAC. The two extra references, $V_{RC}[0]$ and $V_{RC}[32]$, are used to check the bottom and top of the ADC input signal range. Two global clock signals, $\phi_{1a}$ and $\phi_2$, provide the CADC timing information.

Figure 5.4 shows the architecture of the comparator in the CADC. It compares the input $V_1$ with a reference $V_{RC}[n]$, where *n* is an integer between 0 and 32 that indexes one of the $V_{RC}$ coarse references. The comparator includes a regenerative latch with an offset calibration control loop. To reduce power consumption, there is no conventional pre-amplifier. The $V_{OS}$ source in front of the latch represents the input-referred offset of the latch due to device mismatches, and $V_{cm}$ represents the input common-mode voltage. The latch is triggered by the clock $\phi_c$, which is generated from the clock signals $\phi_{1a}$ and $\phi_2$. Comparisons are made near the beginnings of both the $\phi_1$ and $\phi_2$ periods. Each comparison determines the polarity of the differential voltage at the input port $V_a$, but with an equivalent input offset of $V_{OS} + V_c - V_{cm}$. In Figure 5.4, the switch S3 is controlled by the clock $\phi_{1a}$, which is an advanced version of the clock $\phi_1$. The switch S3 is opened before



Figure 5.5: Timing generator for comparator.

the switch S1 so that the bottom-plate sampling operation is enabled.

All switches use pMOS or nMOS transistors of minimum length, $L_{min} = 80$ nm. The switch S1 is implemented with both pMOS and nMOS transistors to tolerate a large input swing, with widths $(W_p, W_n) = (2 \mu m, 0.8 \mu m)$. The switch S2 is implemented with a pMOS transistor, an nMOS transistor, or both, depending on the voltage level of the connected reference. If $V_{RC}[n] < \frac{1}{4}$VDD, it is an nMOS transistor with a width of 0.5 $\mu$m. If $V_{RC}[n] > \frac{3}{4}$VDD, it is a pMOS transistor with a width of 1 $\mu$m. Otherwise, it is implemented with both pMOS and nMOS transistors, the same as switch S1. The switch S3 is also implemented with both pMOS and nMOS transistors, the same as switch S1, since $V_{cm} = \text{VDD}/2$. The sampling capacitor $C_1$ is a 25 fF metal-oxide-metal (MOM) capacitor. The ac coupling of the $C_1$ input network causes a 10% signal loss.

In Figure 5.4, the effect of  $V_{OS}$  is removed by the offset-calibration charge pump (OCCP). During  $\phi_1 = 1$ , the ADC analog input  $V_1$  is sampled onto the capacitor  $C_1$  and the latch input  $V_a$  is connected to the common-mode voltage  $V_{cm}$ . The latch then makes a comparison for the calibration. If the comparison result  $D_c$  is 1, an up pulse is generated in the OCCP, and  $V_c$  is increased by charging the capacitor  $C_2$ . If  $D_c$  is 0, a down pulse is generated in the OCCP, and  $V_c$  is decreased by discharging the capacitor  $C_2$ . Voltage  $V_c$ eventually converges to  $V_{cm} - V_{OS}$ . The effect of  $V_{OS}$  is then canceled. During  $\phi_2 = 1$ , the capacitor  $C_1$  is connected to  $V_{RC}[n]$ . The latch then makes a comparison for the conversion, and the output  $D_c$  represents the polarity of  $V_1 - V_{RC}[n]$ .

Figure 5.5 shows how the necessary local clock signals are generated from the global clock signal $\phi_1$. The clock signal $\phi_c$ triggers the comparator to make a comparison. The clock signal $\phi_p$ is combined with $D_c$ to generate the control pulse that charges or discharges the capacitor $C_2$ shown in Figure 5.4. To save power, each local clock generator is shared by 8 comparators. The timing delays $t_1$ and $t_2$ are designed to be about 0.6 ns and 0.8 ns respectively. The control pulse width $t_3$ is designed to be 1.0 ns.

Figure 5.6 shows the latch schematic. There are two input ports. One port receives the differential input $V_a$. The other port receives the difference between $V_c$ and $V_{cm}$, where $V_{cm}$ is a common-mode reference and $V_c$ adjusts the offset of the latch. Transistors M3 and M7 are added to reduce the conducting currents when the latch is turned on. Kickback noises at the inputs of the latch are also reduced. Without considering the matching requirement,



Figure 5.6: Schematic of the latch in CADC comparator.

nMOS transistors M1-M10 have a width of 0.2 $\mu$m and pMOS transistors M11-M12 have a width of 0.2 $\mu$m. All $\phi_c$-controlled pMOS transistors also have a width of 0.2 $\mu$m. From Monte Carlo simulation results, the offset standard deviation $\sigma(V_{OS})$ is about 50 mV.

The $C_2$ capacitor in the OCCP is realized using an nMOS transistor. Its capacitance is designed to be about 1 pF. The output currents of the charge-pump current sources, $I_p$ and $I_n$, are about 1 $\mu$A and are active only during the charging or discharging period. The width of the up and down pulses is 1 ns, which is determined by an inverter chain. Thus, in each calibration step, $V_c$ changes by about 1 mV, which is about 1/2 LSB of the ADC. After the offset calibration settles, $V_c$ may vary in the same direction for at most two consecutive calibration steps, yielding a worst-case fluctuation of $\pm 1$ mV. In other words, the comparator offset is reduced to less than 1 mV by the OCCP. The $V_c$ fluctuation can be affected by $I_p$, $I_n$, $C_2$, and the width of the up and down pulses. Their variations are tolerated thanks to the FADC 1-bit redundancy. The matching between $I_p$ and $I_n$ is not crucial; it affects only the ratio of up pulses to down pulses. The analog compensation, which uses an extra input pair, provides both a wide tuning range and a fine tuning step. Even though the extra input pair introduces additional offset voltage, this offset is also compensated by the OCCP.
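The step size and the settling behavior of the OCCP loop can be illustrated with a small behavioral model. The Python sketch below is only an illustration under stated assumptions: the latch is ideal and noiseless, $V_c$ starts at $V_{cm}$, and the decision polarity is chosen so that the loop drives the residual offset $V_{OS} + V_c - V_{cm}$ toward zero. The function name `calibrate` and its parameters are hypothetical.

```python
# Behavioral sketch of the offset-calibration charge pump (OCCP) loop.
# Voltages are kept as integers in microvolts to avoid rounding issues.

I_PUMP = 1e-6      # charge-pump current, 1 uA (from the text)
T_PULSE = 1e-9     # up/down pulse width, 1 ns (from the text)
C2 = 1e-12         # storage capacitor, 1 pF (from the text)

STEP_UV = round(I_PUMP * T_PULSE / C2 * 1e6)   # V_c step per pulse, in uV

def calibrate(v_os_uv, v_cm_uv=500_000, steps=300):
    """Return the residual offset (uV) after each calibration comparison."""
    v_c = v_cm_uv                      # assumed starting point: V_c = V_cm
    history = []
    for _ in range(steps):
        residual = v_os_uv + v_c - v_cm_uv   # equivalent input offset
        # Assumed polarity: the latch output steers V_c against the residual.
        d_c = 1 if residual < 0 else 0
        v_c += STEP_UV if d_c == 1 else -STEP_UV
        history.append(v_os_uv + v_c - v_cm_uv)
    return history

trace = calibrate(v_os_uv=50_000)      # 50 mV offset, about one sigma
print("step size:", STEP_UV, "uV")
print("worst residual over last 100 steps:",
      max(abs(r) for r in trace[-100:]), "uV")
```

In this idealized model the loop needs one comparison per millivolt of initial offset to converge, after which the residual offset toggles within one step of zero.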

Operating at 100 MHz clock rate, each CADC comparator consumes 18 $\mu$W. The entire CADC, including comparators, de-bubble logic, ROM, and clock buffers, consumes 0.8 mW. The total input capacitance of the CADC is 0.8 pF.

The FADC is a 6-bit flash ADC with an architecture similar to that of the CADC. It includes 65 comparators, de-bubble logic, clock buffers, and a ROM encoder. Figure 5.7 shows the architecture of the FADC comparator. It compares the RAMP output $V_2$ with a reference $V_{RF}[n]$, where *n* is an integer between 0 and 64 that indexes one of the $V_{RF}$ fine references. The switches S1 and S2 are implemented with pMOS and nMOS transistors, which have widths of 2 $\mu$m and 0.8 $\mu$m respectively. Like the CADC comparator, it includes a regenerative latch and an offset-calibration charge pump (OCCP). The latch is triggered by the clock $\phi_f$, which is generated from the clock signals $\phi_1$ and $\phi_2$. Comparisons are made near the ends of both the $\phi_1$ and $\phi_2$ periods. During $\phi_1 = 1$, both input ports $V_a$ and $V_r$ are connected to the $V_{RF}[n]$ reference. The latch makes a calibration comparison, and the OCCP then adjusts $V_c$ to minimize the input offset. The $V_c$ fluctuation for the FADC comparator should be less than $\pm 8$ mV, i.e., $\pm 1/2$ of the FADC input step size.



Figure 5.8: Schematic of the latch in FADC comparator.



Figure 5.9: Schematic of the residue amplifier (RAMP).

Near the end of $\phi_2 = 1$, the latch makes a conversion comparison, and the resulting $D_f$ represents the polarity of $V_2 - V_{RF}[n]$. Unlike the CADC comparator, the FADC comparator does not employ a switched-capacitor network to perform the $V_2 - V_{RF}[n]$ subtraction, in order to avoid extra capacitive loading on the RAMP.

Figure 5.8 shows the schematic of the latch in the FADC comparator. It has three input source-coupled pairs. The M1-M2 pair is connected to the positive terminals of the input ports $V_a$ and $V_r$, while the M5-M6 pair is connected to the negative terminals. All nMOS transistors M1-M14 have a width of 0.2 $\mu$m and pMOS transistors M15-M16 have a width of 0.2 $\mu$m. Operating at 100 MHz clock rate, each FADC comparator consumes 22 $\mu$W. The entire FADC, including comparators, de-bubble logic, a ROM encoder and clock buffers, consumes 1.7 mW.

#### 5.3.2 Residue Amplifier (RAMP)

Figure 5.9 shows the residue amplifier (RAMP) schematic. It comprises a switched-capacitor input network and a single-stage differential amplifier. The input sampling

| Transistor | W / L                        |
|------------|------------------------------|
| M0a        | 96 $\mu$m / 0.24 $\mu$m      |
| M0         | 96 $\mu$m / 0.24 $\mu$m      |
| M1         | 45 $\mu$m / 0.16 $\mu$m      |
| M2         | 45 $\mu$m / 0.16 $\mu$m      |
| M3         | 72 $\mu$m / 0.24 $\mu$m      |
| M4         | 72 $\mu$m / 0.24 $\mu$m      |
| M5         | 400 $\mu$m / 0.24 $\mu$m     |

Table 5.1: RAMP transistor size summary

switches S1–S6 are nMOS transistors with constant- $V_{gs}$  bootstrapped gate drive [76].

pMOS transistors M3 and M4 are current sources. Resistors $R_1$ and $R_2$ are realized with p+ polysilicon over an n-well to suppress noise coupled from the substrate. Their resistance is designed to be 5 k$\Omega$, which is close to the output resistance of M1 and M2. They serve as passive loads to improve the RAMP linearity. pMOS transistor M5 is added to improve the power supply rejection ratio (PSRR) and the common-mode gain. The total tail current of the amplifier is about 0.8 mA. Half of the tail current, in M0, is controlled by a switched-capacitor common-mode feedback (CMFB) circuit. The differential output range is designed to be 1.0 V (the single-ended swing is from 0.25 V to 0.75 V). With a loading capacitance of 500 fF, the output time constant is about 0.83 ns, which is short enough for the output to settle within 4 ns to 6-bit resolution. The time constant is inversely proportional to the tail current, and therefore to the power consumption of the RAMP: spending more power yields a smaller time constant and a higher operating speed. No power is spent on overhead such as the Miller compensation that a closed-loop configuration would need. Table 5.1 summarizes the transistor sizes used in the RAMP.
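The settling claim can be checked with a short calculation. The sketch below assumes a single-pole response, so that the settling error decays as $e^{-t/\tau}$; this simplification is ours and ignores slewing and higher-order poles.

```python
import math

TAU_NS = 0.83        # RAMP output time constant (from the text)
T_SETTLE_NS = 4.0    # available settling time (from the text)
FADC_BITS = 6        # resolution the residue must settle to

n_tau = T_SETTLE_NS / TAU_NS                 # number of time constants
rel_error = math.exp(-n_tau)                 # residual fraction of the step
equiv_bits = -math.log2(rel_error)           # settling accuracy in bits

print(f"{n_tau:.2f} time constants, residual error {rel_error:.4f}, "
      f"about {equiv_bits:.1f} bits")
```

Under this assumption, 4 ns corresponds to a little under five time constants, and the residual error is below $2^{-6}$ of the output step.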

During  $\phi_1 = 1$ , the differential input  $V_1$  is sampled onto the  $C_{S_1}$  and  $C_{S_2}$  capacitors. At the same time, the inputs and the outputs of the differential amplifier are shorted for offset cancellation. This offset cancellation suppresses the variation of the RAMP output range. During  $\phi_2 = 1$ , the residue  $V_1 - V_{da}$  is amplified by the differential amplifier in open-loop configuration.

Figure 5.10 shows the switched-capacitor CMFB circuit. While the RAMP operates as a residue amplifier, switches S1-S4 are closed and $C_1$ and $C_2$ are charged by $V_{cm}$ and $V_{b3}$. While the RAMP is connected in unity-gain feedback, switches S5-S8 are closed and charge redistribution adjusts the bias voltage $V_{cfb}$. The output common-mode voltage thus approaches $V_{cm}$ asymptotically. To reduce the extra



Figure 5.10: The switched-capacitor common-mode feedback (CMFB) circuit.

output loading on the RAMP outputs, the capacitances of $C_3$ and $C_4$ are designed to be 100 fF. $C_1$ and $C_2$ are 25 fF each and act as a low-pass filter. The parasitic capacitances on the high-impedance nodes will change the output common-mode voltage if the capacitances of $C_1$-$C_4$ are not large enough.

Figure 5.11 shows the bootstrapped switch circuit, which is used to maintain 10-bit linearity. While $\phi_1$ is low, the capacitor $C_1$ is charged so that the voltage across it is VDD. While $\phi_1$ is high, one terminal of $C_1$ is connected to the input $V_i$, and the other terminal is boosted toward $V_i + \text{VDD}$, so that the node voltage $V_b$ approaches $V_i + \text{VDD}$. The gate-to-source voltage of the input transistor M0 is therefore VDD, independent of the input $V_i$. In practice, the gate-to-source voltage of M0 only approaches VDD without reaching it. Its independence of the input $V_i$, however, is the most important feature of the bootstrapped switch.

The reliability of the transistors in the bootstrapped switch circuit must be ensured. To avoid reliability issues, the nMOS transistor M3T is a thin-oxide device with three times the minimum length. The transistor M2a is also added to improve the reliability of M0. The transistor M4a maintains the normal operation of the bootstrapped switch. According to simulation, the input sampler achieves a spurious-free dynamic range (SFDR) of over 80 dB with the bootstrapped switch.



 $C_1$  is a metal-oxide-metal (MOM) capacitor with a capacitance of 100 fF.

The input capacitors, $C_{S1}$ and $C_{S2}$, are metal-oxide-metal capacitors with a capacitance of 250 fF. The ac coupling of the $C_S$ input network causes 20% signal loss. The entire RAMP provides a nominal voltage gain of 8 for residue amplification. The RAMP consumes a total power of 1.1 mW at 100 MS/s sampling rate.

#### 5.3.3 Resistor-String DAC (RDAC)

The resistor string shown in Figure 5.1 provides 33 differential $V_{RC}$ references for the CADC and 65 differential $V_{RF}$ references for the FADC. Because the resistor string extends over a long distance (about 400 $\mu$m in this design), the voltage steps near its two ends can differ because of process gradients. All differential references are therefore generated from two parallel resistor strings with currents flowing in opposite directions. This arrangement reduces the variation caused by the gradient effect, which would otherwise introduce third-order nonlinearity in differential operation. To improve the linearity further, the two resistor strings are tied together by metal wires connecting the $V_{RC}$ taps of identical voltages.

For example, suppose the first taps of the two strings have resistances of $R + \alpha \cdot R$ and $R - \alpha \cdot R$ because of the process gradient, where $\alpha$ is the deviation ratio of the tap resistance. With the two measures above, opposite-direction resistor strings and wire-connected taps, the resistance deviation is greatly reduced (from $\alpha$ to $\alpha^2$).

$$
R_{tap} = 2(R + \alpha \cdot R) \,\|\, 2(R - \alpha \cdot R) \tag{5.1}
$$

$$
= R - \alpha^2 \cdot R \tag{5.2}
$$
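Equations (5.1) and (5.2) can be verified numerically. The sketch below uses exact rational arithmetic; the 1% deviation is a hypothetical value chosen only to illustrate the reduction from $\alpha$ to $\alpha^2$.

```python
from fractions import Fraction

def parallel(a, b):
    """Resistance of two resistors a and b connected in parallel."""
    return a * b / (a + b)

R = Fraction(1)                    # nominal tap resistance (normalized)
alpha = Fraction(1, 100)           # hypothetical 1% gradient-induced deviation

r_tap = parallel(2 * (R + alpha * R), 2 * (R - alpha * R))
print("R_tap =", float(r_tap))
print("relative deviation =", float(1 - r_tap / R))
```

With this example value, a first-order deviation of $10^{-2}$ in each string becomes a deviation of $10^{-4}$ in the combined tap.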

Since there is a large voltage drop across the resistor strings, materials with voltage-dependent resistance should be avoided. In this thesis, the resistor strings are implemented with non-salicided polysilicon, as shown in Figure 5.12. An n-well is placed under the resistor strings to isolate them from noise coupled through the substrate. A p+ guard ring is placed around the n-well to further suppress the substrate noise.

In this thesis, the total resistance between VDD and VSS is 1 k$\Omega$, which achieves good linearity with a sufficiently small time constant for all $V_{RC}$ and $V_{RF}$ references. A smaller resistance would make the references less sensitive to the kickback noise from the comparators, at the cost of higher power consumption. Each string has a resistance of 2 k$\Omega$ and is divided into 32 coarse taps. Each coarse tap consists of 4 fine taps. The ADC linearity is ultimately determined by the linearity of the resistor string, so every tap must have the same geometry and the same surroundings. The width of each string is 20 $\mu$m to meet the matching requirement of 10-bit accuracy. With these design and layout considerations, the ADC linearity is well maintained. The resistor strings consume 1 mW from the 1 V supply.

To generate the RDAC output $V_{da}$, the resistor string also generates a set of 3 different references separated by 8 LSB for each of the 32 $D_1$ codes. Figure 5.13 shows the RDAC, which comprises a decoder and a MUX. The decoder combines the digital inputs $D_1$ and *q* to drive the analog switches in the MUX. These switches are sized by the same criterion as described in Section 5.3.1. If $V_R[n] < \frac{1}{4}$VDD, the switch is an nMOS transistor with a width of 3.2 $\mu$m. If $V_R[n] > \frac{3}{4}$VDD, it is a pMOS transistor with a width of 8 $\mu$m. Otherwise, it is a combination of pMOS and nMOS transistors with widths of 8 $\mu$m and 3.2 $\mu$m respectively. The equivalent resistance of the switch dominates the



Figure 5.12: Layout of two resistor strings with the opposite direction.

time constant of the input estimate  $V_{da}$ . The MUX selects one out of 96 possible voltages, which are generated by the resistor string. The RDAC output  $V_{da}$ , determined by  $D_1$  and *q*, is expressed as

$$
V_{da} = (32 \cdot D_1 - 8 \cdot q) \times \text{LSB} \tag{5.3}
$$

A random signal *q* with a magnitude of 8 LSB is injected into the analog signal path to enable the digital background calibration described in Section 5.3.5.
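Equation (5.3) can be enumerated directly to confirm the count of 96 selectable voltages. The sketch below assumes that $D_1$ is an unsigned 5-bit code from 0 to 31; the function name `rdac_level` is hypothetical and the levels are expressed in LSB rather than volts.

```python
# Enumerate the RDAC output levels of Equation (5.3), in units of LSB.
# Assumption: D_1 is taken as an unsigned 5-bit code, 0..31.

def rdac_level(d1, q):
    """V_da in LSB for coarse code d1 and tri-level dither q in {-1, 0, +1}."""
    return 32 * d1 - 8 * q

levels = {rdac_level(d1, q) for d1 in range(32) for q in (-1, 0, 1)}
print(len(levels), "distinct levels, from", min(levels), "to", max(levels), "LSB")
```

Every level is a multiple of 8 LSB, consistent with a resistor string whose fine taps are spaced 8 LSB apart.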

#### 5.3.4 Distributed Input Track-and-Hold

The ADC does not have a dedicated input sampler. As shown in Figure 5.14, the analog input $V_1$ is sampled by the passive samplers in the RAMP and in the CADC comparators. The clocks $\phi_1$ and $\phi_{1a}$ control the samplers. The timing skews of the clocks are minimized by carefully matching the delays of the clock buffers with a tree-like network and matched layout.

The matching of the  $V_1$  signal paths is also critical. Due to the resistivity of metal wires and analog switches, the transfer function from the  $V_1$  input to each sampling capacitor



Figure 5.13: Schematic of the resistor-string DAC.



Figure 5.14: Distributed input track-and-hold.

in the sampling mode is a low-pass filter. The transfer functions should be identical. As shown in Figure 5.14, a tree-like routing scheme is also used to connect the $V_1$ input to the CADC comparators. In addition, the transfer function from $V_1$ to the RAMP input, $V_1(R)$, is made to match the transfer function from $V_1$ to the middle of the CADC, $V_1(16)$. The $V_1$ signal paths are routed using the top two metal layers shorted together as a single wire. Further details are given in Appendix C.2.

#### 5.3.5 Digital Calibration Processor

The proposed digital calibration processor (DCP) for a two-step ADC is shown in Figure 5.15. As mentioned in Section 4.4, the DCP consists of a signal compensator and a coefficient estimator. The signal compensator uses a first-order coefficient $b_1$ and a third-order coefficient $b_3$ to correct the gain error and the nonlinearity, respectively, in the digital domain. The values of $b_1$ and $b_3$ are estimated by the proposed coefficient estimator.

The background estimation is driven by a tri-level random sequence *q*, constructed by combining two uncorrelated binary pseudo-random sequences $q_1$ and $q_2$. Each binary sequence has a length of $2^{14}$ and contains equal numbers of zeros and ones. They are combined as follows: $q = -1$ if $(q_1, q_2) = (0, 0)$, $q = +1$ if $(q_1, q_2) = (1, 1)$, and $q = 0$ otherwise. With the random signal *q*, $q \times 8$ LSB is subtracted from the estimated input $V_{da}$. The estimator first receives the $D_2^c$ codes from the compensator and sorts the data according to the associated *q*. It then averages the sorted data and applies subtraction to acquire $H_1$ and $H_2$. The estimator also acquires the average of $D_2^c$ after removing the calibration information, denoted as $M$.
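The mapping from $(q_1, q_2)$ to *q* can be sketched as follows. Python's seeded `random.Random` is used here only as a stand-in for the two pseudo-random sequence generators, whose actual implementation is not described in this section; the function name `tri_level` is hypothetical.

```python
import random

def tri_level(q1, q2):
    """Map two binary values to q: (0,0) -> -1, (1,1) -> +1, otherwise 0."""
    if (q1, q2) == (0, 0):
        return -1
    if (q1, q2) == (1, 1):
        return +1
    return 0

# Stand-in for the PRBS generators: Python's random module, seeded for
# repeatability. The real sequences have exactly balanced zeros and ones.
rng = random.Random(0)
n = 2**14
q = [tri_level(rng.getrandbits(1), rng.getrandbits(1)) for _ in range(n)]

for value in (-1, 0, +1):
    print(f"P(q = {value:+d}) ~ {q.count(value) / n:.3f}")
```

With uncorrelated, balanced binary inputs, $q = -1$ and $q = +1$ each occur about one quarter of the time and $q = 0$ about half of the time.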

The acquired data $H_1$, $H_2$, and $M$ are updated once every $R$ samples, where $R$ is determined by the FADC resolution of $Z$ bits. To extract the calibration information from $D_2^c$, the input-related component of $D_2^c$ must be removed by averaging over $R$ samples. We assume that this input-related component is uniformly distributed between $-0.5$ V and $+0.5$ V. Its standard deviation, which averaging over $R$ samples attenuates to $1/\sqrt{12 \cdot R}$, must be less than $\frac{1}{2} V_{LSB}$, where $V_{LSB} = 2^{-Z}$ V is the FADC step for a 1 V input range. This gives the lower bound on $R$,

$$
R > \frac{1}{3} \cdot 2^{2Z} \tag{5.4}
$$

In our design, we use  $R = 2^{2Z}$  to tolerate the inaccuracy of the assumption. Moreover,



Figure 5.15: The proposed digital calibration processor (DCP): (a) signal compensator and (b) coefficient estimator.

since the random signal *q* takes the three values $-1$, $0$, and $+1$, the probabilities of $q = -1$ and $q = +1$ are each one quarter. Therefore, $R$ is set to $2^{2Z+2}$. In this ADC, the FADC has 6 bits, so $Z = 6$ and $R = 2^{14}$. The averaging function uses $2^{14}$ $D_2^c$ samples to extract the calibration information.
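The choice of $R$ can be reproduced numerically. The sketch below evaluates the bound of Equation (5.4) and the final value $2^{2Z+2}$, assuming a 1 V FADC input range so that $V_{LSB} = 2^{-Z}$ V; the variable names are ours.

```python
import math

Z = 6                                  # FADC resolution in bits (from the text)

v_lsb = 2.0**-Z                        # FADC step for a 1 V input range
r_min = (1 / 3) * 2**(2 * Z)           # Equation (5.4): R > 2^(2Z) / 3
r_margin = 2**(2 * Z)                  # margin for the uniform-input assumption
r_total = 4 * r_margin                 # q = +1 or -1 only 1/4 of the time each

sigma = 1 / math.sqrt(12 * r_margin)   # residual std after averaging
print(f"R_min ~ {r_min:.0f}, chosen R = {r_total} = 2^{int(math.log2(r_total))}")
print(f"sigma = {sigma * 1e3:.2f} mV vs. V_LSB/2 = {v_lsb / 2 * 1e3:.2f} mV")
```

The factor of four accounts for the fact that only about one quarter of the collected samples carry each of the $q = \pm 1$ calibration states.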

Two error indices  $E_1$  and  $E_2$  are calculated as follows,

$$
E_1 = H_1 + H_2 - 16 \tag{5.5}
$$

$$
E_2 = H_1 - H_2 \tag{5.6}
$$

Here,  $E_1$  reveals the gain error, and  $E_2$  reveals the nonlinearity. Combining both error terms, a single error function *L* is defined as

$$
L = \frac{1}{2}E_1^2 + \frac{1}{2}E_2^2 \tag{5.7}
$$

Employing the Lyapunov second theorem on stability [75] to ensure that *L* approaches zero asymptotically, we obtain the following equations to estimate $b_1$ and $b_3$.

$$
b_1[k+1] = b_1[k] - \mu_1 \times \text{sgn}(Z_1) \tag{5.8}
$$

$$
b_3[k+1] = b_3[k] - \mu_3 \times \text{sgn}(Z_3) \tag{5.9}
$$

where

$$
Z_1 = E_1 (b_1^3[k] - (3S + 64)b_3[k]) - E_2(24Mb_3[k]) \tag{5.10}
$$

$$
Z_3 = E_1(3S + 64) + E_2(24M) \tag{5.11}
$$

The value of sgn(*x*) is $+1$ if $x > 0$, $0$ if $x = 0$, and $-1$ if $x < 0$. Employing the sgn function simplifies the DCP hardware and reduces its power consumption. The derivation of the above equations is given in Appendix A. The *S* variable in Equation (5.10) and Equation (5.11) can be further simplified by replacing it with the constant $256/3$, which is $E[(D_2^c)^2]$ when $D_2^c$ is uniformly distributed between $-16$ and $+16$. Simulations show that using a constant *S* does not affect the calibration process.
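One coefficient update per Equations (5.5) to (5.11) can be written compactly in software. The following Python sketch is a floating-point illustration, not the hardware implementation; the function name `update` and the example input values are hypothetical and serve only to exercise the equations.

```python
def sgn(x):
    """Sign function: +1, 0, or -1."""
    return (x > 0) - (x < 0)

def update(b1, b3, h1, h2, m, mu1, mu3, s=256 / 3):
    """One coefficient update following Equations (5.5)-(5.11)."""
    e1 = h1 + h2 - 16                                   # gain-error index
    e2 = h1 - h2                                        # nonlinearity index
    z1 = e1 * (b1**3 - (3 * s + 64) * b3) - e2 * (24 * m * b3)
    z3 = e1 * (3 * s + 64) + e2 * (24 * m)
    return b1 - mu1 * sgn(z1), b3 - mu3 * sgn(z3), e1, e2

# Hypothetical averaged measurements, chosen only to exercise the code.
b1, b3, e1, e2 = update(b1=1.0, b3=5.5e-5, h1=6.5, h2=6.3, m=0.0,
                        mu1=1 / 64, mu3=1 / 65536)
print(b1, b3, e1, e2)
```

In this example $H_1 + H_2$ is below 16, so $E_1$ is negative and the update raises $b_1$ by one step, which is the expected direction when the amplifier gain is too low. Because only the signs of $Z_1$ and $Z_3$ are used, each update costs one addition per coefficient.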

The step numbers $\mu_1$ and $\mu_3$ are two positive constants. Smaller $\mu_1$ and $\mu_3$ result in smaller fluctuations of $b_1$ and $b_3$ but a longer convergence time. We want a short convergence time when the errors are large and small fluctuations when the errors are small. To reduce the convergence time of the proposed calibration technique without enlarging the fluctuation range, a switching step number algorithm is proposed that uses the error index $E_1$. As an example with two switching numbers, $k_1$ and $k_2$, where $k_1$ is the larger, the step number $\mu_1$ is defined as follows,

$$
\mu_1 = \begin{cases} k_1 & \text{if } |E_1| \ge 1, \\ k_2 & \text{if } |E_1| < 1. \end{cases} \tag{5.12}
$$

The DCP also chooses $\mu_3 = \mu_1/1024$. In theory, $k_1$ should be as large as possible to shorten the convergence time. However, because the values to which $b_1$ and $b_3$ converge are unknown, $k_1$ cannot be too large: a large $k_1$ may cause large fluctuations of $b_1$ and $b_3$ if $|E_1|$ always remains larger than one. To avoid this issue, $k_1$ should be less than the inverse of $d$, that is, $k_1 < 1/d$, where $d$ is the calibration amount in $D_2^c$. If $k_1$ is equal to $1/d$, the maximum $|E_1|$ is equal to one. In our design, $k_1$ must be less than $1/8$ since $d$ is 8. For safety, $k_1 = 1/64$ and $k_2 = 1/256$ are chosen. This increases the convergence time, but only slightly.
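The step selection of Equation (5.12), together with $\mu_3 = \mu_1/1024$, can be sketched as a small function. Exact fractions are used so that the power-of-two step values are visible; the function name `step_numbers` is hypothetical.

```python
from fractions import Fraction

K1 = Fraction(1, 64)     # larger step, used while |E1| >= 1 (from the text)
K2 = Fraction(1, 256)    # smaller step, used once |E1| < 1 (from the text)
D = 8                    # calibration amount d in D_2^c (from the text)

def step_numbers(e1):
    """Return (mu1, mu3) per Equation (5.12) and mu3 = mu1 / 1024."""
    mu1 = K1 if abs(e1) >= 1 else K2
    return mu1, mu1 / 1024

assert K1 < Fraction(1, D)          # bound k1 < 1/d stated in the text
for e1 in (-3.2, 1.0, 0.4):
    print(e1, step_numbers(e1))
```

Since every step value is a power of two, the multiplications by $\mu_1$ and $\mu_3$ reduce to bit shifts in a fixed-point implementation.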

Figure 5.16 shows the calibration behavior of the proposed DCP. In the simulation, the RAMP is followed by an ideal 6-bit FADC. The RAMP transfer function is obtained from SPICE simulation, and is modeled as

$$
y_d = a_1 \cdot y + a_3 \cdot y^3 + a_5 \cdot y^5 + a_7 \cdot y^7 \tag{5.13}
$$

where *y* is the FADC output of an ideal RAMP, and $y_d$ is that of a real RAMP. The values of the coefficients are $a_1 = 0.8$, $a_3 = -5.5 \times 10^{-5}$, $a_5 = -1 \times 10^{-7}$ and $a_7 = 3 \times 10^{-10}$. The initial values for $b_1$ and $b_3$ are set to 1.0 and $5.5 \times 10^{-5}$ respectively. The ADC input is a full-scale sine wave. The coefficients $b_1$ and $b_3$ settle to 1.225 and $2.75 \times 10^{-4}$ respectively.
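The effect of the settled coefficients can be illustrated by applying them to the RAMP model of Equation (5.13). The sketch below assumes that the signal compensator has the form $y_c = b_1 y_d + b_3 y_d^3$, which is our reading of the first- and third-order correction described above, and it ignores FADC quantization. The function names are hypothetical.

```python
A = {1: 0.8, 3: -5.5e-5, 5: -1e-7, 7: 3e-10}   # Equation (5.13) coefficients
B1, B3 = 1.225, 2.75e-4                         # settled values from the text

def ramp(y):
    """FADC output of the modeled real RAMP for ideal output y."""
    return sum(a * y**k for k, a in A.items())

def compensate(yd):
    """Assumed compensator form: first- plus third-order correction."""
    return B1 * yd + B3 * yd**3

ys = [i / 4 for i in range(-64, 65)]            # ideal outputs, -16 .. +16
raw_err = max(abs(ramp(y) - y) for y in ys)
cal_err = max(abs(compensate(ramp(y)) - y) for y in ys)
print(f"max error before: {raw_err:.2f}, after: {cal_err:.2f} (in FADC codes)")
```

Under these assumptions the worst-case error over the nominal $\pm 16$ code range drops from several codes to a fraction of one code.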

Three kinds of input patterns are simulated: DC, random, and sine wave input signals. For random and sine wave inputs, the steady-state values of $b_1$ and $b_3$ are similar, and the error function *L* approaches zero asymptotically. For a DC input, however, *L* does not approach zero but fluctuates around $L = 1$, although the corrected code $D_2^c$ still has a constant DC value. This behavior is caused by the quantization error of the FADC, which results in a monotonic data collection. It occurs only at certain DC inputs that produce a large quantization error. With a digital dither, *L* can again approach zero asymptotically.



Figure 5.16: Transient behavior of the proposed digital calibration scheme.

The overall convergence time is about 30 iteration cycles, where each iteration cycle is $2^{14}$ sampling periods. At a 100 MS/s sampling rate, the calibration convergence time is about 5 ms. Compared with other nonlinear calibration techniques [70, 71, 72, 32], this technique requires less calibration time. This is mainly due to the two-step ADC architecture, which can further reduce the resolution of the Z-ADC compared with pipelined ADC architectures. The variable step number mechanism also helps to reduce the calibration time.
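The 5 ms figure follows directly from the iteration count, the iteration length and the sampling rate. The short Python sketch below reproduces that arithmetic; the function and variable names are illustrative and do not come from the thesis.

```python
# Convergence-time estimate from the figures quoted in the text.
ITERATIONS = 30                  # iteration cycles needed to converge
SAMPLES_PER_ITERATION = 2 ** 14  # sampling periods per iteration cycle
SAMPLING_RATE_HZ = 100e6         # 100 MS/s

def convergence_time_s(iterations, samples_per_iteration, fs_hz):
    """Return the calibration convergence time in seconds."""
    return iterations * samples_per_iteration / fs_hz

t_conv = convergence_time_s(ITERATIONS, SAMPLES_PER_ITERATION, SAMPLING_RATE_HZ)
print(f"convergence time = {t_conv * 1e3:.2f} ms")
```

The result is slightly below 5 ms, consistent with the rounded value quoted above.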

#### 5.3.6 Other Digital Circuits

For a 10-bit 100-MS/s ADC, clock jitter must be considered at the design stage. Since the clock signals are widely distributed to the RAMP, CADC and FADC, the clock buffers must be arranged carefully. The most critical clock signals are those for the RAMP, which has a 10-bit accuracy requirement. Dedicated clock buffers are therefore necessary between the clock generator and the RAMP. The last buffers, which control the analog switches, use the analog power supply to reduce the coupling paths from the clock domain to the analog domain. In practice, the supply-induced noise of the buffers is the dominant contributor to the sampling jitter, so isolation in the power domain can reduce a certain amount of clock jitter. For a 10-bit ADC with a 50 MHz input frequency, the allowable standard deviation of the clock jitter is only 1.1 ps. The detailed jitter requirement depends on the ADC resolution and the input frequency, as summarized in Appendix C.1.
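To give a feel for the jitter number, the following Python sketch evaluates the widely used jitter-limited SNR expression for a full-scale sine input, $-20\log_{10}(2\pi f_{in}\sigma_j)$, and compares it with the ideal 10-bit quantization SNR. This textbook expression is an assumption here; the exact criterion behind the 1.1 ps requirement is the one given in Appendix C.1 and may include a different margin.

```python
import math

def jitter_limited_snr_db(f_in_hz, sigma_jitter_s):
    """SNR limit set by sampling jitter alone for a full-scale sine input."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_jitter_s)

def ideal_quantization_snr_db(n_bits):
    """SNR of an ideal n-bit quantizer for a full-scale sine input."""
    return 6.02 * n_bits + 1.76

snr_jitter = jitter_limited_snr_db(50e6, 1.1e-12)   # values quoted in the text
snr_quant = ideal_quantization_snr_db(10)
print(f"jitter-limited SNR: {snr_jitter:.1f} dB, ideal 10-bit SNR: {snr_quant:.1f} dB")
```

Under this assumed model, 1.1 ps of jitter at 50 MHz leaves the jitter-limited SNR several dB above the 10-bit quantization limit.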

In the CADC and FADC, dynamic ROM encoders translate the edge codes into Gray codes, in which only one bit changes between neighboring code values. The thermometer codes are first processed by the de-bubble logic to yield the edge codes. Figure 5.17 shows the ROM encoder used in the FADC. The operation of the ROM is divided into two phases: a pre-charging phase and an evaluation phase. In the pre-charging phase (CK=1), the inputs $D_f[63:1]$ are gated and the output nodes $D_2[5:0]$ are charged to VDD through the pMOS transistors. In the evaluation phase (CK=0), the inputs $D_f[63:1]$ are latched and control the individual nMOS transistors to perform the translation to the Gray codes $D_2[5:0]$. The Gray-code translation effectively reduces the effect of bubbles in the thermometer codes [77]. The pre-charge pMOS transistors are designed with a width of 1.2 $\mu$m, and the evaluation nMOS transistors with a width of 0.24 $\mu$m.

Figure 5.17: Simplified dynamic ROM used in the FADC.
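The benefit of Gray coding in a ROM encoder can be illustrated with a small behavioral model in Python. The sketch below is not the circuit of Figure 5.17: the simple two-input edge detector, the wired-OR model of the bit lines and all function names are assumptions made for illustration only.

```python
def gray(i):
    """Binary-reflected Gray code of the integer i."""
    return i ^ (i >> 1)

def gray_to_int(g):
    """Inverse of gray()."""
    value = 0
    while g:
        value ^= g
        g >>= 1
    return value

def thermometer_to_edge(thermo):
    """One-hot edge code from a thermometer code (lowest comparator first).
    edge[i] == 1 marks a 1 -> 0 transition just above the i lowest comparators.
    Simple two-input detection; the thesis's de-bubble logic is not modeled."""
    padded = [1] + list(thermo) + [0]
    return [int(padded[i] == 1 and padded[i + 1] == 0)
            for i in range(len(thermo) + 1)]

def rom_encode(edge, word_of_row):
    """Wired-OR ROM model: every active row ORs its word onto the bit lines."""
    word = 0
    for row, active in enumerate(edge):
        if active:
            word |= word_of_row(row)
    return word

# Clean thermometer code: 20 of 63 comparators are high.
clean_edge = thermometer_to_edge([1] * 20 + [0] * 43)
clean_code = gray_to_int(rom_encode(clean_edge, gray))

# Thermometer code with one bubble: two ROM rows (3 and 5) fire together.
bubble_edge = thermometer_to_edge([1, 1, 1, 0, 1] + [0] * 58)
gray_result = gray_to_int(rom_encode(bubble_edge, gray))
binary_result = rom_encode(bubble_edge, lambda row: row)
print(clean_code, gray_result, binary_result)
```

In this example the Gray-coded ROM resolves the double-row activation to one of the two active rows, whereas a plain binary ROM produces a code outside the two candidates.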

The DEC, shown in Figure 5.1, combines the 5-bit $D_1$ from the CADC and the 6-bit $D_2^c$ from the DCP to produce the final ADC digital output $D_o$. The $q$ signal injected from the RDAC must be removed from $D_o$. Thus, $D_o$ is calculated as

$$
D_o = D_1 \times 2^5 + D_2^c - q \times 8 \tag{5.14}
$$

The full range of the 10-bit output code $D_o$ is from $-512$ to $+511$.
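Equation (5.14) can be written as a one-line function. In the Python sketch below, the signed integer ranges used in the examples are assumptions chosen to reproduce the stated $-512$ to $+511$ span; the thesis does not spell out the number format here.

```python
def dec_output(d1, d2c, q):
    """Eq. (5.14): combine the coarse code d1 and the corrected fine code d2c,
    then remove the injected calibration signal q (q in {-1, 0, +1})."""
    return d1 * 2 ** 5 + d2c - q * 8

# Assumed signed coarse code in [-16, 15]; fine code examples only.
d_max = dec_output(15, 31, 0)     # top of the output range
d_min = dec_output(-16, 0, 0)     # bottom of the output range
d_cal = dec_output(0, 8, 1)       # a pure calibration offset of +8 is removed
print(d_max, d_min, d_cal)
```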

### 5.4 Experimental Results

The ADC was fabricated using a 90 nm digital CMOS technology with one layer of polysilicon and six layers of metal. The ADC chip micrograph is shown in Figure 5.18. It occupies an active area of 0.36 mm<sup>2</sup>. Operating at 100 MHz sampling frequency, the ADC core consumes a total power of 6 mW from a 1 V supply. The single-ended input swing range can be as high as 1 V, equal to the supply voltage VDD. The input capacitance of this ADC is about 1.2 pF, which includes the input capacitance of RAMP and CADC, and other parasitic capacitance. All digital circuits including calibration processor, clock generator and clock buffers, dissipate 1.4 mW at 100 MS/s sampling rate.

Figure 5.19 shows the simplified block diagram of the instrumentation setup. The input signal is generated by an Agilent E4438C signal generator and passes through a band-pass filter (BPF) to an ADT1-1WT transformer. The transformer network generates the differential input signals AIP and AIN for the ADC chip. The clock signal is generated by an HP 8648C frequency synthesizer and is connected to another ADT1-1WT transformer to generate the differential clock signals CKP and CKN. The frequency of the clock signals is twice the ADC's sampling frequency. These two sine wave signals are translated into digital clock signals by a built-in clock receiver and then passed to a clock generator in the ADC chip, which generates all necessary global clock signals. The digital code DOUT is captured using an Agilent 16702B logic analyzer and then analyzed with a Matlab program. The digital data DOUT are synchronized with the output clock signal CKO, which is generated by the ADC digital calibration processor.

Figure 5.18: ADC chip micrograph.

Figure 5.19: Block diagram of the instrumentation setup.

Figure 5.20: Measured CADC differential nonlinearity (DNL).

The differential nonlinearity (DNL) and integral nonlinearity (INL) are measured using a code-density test [78] with a 1 MHz sine wave input. To verify the comparator offset compensation, the 5-bit CADC output codes are dumped to measure the CADC linearity. Figure 5.20 shows the measured DNL of the CADC before and after the comparator offset compensation. The DNL is $-1/+1$ LSB before calibration and improves to $-0.25/+0.25$ LSB after calibration. The proposed offset compensation scheme effectively reduces the equivalent input offset of the latch-type comparator.

Figure 5.21 and Figure 5.22 show the measured DNL and INL of the ADC before and after the RAMP digital calibration with a 1 MHz sinusoidal input. A total of 65536 output samples are collected for the code-density test. The calibration improves the DNL from $-1/+4$ LSB to $-0.5/+0.6$ LSB and eliminates the missing codes. The INL is improved from $-17/+18$ LSB to $-0.9/+0.9$ LSB.
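A code-density (sine-wave histogram) test of this kind can be sketched in a few lines of Python with NumPy. The version below is a generic textbook formulation, not the Matlab program used for the measurements; it assumes a sine input that slightly overdrives the converter and unsigned output codes, and the 6-bit ideal quantizer in the example is purely illustrative.

```python
import numpy as np

def sine_histogram_linearity(codes, n_bits):
    """Estimate DNL and INL (in LSB) from codes captured with a sine input.
    Codes must be integers in [0, 2**n_bits - 1]; the two end codes are excluded."""
    n_codes = 2 ** n_bits
    hist = np.bincount(codes, minlength=n_codes).astype(float)
    cum = np.cumsum(hist)[:-1] / hist.sum()   # P(code <= k), k = 0 .. n_codes - 2
    # For a sine of amplitude A, P(v < T) = 1/2 + arcsin(T/A)/pi, so T/A = -cos(pi*P).
    transitions = -np.cos(np.pi * cum)
    widths = np.diff(transitions)             # widths of codes 1 .. n_codes - 2
    dnl = widths / widths.mean() - 1.0
    inl = np.cumsum(dnl)
    return dnl, inl

# Example with an ideal 6-bit quantizer and coherent sampling (illustrative values).
n_bits, n_samples, cycles = 6, 2 ** 16, 1021  # odd cycle count, coprime with n_samples
phase = 2 * np.pi * cycles * np.arange(n_samples) / n_samples + 0.1234
v = 1.05 * np.sin(phase)                      # 5 % overdrive of a +/-1 full scale
codes = np.clip(np.floor((v + 1.0) / 2.0 * 2 ** n_bits), 0, 2 ** n_bits - 1).astype(int)
dnl, inl = sine_histogram_linearity(codes, n_bits)
print(f"max |DNL| = {np.abs(dnl).max():.3f} LSB, max |INL| = {np.abs(inl).max():.3f} LSB")
```

For an ideal quantizer both estimates stay close to zero. A code that never occurs shows up as a DNL of $-1$ LSB, which is how missing codes appear in this test.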

Figure 5.23 shows the ADC output spectra before and after the RAMP nonlinear calibration at a 100 MS/s sampling rate. The input is a 1 MHz sinusoidal signal with a differential amplitude of 1.9 $V_{pp}$. Without the proposed calibration, the signal-to-noise-plus-distortion ratio (SNDR) is 35 dB and the spurious-free dynamic range (SFDR) is 43 dB. After the calibration is activated, the SNDR is improved by 23 dB to 58 dB and the SFDR is improved by 32 dB to 75 dB. Figure 5.24 shows the ADC output spectrum with an input frequency of 40 MHz. The measured SNDR and SFDR are 56 dB and 67 dB respectively.

Figure 5.25 shows the ADC dynamic performance versus input frequency at a 100 MHz sampling rate. The measured SFDR degrades gradually towards higher input frequencies. This is caused by the mismatch between the distributed input track-and-holds of the CADC and the track-and-hold of the RAMP. The resonance around the 40 MHz input frequency may come from the external transformer circuits, but this could not be verified. The SNDR degradation at higher input frequencies is due to the sampling clock jitter. The effective resolution bandwidth (ERBW) is about 46 MHz.

Figure 5.26 shows the ADC dynamic performance versus sampling rate. The input is a 1 MHz sinusoidal signal. The SFDR is higher than 66 dB up to a 160 MS/s sampling rate, and the SNDR is maintained at 56 dB up to a 150 MS/s sampling rate. The SFDR begins to degrade for sampling rates higher than 100 MS/s. This is mainly due to the incomplete settling of the RAMP.

Figure 5.22: Measured ADC integral nonlinearity (INL).

Figure 5.23: Measured output spectrum at 100MS/s before and after RAMP's calibration.

Figure 5.24: Measured output spectrum with 40MHz input frequency at 100MS/s.

Figure 5.26: Dynamic performance versus sampling frequency.

| Parameter                             | Value       |
|---------------------------------------|-------------|
| Technology                            | 90 nm CMOS  |
| Supply Voltage (V)                    | 1.0         |
| Resolution (bit)                      | 10          |
| Sampling Rate (MHz)                   | 100         |
| Input Range ($V_{pp}$ differential)   | 2.0         |
| Input Loading (pF)                    | 1.2         |
| DNL (LSB)                             | $+0.6/-0.5$ |
| INL (LSB)                             | $+0.9/-0.9$ |
| SNDR (dB) ($F_{in} = 1$ MHz)          | 58          |
| SNDR (dB) ($F_{in} = 50$ MHz)         | 53.7        |
| SFDR (dB) ($F_{in} = 1$ MHz)          | 75          |
| SFDR (dB) ($F_{in} = 50$ MHz)         | 64          |
| THD (dB) ($F_{in} = 1$ MHz)           | $-70$       |
| THD (dB) ($F_{in} = 50$ MHz)          | $-60$       |
| Power Consumption (mW)                | 6           |
| FOM1 (fJ/conv.-step)                  | 92          |
| FOM2 (fJ-V/conv.-step)                | 100         |
| Active Area (mm$^2$)                  | 0.36        |

Table 5.2: Performance Summary

Table 5.2 summarizes the measured specifications of this ADC chip. To evaluate the performance of ADC architectures, FOM1 is the same as the FOM defined in Section 2.8. To account for the design difficulty in nano-scale CMOS technologies, a second figure of merit, FOM2, is also applied. FOM1 and FOM2 are defined as follows:

$$
FOM1 = \frac{Power}{\min(F_s, 2ERBW) \times 2^{ENOB_{DC}}} \tag{5.15}
$$

$$
FOM2 = \frac{Power}{\min(F_s, 2ERBW) \times 2^{ENOB_{DC}}} \times VDD \tag{5.16}
$$

where  $F_s$  is the ADC's sampling frequency, ERBW is the ADC's effective resolution bandwidth and ENOB<sub>DC</sub> is the ADC's effective number of bits at low input frequency. The supply voltage VDD is added to complete the FOM consideration [69].
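As a worked example of Equations (5.15) and (5.16), the Python sketch below plugs in the measured numbers reported in this chapter (6 mW, 100 MS/s, 46 MHz ERBW, 58 dB low-frequency SNDR, 1 V supply). The conversion from SNDR to ENOB uses the standard relation ENOB = (SNDR − 1.76)/6.02, which is assumed here; the function names are illustrative.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR (standard sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02

def fom1_fj(power_w, fs_hz, erbw_hz, enob_dc):
    """Eq. (5.15), returned in fJ per conversion step."""
    return power_w / (min(fs_hz, 2.0 * erbw_hz) * 2.0 ** enob_dc) * 1e15

def fom2_fj_v(power_w, fs_hz, erbw_hz, enob_dc, vdd_v):
    """Eq. (5.16), returned in fJ-V per conversion step."""
    return fom1_fj(power_w, fs_hz, erbw_hz, enob_dc) * vdd_v

enob = enob_from_sndr(58.0)
fom1 = fom1_fj(6e-3, 100e6, 46e6, enob)
fom2 = fom2_fj_v(6e-3, 100e6, 46e6, enob, 1.0)
print(f"ENOB = {enob:.2f} bit, FOM1 = {fom1:.1f} fJ, FOM2 = {fom2:.1f} fJ-V")
```

With these inputs the sketch gives roughly 100 fJ/conversion-step, in line with the FOM values listed for this work in Table 5.3. Because 2·ERBW (92 MHz) is below the sampling rate, the bandwidth term rather than $F_s$ sets the denominator.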

| Design                 | [79] | [80] | [81] | [82] | [83] | [36] | This work |
|------------------------|------|------|------|------|------|------|-----------|
| Technology (nm)        | 90   | 90   | 130  | 65   | 90   | 90   | 90        |
| Supply (V)             | 0.8  | 1.0  | 1.2  | 1.2  | 1.2  | 1.0  | 1.0       |
| Power (mW)             | 6.5  | 33   | 19.2 | 1.78 | 1.44 | 4.5  | 6         |
| $F_s$ (MHz)            | 80   | 100  | 60   | 26   | 50   | 100  | 100       |
| SNDR (dB)              | 55   | 55.3 | 56   | 54.3 | 49.4 | 55   | 58        |
| FOM1 (fJ/conv.-step)   | 176  | 694  | 621  | 162  | 119  | 98   | 100       |
| FOM2 (fJ-V/conv.-step) | 162  | 694  | 497  | 195  | 144  | 98   | 100       |

Table 5.3: 10-bit ADCs comparison

### 5.5 Summary

A 10-bit 100-MS/s two-step ADC fabricated in a 90 nm CMOS technology is presented. It takes advantage of the nanoscale technology to achieve low power dissipation. Its internal coarse and fine ADCs are realized with latch-type comparators whose accuracy is enhanced by the proposed offset compensation technique. The gain error and nonlinearity of the open-loop residue amplifier are corrected in the digital domain by the proposed calibration processor. The ADC consumes only 6 mW from a single 1 V supply.

To tolerate the non-idealities of MOS transistors and passive elements, traditional designs use larger devices to meet the matching requirement, which is a power-hungry approach. In recent years, digitally calibrated ADCs have become more energy-efficient than traditional ADCs. Digital calibration effectively relaxes the matching requirement and also improves the ADC operation speed. Besides reducing power consumption, it maintains design robustness in scaled CMOS technologies with low supply voltages. A low supply voltage limits the accuracy of analog circuits such as opamps. A digitally enhanced ADC can use simple analog circuits and leave their non-idealities, such as gain error and nonlinearity, to be corrected digitally.

Table 5.3 compares this work with other 10-bit ADCs published in recent years. SAR ADCs are not included in this table, since their main issues, such as input capacitance, reference power consumption and speed limitation, are usually not considered in the FOM computation.



# Chapter 6

## Conclusions and Future Works

#### 6.1 Conclusions

In scaled CMOS technologies, high-performance analog circuits are difficult to implement with low power consumption. To adapt to this condition, digitally enhanced analog circuit design techniques have become popular as a way to achieve low power consumption and high integration. Digital calibration techniques are particularly useful for SoC integration, since they benefit from CMOS scaling.

Low-power ADC design is a good example of this benefit. In a two-step ADC architecture, both the residue amplifier and the comparators in the coarse and fine ADCs are key analog circuits. In traditional designs, averaging and offset storage techniques are applied to improve the comparator offset at the cost of additional power consumption. In this thesis, an offset compensation technique is proposed to cancel the input offset with ultra-low power consumption.

An accurate residue amplifier is even more difficult to realize with low power consumption in scaled CMOS technologies with low supply voltages. In this thesis, the residue amplifier is implemented as a simple open-loop amplifier, which exhibits non-idealities due to the scaled CMOS transistors. An open-loop amplifier provides better design flexibility for noise-limited ADCs. Compared with closed-loop amplifiers, its power dissipation can be independent of the sampling capacitance. Its non-idealities, including gain error and nonlinearity, are corrected by the proposed calibration processor in the digital domain. The calibration processor operates in the background without interrupting the normal ADC operation. The ADC performance can be greatly improved by using digital calibration techniques.

A 10-bit 100-MS/s two-step ADC fabricated using a 90 nm CMOS technology has been demonstrated to evaluate the proposed calibration techniques. This ADC uses a distributed input track-and-hold network to eliminate the power consumption of a dedicated THA. The network also extends the input dynamic range to the supply voltage (single-ended swing range). With offset-compensated comparators and a digitally enhanced residue amplifier, the proposed ADC consumes a total power of only 6 mW at a 100 MS/s sampling rate. The digital calibration processor consumes less than 1 mW. The ADC prototype occupies a die area of 0.36 mm<sup>2</sup> and its figure of merit is 100 fJ/conversion-step. Compared with traditional two-step ADCs, the power consumption of the proposed ADC is greatly reduced.

#### 6.2 Recommendations for Future Investigation

This section presents several suggestions for future investigations in low power ADC design.

- The proposed offset compensation for the comparator requires two comparisons per conversion, which still wastes half of the comparator power dissipation. Digital estimation [15] has been proposed, but it needs more digital circuits to operate. A simpler offset estimation scheme should be developed to save more power and occupy a smaller die area in comparator-based ADC architectures.
- The proposed calibration technique is suitable for the two-step ADC architecture, where the sub-DAC error can be ignored because the linearity of the resistor string is well maintained. When it is applied to the pipelined ADC architecture, the sub-DAC error can be reduced by improving capacitor matching with larger capacitance. To relax the capacitor matching requirement, the proposed calibration technique should be modified to be insensitive to sub-DAC error.
- The residue amplifier is implemented with an open-loop amplifier. However, it is still an analog circuit with a limited output dynamic range. Although this amplifier is easy to implement, even under process migration, it still suffers from its limited output dynamic range, which constrains the accuracy requirement of the next-stage sub-ADC. A 'digital-like' residue amplifier should offer easier implementation, a larger output dynamic range and lower power consumption.
- Considering the SFDR issue of the distributed input track-and-hold (T/H) used in subranging or two-step ADCs, a dedicated THA is better for SFDR performance. However, designing a THA with low power consumption and a large input dynamic range is difficult. If a distributed T/H is used in the ADC design, a systematic design methodology is necessary, since most ADC architectures use this technique to save power and enlarge the ADC input dynamic range.
- The proposed calibration technique is applied to a 10-bit 100-MS/s ADC. To evaluate its effectiveness further, the technique should be applied to an ADC with higher resolution ($\geq 12$ bits) and a faster sampling rate ($\geq 200$ MS/s). A 12-bit ADC can be implemented with a two-step or a pipelined architecture. A two-step ADC uses a resistor string to generate the necessary reference voltages, so the linearity of the resistor string must reach 12-bit accuracy. A pipelined ADC is implemented with capacitors, so capacitor matching is also a major concern. Gain and nonlinearity calibration techniques can effectively relax these requirements.




# Appendix A

## Lyapunov Second Theorem on Stability

For a residue amplifier which has the non-idealities of gain error and nonlinearity, a signal compensator is proposed with two coefficients,  $b_1$  and  $b_3$ . The transfer function of the residue amplifier is represented as

$$
y_d = f(y) = a_1 y + a_3 y^3 \tag{A.1}
$$

In the above equation, only the gain error ($a_1$) and the third-order nonlinearity ($a_3$) are considered. The amplifier's offset can be extracted by the Z-ADC and then removed in the calibration processor. To compensate for these non-idealities, correct coefficients are necessary to maintain the overall ADC performance.

This appendix shows how to find the correct coefficients based on the Lyapunov second theorem on stability. To estimate the coefficients in the background, a random signal $q$ is applied to the estimated input $V_{da}$. For the three different values of $q$, namely $-1$, $0$ and $+1$, the output of the residue amplifier can be represented as

$$
\begin{aligned}
y_{d,0} &= a_1 \cdot y + a_3 \cdot y^3 \\
y_{d,+1} &= a_1 \cdot (y + d) + a_3 \cdot (y + d)^3 \\
y_{d,-1} &= a_1 \cdot (y - d) + a_3 \cdot (y - d)^3
\end{aligned} \tag{A.2}
$$

where $d$ is the calibration amount. In this thesis, $d$ is 8. As shown in Figure 4.21, the $y_d$ data are corrected by $b_1$ and $b_3$ to generate the corrected data $y_c$ as follows:

$$
\begin{aligned}
y_{c,0} &= b_1 \cdot y_{d,0} + b_3 \cdot y_{d,0}^3 \\
y_{c,+1} &= b_1 \cdot y_{d,+1} + b_3 \cdot y_{d,+1}^3 \\
y_{c,-1} &= b_1 \cdot y_{d,-1} + b_3 \cdot y_{d,-1}^3
\end{aligned} \tag{A.3}
$$

The correction makes  $y_{c,0} \approx y$ . Thus, we have

$$
b_1 = \frac{1}{a_1} \qquad b_3 = -\frac{a_3}{a_1^4} \tag{A.4}
$$

We neglect other high-order terms, which are treated as disturbances in the estimation process.
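The effect of Equation (A.4) can be checked numerically. The Python sketch below applies the third-order amplifier model of Equation (A.1) with the $a_1$ and $a_3$ values quoted for the simulation of Figure 5.16, corrects it with the coefficients of Equation (A.4), and compares the residual with a gain-only correction. The input range of $\pm 32$ is an assumption based on the 6-bit fine code, and the helper names are illustrative.

```python
import numpy as np

A1, A3 = 0.8, -5.5e-5   # gain and third-order coefficients quoted in the text

def amplifier(y, a1=A1, a3=A3):
    """Third-order residue-amplifier model of Eq. (A.1)."""
    return a1 * y + a3 * y ** 3

def inverse_coefficients(a1, a3):
    """First-order solution of Eq. (A.4)."""
    return 1.0 / a1, -a3 / a1 ** 4

def correct(y_d, b1, b3):
    """Signal compensator: y_c = b1*y_d + b3*y_d**3."""
    return b1 * y_d + b3 * y_d ** 3

b1, b3 = inverse_coefficients(A1, A3)
y = np.linspace(-32.0, 32.0, 641)            # assumed fine-code range
y_d = amplifier(y)
err_gain_only = np.abs(b1 * y_d - y).max()   # correct the gain error only
err_cubic = np.abs(correct(y_d, b1, b3) - y).max()
print(f"b1 = {b1:.4f}, b3 = {b3:.3e}")
print(f"max error: gain only {err_gain_only:.2f}, with b3 {err_cubic:.2f}")
```

The cubic term removes most of the error but not all of it. What remains comes from the fifth- and higher-order products that Equation (A.4) neglects, which the estimation process treats as disturbances.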

In Figure 4.22, the coefficient estimator collects the  $y_c$  data to extract information. The acquired variables are

$$
M = E[y_c] \approx E[y] \tag{A.5}
$$

$$
S = E[y_c^2] \approx E[y^2] \tag{A.6}
$$

$$
\begin{aligned}
E_1 &= H_1 + H_2 - 16 \\
&= E[y_{c,+1}] - E[y_{c,-1}] - 16 \\
&\approx 16(b_1 a_1 - 1) + 16(b_1 a_3 + b_3 a_1^3)(3S + 64)
\end{aligned} \tag{A.7}
$$

$$
\begin{aligned}
E_2 &= H_1 - H_2 \\
&= (E[y_{c,+1}] - E[y_{c,0}]) - (E[y_{c,0}] - E[y_{c,-1}]) \\
&\approx 384M(b_1 a_3 + b_3 a_1^3)
\end{aligned} \tag{A.8}
$$
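The closed-form approximations for $E_1$ and $E_2$ follow from expanding $(y \pm d)^3$ with $d = 8$. The Python sketch below checks this numerically under a simplifying assumption: the corrected output is modeled as an exact cubic, $y_c = g\,x + c\,x^3$ with $g = b_1 a_1$ and $c = b_1 a_3 + b_3 a_1^3$, and the same $y$ samples are used for all three values of $q$. The sample values and names are illustrative.

```python
import numpy as np

D = 8  # calibration amount d used in the thesis

def corrected(x, gain, c3):
    """Third-order model of the corrected output (higher-order terms dropped)."""
    return gain * x + c3 * x ** 3

def error_terms(y, gain, c3, d=D):
    """E1 and E2 built from the three conditional means."""
    m_plus = corrected(y + d, gain, c3).mean()
    m_zero = corrected(y, gain, c3).mean()
    m_minus = corrected(y - d, gain, c3).mean()
    h1, h2 = m_plus - m_zero, m_zero - m_minus
    return h1 + h2 - 2 * d, h1 - h2

def error_terms_closed_form(y, gain, c3):
    """Closed-form expressions for d = 8, using M = E[y] and S = E[y^2]."""
    m, s = y.mean(), (y ** 2).mean()
    return 16 * (gain - 1) + 16 * c3 * (3 * s + 64), 384 * m * c3

rng = np.random.default_rng(0)
y = rng.uniform(-16.0, 16.0, size=4096)
gain, c3 = 0.98, -4.0e-5                     # an arbitrary mis-calibrated state
e1, e2 = error_terms(y, gain, c3)
e1_cf, e2_cf = error_terms_closed_form(y, gain, c3)
print(e1, e1_cf, e2, e2_cf)
```

Both pairs agree to within floating-point rounding, and both error terms vanish when $g = 1$ and $c = 0$, which is the condition expressed by Equation (A.4).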

A positive semi-definite function *L* is defined as

$$
L = \frac{1}{2}E_1^2 + \frac{1}{2}E_2^2 \tag{A.9}
$$

By the Lyapunov second theorem on stability, if *L* satisfies

$$
L \ge 0 \quad \text{and} \quad \frac{dL}{dt} < 0 \tag{A.10}
$$

*L* is called a Lyapunov function candidate and the system is asymptotically stable. From Equation (A.9), the condition  $L \ge 0$  is always true. Since  $E_1$  and  $E_2$  are functions of  $b_1$ and  $b_3$ , which are varying with time, the condition  $dL/dt < 0$  can be rewritten as

$$
\frac{dL}{dt} = \frac{\partial L}{\partial b_1} \cdot \frac{db_1}{dt} + \frac{\partial L}{\partial b_3} \cdot \frac{db_3}{dt} < 0 \tag{A.11}
$$

One sufficient condition to satisfy the above inequality is

$$
\frac{\partial L}{\partial b_1} \cdot \frac{db_1}{dt} < 0 \quad \text{and} \quad \frac{\partial L}{\partial b_3} \cdot \frac{db_3}{dt} < 0 \tag{A.12}
$$

For a discrete-time system with slowly varying $b_1$ and $b_3$, the above inequalities become

$$
\frac{\partial L}{\partial b_1} \cdot (b_1[k+1] - b_1[k]) = -\alpha_1 < 0 \tag{A.13}
$$

$$
\frac{\partial L}{\partial b_3} \cdot (b_3[k+1] - b_3[k]) = -\alpha_3 < 0 \tag{A.14}
$$

where  $\alpha_1$  and  $\alpha_3$  are two positive variables. Thus, the difference equation for  $b_1$  estimation can be expressed as

$$
\begin{aligned}
b_1[k+1] &= b_1[k] - \alpha_1 \left(\frac{\partial L}{\partial b_1}\right)^{-1} \\
&= b_1[k] - \left[\alpha_1 \left(\frac{\partial L}{\partial b_1}\right)^{-2}\right] \times \frac{\partial L}{\partial b_1}
\end{aligned} \tag{A.15}
$$

Since  $\alpha_1(\partial L/\partial b_1)^{-2}$  is always positive, the above equation can be represented by

$$
b_1[k+1] = b_1[k] - \mu_1 \left(\frac{\partial L}{\partial b_1}\right) \tag{A.16}
$$

where  $\mu_1$  is a positive constant, denoted as the updating factor for  $b_1$ . With similar procedure, the estimation equation for  $b_3$  can also be obtained as

$$
b_3[k+1] = b_3[k] - \mu_3 \times \left(\frac{\partial L}{\partial b_3}\right) \tag{A.17}
$$

where  $\mu_3$  is a positive constant, denoted as the updating factor for  $b_3$ . From Equation (A.9),  $\partial L/\partial b_1$  and  $\partial L/\partial b_3$  are

$$
\frac{\partial L}{\partial b_1} = E_1 \cdot \frac{\partial E_1}{\partial b_1} + E_2 \cdot \frac{\partial E_2}{\partial b_1} \tag{A.18}
$$

$$
\frac{\partial L}{\partial b_3} = E_1 \cdot \frac{\partial E_1}{\partial b_3} + E_2 \cdot \frac{\partial E_2}{\partial b_3} \tag{A.19}
$$

Applying Equation (A.4), Equation (A.7) and Equation (A.8), the above equations can be rewritten as

$$
\frac{\partial L}{\partial b_1} = \frac{16}{b_1^4} \times \left[ E_1 \left( b_1^3 - (3S + 64)b_3 \right) - E_2(24Mb_3) \right] \tag{A.20}
$$

$$
\frac{\partial L}{\partial b_3} = \frac{16}{b_1^3} \times [E_1(3S + 64) + E_2(24M)] \tag{A.21}
$$
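The intermediate step behind Equations (A.20) and (A.21) can be written out as follows. This is a reconstruction under the assumption that $M$ and $S$ are treated as constants with respect to $b_1$ and $b_3$; the partial derivatives of $E_1$ and $E_2$ are taken from Equations (A.7) and (A.8), and then $a_1 = 1/b_1$ and $a_3 = -b_3/b_1^4$ are substituted from Equation (A.4).

$$
\begin{aligned}
\frac{\partial E_1}{\partial b_1} &= 16a_1 + 16a_3(3S + 64) = \frac{16}{b_1^4}\left[b_1^3 - (3S + 64)b_3\right] \\
\frac{\partial E_2}{\partial b_1} &= 384Ma_3 = -\frac{16}{b_1^4}(24Mb_3) \\
\frac{\partial E_1}{\partial b_3} &= 16a_1^3(3S + 64) = \frac{16}{b_1^3}(3S + 64) \\
\frac{\partial E_2}{\partial b_3} &= 384Ma_1^3 = \frac{16}{b_1^3}(24M)
\end{aligned}
$$

Inserting these four expressions into Equations (A.18) and (A.19) gives Equations (A.20) and (A.21).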

Equation (5.8) and Equation (5.9) shown in Section 5.3.5 are the simplified implementations of Equation (A.16) and Equation (A.17) respectively. The $Z_1$ and $Z_3$ variables defined in Equation (5.10) and Equation (5.11) are the simplified $\partial L/\partial b_1$ and $\partial L/\partial b_3$ shown in Equation (A.20) and Equation (A.21). The $16/b_1^4$ term in Equation (A.20) is neglected in calculating $Z_1$, since the term is always positive and thus does not affect the final value of $\text{sgn}(Z_1)$. The $16/b_1^3$ term in Equation (A.21) is likewise neglected in calculating $Z_3$. Therefore, the two updating parameters $Z_1$ and $Z_3$ are calculated as

$$
Z_1 = E_1 \left( b_1^3[k] - (3S + 64)b_3[k] \right) - E_2(24Mb_3[k]) \tag{A.22}
$$

$$
Z_1 = E_1 \left( b_1^3[k] - (3S + 64)b_3[k] \right) - E_2(24Mb_3[k]) \tag{A.22}
$$

$$
Z_3 = E_1(3S + 64) + E_2(24M) \tag{A.23}
$$
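A minimal Python sketch of how $Z_1$ and $Z_3$ could drive the coefficient updates is given below. Only the two $Z$ expressions come from Equations (A.22) and (A.23). The sign-based step rule, the step sizes and the example operating point are assumptions for illustration; the actual update rules are Equations (5.8) and (5.9).

```python
def z_terms(e1, e2, m, s, b1, b3):
    """Update directions Z1 and Z3 from Eqs. (A.22) and (A.23)."""
    z1 = e1 * (b1 ** 3 - (3 * s + 64) * b3) - e2 * (24 * m * b3)
    z3 = e1 * (3 * s + 64) + e2 * (24 * m)
    return z1, z3

def sign(x):
    """Return -1, 0 or +1."""
    return (x > 0) - (x < 0)

def update(b1, b3, z1, z3, mu1, mu3):
    """One sign-based descent step (assumed form, not taken from the thesis)."""
    return b1 - mu1 * sign(z1), b3 - mu3 * sign(z3)

# Example: amplifier gain 0.8, no nonlinearity, coefficients at b1 = 1, b3 = 0.
a1, b1, b3 = 0.8, 1.0, 0.0
m, s = 0.0, 100.0                      # hypothetical signal statistics
e1 = 16 * (b1 * a1 - 1)                # E1 with the cubic term equal to zero
e2 = 0.0
z1, z3 = z_terms(e1, e2, m, s, b1, b3)
mu1 = 2.0 ** -10                       # hypothetical step size
b1_new, b3_new = update(b1, b3, z1, z3, mu1, mu1 / 1024)
print(z1, z3, b1_new, b3_new)
```

In this example the corrected gain is too low, so $E_1$ and $Z_1$ are negative and the step increases $b_1$ toward $1/a_1$.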



# Appendix B

## Comparator Modeling

Comparators are widely used in all types of ADCs, so it is important to describe their comparison speed analytically. However, a comparator is not easy to model because, unlike other analog circuits, it is not biased at a single operating point. For a simple latch using cross-coupled inverters (also called back-to-back inverters), the delay analysis is given in [84]. This analysis gives a simple picture of how process-dependent a latch is. For the comparators used in ADCs, however, it may not describe the timing delay accurately enough. Although some analytical models are available [85, 61, 86, 87], several assumptions are needed to keep them simple. Nevertheless, comparator modeling is still necessary to give designers a simple picture of how the design parameters affect performance, such as comparison speed, input offset and input-referred noise.

### B.1 Comparison Speed

Here the comparison speed follows the description in [85]. For a latch-type comparator, after the clock goes high, the comparator goes through two operation phases: the discharging phase and the regenerative phase. For the proposed latch-type comparator, shown in Figure B.1, we consider the condition in which the input and reference signals are very close. The comparison time $t_{cmp}$ can be defined by two delays:

$$
t_{cmp} = t_o + t_{latch} \tag{B.1}
$$

where $t_o$ and $t_{latch}$ are the delays during the discharging phase and the regenerative phase respectively. In the discharging phase, transistors M13 and M14 are not yet turned on. In the regenerative phase, the cross-coupled inverters are activated to perform the regeneration.

During the discharging phase, the capacitive loading $C_L$ on the output nodes $V_{op}$ and $V_{on}$ is discharged from VDD to VDD$-V_{thp}$. The discharge delay $t_o$ is given by

$$
t_o = \frac{C_L V_{thp}}{I_p} \tag{B.2}
$$

where $C_L$ is the output loading on node $V_{op}$ or $V_{on}$, and $I_p$ is the larger of the two discharge currents, the other being $I_n$. Once the node voltages $V_{op}$ and $V_{on}$ fall below VDD$-V_{thp}$, M13 and M14 turn on and the comparator enters the regenerative phase.

The regenerative delay $t_{latch}$ is determined by the two cross-coupled inverters, which are constructed from the four transistors M13-M16. The delay is given by

$$
t_{latch} = \frac{C_L}{g_{m,eff}} \ln\left(2\frac{\Delta V_{out}}{V_o}\right) \tag{B.3}
$$

where  $g_{m,eff}$  is the effective trans-conductance of the cross-coupled inverters. This delay depends (in a logarithmic manner) on the initial difference  $V_o$  between the outputs  $V_{op}$  and  $V_{on}$  at the beginning of regenerative phase (i.e.  $t = t_o$ ). Based on Equation (B.2),  $V_o$  can be calculated by

$$
V_o = V_{op}(t = t_o) - V_{on}(t = t_o) \tag{B.4}
$$

$$
= V_{thp} - \frac{I_n t_o}{C_L} \tag{B.5}
$$

$$
= V_{thp} \left( 1 - \frac{I_n}{I_p} \right) \tag{B.6}
$$

The current difference $\Delta I_o = I_p - I_n$ between the two branches is much smaller than $I_p$ or $I_n$. Therefore, we can approximate $I_p \approx I_n \approx I_o/2$, and $V_o$ can be expressed as

$$
V_o = 2V_{thp} \frac{\Delta I_o}{I_o} \tag{B.7}
$$

The current $I_o$ is the summation of three tail currents, $I_{M4}$, $I_{M8}$ and $I_{M12}$, controlled by $V_{rp}$, $V_{rn}$ and $V_{cm}$. Assuming that M4, M8 and M12 are in the triode region and that M1, M2, M5, M6, M9 and M10 are in the saturation region, $I_{M4}$, $I_{M8}$ and $I_{M12}$ can be approximated as

$$
I_{M4} \approx 2\beta (V_{rp} - V_{thn})^2 \left(1 - \frac{0.75}{1 + \frac{VDD - V_{thn}}{V_{rp} - V_{thn}}}\right)^2 \tag{B.8}
$$

$$
I_{M8} \approx 2\beta (V_{rn} - V_{thn})^2 \left(1 - \frac{0.75}{1 + \frac{VDD - V_{thn}}{V_{rn} - V_{thn}}}\right)^2 \tag{B.9}
$$

$$
I_{M12} \approx 2\beta (V_{cm} - V_{thn})^2 \left(1 - \frac{0.75}{1 + \frac{VDD - V_{thn}}{V_{cm} - V_{thn}}}\right)^2 \tag{B.10}
$$

In the above three equations, all transconductance parameters $\beta$ and all threshold voltages are assumed to be equal. Moreover, for simplicity, the three currents can be assumed to be equal to $I_{M4}$ if the central comparator is considered. Furthermore, $I_{M4}$ can be taken as $I_o/2$ if the offset issue is ignored. This may cause the comparison time to deviate for different references, but the timing deviation should remain within the design margin.

From the operation of the source-coupled pair, $\Delta I_o$ can be expressed as

$$
\Delta I_o = \sqrt{\beta I_o} V_{ida} \tag{B.11}
$$

where $\beta$ refers to M1 and M2, and $V_{ida}$ is the voltage difference between the input nodes of the differential pair (M1-M4). It can be viewed as the resolution requirement of this comparator. With the current difference from Equation (B.11), the initial voltage difference of Equation (B.7) becomes

$$
V_o = V_{thp} \sqrt{\frac{4\beta}{I_o}} V_{ida} \tag{B.12}
$$

Therefore, the regenerative delay $t_{latch}$ can be expressed as

$$
t_{latch} = \frac{C_L}{g_{m,eff}} \ln\left(\frac{1}{V_{thp}}\sqrt{\frac{I_o}{\beta}} \frac{\Delta V_{out}}{V_{ida}}\right) \tag{B.13}
$$

With Equation (B.2) and Equation (B.13), the total comparison time $t_{cmp}$ can be expressed as

$$
t_{cmp} = \frac{4C_L V_{thp}}{3I_o} + \frac{C_L}{g_{m,eff}} \ln\left( \frac{1}{V_{thp}} \sqrt{\frac{I_o}{\beta}} \frac{\Delta V_{out}}{V_{ida}} \right) \tag{B.14}
$$

where $I_o = I_{M4} + I_{M8}$. Considering the effect of the series-connected nMOS transistors M3 and M7, the output current can be modified to $I_o = \alpha \cdot (I_{M4} + I_{M8})$ with $\alpha < 0.5$. The actual value of $\alpha$ depends on the bias voltage $V_{bx}$. For the output swing $\Delta V_{out}$, regeneration down to VDD/2 is usually a better approximation, so the output swing can be replaced by $\Delta V_{out} = \text{VDD} - V_{thp} - \text{VDD}/2 = \text{VDD}/2 - V_{thp}$.

The propagation delay $t_{prop}$ from the output nodes $V_{op}$ and $V_{on}$ to the back-end latch or flip-flop must also be considered, $t_{prop} = K \cdot t_{gate}$. The total comparison time can then be represented as

$$
t_{cmp} = \frac{4C_L V_{thp}}{3I_o} + \frac{C_L}{g_{m,eff}} \ln\left(\frac{1}{V_{thp}} \sqrt{\frac{I_o}{\beta}} \frac{VDD/2 - V_{thp}}{V_{ida}}\right) + K \cdot t_{gate} \tag{B.15}
$$

where $t_{gate}$ is the intrinsic gate delay and $K$ expresses the propagation delay in units of $t_{gate}$.

According to Equation (B.15), $I_o$ determines the discharging time if $C_L$ is set by the minimum noise requirement and $V_{thp}$ is a process parameter. To reduce the regenerative time, $g_{m,eff}$ can be made larger if the wire loading dominates the capacitance $C_L$.
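Equation (B.15) is easy to explore numerically. The Python sketch below evaluates its three terms; every parameter value in the example is hypothetical and chosen only to give plausible orders of magnitude, not taken from the design.

```python
import math

def comparison_time(c_l, v_thp, i_o, gm_eff, beta, v_ida, vdd, k_gates, t_gate):
    """Eq. (B.15): return (t_cmp, t_o, t_latch, t_prop) in seconds."""
    t_o = 4.0 * c_l * v_thp / (3.0 * i_o)                # discharging phase
    dv_out = vdd / 2.0 - v_thp                           # output swing
    t_latch = (c_l / gm_eff) * math.log(
        math.sqrt(i_o / beta) * dv_out / (v_thp * v_ida))  # regenerative phase
    t_prop = k_gates * t_gate                            # back-end logic delay
    return t_o + t_latch + t_prop, t_o, t_latch, t_prop

# Hypothetical operating point (illustrative only).
params = dict(c_l=10e-15, v_thp=0.3, i_o=100e-6, gm_eff=200e-6,
              beta=1e-3, vdd=1.0, k_gates=3, t_gate=20e-12)
t_cmp, t_o, t_latch, t_prop = comparison_time(v_ida=1e-3, **params)
_, _, t_latch_small, _ = comparison_time(v_ida=1e-4, **params)
print(f"t_o = {t_o * 1e12:.1f} ps, t_latch = {t_latch * 1e12:.1f} ps, "
      f"t_cmp = {t_cmp * 1e12:.1f} ps")
```

Because the input difference enters only through the logarithm, each tenfold reduction of $V_{ida}$ adds a fixed $(C_L/g_{m,eff})\ln 10$ to the regenerative delay, while the discharging and propagation terms are unchanged.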



Figure B.2: Comparator schematic for noise analysis.



Figure B.3: Large signal transient plot for three operating phase.

### B.2 Input-Referred Noise

To ensure that the comparator provides sufficient resolution, the input-referred noise must be evaluated at the design stage. Figure B.2 shows the comparator schematic for the input-referred noise analysis. For convenience of analysis, the comparator circuit is simplified by ignoring all pull-up pMOS transistors and the bias-controlled nMOS transistor. Loading capacitors $C_O$ and $C_X$ are added to model the wire loading and the parasitic capacitance of the transistors at the output and X nodes respectively.

During the comparator operation, after the clock signal $\phi_c$ goes high, the transistors operate in different regions: saturation, triode or off. According to the operating regions of these transistors, three phases are defined to complete the analysis. Separating the phases by time intervals is not strictly accurate, because for nanoscale CMOS transistors the boundaries between these regions are blurred. However, this simplification is a reasonable way to obtain simple analytical functions that describe the input-referred noise. In this section, the noise analysis is based on stochastic differential equations that describe the equivalent input-referred noise [61].

Figure B.3 shows the three operation phases after the clock signal goes high. In these phases, the notation $A_{i,j}$ denotes the parameter $A$ of transistor Mi in phase $j$. $V_{CM}$ is the input common-mode voltage. $V_{TPi}$ and $V_{TNj}$ are the threshold voltages of pMOS transistor Mi and nMOS transistor Mj, respectively. For further simplification, we assume that all pMOS transistors have the same threshold voltage, that all nMOS transistors also have the same threshold voltage, and that the body effect is negligible.

Phase 1 is defined as the time interval $(t_1 - t_0)$ during which the clock signal goes high and only transistors M0, M1 and M2 operate in the saturation region, as expressed by the following relation:

$$
v_X \ge \text{VDD} - V_{TN3} \tag{B.16}
$$

The initial instant $t_0$ is defined as the time at which M1 and M2 start conducting in saturation; the final instant $t_1$ occurs when $v_X$ (that is, $v_{xp}$ and $v_{xn}$) has discharged from VDD to VDD $-V_{TN3}$, turning on M3 and M4. The voltage change on the X nodes during the first phase is approximated as

$$
\Delta v_{X,1} = v_X(t_0) - v_X(t_1) \tag{B.17}
$$

$$
\approx V_{TN3} \tag{B.18}
$$

If the comparator input voltages are close to each other (only this critical condition is considered), the comparator is assumed to be balanced for noise analysis. At $t_1$, the noise variances at the output and X nodes reach the following values:

$$
E[v_{dO}^2(t_1)] = \frac{2kT}{C_O} \tag{B.19}
$$

$$
E[v_{dX}^{2}(t_{1})] = \frac{2kT}{C_{X}} + \frac{4kT\gamma g_{m1,1}t_{1}}{C_{X}^{2}} \tag{B.20}
$$

where $\gamma$ is the noise factor of the transistor; its value is 2/3 for long-channel devices and can be much higher (e.g. 4) for nanoscale CMOS transistors. The noise variance $E[v_{dX}^2(t_1)]$ is the initial condition for the output noise in phase 2.

Phase 2 is the time interval $(t_2 - t_1)$ during which M1-M4 are all in the saturation region; it is defined by

$$
v_X \ge V_{CM} - V_{TN1} \tag{B.21}
$$

The above relation defines the final instant $t_2$, which corresponds to M1 and M2 leaving saturation. The voltage change on the X nodes in phase 2 is approximated by

$$
\Delta v_{X,2} = v_X(t_1) - v_X(t_2) \tag{B.22}
$$

$$
\approx \text{VDD} - V_{CM} \tag{B.23}
$$

In phase 2, positive feedback starts to operate because M3 and M4 conduct. The variance at the output nodes is expressed as

$$
E[v_{dO}^{2}(t_2)] = \frac{C_X^2}{(C_O - C_X)^2} \frac{(t_2 - t_1)^2}{\tau_2^2} E[v_{dX}^{2}(t_1)] + \frac{(C_O - C_X + C_X \frac{t_2 - t_1}{\tau_2})^2}{(C_O - C_X)^2} E[v_{dO}^{2}(t_1)] + \frac{4kT\gamma g_{m3,2}(t_2 - t_1)}{C_O^2} \tag{B.24}
$$

where  $\tau_2$  is the time constant for this positive feedback loop, and is defined as

$$
\tau_2 = \frac{C_X C_O}{g_{m3,2}(C_O - C_X)}\tag{B.25}
$$

In practice, the time interval $(t_2 - t_1)$ is much smaller than the time constant $\tau_2$, so Equation (B.24) can be further simplified.

Phase 3 is defined as the time interval $(t_3 - t_2)$ during which only the cross-coupled inverters are active, since the influence of the input pair can be neglected. This is because the X nodes are almost grounded, which isolates the differential input pair.

$$
v_X \approx 0 \tag{B.26}
$$

The final instant $t_3$ can be regarded as the end of the exponential regenerative phase of the latch, when some transistors in the cross-coupled inverters enter the triode region and then the off region. The output variance at $t_3$ is

$$
E[v_{dO}^{2}(t_3)] = \left(\frac{2kT\gamma}{C_O} + E[v_{dO}^{2}(t_2)]\right) e^{\frac{2(t_3 - t_2)}{\tau_3}} \tag{B.27}
$$

where $\tau_3$ is the regeneration time constant,

$$
\tau_3 = \frac{C_O}{g_{m3,3} + g_{m5,3}} \tag{B.28}
$$

By substituting Equations (B.19), (B.20) and (B.24) into Equation (B.27), the total output variance at $t_3$ is obtained, including the contributions from all devices during all phases. To refer this noise to the input, the overall input-output equivalent gain $G_{eq}$ is needed:

$$
G_{eq} = -\frac{g_{m1,1}t_{1}g_{m3,2}(t_{2}-t_{1})}{C_{X}C_{O}}e^{\frac{(t_{3}-t_{2})}{\tau_{3}}} \tag{B.29}
$$

With input-output equivalent gain  $G_{eq}$ , the input-referred noise  $\sigma_n^2$  can be represented as

$$
\sigma_n^2 = \frac{E[v_{dO}^2(t_3)]}{G_{eq}^2} \tag{B.30}
$$

Equation (B.30), evaluated with the transistor parameters of the three phases, includes the noise from the input-pair transistors M1-M2 during phases 1 and 2; the noise from the cross-coupled inverter transistors M3-M6 in phase 2 (M3-M4) and phase 3 (M3-M6); and the noise sampled on the output and X nodes by the switches S1-S2 and S3-S4, respectively, at the onset of phase 1. The input-referred noise can be represented as

$$
\sigma_n^2 = \sigma_{M1}^2 + \sigma_{S1}^2 + \sigma_{M3-M5}^2 + \sigma_{S3}^2 \tag{B.31}
$$

The first term in Equation (B.31) denotes the contributions of M1 and M2 during phase 1,

$$
\sigma_{M1}^2 = \frac{4kT\gamma}{g_{m1,1}t_1} \tag{B.32}
$$

The second term in Equation (B.31) denotes the contributions of S1 and S2 (M1 and M2 act as the switches, with the noise sampled on $C_O$),

$$
\sigma_{S1}^2 = \frac{2kTC_X^2}{g_{m1,1}^2 t_1^2 C_O} + \frac{4kTC_X^2}{g_{m1,1}^2 t_1^2 g_{m3,2}(t_2 - t_1)} + \frac{2kTC_X^2 C_O}{g_{m1,1}^2 t_1^2 g_{m3,2}^2(t_2 - t_1)^2} \tag{B.33}
$$

The third term in Equation (B.31) denotes the contributions of M3-M6 during phase 2 and phase 3,

$$
\sigma_{M3-M5}^2 = \frac{4kT\gamma C_X^2}{g_{m1,1}^2 t_1^2 g_{m3,2} (t_2 - t_1)} + \frac{2kT\gamma C_X^2 C_O}{g_{m1,1}^2 t_1^2 g_{m3,2}^2 (t_2 - t_1)^2} \tag{B.34}
$$

The last term in Equation (B.31) denotes the contributions of S3 and S4 (M3 and M4 act as the switches, with the noise sampled on $C_X$),

$$
\sigma_{S3}^2 = \frac{2kTC_X}{g_{m1,1}^2 t_1^2} \tag{B.35}
$$

Rearranging the above four terms using the following substitutions,

$$
g_{m1,1}(t_1 - t_0) = \frac{2C_X \Delta v_{X,1}}{V_{ov1,1}} \tag{B.36}
$$

$$
g_{m3,2}(t_2 - t_1) = \frac{2I_{D3,2}C_X \Delta v_{X,2}}{V_{ov3,2}(I_{D1,2} - I_{D3,2})} \tag{B.37}
$$

these four terms can be represented as follows,

$$
\begin{aligned}
\sigma_{M1}^{2} &= \frac{2kT\gamma}{C_{X}F} \\
\sigma_{S1}^{2} &= \frac{kT}{2C_{O}F^{2}} + \frac{kT}{2C_{X}F^{2}H} + \frac{kTC_{O}}{8C_{X}^{2}F^{2}H^{2}} \\
\sigma_{M3-M5}^{2} &= \frac{kT\gamma}{2C_{X}F^{2}H} + \frac{kT\gamma C_{O}}{8C_{X}^{2}F^{2}H^{2}} \\
\sigma_{S3}^{2} &= \frac{kT}{2C_{X}F^{2}}
\end{aligned} \tag{B.38}
$$

Table B.1: Comparison between predicted, simulated and measured input-referred noise $\sigma_n$

| Design      | $\sigma_{eqn}$ | $\sigma_{sim}$ | $\sigma_{meas}$ |
|-------------|----------------|----------------|-----------------|
| 0.18 $\mu$m | 0.93 mV        | 0.8 mV         | 1.2 mV          |
| 90 nm       | 1.18 mV        | 1 mV           | 1.1 mV          |

where the two factors $F$ and $H$ are defined as

$$
F = \frac{V_{TN3}}{V_{ov1,1}} \tag{B.39}
$$

$$
H = \frac{\text{VDD} - V_{CM}}{V_{ov3,2}} \frac{I_{D3,2}}{I_{D1,2} - I_{D3,2}} \tag{B.40}
$$
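The relation between the phase-by-phase variances and the compact $kT/C$ terms can be checked numerically. The sketch below propagates (B.19) and (B.20) through (B.24) and (B.27), divides by $G_{eq}^2$ from (B.29), and compares the result with the sum of the four terms in (B.38). It uses the substitutions $g_{m1,1}t_1 = 2C_XF$ and $g_{m3,2}(t_2 - t_1) = 2C_XH$, which follow from (B.36), (B.37), (B.39) and (B.40). The function names, the temperature of 300 K and the sample values of $F$, $H$ and $\gamma$ are illustrative assumptions, not values from this work.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def closed_form_variance(c_x, c_o, f, h, gamma, temp=300.0):
    """Sum of the four input-referred noise terms of (B.38), in V^2."""
    kt = K_B * temp
    s_m1 = 2 * kt * gamma / (c_x * f)
    s_s1 = (kt / (2 * c_o * f**2)
            + kt / (2 * c_x * f**2 * h)
            + kt * c_o / (8 * c_x**2 * f**2 * h**2))
    s_m3 = (kt * gamma / (2 * c_x * f**2 * h)
            + kt * gamma * c_o / (8 * c_x**2 * f**2 * h**2))
    s_s3 = kt / (2 * c_x * f**2)
    return s_m1 + s_s1 + s_m3 + s_s3


def propagated_variance(c_x, c_o, f, h, gamma, temp=300.0):
    """Input-referred variance from (B.19), (B.20), (B.24), (B.27), (B.29).

    Requires c_o != c_x because (B.24) and (B.25) contain (C_O - C_X).
    """
    kt = K_B * temp
    a = 2 * c_x * f  # g_m1,1 * t_1, from (B.36) and (B.39)
    b = 2 * c_x * h  # g_m3,2 * (t_2 - t_1), from (B.37) and (B.40)

    var_o1 = 2 * kt / c_o                               # (B.19)
    var_x1 = 2 * kt / c_x + 4 * kt * gamma * a / c_x**2  # (B.20)

    # (t_2 - t_1) / tau_2, with tau_2 taken from (B.25)
    dt_over_tau2 = b * (c_o - c_x) / (c_x * c_o)

    # Output variance at t_2, Equation (B.24)
    var_o2 = (c_x**2 / (c_o - c_x)**2 * dt_over_tau2**2 * var_x1
              + ((c_o - c_x + c_x * dt_over_tau2) / (c_o - c_x))**2 * var_o1
              + 4 * kt * gamma * b / c_o**2)

    # The factor exp(2 (t_3 - t_2) / tau_3) appears in both (B.27) and
    # G_eq^2 from (B.29), so it cancels in (B.30) and is omitted here.
    var_o3_normalized = 2 * kt * gamma / c_o + var_o2
    gain_normalized = a * b / (c_x * c_o)
    return var_o3_normalized / gain_normalized**2


if __name__ == "__main__":
    args = (2e-15, 3e-15, 1.5, 2.0, 1.0)  # C_X, C_O, F, H, gamma (assumed)
    print(closed_form_variance(*args), propagated_variance(*args))
```

The two functions agree to within floating-point rounding, which is a useful sanity check when transcribing the individual terms.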

Equation (B.38) shows that the input-referred noise contributed by each device takes the usual $kT/C$ form with additional factors. The input-referred noise can be reduced either by increasing the capacitances $C_X$ and $C_O$ or by making the factors $F$ and $H$ as large as possible. The factor $F$ appears in every noise term, so it is the more effective handle on the overall noise. According to Equation (B.39), a smaller $V_{ov1,1}$ gives a larger $F$, which implies that a low input common-mode voltage yields lower input-referred noise. With the body effect taken into account, $V_{TN3}$ is also higher at a high common-mode voltage, which increases $F$, but this is less effective than reducing $V_{ov1,1}$. Another way to obtain a larger $F$ is to lower the discharging current or to increase the size of transistors M1 and M2. The factor $H$, which is set by phase-2 quantities, appears in only some of the terms in Equation (B.38). This indicates that adjusting transistors M1 and M2 is more effective than adjusting M3 and M4 for lowering the input-referred noise.

From the above discussion, a lower input-referred noise can be achieved through three design choices for the comparator:

- (1) Lower input common-mode voltage.
- (2) Lower discharging current.
- (3) Larger transistor size for the input differential pair.

To evaluate the accuracy of Equation (B.38), a comparison is given in [61]. Table B.1 compares the predicted, simulated and measured input-referred noise $\sigma_n$. Compared with the measured results, the predicted values ($\sigma_{eqn}$) deviate by about 25 percent and 10 percent for the 0.18 $\mu$m and 90 nm CMOS technologies, respectively.



An input-referred noise estimation from Equation (B.31) is shown in Figure B.4, for $C_O = 3$ fF and $C_X = 2$ fF. The two estimations are plotted against $F$ and $H$, respectively. The effect of $H$ is small: the resulting input-referred noise voltage stays between 1.2 mV and 1.6 mV. A smaller $F$, however, results in a larger noise voltage, which can reach 6 mV. This means that $V_{ov1,1}$ should be as small as possible. For low-resolution comparators, the input-referred noise may not be an issue. For higher-resolution requirements, such as the comparator in a SAR ADC, this noise can degrade the ADC's SNR. The simplest solution is to enlarge the capacitances $C_O$ and $C_X$, but this also slows down the comparison. To mitigate the input-referred noise of the latch comparator, the proposed $V_{bx}$-controlled nMOS transistor can reduce $V_{ov1,1}$ to obtain a larger $F$.
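The trend described above can be explored with a short script that evaluates the four terms of (B.38) for $C_O = 3$ fF and $C_X = 2$ fF while sweeping $F$ and $H$. This is only a sketch: the function name `sigma_n`, the temperature (300 K), the noise factor ($\gamma = 1$) and the sweep values are assumptions chosen for illustration, so the printed numbers are not expected to reproduce Figure B.4.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def sigma_n(c_x, c_o, f, h, gamma=1.0, temp=300.0):
    """RMS input-referred noise in volts, from the four terms of (B.38)."""
    kt = K_B * temp
    s_m1 = 2 * kt * gamma / (c_x * f)
    s_s1 = (kt / (2 * c_o * f**2)
            + kt / (2 * c_x * f**2 * h)
            + kt * c_o / (8 * c_x**2 * f**2 * h**2))
    s_m3 = (kt * gamma / (2 * c_x * f**2 * h)
            + kt * gamma * c_o / (8 * c_x**2 * f**2 * h**2))
    s_s3 = kt / (2 * c_x * f**2)
    return math.sqrt(s_m1 + s_s1 + s_m3 + s_s3)


if __name__ == "__main__":
    c_x, c_o = 2e-15, 3e-15  # capacitances quoted for Figure B.4
    sweep = (0.5, 1.0, 2.0, 4.0)  # hypothetical sweep points

    print("Sweep of F with H = 1:")
    for f in sweep:
        print(f"  F = {f:4.1f}  sigma_n = {sigma_n(c_x, c_o, f, 1.0) * 1e3:.2f} mV")

    print("Sweep of H with F = 1:")
    for h in sweep:
        print(f"  H = {h:4.1f}  sigma_n = {sigma_n(c_x, c_o, 1.0, h) * 1e3:.2f} mV")
```

Because $F$ appears in every term while $H$ appears in only some of them, raising $F$ lowers the total more than raising $H$ by the same ratio under these assumptions.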



## Appendix C

### Design Considerations

#### C.1 Sampling Clock Jitter

Consider an $N$-bit ADC with the differential inputs $V_{ip}$ and $V_{in}$,

$$
V_{ip} = A \cdot \sin(2\pi f_{in}t) \tag{C.1}
$$

$$
V_{in} = -A \cdot \sin(2\pi f_{in}t) \tag{C.2}
$$

where $4A$ is the differential peak-to-peak input range and $f_{in}$ is the input frequency. $V_{LSB}$ is then defined as

$$
V_{LSB} = \frac{4A}{2^N} = 2^{-N+2} \cdot A \tag{C.3}
$$

For the jitter-induced error, we differentiate $V_{ip}$ and $V_{in}$ with respect to time $t$ to obtain their maximum voltage deviations, $\Delta V_p$ and $\Delta V_n$.

$$
\Delta V_p = A \cdot 2\pi f_{in} \times \Delta t \tag{C.4}
$$

$$
\Delta V_n = -A \cdot 2\pi f_{in} \times \Delta t \tag{C.5}
$$

where $\Delta t$ is the jitter-induced random timing error, so $\Delta V_p$ and $\Delta V_n$ are also random variables. Here we assume that the clock jitter is the same for $V_{ip}$ and $V_{in}$. With differential operation, the input deviation is $\Delta V = \Delta V_p - \Delta V_n$. The standard deviation of the differential voltage, $\sigma(\Delta V)$, is

$$
\sigma(\Delta V) = (4\pi f_{in} A) \times \sigma_t \tag{C.6}
$$

To keep the error due to clock jitter small, $\Delta V$ must be less than $V_{LSB}/2$. This means that the standard deviation of $\Delta V$ must be less than $V_{LSB}/6$ for a $3\sigma$ confidence level.

$$
\sigma(\Delta V) < \frac{V_{LSB}}{6} \tag{C.7}
$$

With Equation (C.3), Equation (C.6) and Equation (C.7), the clock jitter specification can be represented as

$$
\sigma_t < \frac{1}{6} \cdot 2^{-N} \cdot (\pi f_{in})^{-1} \tag{C.8}
$$

where  $\sigma_t$  is the standard deviation of the clock jitter.

For example, for a 6-bit 2 GS/s ADC, $\sigma_t$ must be less than 830 fs at a 1 GHz input frequency. For a 10-bit 200 MS/s ADC, $\sigma_t$ must be less than 520 fs at a 100 MHz input frequency. For a 12-bit 100 MS/s ADC, $\sigma_t$ must be less than 260 fs at a 50 MHz input frequency. In general, for ADCs with higher resolution and higher sampling rates, clock jitter usually dominates the dynamic performance.
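The three examples can be reproduced directly from Equation (C.8). The following minimal sketch evaluates the bound $\sigma_t < 1/(6 \cdot 2^N \cdot \pi f_{in})$; the function name `max_jitter_rms` is a hypothetical helper introduced only for this illustration.

```python
import math


def max_jitter_rms(n_bits, f_in_hz):
    """Upper bound on RMS clock jitter in seconds, from Equation (C.8)."""
    return 1.0 / (6.0 * 2**n_bits * math.pi * f_in_hz)


if __name__ == "__main__":
    # (resolution in bits, input frequency in Hz) for the three cases in the text
    cases = [(6, 1e9), (10, 100e6), (12, 50e6)]
    for n_bits, f_in in cases:
        bound_fs = max_jitter_rms(n_bits, f_in) * 1e15
        print(f"{n_bits}-bit at {f_in / 1e6:.0f} MHz: sigma_t < {bound_fs:.0f} fs")
```

Each extra bit of resolution, or each doubling of the input frequency, halves the allowed jitter.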

#### C.2 Distributed Input Track-and-Hold

For a nanoscale CMOS ADC with digital enhancement, the power consumption is greatly reduced. If a dedicated track-and-hold amplifier (THA) is used, however, its power consumption will dominate the overall ADC power dissipation. In addition, the input dynamic range is limited by the output range of the operational amplifier. To avoid these two issues, the distributed T/H is widely used in low-power ADCs.

For subranging or two-step ADCs, although a passive T/H does not consume static power, two issues must be considered: the sampling clock quality and the variation in group delay among the samplers. Regarding the sampling clock quality for these ADC architectures, the resolution requirement ($\leq 6$ bits) is easy to achieve. The sampling clock for a distributed T/H raises two concerns: skew and jitter. Clock skew can be suppressed by using a tree-like network to drive the sampling switches. Clock jitter requires attention because additional clock buffers introduce additional jitter. For given resolution and input-frequency requirements, the extra clock jitter introduced by the clock buffers should be carefully considered, as discussed in Appendix C.1.



Figure C.1: Distributed input track-and-hold network, (a) queuing network and (b) tree network.

For nanoscale CMOS technologies, wire loading dominates the overall timing delay, so the input trace must be considered in the distributed T/H network. Figure C.1 shows two different input signal traces: a queuing network and a tree network. Here we assume that all samplers have the same switches and capacitors. The queuing-network input trace is easy to implement with minimum wire loading. However, it causes a different group delay for every sampler in the sub-ADC, so the sampled input voltages deviate more among $V_{1R}$ and $V_{1C}[i]$, for $i = 1 \cdots 32$. This variation in group delay introduces severe distortion at the ADC output and degrades the overall ADC linearity.

In contrast to Figure C.1(a), Figure C.1(b) shows the other input trace, the tree network. The number of levels of a tree network is defined as the number of common nodes that the input signal passes through. For example, Figure C.1(b) is a 2-level tree network. The more levels the tree has, the more balanced the sampled input signals are, and the sampled input deviation among $V_{1R}$ and $V_{1C}[i]$, for $i = 1 \cdots 32$, can be greatly reduced. Although the tree introduces extra wire resistance and loading capacitance, it still gives a smaller group-delay mismatch among the samplers in the sub-ADC.

For a high-resolution ADC with a high-frequency input signal, the input traces for the residue amplifier and the sub-ADC must be placed symmetrically to maintain the signal quality after sampling by the distributed T/H. If the samplers in the sub-ADC differ from those in the residue amplifier, the sampling switches and capacitors must be sized to keep the RC delays balanced. In a comparison between the queuing and 2-level tree networks, the former induced more than 12 dB of SFDR loss for a 10-bit ADC operating at 100 MS/s with a 10 MHz input frequency.



## Appendix D

### Nanometer CMOS Characteristics

For nanometer CMOS technologies, transistor characteristics become worse for analog circuitry: for example, lower intrinsic gain, lower supply voltage, and gate leakage. Two major issues are discussed here: the low supply voltage and the gate leakage. A low supply voltage makes it more difficult to design analog circuits that achieve the target performance with low power consumption. Gate leakage can cause severe functional failure in analog circuits if it is not considered at the design stage.

#### D.1 Supply Voltage

The decreasing supply voltage has become a major issue in analog circuit design as CMOS technology evolves. Both the signal-to-noise ratio (SNR) and the signal-to-noise-and-distortion ratio (SINAD) are greatly reduced by the decreasing supply voltage. Although the supply voltage has dropped from 2.5 V to 1.0 V, most analog circuits can still be designed to maintain their performance.

In a traditional design, the output swing can be VDD $- 2V_{dsat}$, where $V_{dsat}$ is the voltage needed to keep the device in the saturation region. $V_{dsat}$ can be 200 mV for a high supply voltage and 100 mV for a low supply voltage. For a 2.5 V supply, the output swing is 2.1 V. For a 1 V supply, the output swing drops to 0.8 V, so the signal power is only about one-seventh of that in the 2.5 V design. The equivalent noise power must be lowered accordingly to maintain the same SNR and SINAD.
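As a quick check of the one-seventh figure, using only the swings quoted above (0.8 V at a 1 V supply with $V_{dsat} = 100$ mV, and 2.1 V at a 2.5 V supply with $V_{dsat} = 200$ mV) and taking the signal power to scale with the square of the swing, the ratio is approximately

$$
\frac{(1.0 - 2 \cdot 0.1)^2}{(2.5 - 2 \cdot 0.2)^2} = \frac{0.8^2}{2.1^2} = \frac{0.64}{4.41} \approx 0.145 \approx \frac{1}{7}
$$

Keeping the same SNR would therefore require roughly a sevenfold reduction in noise power.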

Consider a 12-bit ADC design in which the supply voltage is reduced from 2.5 V to 0.5 V due to CMOS scaling. The output stage of a linear amplifier is shown in Figure D.1.

The sampling capacitor $C_s$ is connected to the output node $V_o$. The output signal swing is defined as VDD $- 2V_{dsat}$. With differential operation, the signal power ($\sigma_S^2$) and the noise power ($\sigma_n^2$) are defined as

$$
\sigma_S^2 = (\text{VDD} - 2V_{dsat})^2 / 2 \tag{D.1}
$$

$$
\sigma_n^2 = 2kT/C_s \tag{D.2}
$$

According to SNR definition,

$$
\text{SNR} = \sigma_S^2/\sigma_n^2 = C_s \cdot (\text{VDD} - 2V_{dsat})^2/4kT \tag{D.3}
$$

the sampling energy  $E_s$  can be represented as

$$
E_s = \frac{4 \cdot \text{SNR} \cdot (k \cdot T)}{\left(1 - 2 \cdot (V_{dsat}/\text{VDD})\right)^2} \tag{D.4}
$$

Figure D.2 shows how the supply voltage affects $C_s$ and $E_s$. To maintain sufficient SNR, the sampling capacitance must increase greatly as the supply voltage decreases, and the sampling energy increases as well. In general, the overall ADC energy $E$ is proportional to $E_s$, $E = K \cdot E_s$, where the ratio $K$ depends on the ADC architecture and circuit complexity. The result shows how a low supply voltage works against the trend toward low-power ADC design.
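The dependence of $C_s$ and $E_s$ on the supply voltage can be sketched numerically from Equation (D.3). The script below solves (D.3) for $C_s$ and then takes $E_s = C_s \cdot \text{VDD}^2$, which is an assumption made here for illustration. The target SNR of 74 dB (the ideal 12-bit quantization SNR), the fixed $V_{dsat}$ of 100 mV, the temperature of 300 K and the function names are likewise assumptions and not values taken from Figure D.2.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def sampling_capacitance(snr_db, vdd, v_dsat, temp=300.0):
    """Sampling capacitance in farads, solving Equation (D.3) for C_s."""
    snr = 10.0 ** (snr_db / 10.0)
    swing = vdd - 2.0 * v_dsat
    if swing <= 0.0:
        raise ValueError("VDD must exceed 2 * V_dsat")
    return 4.0 * K_B * temp * snr / swing**2


def sampling_energy(snr_db, vdd, v_dsat, temp=300.0):
    """Sampling energy in joules, assuming E_s = C_s * VDD^2."""
    return sampling_capacitance(snr_db, vdd, v_dsat, temp) * vdd**2


if __name__ == "__main__":
    snr_db = 6.02 * 12 + 1.76  # assumed target for a 12-bit ADC
    v_dsat = 0.1               # assumed constant over the sweep, in volts
    for vdd in (2.5, 1.8, 1.2, 1.0, 0.8, 0.5):
        c_s = sampling_capacitance(snr_db, vdd, v_dsat)
        e_s = sampling_energy(snr_db, vdd, v_dsat)
        print(f"VDD = {vdd:.1f} V  C_s = {c_s * 1e12:.3f} pF  E_s = {e_s * 1e12:.3f} pJ")
```

With a fixed $V_{dsat}$, both quantities grow as the supply voltage falls, and the growth accelerates as VDD approaches $2V_{dsat}$.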

For low-power ADC design, this situation is more severe. Designing analog circuits that maintain a given SNR or SINAD with a supply voltage below 1 V is a great challenge. Several possible solutions are described as follows.

One trend is to benefit from CMOS scaling by using comparator-based architectures. For example, the SAR ADC has become a popular architecture for low-power ADCs.

Another trend is to modify the amplifier circuits to provide a high output swing with lower power consumption, for example by using a low-gain amplifier with correlated level shifting to keep a sufficiently large signal swing [88].

Digital calibration techniques are also powerful for improving ADC performance under a low supply voltage. For example, pipelined ADCs can use amplifiers with non-idealities as residue amplifiers; a digital calibration processor then corrects these non-idealities, which allows both a low supply voltage and low power consumption.

However, the best solution for the coming ultra-low supply voltages (below 0.8 V) remains an open question.

#### D.2 Gate Leakage

Gate leakage is another phenomenon caused by CMOS scaling. The gate current, due to direct tunneling through the thin gate oxide, depends mainly on the gate-source voltage ($V_{gs}$) and the gate area ($WL$). Figure D.3 shows a simplified MOS transistor acting as a sampling capacitor ($C_G$) with a gate current ($i_G$) or a tunnel conductance ($g_{tunnel}$). If the gate current is small enough to be neglected, the gate node of the MOS transistor is capacitive; otherwise it is resistive. To determine whether the gate node is capacitive or resistive, a factor $f_{gate}$ is defined as [89]

$$
f_{gate} = \frac{g_{tunnel}}{2\pi C_G} \approx
\begin{cases}
1.5 \cdot 10^{16} \cdot v_{GS}^{2} \cdot e^{t_{ox}(v_{GS} - 13.6)} & \text{(nMOST)} \\
0.5 \cdot 10^{16} \cdot v_{GS}^{2} \cdot e^{t_{ox}(v_{GS} - 13.6)} & \text{(pMOST)}
\end{cases}
\tag{D.5}
$$

where $t_{ox}$ is in nm and $v_{GS}$ is in V. $f_{gate}$ can be treated as a process- and $v_{GS}$-dependent gate-leakage parameter. A pMOS transistor can have a smaller $f_{gate}$ than an nMOS transistor. For signal frequencies above $f_{gate}$, the input impedance is mainly capacitive; otherwise, it is mainly resistive. For 0.18 $\mu$m CMOS, the gate node is capacitive for input frequencies above 0.1 Hz, so gate leakage is not a problem unless the input frequency is far below 0.1 Hz. For 65 nm CMOS, however, the gate node is resistive for input frequencies below 1 MHz. Gate leakage therefore becomes a severe issue for nanoscale CMOS technologies, and high-impedance nodes must be carefully reviewed to avoid low-frequency operation.

In ADC designs, MOS transistors are often used as capacitors to store charge. Gate leakage causes a nonzero droop rate of the voltage across the MOS capacitor, as shown in Figure D.3. In [89], the droop rate of the MOS capacitor is given by

$$
\frac{dV_h}{dt} \approx -f_{gate} \tag{D.6}
$$





Figure D.2: Sampling capacitance $C_s$ and sampling energy $E_s$ versus supply voltage (V).



Figure D.3: Gate leakage on a high impedance node for (a) nMOS gate and (b) its equivalent circuit.

According to Equation (D.6), for track-and-hold circuits, the maximum hold time is given by

$$
\Delta t \approx \frac{\Delta V}{f_{gate}} \tag{D.7}
$$

For example, if $\Delta V$ is 1 mV and $f_{gate}$ is about 1 MHz, as quoted above for 65 nm CMOS, the maximum hold time is only in the nanosecond range. This means that thin-oxide MOS transistors are not suitable as capacitors for ADCs with low and medium sampling rates. Only thick-oxide transistors or metal-metal capacitors can be used to avoid the gate leakage issue, at the cost of lower area efficiency. For the popular SAR ADC architectures, gate leakage is a severe issue at higher resolutions.



### Bibliography

- [1] D. Buss, B. L. Evans, J. Bellay, W. Krenik, B. Haroun, D. Leipold, K. Maggio, J.-Y. Yang, and T. Moise, "SOC CMOS Technology for Personal Internet Products," *IEEE Trans. Electron Devices*, vol. 50, no. 3, pp. 546–556, Mar. 2003.
- [2] W. W. Yang, D. Kelly, I. Mehr, M. T. Sayuk, and L. Singer, "A 3-V 340-mW 14-b 75-Msample/s CMOS ADC With 85-dB SFDR at Nyquist Input," *IEEE J. Solid-State Circuits*, vol. 36, no. 12, pp. 1931–1936, Dec. 2001.
- [3] S. H. Lewis and P. R. Gray, "A Pipelined 5-Msample/s 9-bit Analog-to-Digital Converter," *IEEE J. Solid-State Circuits*, vol. SC-22, no. 6, pp. 954–961, Dec. 1987.
- [4] H.-C. Liu, Z.-M. Lee, and J.-T. Wu, "A 15-b 40-MS/s CMOS Pipelined Analog-to-Digital Converter With Digital Background Calibration," *IEEE J. Solid-State Circuits*, vol. 40, no. 5, pp. 1047–1056, May 2005.
- [5] C.-C. Hsu and J.-T. Wu, "A CMOS 33-mW 100-MHz 80-dB SFDR sample-and-hold amplifier," *IEICE Transactions on Electronics*, vol. E86-C, no. 10, pp. 2122–2128, Oct. 2003.
- [6] K. Kattmann and J. Barrow, "A Technique for Reducing Differential Non-Linearity Errors in Flash A/D Converters," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, Feb. 1991, pp. 170–171.
- [7] H. Pan and A. A. Abidi, "Spatial filtering in flash A/D converters," *IEEE Trans. Circuits Syst. II*, vol. 50, no. 8, pp. 424–436, Aug. 2003.
- [8] K. Kusumoto, A. Matsuzawa, and K. Murata, "A 10-b 20-MHz 30-mW Pipelined Interpolating CMOS ADC," *IEEE J. Solid-State Circuits*, vol. 28, no. 12, pp. 1200–1206, Dec. 1993.
- [9] B. P. Brandt and J. Lutsky, "A 75-mW, 10-b, 20-MSPS CMOS Subranging ADC with 9.5 Effective Bits at Nyquist," *IEEE J. Solid-State Circuits*, vol. 34, no. 12, pp. 1788–1795, Dec. 1999.
- [10] R. Poujois and J. Borel, "A low drift fully integrated MOSFET operational amplifier," *IEEE J. Solid-State Circuits*, vol. SC-13, no. 4, pp. 499–503, Aug. 1978.
- [11] S. Tsukamoto, W. G. Schofield, and T. Endo, "A CMOS 6-b, 400-MSample/s ADC with Error Correction," *IEEE J. Solid-State Circuits*, vol. 33, no. 12, pp. 1939–1947, Dec. 1998.
- [12] C. Donovan and M. P. Flynn, "A 'Digital' 6-bit ADC in 0.25-*µ*m CMOS," *IEEE J. Solid-State Circuits*, vol. 37, no. 3, pp. 432–437, Mar. 2002.
- [13] M. P. Flynn, C. Donovan, and L. Sattler, "Digital Calibration Incorporating Redundancy of Flash ADCs," *IEEE Trans. Circuits Syst. II*, vol. 50, no. 5, pp. 205–213, May 2003.
- [14] C. Paulus, H.-M. Blüthgen, M. Löw, E. Sicheneder, N. Brüls, A. Courtois, M. Tiebout, and R. Thewes, "A 4GS/s 6b Flash ADC in 0.13*µ*m CMOS," in *Symposium on VLSI Circuits Digest of Technical Papers*, Jun. 2004, pp. 420–423.
- [15] C.-C. Huang and J.-T. Wu, "A Background Comparator Calibration Technique for Flash Analog-to-Digital Converters," *IEEE Trans. Circuits Syst. I*, vol. 52, no. 9, pp. 1732–1740, Sep. 2005.
- [16] P. M. Figueiredo, P. Cardoso, A. Lopes, C. Fachada, N. Hamanishi, K. Tanabe, and J. Vital, "A 90nm CMOS 1.2V 6b 1GS/s Two-Step Subranging ADC," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, Feb. 2006, pp. 568–569.
- [17] H. Yu and M.-C. F. Chang, "A 1-V 1.25-GS/S 8-Bit Self-Calibrated Flash ADC in 90-nm Digital CMOS," *IEEE Trans. Circuits Syst. II*, vol. 55, no. 7, pp. 668–672, Jul. 2008.
- [18] K. Deguchi, N. Suwa, M. Ito, T. Kumamoto, and T. Miki, "A 6-bit 3.5GS/s 0.9-V 98-mW Flash ADC in 90-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 43, no. 10, pp. 2303–2310, Oct. 2008.
- [19] B. Verbruggen, P. W. M. Kuijk, and G. V. der Plas, "A 7.6mW 1.75GS/s 5 bit Flash A/D converter in 90nm digital CMOS," in *Symposium on VLSI Circuits Digest of Technical Papers*, Jun. 2008, pp. 14–15.
- [20] C.-Y. Chen, M. Q. Le, and K. Y. Kim, "A Low Power 6-bit Flash ADC With Reference Voltage and Common-Mode Calibration," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1041–1046, Apr. 2009.
- [21] C. chen Liu, S.-J. Chang, G.-Y. Huang, Y.-Z. Lin, C.-M. Huang, C.-H. Huang, L. Bu, and C.-C. Tsai, "A 10b 100MS/s 1.13mW SAR ADC with Binary-Scaled Error Compensation," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, Feb. 2010, pp. 386–387.
- [22] S.-W. M. Chen and R. W. Brodersen, "A 6b 600MS/s 5.3mW Asynchronous ADC in 0.13-*µ*m CMOS," *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2669–2680, Dec. 2006.
- [23] G. V. der Plas and B. Verbruggen, "A 150MS/s 133*µ*w 7b ADC in 90nm digital CMOS Comparator-Based Asynchronous Binary-Search sub-ADC," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, Feb. 2008, pp. 242–243.
- [24] V. Giannini, P. Nuzzo, V. Chironi, A. Baschirotto, G. V. der Plas, and J. Craninckx, "A 820*µ*W 9b 40MS/s Noise-Tolerant Dynamic-SAR ADC in 90nm Digital CMOS," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, Feb. 2008, pp. 238–239.
- [25] W. Liu, P. Huang, and Y. Chiu, "A 12b 22.5/45MS/s 3.0mW 0.059mm<sup>2</sup> CMOS SAR ADC Achieving Over 90dB SFDR," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, Feb. 2010, pp. 380–381.
- [26] D. J. Huber, R. J. Chandler, and A. A. Abidi, "A 10b 160MS/s 84mW 1V Subranging ADC in 90nm CMOS," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, Feb. 2007, pp. 454–455.
- [27] Y. Shimizu, S. Murayama, K. Kudoh, and H. Yatsuda, "A Split-Load Interpolation-Amplifier-Array 300MS/s 8b Subranging ADC in 90nm CMOS," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, Feb. 2008, pp. 552–553.
- [28] K. Ohhata, K. Uchino, Y. Shimizu, K. Oyama, and K. Yamashita, "Design of a 770-MHz, 70-mW, 8-bit Subranging ADC Using Reference Voltage Precharging Architecture," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 2881–2890, Dec. 2009.
- [29] H. van der Ploeg, G. Hoogzaad, H. A. H. Termeer, M. Vertregt, and R. L. Roovers, "A 2.5-V 12-b 54-Msample/s 0.25-*µ*m CMOS ADC in 1-mm<sup>2</sup> With Mixed-Signal Chopping and Calibration," *IEEE J. Solid-State Circuits*, vol. 36, no. 12, pp. 1859–1867, Dec. 2001.
- [30] H.-W. Chen, I.-C. Chen, H.-C. Tseng, and H.-S. Chen, "A 1-GS/s 6-Bit Two-Channel Two-Step ADC in 0.13-*µ*m CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 11, pp. 3051–3059, Nov. 2009.
- [31] Y.-H. Chung and J.-T. Wu, "A CMOS 6-mW 10-bit 100-MS/s Two-Step ADC," in *Proc. IEEE Asian Solid-State Circuits Conference*, Nov. 2009, pp. 137–140.
- [32] A. Panigada and I. Galton, "A 130 mW 100 MS/s Pipelined ADC With 69 dB SNDR Enabled by Digital Harmonic Distortion Correction," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3314–3328, Dec. 2009.
- [33] S. Devarajan, L. Singer, D. Kelly, S. Decker, A. Kamath, and P. Wilkins, "A 16 bit, 125 MS/s, 385 mW, 78.7 dB SNR CMOS Pipelined ADC," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3305–3313, Dec. 2009.
- [34] A. Verma and B. Razavi, "A 10b 500MHz 55mW CMOS ADC," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, Feb. 2009, pp. 18–19.
- [35] L. Brooks and H.-S. Lee, "A 12b, 50 MS/s, Fully Differential Zero-Crossing Based Pipelined ADC," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3329–3343, Dec. 2009.
- [36] Y.-C. Huang and T.-C. Lee, "A 10b 100MS/s 4.5mW Pipelined ADC with a Time Sharing Technique," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, Feb. 2010, pp. 300–301.
- [37] B. Murmann, "ADC Performance Survey 1997-2010," [Online]. Available: http://www.stanford.edu/~murmann/adcsurvey.html.
- [38] K. Uyttenhove and M. S. J. Steyaert, "Speed-Power-Accuracy Tradeoff in High-Speed CMOS ADCs," *IEEE Trans. Circuits Syst. II*, vol. 49, no. 4, pp. 280–287, Apr. 2002.
- [39] M. Choi and A. A. Abidi, "A 6-b 1.3-Gsample/s A/D Converter in 0.35-*µ*m CMOS," *IEEE J. Solid-State Circuits*, vol. 36, no. 12, pp. 1847–1858, Dec. 2001.
- [40] P. C. S. Scholtens and M. Vertregt, "A 6-b 1.6-Gsample/s Flash ADC in 0.18-*µ*m CMOS Using Averaging Termination," *IEEE J. Solid-State Circuits*, vol. 37, no. 12, pp. 1599–1609, Dec. 2002.
- [41] X. Jiang and M.-C. F. Chang, "A 1-GHz Signal Bandwidth 6-bit CMOS ADC With Power-Efficient Averaging," *IEEE J. Solid-State Circuits*, vol. 40, no. 2, pp. 532–535, Feb. 2005.
- [42] H. Kimura, A. Matsuzawa, T. Nakamura, and S. Sawada, "A 10-b 300-MHz Interpolated-Parallel A/D Converter," *IEEE J. Solid-State Circuits*, vol. 28, no. 4, pp. 438–446, Apr. 1993.
- [43] R. Poujois, B. Baylac, D. Barbier, and J. M. Lttel, "Low-Level MOS Transistor Amplifier using Storage Techniques," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, Feb. 1973, pp. 152–153.
- [44] B. Razavi and B. A. Wooley, "Design Techniques for High-Speed, High-Resolution Comparators," *IEEE J. Solid-State Circuits*, vol. 27, no. 12, pp. 1916–1926, Dec. 1992.
- [45] J. Mulder, C. M. Ward, C.-H. Lin, D. Kruse, J. R. Westra, M. Lugthart, E. Arslan, R. J. van de Plassche, K. Bult, and F. M. L. van der Goes, "A 21-mW 8-b 125MSample/s ADC in 0.09-*mm*<sup>2</sup> 0.13-*µ*m CMOS," *IEEE J. Solid-State Circuits*, vol. 39, no. 12, pp. 2116–2125, Dec. 2004.
- [46] C. Sandner, M. Clara, A. Santner, T. Hartig, and F. Kuttner, "A 6-bit 1.2-GS/s Low Power Flash-ADC in 0.13-*µ*m Digital CMOS," *IEEE J. Solid-State Circuits*, vol. 40, no. 7, pp. 1499–1504, July 2005.
- [47] A. Nikoozadeh and B. Murmann, "An Analysis of Latch Comparator Offset Due to Load Capacitor Mismatch," *IEEE Trans. Circuits Syst. II*, vol. 53, no. 12, pp. 1398–1402, Dec. 2006.
- [48] B. Verbruggen, J. Craninckx, M. Kujik, P. Warnbacq, and G. V. der Plas, "A 2.6mW 6b 2.2GS/s 4-times Interleaved Fully Dynamic Pipelined ADC in 40nm Digital CMOS," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, Feb. 2010, pp. 296–297.
- [49] Y. Tamba and K. Yamakido, "A CMOS 6b 500MSample/s ADC for a Hard Disk Drive Read Channel," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, Feb. 1999, pp. 324–325.
- [50] M.-J. Choe, B.-S. Song, and K. Bacrania, "A 13-b 40-MSample/s CMOS Pipelined Folding ADC with Background Offset Trimming," *IEEE J. Solid-State Circuits*, vol. 35, no. 12, pp. 1781–1790, Dec. 2000.
- [51] H. Okada, Y. Hashimoto, K. Sakata, T. Tsukada, and K. Ishibashi, "Offset Calibrating Comparator Array for 1.2-V, 6-bit, 4-Gsample/s Flash ADCs using 0.13 *µ*m generic CMOS technology," in *European Solid-State Circuits Conference*, Sep. 2003, pp. 711–714.
- [52] R. C. Taft, C. A. Menkus, M. R. Tursi, O. Hidri, and V. Pons, "A 1.8-V 1.6-GSample/s 8-b Self-Calibrating Folding ADC With 7.26 ENOB at Nyquist Frequency," *IEEE J. Solid-State Circuits*, vol. 39, no. 12, pp. 2107–2115, Dec. 2004.
- [53] M. Miyahara, Y. Asada, D. Paik, and A. Matsuzawa, "A Low-Noise Self-Calibrating Dynamic Comparator for High-Speed ADCs," in *Proc. IEEE Asian Solid-State Circuits Conference*, Nov. 2008, pp. 269–270.
- [54] Z. Cao, S. Yan, and Y. Li, "A 32mW 1.25GS/s 6b 2b/step SAR ADC in 0.13*µ*m CMOS," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, Feb. 2008, pp. 542–543.
- [55] M. Kijima, K. Ito, K. Kamei, and S. Tsukamoto, "A 6b 3GS/s Flash ADC with Background Calibration," in *Proceedings of the IEEE Custom Integrated Circuits Conference*, Jun. 2009, pp. 283–286.
- [56] E. Alpman, H. Lakdawala, L. R. Carley, and K. Soumyanath, "A 1.1V 50mW 2.5GS/s 7b Time-Interleaved C-2C ADC in 45nm LP Digital CMOS," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, Feb. 2009, pp. 76–77.
- [57] M. Yoshioka, K. Ishikawa, and T. Takayama, "A 10b 50MS/s 820*µ*W SAR ADC with On-Chip Digital Calibration," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, Feb. 2010, pp. 384–385.
- [58] T. Kobayashi, K. Nogami, T. Shirotori, and Y. Fujimoto, "A current-controlled latch sense amplifier and a static power-saving input buffer for low-power architecture," *IEEE J. Solid-State Circuits*, vol. 28, no. 4, pp. 523–527, Apr. 1993.
- [59] Y.-T. Wang and B. Razavi, "An 8-Bit 150-MHz CMOS A/D Converter," *IEEE J. Solid-State Circuits*, vol. 35, no. 3, pp. 308–317, Mar. 2000.
- [60] P. M. Figueiredo and J. C. Vital, "Kickback noise reduction techniques for CMOS latched comparators," *IEEE Trans. Circuits Syst. II*, vol. 53, no. 7, pp. 541–545, Jul. 2006.
- [61] P. Nuzzo, F. De Bernardinis, P. Terreni, and G. Van der Plas, "Noise Analysis of Regenerative Comparators for Reconfigurable ADC Architectures," *IEEE Trans. Circuits Syst. I*, vol. 55, no. 7, pp. 1441–1454, Jul. 2008.
- [62] B.-S. Song, M. F. Tompsett, and K. R. Lakshmikumar, "A 12-bit 1-Msample/s capacitor error-averaging pipelined A/D converter," *IEEE J. Solid-State Circuits*, vol. 23, no. 12, pp. 1324–1333, Dec. 1988.
- [63] Y. Chiu, "Inherently Linear Capacitor Error-Averaging Techniques for Pipelined A/D Conversion," *IEEE Trans. Circuits Syst. II*, vol. 47, no. 3, pp. 229–232, Mar. 2000.
- [64] Y.-M. Lin, B. Kim, and P. R. Gray, "A 13-bit 2.5-MHz Self-Calibrated Pipelined A/D Converter in 3-*µ*m CMOS," *IEEE J. Solid-State Circuits*, vol. 26, no. 4, pp. 628–636, Apr. 1991.
- [65] A. N. Karanicolas, H.-S. Lee, and K. L. Bacrania, "A 15-b 1-Msample/s Digitally Self-Calibrated Pipeline ADC," *IEEE J. Solid-State Circuits*, vol. 28, no. 12, pp. 1207–1215, Dec. 1993.
- [66] D. Fu, K. C. Dyer, S. H. Lewis, and P. J. Hurst, "A Digital Background Calibration Technique for Time-Interleaved Analog/Digital Converters," *IEEE J. Solid-State Circuits*, vol. 33, no. 12, pp. 1904–1911, Dec. 1998.
- [67] X. Wang, P. J. Hurst, and S. H. Lewis, "A 12-Bit 20-Msample/s Pipelined Analog-to-Digital Converter With Nested Digital Background Calibration," *IEEE J. Solid-State Circuits*, vol. 39, no. 11, pp. 1799–1808, Nov. 2004.
- [68] E. Siragusa and I. Galton, "A Digitally Enhanced 1.8-V 15-bit 40-MSample/s CMOS Pipelined ADC," *IEEE J. Solid-State Circuits*, vol. 39, no. 12, pp. 2126–2138, Dec. 2004.
- [69] Y. Chiu, C. W. Tsang, B. Nikolić, and P. R. Gray, "Least Mean Square Adaptive Digital Background Calibration of Pipelined Analog-to-Digital Converters," *IEEE Trans. Circuits Syst. I*, vol. 51, no. 1, pp. 38–46, Jan. 2004.
- [70] B. Murmann and B. E. Boser, "A 12-bit 75-MS/s Pipelined ADC Using Open-Loop Residue Amplification," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2040–2050, Dec. 2003.
- [71] C. R. Grace, P. J. Hurst, and S. H. Lewis, "A 12-bit 80-MSample/s Pipelined ADC With Bootstrapped Digital Calibration," *IEEE J. Solid-State Circuits*, vol. 40, no. 5, pp. 1038–1046, May 2005.
- [72] B. D. Sahoo and B. Razavi, "A 12-Bit 200-MHz CMOS ADC," *IEEE J. Solid-State Circuits*, vol. 44, no. 9, pp. 2366–2380, Sep. 2009.
- [73] U.-K. Moon and B.-S. Song, "Background Digital Calibration Techniques for Pipelined ADCs," *IEEE Trans. Circuits Syst. II*, vol. 44, no. 2, pp. 102–109, Feb. 1997.
- [74] I. Galton, "Digital Cancellation of D/A Converter Noise in Pipelined A/D Converters," *IEEE Trans. Circuits Syst. II*, vol. 47, no. 3, pp. 185–196, Mar. 2000.
- [75] P. C. Parks, "A. M. Lyapunov's stability theory 100 years on," vol. 9, pp. 275–303, 1992.
- [76] M. Dessouky and A. Kaiser, "Very Low-Voltage Digital-Audio *∆Σ* Modulator with 88-dB Dynamic Range Using Local Switch Bootstrapping," *IEEE J. Solid-State Circuits*, vol. 36, no. 3, pp. 349–355, Mar. 2001.
- [77] B. Razavi, "Principles of Data Conversion System Design," *IEEE Press*, 1995.
- [78] M. Burns and G. W. Roberts, "An Introduction to Mixed-Signal IC Test and Measurement," *Oxford University Press*, 2001.
- [79] M. Yoshioka, M. Kudo, T. Mori, and S. Tsukamoto, "A 0.8V 10b 80MS/s 6.5mW Pipelined ADC with Regulated Overdrive Voltage Biasing," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, Feb. 2007, pp. 452–453.
- [80] K. Honda, M. Furuta, and S. Kawahito, "A Low-Power Low-Voltage 10-bit 100-MSample/s Pipeline A/D Converter Using Capacitance Coupling Techniques," *IEEE J. Solid-State Circuits*, vol. 42, no. 4, pp. 757–765, Apr. 2007.
- [81] H.-C. Choi, Y.-J. Kim, S.-W. Yoo, S.-Y. Hwang, and S.-H. Lee, "A Programmable 0.8-V 10-bit 60-MS/s 19.2-mW 0.13-*µ*m CMOS ADC Operating Down to 0.5V," *IEEE Trans. Circuits Syst. II*, vol. 55, no. 4, pp. 319–323, Apr. 2008.
- [82] S.-K. Shin, Y.-S. You, S.-H. Lee, K.-H. Moon, J.-W. Kim, L. Brooks, and H.-S. Lee, "A Fully-Differential Zero-Crossing-Based 1.2V 10b 26MS/s Pipelined ADC in 65nm CMOS," in *Symposium on VLSI Circuits Digest of Technical Papers*, Jun. 2008, pp. 218–219.
- [83] J. Hu, N. Dolev, and B. Murmann, "A 9.4-bit, 50-MS/s, 1.44-mW Pipelined ADC Using Dynamic Source Follower Residue Amplification," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1057–1066, Apr. 2009.
- [84] D. A. Johns and K. Martin, "Analog Integrated Circuit Design," *John Wiley and Sons, Inc.*, 1997.
- [85] B. Wicht, T. Nirschl, and D. Schmitt-Landsiedel, "Yield and Speed Optimization of a Latch-Type Voltage Sense Amplifier," *IEEE J. Solid-State Circuits*, vol. 39, no. 7, pp. 1148–1158, Jul. 2004.
- [86] J. He, S. Zhan, D. Chen, and R. L. Geiger, "Analyses of Static and Dynamic Random Offset Voltages in Dynamic Comparators," *IEEE Trans. Circuits Syst. I*, vol. 56, no. 5, pp. 911–919, May 2009.
- [87] J. Kim, B. S. Leibowitz, J. Ren, and C. J. Madden, "Simulation and Analysis of Random Decision Errors in Clocked Comparators," *IEEE Trans. Circuits Syst. I*, vol. 56, no. 8, pp. 1844–1856, Aug. 2009.
- [88] B. R. Gregoire and U.-K. Moon, "An Over-60 dB True Rail-to-Rail Performance Using Correlated Level Shifting and an Opamp With Only 30 dB Loop Gain," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2620–2630, Dec. 2008.
- [89] A.-J. Annema, B. Nauta, R. van Langevelde, and H. Tuinhout, "Analog Circuits in Ultra-Deep-Submicron CMOS," *IEEE J. Solid-State Circuits*, vol. 40, no. 1, pp. 132–143, Jan. 2005.






# Vita



Yung-Hui Chung received the B.S. and M.S. degrees in control engineering from National Chiao-Tung University, Hsin-Chu, Taiwan, in 1992 and 1994, respectively. He is currently working towards the Ph.D. degree in electronics engineering at National Chiao-Tung University.

From 1994 to 1997 he was at OES/ITRI, working on system emulation of optical disk drives. From 1997 to 1998 he was at ERSO/ITRI, working on analog circuit design. From 1998 to 1999 he was at Global Unichip Corp., working on PLL circuit design. From 1999 to 2003 he was at Faraday Technology Corp., working on PLL and DLL circuit design. His research interests include high-speed data converters and clock generation circuits.

Address: 4F-2, No. 68, Daxue Rd., East Dist., Hsinchu City

This dissertation was typeset with the LaTeX<sup>1</sup> system.

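<sup>1</sup>LaTeX is a set of macros built on TeX. TeX is a registered trademark of the American Mathematical Society. The original author of the macros used in this dissertation is Dinesh Das, Department of Computer Sciences, The University of Texas at Austin. The author of the NCTU Chinese version is Jieh-Tsorng Wu, Department of Electronics Engineering, National Chiao-Tung University, Hsin-Chu, Taiwan.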

# Publication List

- Journal Paper:
	- Y.-H. Chung and J.-T. Wu, "A CMOS 6mW 10-bit 100-MS/s Two-Step ADC," accepted by *IEEE Journal of Solid-State Circuits*.
- Conference Paper:
	- Y.-H. Chung and J.-T. Wu, "A CMOS 6mW 10-bit 100-MS/s Two-Step ADC," in *Proc. IEEE Asian Solid-State Circuits Conference*, Nov. 2009, pp. 137–140.

