# A CMOS 6-Bit 16-GS/s Time-Interleaved ADC Using Digital Background Calibration Techniques

Chun-Cheng Huang, Member, IEEE, Chung-Yi Wang, and Jieh-Tsorng Wu, Senior Member, IEEE

*Abstract*: An 8-channel 6-bit 16-GS/s time-interleaved analog-to-digital converter (TI ADC) was fabricated using a 65 nm CMOS technology. Each analog-to-digital channel is a 6-bit flash ADC. Its comparators are latches without preamplifiers. The input-referred offsets of the latches are reduced by digital offset calibration. The TI ADC includes a multi-phase clock generator that uses a delay-locked loop to generate 8 sampling clocks from a reference clock of the same frequency. The uniformity of the sampling intervals is ensured by digital timing-skew calibration. Both the offset calibration and the timing-skew calibration run continuously in the background. At 16 GS/s sampling rate, this ADC chip achieves a signal-to-distortion-plus-noise ratio (SNDR) of 30.8 dB. The chip consumes 435 mW from a 1.5 V supply. The ADC active area is $0.93 \times 1.58 \ \mathrm{mm}^2$.

*Index Terms*: Analog-digital conversion, calibration, clocks, comparators, flash ADC, offset, time-interleaved ADC, time interleaving, timing circuits, timing skew.

## I. INTRODUCTION

In a typical flash analog-to-digital converter (ADC), a group of comparators compare the sampled analog input with a set of known references simultaneously to determine the magnitude of the sampled input. The references are usually generated by a resistor string. Crucial ADC performance specifications, such as resolution, speed, and power, are mainly determined by the comparators. For a given technology, there exists a power-speed-accuracy limitation for comparators [1]. There are techniques, such as capacitor-offset storage [2], [3] and spatial filtering [4]–[6], that overcome the inherent device limitation for the comparators.

For a given technology, flash ADCs achieve the fastest sampling rate among the various single-channel ADC architectures. The input sampling interval of a flash ADC is determined by the speed of its comparators. The time-interleaved (TI) architecture is often used to increase the ADC sampling rate, i.e., to reduce the effective input sampling interval without changing the original sampling interval of each ADC. A TI ADC comprises several analog-to-digital (A/D) channels, and coordinates their operation so that the analog input is sampled and converted sequentially by different A/D channels. To achieve the desired resolution, there are additional requirements for a TI ADC, including conversion offset matching among channels, conversion gain matching among channels, and sampling interval uniformity.

Manuscript received August 24, 2010; revised November 28, 2010; accepted December 21, 2010. Date of publication March 10, 2011; date of current version March 25, 2011. This paper was approved by Guest Editor Makoto Nagata. This work was supported by the National Science Council (Grant NSC-98-2221-E-009-131-MY2) of Taiwan, R.O.C., and the MediaTek Research Center at National Chiao-Tung University. The authors are with the Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University, Hsinchu, Taiwan (e-mail: cchuang@icplus.com.tw; cchuang0730@gmail.com; jtwu@mail.nctu.edu.tw). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JSSC.2011.2109511

TI ADCs with multiple A/D channels can achieve a sampling rate higher than 10 GS/s [7]–[11]. Different types of A/D channels have been used, including pipelined ADCs [7], [9], SAR ADCs [8], [10], and flash ADCs [11]. The designs of [7], [8], [10] calibrate the offset/gain/phase mismatches in the foreground by applying a single-tone sine wave to the analog inputs of the TI ADCs.
The design of [9] calibrates the offset/gain mismatches by applying a test signal generated on-chip. This design adds a redundant ADC in each A/D channel to enable background calibration. Thus, when one ADC is under calibration, the other can continue the normal A/D operation. In the intended application of this TI ADC, the sampling phase of each A/D channel is adjusted by a digital timing recovery circuit [12]. The design of [11] calibrates the phase mismatches in the background by using the timing information embedded in the analog input signal. However, the input timing information may not be reliable in some applications. This design calibrates the offsets of the comparators in the foreground.

This paper describes a 6-bit 16-GS/s TI ADC [13]. It demonstrates our proposed design techniques for flash ADCs and TI ADCs [14]–[16] and their circuit implementations. This TI ADC comprises 8 flash A/D channels. Each flash ADC consists of 63 latch-type comparators. To save power, preamplifiers are not used. Digital offset calibration is then used to remove the offsets of the latches. The inter-channel offset mismatches are eliminated once the offsets of the latches are removed. To minimize inter-channel gain mismatches, we make sure that the flash ADCs are supplied with identical voltage references. In addition, we use digital timing-skew calibration to ensure sampling interval uniformity. Both the offset calibration and the timing-skew calibration are realized on-chip and can run continuously in the background.

The rest of this paper is organized as follows. Section II describes the architecture of this TI ADC. Section III describes the design of the flash A/D channel and its embedded comparator offset calibration technique. Section IV describes the multi-phase clock generation scheme of the TI ADC and the associated timing-skew calibration technique. Section V shows the experimental results. Section VI draws conclusions.

## II. TI ADC ARCHITECTURE

Fig.
1 shows the architecture of the TI ADC. It comprises eight identical time-interleaved A/D channels, ADC<sub>1</sub> to ADC<sub>8</sub>. The A/D channels are driven respectively by 8 different clocks with equally spaced phases, $\phi_1$ to $\phi_8$. The ADC analog input s(t) is sequentially sampled and digitized by the 8 A/D channels. The clock frequency is $f_c$. The 6-bit digital streams from the 8 A/D channels, $s_1[k]$ to $s_8[k]$, are then demultiplexed to construct the final ADC digital output, s[l]. The equivalent sampling rate of the TI ADC is $8\times f_c$. For this TI ADC, the clock frequency is $f_c=2$ GHz, the clock period is $T_c=1/f_c=500$ ps, and the sampling interval is $T_s=T_c/8=62.5$ ps.

Fig. 1. Time-interleaved ADC architecture.

Fig. 2. Flash ADC architecture.

All A/D channels are identical flash ADCs. Each flash ADC includes a resistor string to generate reference voltages to be compared with the analog input. The resistor strings of the 8 flash ADCs share the same terminal voltages, $V_{RT}$ and $V_{RB}$ in Fig. 2. Thus, all A/D channels have the same conversion offset and conversion gain. The flash ADC design is covered in Section III.

Fig. 3. Background-calibrated comparator (BCC) block diagram.

In this TI ADC, a delay-locked loop (DLL) receives a reference clock $\phi_r$ of frequency $f_c$ and generates 8 sampling clocks, $\phi_1$ to $\phi_8$, of the same frequency. The 8 sampling clocks have phases that equally divide one $T_c$ clock period. They are delivered to the analog samplers in the A/D channels through clock buffers, $B_1$ to $B_8$. Due to device variations in the DLL and the clock buffers, and also due to mismatches among the clock distribution routes, the clocks may reach their respective samplers with different delays. This phenomenon is called timing skew. Because of timing skew, the phases of the sampling clocks received by the samplers in the A/D channels may no longer be equally spaced.
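
As a rough illustration of how per-channel skew disturbs the sampling instants, the sketch below lists the intervals between consecutive samples of an 8-way interleaved sampler. Only $T_c = 500$ ps and the 8-channel interleaving come from the text; the skew values and the function name `sampling_intervals` are hypothetical and chosen purely for illustration.

```python
# Sketch only: the skew values used below are made-up illustration numbers,
# not measurements from the chip.
M = 8                 # number of interleaved A/D channels
T_C_PS = 500.0        # channel clock period T_c = 1/f_c, in ps
T_S_PS = T_C_PS / M   # nominal sampling interval T_s = 62.5 ps


def sampling_intervals(skews_ps, cycles=2):
    """Return the intervals (ps) between consecutive sampling instants.

    Channel j (0-based) samples at k*T_c + j*T_s + skews_ps[j].
    """
    instants = [
        k * T_C_PS + j * T_S_PS + skews_ps[j]
        for k in range(cycles)
        for j in range(M)
    ]
    return [b - a for a, b in zip(instants, instants[1:])]


ideal = sampling_intervals([0.0] * M)
skewed = sampling_intervals([0.0, 1.5, -0.8, 0.3, -1.2, 0.9, 0.0, -0.5])
print(ideal[:M])    # uniform intervals
print(skewed[:M])   # interval pattern that repeats every 8 samples
```

With zero skew every interval equals $T_s$; with skew the intervals differ from one another but the pattern repeats every 8 samples and still sums to $T_c$.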
As a result, the TI ADC experiences a periodic variation of sampling intervals. The proposed TI ADC includes a timing-skew calibration processor that automatically adjusts the delay of the 8 clock buffers to ensure sampling interval uniformity. The multi-phase clock generation and the timing-skew calibration are described in Section IV.

## III. FLASH ADC

### A. Architecture

Fig. 2 shows the block diagram of a single A/D channel. It is a flash ADC consisting of 63 background-calibrated comparators (BCCs) [14]. Each BCC comprises a random-chopping latch (RCL) and a calibration processor (CP). Preceding the comparators is a p-channel MOSFET M1 that functions as an analog input sampler for the ADC. The reference voltages $V_{R,1}$ to $V_{R,63}$ are generated by using a resistor string. In fully-differential configuration, the top reference $V_{RT}$ and the bottom reference $V_{RB}$ are $+0.4~\mathrm{V}$ and $-0.4~\mathrm{V}$ respectively. Thus, the reference spacing or the ADC quantization step size is $\Delta V_R = 800/62 \approx 12.9~\mathrm{mV}$, and the ADC differential input range is $V_{FS} = 63\Delta V_R \approx 0.813~\mathrm{V_{pp}}$.

The digital outputs from all BCCs, $D_{c,1}[k]$ to $D_{c,63}[k]$, are fed into a thermometer-code edge detector (TCED) to generate an edge code, $D_{e,1}[k]$ to $D_{e,63}[k]$. The edge code indicates the location of the 1-to-0 transition edge in the thermometer code $D_{c,1}[k]$ to $D_{c,63}[k]$. The TCED is a collection of two-input AND gates, so that $D_{e,j}[k] = 1$ only if $D_{c,j}[k] = 1$ and $D_{c,j+1}[k] = 0$; otherwise $D_{e,j}[k] = 0$. The TCED is followed by an encoder, which converts the edge code into a Gray code and then converts the Gray code into a binary code. Its binary-number output s[k] represents the magnitude of the ADC input sample $V_i[k]$.

Fig. 3 shows the BCC block diagram. This comparator employs only a regenerative latch for the comparison function.
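
The thermometer-to-binary path described above (TCED, then Gray code, then binary code) can be sketched as follows. This is a minimal behavioral model, not the authors' logic: it assumes that the comparator above the 63rd one always reads 0, and it forms the Gray word by OR-ing the Gray codes of all active edge positions, which is one common encoder arrangement that the text does not spell out. The function names are hypothetical.

```python
def edge_code(thermo):
    """Thermometer code D_c,1..D_c,N -> edge code D_e,1..D_e,N.

    D_e,j = D_c,j AND NOT D_c,j+1. The position above the top comparator
    is assumed to read 0 (an assumption of this sketch).
    """
    padded = list(thermo) + [0]
    return [padded[j] & (1 - padded[j + 1]) for j in range(len(thermo))]


def encode(edges):
    """Edge code -> Gray code -> binary code."""
    gray = 0
    for j, e in enumerate(edges, start=1):
        if e:
            gray |= j ^ (j >> 1)   # Gray code of edge position j
    binary = 0
    while gray:                    # Gray-to-binary: XOR of all right shifts
        binary ^= gray
        gray >>= 1
    return binary


def flash_output(level, n_comp=63):
    """Output code when the lowest `level` comparators read 1."""
    thermo = [1] * level + [0] * (n_comp - level)
    return encode(edge_code(thermo))


print([flash_output(level) for level in (0, 1, 31, 32, 63)])
```

For a clean thermometer code exactly one edge bit is set, so the decoded value equals the number of comparators that read 1.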
A conventional high-speed comparator usually comprises a regenerative latch preceded by a preamplifier. The latch is power efficient for the comparison function. However, it exhibits an input-referred offset due to device mismatch. The gain of the preamplifier relaxes the offset requirement for the latch. Sometimes offset cancellation techniques, such as capacitor offset storage [3] or spatial filtering [6], are used to reduce the preamplifier offsets. In our design, preamplifiers are removed to save power. The offsets of the latches are eliminated by a statistics-based offset calibration technique [14].

Fig. 4. Statistical comparator offset detection.

As shown in Fig. 3, a BCC includes an RCL and a CP. The RCL consists of a latch and two random choppers, CHP1 and CHP2. The two choppers are placed before and after the latch respectively to facilitate background calibration. They are controlled by the same binary random sequence $q[k] \in \{+1, -1\}$. When q[k] = +1, the chopper passes its two inputs to its two corresponding outputs directly. When q[k] = -1, the signal paths to the two outputs are interchanged. The chopper CHP1 consists of 4 analog switches. The chopper CHP2 consists of digital logic gates. Regardless of the q[k] value, the RCL works like a normal comparator, detecting the polarity of $V_i - V_R$ and generating a digital output $D_c[k] \in \{0,1\}$ accordingly. In Fig. 3, the input-referred offset of the latch, denoted as $V_{OS}[k]$, is adjustable. The CP detects the polarity of $V_{OS}[k]$ and then tries to minimize $|V_{OS}[k]|$.

Fig. 4 shows the principle of a statistical comparator offset detection technique [14]. It shows the probability density function (PDF) of $V_i[k]$. The voltage $V_{OS,j}$ denotes the offset of the j-th BCC shown in Fig. 2. The plots assume $V_{OS,j}>0$. For the j-th BCC, the probability of its output $D_{c,j}[k]=1$ is illustrated in the upper plot of Fig. 4. When q[k]=+1, the probability is the area P.
When q[k]=-1, the probability is the area $P + \Delta P$. Thus, the polarity of $V_{OS,j}$ can be detected by finding the polarity of $\Delta P$.

Fig. 5 shows the CP signal path and its operation. It includes an accumulation-and-reset (AAR) block to detect the $\Delta P$ polarity. Its input $U[k] = D_c[k] \times q[k]$ is a correlation of $D_c[k]$ and q[k]. The average of U[k] is proportional to $\Delta P$. The AAR uses an accumulator (ACC0) to accumulate U[k], yielding R[k]. A bilateral peak detector (BPD) constantly compares R[k] against a positive integer $+N_C$ and a negative integer $-N_C$. If $-N_C < R[k] < +N_C$, its output is S[k] = 0. Whenever $R[k] \ge +N_C$, it issues an output S[k] = +1 for one clock cycle and then resets R[k] to 0. Whenever $R[k] \leq -N_C$, it issues an output S[k] = -1 for one clock cycle and also resets R[k] to 0.

Fig. 5. Accumulation-and-reset operation of the calibration processor.

The AAR operation is illustrated in Fig. 5. If the latch offset $V_{OS}[k]$ in Fig. 3 is positive, then R[k] tends to decrease. When R[k] reaches $-N_C$, S[k] becomes -1 for one cycle, and R[k] is reset to 0. When S[k] = -1, the accumulator ACC is decreased by 1. The ACC output T[k] controls the offset of the latch shown in Fig. 3, $V_{OS}[k]$. The offset control step size is $\Delta V_{OS}$. As $V_{OS}[k]$ approaches 0, it takes longer to activate S[k]. Eventually, $V_{OS}[k]$ fluctuates around 0. This $V_{OS}$ fluctuation can be treated as a noise imposed on the ADC input. The standard deviation of $V_{OS}$ should be less than that of the ADC quantization noise, i.e., $\sigma(V_{OS}) < \Delta V_R/\sqrt{12}$.

It can be shown that the magnitude of the $V_{OS}[k]$ fluctuation is related to the area P illustrated in Fig. 4 [14]. When the input sample $V_i[k]$ is in the P region, U[k] is equal to the random sequence q[k], which contains no $V_{OS}$ information and only causes $V_{OS}[k]$ fluctuation. Consider the j-th BCC.
If its CP input is the RCL output $D_{c,j}[k]$, then its P region stretches from $V_{R,j}+V_{OS,j}$ to $V_{R,63}$. To reduce the P region, we use the TCED output $D_{e,j}[k]$ as the CP input. As shown in Fig. 4, the P region for $D_{e,j}[k]$ is from $V_{R,j}+V_{OS,j}$ to $V_{R,j+1}$ only.

Consider the ADC shown in Fig. 2. Under normal conditions, only one $D_{e,j}$ signal out of the TCED is 1. Thus, only one CP is activated at a given time, i.e., only one BCC is calibrated by one step for every comparison cycle. Before calibration, the initial offsets of the BCCs are random and may be larger than the $V_R$ spacing, $\Delta V_R$. When the calibration begins, large offsets can cause neighboring BCCs to influence one another. The interference may trigger some BCCs to perform unnecessary calibrations while stopping the calibration process for other BCCs. In a worst-case scenario, the 63rd BCC, the one connected to $V_{R,63}$, converges first, since its calibration cannot be disrupted by other BCCs. Once the 63rd BCC settles, the 62nd BCC can follow and converge, and so on. Thus, the main effect of large initial offsets is to increase the overall ADC calibration time. In order to reduce the interference, two uncorrelated random sequences, $q_1[k]$ and $q_2[k]$, are supplied to the RCLs, and adjacent RCLs use different random sequences for random chopping.

The behavior of the offset calibration is determined by the AAR threshold $N_C$, the offset control step size $\Delta V_{OS}$, and the statistics of the $V_i[k]$ samples. If the input sample $V_i[k]$ appears uniformly across the entire ADC input range, $V_{FS}$, the calibration loop in each BCC can be modeled as a single-pole feedback system [14] with a time constant of

$$\tau_{c,os} = N_C \times \frac{V_{FS}}{\Delta V_{OS}} \times T_c \tag{1}$$

where $T_c$ is the clock period. In our 6-bit ADC design, $V_{FS}=2^6\times \Delta V_R$, $N_C=16$, and $\Delta V_{OS}=0.25\Delta V_R$.
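
To make the loop behavior concrete, the following behavioral sketch evaluates (1) for these parameters and simulates a single BCC driven by its edge-code output. It is a simplified model under stated assumptions, not the authors' implementation: voltages are normalized to $\Delta V_R$, the input is uniform over $V_{FS}$, the neighboring comparator is ideal, the reference level, initial offset, and seed are arbitrary, and the name `calibrate` is hypothetical.

```python
import random

N_C = 16          # AAR threshold
DV_OS = 0.25      # offset control step, in units of the reference spacing
V_FS = 64.0       # full-scale range, in units of the reference spacing

# Time constant predicted by (1), expressed in clock cycles.
TAU_CYCLES = N_C * V_FS / DV_OS


def calibrate(v_os0, v_ref=32.0, cycles=int(20 * TAU_CYCLES), seed=1):
    """Behavioral model of one BCC calibration loop; returns the final offset."""
    rng = random.Random(seed)
    t_acc = 0         # ACC output T[k]; the offset is v_os0 + T[k] * DV_OS
    r_acc = 0         # AAR accumulator R[k]
    for _ in range(cycles):
        v_os = v_os0 + t_acc * DV_OS
        v_in = rng.uniform(0.0, V_FS)
        q = rng.choice((+1, -1))
        # Chopped latch trips at v_ref + q*v_os. Edge code: the (assumed
        # ideal) neighbor one step above must read 0.
        d_e = 1 if (v_ref + q * v_os) < v_in < (v_ref + 1.0) else 0
        r_acc += d_e * q                  # U[k] = D_e[k] * q[k]
        if r_acc >= N_C:                  # S[k] = +1
            t_acc += 1
            r_acc = 0
        elif r_acc <= -N_C:               # S[k] = -1
            t_acc -= 1
            r_acc = 0
    return v_os0 + t_acc * DV_OS


print(TAU_CYCLES)
print(calibrate(3.0), calibrate(-3.0))
```

A positive offset makes $U[k] = -1$ more likely than $U[k] = +1$, so $R[k]$ drifts down and the loop steps the offset toward zero; the model mirrors that sign convention.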
This design results in a time constant of $\tau_{c,os}=4096T_c$ and an offset fluctuation standard deviation of $\sigma(V_{OS})=0.13\Delta V_R$. If it takes $4\tau_{c,os}$ for all BCCs to settle, a calibration time of $4\times 4096T_c\approx 8.2~\mu\mathrm{s}$ is required. The time constant $\tau_{c,os}$ can be reduced by using a smaller $N_C$ and/or a larger $\Delta V_{OS}$, but at the expense of a larger $\sigma(V_{OS})$.

### B. Circuit Implementation

The analog signal path of the ADC is realized with a fully-differential configuration. The resistor string shown in Fig. 2 that generates reference voltages $V_{R,1}$ to $V_{R,63}$ is composed of two parallel resistor strings with currents flowing in opposite directions. Each resistor string includes 62 P+ diffusion resistors placed in a guarded N-well. Each resistor has a resistance of 10 $\Omega$. The terminal voltages for each resistor string are externally supplied. They are 1.1 V and 1.5 V.

Fig. 6 shows the schematic of the random-chopping latch (RCL) of the BCC. The latch shown in Fig. 3 is in fact a cascade of three pipelined latches. The chopper CHP1 in front of the latches consists of p-channel MOSFET switches. Fig. 6 also shows the schematic of the first latch. There are two source-coupled pairs to receive the two differential inputs, $V_1$ and $V_2$. When clock CK is high, M1 and M3 provide a constant current for the latch so that its input common-mode sensitivity is reduced. The second and the third latches are similar to the first latch. They are added to provide signal gain to suppress metastability. They have only one input source-coupled pair and do not have the variable-offset control.

As shown in Fig. 6, the latch's variable-offset control is achieved by changing the loading and pulling strength on nodes $V_{a1}$ and $V_{a2}$ [17]. There are 16 equally-weighted n-channel MOSFET varactor pairs (M17–M18) for fine control and 4 equally-weighted pulling sources (M19–M22) for coarse control.
The varactors have their gate terminals connected to nodes $V_{a1}$ and $V_{a2}$ in the latch. Their capacitance is varied by switching the voltage on the source and drain nodes. The source and drain diffusion capacitances of the varactors are not added to nodes $V_{a1}$ and $V_{a2}$. The pulling sources for coarse control are similar to the input source-coupled pairs but consist of MOSFETs of smaller sizes. The offset control signals are $T_{ca}$ and $T_{cb}$ for each coarse control pulling source and $T_{fa}$ and $T_{fb}$ for each fine control varactor pair. The control signals come from two digital shift registers controlled by the CP. Their voltages are either $V_{DD}$ or $V_{SS}$.

Fig. 6. Random-chopping latch (RCL) schematic.

During the power-on phase, a full-range sine wave is applied to the ADC input for an initial offset calibration of the BCCs. The CPs in the BCCs adjust the coarse controls and then the fine controls. After the power-on phase, the ADC starts its normal operation. The offset calibration then runs in the background and the CPs adjust only the fine controls.

From Monte Carlo simulations, the offset variation of the latch is Gaussian and has a standard deviation of $\sigma(V_{OSL})=28.5$ mV. The range of offset control is required to be wider than $\pm 4\sigma(V_{OSL})$ to achieve a yield of 96.86%, in which all 63 $\times$ 8 BCCs in a TI ADC can be successfully calibrated. The offset coarse control can change $\pm 4$ steps. From simulations, it has a control step size of 32 mV. Thus, the coarse control can adjust the latch offset by $\pm 128$ mV = $\pm 4.5\sigma(V_{OSL})$. Note that the step size of the coarse control itself has a standard deviation of 3 mV. Thus, the fine offset control must cover a range at least wider than $\pm 44$ mV. In our design, the step size of the offset fine control is $0.25\Delta V_R = 3.2$ mV. The fine control can change $\pm 16$ steps, and thus covers an offset range of $\pm 51$ mV.
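
The yield figure and the control ranges quoted above can be checked with a few lines of arithmetic. The sketch assumes independent Gaussian offsets across all comparators and counts a chip as good only if every offset lies within $\pm 4\sigma(V_{OSL})$; the variable names are illustrative.

```python
import math

SIGMA_MV = 28.5            # simulated latch offset standard deviation (mV)
N_BCC = 63 * 8             # comparators in the whole TI ADC
COVER_SIGMAS = 4.0         # required control range, in sigmas

# Probability that one Gaussian offset falls outside +/-4 sigma.
p_outside = math.erfc(COVER_SIGMAS / math.sqrt(2.0))
# Every comparator must fall inside the range (independence assumed).
chip_yield = (1.0 - p_outside) ** N_BCC

coarse_range_mv = 4 * 32.0     # +/-4 coarse steps of 32 mV
fine_range_mv = 16 * 3.2       # +/-16 fine steps of 3.2 mV

print(f"yield = {100 * chip_yield:.2f} %")
print(f"coarse range = +/-{coarse_range_mv:.0f} mV "
      f"({coarse_range_mv / SIGMA_MV:.2f} sigma)")
print(f"fine range = +/-{fine_range_mv:.1f} mV")
```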

To reduce power dissipation, the CPs in all BCCs operate at 1/64 of the $f_c$ frequency. Thus, the effective calibration time constant is $64\tau_{c,os}$.

## IV. MULTI-PHASE CLOCK GENERATOR

### A. Architecture

In Fig. 1, a delay-locked loop (DLL) generates 8 sampling clocks, $\phi_1$ to $\phi_8$, for sampling the input s(t). The clocks are then sent to the 8 A/D channels respectively. Timing skews occur due to device variation in the DLL and in the 8 clock buffers, $B_1$ to $B_8$, and also due to mismatches among the clock distribution routes. Because of timing skews, the phases of the sampling clocks received by the samplers in the A/D channels may no longer be equally spaced within one clock period. As a result, the TI ADC exhibits a periodic variation in sampling intervals.

For our 6-bit 8-channel TI ADC, the clock frequency is $f_c = 2$ GHz, the clock period is $T_c = 500$ ps, and the nominal sampling interval is $T_s = T_c/8 = 62.5$ ps. To achieve a signal-to-distortion-plus-noise ratio (SNDR) better than 36 dB, the standard deviation of the sampling interval variation must be less than 0.31 ps. Analogous to the ADC quantization resolution $\Delta V_R$, we define the ADC timing resolution as $\Delta T_R = T_s/2^6 \approx 1$ ps.

We employ a timing-skew calibration technique to ensure sampling interval uniformity [16]. To facilitate background calibration, an on-chip generator of a calibration signal x(t) is added to the TI ADC. In each A/D channel, a replica sampler similar to the one in the flash ADC samples x(t). A comparator is then used to determine the polarity of the sample $x_j[k]$, yielding $c_j[k]$. A timing-skew calibration processor (TSCP) collects the data $c_1[k]$ to $c_8[k]$ from all A/D channels. It detects the sampling intervals among the sampling clocks, $\phi_1$ to $\phi_8$. Its outputs, $T_1[k]$ to $T_8[k]$, adjust the delay of the clock buffers, $B_1$ to $B_8$, such that all sampling intervals can maintain uniformity.
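
One way to see where the 0.31 ps bound comes from is the standard jitter-limited SNR expression for a full-scale sine input, $-20\log_{10}(2\pi f_{in}\sigma_t)$. The sketch below assumes the bound is evaluated at the Nyquist input frequency of the 16-GS/s converter (8 GHz); that choice is an assumption of this example rather than something the text states, although it reproduces the quoted 36 dB.

```python
import math

F_C = 2e9                  # channel clock frequency (Hz)
M = 8                      # number of interleaved channels
T_S = 1.0 / (M * F_C)      # nominal sampling interval, 62.5 ps
DT_R = T_S / 2**6          # timing resolution as defined in the text


def skew_limited_sndr_db(f_in_hz, sigma_t_s):
    """Standard jitter-limited SNR for a full-scale sine input."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_t_s)


F_NYQUIST = M * F_C / 2    # assumed worst-case input frequency, 8 GHz
sndr = skew_limited_sndr_db(F_NYQUIST, 0.31e-12)

print(f"T_s = {T_S * 1e12:.1f} ps, dT_R = {DT_R * 1e12:.3f} ps")
print(f"SNDR limit at {F_NYQUIST / 1e9:.0f} GHz = {sndr:.1f} dB")
```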

The principle of timing-skew detection is based on zero-crossing (ZC) detection [15], [16]. Fig. 7 shows the concept of ZC. The signal x(t) is sampled by clocks $\phi_1$ to $\phi_4$, yielding samples $x_1[k]$ to $x_4[k]$. A ZC occurs when x(t) changes polarity. In Fig. 7, there is a ZC between $x_2[0]$ and $x_3[0]$, and another one between $x_2[1]$ and $x_3[1]$. It can be shown that, if x(t) is a periodic signal and is not synchronized with the sampling clocks, the probability of a ZC between two adjacent x(t) samples, $x_j[k]$ and $x_{j+1}[k]$, is proportional to their sampling interval and also proportional to the x(t) frequency. Thus, the sampling interval can be found by detecting ZCs.

Fig. 7. Concept of zero crossing (ZC).

Fig. 8. A simple zero-crossing detector, ZCD1.

Fig. 9. An improved zero-crossing detector, ZCD2.

Fig. 8 shows a simple ZC detector, ZCD1. The signal x(t) is sampled by the two samplers controlled by clocks $\phi_j$ and $\phi_{j+1}$ respectively, yielding $x_j[k]$ and $x_{j+1}[k]$. The polarities of $x_j[k]$ and $x_{j+1}[k]$ are determined by two comparators, yielding $c_j[k] \in \{0,1\}$ and $c_{j+1}[k] \in \{0,1\}$. The output $z_j[k]$ is 1 when $c_j[k] \neq c_{j+1}[k]$, i.e., when a ZC occurs. ZCD1 is sensitive to comparator offsets. If the two internal comparators exhibit offsets, detection errors may occur.

Fig. 9 shows a ZC detector, ZCD2, that is less sensitive to the comparator offsets [16]. Each of the comparator outputs, $c_j[k]$ and $c_{j+1}[k]$, first goes through a one-bit high-pass filter, $1-z^{-1}$, yielding $r_j[k] \in \{-1,0,+1\}$ and $r_{j+1}[k] \in \{-1,0,+1\}$. The ZC logic determines the $z_j[k]$ output as follows. The output $z_j[k] = 1$ if $r_j[k] \times r_{j+1}[k] \leq 0$, otherwise $z_j[k] = 0$. In other words, $z_j[k] = 0$ if both $r_j[k]$ and $r_{j+1}[k]$ are +1 or both are -1. ZCD2 is no longer a simple detector of ZCs in x(t). It detects certain events that cause $z_j[k] = 1$.
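
The two detector rules above can be written down directly, and the proportionality between ZC probability and sampling interval can be checked numerically for ZCD1. The sketch is a behavioral model under stated assumptions: x(t) is any symmetric periodic waveform with two zero crossings per period, the comparators are offset-free, and a uniform sweep of the x(t) phase stands in for an unsynchronized x(t). All function names are hypothetical.

```python
def comparator(phase_cycles):
    """Polarity of a symmetric periodic x(t); phase in cycles of x(t)."""
    return 1 if (phase_cycles % 1.0) < 0.5 else 0


def zcd1(c_j, c_j1):
    """ZCD1: output 1 when the two polarity bits differ."""
    return 1 if c_j != c_j1 else 0


def zcd2(c_j, c_j_prev, c_j1, c_j1_prev):
    """ZCD2: high-pass each bit stream (1 - z^-1), then apply the ZC logic."""
    r_j = c_j - c_j_prev
    r_j1 = c_j1 - c_j1_prev
    return 1 if r_j * r_j1 <= 0 else 0


def zc_probability(f_x_hz, interval_s, n_phases=100_000):
    """Fraction of uniformly spread x(t) phases for which ZCD1 fires."""
    d = f_x_hz * interval_s            # sampling interval in cycles of x(t)
    hits = sum(
        zcd1(comparator(i / n_phases), comparator(i / n_phases + d))
        for i in range(n_phases)
    )
    return hits / n_phases


print(zc_probability(500e6, 62.5e-12))   # nominal interval T_s
print(zc_probability(500e6, 125e-12))    # twice the interval
```

With two zero crossings per period, the expected ZCD1 rate is $2 f_x$ times the sampling interval as long as the interval is shorter than half an x(t) period, so doubling the interval doubles the rate.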
However, its behavior in the timing-skew calibration is similar to that of ZCD1.

Fig. 10. Timing-skew calibration processor (TSCP).

Fig. 10 shows the TSCP block diagram. It generates digital control signals $T_1[k]$ to $T_8[k]$ to adjust the delays of clock buffers $B_1$ to $B_8$ respectively. It chooses $\phi_1$ as the reference phase by setting $T_1[k] = 0$. All other clocks are aligned to $\phi_1$. There are 7 calibration channels controlling the phases of clocks $\phi_2$ to $\phi_8$. Consider the $\phi_2$ calibration channel that generates $T_2[k]$. Its internal ZC detector receives two bit streams, $c_1[k]$ and $c_2[k]$, which originate from A/D channels ADC<sub>1</sub> and ADC<sub>2</sub> respectively. The ZCs between $\phi_1$ and $\phi_2$ are detected based on the ZCD2 shown in Fig. 9, yielding $z_1[k]$. The average of $z_1[k]$ represents the sampling interval between $\phi_1$ and $\phi_2$. The measured sampling interval is compared against the nominal sampling interval, which is the average of m[k] generated from a ZC recorder. The polarity of the averaged $(m[k]-z_1[k])$ is extracted by an accumulation-and-reset (AAR) block similar to the one shown in Fig. 5, but with a threshold of $N_T$. The AAR output $S_1[k]$ updates the following accumulator (ACC), which contains $T_2[k]$. The signal $T_2[k]$ controls the delay of clock buffer $B_2$ with a delay-control step size of $\mu_t$.

The m[k] sequence represents the average of the ZC occurrences among all sampling intervals. It is generated by the ZC recorder shown in Fig. 11, which counts every ZC in x(t) and issues m[k]=1 once every 8 ZCs. The recorder accumulates all ZCs from all ZC detectors. The ZC detector in the top-left corner of Fig. 10 is added to detect the ZCs in the missing interval between $\phi_1$ and the $\phi_8$ immediately before $\phi_1$. A comparator compares the accumulation result a[k] with integer 8, yielding a binary $m[k] \in \{0,1\}$ every clock cycle.
Whenever $a[k] \geq 8$, the comparator issues m[k] = 1, and an amount of 8 is subtracted from a[k] during the following clock cycle. The digital stream m[k] is a sequence of 0s and 1s. Its mean value represents the nominal sampling interval. The proposed ZC recorder is simple and its hardware cost is low.

Fig. 11. Zero-crossing (ZC) recorder.

Fig. 12. Multi-phase clock generator.

The behavior of this timing-skew calibration is determined by the AAR threshold $N_T$, the delay-control step size $\mu_t$, and the x(t) frequency $f_x$. The calibration loop in each calibration channel can be modeled as a single-pole feedback system [16] with a time constant of

$$\tau_{c,ts} = N_T \times \frac{1}{2f_x \times \mu_t} \times T_c \tag{2}$$

For this 6-bit 8-channel TI ADC, the clock frequency is $f_c=2$ GHz, the clock period is $T_c=500$ ps, the nominal sampling interval is $T_s=62.5$ ps, and the timing resolution is defined as $\Delta T_R=T_s/2^6\approx 1$ ps. We choose the x(t) frequency $f_x=0.25f_c=500$ MHz, $N_T=2^{10}$, and $\mu_t=0.25\Delta T_R$. This design results in a time constant of $\tau_{c,ts}=2^{22}T_c$ and an averaged timing fluctuation standard deviation of $\sigma(\tau_T)=0.22\Delta T_R$. If it takes $4\tau_{c,ts}$ for all calibration channels to settle, a calibration time of $4\times\tau_{c,ts}\approx 8.4$ ms is required. The time constant $\tau_{c,ts}$ can be reduced by using a smaller $N_T$ and/or a larger $\mu_t$, but at the expense of a larger $\sigma(\tau_T)$.

### B. Circuit Implementation

Fig. 12 shows the multi-phase clock generator that generates the 8 sampling clocks, $\phi_1$ to $\phi_8$, based on a reference clock $\phi_r$. It contains a DLL that consists of a phase detector (PD), a charge pump (CP), and 8 identical variable-delay cells, D1 to D8. Fig. 13 shows a schematic of the delay cell. It is simply a cascade of two current-controlled inverters. The inverters' currents are adjusted by $V_{cn}$ and $V_{cp}$.
Voltage $V_{cn}$ is generated by the CP. Voltage $V_{cp}$ is generated by the current mirror MC1–MC2, which is shared with other delay cells. MOSFETs M2, M6, M8, and M12 limit the range of adjustable delay to prevent false lock.

Fig. 13. DLL delay cell.

Fig. 14. DLL phase detector (PD) and charge pump (CP).

Fig. 14 shows the PD and CP schematics. The PD is an arbiter-type design. It compares the timing difference between the rising edges of input clocks CK1 and CK2. The clocks CK1d and CK2d are delayed versions of CK1 and CK2 respectively. The CP is reset when both CK1 and CK2 are low. The drain voltage on M5 and M6 in the reset mode reduces the dead zone of timing comparison. The up and dn signals from the PD control the CP to charge or discharge capacitor $C_L$, yielding voltage $V_{cn}$. A voltage buffer, Buf1, replicates $V_{cn}$ and distributes the voltage to the drains of M15 and M16 when the up and/or dn signals are not active. Buf1 is a single-stage opamp consisting of a source-coupled pair and a current mirror.

Fig. 15. Digitally-controlled variable-delay clock buffer.

Fig. 16. Simulated output waveforms of the variable-delay clock buffer.

Fig. 17. x(t) generator for timing-skew calibration.

In Fig. 12, the 8 sampling clocks from the DLL are delivered to the A/D channels through 8 variable-delay clock buffers, $B_1$ to $B_8$. The delays of the clock buffers are controlled by the TSCP shown in Fig. 10. Fig. 15 shows a variable-delay clock buffer. It is a cascade of 3 inverters. Its delay is adjusted by changing the capacitance of the MOS varactors attached to the outputs of the first and second inverters. Both n-channel and p-channel MOSFETs are used as the varactors. The capacitances of the varactors are binary weighted. They are controlled by a 7-bit digital control signal $T_j[k]$ generated from the TSCP. Its most significant bit is connected to the $T_c$ input of the coarse control, while the other 6 bits are connected to the 6 $T_f$ inputs of the fine control. Fig.
16 shows the simulated output waveforms of the clock buffer at various delay settings. The clock frequency is 2 GHz. The delay control has a step size of 0.4 ps.

The signal x(t) for timing-skew calibration is generated on-chip. Fig. 17 shows the x(t) generator. It is a ring oscillator consisting of 7 NAND gates. Its oscillation frequency can be varied by changing voltage $V_{XC}$. The proposed timing-skew calibration does not require x(t) to have an accurate frequency or a specific waveform shape. The jitter in x(t) also does not affect the calibration accuracy. In our implementation, $V_{XC}$ is manually adjusted so that the x(t) frequency is approximately 400 MHz. In actual applications, $V_{XC}$ can be digitally controlled by a random number generator to introduce jitter in x(t). The jitter can prevent synchronization between x(t) and the sampling clocks.

In each A/D channel, the polarities of the x(t) samples are determined by a comparator. The comparator is a latch similar to the one shown in Fig. 6. It has only one input source-coupled pair and does not have the variable-offset control.

To reduce power dissipation, the TSCP operates at 1/64 of the $f_c$ frequency. Thus, the effective calibration time constant is $64\tau_{c,ts}$.

## V. EXPERIMENTAL RESULTS

This TI ADC was fabricated using a 65 nm CMOS technology. All ADC circuits are realized with standard MOSFETs. The supply voltage is raised to 1.5 V to obtain a better SNDR performance out of the s(t) samplers and to increase the speed of the comparators. Fig. 18 shows the chip micrograph. The ADC active area is $0.93 \times 1.58~\mathrm{mm}^2$.

Fig. 18. ADC chip micrograph.

Fig. 19. Floorplan for voltage reference $V_{RT}$ and $V_{RB}$.

Fig. 20. Floorplan for signal routes of s(t) and x(t).

To ensure that all A/D channels exhibit identical conversion gain, the reference voltages $V_{RT}$ and $V_{RB}$ shown in Fig. 2 must be the same when received by each A/D channel. Fig. 19 shows the floorplan for the $V_{RT}$ and $V_{RB}$ routes. The $V_{RT}$ and $V_{RB}$ routes are realized with multi-layer metals to reduce resistances. The TI ADC also requires that all A/D channels receive the same analog input, the signal s(t) shown in Fig. 1. Fig. 20 shows the floorplan for the s(t) signal routes. The differential s(t) is first directed to the center of the chip through two metal lines. It is then sent to each A/D channel through routes of identical length and shape.

The multi-phase clock generator, including the DLL and clock buffers, is placed near the center of the chip. The on-chip timing-skew calibration processor can correct timing skews among the clocks caused by device mismatches and clock-route mismatches. However, the calibration requires that the x(t) samplers in all A/D channels receive the same calibration signal x(t). As shown in Fig. 20, the x(t) generator is located near the center of the chip. The differential x(t) is sent to each A/D channel through routes similar to those for s(t). The x(t) and s(t) routes are shielded separately to avoid coupling between the two signals. Finally, the timing-skew calibration dictates that the s(t) and x(t) samplers in the same A/D channel have the same turn-off instants. They are placed in close proximity. To avoid leaking s(t) to the x(t) sampler, both samplers are surrounded by separate guard rings. The two samplers are driven by the same clock driver, whose output impedance is made low to minimize s(t) leakage.

Fig. 21. Measured DNL and INL of a single flash A/D channel.

The reference clock $\phi_r$ with a frequency $f_c=2~\mathrm{GHz}$ is generated from an off-chip signal generator. It is converted into a differential signal using a 180° power splitter. The ADC test input signal s(t) is generated from another signal generator synchronized with the $\phi_r$ clock generator. The ADC chip is mounted directly on a printed circuit board.
Digital outputs from all A/D channels, $s_1[k]$ to $s_8[k]$, are first downsampled by a ratio of 1/64 and then sent off-chip to a logic analyzer. The final TI ADC digital output stream s[l] is constructed by resampling the acquired data. The equivalent down-sampling ratio is 1/64.125.

Fig. 21 shows the measured differential nonlinearity (DNL) and integral nonlinearity (INL) of a single A/D channel. Before the calibration is activated, the DNL is -1.0/+4.9 LSB and the INL is -4.3/+5.4 LSB, and there are missing codes. After the offset calibration is activated, the DNL becomes -0.5/+0.6 LSB and the INL is reduced to -0.4/+0.7 LSB.

Fig. 22 shows the measured ADC output spectra with and without the timing-skew calibration. The sampling rate is 16 GS/s. The input signal is a full-swing 2.9 GHz sine wave. Without the calibration, there are many spurious tones caused by clock timing skews. When the calibration is turned on, most of the skew-related spurious tones are eliminated. Note that the locations of the skew-related spurious tones are shuffled because of the downsampling and resampling of the output codes. The remaining harmonic tones in the spectrum are mainly due to the non-ideal input signal paths, including the distortion of the power splitter and the mismatches of the wire parasitics and the sampling switches. The harmonic distortion of the ADC can be improved by employing a chip layout of better symmetry and using better signal sources.

Fig. 22. Measured TI ADC output spectra. Sampling rate is 16 GS/s. Input frequency is $f_{in}=2.9~\mathrm{GHz}$.

Fig. 23. Measured TI ADC SNDR versus input frequency.

Fig. 23 shows the measured TI ADC signal-to-distortion-plus-noise ratio (SNDR) versus input frequency. The sampling rate is 16 GS/s. The effective resolution bandwidth (ERBW) is 3 GHz, which is limited by the bandwidth of the ADC input sampling switches. At frequencies near the ERBW, the SNDR is improved from 19.8 dB to 28.0 dB by the timing-skew calibration.
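To put the SNDR figures quoted above in perspective, the sketch below converts them to effective bits and to an equivalent rms sampling-time error. It relies on two textbook relations that are not stated in this paper and are used here as assumptions: ENOB = (SNDR - 1.76)/6.02 for a full-scale sine input, and SNDR = -20 log10(2π f_in σ_t) when timing error is the only error source. Because quantization noise and distortion also contribute to the measured SNDR, the timing values it prints are upper bounds, not measured skews. All function names are illustrative.

```python
import math


def enob_from_sndr(sndr_db):
    """Effective number of bits for a full-scale sine input (textbook relation)."""
    return (sndr_db - 1.76) / 6.02


def skew_limited_sndr_db(f_in_hz, sigma_t_s):
    """SNDR ceiling if an rms sampling-time error sigma_t were the only error source."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_t_s)


def implied_rms_skew_s(f_in_hz, sndr_db):
    """Inverse of skew_limited_sndr_db: the rms timing error that alone yields sndr_db."""
    return 10.0 ** (-sndr_db / 20.0) / (2.0 * math.pi * f_in_hz)


F_IN = 3e9  # input frequency near the ERBW, from the text

# SNDR values near the ERBW quoted in the text.
for label, sndr in (("calibration off", 19.8), ("calibration on", 28.0)):
    enob = enob_from_sndr(sndr)
    sigma_ps = implied_rms_skew_s(F_IN, sndr) * 1e12
    print(f"{label}: SNDR {sndr:.1f} dB, ENOB {enob:.2f} bit, "
          f"rms timing error below about {sigma_ps:.2f} ps")
```

Under these assumptions, 19.8 dB at 3 GHz corresponds to an rms timing error of at most about 5.4 ps, and 28.0 dB to at most about 2.1 ps. The second figure is a loose bound, since a 6-bit converter's quantization noise alone already limits the SNDR.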
Table I summarizes the measured specifications of this TI ADC chip. The input capacitance is 1.8 pF for each input pin. The power consumption is 435 mW, excluding I/O. Each A/D channel consumes 54 mW. Most of the dissipated power is dynamic power.

Table II compares this work with other recently published TI ADCs having a sampling rate over 10 GS/s. In the table, the ADC figure-of-merit (FOM) is defined as

$$\mathrm{FOM} = \frac{\mathrm{Power}}{2^{\mathrm{ENOB}} \times 2 \times \mathrm{ERBW}} \qquad (3)$$

where ENOB is the effective number of bits at low input frequencies, and ERBW is the effective resolution bandwidth at which ENOB drops by 0.5 bit. The FOM for this work is 2.6 pJ/conversion-step. The competitive FOM of this chip is obtained by using latch-type comparators with automatic offset calibration. A better FOM can be achieved if the s(t) samplers are realized with bootstrapped switches to improve the ERBW. In addition, this chip includes a timing-skew calibration that can continuously operate in the background.

TABLE I
TI ADC PERFORMANCE SUMMARY

| Parameter | Value |
|-----------------------------------|------------------------|
| Technology | 65 nm CMOS |
| Resolution | 6 Bit |
| Input Loading | 1.8 pF |
| Supply Voltage | 1.5 V |
| Sampling Rate | 16 GS/s |
| Differential Input Range | $0.813~\mathrm{V_{pp}}$ |
| SNDR ($f_{in} = 170$ MHz) | 30.8 dB |
| SNDR ($f_{in} = 3$ GHz) | 28.0 dB |
| SFDR ($f_{in} = 170$ MHz) | 37.4 dB |
| SFDR ($f_{in} = 3$ GHz) | 40.4 dB |
| Power Consumption | 435 mW |
| Active Area | $1.47~\mathrm{mm}^2$ |

TABLE II
COMPARISON OF HIGH-SPEED TI ADCS

| Publication | This Work | [11] | [10] | [9] | [8] | [7] |
|------------------|-----------|------|---------|------|---------|------|
| Technology (nm) | 65 | 65 | 65 | 90 | 90 | 180 |
| Resolution (Bit) | 6 | 5 | 6 | 6 | 6 | 8 |
| TI channels | 8 | 8 | 16 | 8 | 16 | 80 |
| Speed (GS/s) | 16 | 12 | 40 | 10.3 | 24 | 20 |
| Supply (V) | 1.5 | 1.1 | 1.0/2.5 | N.A. | 1.0/2.5 | N.A. |
| Power (mW) | 435 | 81 | 1500 | 1600 | 1200 | 9000 |
| ERBW (GHz) | 3 | 6 | 7 | 4 | 6 | 2 |
| ENOB (Bit) | 4.9 | 4.3 | 5.5 | 5.8 | 5.5 | 6.5 |
| FOM (pJ/step) | 2.6 | 0.35 | 2.4 | 3.6 | 2.0 | 24.8 |

## VI. CONCLUSIONS

An 8-channel 6-bit 16-GS/s time-interleaved ADC was fabricated using a 65 nm CMOS technology. The chip demonstrates our proposed digital background calibration techniques, including comparator offset calibration and timing-skew calibration. The calibrations relax the matching requirements for devices and layout, and also provide robustness against process-voltage-temperature (PVT) variations. The calibrations can automatically adapt to variations that are slower than the calibration time constants. They provide design choices that can be applied to improve circuit performance.

#### ACKNOWLEDGMENT

The authors thank Taiwan Semiconductor Manufacturing Company (TSMC), Hsinchu, Taiwan, for chip fabrication.

# REFERENCES

- [1] K. Uyttenhove and M. S. J. Steyaert, "Speed-power-accuracy tradeoff in high-speed CMOS ADCs," *IEEE Trans. Circuits Syst. II*, vol. 49, no. 4, pp. 280–286, Apr. 2002.
- [2] S. Tsukamoto, W. G. Schofield, and T. Endo, "A CMOS 6-b, 400-MSample/s ADC with error correction," *IEEE J. Solid-State Circuits*, vol. 33, no. 12, pp. 1939–1947, Dec. 1998.
- [3] C. Sandner, M. Clara, A. Santner, T. Hartig, and F. Kutter, "A 6-bit 1.2-GS/s low-power flash-ADC in 0.13-μm digital CMOS," *IEEE J. Solid-State Circuits*, vol. 40, no. 7, pp. 1499–1505, Jul. 2005.
- [4] M. Choi and A. A. Abidi, "A 6-b 1.3-GSample/s A/D converter in 0.35-μm CMOS," *IEEE J. Solid-State Circuits*, vol. 36, no. 12, pp. 1847–1858, Dec. 2001.
- [5] X. Jiang and M.-C. F. Chang, "A 1-GHz signal bandwidth 6-bit CMOS ADC with power-efficient averaging," *IEEE J. Solid-State Circuits*, vol. 40, no. 2, pp. 532–535, Feb. 2005.
- [6] A. Ismail and M.
Elmasry, "A 6-bit 1.6-GS/s low-power wideband flash ADC converter in 0.13-μm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 43, no. 9, pp. 1982–1990, Sep. 2008.
- [7] K. Poulton, R. Neff, B. Setterberg, B. Wuppermann, T. Kopley, R. Jewett, J. Pernillo, C. Tan, and A. Montijo, "A 20 GS/s 8 b ADC with a 1 MB memory in 0.18 μm CMOS," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, 2003, pp. 318–496.
- [8] P. Schvan, J. Bach, P. F. C. Falt, R. Gibbins, Y. Greshishchev, N. Ben-Hamida, D. Pollex, J. Sitch, S.-C. Wang, and J. Wolczanski, "A 24 GS/s 6 b ADC in 90 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, 2008, pp. 544–545.
- [9] A. Nazemi, C. Grace, L. Lewyn, B. Kobeissy, O. Agazzi, P. Voois, C. Abidin, G. Eaton, M. Kargar, C. Marquez, S. Ramprasad, F. Bollo, V. A. Posse, S. Wang, and G. Asmanis, "A 10.3 GS/s 6 bit (5.1 ENOB at Nyquist) time-interleaved/pipelined ADC using open-loop amplifiers and digital calibration in 90 nm CMOS," in *Symp. VLSI Circuits Dig. Tech. Papers*, 2008, pp. 18–19.
- [10] Y. M. Greshishchev, J. Aguirre, M. Besson, R. Gibbins, C. Falt, P. Flemke, N. B. Hamida, D. Pollex, P. Schvan, and S.-C. Wang, "A 40 GS/s 6 b ADC in 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, 2010, pp. 390–391.
- [11] M. El-Chammas and B. Murmann, "A 12-GS/s 81-mW 5-bit time-interleaved flash ADC with background timing-skew calibration," in *Symp. VLSI Circuits Dig. Tech. Papers*, 2010, pp. 157–158.
- [12] O. Agazzi, M. Hueda, D. Crivelli, H. Carrer, A. Nazemi, G. Luna, F. Ramos, R. Lopez, C. Grace, B. Kobeissy, C. Abidin, M. Kazemia, M. Kargar, C. Marquez, S. Ramprasad, F. Bollo, V. Posse, S. Wang, G. Asmanis, G. Eaton, N. Swenson, T. Lindsay, and P. Voois, "A 90 nm CMOS DSP MLSD transceiver with integrated AFE for electronic dispersion compensation of multimode optical fibers at 10 Gb/s," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2939–2957, Dec. 2008.
- [13] C.-C. Huang, C.-Y.
Wang, and J.-T. Wu, "A CMOS 6-bit 16-GS/s time-interleaved ADC with digital background calibration," in *Symp. VLSI Circuits Dig. Tech. Papers*, 2010, pp. 159–160.
- [14] C.-C. Huang and J.-T. Wu, "A background comparator calibration technique for flash analog-to-digital converters," *IEEE Trans. Circuits Syst. I*, vol. 52, no. 9, pp. 1732–1740, Sep. 2005.
- [15] C.-Y. Wang and J.-T. Wu, "A background timing-skew calibration technique for time-interleaved analog-to-digital converters," *IEEE Trans. Circuits Syst. II*, vol. 53, no. 4, pp. 299–303, Apr. 2006.
- [16] C.-Y. Wang and J.-T. Wu, "A multiphase timing skew calibration technique using zero crossing detection," *IEEE Trans. Circuits Syst. I*, vol. 56, no. 6, pp. 1102–1114, Jun. 2009.
- [17] G. Van der Plas, S. Decoutere, and S. Donnay, "A 0.16 pJ/conversion-step 2.5 mW 1.25 GS/s 4 b ADC in a 90 nm digital CMOS process," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, 2006, pp. 2310–2312.

**Chun-Cheng Huang** (S'03–M'08) was born in Chia-Yi, Taiwan, in 1970. He received the B.S. degree in electrophysics from National Chiao Tung University, Taiwan, in 1992. From 1992 to 1994, he served as an ordnance officer in the R.O.C. Army. Since 1994, he has worked in the area of analog circuit design and has continued to pursue related degrees. In 1999, he received the M.S. degree in electrical engineering from National Dong Hwa University, Taiwan. In 2010, he received the Ph.D. degree in electronics engineering from National Chiao Tung University, Taiwan. Since 2010, he has been with IC Plus Corporation, where he is responsible for the design of high-speed data converters.

Dr. Huang is a member of Phi Tau Phi.

**Chung-Yi Wang** was born in Tai-Chung, Taiwan. He received the B.S., M.S., and Ph.D. degrees in electronics engineering from National Chiao Tung University, Taiwan, in 2002, 2003, and 2010, respectively.
In 2009, he joined MediaTek Inc., where he is responsible for analog and mixed-signal circuit design.

**Jieh-Tsorng Wu** (M'87–SM'06) was born in Taipei, Taiwan. He received the B.S. degree in electronics engineering from National Chiao Tung University, Taiwan, in 1980, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1983 and 1988, respectively. From 1980 to 1982, he served in the Chinese Army as a Radar Technical Officer. From 1982 to 1988, at Stanford University, he focused his research on high-speed analog-to-digital conversion in CMOS VLSI. From 1988 to 1992, he was a Member of Technical Staff at Hewlett-Packard Microwave Semiconductor Division in San Jose, CA, and was responsible for several linear and digital gigahertz IC designs. Since 1992, he has been with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan, where he is now a Professor. His current research interests are high-performance mixed-signal integrated circuits.

Dr. Wu is a member of Phi Tau Phi. He has served as an Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS.