## **Chapter 3**

# A 1.25 Gb/s Clock and Data Recovery Design

## **3.1 Introduction**

This chapter describes the circuit design in more detail at the transistor level. The design method can be applied to a CDR with an input data rate of 1.25 Gb/s. The circuits are designed with the TSMC $0.35\,\mu\text{m}$ 2P4M CMOS process and its device models. We simulate the CDR with HSPICE to obtain its detailed electrical behavior. Process variations and temperature effects must also be taken into account, so the CDR is simulated at high temperature and in the slow and fast process corners in addition to the normal temperature and typical process. Finally, a post-layout simulation is run on the extracted netlist, which includes the parasitic resistances and capacitances.

## **3.2 Circuit Description**

#### 3.2.1 Input and Output Interface

Traditionally, high-speed links in the Gb/s range have been implemented in GaAs MESFET or Si bipolar technologies. The primary advantage of those technologies is faster intrinsic device speed. However, CMOS technology is more widely available and allows higher integration than the other technologies. To recover high-speed signals reliably, the circuit at the receiving front end must be able to resolve small inputs at a very high data rate and provide the sampling strobe correctly.

#### 3.2.1.1 Preamplifier

In this work, the CDR system requires full-swing NRZ data for correct operation. Moreover, when the differential data enter the chip, they are distorted by the resonance between the inductance and capacitance of the bonding wire and pad. Figure 3-1 shows the schematic of the preamplifier, which detects the input signal with a Schmitt trigger (M1~M6) and produces a full swing at the output node [27]. It is actually an open-loop comparator. To meet the common-mode voltage range, the circuit is implemented with a PMOS input differential pair (M1~M2) biased by a constant current source, with NMOS cross-coupled pairs as the load (M3~M6).



Fig. 3-1 Schematic of preamplifier

The preamplifier must detect a received signal that is noisy and limited in swing, and amplify it to a nearly full-swing voltage level at the output. Hence, the gain and bandwidth of the preamplifier should be designed carefully to meet this requirement. Furthermore, the offset voltage of the preamplifier also affects the correct operation of the CDR. The offset voltage is introduced not only by mismatch in the input devices but also by mismatches (both device and capacitance mismatch) within the positive-feedback structure. These errors are referred back to the input as the input-offset voltage. An on-chip termination resistor matches the characteristic impedance of the channel to reduce reflections and the parasitic effects of the package.

Figure 3-2 shows the frequency response of the preamplifier. Within the full data rate of the transmitted NRZ signal, the preamplifier still provides 25.1 dB of gain. The advantage of this hysteresis comparator is noise immunity: noise smaller than the threshold voltage is rejected, as shown in figure 3-3. If the size ratio is $A = \frac{(W/L)_4}{(W/L)_3}$ and the bias current through M1 is $I_B$, the threshold voltage is derived as

$$\pm V_{th} = \pm \sqrt{\frac{I_B}{K}} \cdot \frac{\sqrt{A} - 1}{\sqrt{1 + A}} \tag{3.1}$$

The threshold voltage depends not only on the bias current but also on the size ratio of the two lower current mirrors. If A<1, there is no hysteresis in the transfer function; when A>1, hysteresis results, as shown in figure 3-3. Figure 3-4 shows the corresponding output signal of the preamplifier: the limited received signals, which are about 200 mV, are amplified to full scale. The NRZ data streams are then sent to the core of the clock and data recovery circuit, which recovers the data values.
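The dependence of the threshold on the size ratio can be checked numerically. The following is a minimal sketch of equation (3.1), not the design procedure used in this work: the function name and the numerical values of the bias current and of K are hypothetical, and K is assumed to be a device transconductance parameter in A/V².

```python
import math

def hysteresis_threshold(i_bias, k, a):
    """Return the magnitude of the trip voltage from equation (3.1), in volts.

    i_bias : bias current I_B through M1, in amperes
    k      : parameter K of equation (3.1), assumed to be in A/V^2
    a      : size ratio A = (W/L)_4 / (W/L)_3
    """
    if a <= 1.0:
        # For A <= 1 the expression is not positive: no hysteresis window.
        return 0.0
    return math.sqrt(i_bias / k) * (math.sqrt(a) - 1.0) / math.sqrt(1.0 + a)

# Hypothetical values, chosen only to exercise the formula.
I_B = 100e-6   # 100 uA
K = 1e-3       # 1 mA/V^2

for ratio in (0.5, 1.0, 2.0, 4.0):
    vth = hysteresis_threshold(I_B, K, ratio)
    print(f"A = {ratio:3.1f}  ->  +/-Vth = {vth * 1e3:6.1f} mV")
```

With these assumed values the window opens only for A>1 and widens as A grows, which matches the qualitative statement above.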



Fig. 3-2 Frequency response of the preamplifier





Fig. 3-3 Hysteresis window of the preamplifier



Fig. 3-4 The output of preamplifier with input of 2<sup>7</sup>-1 PRBS


#### 3.2.1.2 Output Driver

A source follower is used as the output driver in the receiver chip for convenience of measurement, as shown in figure 3-5 [28]. Even with the package parasitics and the $50\Omega$ load taken into account, the output signal can be measured off-chip. In this design, we employ a solution that achieves low output impedance while saving power by placing a local feedback loop, made of the current source and the transistor M2, around the transistor M1. The single-ended closed-loop output resistance is given by the open-loop resistance $\frac{1}{gm_1}$ divided by $1+T$, where

$$T = gm_2 \cdot R \tag{3.2}$$
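As a rough illustration of how the local feedback lowers the output resistance, the sketch below evaluates the open-loop resistance divided by 1+T with T taken from equation (3.2). The function name and all component values are hypothetical and are not taken from the actual design.

```python
def closed_loop_rout(gm1, gm2, r):
    """Single-ended output resistance of the driver: (1/gm1) / (1 + T)."""
    loop_gain = gm2 * r              # T = gm2 * R, equation (3.2)
    return (1.0 / gm1) / (1.0 + loop_gain)

# Hypothetical small-signal values, for illustration only.
GM1 = 5e-3   # transconductance of M1, in siemens
GM2 = 5e-3   # transconductance of M2, in siemens
R = 2e3      # resistance R of equation (3.2), in ohms

open_loop = 1.0 / GM1
closed_loop = closed_loop_rout(GM1, GM2, R)
print(f"open-loop  Rout = {open_loop:.1f} ohm")
print(f"closed-loop Rout = {closed_loop:.1f} ohm")
```

Under these assumptions the loop gain is about 10, so the output resistance drops by roughly a factor of 11 without increasing the bias current of M1.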



Fig. 3-5 Output Driver by the source follower

#### 3.2.2 Frequency Detector

The frequency detector consists primarily of DFFs and XOR gates. Figure 3-6 shows the TSPC (True Single Phase Clock) D Flip-Flop [29], which is the only part operating at the maximum frequency. This kind of DFF has been widely used in digital circuits because of its high operating speed and simple circuit structure. In addition, for high-speed operation it is important to reduce the effective capacitance of the internal and external nodes, which reduces both the power consumption and the propagation delay. Therefore, we pay particular attention to the layout techniques for the high-speed circuits. There are also two XOR gates in the frequency detector. Since the sampling D Flip-Flops provide a full digital swing, a digital XOR gate can be used [30]. Figure 3-7 shows the schematic of the XOR circuit. When A=0, the transmission gate conducts, so the output is equal to B. When A=1, the transmission gate is disabled and the inverter composed of Mp and Mn is activated, so the output is equal to $\overline{B}$. In brief, the Boolean expression is given by



Fig. 3-6 Schematic of TSPC DFF



Fig. 3-7 Schematic of the XOR gate

$$\text{Out} = A \oplus B = \overline{A} \cdot B + A \cdot \overline{B} \tag{3.3}$$
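The two operating cases of the gate in figure 3-7 can be written as a small behavioral model and compared against the XOR truth table. This is only a logic-level sketch with a hypothetical function name; it says nothing about the transistor-level timing.

```python
def tg_xor(a, b):
    """Behavioral model of the XOR gate in figure 3-7 (logic levels 0/1)."""
    if a == 0:
        # Transmission gate conducts: B passes straight to the output.
        return b
    # Transmission gate off, inverter (Mp, Mn) active: output is NOT B.
    return 1 - b

# Print the truth table next to Python's built-in XOR for comparison.
for a in (0, 1):
    for b in (0, 1):
        print(f"A={a} B={b} -> out={tg_xor(a, b)} (A xor B = {a ^ b})")
```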

#### 3.2.3 Phase Detector

The schematic of a half-rate four-step bang-bang phase detector and its characteristics are shown in figure 3-8 [21]-[23]. Five D Flip-Flops, triggered by clocks 1 to 5 that are phase shifted by $\pi$/4 from one another, sample the input data stream. The sampling is performed in sequence and the parasitic delays of the sampling circuits track each other; therefore, any variations in parasitic delay due to changes in the environment are automatically compensated. The optimum sampling point is reached when clock 3 and the data transitions coincide. The pulses PU1, PU2, PD1 and PD2 indicate the occurrence of data transitions between adjacent sampling clocks. The widths of these control pulses are fixed at half of the clock period (line 5). The delay on clock line 5 is equal to one D Flip-Flop propagation delay.

The phase detector output pulses control the charge pump to create four different current steps. The sign of each current step depends on whether the data transition occurs early or late relative to clock 3. The currents flow through the loop filter and generate the voltage steps $\pm \frac{I_L}{4} \cdot K_h$ and $\pm \frac{3I_L}{4} \cdot K_h$, where $K_h$ is the DC gain of the loop filter. Consequently, the voltage steps result in proportional clock frequency steps $\pm \frac{I_L}{4} \cdot K_h \cdot K_{vco}$ and $\pm \frac{3I_L}{4} \cdot K_h \cdot K_{vco}$, where $K_{vco}$ is the VCO conversion gain. Because the phase detector has four steps, the loop can be analyzed approximately as a CDR with a linear phase detector. The gain of the phase detector, $K_{PD}$, is then given by [21]

$$K_{PD} = \frac{2 \cdot I_L}{\pi} \tag{3.4}$$

where $I_L$ is the nominal value of the charging current at the occurrence of the maximum phase error. In practice, the charge pump provides only two current magnitudes, $\frac{I_L}{4}$ and $\frac{3I_L}{4}$. Using equation (3.4), we can determine the CDR parameters as described in the previous chapter.
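The relation between the four current steps and the linear gain of equation (3.4) can be illustrated with a short behavioral model. This is a sketch under stated assumptions: the phase error is measured from clock 3 and stays within ±π/2, the denominator of (3.4) is taken to be π, the sign convention is arbitrary, and the function names and all numerical values (I_L, Kh, Kvco) are hypothetical.

```python
import math

def pd_current(phase_error, i_l):
    """Charge-pump current for a data-transition phase error (radians).

    Assumes the error is measured from clock 3, lies within +/- pi/2,
    and that the sampling clocks are spaced pi/4 apart. The sign
    convention (positive error -> positive current) is an assumption.
    """
    if phase_error == 0.0:
        return 0.0
    inner = abs(phase_error) < math.pi / 4.0
    magnitude = i_l / 4.0 if inner else 3.0 * i_l / 4.0
    return math.copysign(magnitude, phase_error)

def linear_pd_current(phase_error, i_l):
    """Linear approximation with slope K_PD = 2 * I_L / pi."""
    return 2.0 * i_l / math.pi * phase_error

def frequency_step(current, k_h, k_vco):
    """Clock frequency step produced by one current step."""
    return current * k_h * k_vco

I_L = 100e-6     # hypothetical nominal current, in amperes
K_H = 1e3        # hypothetical loop-filter DC gain, in V/A
K_VCO = 65e6     # hypothetical VCO gain magnitude, in Hz/V

for err in (math.pi / 8.0, 3.0 * math.pi / 8.0):
    step = pd_current(err, I_L)
    line = linear_pd_current(err, I_L)
    df = frequency_step(step, K_H, K_VCO)
    print(f"error = {err:.3f} rad: step = {step * 1e6:.1f} uA, "
          f"linear = {line * 1e6:.1f} uA, df = {df / 1e6:.3f} MHz")
```

At the centre of each quantization interval (π/8 and 3π/8) the step current coincides with the straight line of slope $2I_L/\pi$, which is one way to read why the four-step detector can be analyzed like a linear one.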



Fig. 3-8 (a) Schematic of the phase detector (b) Phase detector characteristics

#### 3.2.4 Charge Pump

There are two charge pumps in this work, as shown in figure 2-1. One provides the four-step currents that control the VCO frequency according to the control signals of the half-rate phase detector, and the other provides larger currents to speed up frequency acquisition. The two charge pumps use different structures, which are discussed in the following subsections.

#### 3.2.4.1 Charge Pump 1

Figure 3-9 shows the schematic of the charge pump [19], [31], which overcomes the charge injection produced in the typical charge pump by the overlap capacitance of the switch devices and by the capacitance at the intermediate node between the current source and the switch [32]. To attenuate any switching errors that may reach the sensitive output node, the switch devices are placed on the side of the current source devices. When the switch devices are off, the intermediate node between each switch device and its current source device is charged toward the output voltage only by the gate overdrive of the current source device, Vgs-Vt, an amount independent of the output voltage. The control voltage "**out**" is thus isolated from the switching noise.

Fig. 3-9 Schematic of the charge pump 1

A common problem of many charge pumps is charge sharing. For the charge pump in figure 3-10(a) (Type A), charge sharing is caused by the parasitic capacitance at nodes pcs and ncs [32]. When Idn is active, node pcs is charged to Vdd. When Idn is deactivated, some of the charge stored at node pcs leaks through the current source device. The two transistors Mp and Mn in the Type B charge pump of figure 3-10(b) remove the charge from nodes pcs and ncs when PD and PU are deactivated [33]. The charge-removal transistors Mp and Mn eliminate the current tails, giving well-balanced PU and PD activation times such that Iup and Idn cancel each other, which reduces the loop filter ripple.



Fig. 3-10 (a) Charge-pump suffering from charge sharing (Type A) (b) Charge removal transistors eliminate charge sharing (Type B)

#### 3.2.4.2 Charge Pump 2

In this subsection, we propose another charge-pump technique, which uses partial positive feedback to increase the switching speed and current reuse to minimize the power consumption, while providing a larger current for fast frequency acquisition [34]. Figure 3-11 shows the proposed circuit. When the current source IB is steered through M1, this current is reused to charge node A, as shown in figure 3-11(a). Due to the injected current, M4 turns off faster. However, a slow path still remains, because the problem has been transferred to node B on the other branch. To solve this, the pull-up transistor M7 is added, as shown in figure 3-11(b). The final topology is a simple positive-feedback amplifier, which uses the gain enhancement technique proposed by Harjani [35]. The amount of positive feedback and the gain are determined by the ratio of the cross-coupled and diode-connected transistors, and are given by

$$\alpha = \frac{(W/L)_{6}}{(W/L)_{5}} \qquad A_v = \sqrt{\frac{\mu_{n}(W/L)_{1}}{\mu_{p}(W/L)_{5}}} \cdot \frac{1}{1-\alpha} \tag{3.5}$$

The switching speed of the circuit depends on the magnitude of the current source IB and on the stray capacitance at node A, which is given by

$$C_A \approx C_{GS3} + C_{GS4} + C_{GS7} \tag{3.6}$$

From equation (3.5), it can be deduced that $\alpha$ must remain below 1; otherwise hysteresis occurs. A practical value is 0.75 [36]. The whole circuit, shown in figure 3-12, is composed of two current-reuse cells and a 1:1 current converter.
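To see how quickly the gain grows as the feedback ratio approaches 1, equation (3.5) can be evaluated directly. The sketch below is illustrative only: the function name is hypothetical, `mobility_ratio` stands for the n-to-p ratio under the square root in (3.5), and the device ratios are set to 1 simply to isolate the 1/(1-α) term.

```python
import math

def feedback_gain(alpha, mobility_ratio, wl_1, wl_5):
    """Gain Av of equation (3.5) for a given positive-feedback ratio alpha."""
    if not 0.0 <= alpha < 1.0:
        # At alpha >= 1 the expression no longer describes an amplifier:
        # the circuit shows hysteresis instead.
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    return math.sqrt(mobility_ratio * wl_1 / wl_5) / (1.0 - alpha)

# Hypothetical ratios, chosen to isolate the feedback term.
for alpha in (0.0, 0.5, 0.75, 0.9):
    print(f"alpha = {alpha:4.2f} -> Av = {feedback_gain(alpha, 1.0, 1.0, 1.0):5.1f}")
```

With the practical value 0.75, the positive feedback multiplies the gain of the plain diode-loaded stage by four, while still leaving some margin before the onset of hysteresis.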



Fig. 3-11 (a) Original current reuse topology (b) Final current reuse



Fig. 3-12 Schematic of the charge-pump 2 in this work

#### 3.2.5 Voltage Controlled Oscillator

#### 3.2.5.1 The Fundamental of VCO

First, the voltage controlled oscillator (VCO) is the building block of a clock and data recovery circuit that is most sensitive to supply and substrate noise. Consequently, careful design is necessary to reduce noise and frequency drift. Many issues affect VCO noise. However, as will become clear later, the noise is limited more by the system design than by the VCO design. Therefore, many VCO circuit design techniques can be applied without greatly improving performance.

The second requirement is to achieve high-frequency operation with reasonable power consumption. The comparison between LC-tank oscillators and ring oscillators is listed in table 3-1. The CMOS LC-tank oscillator shows excellent phase noise performance with low power consumption because of its relatively high quality factor. However, LC-tank oscillators have several disadvantages. First, the tuning range of an LC-tank oscillator is usually much narrower than that of a ring oscillator. A typical tuning range for an LC-tank oscillator is about 10~20%, which might not be wide enough to encompass the process and temperature variations of CMOS technology. This limited tuning range makes it difficult to reach the desired frequency without fabrication iterations. Second, the LC-tank oscillator occupies a larger chip area because of the on-chip spiral inductor, and is therefore less suitable for integration. A larger chip area also means higher cost, so the LC-tank oscillator costs more than the ring oscillator. The ring oscillator, in contrast, offers several attractive features: a wide tuning range, excellent integration with digital CMOS processes, a small chip area and low cost. Moreover, ring oscillators with an even number of delay cells generate both in-phase and quadrature-phase outputs.

| Types of VCO           | LC-tank oscillator                    | Ring oscillator                       |
|------------------------|---------------------------------------|---------------------------------------|
| Operation speed        | Technology dependent, 1 to 10s of GHz | Technology dependent, 1 to 10s of GHz |
| Phase noise            | Good                                  | Poor                                  |
| Tuning range           | Narrow                                | Wide                                  |
| Power consumption      | Low                                   | Medium                                |
| Monolithic integration | Poor                                  | Excellent                             |
| Cost                   | High                                  | Low                                   |
| Other                  |                                       | Multi-phase clocks                    |

Table 3-1 Comparison between LC-tank oscillator and ring oscillator

The VCO is built from a four-stage ring oscillator and a self-biased replica-feedback bias generator [19], [31]. Figure 3-13 shows the schematic of the four-stage VCO and the delay cell. To obtain a low-jitter output clock, the delay cell used in the VCO should have low sensitivity to, and high rejection of, supply and substrate noise. Supply noise can be categorized into static and dynamic noise. The VCO architecture used in this work greatly improves the rejection of both static and dynamic supply noise [31].

The delay cell of the VCO contains a source-coupled pair with resistive loads, each formed by a diode-connected PMOS device in shunt with an equally sized PMOS device. They are called symmetric loads because their I-V curve is symmetric about the center of the voltage swing, as shown in figure 3-14.

Fig. 3-14 I-V curve of the symmetric load

In principle, to achieve high rejection of supply and substrate noise, the load of the differential pair should have a linear I-V characteristic. In practice, this is difficult to achieve in MOS technology. The symmetric load, however, cancels the first-order common-mode noise. Therefore, the symmetric load, though nonlinear, provides high dynamic supply noise immunity. The control voltage, Vbp, is the bias voltage for the PMOS device. To provide a bias current that is independent of the static supply noise, the bias voltage of the NMOS current source, Vbn, is continuously adjusted. As the supply voltage changes, the drain voltage of the NMOS current source also changes. However, the gate bias is adjusted by the replica-feedback bias generator to keep the output current constant. In effect, this raises the output resistance of the NMOS current source. Therefore, the static supply noise rejection is greatly improved.

Based on the analysis of the I-V curve, it can be shown that the effective resistance of a symmetric load ($R_{eff}$) is directly proportional to the small-signal resistance at the ends of the swing range, which is just one over the transconductance (gm) of one of the two equally sized PMOS devices biased at Vctrl. Therefore, the buffer delay is given by

$$t_d = R_{eff} \cdot C_{eff} = \frac{1}{gm} \cdot C_{eff} \tag{3.7}$$

where $C_{eff}$ is the effective buffer output capacitance. The drain current of one of the two equally sized devices biased at Vctrl is

$$Id = \frac{kp}{2} [(Vdd - V_{ctrl}) - |V_{tp}|]^2 \tag{3.8}$$

Taking the derivative with respect to Vctrl, the magnitude of the transconductance gm is given by

$$gm = kp[(Vdd - V_{ctrl}) - |V_{tp}|] \tag{3.9}$$

The buffer delay is then given by

$$t_d = \frac{C_{eff}}{kp[(Vdd - V_{ctrl}) - |V_{tp}|]} \tag{3.10}$$

Therefore, for N stages of the VCO, the oscillator frequency is given by

$$f_{osc} = \frac{1}{2N \cdot t_d} = \frac{kp[(Vdd - V_{ctrl}) - |V_{tp}|]}{2N \cdot C_{eff}} \tag{3.11}$$

The gain of the VCO is given by

$$K_{vco} = \frac{\partial f_{osc}}{\partial V_{ctrl}} = -\frac{kp}{2N \cdot C_{eff}} \tag{3.12}$$

As a result,  $K_{vco}$  is independent of the buffer bias current and the VCO has first order tuning linearity.
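Equations (3.11) and (3.12) can be evaluated numerically to confirm that the slope of the tuning curve is constant. The sketch below uses the four stages and the 3.3 V supply of this work, but the values of kp, the effective capacitance and the threshold voltage are hypothetical, as are the function names, so the resulting numbers do not represent the simulated VCO.

```python
def f_osc(v_ctrl, kp, c_eff, n_stages, vdd, vtp_abs):
    """Oscillation frequency from equation (3.11), in hertz."""
    return kp * ((vdd - v_ctrl) - vtp_abs) / (2.0 * n_stages * c_eff)

def k_vco(kp, c_eff, n_stages):
    """VCO gain from equation (3.12), in hertz per volt."""
    return -kp / (2.0 * n_stages * c_eff)

# N and Vdd follow the text; the other values are illustrative assumptions.
N_STAGES = 4
VDD = 3.3        # volts
KP = 200e-6      # A/V^2 (hypothetical)
C_EFF = 50e-15   # farads (hypothetical)
VTP = 0.7        # volts (hypothetical |Vtp|)

for v in (1.0, 1.5, 2.0):
    f = f_osc(v, KP, C_EFF, N_STAGES, VDD, VTP)
    print(f"Vctrl = {v:.1f} V -> fosc = {f / 1e6:7.1f} MHz")

# Finite-difference slope between two control voltages, compared with (3.12).
slope = (f_osc(2.0, KP, C_EFF, N_STAGES, VDD, VTP)
         - f_osc(1.0, KP, C_EFF, N_STAGES, VDD, VTP)) / (2.0 - 1.0)
print(f"numerical slope = {slope / 1e6:.1f} MHz/V, "
      f"Kvco = {k_vco(KP, C_EFF, N_STAGES) / 1e6:.1f} MHz/V")
```

The same expressions show that increasing the effective capacitance scales the frequency and the gain down by the same factor, since both are inversely proportional to $C_{eff}$.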

The self-biased replica-feedback bias generator of the VCO delay cell is shown in figure 3-15. It generates the bias voltages Vbp and Vbn from the input Vctrl. Its primary function is to continuously adjust the bias current of the VCO delay buffers so that the buffer stages have the correct lower swing limit, Vctrl. As a result, it establishes a current that is held constant and independent of the supply voltage.



Fig. 3-15 Schematic of self-biased replica-feedback bias generator

The self-biased replica-feedback bias generator consists of a PMOS source-coupled differential pair, a half-buffer replica, and a control voltage buffer. The differential amplifier acts as a unity-gain buffer that forces the voltage of node Va in figure 3-15 to equal Vctrl, a condition required for correct symmetric load swing limits, and provides the bias voltage Vbn for the NMOS current source. In addition, Vbn is dynamically adjusted by the differential amplifier to increase the supply noise immunity. With the half-buffer replica, the net result is that the output current of the NMOS current source is established by the load element and is independent of the supply voltage. If the supply voltage changes, the amplifier adjusts to keep the swing and the bias current constant. Because the differential amplifier is self-biased, it has two stable states, one of which is unbiased. As a result, a start-up circuit is necessary to bias the amplifier at power-on.

Since the differential amplifier and the half-buffer replica form a two-stage negative feedback loop, the frequency response must be taken into consideration. Figure 3-16 shows the frequency response of the self-biased replica-feedback bias generator.



Fig. 3-16 Frequency response of the self-biased replica-feedback bias generator

There are two poles in the loop. One is at the amplifier output, and the other is at the half-buffer replica output. Since the pole at the amplifier output is the dominant one, it can be moved toward the origin by the capacitive load Cc of the NMOS current source gates in the VCO buffer chain, which increases the phase margin of the loop. Moreover, in order to track any supply and substrate noise that affects the VCO jitter performance, the bandwidth of the self-biased circuit is usually set equal to the operating frequency of the VCO. The bias circuit also provides a buffered version of the control voltage Vctrl using an extra control voltage buffer. This isolates the control voltage Vctrl from capacitive coupling in the VCO buffer chain.
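The effect of moving the dominant pole toward the origin can be illustrated with a generic two-pole loop-gain model. The sketch below is not derived from the actual bias generator: the function name, the DC gain and the pole frequencies are hypothetical, and the loop is idealized as $L(s) = A_0 / ((1 + s/p_1)(1 + s/p_2))$.

```python
import math

def phase_margin_deg(dc_gain, p1_hz, p2_hz):
    """Phase margin of L(s) = A0 / ((1 + s/p1)(1 + s/p2)), in degrees."""
    if dc_gain <= 1.0:
        raise ValueError("dc_gain must exceed 1 for a unity-gain crossing")
    # |L(jf)|^2 = 1  ->  (1 + x/p1^2)(1 + x/p2^2) = A0^2, with x = f^2.
    a = 1.0 / (p1_hz ** 2 * p2_hz ** 2)
    b = 1.0 / p1_hz ** 2 + 1.0 / p2_hz ** 2
    c = 1.0 - dc_gain ** 2
    x = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    f_unity = math.sqrt(x)
    lag = math.atan(f_unity / p1_hz) + math.atan(f_unity / p2_hz)
    return 180.0 - math.degrees(lag)

# Hypothetical loop: DC gain of 100, second pole fixed at 1 GHz.
A0 = 100.0
P2 = 1e9
for p1 in (1e7, 3e6, 1e6):
    pm = phase_margin_deg(A0, p1, P2)
    print(f"dominant pole at {p1 / 1e6:5.1f} MHz -> phase margin = {pm:5.1f} deg")
```

In this idealized model, lowering the dominant pole reduces the unity-gain frequency relative to the second pole, so the phase margin increases, at the cost of loop bandwidth.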

The differential oscillator output is converted to a single-ended signal with 50% duty cycle by the differential-to-single-ended converter shown in figure 3-17, and this signal is used as the input to the phase detector and frequency detector. The two differential amplifiers of the converter use the same current source bias voltage, Vbn, generated by the self-biased replica-feedback bias generator for the VCO. With Vbn, the circuit corrects the input common-mode voltage level and provides signal amplification.



Fig. 3-17 Schematic of differential-to-single-ended converter with 50% duty cycle

#### 3.2.5.2 Practical Design

In a practical design, the frequency range might not encompass the process and temperature variations of CMOS technology. We must also ensure that the frequency detector operates correctly and that the magnitude of Kvco is reasonable. Additional capacitive loads are therefore added at the output of the delay cell, as shown in figure 3-18. This method adjusts the frequency range so that it covers the process and temperature variations.



Fig. 3-18 Schematic of the delay cell with additional capacitive loads

Because of the restricted capture range of the frequency detector, a circuit that confines the frequency range of the VCO output is necessary. A linearization circuit, shown in figure 3-19, is used for this purpose [37]. The input control voltage, Vctrl, is not applied directly to the VCO but is converted to another voltage, Vtun, with a scaled linear characteristic. The product of the slope of this transfer curve and the VCO tuning sensitivity should be as constant as possible to achieve a linear overall tuning range. The output voltage, Vtun, follows the input voltage, Vctrl, over the range that covers the linear region of the VCO characteristic. Figure 3-20 shows the transfer curve of the linearization circuit.



Fig. 3-19 Schematic of linearization circuit



Fig. 3-20 Transfer curve of the linear circuit

Figure 3-21(a) shows the simulated transfer curve of the VCO without additional capacitive loads and linearization circuit. Figure 3-21(b) shows the transfer curve of the VCO with additional capacitive loads but without the linearization circuit. Figure 3-21(c) shows the transfer curve with both the additional capacitive loads and the linearization circuit. The VCO uses four delay buffer stages with an output frequency of 625 MHz. The supply voltage is 3.3 V. The gain of the VCO is -65 MHz/V and the transfer curve is monotonic. The tuning range of the VCO is 550 MHz~715 MHz under typical conditions, which falls inside the capture range of the frequency detector. This means that the CDR can operate correctly after power-on.



Fig. 3-21 The transfer curve of the VCO (a) without additional capacitive loads and linearization circuit (b) without linearization circuit (c) with both trimming circuits


## **3.3 System Simulation Result**

CDR closed-loop simulation results are given in this section. First, the acquisition process can be verified from figure 3-22, where the initial oscillation frequency of the VCO is higher than the desired value. The control voltage and the frequency detector outputs, UP and DOWN, are also shown in figure 3-22. The top graph shows that the frequency-training loop brings the frequency close enough for the CDR loop to lock within $3.5\,\mu\text{s}$, so the loop captures and locks quickly. Conversely, if the initial oscillation frequency of the VCO is lower than the desired frequency, the acquisition process can be verified from figure 3-23. Second, figure 3-24 shows the retimed NRZ data for a $2^7$-1 PRBS (Pseudo Random Binary Sequence) input and the retimed clock at 1.25 GHz. It shows that the loop can tolerate a slight frequency offset, which is much larger than the frequency variation between the VCO output and the input NRZ data, and can lock under a noisy random data stream.
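For readers who want to reproduce a comparable stimulus, a $2^7$-1 PRBS can be generated with a 7-bit linear feedback shift register. The text does not state which generator polynomial was used in the simulations, so the sketch below assumes the common choice $x^7 + x^6 + 1$; the function name and the seed are likewise hypothetical.

```python
def prbs7(seed=0x7F, length=127):
    """Generate bits of a 2^7 - 1 PRBS with a 7-bit LFSR.

    The feedback taps (stages 7 and 6, i.e. x^7 + x^6 + 1) are an
    assumption; the source does not specify the polynomial.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, or the LFSR locks up")
    bits = []
    for _ in range(length):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # XOR of stages 7 and 6
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F       # shift the new bit in
    return bits

sequence = prbs7()
print("".join(str(bit) for bit in sequence))
print("ones in one period:", sum(sequence))
```

One period of the sequence contains 127 bits, after which the pattern repeats, so a simulation that covers several periods exercises every 7-bit run that the pattern can produce.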



Fig. 3-22 Acquisition process for initial fvco > desired frequency



Fig. 3-23 Acquisition process for initial fvco < desired frequency



Fig. 3-24 Retimed data and retimed clock

Finally, the jitter of the VCO output signal resulting from pattern-dependent noise can be simulated. The waveform of the VCO output signal is folded as illustrated in figure 3-25. The resulting peak-to-peak jitter is calculated to be 38.2 ps.
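The folding operation can be sketched as follows: each clock edge is compared with an ideal grid of period T, and the spread of the resulting deviations gives the peak-to-peak jitter. This is a minimal illustration and not the post-processing actually used for figure 3-25. The function name and the edge times are hypothetical, the 1.6 ns period corresponds to the 625 MHz VCO frequency, and the mean frequency is assumed to equal the nominal one.

```python
def peak_to_peak_jitter(crossings, period):
    """Fold edge times onto one period and return the peak-to-peak spread.

    crossings : rising-edge times of the clock, in seconds
    period    : nominal clock period, in seconds
    """
    reference = crossings[0]
    # Deviation of each edge from the ideal grid, wrapped into [-T/2, T/2).
    deviations = [((t - reference + period / 2.0) % period) - period / 2.0
                  for t in crossings]
    return max(deviations) - min(deviations)

# Hypothetical edge times: ideal 625 MHz grid plus small timing errors.
T = 1.6e-9
errors = [0.0, 10e-12, -15e-12, 20e-12, -5e-12]
edges = [k * T + e for k, e in enumerate(errors)]

jitter = peak_to_peak_jitter(edges, T)
print(f"peak-to-peak jitter = {jitter * 1e12:.1f} ps")
```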



Fig. 3-25 Jitter of the VCO output for input data with 2<sup>7</sup>-1 PRBS