# National Chiao Tung University

Department of Electrical and Control Engineering

## Master's Thesis

A Novel Tree-Type Serializer for 10Gbps Transmitter

Student: Guan Yu Chen (陳冠宇)

Advisor: Prof. Chau Chin Su (蘇朝琴)

September 2006

# A Novel Tree-Type Serializer for 10Gbps Transmitter

Student: Guan Yu Chen

Advisor: Chau Chin Su




Submitted to the Department of Electrical and Control Engineering, College of Electrical Engineering and Computer Science

National Chiao Tung University

in Partial Fulfillment of the Requirements

for the Degree of

Master

in

Electrical and Control Engineering

September 2006

Hsinchu, Taiwan, Republic of China



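#### 摘要 (Abstract)

This thesis proposes a novel tree-type serializer for 10 Gbps serial I/O. Clocks with a 90-degree phase difference act as switch-like controls, which eliminates the retiming required in conventional designs and therefore significantly reduces power consumption and circuit area. Simulation results show that at 10 Gbps the proposed serializer consumes 70% of the power and 22% of the area of a conventional serializer.

In this thesis, we designed a 10 Gbps transmitter implemented in the TSMC 0.13µm 2P8M CMOS process. The transmitter consumes 27 mW from a 1.2 V power supply.

In addition, we propose an on-chip channel model and a corresponding low-power driver design for a 1 cm channel.

Keywords: serializer, multiplexer, deserializer, demultiplexer, high-speed serial link, novel tree-type serializer, quadrature clocks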

## A Novel Tree-Type Serializer for 10Gbps Transmitter

Student: Guan Yu Chen Advisor: Chau Chin Su

### Department of Electrical and Control Engineering

National Chiao Tung University

### **Abstract**

This thesis proposes a novel tree-type serializer for 10Gbps serial I/O. It uses quadrature clocks as switch controls to eliminate the need for retiming in a conventional design. As a result, power consumption and circuit area are significantly reduced. Simulation results show that at 10Gbps the proposed serializer consumes 70% of the power and occupies 22% of the area of a conventional one.

In this thesis, a 10 Gbps transmitter has been designed. Implemented in the TSMC 0.13 µm 2P8M CMOS process, the transmitter circuit consumes 27 mW from a 1.2 V power supply.

In addition, we analyze the on-chip channel model and design a low-power driver for a 1 cm channel.

**Keywords: Serializer, Multiplexer, Deserializer, Demultiplexer, High-speed serial links, Novel tree-type serializer, Quadrature clocks**

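#### Acknowledgments

First, I would like to thank my advisor, Prof. Chau Chin Su (蘇朝琴), for guiding my research and for teaching me the spirit of doing research.

Next, I thank my most senior labmate 鴻文, who always patiently advised the junior students. I also thank 丸子: thanks to your maintenance, the lab ran normally and AOC did not lag, um..., I mean Layout did not lag. And, of course, thanks to 仁乾 and 盈杰 for their guidance and suggestions.

I also thank 王照勳 for tirelessly answering my questions about the TSMC 0.13um process. I thank 汝敏 for generous help with the TSMC 0.13um process, 鼎鈞 and 振綱 of Prof. 洪's lab for discussing process issues with me, and Mr. 張文旭, who is in charge of TSMC 0.13 at CIC, for answering all of my questions about using the process. Thank you all. I must not forget the seniors at 中央 (National Central University). 顯元, 毅山, and 育凱 helped me probe my chips and often spent whole days with me, which leaves me deeply grateful and rather apologetic. Thank you. I also thank seniors 煜輝, 阿亮, 阿達, 瑛佑, Cgu, Ku, and 阿銘 for their guidance. Thanks to 志龍 and 大姐 for sharing their measurement experience with me, and to 忠傑, 順閔, TOTORO, and 小朱 for their help.

Next, I thank 智琦 for his selfless answers and trademark smile. Congratulations on leaving the singles club, boss (hmph)~~~: good deeds really are rewarded. Thanks also to 匡良, 楙軒, and 宗諭 for our mutual encouragement and support. And to 祥哥, 教主, 皇如, 村鑫, 小馬, 奶油哥, 方董, 威翔, and 存遠: the times we spent barbecuing, playing ball, and chatting together are wonderful memories.

Of course, I also thank our assistants 依萍, 雅雯, and 俊秀 for notifying us of news and... um, meeting notices and sign-in sheets (Bad Dream~~).

I also thank 士豪, 教練, 螞蟻, 小z, and my college classmates for their encouragement. Thank you all.

Finally, I thank my family, my father and mother. Without your encouragement and support, I would not be who I am today. Thank you.

陳冠宇 (Guan Yu Chen), 2006/7/23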


## **Chapter 1**

## **Introduction**



## **1.1 CMOS High-Speed Serial Links**

High-speed serial links in the Gbps range have usually been implemented in bipolar or GaAs technologies, primarily because of the higher bandwidth of those devices. However, CMOS process technology has advanced exponentially in recent years, which has greatly improved operating speed and integration level [1].

Figure 1.1 shows a conventional serial link system. It comprises three primary components: a transmitter, a channel, and a receiver. The high-speed data sent by the transmitter are analog signals. These signals are known as *non-return-to-zero* (NRZ) signals, and they represent data bits with either a HIGH level or a LOW level. In an optical transmission system, these levels are different amounts of optical power. In an electrical system, they are different voltage or current levels.

A transmitter includes a serializer and an output driver. The serializer converts parallel bits into a serial bit stream, and the timing information is embedded in this serial data. The output driver drives the signal from the serializer onto the channel.

The channel is the medium of the data transmission system. There are many types of channels, such as unshielded twisted pairs, printed-circuit board (PCB) transmission lines, chip packages, coaxial cables, and optical fibers. High-speed links generally use one of two media: copper cables for short-distance transmission and optical fibers for long-distance transmission. The most significant advantage of optical fibers is high bandwidth over long distances. Their drawback is cost, because the fiber and the necessary components are expensive. Copper cables are a less expensive alternative for high-speed communication, but the cable length limits the transmission bandwidth [2].

The receiver receives the analog signal and converts it back into binary data. It includes a front-end amplifier, a deskew buffer or a *clock and data recovery* (CDR) circuit, and a deserializer. To recover the transmitted signal, the front-end amplifier first amplifies the analog waveform. The deskew buffer or the CDR then resamples the data. Finally, the deserializer converts the high-speed serial data into low-speed parallel data.



Figure 1.1 Conventional transceiver

More advanced designs include a pseudo-random bit sequence (PRBS) generator and verifier. The verifier checks the correctness of the data received by the receiver by comparing them with the data sent by the transmitter, which forms a built-in self-test (BIST) system. A phase-locked loop (PLL) provides the clock source for both the transmitter and the receiver.
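The Python sketch below illustrates the BIST idea with a PRBS generator and a self-synchronizing verifier. It assumes a PRBS7 sequence with polynomial x^7 + x^6 + 1. That polynomial is a common choice, but this section does not specify one. The function names `prbs7` and `prbs7_check` are hypothetical and do not describe the circuit designed later in this thesis.

```python
def prbs7(n_bits, seed=0x7F):
    """Return n_bits of a PRBS7 sequence (polynomial x^7 + x^6 + 1).

    Fibonacci LFSR: each new bit is the XOR of register stages 7 and 6.
    The seed must be non-zero, otherwise the register stays at all zeros.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # stages 7 and 6
        state = ((state << 1) | new_bit) & 0x7F        # shift in the new bit
        bits.append(new_bit)
    return bits


def prbs7_check(received):
    """Self-synchronizing PRBS7 verifier: count prediction mismatches.

    In a valid PRBS7 stream b[i] == b[i-7] XOR b[i-6], so every bit after
    the first seven can be predicted from bits that were already received.
    """
    errors = 0
    for i in range(7, len(received)):
        if received[i] != received[i - 7] ^ received[i - 6]:
            errors += 1
    return errors


if __name__ == "__main__":
    tx = prbs7(254)        # two full periods of 127 bits
    rx = list(tx)
    rx[50] ^= 1            # inject a single bit error
    print(prbs7_check(tx), prbs7_check(rx))
```

A single flipped bit produces three mismatches, because the verifier also uses that bit to predict two later bits. A real error counter has to account for this when it estimates the bit-error rate.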

CMOS high-speed serial links are widely used in many applications, such as data transmission among multiple processors, communication within computers, and routers. Many standards also specify CMOS high-speed serial links, including Gigabit Ethernet, IEEE 1394, SONET, and Fiber Channel. Table 1.1 lists several of these standards and their data rates.

| Standard      | Data Rate    |
|---------------|--------------|
| OC-12/STM-4   | 622.08Mbps   |
| FC1063        | 1.0625Gbps   |
| SATA          | 1.5Gbps      |
| OC-48/STM-16  | 2.48832Gbps  |
| PCI-Express   | 2.5Gbps      |
| SATA II       | 3Gbps        |
| XAUI          | 3.125Gbps    |
| 4G FC         | 4.25Gbps     |
| 8G FC         | 8.5Gbps      |
| OC-192        | 9.95328Gbps  |
| 10GbE         | 10.3125Gbps  |
| Fiber Channel | 10.51875Gbps |
| G.709         | 10.66423Gbps |
| G.975         | 10.70923Gbps |
| OC-768        | 39.81Gbps    |

Table 1.1 High-Speed Communication Standards

## **1.2 Motivation**

Advanced integrated circuit technologies can integrate multi-million gates into a single chip, and operating frequency and data throughput have increased significantly. Conventionally, parallel buses and serial links are the two approaches to high-speed signaling. A parallel bus needs many bus lines for the total transmission data rate to reach the specification. Such large buses increase power consumption and circuit area, and they also need more pads. Unfortunately, the number of I/O pins cannot grow proportionally. As a result, high-speed serial I/O is needed to solve the communication bottleneck. PCI-Express and Serial ATA are two prominent examples. A serial link maximizes the communication bandwidth and distance over a single transmission line. It offers a high-speed, low-cost solution for multi-gigabit-per-second rates over long distances. Applications such as computer-to-computer or computer-to-peripheral interconnections can span several meters. A key component is the serializer, which converts low-speed parallel data into a high-speed serial output stream.

In this thesis, a novel tree-type serializer circuit is proposed. We implement the transmitter architecture with non-return-to-zero (NRZ) signaling. A 10 Gbps novel tree-type serializer with an output driver and a PRBS (Pseudo-Random Bit Sequence) generator has been designed. We also analyze the on-chip channel model and design a low-power driver for a 1 cm channel.

## **1.3 Thesis Organization**

The rest of the thesis is organized as follows.

In Chapter 2, we describe and analyze the conventional serializer structures. In Chapter 3, we introduce the proposed novel tree-type serializer architecture, then analyze and compare it with the conventional architectures. The simulation results of this comparison are also shown.

In Chapter 4, the chip implementation is presented. We show the full architecture of the transmitter and the detailed circuit of each block. Finally, we present the simulation results, the layout, and the measurement considerations of the design.

In Chapter 5, the measurement results are presented, including off-chip measurements and on-wafer measurements with a probe station. The results include eye diagrams, peak-to-peak and RMS jitter, and power consumption.

In Chapter 6, we show an on-chip channel model analysis and a low power driver.

The research is concluded in Chapter 7.

## **Chapter 2**

## **Background Study**



## **2.1 Serializer Structures**

A serializer, also called a multiplexer (MUX), converts low-speed parallel input data into a high-speed serial output data stream. Figure 2.1 shows the conceptual block diagram of a serializer, which is an N-to-1 multiplexer. $D_{i1}$ to $D_{in}$ are the n-bit low-speed parallel input data. Selected by $ck_1$ to $ck_n$, the inputs $D_{i1}$ through $D_{in}$ are serialized into the high-speed output DO, whose data rate is n times that of each input. In many applications, the number of serializer inputs is a power of two, such as 2, 4, 8, or 16. Some systems, such as PCI-Express, encode the output data, which changes the number of serializer inputs. For example, 8B/10B encoding requires a 10-to-1 multiplexer.
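The Python sketch below models the conversion behaviorally and ignores all timing. The helper names are hypothetical. Sending $D_{i1}$ first is an assumed bit-ordering convention.

```python
def serialize(words, n):
    """Behavioral N-to-1 serializer: send each n-bit word as n serial bits.

    words: sequence of n-bit tuples (D_i1, ..., D_in), one per parallel cycle.
    D_i1 is sent first (an assumed ordering convention).
    """
    stream = []
    for word in words:
        if len(word) != n:
            raise ValueError("every word must have exactly n bits")
        stream.extend(word)
    return stream


def deserialize(stream, n):
    """Inverse 1-to-N operation (deserializer): regroup bits into n-bit words."""
    if len(stream) % n:
        raise ValueError("stream length must be a multiple of n")
    return [tuple(stream[i:i + n]) for i in range(0, len(stream), n)]


def serial_rate(parallel_rate, n):
    """The serial output rate is n times the parallel word rate."""
    return parallel_rate * n


if __name__ == "__main__":
    words = [(1, 0, 1, 1), (0, 0, 1, 0)]
    bits = serialize(words, 4)
    print(bits, deserialize(bits, 4) == words, serial_rate(250e6, 10))
```

For example, a 10-to-1 serializer fed with 250 M words/s produces a 2.5 Gbps stream. This is the kind of non-power-of-two case described above for 8B/10B-encoded links.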

There are three principal serializer structures: the shift-register type, the single-stage type, and the tree-type serializer. They are shown in Figures 2.2, 2.3, and 2.4, respectively. Other special architectures also exist, such as the CML (Current Mode Logic) MUX shown in Figure 2.5. We explain these structures in the following sections.





Figure 2.2 Shift-register type serializer



Figure 2.3 Single-stage type serializer



Figure 2.4 Conventional tree-type serializer



Figure 2.5 Serializer of CML

## **2.2 Shift-Register Type Serializer**

Figure 2.2 shows the shift-register type serializer. Its main functions are parallel load and serial shift, and the two operate at different frequencies. The parallel load works at the low data rate and uses CK2 as its clock to load the parallel data inputs into *D flip-flops* (DFFs). The serial shift works at the high data rate and uses CK1 as its clock. The high-rate DFFs triggered by CK1 send the data out as a serial stream. Once the data in the serial shift register have been sent out entirely, CK3 loads the data from the parallel load register into the serial shift register. CK1 has the highest clock rate and is divided to produce CK2 and CK3. The timing diagram of the clocks and data in Figure 2.2 shows how this serializer operates.
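The Python sketch below approximates the cycle-by-cycle behavior of Figure 2.2. Each loop iteration is one CK1 cycle. The sketch assumes that, at each word boundary, the CK3 transfer happens before the CK2 load. The real circuit's clock phasing may differ, and the function name is hypothetical.

```python
def shift_register_serializer(words):
    """Cycle-level model of the shift-register serializer (Figure 2.2).

    One loop iteration is one CK1 cycle, i.e. one output bit.
    Assumed ordering at every word boundary (once every n CK1 cycles):
      1. CK3 copies the parallel-load register into the serial-shift register,
      2. CK2 loads the next parallel word into the parallel-load register.
    CK1 then shifts one bit out of the serial-shift register per cycle.
    """
    n = len(words[0])
    pending = list(words)
    load_reg = None            # parallel-load register (CK2 domain)
    shift_reg = []             # serial-shift register (CK1 domain)
    out = []
    for cycle in range(n * (len(words) + 1)):   # one extra word of latency
        if cycle % n == 0:
            if load_reg is not None:            # CK3: parallel-to-serial transfer
                shift_reg = list(load_reg)
                load_reg = None
            if pending:                         # CK2: load the next word
                load_reg = pending.pop(0)
        out.append(shift_reg.pop(0) if shift_reg else None)   # CK1: shift
    return out


if __name__ == "__main__":
    print(shift_register_serializer([(1, 0, 1, 1), (0, 1, 0, 0)]))
```

The leading `None` entries represent one word of pipeline latency. Every output bit is shifted by the full-rate CK1, which explains why this structure needs the fastest clock and consumes the most power.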

The shift-register type serializer is a straightforward implementation. It can process an arbitrary number of parallel data bits if the number of DFFs is increased and the clock rate is adjusted. With an ideal clock, its jitter is small. However, it has several drawbacks. First, the device performance limits the maximum operating speed of this circuit [3]. According to [4], only 3 Gbps transmission can be achieved even with 0.15 µm CMOS technology. Second, it needs an extremely high-speed, low-jitter global clock. Finally, the serial-shift DFFs operate at the highest rate, which causes large power consumption.



### **2.3 Single-Stage Type Serializer**

Figure 2.6 Circuit of 4-to-1 Single-Stage Type Serializer

Figure 2.3 shows the structure of a single-stage type serializer, and Figure 2.6 shows its basic circuit. The multiplexer requires clocks with the same frequency as the parallel input data. As shown in Figure 2.3, a data bit is sent out while two specific clocks with different phases overlap. For example, d0 is transmitted when Φ0 and Φ1b overlap (both are 1). The data period of d0 extends from the positive edge of Φ0 to the negative edge of Φ1b. The other data bits are transmitted by the same rule.
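The phase-overlap rule can be checked with the zero-delay Python model below. It assumes n clock phases, each with a 50% duty cycle and delayed by 1/n of the low-speed period, which is one common way to generate the phases in Figure 2.3. Time is counted in integer ticks. The function names are hypothetical.

```python
def phase_high(i, t, n, period):
    """Phase i of an n-phase clock: 50% duty cycle, delayed by i*period/n."""
    return (t - i * period // n) % period < period // 2


def single_stage_mux(data, ticks_per_slot=4):
    """Zero-delay model of an n-to-1 single-stage serializer (n >= 2).

    Bit d[i] drives the output while phase i is high and phase i+1 is low
    (e.g. d0 while PHI0 = 1 and PHI1b = 1). Returns one output value per tick.
    """
    n = len(data)
    period = n * ticks_per_slot          # one low-speed clock period
    out = []
    for t in range(period):
        active = [i for i in range(n)
                  if phase_high(i, t, n, period)
                  and not phase_high((i + 1) % n, t, n, period)]
        assert len(active) == 1, "exactly one branch must conduct"
        out.append(data[active[0]])
    return out


if __name__ == "__main__":
    print(single_stage_mux([1, 0, 0, 1]))
```

Under these assumptions exactly one branch conducts at any time. Each bit occupies one n-th of the low-speed clock period, so the output rate is n times the clock rate.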

One more point about Figure 2.6 deserves attention. Many papers use only an NMOS transistor as the data input device [5-10]. However, [11] and [12] show that adding an extra PMOS transistor at the data input is beneficial. When the data is low, the PMOS transistor turns on and drives current to precharge the internal node to a high level. This technique reduces the charge-sharing effect and alleviates data jitter.

To obtain a large output swing, the pull-up PMOS transistor must be sized weakly to reduce its driving capability. This lengthens the low-to-high transition time and unbalances the rising and falling times. To achieve higher speed, the output swing should be reduced. [10] analyzes the output swing and delay time as functions of the pull-up PMOS transistor size.

Basically, the single-stage serializer is a multiplexer controlled by the phases of a multi-phase low-speed clock. Its power consumption is small. It can also handle an arbitrary number of parallel inputs and sends out one bit of data in each phase interval. Its most significant drawback is the large self-parasitic capacitance at the output, which limits the bandwidth [1][9]. Furthermore, clock phase imbalance may also create jitter.

### **2.4 Conventional Tree-Type Serializer**

Figure 2.4 shows an 8-to-1 tree-type serializer for high-speed applications. It consists of three stages of 2-to-1 multiplexers organized as a tree. A high-speed clock, normally at half the data rate, is divided to control the successive stages. However, because the two inputs of each 2-to-1 MUX need to be out of phase, a retiming mechanism is required [13-15]. Figure 2.4 also shows the 2-to-1 MUX in detail. CK/2(0) clocks the retiming DFFs. D0 is latched by one positive-edge-triggered DFF, while D1 is latched by one positive-edge-triggered DFF and one negative-edge-triggered DFF. After retiming, D0 and D1 have a 180-degree phase shift. The two data streams then enter a 2-to-1 MUX, and CK/2(90) selects the MUX output. As the timing diagram in Figure 2.4 shows, selecting data with CK/2(90) during 1/4 to 3/4 of the data period ensures sufficient setup and hold times.
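The recursive Python sketch below shows how data flow through the tree. It does not model timing, so the retiming DFFs are absent. The sketch assumes that the output MUX interleaves the even- and odd-indexed bits, so that the output emerges in index order. The actual wiring in Figure 2.4 may order the inputs differently, and the helper names are hypothetical.

```python
def mux2(a, b):
    """2-to-1 MUX running at twice the input rate: a[0], b[0], a[1], b[1], ..."""
    out = []
    for x, y in zip(a, b):
        out.extend([x, y])
    return out


def tree_serialize(bits):
    """Serialize 2**N parallel bits through N levels of 2-to-1 MUXes.

    The output MUX interleaves the even- and odd-indexed bits; each half is
    produced by a smaller tree of the same kind (one recursion = one level).
    """
    n = len(bits)
    if n == 0 or n & (n - 1):
        raise ValueError("a tree serializer needs a power-of-two input width")
    if n == 1:
        return list(bits)
    return mux2(tree_serialize(bits[0::2]), tree_serialize(bits[1::2]))


def first_stage_pairs(n):
    """Input indices sharing each lowest-rate 2-to-1 MUX in an n-to-1 tree.

    With the even/odd split above, the two inputs of a first-stage MUX
    differ only in the most significant bit of their index.
    """
    return [(i, i + n // 2) for i in range(n // 2)]


if __name__ == "__main__":
    print(tree_serialize(list(range(8))))
    print(first_stage_pairs(8))
```

In an 8-to-1 tree, the lowest-rate stage pairs inputs 0 and 4, 1 and 5, 2 and 6, and 3 and 7. Each later stage has half as many MUXes running at twice the rate.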

The conventional tree-type serializer can operate at a high frequency because of its low output parasitic capacitance and its retiming mechanism. However, it can only handle a power-of-two number of parallel inputs, such as 2, 4, 8, or 16. It reaches higher speeds than a single-stage serializer, but its hardware overhead and power consumption are higher.

Figures 2.5 and 2.7 show conventional circuits of the 2-to-1 MUX block. Figure 2.5 shows a CML 2-to-1 MUX. A current-source NMOS transistor biased by Vb supplies the bias current, and the select signal S and its inversion SN decide whether d1 or d2 is transmitted. As CMOS technology has scaled quickly in recent years, the supply voltage has been lowered. This makes CML harder to implement, because it stacks three levels of NMOS transistors. The circuit in Figure 2.7 closely resembles a single-stage serializer and has a lower parasitic capacitance at the output node. Figure 2.7(a) is a 2-to-1 single-stage circuit. Figure 2.7(b) adds a PMOS transistor at the data input to reduce the charge-sharing effect, as described before.



Figure 2.7 Circuit of 2-to-1 MUX in tree-type serializer

Table 2.1 compares the three types of MUX. The advantage of the single-stage structure is that it can work with a ring-oscillator-type phase-locked loop (PLL), because the required clock rate is only 1/N of the transmission data rate.

The tree-type serializer consists of multiple stages, which reduces both the number of inputs in each stage and the parasitic capacitance at the output node. For this reason, the tree-type serializer has the highest bandwidth of the three structures. Its shortcoming is that it requires a higher clock rate.

|                         | Tree                    | Single-stage           | Shift-register          |
|-------------------------|-------------------------|------------------------|-------------------------|
| Multiplex number        | $2^{\rm N}$             | N                      | N                       |
| Power                   | Low                     | Medium                 | High                    |
| Bandwidth               | High                    | Low                    | Medium                  |
| External clock property | High freq, single phase | Low freq, multi-phase  | High freq, single phase |

Table 2.1 Comparison of three kinds of MUX
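The hypothetical Python helper below tabulates the clocking requirements in Table 2.1 for any target rate. It uses the clock relations stated in this chapter: a full-rate CK1 for the shift-register type, a clock at half the data rate for the tree type, and an n-phase clock at 1/n of the data rate for the single-stage type. It assumes single-edge shifting for the shift-register type.

```python
def clock_requirements(data_rate_bps, n):
    """Clock needed by each serializer type for an n-to-1 conversion.

    Returns {type: (clock frequency in Hz, number of clock phases)}.
    Shift-register: CK1 shifts one bit per cycle (assumed single-edge), so it
    runs at the full data rate. Tree: a single-phase clock at half the data
    rate. Single-stage: an n-phase clock at 1/n of the data rate.
    """
    return {
        "shift-register": (data_rate_bps, 1),
        "tree": (data_rate_bps / 2, 1),
        "single-stage": (data_rate_bps / n, n),
    }


if __name__ == "__main__":
    for kind, (freq, phases) in clock_requirements(10e9, 4).items():
        print(f"{kind:15s} {freq / 1e9:5.2f} GHz, {phases} phase(s)")
```

For a 10 Gbps, 4-to-1 conversion, this gives a 10 GHz clock for the shift-register type, a 5 GHz clock for the tree type, and a four-phase 2.5 GHz clock for the single-stage type.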

## **Chapter 3**

# **The Novel Tree-Type Serializer**



### **3.1 Functional Blocks**

In this chapter, we introduce a new serializer structure that consumes less power and area. First, we explain the 2-to-1 MUX and its control clock. Second, we show the configuration of the 4-to-1 and 8-to-1 MUXes. Finally, we describe the design issues. Figure 3.1 shows the conventional and the proposed novel tree-type 4-to-1 serializer (multiplexer) cells. The three retiming D flip-flops (DFFs) shown in Figure 2.4 are removed. Instead, quadrature clocks control the switches in the previous stage. The first stage is controlled by the original clock and outputs data at twice the clock rate. The second stage is controlled by two divide-by-two clocks with a 90° phase difference. With quadrature clocks, retiming is no longer needed. Moreover, each data bit is ready half a clock period before it is switched in. Therefore, there is no data-dependent jitter, and the output control clock determines the overall jitter.

Figure 3.1 also shows the timing diagram without propagation delay. In this ideal case, there is no timing variation at the output. Figures 3.2 and 3.3 show the 4-to-1 and 8-to-1 MUXes with quadrature clocks. The circuit structure is simple and regular. Without propagation delay, each stage of the serializer has a setup time of half the input clock period and zero hold time.
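The zero-delay Python model below illustrates the quadrature-clock scheme of the 4-to-1 cell. Time is counted in ticks, with one output bit (1 UI) equal to `ticks_per_ui` ticks. CK has a 2 UI period. The divide-by-two I and Q clocks have a 4 UI period, with Q lagging I by 1 UI. The sketch assumes which inputs go to which input-side MUX, and that the Q-side inputs are launched 1 UI later. Neither detail is read from Figure 3.1. The function name is hypothetical.

```python
def novel_tree_4to1(words, ticks_per_ui=4):
    """Zero-delay model of the 4-to-1 novel tree-type serializer cell.

    words: list of 4-bit tuples (d0, d1, d2, d3); ticks_per_ui >= 1.
    CK (period 2 UI) drives the output 2-to-1 MUX.
    CK/2 I and Q (period 4 UI, Q lagging I by 1 UI = 90 degrees) drive the
    two input-side MUXes. Assumed (hypothetical) wiring: MUX A on I selects
    d0/d2, MUX B on Q selects d1/d3, and MUX B's inputs are launched 1 UI
    after MUX A's.
    """
    ui = ticks_per_ui
    period = 4 * ui                       # CK/2 period = one word time

    def ck2_high(t, delay):
        # divide-by-two clock: high during the first 2 UI of each period
        return (t - delay) % period < 2 * ui

    out = []
    for k in range(1, 4 * len(words) + 1):   # output UI index
        t = k * ui + ui // 2                  # sample in the middle of the UI
        if k % 2 == 1:                        # CK high: output MUX passes A
            w = words[t // period]
            out.append(w[0] if ck2_high(t, 0) else w[2])
        else:                                 # CK low: output MUX passes B
            w = words[(t - ui) // period]
            out.append(w[1] if ck2_high(t, ui) else w[3])
    return out


if __name__ == "__main__":
    print(novel_tree_4to1([(1, 0, 0, 1), (0, 1, 1, 0)]))
```

In this model, MUX A changes at multiples of 2 UI but is selected only during the second of each pair of UIs. Each bit has therefore been stable for 1 UI (half a CK period) when it is switched in, and its next change coincides with the moment it is switched out. This agrees with the half-period setup time and zero hold time stated above.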

Propagation delay is a design issue in chip implementation. Figure 3.4 shows the timing with propagation delay included. T1 is the delay of the clock divider, T2 is the delay of the MUX, and T3 is one bit time. The setup timing margin is T3 - T1 - T2, and the hold timing margin is T1 + T2. In general, these margins are more than enough for the MUX to operate reliably.
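The Python sketch below evaluates both margins. The delay values in the usage lines are hypothetical placeholders, not results from this design.

```python
def mux_timing_margins(t1, t2, t3):
    """Setup and hold margins of the proposed MUX stage (Figure 3.4).

    t1: clock-divider delay (T1)
    t2: MUX delay (T2)
    t3: one bit time (T3)
    All values must use the same time unit, e.g. picoseconds.
    """
    setup_margin = t3 - t1 - t2
    hold_margin = t1 + t2
    return setup_margin, hold_margin


if __name__ == "__main__":
    # Hypothetical delays at 10 Gbps (T3 = 100 ps): T1 = 30 ps, T2 = 25 ps.
    setup, hold = mux_timing_margins(30, 25, 100)
    print(f"setup margin = {setup} ps, hold margin = {hold} ps")
    # A negative setup margin would mean T1 + T2 exceeds one bit time.
```

With these placeholder values, the setup margin is 45 ps and the hold margin is 55 ps. The setup margin shrinks as the divider and MUX delays grow, while the hold margin grows by the same amount.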



Figure 3.1 The original and proposed tree-type multiplexers.

Figure 3.5 shows the architecture of the novel tree-type $2^N$-to-1 serializer. The circuit structure is simple and regular. The novel tree-type serializer embeds data retiming in the previous MUX stage. As a result, its hardware overhead and power consumption are expected to be lower.
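The regular structure makes the resources easy to tabulate. For a $2^N$-to-1 novel tree, the Python sketch below lists the number of 2-to-1 MUXes and the control-clock frequency of each stage. It assumes that the output stage is clocked at half the data rate. It also assumes that every other stage uses an I/Q clock pair at half the frequency of the stage after it. The function name is hypothetical.

```python
def novel_tree_stage_plan(n_stages, data_rate_bps):
    """Per-stage resources of a 2**N-to-1 novel tree-type serializer.

    Stage 1 is the output stage: one 2-to-1 MUX on CK at half the data rate.
    Each stage further from the output has twice as many MUXes and a clock
    divided by two; all stages except the output one are assumed to use an
    I/Q (90-degree) clock pair instead of retiming DFFs.
    """
    plan = []
    for stage in range(1, n_stages + 1):
        plan.append({
            "stage": stage,
            "muxes": 2 ** (stage - 1),
            "clock_hz": data_rate_bps / 2 ** stage,
            "clocks": "CK" if stage == 1 else "I/Q",
        })
    return plan


if __name__ == "__main__":
    plan = novel_tree_stage_plan(3, 2.5e9)
    for row in plan:
        print(row)
    print("total 2-to-1 MUXes:", sum(row["muxes"] for row in plan))
```

For the 2.5 Gbps 8-to-1 serializer considered in Section 3.2.1, this gives seven 2-to-1 MUXes clocked at 1.25 GHz, 625 MHz, and 312.5 MHz.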



Figure 3.3 8-to-1 Novel Tree-Type Serializer



Figure 3.4 Timing Diagram of The Proposed MUX.



Figure 3.5 Architecture of the Novel Tree-Type $2^N$-to-1 Serializer.

## **3.2 Comparison of Three Structures**

In this section, we compare our novel tree-type serializer with the single-stage and conventional tree-type serializers. In Section 3.2.1, we analyze the required number of PMOS transistors in the single-stage and novel tree-type serializers. This analysis shows their speed limitations. It also shows how their transistor sizes differ when both achieve the same transition time under the same boundary conditions. In Section 3.2.2, we compare the three architectures by HSPICE simulation, applying the same boundary conditions to all three to ensure a fair comparison. In Section 3.2.3, we present the comparison results as figures and tables.

### **3.2.1 Analysis of Novel Tree-Type and Single-Stage Serializer**

Given the chip design described in Chapter 4, we need four 2.5 Gbps 8-to-1 serializers and one 10 Gbps 4-to-1 serializer. Therefore, the analysis and simulation focus on the 2.5 Gbps 8-to-1 serializer. Figure 3.6 shows the 8-to-1 single-stage serializer with a dummy PMOS transistor, which alleviates the charge-sharing effect.



We consider a basic inverter in TSMC 0.13 µm technology, shown in Figure 3.7. The design rule for the smallest width is 0.3 µm.



Figure 3.7 Basic Inverter.

The average drain and gate capacitances ($C_d$ and $C_g$) of the PMOS and NMOS transistors of the inverter are as follows.



Thus,





Figure 3.8 Half Circuit of Single-Stage and Novel Tree-Type Serializer.

Figure 3.8 shows the half circuits of the two architectures. An 8-to-1 novel tree-type serializer consists of three stages of 2-to-1 MUXes, so we only need to consider the last 2-to-1 MUX, which dominates the output capacitance and bandwidth. We assume the following boundary conditions. (1) For $C_{\text{out}}$, we consider the PMOS drain capacitance and the drain capacitance of the upper-level NMOS transistors. (2) The dummy transistor is not considered. (3) The swing in each architecture is from 0.25 V to 1.2 V.



Figure 3.9 Equivalent R,C Circuit of Serializer.

We calculate the delay time from the output resistance and capacitance of the serializer, using the equivalent RC circuit shown in Figure 3.9 to simplify the calculation. The calculation is as follows:

$$
V_o = V_{DD}\,\frac{\dfrac{R_{O_N}}{1 + sR_{O_N}C_O}}{R_{O_P} + \dfrac{R_{O_N}}{1 + sR_{O_N}C_O}} = V_{DD}\,\frac{R_{O_N}}{R_{O_P} + sR_{O_N}R_{O_P}C_O + R_{O_N}}
$$

$$
= V_{DD}\,\frac{R_{O_N}}{sR_{O_N}R_{O_P}C_O + (R_{O_P} + R_{O_N})} = V_{DD}\,\frac{1}{R_{O_P}C_O}\,\frac{1}{s + \dfrac{R_{O_N} + R_{O_P}}{R_{O_N}R_{O_P}C_O}}
$$

$$
\Rightarrow V_o(t) = V_{DD}\,\frac{1}{R_{O_P}C_O}\,e^{-\frac{t\,(R_{O_N} + R_{O_P})}{R_{O_N}R_{O_P}C_O}} \;\Rightarrow\; Delay\_time = \frac{R_{O_N}R_{O_P}C_O}{R_{O_N} + R_{O_P}} \tag{2}
$$
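Equation (2) can be checked numerically. The Python sketch below integrates the node equation of the RC model in Figure 3.9 with forward Euler, starting from $V_{DD}$. It then compares the voltage after one predicted time constant with the exponential solution. The resistor and capacitor values are hypothetical placeholders, and the function names are also hypothetical.

```python
import math


def rc_time_constant(r_on, r_op, c_out):
    """Eq. (2): time constant (R_ON || R_OP) * C_O of the output node."""
    return r_on * r_op * c_out / (r_on + r_op)


def simulate_output(r_on, r_op, c_out, vdd, t_end, steps=100_000):
    """Forward-Euler integration of C_O dV/dt = (V_DD - V)/R_OP - V/R_ON.

    The node starts at V_DD (the NMOS path has just turned on) and relaxes
    toward V_DD * R_ON / (R_ON + R_OP); the voltage at t_end is returned.
    """
    dt = t_end / steps
    v = vdd
    for _ in range(steps):
        v += dt * ((vdd - v) / r_op - v / r_on) / c_out
    return v


if __name__ == "__main__":
    r_on, r_op, c_out, vdd = 2e3, 3e3, 40e-15, 1.2   # hypothetical values
    tau = rc_time_constant(r_on, r_op, c_out)
    v_final = vdd * r_on / (r_on + r_op)
    v_expected = v_final + (vdd - v_final) * math.exp(-1)
    print(tau, simulate_output(r_on, r_op, c_out, vdd, tau), v_expected)
```

With these values the time constant is 48 ps. After one time constant the simulated node voltage has covered 1 - 1/e of the way from $V_{DD}$ to its final value, which is the behavior that (2) predicts.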

Now we calculate the output resistance and capacitance of each architecture and substitute the results into (2).

Assume:

- $m_{n8}$ is the number of parallel-connected NMOS transistors in the 8-to-1 single-stage MUX.
- $m_{p8}$ is the number of parallel-connected PMOS transistors in the 8-to-1 single-stage MUX.
- $m_{n2}$ is the number of parallel-connected NMOS transistors in the 8-to-1 novel tree-type MUX.
- $m_{p2}$ is the number of parallel-connected PMOS transistors in the 8-to-1 novel tree-type MUX.
- $C_{dP}$ and $C_{dN}$ are the equivalent drain capacitances of the PMOS and NMOS transistors in a basic inverter.
- $C_{gP}$ and $C_{gN}$ are the equivalent gate capacitances of the PMOS and NMOS transistors in a basic inverter.
- $R_N$ is the equivalent resistance of the NMOS transistor in a basic inverter.
- $C_L = 40$ fF is the output load capacitance. By (1), $C_L = 16(C_{gN} + C_{gP})$, i.e., a fanout of 16 basic inverters.

For a single-stage MUX:

$$
m_{n8} = k_8\, m_{p8} \tag{3}
$$

$$
C_{out} = 8k_8 C_{dN} m_{p8} + C_{dP} m_{p8} + 16(C_{gN} + C_{gP}) \tag{4}
$$

$$
R_{O_N} = 3R_N \frac{1}{m_{n8}} = \frac{3}{k_8 m_{p8}} R_N, \quad R_{O_P} = R_N \frac{1}{m_{p8}} \tag{5}
$$

$$
Delay\_time = \frac{24k_8 R_N C_{dN}}{3 + k_8} + \frac{3R_N C_{dP}}{3 + k_8} + \frac{48 R_N (C_{gN} + C_{gP})}{(3 + k_8)\, m_{p8}} \tag{6}
$$

For a novel tree-type MUX:

$$
m_{n2} = k_2\, m_{p2} \tag{7}
$$

$$
C_{out} = 2C_{dN} k_2 m_{p2} + C_{dP} m_{p2} + 16(C_{gN} + C_{gP}) \tag{8}
$$

$$
R_{O_N} = 2R_N \frac{1}{m_{n2}} = \frac{2}{k_2 m_{p2}} R_N, \quad R_{O_P} = R_N \frac{1}{m_{p2}} \tag{9}
$$

$$
Delay\_time = \frac{4k_2}{2 + k_2} R_N C_{dN} + \frac{2}{2 + k_2} R_N C_{dP} + \frac{32 R_N (C_{gN} + C_{gP})}{(2 + k_2)\, m_{p2}} \tag{10}
$$
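The closed forms (6) and (10) can be cross-checked by substituting (4)-(5) and (8)-(9) into the parallel-RC time constant of (2). The Python sketch below works in normalized units ($R_N = 1$, $C_{gN} = 1$). It uses the capacitance ratios $C_{dN} = 1.5C_{gN}$, $C_{gP} = 4C_{gN}$, and $C_{dP} = 5.3C_{gN}$ derived from (1). The function names are hypothetical.

```python
# Capacitances normalized to C_gN = 1 (ratios derived from eq. (1)); R_N = 1.
C_GN, C_GP, C_DN, C_DP = 1.0, 4.0, 1.5, 5.3
R_N = 1.0


def parallel(r1, r2):
    """Resistance of r1 and r2 in parallel, as used in eq. (2)."""
    return r1 * r2 / (r1 + r2)


def single_stage_from_rc(k8, mp8):
    """Delay of the 8-to-1 single-stage MUX built from eqs. (4), (5), (2)."""
    c_out = 8 * k8 * C_DN * mp8 + C_DP * mp8 + 16 * (C_GN + C_GP)   # (4)
    r_on = 3 * R_N / (k8 * mp8)                                        # (5)
    r_op = R_N / mp8
    return parallel(r_on, r_op) * c_out                                # (2)


def single_stage_closed(k8, mp8):
    """Closed form, eq. (6)."""
    return (24 * k8 * R_N * C_DN + 3 * R_N * C_DP) / (3 + k8) \
        + 48 * R_N * (C_GN + C_GP) / ((3 + k8) * mp8)


def tree_from_rc(k2, mp2):
    """Delay of the last 2-to-1 stage of the novel tree, from (8), (9), (2)."""
    c_out = 2 * C_DN * k2 * mp2 + C_DP * mp2 + 16 * (C_GN + C_GP)     # (8)
    r_on = 2 * R_N / (k2 * mp2)                                        # (9)
    r_op = R_N / mp2
    return parallel(r_on, r_op) * c_out                                # (2)


def tree_closed(k2, mp2):
    """Closed form, eq. (10)."""
    return (4 * k2 * R_N * C_DN + 2 * R_N * C_DP) / (2 + k2) \
        + 32 * R_N * (C_GN + C_GP) / ((2 + k2) * mp2)


if __name__ == "__main__":
    print(single_stage_from_rc(6, 1), single_stage_closed(6, 1))
    print(tree_from_rc(4.5, 1), tree_closed(4.5, 1))
```

Only the last term of each closed form depends on the transistor multiplier. As the transistors are enlarged, both delays approach a floor set by the MUX's own drain capacitance.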

Setting (6) equal to (10) gives

$$
\frac{24k_8 R_N C_{dN}}{3+k_8} + \frac{3R_N C_{dP}}{3+k_8} + \frac{48R_N (C_{gN} + C_{gP})}{(3+k_8)\,m_{p8}} = \frac{4k_2}{2+k_2} R_N C_{dN} + \frac{2}{2+k_2} R_N C_{dP} + \frac{32R_N (C_{gN} + C_{gP})}{(2+k_2)\,m_{p2}} \tag{11}
$$

Multiplying both sides of (11) by $(3 + k_8)\,m_{p8}\,(2 + k_2)\,m_{p2}$ gives

$$
24k_8(2+k_2)m_{p8}m_{p2}R_NC_{dN} + 3(2+k_2)m_{p8}m_{p2}R_NC_{dP} + 48(2+k_2)m_{p2}R_N(C_{gN}+C_{gP}) = 4k_2(3+k_8)m_{p8}m_{p2}R_NC_{dN} + 2(3+k_8)m_{p8}m_{p2}R_NC_{dP} + 32(3+k_8)m_{p8}R_N(C_{gN}+C_{gP}) \tag{12}
$$

From (1), $C_{dN} = 1.5C_{gN}$, $C_{gP} = 4C_{gN}$, and $C_{dP} = 5.3C_{gN}$. Substituting these into (12) and dividing both sides by $R_N C_{gN}$ gives

$$
36k_8(2 + k_2)m_{p8}m_{p2} + 15.9(2 + k_2)m_{p8}m_{p2} + 240(2 + k_2)m_{p2} = 6k_2(3 + k_8)m_{p8}m_{p2} + 10.6(3 + k_8)m_{p8}m_{p2} + 160(3 + k_8)m_{p8} \tag{13}
$$

Now, for the same output swing, $k_8 = 6$ and $k_2 = 4.5$, so (13) becomes

$$
1404m_{p8}m_{p2} + 103.35m_{p8}m_{p2} + 1560m_{p2} = 243m_{p8}m_{p2} + 95.4m_{p8}m_{p2} + 1440m_{p8} \tag{14}
$$

$$
\Rightarrow 1168.95\, m_{p8} m_{p2} + 1560\, m_{p2} = 1440\, m_{p8} \tag{15}
$$

Solving (15) for either variable gives

$$
m_{p8} : m_{p2} = m_{p8} : \frac{1440m_{p8}}{1168.95m_{p8} + 1560} \quad\text{and}\quad m_{p8} : m_{p2} = \frac{-1560m_{p2}}{1168.95m_{p2} - 1440} : m_{p2} \tag{16}
$$

$$
m_{p8} : m_{p2} = m_{p8} : \frac{1440m_{p8}}{1168.95m_{p8} + 1560} \tag{17}
$$

$$
m_{p8} : m_{p2} = \frac{-1560m_{p2}}{1168.95m_{p2} - 1440} : m_{p2} \tag{18}
$$

In (17), as $m_{p8}$ approaches infinity, $m_{p2}$ approaches $1440/1168.95 \approx 1.232$. The transition time of the single-stage MUX converges to a finite value no matter how much $m_{p8}$ increases. In (6), only the load term decreases with $m_{p8}$, while the terms due to the MUX's own drain capacitance do not. Therefore, $m_{p2}$ also converges to a finite value. In (18), $m_{p8} < 0$ when $m_{p2} > 1.232$. So if $m_{p2}$ is larger than 1.232, no solution for $m_{p8}$ exists. Once the novel tree-type MUX is faster than this limit, no increase of $m_{p8}$ can make the single-stage MUX equally fast.
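The conclusions drawn from (17) and (18) can be reproduced numerically. The Python sketch below evaluates (6) and (10) in normalized units with $k_8 = 6$ and $k_2 = 4.5$, solves (15) in both directions, and checks the 1.232 limit. The function names are hypothetical.

```python
C_GN, C_GP, C_DN, C_DP = 1.0, 4.0, 1.5, 5.3   # normalized to C_gN, from (1)
R_N = 1.0
K8, K2 = 6.0, 4.5                               # same-swing ratios from the text


def delay_single(mp8):
    """Eq. (6) with k_8 = 6 (normalized units)."""
    return (24 * K8 * R_N * C_DN + 3 * R_N * C_DP) / (3 + K8) \
        + 48 * R_N * (C_GN + C_GP) / ((3 + K8) * mp8)


def delay_tree(mp2):
    """Eq. (10) with k_2 = 4.5 (normalized units)."""
    return (4 * K2 * R_N * C_DN + 2 * R_N * C_DP) / (2 + K2) \
        + 32 * R_N * (C_GN + C_GP) / ((2 + K2) * mp2)


def mp2_for(mp8):
    """Eq. (17): tree multiplier giving the same delay as a given m_p8."""
    return 1440 * mp8 / (1168.95 * mp8 + 1560)


def mp8_for(mp2):
    """Eq. (18): single-stage multiplier for a given m_p2 (negative = none)."""
    return -1560 * mp2 / (1168.95 * mp2 - 1440)


MP2_LIMIT = 1440 / 1168.95                      # about 1.232

if __name__ == "__main__":
    for mp8 in (1, 5, 21, 1e6):
        mp2 = mp2_for(mp8)
        print(f"m_p8 = {mp8:g}: m_p2 = {mp2:.4f}, "
              f"delays {delay_single(mp8):.4f} / {delay_tree(mp2):.4f}")
    print("limit:", round(MP2_LIMIT, 3), "m_p8 for m_p2 = 1.3:", mp8_for(1.3))
```

The two delays agree for every $m_{p8}$. Asking (18) for $m_{p2} = 1.3$ returns a negative $m_{p8}$, which reflects the "no solution" case discussed above.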

### **3.2.2 Comparison of the Three Architectures by HSPICE Simulation**

To verify the low-power and low-area advantages over the single-stage and conventional tree-type serializers, we design all three architectures. We compare them in three respects: (1) power consumption, (2) area overhead, and (3) power-area product. For the HSPICE simulations, the boundary conditions are:

- (1) The same $C_{\text{L}}$ of 40 fF.
- (2) The same rise time.
- (3) The same data skew DFF, shown in Figure 3.10.
- (4) The same clock generator, shown in Figure 3.11.
- (5) The same input level (1.2 V to 0.25 V).



Figure 3.10 Static DFF.

For every architecture, we simulate rise times of 300, 275, 250, 225, 200, 175, 150, 125, and 100 ps. For the single-stage MUX only, we also simulate rise times of 170, 165, 160, and 155 ps. These extra points make the simulation results more complete.



Figure 3.11 DFF of Clock Generator.

#### **3.2.2.1 Single-Stage Serializer**

We design and simulate the architecture of Figure 3.6 following the design steps in Figure 3.12.



Figure 3.12 Design steps for single-stage serializer.

The steps in Figure 3.12 let us optimize the simulation and ensure that each block consumes an appropriate amount of power. The transistor sizes chosen in these steps must conform to the TSMC design rules. In Step 1, we find the PMOS and NMOS transistor sizes that match each rise-time specification. We use the HSPICE command ".alter" to increase the transistor sizes in small steps. The results are shown in Table 3.1, and the sizes that match the rise-time specifications are boldfaced.

All entries use $\left(\frac{W}{L}\right)_p = \frac{1.3}{0.13}$ and $\left(\frac{W}{L}\right)_n = \frac{0.3}{0.13}$.

| mp    | mn   | Tr (ps) | mp | mn  | Tr (ps) |
|-------|------|---------|----|-----|---------|
| 1.0   | 6    | 302     | 12 | 72  | 157     |
| 1.1   | 6.6  | 292     | 13 | 78  | 157     |
| 1.2   | 7.2  | 276     | 14 | 84  | 157     |
| 1.3   | 7.8  | 265     | 15 | 90  | 155     |
| 1.4   | 8.4  | 257     | 16 | 96  | 154     |
| 1.5   | 9    | 250     | 17 | 102 | 154     |
| 1.6   | 9.6  | 245     | 18 | 108 | 154     |
| 1.7   | 10.2 | 240     | 19 | 114 | 153     |
| 1.8   | 10.8 | 234     | 20 | 120 | 152     |
| 1.9   | 11.4 | 228     | 21 | 126 | 150     |
| 2.0   | 12   | 223     | 22 | 132 | 149     |
| **3** | 18   | 198     | 30 | 180 | 148     |
| 4     | 24   | 181     | 40 | 240 | 147     |
| 5     | 30   | 175     | 50 | 300 | 146     |

Table 3.1 Rise Time versus size of the MOS Transistors in Single-Stage Serializer.



We describe the 300 ps and 150 ps cases in detail.



#### **300 ps case:**

Figure 3.13 Eye Diagram of Rise Time of 300ps for Single-Stage Serializer.

$$
(\frac{W}{L})_P = \frac{1.3 \mu m}{0.13 \mu m}, \ (\frac{W}{L})_N = \frac{0.3 \mu m}{0.13 \mu m}, \ m_P = 1, m_N = 6
$$

$\text{Power} = 1.3163\ \text{mW}$

$$
\text{Area} = (W_p \times NO. \times m_p + W_N \times NO. \times m_N) \times L = (1.3 \times 2 \times 1 + 0.3 \times 64 \times 6) \times 0.13 = 117.8 \times 0.13\ \mu m^2
$$

Here, area refers to the total gate area.

#### **150ps case**:



Figure 3.14 Eye Diagram of Rise Time of 150ps for Single-Stage Serializer.

$$
\left(\frac{W}{L}\right)_P = \frac{1.3\mu m}{0.13\mu m}, \quad \left(\frac{W}{L}\right)_N = \frac{0.3\mu m}{0.13\mu m}, \quad m_P = 21, \quad m_N = 126
$$

$\text{Power} = 17.614\ \text{mW}$

$$
\text{Area} = (W_p \times NO. \times m_p + W_N \times NO. \times m_N) \times L = (1.3 \times 2 \times 21 + 0.3 \times 64 \times 126) \times 0.13 = 2473.8 \times 0.13\ \mu m^2
$$
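The gate-area bookkeeping above is simple enough to script. The sketch below re-evaluates $(W_p \times NO. \times m_p + W_N \times NO. \times m_N) \times L$ for the two single-stage cases. The helper name is hypothetical, and the device counts (2 PMOS, 64 NMOS) are read directly from the area expressions rather than from the schematic.

```python
L_UM = 0.13  # drawn channel length (um)

def gate_area_um2(wp_um, n_p, m_p, wn_um, n_n, m_n, l_um=L_UM):
    """Total gate area (Wp*NO.*mp + Wn*NO.*mn) * L in um^2 (hypothetical helper)."""
    return (wp_um * n_p * m_p + wn_um * n_n * m_n) * l_um

# Single-stage 8-to-1 MUX; device counts taken from the area expressions above
area_300ps = gate_area_um2(1.3, 2, 1, 0.3, 64, 6)      # 117.8 x 0.13 um^2
area_150ps = gate_area_um2(1.3, 2, 21, 0.3, 64, 126)   # 2473.8 x 0.13 um^2

print(f"300 ps: {area_300ps:.2f} um^2, 150 ps: {area_150ps:.2f} um^2")
print(f"area growth: {area_150ps / area_300ps:.1f}x")
```

Because every device is scaled by the same factor of 21, going from 300 ps to 150 ps multiplies the gate area by exactly 21 in this bookkeeping.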

### **3.2.2.2 Novel Tree-Type Serializer**

We design and simulate this architecture following the design steps in Figure 3.15.



Figure 3.15 Design steps for Novel Tree-Type Serializer.

In Step 1, we find the size of the first-stage 2-to-1 serializer that meets the rise-time specification by carefully increasing the MOS transistor sizes with the .alter command in HSPICE. The results are shown in Table 3.2; sizes that meet the rise-time specification are in boldface.

Table 3.2 Rise Time versus Size of the MOS Transistors in Novel Tree-Type Serializer.

All entries use $(W/L)_p = 1.3/0.13$ and $(W/L)_n = 0.3/0.13$.

| $m_p$ | $m_n$ | $T_r$ (ps) | $m_p$ | $m_n$ | $T_r$ (ps) |
|-------|-------|------------|-------|-------|------------|
| 0.6   | 2.7   | 320        | 2.0   | 9     | 130        |
| 0.65  | 2.925 | 297        | 3.0   | 13.5  | 106        |
| 0.7   | 3.15  | 280        | 3.7   | 16.65 | 100        |
| 0.75  | 3.375 | 261        | 4     | 18    | 95.9       |
| 0.8   | 3.6   | 248        | 5     | 22.5  | 90.9       |
| 0.85  | 3.825 | 235        | 6     | 27    | 89.1       |


As examples, we describe the 300 ps and 100 ps cases in detail.





Figure 3.16 Eye Diagram of Rise Time of 300ps for Novel Tree-Type Serializer.

$$
(\frac{W}{L})_P = \frac{1.3 \mu m}{0.13 \mu m}, (\frac{W}{L})_N = \frac{0.3 \mu m}{0.13 \mu m}, m_P = 0.65, m_N = 2.925
$$

$\text{Power} = 1.3136\ \text{mW}$

$$
\text{Area} = \sum_{mux=1}^{7} (W_p \times NO. \times m_p + W_N \times NO. \times m_N)_{mux} \times L
$$

$$
= (1.3 \times 2 \times 0.65 + 0.3 \times 12 \times 2.925) \times 0.13 + (0.15 \times 2 \times 1 + 0.156 \times 12 \times 1) \times 0.13 \times 2 + (0.15 \times 2 \times 1 + 0.156 \times 12 \times 1) \times 0.13 \times 4
$$

$$
= 25.252 \times 0.13\ \mu m^2
$$

**100 ps case** :



Figure 3.17 Eye Diagram of Rise Time of 100ps for Novel Tree-Type Serializer.

$$
\left(\frac{W}{L}\right)_P = \frac{1.3\mu m}{0.13\mu m}, \quad \left(\frac{W}{L}\right)_N = \frac{0.3\mu m}{0.13\mu m}, \quad m_P = 3.7, \quad m_N = 16.65
$$

$\text{Power} = 2.7900\ \text{mW}$

$$
\text{Area} = \sum_{mux=1}^{7} (W_p \times NO. \times m_p + W_N \times NO. \times m_N)_{mux} \times L
$$

$$
= (1.3 \times 2 \times 3.7 + 0.3 \times 12 \times 16.65) \times 0.13 + (0.15 \times 2 \times 1 + 0.156 \times 12 \times 1) \times 0.13 \times 2 + (0.15 \times 2 \times 1 + 0.156 \times 12 \times 1) \times 0.13 \times 4
$$

$$
= 82.592 \times 0.13\ \mu m^2
$$
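The same bookkeeping extends to the seven 2-to-1 MUXes of the novel tree. In the sketch below we assume, based on the factors 2 and 4 in the expression, that only the output (first-stage) MUX is swept as in Table 3.2, while the two second-stage and four third-stage MUXes keep fixed minimum sizes ($W_p = 0.15\,\mu m$, $W_N = 0.156\,\mu m$, $m = 1$). The helper names are hypothetical.

```python
L_UM = 0.13  # channel length (um)

def mux_width_sum(wp, n_p, m_p, wn, n_n, m_n):
    """(Wp*NO.*mp + Wn*NO.*mn) for one 2-to-1 MUX, in um (hypothetical helper)."""
    return wp * n_p * m_p + wn * n_n * m_n

def novel_tree_area_um2(m_p, m_n):
    """Gate area of the 8-to-1 novel tree: 1 swept MUX + 2 + 4 fixed-size MUXes (assumed)."""
    first = mux_width_sum(1.3, 2, m_p, 0.3, 12, m_n)   # output stage, sized per Table 3.2
    fixed = mux_width_sum(0.15, 2, 1, 0.156, 12, 1)    # each second-/third-stage MUX
    return (first + 2 * fixed + 4 * fixed) * L_UM

area_300ps = novel_tree_area_um2(0.65, 2.925)  # 25.252 x 0.13 um^2
area_100ps = novel_tree_area_um2(3.7, 16.65)   # 82.592 x 0.13 um^2
print(f"300 ps: {area_300ps:.2f} um^2, 100 ps: {area_100ps:.2f} um^2")
```

Under this reading the six fixed-size MUXes contribute only about $6 \times 2.172 \times 0.13 \approx 1.69\,\mu m^2$, so nearly all of the area growth comes from the output stage.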

#### **3.2.2.3 Conventional Tree-Type Serializer**

The 8-to-1 conventional tree-type architecture is shown in Figure 3.18. We design and simulate it following the same design steps (Figure 3.15) as for the novel tree-type MUX.



Figure 3.18 8-to-1 conventional tree-type architecture.

As before, Table 3.3 lists the rise time versus the size of the MOS transistors; sizes that meet the rise-time specification are in boldface.

Table 3.3 Rise Time versus Size of the MOS Transistors in Conventional Tree-Type Serializer.

All entries use $(W/L)_p = 1.3/0.13$ and $(W/L)_n = 0.3/0.13$.

| $m_n$ | $T_r$ (ps) | $m_p$ | $m_n$ | $T_r$ (ps) |
|-------|------------|-------|-------|------------|
| 2.5   | 308        | 21    | 87.5  | 62.7       |
| 2.875 | 269        | 24    | 100   | 62.5       |
| 3.375 | 240        | 27    | 112.5 | 62.3       |



As examples, we describe the 300 ps and 100 ps cases in detail.

#### **300 ps case** :



Figure 3.19 Eye Diagram of Rise Time of 300ps for Conventional Tree-Type Serializer.

$$
\left(\frac{W}{L}\right)_P = \frac{1.3\mu m}{0.13\mu m}, \quad \left(\frac{W}{L}\right)_N = \frac{0.3\mu m}{0.13\mu m}, \quad m_P = 0.6, \quad m_N = 2.5
$$

$\text{Power} = 3.2329\ \text{mW}$

$$
\text{Area} = \sum_{mux=1}^{7} (W_p \times NO. \times m_p + W_N \times NO. \times m_N)_{mux} \times L + \sum_{DFF} 88(W_p + W_N) \times L
$$

$$
= (1.3 \times 2 \times 0.6 + 0.3 \times 12 \times 2.5) \times 0.13 + (1.3 \times 2 \times 0.6 + 0.3 \times 12 \times 2.5) \times 0.13 \times 0.2 \times 6 + (88 \times 1.3 + 88 \times 0.3) \times 0.13 \times 0.25 \times 9
$$

$$
= 340.032 \times 0.13\ \mu m^2
$$

**100 ps case :**



Figure 3.20 Eye Diagram of Rise Time of 100ps for Conventional Tree-Type Serializer.

$$
(\frac{W}{L})_P = \frac{1.3 \mu m}{0.13 \mu m}, (\frac{W}{L})_N = \frac{0.3 \mu m}{0.13 \mu m}, m_P = 2.7, m_N = 11.25
$$

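$\text{Power} = 3.9895\ \text{mW}$

$$
\text{Area} = \sum_{mux=1}^{7} (W_p \times NO. \times m_p + W_N \times NO. \times m_N)_{mux} \times L + \sum_{DFF} 88(W_p + W_N) \times L
$$

$$
= (1.3 \times 2 \times 2.7 + 0.3 \times 12 \times 11.25) \times 0.13 + (1.3 \times 2 \times 2.7 + 0.3 \times 12 \times 11.25) \times 0.13 \times 0.045 \times 6 + (88 \times 1.3 + 88 \times 0.3) \times 0.13 \times 0.25 \times 9
$$

$$
= 377.150 \times 0.13\ \mu m^2
$$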
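For the conventional tree, the extra DFF term dominates the area. The sketch below re-evaluates the two expressions above. How the numeric factors are interpreted is our assumption: we read 0.2 (300 ps) and 0.045 (100 ps) as size factors of the six remaining MUXes relative to the output MUX, and $0.25 \times 9$ as nine 88-transistor retiming DFFs at a quarter of the reference device size.

```python
L_UM = 0.13  # channel length (um)

def mux_width_sum(m_p, m_n):
    """(Wp*NO.*mp + Wn*NO.*mn) of one 2-to-1 MUX: 2 PMOS of 1.3 um, 12 NMOS of 0.3 um."""
    return 1.3 * 2 * m_p + 0.3 * 12 * m_n

def conventional_tree_area_um2(m_p, m_n, rest_scale):
    """Return (gate area in um^2, fraction of the width sum due to DFFs)."""
    first = mux_width_sum(m_p, m_n)                  # output MUX
    rest = mux_width_sum(m_p, m_n) * rest_scale * 6  # six remaining 2-to-1 MUXes (assumed)
    dffs = (88 * 1.3 + 88 * 0.3) * 0.25 * 9          # nine 88-transistor DFFs (assumed)
    total = first + rest + dffs
    return total * L_UM, dffs / total

area_300ps, dff_share_300 = conventional_tree_area_um2(0.6, 2.5, 0.2)      # 340.032 x 0.13
area_100ps, dff_share_100 = conventional_tree_area_um2(2.7, 11.25, 0.045)  # 377.150 x 0.13
print(f"300 ps: {area_300ps:.2f} um^2 (DFF share {dff_share_300:.0%})")
print(f"100 ps: {area_100ps:.2f} um^2 (DFF share {dff_share_100:.0%})")
```

Under this reading the DFF term is about 93% of the 300 ps total, consistent with the thesis's argument that eliminating retiming is what saves area in the novel tree.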



#### **3.2.3 The Comparison Results as Figures and Tables.**

Table 3.4 Power (mW) of the three architectures versus rise time.

| Rise Time | Single-Stage MUX | Conventional Tree | Novel Tree-Type |
|-----------|------------------|-------------------|-----------------|
| 300 ps    | 1.3163           | 3.2329            | 1.3136          |
| 275 ps    | 1.6726           | 3.2618            | 1.3328          |
| 250 ps    | 1.8246           | 3.2955            | 1.3706          |
| 225 ps    | 2.0895           | 3.3722            | 1.4089          |
| 200 ps    | 2.2480           | 3.3743            | 1.4479          |
| 175 ps    | 3.5854           | 3.4486            | 1.5257          |
| 150 ps    | 17.614           | 3.5068            | 1.5934          |
| 125 ps    | X                | 3.6624            | 1.8391          |







Table 3.5 Area of three architectures versus Rising Time.





Figure 3.22 Power vs. Rise Time.

Figure 3.23 Area vs. Rise Time.

Figure 3.24 Power × Area vs. Rise Time.

Figure 3.25 Power × Area vs. Rise Time of Two Structures.

Figure 3.21 is plotted from the data in Tables 3.1, 3.2, and 3.3. It shows the minimum achievable rise time of each architecture: beyond this point the rise time no longer improves, even though the area increases rapidly. The rise-time limit is 150 ps for the single-stage MUX, 80 ps for the novel tree-type MUX, and 65 ps for the conventional tree-type MUX. These limits also set the bandwidth of each architecture: the bandwidth of the novel tree-type MUX is larger than that of the single-stage MUX and slightly less than that of the conventional tree-type MUX.
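The rise-time limits can be translated into data-rate limits. Section 3.3 pairs 200 ps with 5 Gbps and 100 ps with 10 Gbps, i.e. one rise time per bit; the sketch below applies that convention. As a separate assumption not made in the thesis, it also prints the single-pole estimate $BW \approx 0.35/t_r$ for a 10-90% rise time.

```python
# Rise-time limits read from Figure 3.21 (ps)
RISE_TIME_LIMIT_PS = {
    "single-stage": 150.0,
    "novel tree": 80.0,
    "conventional tree": 65.0,
}

def max_data_rate_gbps(t_r_ps: float) -> float:
    """Thesis convention: one rise time per bit (100 ps -> 10 Gb/s)."""
    return 1e3 / t_r_ps

def single_pole_bw_ghz(t_r_ps: float) -> float:
    """Assumed single-pole 10-90% relation: BW = 0.35 / t_r."""
    return 0.35e3 / t_r_ps

for name, t_r in RISE_TIME_LIMIT_PS.items():
    print(f"{name:18s} t_r = {t_r:5.1f} ps, "
          f"rate ~ {max_data_rate_gbps(t_r):5.2f} Gb/s, "
          f"BW ~ {single_pole_bw_ghz(t_r):4.2f} GHz")
```

Under this convention the 150 ps limit corresponds to about 6.7 Gbps, in line with the roughly 6.5 Gbps ceiling quoted for the single-stage serializer.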

Table 3.4 lists the power versus rise time of the three architectures, and Figure 3.22 plots these data. Table 3.5 lists the area overhead versus rise time, plotted in Figure 3.23. Table 3.6 lists the power-area product versus rise time, plotted in Figure 3.24. Figure 3.25 compares the power-area product of the two tree structures only. Here, area refers to the total gate area.

Figures 3.22 and 3.23 show that single-stage serializers can only reach about 6.5 Gbps, and that their power and area increase sharply beyond 5 Gbps. Both the conventional and the proposed tree-type serializers reach 10 Gbps with relatively constant power and area overhead. These comparisons verify that the proposed serializer has lower power and area overhead than both the single-stage and the conventional tree-type serializers.



Figure 3.26 Analysis vs. Simulation.



Next, we compare the simulation results with the analysis of Eq. 6. The results, summarized in Table 3.7 and Figure 3.26, verify that the analysis in Section 3.2.1 matches the simulation.

## **3.3 Summary**

In this chapter, we analyzed and compared the single-stage, conventional tree-type, and novel tree-type serializers.

Table 3.8 lists the power and area for five rise times, and Table 3.9 normalizes the performance to the conventional tree. At 5 Gbps (200 ps rise time), the proposed design consumes 0.43 of the power and occupies 0.09 of the area of the conventional tree; in terms of the power-area product, it is 25.84 times better. At 10 Gbps (100 ps), the power and area ratios are 0.70 and 0.22, and the power-area product is 6.49 times better.

| Rise Time | Single-Stage MUX Power (mW) | Single-Stage MUX Area (µm²) | Conventional Tree Power (mW) | Conventional Tree Area (µm²) | Novel Tree Power (mW) | Novel Tree Area (µm²) |
|-----------|-----------------------------|-----------------------------|------------------------------|------------------------------|-----------------------|-----------------------|
| 100 ps    |                             |                             | 3.99                         | 49.03                        | 2.79                  | 10.74                 |
| 150 ps    | 17.61                       | 321.59                      | 3.51                         | 46.08                        | 1.59                  | 5.60                  |
| 200 ps    | 2.25                        | 45.94                       | 3.37                         | 45.22                        | 1.45                  | 4.14                  |
| 250 ps    | 1.82                        | 22.97                       | 3.30                         | 44.69                        | 1.37                  | 3.65                  |
| 300 ps    | 1.32                        | 15.31                       | 3.23                         | 44.20                        | 1.31                  | 3.28                  |

Table 3.8 Power and Area Comparisons.
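The normalized figures quoted in Section 3.3 can be reproduced from Table 3.8. The sketch below rounds each ratio to two decimals before forming the power-area improvement; this rounding step is our assumption about how the published 25.84 and 6.49 were obtained, since the unrounded ratios give roughly 25.4 and 6.5 from the same data. The function name is hypothetical.

```python
# (power in mW, area in um^2) from Table 3.8
TABLE_3_8 = {
    200: {"conventional": (3.37, 45.22), "novel": (1.45, 4.14)},   # 5 Gbps
    100: {"conventional": (3.99, 49.03), "novel": (2.79, 10.74)},  # 10 Gbps
}

def normalized_to_conventional(rise_ps, decimals=2):
    """Return (power ratio, area ratio, power-area improvement) vs. the conventional tree."""
    p_conv, a_conv = TABLE_3_8[rise_ps]["conventional"]
    p_novel, a_novel = TABLE_3_8[rise_ps]["novel"]
    p_ratio = round(p_novel / p_conv, decimals)
    a_ratio = round(a_novel / a_conv, decimals)
    return p_ratio, a_ratio, round(1 / (p_ratio * a_ratio), decimals)

for rise in (200, 100):
    print(rise, normalized_to_conventional(rise))
```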





# **Chapter 4**

# **Transmitter Circuit Design**



### **4.1 Introduction**

This chapter describes the detailed circuit design of the chip. Note that a 5 GHz VCO (voltage-controlled oscillator) is difficult to implement in 0.13 µm technology unless an LC-tank oscillator is used. Without a 5 GHz clock, the final stage is a 4-to-1 multiplexer, as shown later. The test chip contains only the serializer and a driver; there is no PLL on chip. Hence, an external 5 GHz clock is divided into a four-phase 2.5 GHz clock to emulate the 2.5 GHz PLL.

## **4.2 Circuit Design**

Figure 4.1 shows the overall architecture of the chip. It includes four 8-to-1 serializers, one 4-to-1 serializer, and a multi-stage driver. In this section, we describe the design of each block in detail and show its circuit.



Figure 4.1 Whole architecture of the chip

#### **4.2.1 MUX 8-to-1**

Figure 3.3 does not account for the propagation delay of each stage. In practice this is a design issue, and we consider it here. In $0.13 \mu m$ technology, a simple inverter with a fan-out of four (FO4) has a propagation delay of 60 ps. Including this delay in the 2.5 Gbps 8-to-1 MUX of Figure 3.4 gives the timing diagram in Figure 4.2. Pn[1] and Pn[1]b are at 1.25 GHz. Pn[2], Pn[2]b, Pn[3], and Pn[3]b pass through the first frequency-divider stage and are at 625 MHz. Pn[4], Pn[4]b, Pn[5], Pn[5]b, Pn[6], Pn[6]b, Pn[7], and Pn[7]b pass through the second frequency-divider stage and are at 312.5 MHz. Each of these clocks carries the 60 ps delay, and so does the serializer.



Figure 4.2 Clock Diagram of 8-to-1 MUX with Propagation Delay

Figure 4.3 shows the structure of the 8-to-1 MUX and its data-skew DFFs. Conventionally, the skew is implemented with positive- and negative-edge-triggered DFFs, which requires twelve DFFs in an 8-to-1 MUX. A more efficient approach, following [14~15][17~23], replaces the positive- and negative-edge-triggered DFF pair with a *Master-Slave-Master Flip-Flop* (MSM-FF). The $90^\circ$ phase shift between the serializer inputs is achieved by adding an MSM-FF (an extra latch) to one path.



**Proposed Novel Tree MUX 8-to-1**

Figure 4.3 Proposed Novel Tree-Type Serializer



Figure 4.4 Circuit of Differential DFF



Figure 4.5 Structure of 8-to-1 MUX and Clock Gen

We use the new differential DFF shown in Figure 4.4 for both the clock generator and the data-skew DFFs. It has higher bandwidth and lower area overhead than the original DFF shown in Figure 3.11, because it uses fewer MOS transistors and has less output-node capacitance. Another reason for using this differential DFF is that the design requires the $0^\circ$, $90^\circ$, $180^\circ$, and $270^\circ$ clock phases.

Figure 4.5 shows the structure of the 8-to-1 serializer and its clock generator. The circuit of each 2-to-1 MUX is shown in Figure 2.7(b). The corresponding data and clock diagram of each node is shown in Figure 4.6, which, like Figure 4.2, includes the propagation delay. The outputs of the third-stage 2-to-1 MUXes are net1, net2, net3, and net4, at 625 Mbps: net1 multiplexes D1 and D5, net2 multiplexes D2 and D6, net3 multiplexes D3 and D7, and net4 multiplexes D4 and D8. The outputs of the second-stage 2-to-1 MUXes are net5 and net6, at 1.25 Gbps: net5 multiplexes net1 and net3, and net6 multiplexes net2 and net4. The first-stage 2-to-1 MUX produces the 2.5 Gbps output, out, by multiplexing net5 and net6.
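The data ordering described above can be checked with a behavioral sketch. The model is our simplification: each 2-to-1 MUX is assumed to emit one bit from its first input and then one from its second input per clock period, ignoring the 60 ps delays and the actual clock phases; `mux2` is a hypothetical helper, not a circuit from the thesis.

```python
def mux2(first, second):
    """Hypothetical 2-to-1 model: alternately emit one bit from each input."""
    out = []
    for a, b in zip(first, second):
        out.extend([a, b])
    return out

d = [f"D{i}" for i in range(1, 9)]

# Third stage (625 Mb/s)
net1 = mux2([d[0]], [d[4]])  # D1, D5
net2 = mux2([d[1]], [d[5]])  # D2, D6
net3 = mux2([d[2]], [d[6]])  # D3, D7
net4 = mux2([d[3]], [d[7]])  # D4, D8

# Second stage (1.25 Gb/s)
net5 = mux2(net1, net3)      # D1, D3, D5, D7
net6 = mux2(net2, net4)      # D2, D4, D6, D8

# First stage (2.5 Gb/s)
out = mux2(net5, net6)
print(out)
```

With this pairing, the tree emits D1 through D8 in order at the 2.5 Gbps output.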



Figure 4.6 Data and Clock Diagram of 8-to-1 MUX with Delay

#### **4.2.2 MUX 4-to-1**

Since the highest available input clock is a four-phase 2.5 GHz clock, we use a 4-to-1 single-stage serializer to convert four 2.5 Gbps data streams into one 10 Gbps stream. As shown in [25], adding an inductor to the circuit can increase its bandwidth; this is called inductive peaking. The idea is to make the inductor resonate with the capacitance that limits the bandwidth. Figure 4.7 shows common-source stages without and with inductive peaking. When a step is applied at Vi, the inductor in Figure 4.7(d) acts as an open circuit for the high-frequency components of the transition, so the current flows entirely into the load capacitance C rather than through the resistor R. Thus, the output voltage changes faster in Figure 4.7(c) than in Figure 4.7(a). Applications of this technique are reported in [14~15][19][21][26~28].
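For reference, a standard small-signal view of Figure 4.7, assuming an ideal transconductance $g_m$, neglecting the transistor output resistance, and taking C as the total load capacitance, gives the load impedances and gain below:

$$
Z_{(a)}(s) = R \parallel \frac{1}{sC} = \frac{R}{1 + sRC}, \qquad
Z_{(c)}(s) = (R + sL) \parallel \frac{1}{sC} = \frac{R + sL}{1 + sRC + s^2 LC}
$$

$$
\frac{v_o}{v_i} = -g_m Z(s)
$$

The inductor adds a zero at $s = -R/L$, which raises the load impedance at high frequency and offsets the roll-off caused by C; this is the frequency-domain counterpart of the step-response argument above.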

Inductive peaking can increase bandwidth substantially, but on-chip inductors add a large area overhead. We therefore overcome the low bandwidth of the single-stage type by using active inductive peaking [24].



Figure 4.7 (a) CS Circuit with Load C (b) Small Signal Equivalent Circuit of (a) (c) CS Circuit with additional inductor (d) Small Signal Equivalent Circuit of (c)



Figure 4.8 Circuit of 4-to-1 MUX



Figure 4.9 Data and Clock Diagram of 4-to-1 MUX with Delay

The circuit of this 4-to-1 MUX is shown in Figure 4.8. We add an NMOS transistor as a current source at each output node to enhance the effective inductance of the active inductive peaking. Figure 4.9 shows the data and clock diagram of the 4-to-1 MUX with delay.
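One way four 2.5 GHz phases can open four consecutive 100 ps windows is sketched below. This is an assumption about the selection logic, not a transcription of Figure 4.8: we take 50%-duty clocks at 0°, 90°, 180°, and 270° and let input k pass when phases k-1 and k are both high, a common way of using quadrature clocks for 4:1 selection. All names are hypothetical.

```python
PERIOD_PS = 400            # 2.5 GHz clock period
UI_PS = PERIOD_PS // 4     # 100 ps per output bit -> 10 Gb/s

def phase_high(k: int, t_ps: int) -> bool:
    """True when clock phase k (k * 90 degrees, 50% duty) is high at time t_ps."""
    return (t_ps - k * UI_PS) % PERIOD_PS < PERIOD_PS // 2

def selected_input(t_ps: int) -> int:
    """Index of the input whose window (phase k-1 AND phase k) is open at t_ps."""
    open_inputs = [k for k in range(4)
                   if phase_high((k - 1) % 4, t_ps) and phase_high(k, t_ps)]
    assert len(open_inputs) == 1, "windows must not overlap"
    return open_inputs[0]

def serialize_4to1(streams):
    """Interleave four equal-length 2.5 Gb/s bit lists into one 10 Gb/s list."""
    out = []
    for n in range(len(streams[0])):
        for slot in range(4):
            t_mid = n * PERIOD_PS + slot * UI_PS + UI_PS // 2  # sample mid-window
            out.append(streams[selected_input(t_mid)][n])
    return out

print([selected_input(t) for t in (50, 150, 250, 350)])   # [0, 1, 2, 3]
print(serialize_4to1([[1, 0], [0, 0], [1, 1], [0, 1]]))   # [1, 0, 1, 0, 0, 0, 1, 1]
```

Because each window is the overlap of two adjacent 90° phases, the windows tile the 400 ps period without gaps or overlaps, so each input is held for one 100 ps unit interval.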


#### **4.2.3 MUX 32-to-1**



Figure 4.10 Architecture of 32-to-1 MUX

Figure 4.10 shows the overall circuit structure for the proposed 32-to-1 serializer for 10Gbps serial I/O. The module will be integrated into a 0.13um chip with an 8-phase 2.5GHz PLL.
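The rates quoted in Sections 4.2.1 and 4.2.2 are consistent with this 32-to-1 structure. The short check below assumes, as those sections describe, that every 2-to-1 stage halves the data rate and is clocked at half its output bit rate.

```python
OUTPUT_GBPS = 10.0
N_TREES = 4  # four 8-to-1 novel tree-type serializers feeding the 4-to-1 MUX

tree_out_gbps = OUTPUT_GBPS / N_TREES                          # 2.5 Gb/s per 8-to-1 output
# Output rates of the first (output), second, and third 2-to-1 stages of each tree
stage_out_gbps = [tree_out_gbps / 2 ** k for k in range(3)]   # 2.5, 1.25, 0.625
# Half-rate clock of each stage: Pn[1], Pn[2..3], Pn[4..7]
stage_clock_ghz = [r / 2 for r in stage_out_gbps]             # 1.25, 0.625, 0.3125
input_gbps = stage_out_gbps[-1] / 2                            # 0.3125 Gb/s per parallel bit

print(stage_out_gbps, stage_clock_ghz, input_gbps)
```

The resulting clock frequencies (1.25 GHz, 625 MHz, 312.5 MHz) match those of Pn[1], Pn[2] to Pn[3], and Pn[4] to Pn[7], and 32 parallel inputs at 312.5 Mbps give the 10 Gbps output.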

#### **4.2.4 Driver**

For measurement, we design a frequency divider that divides the 5 GHz input clock into a four-phase 2.5 GHz clock, and a multi-stage driver that drives the output of the 32-to-1 serializer. The circuit diagram is shown in Figure 4.11. This current-mode logic (CML) driver is a conventional driver design [18][20][26~27][29~30], and the design technique follows [31]. This architecture has good immunity to simultaneous switching noise (SSN).



Figure 4.11 Architecture of Multi-Stage Driver

#### **4.3 Simulation Result**

Figure 4.12 shows the simulated output of the multi-phase clock generator. It generates Pn[1] at 1.25 GHz, Pn[2] and Pn[3] at 625 MHz, and Pn[4], Pn[5], Pn[6], and Pn[7] at 312.5 MHz. These clocks drive the 8-to-1 serializer, and the simulation result matches Figure 4.2. Figure 4.13 shows the simulation of the 8-to-1 serializer, including the multi-phase clocks, net1 to net4 carrying 625 Mbps data, net5 and net6 carrying 1.25 Gbps data, and out carrying 2.5 Gbps data; this result matches Figure 4.6. Figure 4.14 shows the simulation result of the 4-to-1 serializer, which serializes the four 2.5 Gbps outputs of the four 8-to-1 serializers into a 10 Gbps stream.

Figure 4.15 shows the eye diagram of the data after the 32-to-1 MUX and the multi-stage driver. The data rate is 10 Gbps, the output swing is 300 mV, and the jitter is 3.66 ps. Table 4.1 lists the power consumption of each part of the circuit; the total is 27.06 mW. Figure 4.16 shows the effect of ground bounce on VDD and GND; the peak-to-peak noise is 40 mV.



Figure 4.13 Simulation Result of 8-to-1 Serializer.





Figure 4.15 The Eye Diagram of 10Gbps Transmitter






Figure 4.16 The Effect of Ground Bounce to VDD and GND

## **4.4 Implementations**

The chip has been implemented in the TSMC 0.13 µm 2P8M CMOS process. It contains a 32-to-1 serializer, a 10 Gbps driver, and a 32-bit PRBS (pseudo-random bit sequence) generator. The layouts are shown in Figures 4.17 to 4.23. The core area of the serializer is only 200 µm × 150 µm, the driver area is 360 µm × 110 µm, and the total chip area is 1.14 mm × 0.99 mm.



Figure 4.17 Layout of MUX2 and DFF



Figure 4.18 Retiming and PRBS DFF



Figure 4.20 (a) Clock Generator for 8-to-1 MUX (b) 4-to-1 MUX (c) 5GHz to Four-Phase 2.5GHz Clock Divider



Figure 4.21 32-to-1 Serializer



Figure 4.22 Layout of Whole Chip (Without Dummy)



Figure 4.23 Layout of Whole Chip (With Dummy)

Measurement requires an external 5 GHz clock and a way to observe the 10 Gbps output. In conventional off-chip measurement, the parasitic capacitance of the PCB is large enough to limit the bandwidth of the transmission line, and we expect the resulting ISI to be too severe, making measurement difficult or impossible. For this reason, we use on-chip measurement.

On-chip measurement uses the Wenworth probe station; the pads on the chip are 80 µm × 80 µm with a center-to-center pitch of 150 µm. The complete measurement environment is shown in Figure 4.24. An Agilent N4901B generates the 5 GHz clock as a single-ended signal with 800 mV swing. This clock passes through a transmission line with 50 Ω characteristic impedance and a five-pin GSGSG probe into the chip.

The output data is sent to an Agilent 86100B to measure the eye diagram. We expect a 10 Gbps output with 300 mV swing. The clock input and the data output are on the left and right sides of the chip, and the power supply inputs are on the top and bottom sides. The chip has four power supplies in total: one for measuring the driver, and three for measuring the PRBS generator, the 32-to-1 serializer, and the 2.5 GHz clock generator. Power consumption is measured with a Keithley 2400 Source Meter.



Figure 4.24 The Whole Measurement Environment

## **4.5 Summary**

We use TSMC 0.13 µm technology to implement a 10 Gbps transmitter. Because the input clock is a four-phase 2.5 GHz clock, we implement the 32-to-1 serializer with four 8-to-1 novel tree-type serializers and one 4-to-1 single-stage serializer.

# **Chapter 5**

# **Measurement**



# **5.1 Off Chip**

We first perform an off-chip measurement. Figure 5.1 shows the chip photo and Figure 5.2 the core photo. The PCB is a four-layer Rogers stack-up designed for high-speed applications; its structure is shown in Figure 5.3. The two inner layers are ground and power. The off-chip measurement PCB is shown in Figure 5.4, and Figure 5.5 shows the measurement environment. An Agilent N4901B serves as the clock generator, and an Agilent 86100B measures the eye diagram and the jitter.

Figure 5.6(a) shows the 1.25 Gbps eye diagram; the jitter is 17.8 ps (P-P) and 2.7 ps (RMS). Figure 5.6(b) shows the reset diagram; the PRBS reset pattern is designed as 01010101. Figure 5.7 shows the 2.5 Gbps eye diagram; the jitter is 24.44 ps (P-P) and 3.63 ps (RMS).



Figure 5.1 The Whole Chip Photo.



Figure 5.2 The Core Photo.



Figure 5.3 Structure of Four layers PCB




Figure 5.4 Photo of Off-Chip Measurement PCB



Figure 5.5 Environment of Off-Chip Measurement



Figure 5.6 (a) 1.25Gbps Data Eye Diagram (b) Reset



Figure 5.7 2.5Gbps Data Eye Diagram

# **5.2 On Chip**

We use the Wenworth probe station for on-chip measurement; the measurement environment is shown in Figure 5.8. The probe station sits on an anti-vibration table that damps shocks with air pressure. First, we calibrate the five-pin probe to make sure all five pins lie in the same plane. Then, we align the probe to the chip under the microscope, with the image shown on the screen as in Figure 5.9(b). The key point for the on-chip measurement PCB is that the fewer components are soldered onto it, the easier the measurement, as shown in Figure 5.9(a). Figure 5.10 shows the five-pin GSGSG probe on the chip.

Figure 5.11 shows the measurement result. We could measure the chip only at 7.5 Gbps, as shown in Figure 5.11(a), with a swing of 65 mV. The jitter is 47.78 ps (p-p) and 7.64 ps (RMS), as shown in Figure 5.11(b). Power consumption is measured with the Keithley 2400: the 32-to-1 serializer consumes 4.8 mW, the four-phase 2.5 GHz clock generator 4.31 mW, the multi-phase clock generator and PRBS 10.21 mW, and the multi-stage driver 17.57 mW, for a total of 36.89 mW. Table 5.1 summarizes the measurement results.
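The measured power breakdown can be totalled directly. The sketch below only sums the four reported values; separating the driver out is our grouping, and it appears to be how the 19.32 mW serializer figure in Chapter 7 was obtained.

```python
# Measured power at 7.5 Gbps (mW), from the paragraph above
measured_mw = {
    "32-to-1 serializer": 4.8,
    "four-phase 2.5 GHz clock generator": 4.31,
    "multi-phase clock generator + PRBS": 10.21,
    "multi-stage driver": 17.57,
}

total_mw = sum(measured_mw.values())
without_driver_mw = total_mw - measured_mw["multi-stage driver"]
print(f"total = {total_mw:.2f} mW, without driver = {without_driver_mw:.2f} mW")
```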



Figure 5.8 Wenworth probe station



Figure 5.9 (a) On-Chip Measurement (b) Probe Photo in Screen



Figure 5.10 Probe Photo






Figure 5.11 (a) Eye Diagram 1 (b) Eye Diagram 2

![](_page_68_Picture_65.jpeg)

![](_page_68_Picture_66.jpeg)

# **Chapter 6 On-Chip Channel Model Analysis and Low Power Driver**

![](_page_69_Picture_2.jpeg)

# **6.1 On-Chip Channel Model Analysis**

We analyze on-chip channel models and choose one as a 10000 µm on-chip channel. The characteristic impedance of the channel should equal that of the transmitter and receiver to prevent signal reflections. As shown in [32], a longer channel has lower bandwidth and distorts the signal amplitude more. The separation between the signal line and the shielding ground line also affects crosstalk: according to [32], increasing the separation decreases crosstalk.

There are two types of transmission-line structure, microstrip and stripline, as shown in Figure 6.1. We choose the microstrip because it needs one less metal layer than the stripline and has a higher characteristic impedance.

![](_page_70_Figure_2.jpeg)

where $L_{UL}$ is the inductance per unit length and $C_{UL}$ is the capacitance per unit length.

The design flow is shown in Figure 6.2. We use MATLAB to calculate the characteristic impedance of each MX-MY microstrip with (19), which gives the achievable range of characteristic impedance for each MX-MY pair. We choose M8 as MX because it allows a larger wire cross section for higher data rates [32]. The resulting limits are shown in Table 6.1. We choose M8-M4 with $Z_0 = 50\,\Omega$, which gives a signal-line width of $3.31\,\mu m$ and leaves metals 1, 2, and 3 for other use. The next step is to calculate $R_{UL}$, $C_{UL}$, and $L_{UL}$ and use (20) to check the characteristic impedance again. For this we use FastHenry, a program released by MIT.

We import the 3D structure of the M8-to-M4 microstrip with different spacings, and FastHenry calculates the inductance per unit length. We calculate the resistance and capacitance per unit length from the TSMC transmission-line model file. The results are shown in Table 6.2. We choose a spacing of 6 µm between the shielding ground line and the signal line, which completes the channel model. The DC attenuation and bandwidth, calculated with HSPICE, are -2.75 dB and 11.1 GHz, respectively.
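As a reminder of how per-unit-length parameters set the characteristic impedance, the sketch below evaluates the general telegrapher's-equation form $Z_0 = \sqrt{(R + j\omega L)/(G + j\omega C)}$. The exact forms of (19) and (20) are not reproduced here, and the R, L, C values are hypothetical round numbers chosen so that $\sqrt{L/C} = 50\,\Omega$; they are not taken from Table 6.2.

```python
import cmath
import math

def char_impedance(r_ul, l_ul, g_ul, c_ul, f_hz):
    """Z0 = sqrt((R + jwL) / (G + jwC)) from per-unit-length R, L, G, C."""
    w = 2 * math.pi * f_hz
    return cmath.sqrt((r_ul + 1j * w * l_ul) / (g_ul + 1j * w * c_ul))

# Hypothetical per-unit-length values, NOT taken from Table 6.2
L_UL = 400e-9    # H/m   (0.4 nH/mm)
C_UL = 160e-12   # F/m   (0.16 pF/mm)
R_UL = 10e3      # ohm/m (10 ohm/mm)

z_lossless = char_impedance(0.0, L_UL, 0.0, C_UL, 5e9)  # reduces to sqrt(L/C)
z_lossy = char_impedance(R_UL, L_UL, 0.0, C_UL, 5e9)    # 5 GHz = Nyquist of 10 Gb/s
print(f"lossless |Z0| = {abs(z_lossless):.2f} ohm, lossy |Z0| = {abs(z_lossy):.2f} ohm")
```

With series resistance comparable to $\omega L$, the magnitude of $Z_0$ rises above $\sqrt{L/C}$, which is why the impedance is rechecked once $R_{UL}$ is known.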

![](_page_71_Figure_2.jpeg)

| Structure | W for $Z_0 = 100\,\Omega$ (m) | W for $Z_0 = 70\,\Omega$ (m) | W for $Z_0 = 60\,\Omega$ (m) | W for $Z_0 = 50\,\Omega$ (m) |
|-----------|-------------------------------|------------------------------|------------------------------|------------------------------|
| M8-M7     | X                             | X                            | X                            | X                            |
| M8-M6     | X                             | X                            | X                            | X                            |
| M8-M5     | X                             | X                            | X                            | X                            |
| M8-M4     | X                             | X                            | X                            | 3.31E-6                      |
| M8-M3     | X                             | X                            | 2.8E-6                       | 4.92E-6                      |
| M8-M2     | X                             | X                            | 4.0E-6                       | 6.53E-6                      |
| M8-M1     | X                             | 3.0E-6                       | 5.2E-6                       | 8.14E-6                      |

Table 6.2 The Channel Model with Different Space



#### **6.2 Low Power Driver**

We design a driver for the 10 Gbps data from the 32-to-1 serializer. The driver must drive the signal through a $10000 \mu m$ microstrip channel, and the single-ended amplitude at the receiver front end must be at least 100 mV. The channel model built in Section 6.1 has a 50 Ω characteristic impedance, so the driver must be designed with the same impedance. There are two kinds of conventional drivers: the CML (*Current Mode Logic*) driver and the LVDS (voltage-mode) driver. In [33], analysis and simulation in a 0.18 µm process show that the voltage-mode driver saves 70% to 90% of the power, depending on data rate. We therefore use the digitized structure in Figure $6.3(a)$.



Figure 6.3 (a) Structure of Driver (b) Impedance matching

Next, we calculate R1 and R2 in Figure 6.3(b). The attenuation of the 10000 µm channel we chose is -2.75 dB. We assume a 200 mV swing at the driver output to ensure enough swing margin at the receiver front end. This gives the two equations below.

$$
R \frac{1}{2} = 50 \tag{21}
$$

$$
1200 \times \frac{R2}{/100} = 200
$$
 (22)

Solving the two equations gives R1 = R2 = 150 Ω. We also design a pre-driver to connect the 32-to-1 serializer to the LVDS driver; its architecture is shown in Figure 6.4.



Figure 6.4 Architecture of Pre-Driver

Inv1 and Inv2 in Figure 6.4 act as a latch and increase the output swing. The inverter with a transmission gate behaves like inductive peaking and increases the bandwidth. Figure 6.5 shows the simulation result at the receiver front end: the swing is $120 \text{mV}$ and the peak-to-peak jitter is $6.55$ ps. The pre-driver and driver together consume 4.44 mW.
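The swing budget behind the 200 mV driver target can be checked with the DC attenuation from Section 6.1. The sketch below applies only the -2.75 dB DC loss; it ignores frequency-dependent loss and ISI, so it is an optimistic bound rather than a prediction of the eye opening.

```python
DRIVER_SWING_MV = 200.0     # assumed driver output swing (Section 6.2)
DC_ATTENUATION_DB = -2.75   # HSPICE DC attenuation of the 1 cm channel (Section 6.1)
RX_MIN_MV = 100.0           # required single-ended amplitude at the receiver

def db_to_voltage_ratio(db: float) -> float:
    """Convert a voltage gain in dB to a linear ratio."""
    return 10 ** (db / 20)

rx_dc_mv = DRIVER_SWING_MV * db_to_voltage_ratio(DC_ATTENUATION_DB)
print(f"DC swing at receiver ~ {rx_dc_mv:.1f} mV, margin {rx_dc_mv - RX_MIN_MV:.1f} mV")
```

The DC estimate of roughly 146 mV is above the simulated 120 mV in Figure 6.5, presumably because the finite 11.1 GHz channel bandwidth and ISI reduce the swing further at 10 Gbps; both remain above the 100 mV requirement.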



Figure 6.5 Simulation Result in Receiver Front End

# **Chapter 7**

# **Conclusion**



#### **7.1 Conclusion**

This thesis has proposed, implemented, and measured a novel tree-type serializer. Because quadrature clocks are used for signal-path selection, the retiming mechanism of a conventional tree is no longer needed, which significantly reduces power consumption and circuit area. Simulation results show that at 5 Gbps/10 Gbps it consumes 0.43/0.70 of the power and occupies 0.09/0.22 of the area of a conventional tree. In terms of the power-area product, it is 25.84/6.49 times better than a conventional tree. We have applied for a patent on this novel tree-type serializer.

The serializer has been implemented in the TSMC 0.13 µm 2P8M CMOS process and occupies only 200 µm × 150 µm. We measured the chip with both off-chip and on-chip measurements. Because of measurement difficulties, the chip could not be measured at 10 Gbps. At 7.5 Gbps, the serializer consumes 19.32 mW, and the jitter is 47.78 ps (p-p) and 7.64 ps (RMS).

In addition, we analyzed the on-chip channel model and designed a low-power driver that drives 10 Gbps data through a 10000 μm on-chip channel with a power consumption of 4.44 mW.



## **Bibliography**

- [1] Hun wen Lu, Chauchin Su, "A 5Gbps LVDS Transmitter with Multi-Phase Tree-Type Multiplexer," 2004 IEEE Asia-Pacific Conference on Advanced System Integrated Circuits(AP-ASlC2004)i Aug. 4-5, 2004
- [2] Kuan Yu Chen, "A Self-Calibrate with Pre-Emphasis" Master Degree dissertation of National Chiao Tung University 2005
- [3] Yawen Guo, Zhanpeng Zhang, Wei Hu, Lianxing Yang, ASIC & System State Key Lab, Fudan University, Shanghai, PRC "CMOS Multiplexer and Demultiplexer for Gigabit Ethernet" IEEE 2002
- [4] Masakazu Kurisu, Makato Kaneko, Tetsuyuki Suzaki, et al., "2.8-Gb/s 176-mW Byte-Interleaved and 3.0-Gb/s 118-mW Bit-Interleaved 8:1 Multiplexers with a 0.15um CMOS Technology," IEEE Journal of Solid-State Circuits, Vol. 31, pp.2024-2029, Dec, 1996.
- [5] Patrick Chiang, William J. Dally, Ming-Ju Edward Lee, Ramesh Senthinathan, Yangjin Oh, and Mark A. Horowitz, "A 20-Gb/s 0.13-µm CMOS Serial Link Transmitter Using an LC-PLL to Directly Drive the Output Multiplexer," IEEE Journal of Solid-State Circuits, vol. 40, no. 4, Apr. 2005.
- [6] Ming-Ju Edward Lee, William J. Dally, and Patrick Chiang, "Low-Power Area-Efficient High-Speed I/O Circuit Techniques," IEEE Journal of Solid-State Circuits, vol. 35, no. 11, Nov. 2000.
- [7] Ming-Ju Edward Lee, William Dally, and Patrick Chiang, "A 90mW 4Gb/s Equalized I/O Circuit with Input Offset Cancellation," ISSCC 2000, Session 15, High-Speed I/O, Paper TP 15.3.
- [8] F. Yuan, "Fully differential 8-to-1 current-mode multiplexer for 10 Gbit/s serial links in 0.18 µm CMOS," Electronics Letters, vol. 40, no. 13, June 24, 2004.
- [9] Jean Jiang and Fei Yuan, "A New CMOS Class-AB Transmitter for 10Gbps Serial Links," IEEE, 2004.
- [10] Fei Yuan and Jean Jiang, "A Pseudo-NMOS Fully Differential CMOS Current-Mode Multiplexing Transmitter for 10Gb/s Serial Links."
- [11] Meng-Tzer Wong and Wei-Zen Chen, "A 2.5Gbps CMOS Data Serializer," IEEE, 2002.
- [12] Wei-Zen Chen and Meng-Chih Weng, "A 2.5Gbps Serial-Link Data Transceiver in a 0.35 µm Digital CMOS Technology," 2004 IEEE Asia-Pacific Conference on Advanced System Integrated Circuits (AP-ASIC 2004), Aug. 4-5, 2004.
- [13] Bruce E. Gorgon, "Multiplexer Circuit," United States Patent No. 4270204, May 26, 1981.
- [14] Jun Cao et al., "OC-192 Transmitter and Receiver in Standard 0.18-µm CMOS," IEEE Journal of Solid-State Circuits, vol. 37, no. 12, pp. 1768-1780, Dec. 2002.
- [15] Daniel Kehrer, Hans-Dieter Wohlmuth, Herbert Knapp, Martin Wurzer, and Arpad L. Scholtz, "40-Gb/s 2:1 Multiplexer and 1:2 Demultiplexer in 120-nm Standard CMOS," IEEE Journal of Solid-State Circuits, vol. 38, no. 11, pp. 1830-1837, Nov. 2003.
- [16] Robert G. Swartz, "High-Speed Multiplexer Circuit," United States Patent No. 4789984, Dec. 6, 1988.
- [17] Daniel Kehrer, Hans-Dieter Wohlmuth, Herbert Knapp, Martin Wurzer, and Arpad L. Scholtz, "40Gb/s 2:1 Multiplexer and 1:2 Demultiplexer in 120nm CMOS," ISSCC 2003, Session 19, Processor Building Blocks, Paper 19.6.
- [18] Jri Lee, "High-Speed Circuit Designs for Transmitters in Broadband Data Links," IEEE Journal of Solid-State Circuits, vol. 41, no. 5, May 2006.
- [19] Mounir Meghelli, Alexander V. Rylyakov, and Lei Shan, "50Gb/s SiGe BiCMOS 4:1 Multiplexer and 1:4 Demultiplexer for Serial-Communication Systems," ISSCC 2002, Session 15, Gigabit Communications, Paper 15.7.
- [20] Daniel Kehrer and Hans-Dieter Wohlmuth, "A 30-Gb/s 70-mW One-Stage 4:1 Multiplexer in 0.13-µm CMOS," IEEE Journal of Solid-State Circuits, vol. 39, no. 7, July 2004.
- [21] Michael M. Green, Afshin Momtaz, Kambiz Vakilian, Xin Wang, Keh-Chee Jen, David Chung, Jun Cao, Mario Caresosa, Armond Hairapetian, Ichiro Fujimori, and Yijun Cai, "OC-192 Transmitter in Standard 0.18µm CMOS," ISSCC 2002, Session 15, Gigabit Communications, Paper 15.1.
- [22] F. Znidarsic, E. Müllner, and R. Strunz, "16:1 retiming multiplexer for 10 Gbit/s in Si production technology," Electronics Letters, vol. 32, no. 3, Feb. 1, 1996.
- [23] Yasushi Amamiya, Yasuyuki Suzuki, Zin Yamazaki, Masayuki Mamada, and Hikaru Hida, "Low Supply Voltage Operation of 40-Gb/s Full-rate 4:1 Multiplexer Based on Parallel-Current-Switching Latch Circuitry," 2004 IEEE CSIC Digest.
- [24] Lindor Henrickson, David Shen, Uno Nellore, Alan Ellis, Joong Oh, Hui Wang, Giovanni Capriglione, Ali Atesoglu, Alice Yang, Peter Wu, Syed Quadri, and David Crosbie, "Low-Power Fully Integrated 10-Gb/s SONET/SDH Transceiver in 0.13-µm CMOS," IEEE Journal of Solid-State Circuits, vol. 38, no. 10, Oct. 2003.
- [25] Behzad Razavi, "Design of Integrated Circuits for Optical Communications."
- [26] Daniel Kehrer and Hans-Dieter Wohlmuth, "A 20 Gb/s 82 mW One-Stage 4:1 Multiplexer in 0.13 µm CMOS," IEEE, 2003.
- [27] Jaeha Kim, Jeong-Kyoum Kim, Bong-Joon Lee, Moon-Sang Hwang, Hyung-Rok Lee, Sang-Hyun Lee, Namhoon Kim, Deog-Kyoon Jeong, and Wonchan Kim, "Circuit Techniques for a 40Gb/s Transmitter in 0.13µm CMOS," ISSCC 2005, Session 8, Circuits for High-Speed Links and Clock Generators, Paper 8.1.
- [28] Jaeha Kim, Jeong-Kyoum Kim, Bong-Joon Lee, Namhoon Kim, Deog-Kyoon Jeong, and Wonchan Kim, "A 20-GHz Phase-Locked Loop for 40-Gb/s Serializing Transmitter in 0.13-µm CMOS," IEEE Journal of Solid-State Circuits, vol. 41, no. 4, Apr. 2006.
- [29] Harish S. Muthali, Thomas P. Thomas, and Ian A. Young, "A CMOS 10-Gb/s SONET Transceiver," IEEE Journal of Solid-State Circuits, vol. 39, no. 7, July 2004.
- [30] J. H. R. Schrader, E. A. M. Klumperink, J. L. Visschers, and B. Nauta, "CMOS Transmitter using Pulse-Width Modulation Pre-Emphasis achieving 33dB Loss Compensation at 5-Gb/s," 2005 Symposium on VLSI Circuits Digest of Technical Papers.
- [31] Behzad Razavi, "Design of Analog CMOS Integrated Circuits," McGraw-Hill International Edition.
- [32] Peter Caputa and Christer Svensson, "Well-Behaved Global On-Chip Interconnect," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 52, no. 2, Feb. 2005.
- [33] Lei Luo, John M. Wilson, Stephen E. Mick, Jian Xu, Liang Zhang, and Paul D. Franzon, "3 Gb/s AC Coupled Chip-to-Chip Communication Using a Low Swing Pulse Receiver," IEEE Journal of Solid-State Circuits, vol. 41, no. 1, Jan. 2006.