# A 0.1–0.3 V 40–123 fJ/bit/ch On-Chip Data Link With ISI-Suppressed Bootstrapped Repeaters

Yingchieh Ho, Student Member, IEEE, and Chauchin Su, Member, IEEE

Abstract: This paper presents a 40–123 fJ/bit/ch on-chip data link design under a 0.1–0.3 V power supply. A bootstrapped CMOS repeater is proposed to drive a 10 mm on-chip bus. It features a $-V_{\rm DD}$ to $2V_{\rm DD}$ swing to enhance the driving capability and reduce the sub-threshold leakage current. Additionally, a precharge enhancement scheme increases the data transmission speed, and a leakage current reduction technique suppresses ISI jitter. A test chip is fabricated in a 55 nm SPRVT Low-K CMOS process. The measured results demonstrate that for a 10 mm on-chip bus, the achievable data rate is 0.8–100 Mbps and the energy consumption is 40–123 fJ per bit under a 0.1–0.3 V $V_{\rm DD}$.

Index Terms: Bootstrapped circuit, energy efficient, intersymbol interference (ISI), low-voltage, leakage current reduction, low-power, sub-threshold circuit.

#### I. INTRODUCTION

IN THE PAST few years, low-voltage and low-power designs have attracted significant attention because of the popularity of portable devices. Emerging embedded biomedical applications have pushed low-power design toward an even more extreme regime. Scaling the supply voltage below the threshold voltage is the most favorable solution for low-power designs. A 180 mV, 1024-point FFT processor was a pioneering subthreshold-supply design [1], followed by [2]–[4]. Subthreshold SRAM is another important category [5]–[7]. Other designs include a 6-bit flash ADC operating at 0.2–0.9 V and a 14-tap 8-bit finite impulse response (FIR) filter operating at 20 MHz under 0.27 V [8], [9].

Subthreshold circuit design is challenging because the driving capability $(I_{\rm on})$ and the $I_{\rm on}/I_{\rm off}$ ratio degrade significantly while process variations become more severe [10]–[12], affecting the circuit performance, the power efficiency (leakage power), and the fabrication yield.

As technology continues to scale down, the on-chip global interconnect in SoC designs becomes a bottleneck with regard to speed, power, and cost. Repeater insertion is a feasible technique for trading off speed, power, and cost [13]–[15]. Unfortunately, in the subthreshold region, conventional CMOS repeaters still suffer from the severe design problems mentioned above.

Manuscript received September 07, 2011; revised December 26, 2011; accepted January 13, 2012. Date of publication March 06, 2012; date of current version April 25, 2012. This paper was approved by Associate Editor Stefan Rusu.

C. Su is with the Department of Electrical Engineering, National Chiao-Tung University, Hsinchu 30010, Taiwan.

Y. Ho is with the Electrical Engineering Department and Institute of Electrical Control Engineering, National Chiao Tung University, Hsinchu 30010, Taiwan (e-mail: jasonho0421.ece94g@nctu.edu.tw).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2012.2186722

The bootstrap technique in [16] drives a large capacitive load. The bootstrapped driver consists of a pull-up and a pull-down control pair that boost the gate voltage and push the driver MOSFET into the linear region, improving the driving capability. However, charge leakage at the boosted node and poor precharge capability restrict its usability in the subthreshold region. Kil *et al.* proposed a subthreshold bootstrapped repeater for a 9 MHz distributed clock network at 0.4 V [17]. However, when this approach is applied to a data link, the kick-back disturbance through the boosting capacitors causes large timing jitter. Furthermore, it consumes large static power and incurs a high capacitor cost. We previously proposed a low-voltage ring oscillator composed of bootstrapped delay cells that oscillates at 48 MHz at 0.2 V [18].

In this paper, an ISI-suppressed bootstrapped subthreshold repeater is proposed. Operating at a subthreshold supply voltage is the most effective means of reducing power. However, it has two drawbacks: reduced driving capability and a larger share of leakage in the total power consumption. To overcome the poor driving capability, the bootstrap technique is used. In addition, a precharge enhancement scheme and a leakage current reduction scheme are adopted to achieve a favorable speed-energy tradeoff. Furthermore, the proposed repeater suppresses ISI noise in data link applications.

The rest of the paper is organized as follows. Section II introduces the on-chip bus structure with the proposed bootstrapped repeaters and describes their operation. Section III presents a detailed performance evaluation, including boosting efficiency, the leakage current reduction scheme, leakage power analysis, ISI suppression, energy efficiency analysis, and Monte Carlo simulations. Section IV presents the test chip and the measured results. Finally, Section V draws conclusions.

#### II. ON-CHIP DATA LINK DESIGN WITH BOOTSTRAPPED REPEATER INSERTION

#### A. On-Chip Bus Architecture

Fig. 1 shows the proposed 4-bit on-chip bus for data communication under a subthreshold power supply. The bus is divided into several segments, each driven by a bootstrapped repeater. Ground shielding is used to eliminate the effective-loading uncertainty and to decouple noise from adjacent channels. Repeaters on adjacent channels are staggered (misaligned) to reduce coupling noise and simultaneous switching noise (SSN).

#### B. Proposed Bootstrapped Repeater

The proposed bootstrapped repeater is composed of an inverter as the driver and a bootstrap control circuit. The bootstrap control circuit has three important features. First, a precharge enhancement scheme improves the precharge capability to achieve high-speed operation. Second, a leakage current reduction technique suppresses the ISI noise. Third, the bootstrap control circuit produces a boosted output swing from $-V_{\rm DD}$ to $2V_{\rm DD}$ to increase the driving current ($2V_{\rm DD}$) and to turn off the transistor aggressively ($-V_{\rm DD}$). As a result, the $I_{\rm on}/I_{\rm off}$ ratio is improved substantially.

Fig. 1. Proposed on-chip bus architecture with new bootstrapped repeater insertion.

Fig. 2. Circuit of proposed bootstrapped repeater.

Fig. 2 depicts the proposed bootstrapped CMOS repeater. $C_{\mathrm{BP}}$ and $C_{\mathrm{BN}}$ are the bootstrap capacitors; $M_{\mathrm{P1}}$ and $M_{N1}$ are the precharge transistors for $C_{\mathrm{BP}}$ and $C_{\mathrm{BN}}$, respectively; $INV_{\mathrm{P}}$ and $INV_{N}$ are the pre-drivers that boost $C_{\mathrm{BP}}$ and $C_{\mathrm{BN}}$; and $M_{\mathrm{PD}}$ and $M_{\mathrm{ND}}$ are the output drivers. $N_{\mathrm{BT}}$ is boosted to either $2V_{\mathrm{DD}}$ or $-V_{\mathrm{DD}}$ to enhance the driving capability of $M_{\mathrm{PD}}$ and $M_{\mathrm{ND}}$. $N_{\mathrm{BT}}$ is also fed back to control $M_{\mathrm{P1}}$ and $M_{N1}$, which simultaneously enhances the precharge capability and eliminates the reverse leakage current.

Figs. 3 and 4 show the transient waveforms for input transitions from H to L and from L to H, respectively. Assume that the bootstrap capacitors $C_{\rm BP}$ and $C_{\rm BN}$ each store a voltage of $V_{\rm DD}$ before $V_{\rm in}$ transitions from H to L; ideally, node $N_{\rm BP}$ then has an initial voltage of $V_{\rm DD}$ and node $N_{\rm BT}$ has an initial voltage of $-V_{\rm DD}$. After $V_{\rm in}$ transitions from H to L, $N_{\rm OP}$ transitions from L to H and $N_{\rm BP}$ is boosted to $2V_{\rm DD}$. At the same time, $M_{\rm P2}$ is turned on and $M_{N2}$ is turned off. The $2V_{\rm DD}$ at $N_{\rm BP}$ starts to charge $N_{\rm BT}$ through $M_{\rm P2}$ and pushes $N_{\rm BT}$ to $2V_{\rm DD}$. Once $N_{\rm BT}$ is charged above the threshold voltage $V_{\rm th}$, $M_{N1}$ is turned on to precharge $N_{\rm BN}$ to GND. Now, $C_{\rm BN}$ has a potential of $-V_{\rm DD}$.

Fig. 3. Proposed bootstrapped repeater operation (input H-to-L).

As  $V_{\rm in}$  transits from L to H, a similar mechanism pushes  $N_{\rm BT}$  to  $-V_{\rm DD}$ . Fig. 5 shows the simulated transient waveforms with a 1 mm wire load and a  $V_{\rm DD}$  of 0.2 V. Here,  $N_{\rm BT}$  swings from 384 mV to  $-186~{\rm mV}$  instead of the ideal 400 mV to  $-200~{\rm mV}$  owing to the charge sharing effect.

Like other bootstrap circuits, the proposed design has start-up and standby problems. Before start-up, one of the bootstrap capacitors has no stored charge. Similarly, during a long standby period, one of the bootstrap capacitors is depleted by subthreshold leakage. A transition of the data input is then required to recharge the depleted bootstrap capacitor, and normal bootstrap operation is regained at the next transition.



Fig. 4. Proposed bootstrapped repeater operation (input L-to-H).



Fig. 5. Simulated timing waveforms under 0.2 V supply.



Fig. 6. Cross-section of proposed circuit.

A MOS transistor has parasitic diodes between its source/drain and the body. Although the body and source of a PMOS can be shorted in an N-well bulk CMOS process, the parasitic diodes remain for $M_{N2}$, as shown in Fig. 6. When a negative voltage ($-V_{\rm DD}$) is generated at $N_{\rm BN}$, the parasitic diode may turn on if $V_{\rm DD}$ exceeds about 0.7 V. Therefore, the proposed design is intended for subthreshold applications.



Fig. 7. Equivalent circuit for evaluating boosting efficiency.

#### III. DETAILED EVALUATION AND COMPARISONS

The previous section briefly introduced the architecture of the on-chip bus and the basic operation of the proposed bootstrapped repeater. This section will discuss them in greater detail with reference to boosting efficiency, leakage power, ISI suppression, energy efficiency and Monte Carlo analysis.

#### A. Boosting Efficiency

Ideally, the boosted node $N_{\rm BT}$ swings between $2V_{\rm DD}$ and $-V_{\rm DD}$. However, the parasitic capacitance at node $N_{\rm BT}$ shares charge with the bootstrap capacitance [17]. For example, when $N_{\rm BT}$ transitions above $V_{\rm DD}$, the equivalent circuit of the upper side is shown in Fig. 7. $V_{\rm BTP}$ and $C_{\rm PTP}$ are the voltage and the total parasitic capacitance at $N_{\rm BT}$, respectively. Ideally, $V_{\rm BTP}$ transits from $-V_{\rm DD}$ to $2V_{\rm DD}$. Charge conservation then gives

$$V_{\rm BTP} = \frac{C_{\rm BP}}{C_{\rm BP} + C_{\rm PTP}} \cdot 2V_{\rm DD} - \frac{C_{\rm PTP}}{C_{\rm BP} + C_{\rm PTP}} \cdot V_{\rm DD}. \quad (1)$$

To increase the driving capability, the bootstrap capacitance is designed to be significantly larger than the parasitic capacitance at the node. The second term of (1) is then small, and (1) can be approximated as

$$V_{\rm BTP} \approx \frac{C_{\rm BP}}{C_{\rm BP} + C_{\rm PTP}} \cdot 2V_{\rm DD} \triangleq \beta_{\rm P} \cdot 2V_{\rm DD}. \tag{2}$$

$\beta_{\rm P}$ is the boosting efficiency factor, or simply the boosting efficiency. Similarly, as $V_{\rm BTN}$ transits from $V_{\rm DD}$ to below ground, the estimated $V_{\rm BTN}$ is

$$V_{\rm BTN} \approx \frac{C_{\rm BN}}{C_{\rm BN} + C_{\rm PTN}} \cdot (-V_{\rm DD}) \triangleq \beta_N \cdot (-V_{\rm DD}). \tag{3}$$

In fact, the boosting efficiency factor is time-variant because leakage charge accumulates. When $V_{\rm BTP}$ is boosted above $V_{\rm DD}$, the leakage currents $I_{\rm LMP1}$ and $I_{\rm LMN2}$ discharge the boosted node through $M_{\rm P1}$ and $M_{N2}$, respectively, as shown in Fig. 7. The time-variant boosting efficiency causes an ISI problem, which is discussed in Section III-D.
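The simulated swing quoted for Fig. 5 (384 mV and $-186$ mV at $V_{\rm DD} = 0.2$ V) can be turned into boosting efficiencies using (1)–(3). The sketch below does that arithmetic and also backs out the parasitic-to-bootstrap capacitance ratio implied by $\beta = C_B/(C_B + C_{PT})$. The 50 fF bootstrap capacitance is an assumption borrowed from the comparison setup and test chip described later; the paper does not state the value used in the Fig. 5 simulation.

```python
# Boosting efficiency backed out from the Fig. 5 simulated swing (sketch).
VDD = 0.2          # supply voltage (V), as in the Fig. 5 description
V_BTP = 0.384      # simulated upper level of N_BT (V)
V_BTN = -0.186     # simulated lower level of N_BT (V)

# Approximate forms (2) and (3): V_BTP ~ beta_P * 2*VDD, V_BTN ~ beta_N * (-VDD)
beta_P_approx = V_BTP / (2 * VDD)
beta_N_approx = V_BTN / (-VDD)

# Full form (1), with C_PTP/(C_BP + C_PTP) = 1 - beta:
# V_BTP = beta*2*VDD - (1 - beta)*VDD = (3*beta - 1)*VDD
beta_P_exact = (V_BTP / VDD + 1) / 3


def parasitic_ratio(beta):
    """C_PT / C_B implied by beta = C_B / (C_B + C_PT)."""
    return (1 - beta) / beta


C_B = 50e-15  # ASSUMPTION: 50 fF bootstrap capacitor (value used elsewhere in the paper)
for name, beta in [("beta_P from (2)", beta_P_approx),
                   ("beta_P from (1)", beta_P_exact),
                   ("beta_N from (3)", beta_N_approx)]:
    c_pt = parasitic_ratio(beta) * C_B
    print(f"{name}: {beta:.3f} -> implied C_PT ~ {c_pt * 1e15:.2f} fF")
```

Because (1) attributes part of the shortfall to the $-V_{\rm DD}$ starting point, it yields a slightly higher $\beta_{\rm P}$ (about 0.973) than the simplified form (2) (0.96).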

#### B. Leakage Current Reduction

Fig. 8. Suppression of leakage current by negative gate voltage.

In a low-voltage design, the leakage current $I_{\rm off}$ accounts for a large portion of the total power consumption. $I_{\rm off}$ is mostly the sub-threshold leakage current, which is expressed as follows [10]–[12]:

$$I_{\text{off}} = \mu C_{\text{dep}} \frac{W}{L} V_T^2 \exp\left(\frac{V_{GS} - V_{\text{th}}}{nV_T}\right) \times \left(1 - \exp\left(\frac{-V_{\text{DS}}}{V_T}\right)\right) \tag{4}$$

where $\mu$ is the effective mobility, $C_{\rm dep}$ is the depletion capacitance, W and L are the width and length of the device, $V_T$ is the thermal voltage, $V_{GS}$ is the gate-to-source voltage, $V_{\rm th}$ is the threshold voltage, n is the sub-threshold slope factor, and $V_{\rm DS}$ is the drain-to-source voltage. Scaling the supply voltage into the sub-threshold region substantially reduces $I_{\rm on}$, which, by (4) with $V_{GS} = V_{\rm DD}$, decreases exponentially with $(V_{\rm DD} - V_{\rm th})$. In contrast, in (4) $I_{\rm off}$ ($V_{GS} = 0$) depends on $V_{\rm DS}$ only through the factor $1 - \exp(-V_{\rm DS}/V_T)$, which stays close to unity once $V_{\rm DS}$ exceeds a few $V_T$. $I_{\rm off}$ therefore decreases much more slowly than $I_{\rm on}$ and becomes responsible for a significant fraction of the total power consumption. As a result, scaling the supply voltage below the threshold voltage directly lowers the $I_{\rm on}/I_{\rm off}$ ratio.
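A minimal numerical sketch of (4) makes the last point concrete. It evaluates the normalized sub-threshold current with $V_{\rm th} = 0.30$ V (the NMOS value in Table II) and an assumed slope factor $n = 1.5$; the prefactor $\mu C_{\rm dep}(W/L)V_T^2$ is normalized to 1 because it cancels in the ratio. This ignores DIBL, GIDL, and the transition out of weak inversion, so it is only meaningful for $V_{GS}$ below $V_{\rm th}$.

```python
import math

V_T = 0.0259  # thermal voltage near 300 K (V)


def i_sub(v_gs, v_ds, v_th=0.30, n=1.5, i0=1.0):
    """Normalized sub-threshold drain current following (4).

    i0 stands in for mu*C_dep*(W/L)*V_T**2; v_th follows Table II (NMOS),
    n = 1.5 is an ASSUMED slope factor. Valid only for v_gs below v_th.
    """
    return i0 * math.exp((v_gs - v_th) / (n * V_T)) * (1.0 - math.exp(-v_ds / V_T))


def on_off_ratio(vdd, **kw):
    """I_on (V_GS = V_DS = VDD) over I_off (V_GS = 0, V_DS = VDD)."""
    return i_sub(vdd, vdd, **kw) / i_sub(0.0, vdd, **kw)


for vdd in (0.1, 0.15, 0.2, 0.25):
    ds_factor = 1.0 - math.exp(-vdd / V_T)  # V_DS term of (4); ~1 for V_DS >> V_T
    print(f"VDD = {vdd:.2f} V  Ion/Ioff = {on_off_ratio(vdd):8.1f}  "
          f"(1 - exp(-VDS/VT)) = {ds_factor:.3f}")
```

In this simplified model the ratio collapses to $\exp(V_{\rm DD}/nV_T)$: the threshold voltage and the $V_{\rm DS}$ factor cancel, so every reduction of $V_{\rm DD}$ costs an exponential loss in $I_{\rm on}/I_{\rm off}$.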

Making $V_{GS}$ negative is an effective means of reducing $I_{\text{off}}$ and improving the $I_{\rm on}/I_{\rm off}$ ratio, consistent with (4). For example, Fig. 8 plots the $I_D$ of an NMOS with a fixed 0.2 V drain voltage as $V_{GS}$ is swept from $-0.45$ V to 0.65 V. As expected, $I_{\rm D}$ varies exponentially with the gate voltage in the sub-threshold region. Because the HSPICE simulation uses the BSIM4 model (LEVEL = 54), the drain current captures nanoscale effects such as drain-induced barrier lowering (DIBL) and gate-induced drain leakage (GIDL). The leakage current of the NMOS is 0.4 nA at $V_{GS} = 0$ V and is reduced to 30 pA at $V_{GS} = -0.22$ V. However, as the gate voltage becomes more negative, the GIDL current induced by the high electric field between gate and drain becomes the major component of the leakage current [19]. In the case considered here, $I_D$ increases slightly to 70 pA as the gate voltage is shifted to $-0.45~\mathrm{V}$. Having analyzed $I_{\mathrm{off}}$ for a single transistor, we next determine $P_{\rm Leakage}$ for a complete circuit.
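The three operating points quoted for Fig. 8 can be reduced to a few summary numbers. The sketch below computes the leakage reduction factors and a two-point "effective swing" (gate-voltage change per decade of current). This two-point slope is a coarse figure that lumps together sub-threshold conduction, DIBL, and the onset of GIDL; it is not the device's intrinsic sub-threshold swing, which would be extracted from the steep part of the full curve.

```python
import math

# (V_GS in V, I_D in A) quoted for Fig. 8, drain fixed at 0.2 V
points = [(0.0, 0.4e-9), (-0.22, 30e-12), (-0.45, 70e-12)]


def effective_swing_mv_per_dec(p1, p2):
    """Two-point gate-voltage change per decade of drain current (mV/dec)."""
    (v1, i1), (v2, i2) = p1, p2
    return abs(v1 - v2) / abs(math.log10(i1 / i2)) * 1e3


s = effective_swing_mv_per_dec(points[0], points[1])
print(f"effective swing, 0 to -0.22 V: {s:.0f} mV/dec")
print(f"leakage reduction at -0.22 V: {points[0][1] / points[1][1]:.1f}x")
print(f"leakage reduction at -0.45 V: {points[0][1] / points[2][1]:.1f}x (GIDL-limited)")
```

The drop from roughly 13x to roughly 6x reduction between $-0.22$ V and $-0.45$ V is the GIDL floor described above, which is why the gate should not be driven arbitrarily negative.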



Fig. 9. Comparisons of total power at different  $V_{\rm DD}$ .

#### C. Leakage Power Analysis

Although HSPICE can simulate steady-state leakage power, characterizing the leakage power under dynamic operations is difficult. The following approach is taken. The total energy per bit is represented as

$$E_T = E_{SW} + E_{SC} + E_{\text{Leakage}} \tag{5}$$

where $E_T$ represents the total energy per bit, $E_{SW}$ is the switching energy, $E_{SC}$ is the short-circuit energy, and $E_{\text{Leakage}}$ is the leakage energy. A long wire can be regarded as a large capacitive load in the pF range. When a CMOS driver drives a heavy capacitive load, the short-circuit energy can be ignored [20]. $E_{\text{Leakage}}$ is proportional to the bit period T, i.e., $E_{\text{Leakage}} = P_{\text{Leakage}} \cdot T$. Denoting the total energy of the repeaters by $E_{\text{rep}}$ and the activity factor by $\alpha$, (5) can be rewritten as

$$E_T \approx \left(E_{\text{rep}} + \frac{\alpha}{2}C_{\text{wire}}V_{\text{DD}}^2\right) + P_{\text{Leakage}} \cdot T. \tag{6}$$

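Because the first term of (6) does not depend on T, the leakage power can be separated by applying the same data pattern at two different periods $T_1$ and $T_2$. With $P_{T_1}$ and $P_{T_2}$ denoting the corresponding measured average powers, subtracting the per-bit energies $P_{T_1} T_1$ and $P_{T_2} T_2$ cancels the switching term, and $P_{\text{Leakage}}$ is derived as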

$$P_{\text{Leakage}} = \frac{P_{T_1} \cdot T_1 - P_{T_2} \cdot T_2}{T_1 - T_2}. \tag{7}$$
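Equation (7) is a two-point extraction, and it can be sketched in a few lines. The self-check below builds synthetic average powers from the model in (6), $P_T = E_{\rm dyn}/T + P_{\rm Leakage}$, and confirms that (7) recovers the assumed leakage. The energy and leakage values are hypothetical placeholders, not measured data.

```python
def leakage_power(p_t1, t1, p_t2, t2):
    """Eq. (7): leakage power from average powers measured at two bit periods."""
    if t1 == t2:
        raise ValueError("the two periods must differ")
    return (p_t1 * t1 - p_t2 * t2) / (t1 - t2)


# Hypothetical model values (NOT measured data)
E_DYN = 60e-15    # dynamic energy per bit (J), independent of T as in (6)
P_LEAK = 0.14e-6  # leakage power (W)


def avg_power(t):
    """Average power implied by (6): E_T / T = E_dyn / T + P_leak."""
    return E_DYN / t + P_LEAK


t1, t2 = 1e-6, 50e-9
est = leakage_power(avg_power(t1), t1, avg_power(t2), t2)
print(f"recovered P_leak = {est * 1e9:.1f} nW")
```

In practice the two periods should be far apart so that the $P_{\rm Leakage}(T_1 - T_2)$ term dominates the measurement error, and the data pattern and activity factor must be identical in both runs.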

To demonstrate the reduction of leakage current, the proposed design is compared with a conventional inverter and two reported designs [16], [17]. All are designed to drive a 200 fF $C_{\rm L}$ in a 55 nm SPRVT process. For a fair comparison, all bootstrap drivers use $C_B = 50$ fF, and the widths of $M_{\rm PD}$ and $M_{\rm ND}$ are 288 nm and 108 nm, respectively. The conventional inverter is sized 50 times larger than the bootstrapped driver so that its output $t_{\rm rise}$ and $t_{\rm fall}$ are similar to those of the bootstrapped one at $V_{\rm DD} = 0.2$ V. Additionally, for an iso-area comparison, the results for an inverter with size multiplier m = 150 are also included.

Fig. 9 plots the total power as a function of the supply voltage for the five designs. As mentioned, the switching power and leakage power account for almost all of the total power consumption. Fig. 10 plots the leakage power as a function of the supply voltage. The operating frequencies are 0.5 MHz, 3 MHz, 10 MHz, 25 MHz, and 66 MHz for supply voltages from 0.1 V to 0.3 V, respectively.



Fig. 10. Comparisons of leakage power at different  $V_{\mathrm{DD}}$ .



Fig. 11. Comparisons of  $P_{\text{Leakage}}/P_T$  ratio at different  $V_{\text{DD}}$ .

Owing to the negative $V_{GS}$, the leakage power of the proposed bootstrapped repeater is one order of magnitude lower than those of the other designs. Fig. 11 shows the $P_{\rm Leakage}/P_T$ ratio as a function of the supply voltage. The proposed design has the lowest total power and a $P_{\rm Leakage}/P_T$ ratio of 1.5% even at $V_{\rm DD} = 0.1~{\rm V}$, roughly one order of magnitude lower than those of the others.

Fig. 12 shows the total power as a function of the activity factor. When the activity factor is small, the circuit spends more time without transitions, so the leakage power accounts for a larger portion of the total power. Fig. 13 shows the $P_{\rm Leakage}/P_T$ ratio. The proposed design has a $P_{\rm Leakage}/P_T$ ratio of 1% at an activity factor of 0.02, much smaller than those of all other designs.
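The activity-factor trend follows directly from the energy model. The sketch below assumes, as in (12), that all switching energy scales with $\alpha$, so $P_T = \alpha E_{\rm sw}/T + P_{\rm Leakage}$. The switching energy, bit period, and the two leakage levels are hypothetical values chosen only to show the trend, not the simulated designs of Fig. 13.

```python
def leakage_fraction(alpha, e_sw, t_bit, p_leak):
    """P_Leakage / P_T with P_T = alpha * e_sw / t_bit + p_leak (short-circuit term neglected)."""
    p_dyn = alpha * e_sw / t_bit
    return p_leak / (p_dyn + p_leak)


E_SW = 50e-15    # hypothetical switching energy per bit at alpha = 1 (J)
T_BIT = 100e-9   # hypothetical bit period (s), i.e., 10 Mbps
for p_leak in (1e-9, 10e-9):          # hypothetical low- and high-leakage designs (W)
    for alpha in (0.02, 0.1, 0.5):
        frac = leakage_fraction(alpha, E_SW, T_BIT, p_leak)
        print(f"P_leak = {p_leak * 1e9:4.0f} nW  alpha = {alpha:<4}  P_leak/P_T = {frac:6.1%}")
```

Because the dynamic term shrinks linearly with $\alpha$ while leakage does not, a design whose leakage is an order of magnitude lower keeps a much smaller leakage share at low activity.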

#### D. ISI Suppression

In data communication, inter-symbol interference (ISI) critically limits the data rate. The boosting efficiency of a bootstrapped inverter is closely related to ISI, as follows. The driving capability of the output driver is controlled by the voltage $V_{\rm BT}$ at $N_{\rm BT}$, which is either $2\beta_{\rm P}V_{\rm DD}$ or $-\beta_N V_{\rm DD}$. In the design herein, the fed-back $V_{\rm BT} = 2V_{\rm DD}$ ($V_{\rm BT} = -V_{\rm DD}$) eliminates the reverse current through $M_{\rm P1}$ ($M_{N1}$) when $N_{\rm BP}$ ($N_{\rm BN}$) is boosted.

Fig. 12. Comparisons of total power at different activity factors.

Fig. 13. Comparisons of $P_{\text{Leakage}}/P_T$ ratio at different activity factors.

Fig. 14 shows a data string with a consecutive 0s followed by b consecutive 1s. According to the circuit model in Fig. 7, the bootstrapped voltage can be derived as

$$\begin{aligned} V_{\rm BT}(a+b) &\approx \frac{2}{C_{\rm BP} + C_{\rm PTP}} \cdot Q(a+b) \\ &= \frac{2}{C_{\rm BP} + C_{\rm PTP}} \cdot \left( Q(0) - \int_0^{aT} (I_{\rm LMP1} + I_{\rm LMN2})\, dt + \int_{aT}^{(a+b)T} I_{\rm DMP1}\, dt \right). \end{aligned} \tag{8}$$

Here, T is the bit period, Q(0) is the initial charge in $C_{\rm BP}$, and $I_{\rm DMP1}$ is the precharge current through $M_{\rm P1}$. As a result, $\beta_{\rm P}$ depends on the input data pattern. According to (8), minimizing the variation of $\beta_{\rm P}$ requires minimizing the leakage currents $I_{\rm LMP1}$ and $I_{\rm LMN2}$. As stated earlier, the proposed design suppresses the subthreshold leakage currents $I_{\rm LMP1}$ and $I_{\rm LMN2}$, and its precharge current $I_{\rm DMP1}$ is enhanced by the boosted signal. Therefore, the proposed design has better immunity to ISI. Fig. 15 shows the boosted and output waveforms for data with 4, 16, and 64 consecutive 0s followed by a single 1. The ISI is suppressed successfully in all cases.

Fig. 14. Timing diagram for various numbers of consecutive 1s and 0s.

Fig. 15. Waveforms at nodes for various numbers of consecutive 0s.
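The leakage term of (8) gives a quick estimate of how much $V_{\rm BT}$ droops after a run of identical bits. The sketch below follows (8) as written, assumes a constant total leakage $I_{\rm LMP1} + I_{\rm LMN2}$, and ignores the precharge term. For scale only, it borrows the 0.4 nA ($V_{GS} = 0$) and 30 pA ($V_{GS} = -0.22$ V) NMOS currents from the Fig. 8 discussion as "unsuppressed" and "suppressed" leakage; the capacitances (50 fF and 2 fF) and the 25 ns bit period (40 Mbps, Table I at 0.2 V) are likewise illustrative assumptions.

```python
def bt_droop(n_idle_bits, t_bit, i_leak, c_bp, c_pt):
    """Drop in V_BT after n_idle_bits periods, from the leakage integral in (8).

    Assumes constant total leakage (I_LMP1 + I_LMN2 = i_leak) and no precharge.
    """
    lost_charge = i_leak * n_idle_bits * t_bit
    return 2.0 * lost_charge / (c_bp + c_pt)


C_BP, C_PT = 50e-15, 2e-15   # hypothetical capacitances (F)
T_BIT = 25e-9                # 40 Mbps bit period (s)
for run in (4, 16, 64):      # run lengths used in Fig. 15
    unsup = bt_droop(run, T_BIT, 0.4e-9, C_BP, C_PT)
    sup = bt_droop(run, T_BIT, 30e-12, C_BP, C_PT)
    print(f"{run:3d} idle bits: droop {unsup * 1e3:6.2f} mV (0.4 nA) vs {sup * 1e3:5.2f} mV (30 pA)")
```

The droop grows linearly with the run length, so at slower data rates (longer T) the unsuppressed case would lose a much larger fraction of the boosted level; this is the data-dependent $\beta_{\rm P}$ that appears as ISI.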

Fig. 16(a) compares the proposed design with reported repeaters in clock links. The total interconnect length is fixed at 10 mm with minimum wire spacing for coplanar ground shielding, and the 10 mm interconnect is divided into segments whose length is swept along the x-axis. The drivers are designed to yield $t_{\rm rise}$ and $t_{\rm fall}$ equal to 7.5% of a clock period. Fig. 16(b) compares the data links of the designs and shows the data rate as a function of segment length. In data links, $t_{\rm rise}$ and $t_{\rm fall}$ are designed to be 15% of a unit interval (UI). Notably, a data link has only one transition per period, whereas a clock link has two. The jitter tolerance is defined as 0.3 UI peak-to-peak jitter of the output signal. Both Fig. 16(a) and (b) indicate that our design simultaneously achieves the highest data rate and energy efficiency.

#### E. Energy Efficiency

The proposed design has a significant speed improvement and high energy efficiency. Bootstrap techniques improve the driving capability exponentially by boosting the gate voltage of the driver. However, the bootstrap circuit consumes extra power. The average power of the bootstrap circuit can be represented as

$$P_{T,BT} = P_{SW,BT} + P_{SC,BT} + P_{\text{Leak,BT}} \tag{9}$$

where  $P_{T,BT}$ ,  $P_{SW,BT}$ ,  $P_{SC,BT}$ , and  $P_{Leak,BT}$  are the average, switching, short-circuit and leakage power of the bootstrap circuit, respectively. For the proposed bootstrapped circuit in Fig. 2, the switching power is

$$P_{SW,BT} \approx \alpha f (2C_{INV} + 9\beta C_{PT})V_{DD}^2 \tag{10}$$



Fig. 16. Comparison of (a) clock links, and (b) data links as function of segment length.

where $C_{INV}$ is the total input and output capacitance of $INV_{\rm P}$ and $INV_{\rm N}$, and $\beta$ is the boosting efficiency. Assuming $\beta_{\rm P} = \beta_N = 0.9$ and $C_{INV} \approx C_{\rm PT}$, the coefficient in (10) becomes $2 + 9 \times 0.9 = 10.1$, so (10) can be rewritten as

$$P_{SW,BT} \approx 10.1 \cdot \alpha f C_{\rm PT} V_{\rm DD}^2. \tag{11}$$

Adding the switching energy of the wire, the total energy consumption per bit is

$$E_T \approx \alpha \left( 10.1 \cdot C_{\text{PT}} V_{\text{DD}}^2 + \frac{1}{2} C_{\text{wire}} V_{\text{DD}}^2 \right) + P_{\text{Leak,BT}} \cdot T. \tag{12}$$

$P_{\rm Leak,BT}$ is the leakage power of the bootstrap circuit; the leakage energy of the driver can be ignored, as shown in Fig. 10. Fig. 17 compares the proposed bootstrapped repeater and the conventional one, each driving a 0.5 pF capacitive load, as $V_{\rm DD}$ is swept from 0.1 V to 0.3 V. Both repeaters use the same output driver and operate at their highest speed. The data rate of the proposed bootstrapped repeater is 7–13 times higher than that of the conventional one. When the two circuits operate at 0.1–0.2 V, the energy of the proposed design is even lower than that of the conventional one, because the proposed design effectively reduces the leakage power.

Fig. 17. Comparison of driving capability and energy.

Fig. 18. Monte Carlo simulation results of maximum clock rate.
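Equations (10)–(12) can be evaluated directly to see how the bootstrap overhead compares with the wire energy. The sketch below keeps the $C_{INV} \approx C_{\rm PT}$ assumption of (11) and splits $E_T$ into its three terms. The 0.5 pF load is the Fig. 17 value; $C_{\rm PT} = 2$ fF, $\alpha = 0.5$, the 1 nW bootstrap leakage, and the 25 ns bit period are hypothetical inputs, not values from the paper.

```python
def bootstrap_cap_factor(beta, c_inv_over_cpt=1.0):
    """Coefficient of C_PT in (10): 2*C_INV + 9*beta*C_PT, with C_INV expressed relative to C_PT."""
    return 2.0 * c_inv_over_cpt + 9.0 * beta


def energy_per_bit(alpha, vdd, c_pt, c_wire, p_leak_bt, t_bit, beta=0.9):
    """Eq. (12) with the coefficient from (10); returns (total, (bootstrap, wire, leakage))."""
    k = bootstrap_cap_factor(beta)          # 10.1 for beta = 0.9
    e_bt = alpha * k * c_pt * vdd ** 2      # bootstrap switching energy
    e_wire = alpha * 0.5 * c_wire * vdd ** 2
    e_leak = p_leak_bt * t_bit              # bootstrap leakage over one bit period
    return e_bt + e_wire + e_leak, (e_bt, e_wire, e_leak)


total, (e_bt, e_wire, e_leak) = energy_per_bit(
    alpha=0.5, vdd=0.2, c_pt=2e-15, c_wire=0.5e-12, p_leak_bt=1e-9, t_bit=25e-9)
print(f"E_T = {total * 1e15:.2f} fJ  (bootstrap {e_bt * 1e15:.2f}, "
      f"wire {e_wire * 1e15:.2f}, leakage {e_leak * 1e15:.3f})")
```

With these illustrative numbers the 10.1 $C_{\rm PT}$ overhead is small next to the wire term, which is consistent with the claim that the bootstrap circuit's extra switching energy is outweighed by its speed and leakage benefits.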

#### F. Monte Carlo Simulations

Because sub-threshold circuits suffer from severe process variation, Monte Carlo simulations are used to investigate its effects. Four types of repeaters are compared. A 10 mm interconnect is divided into 10 segments. Device mismatch, threshold voltage $V_{\rm th}$, and process corner variations are assumed to follow Gaussian distributions.

The analysis is set up to find the distribution of the maximum clock rate and the variability ratio. The maximum clock rate is the highest clock rate at which each Monte Carlo sample operates, and the variability ratio is defined as $f_{\rm max}/f_{\rm min}$. Under $3\sigma$ variation, we simulated the designs at 20 different clock rates spaced on a power-of-two scale, with 1000 samples at each clock rate. The PDFs of the maximum clock rate are shown in Fig. 18, in which the x-axis is normalized to 10 MHz and scaled by powers of two. Fig. 18 also shows the mean $\mu$, standard deviation $\sigma$, minimum clock rate $f_{\rm min}$, and maximum clock rate $f_{\rm max}$. Our design has the smallest $f_{\rm max}/f_{\rm min}$ ratio, 11.3, compared with 16.9, 16.0, and 34.0 for the inverter, [16], and [17], respectively.
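The post-processing described above (per-sample maximum clock rate on a discrete sweep, then $\mu$, $\sigma$, and $f_{\rm max}/f_{\rm min}$) can be sketched with a toy model. This is not the paper's HSPICE setup: it assumes that speed scales with $I_{\rm on} \propto \exp(-\Delta V_{\rm th}/nV_T)$, draws a Gaussian $\Delta V_{\rm th}$ clipped at $3\sigma$, and uses a hypothetical 20-point sweep spaced by $\sqrt{2}$. The 20 mV sigma, 20 MHz nominal rate, and $n = 1.5$ are all assumptions.

```python
import math
import random
import statistics

N_VT = 1.5 * 0.0259      # ASSUMED n * V_T (V)
SIGMA_VTH = 0.02         # hypothetical V_th standard deviation (V)
F_NOM = 20e6             # hypothetical nominal maximum clock rate (Hz)
RATES = [0.5e6 * 2 ** (k / 2) for k in range(20)]  # hypothetical 20-point log2 sweep


def sample_fmax(rng):
    """One toy Monte Carlo sample: highest sweep rate not exceeding the sample's true speed."""
    d_vth = rng.gauss(0.0, SIGMA_VTH)
    d_vth = max(-3 * SIGMA_VTH, min(3 * SIGMA_VTH, d_vth))  # clip to 3 sigma
    f_true = F_NOM * math.exp(-d_vth / N_VT)  # sub-threshold speed is exponential in V_th
    passing = [f for f in RATES if f <= f_true]
    return max(passing) if passing else None


rng = random.Random(2012)
fmax = [f for f in (sample_fmax(rng) for _ in range(1000)) if f is not None]
mu, sigma = statistics.mean(fmax), statistics.stdev(fmax)
ratio = max(fmax) / min(fmax)
print(f"mu = {mu / 1e6:.1f} MHz  sigma = {sigma / 1e6:.1f} MHz  fmax/fmin = {ratio:.1f}")
```

Because a Gaussian $V_{\rm th}$ maps to a log-normal speed in weak inversion, the PDF of the maximum clock rate is naturally plotted on a power-of-two axis, as in Fig. 18; the reported ratio is also quantized by the sweep grid.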



Fig. 19. Monte Carlo simulation results of leakage power.



Fig. 20. Monte Carlo simulation results of  $P_{\text{Leakage}}/P_T$  ratio.

Fig. 19 shows the Monte Carlo simulation of the leakage power at 1 MHz under a 0.2 V $V_{\rm DD}$. Our design has an average of 13.0 pW and a standard deviation of 7.3 pW, two to three orders of magnitude better than the rest. Fig. 20 shows the $P_{\rm Leakage}/P_T$ ratio at 0.2 V. The average of 0.16% is far better than those of the others, and the $\sigma$ of 0.09% indicates a more concentrated distribution as well.

#### IV. EXPERIMENTAL SETUP AND MEASUREMENTS

#### A. Chip Implementation

A test chip was designed and fabricated in a 55 nm 1P10M SPRVT process. The test chip includes two on-chip buses: one with the proposed bootstrapped repeaters and one with conventional repeaters. Fig. 21 shows the block diagram of both on-chip buses. Four-bit pseudo-random bit sequences (PRBS) are generated and passed through an H-to-L level shifter to adjust the voltage swing to 0.1–0.3 V. An extra input, I/P, allows the test equipment to provide a tunable clock signal or random data. Each on-chip bus has four channels. Each channel is 10 mm long and divided into 10 segments, with a wire spacing of 90 nm for ground shielding in Metal5. In each bootstrapped repeater, two 50 fF MOM capacitors serve as the bootstrap capacitors. Level shifters are used for the I/O. The total area is 821 $\mu$m × 820 $\mu$m and the core area is 637 $\mu$m × 206 $\mu$m. Fig. 22 shows a photograph of the die. The layout area of the proposed bootstrapped repeater is 16.7 $\mu$m × 11.8 $\mu$m.

Fig. 21. Block diagram of test circuits.

Fig. 22. Die photo and cell layout.

Fig. 23. Measured waveforms with core $V_{\rm DD} = 0.11~{\rm V}$, 0.2 V and 0.3 V ($0.11$–$1.2~{\rm V}$ I/O $V_{\rm DD}$). (a) Eye diagrams of clock link. (b) Eye diagrams of PRBS data link. (c) Input and output transient waveforms.

#### B. Measured Waveforms

Fig. 23 shows the measured clock eye diagrams (a), PRBS data eye diagrams (b), and I/O transient waveforms (c) under supply voltages of 0.11 V, 0.2 V, and 0.3 V. Table I presents the timing performance. The random data are a $2^{10}-1$ bit PRBS sequence, and the level shifters contribute an RMS jitter of 174 ps and a peak-to-peak jitter of 982 ps.

TABLE I MEASURED TIMING PERFORMANCE

| Supply voltage     | 0.1V    | 0.11V    | 0.2V    | 0.3V    |
|--------------------|---------|----------|---------|---------|
| Clock rate         | 0.6MHz  | 1MHz     | 22.5MHz | 100MHz  |
| Clock jitter (RMS) | 22.4ns  | 12.0ns   | 0.58ns  | 132ps   |
| Clock jitter (p-p) | 206ns   | 87.3ns   | 5.15ns  | 954ps   |
| Data rate          | 0.8Mbps | 1.25Mbps | 40Mbps  | 100Mbps |
| Data jitter (RMS)  | 81.0ns  | 48.5ns   | 0.95ns  | 0.43ns  |
| Data jitter (p-p)  | 395ns   | 271ns    | 5.72ns  | 2.65ns  |
| Data latency       | 2.93µs  | 1.99µs   | 166ns   | 36.0ns  |

TABLE II CHIP SUMMARY

| Process                               | 55nm 1P10M SPRVT Low-K CMOS     |                                 |                             |
|---------------------------------------|---------------------------------|---------------------------------|-----------------------------|
| $V_{th}$                              | NMOS: 300mV; PMOS: -310mV       |                                 |                             |
| Core supply                           | 0.1–0.3V                        |                                 |                             |
| Supply voltage of level-shift buffers | $V_{IOL}$: 0.1–0.3V             | $V_{IOM}$: 0.2–0.8V             | $V_{IOH}$: 0.4–1.0V         |
| Supply voltage of digital circuit     | 0.4–1.0V                        |                                 |                             |
| Max. clock link                       | 0.6MHz @ 0.1V                   | 22.5MHz @ 0.2V                  | 100MHz @ 0.3V               |
| Max. data link                        | 0.8Mbps @ 0.1V                  | 40Mbps @ 0.2V                   | 100Mbps @ 0.3V              |
| Energy (fJ/bit)                       | 40 (0.1V @ 0.6MHz)              | 59 (0.2V @ 22.5MHz)             | 123 (0.3V @ 100MHz)         |
| Leakage power                         | 0.03µW @ 0.1V                   | 0.14µW @ 0.2V                   | 0.57µW @ 0.3V               |
| Layout area                           | Conventional bus: 637µm × 183µm | Bootstrapped bus: 637µm × 206µm | Whole chip: 821µm × 820µm   |

Fig. 24 shows the simulated and measured power and energy efficiencies of both the bootstrapped and the conventional buses. The FF process corner is used in the post-layout simulation for consistency with the measurements. In general, the measured results agree with the simulated ones, except in the extreme case of $V_{\rm DD} = 0.1~\rm V$. The proposed design can operate at 0.6 MHz (100 MHz) under 0.1 V (0.3 V) with an energy efficiency of 40 fJ/bit (123 fJ/bit). The conventional repeater bus operates at 4 MHz (20 MHz) with 98 fJ/bit (182 fJ/bit) at 0.2 V (0.3 V).

TABLE III COMPARISONS

|                             | TVLSI08 [17]  | TCASI08 [14] | JSSC08 [21]  | JSSC10 [22]  | Conv.        | Proposed    | Proposed    | Proposed    |
|-----------------------------|---------------|--------------|--------------|--------------|--------------|-------------|-------------|-------------|
| Technology                  | 180nm         | 180nm        | 180nm        | 90nm         | 55nm         | 55nm        | 55nm        | 55nm        |
| Topology                    | BT repeaters  | INV repeater | Cap coupling | Cap coupling | INV repeater | BT repeater | BT repeater | BT repeater |
| Single/Differential         | Single        | Diff         | Diff         | Diff         | Single       | Single      | Single      | Single      |
| Supply voltage (V)          | 0.4           | 1.0          | 1.8          | 1.2          | 0.2          | 0.1         | 0.2         | 0.3         |
| Total length (mm)           | 80            | 10           | N/A          | 10           | 10           | 10          | 10          | 10          |
| Width (nm)                  | N/A           | 1000         | 2 x 300      | 2 x 540      | 90           | 90          | 90          | 90          |
| Spacing (nm)                | N/A           | 1500         | 2 x 300      | 2 x 320      | 90           | 90          | 90          | 90          |
| Data rate (Mbps)            | ★9 MHz        | 1500         | 1000         | 2000         | 8            | 0.8         | 40          | 100         |
| *FoM<sub>1</sub> (pJ/bit)   | N/A           | 1.74         | 2.24         | 0.28         | 0.098        | 0.04        | 0.059       | 0.123       |
| *FoM<sub>2</sub> (Mbps/μW·μm) | N/A         | 0.23         | 0.37         | 2.08         | 28.34        | 69.44       | 47.08       | 22.58       |

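★ Only the clock rate is reported.

\*
$$FoM_1 = \frac{\text{Power}\ (\mu\text{W})}{\text{Data rate}\ (\text{Mbps})} = \text{Energy}\ (\text{pJ/bit}); \quad FoM_2 = \frac{\text{Data rate}\ (\text{Mbps})}{\text{Power}\ (\mu\text{W}) \cdot \text{Pitch}\ (\mu\text{m})}$$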



Fig. 24. Comparisons of measured and post-simulation results.

Fig. 24 shows that the proposed design achieves higher speed, a wider operating range, and better energy efficiency.

#### C. Leakage Power Measurement

A distinguishing feature of the proposed design is its reduced leakage current. Fig. 25 plots the measured and simulated leakage power. The measured leakage powers are 30 nW, 140 nW, 575 nW, and 2.75 µW at $V_{\rm DD} = 0.1$, 0.2, 0.3, and 0.4 V, respectively, which are closer to the FF corner than to the TT corner.

Fig. 25. Measured and post-simulation leakage power versus supply voltage.

Table II summarizes the performance of the on-chip bus test chip, and Table III compares the results with previous works, most of which target low-power on-chip data communication in the Gbps range. Two FoMs are used to compare the data links. $\rm FoM_1$ is the energy per bit. The proposed design operates in the subthreshold region at supply voltages of 0.1–0.3 V, and its energy per bit is 40 fJ/bit at 0.1 V, 59 fJ/bit at 0.2 V, and 123 fJ/bit at 0.3 V, indicating that it is more power-efficient than the previous works. $\rm FoM_2$ is the data rate normalized to the power-pitch product; by this measure, the proposed design also achieves a higher normalized data rate than the previous works.
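Because $\rm FoM_1$ already fixes the power for a given data rate, the two figures of merit are not independent. Substituting $\text{Power} = \mathrm{FoM}_1 \cdot \text{Data rate}$ into the definition of $\rm FoM_2$ gives

$$\mathrm{FoM}_2 = \frac{\text{Data rate}}{\text{Power}\cdot\text{Pitch}} = \frac{\text{Data rate}}{\mathrm{FoM}_1\cdot\text{Data rate}\cdot\text{Pitch}} = \frac{1}{\mathrm{FoM}_1\cdot\text{Pitch}}$$

with units of $1/((\mu\text{W/Mbps})\cdot\mu\text{m}) = \text{Mbps}/(\mu\text{W}\cdot\mu\text{m})$. For a fixed pitch, $\rm FoM_2$ therefore ranks designs as the inverse of $\rm FoM_1$, and the data rate affects it only through the energy per bit. As an arithmetic check on Table III (not a value stated in the text), the 0.1 V entries give an effective pitch of $1/(0.04 \times 69.44) \approx 0.36\ \mu\text{m}$, and the other 55 nm entries yield the same value.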

#### V. CONCLUSIONS

This work explores on-chip bus design under a supply voltage of 0.1–0.3 V. The proposed bootstrapped CMOS repeater, inserted to suppress ISI, yields low accumulated ISI jitter and a high clock/data rate even at a subthreshold supply voltage. Additionally, the proposed bootstrapped repeater improves energy efficiency and has a  $P_{\rm Leakage}/P_T$  ratio of 1% even at  $V_{\rm DD}=0.1$  V. This ratio is one order of magnitude lower than those of the other designs. According to Monte Carlo analysis, the proposed design shows small variability under device mismatch and process variation. Measured results verify that the proposed design achieves a 100 MHz (0.6 MHz) clock link and a 100 Mbps (0.8 Mbps) data link at 0.3 V (0.1 V)  $V_{\rm DD}$, consuming only 123 fJ (40 fJ) per bit.

#### REFERENCES

- [1] A. Wang and A. P. Chandrakasan, "A 180-mV subthreshold FFT processor using a minimum energy design methodology," *IEEE J. Solid-State Circuits*, vol. 40, no. 1, pp. 310–319, Jan. 2005.
- [3] M. Seok, S. Hanson, Y. Lin, Z. Foo, D. Kim, Y. Lee, N. Liu, D. Sylvester, and D. Blaauw, "The Phoenix Processor: A 30 pW platform for sensor applications," in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2008, pp. 188–189.
- [4] Y. Pu, J. P. Gyvez, H. Corporaal, and Y. Ha, "An ultra-low-energy multi-standard JPEG co-processor in 65 nm CMOS with sub/near threshold supply voltage," *IEEE J. Solid-State Circuits*, vol. 45, no. 3, pp. 668–680, Mar. 2010.
- [5] N. Verma and A. P. Chandrakasan, "A 256 kb 65 nm 8T subthreshold SRAM employing sense-amplifier redundancy," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 141–149, Jan. 2008.
- [6] M. H. Tu, J. Y. Lin, M. C. Tsai, S. J. Jou, and C. T. Chuang, "Single-ended subthreshold SRAM with Asymmetrical Write/Read-Assist," *IEEE Trans. Circuits Syst. I: Reg. Papers*, vol. 57, no. 12, pp. 3039–3047, Dec. 2010.
- [7] M. F. Chang, S. W. Chang, P. W. Chou, and W. C. Wu, "A 130 mV SRAM with expanded write and read margins for subthreshold applications," *IEEE J. Solid-State Circuits*, vol. 46, no. 2, pp. 520–529, Feb. 2011.
- [8] D. C. Daly and A. P. Chandrakasan, "A 6-bit, 0.2 V to 0.9 V highly digital flash ADC with comparator redundancy," *IEEE J. Solid-State Circuits*, vol. 44, no. 11, pp. 3030–3038, Nov. 2009.
- [9] W. H. Ma, J. C. Kao, V. S. Sathe, and M. C. Papaefthymiou, "187 MHz sub-threshold-supply charge-recovery FIR," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 793–803, Apr. 2010.
- [10] S. Hanson, M. Seok, D. Sylvester, and D. Blaauw, "Nanometer device scaling in sub-threshold logic and SRAM," *IEEE Trans. Electron Devices*, vol. 55, no. 1, pp. 175–185, Jan. 2008.
- [11] D. Bol, R. Ambroise, D. Flandre, and J. D. Legat, "Interests and limitations of technology scaling for subthreshold logic," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 17, no. 10, pp. 1508–1519, Oct. 2009.
- [12] M. Alioto, "Understanding DC behavior of subthreshold CMOS logic through closed-form analysis," *IEEE Trans. Circuits Syst. I: Reg. Papers*, vol. 57, no. 7, pp. 1597–1607, Jul. 2010.
- [13] X. C. Li, J. F. Mao, H. F. Huang, and Y. Liu, "A global interconnect width and spacing optimization for latency, bandwidth, and power dissipation," *IEEE Trans. Electron Devices*, vol. 52, no. 10, pp. 2272–2279, Oct. 2005.
- [14] V. V. Deodhar and J. A. Davis, "Optimal voltage scaling, repeater insertion, and wire sizing for wave-pipelined global interconnects," *IEEE Trans. Circuits Syst. I: Reg. Papers*, vol. 55, no. 4, pp. 1023–1030, May 2008.
- [15] M. Ghoneima, Y. Ismail, M. M. Khellah, J. Tschanz, and V. De, "Serial-link bus: A low-power on-chip bus architecture," *IEEE Trans. Circuits Syst. I: Reg. Papers*, vol. 56, no. 9, pp. 2020–2032, Sep. 2009.
- [16] J. H. Lou and J. B. Kuo, "A 1.5-V full-swing bootstrapped CMOS large capacitive-load driver circuit suitable for low-voltage CMOS VLSI," *IEEE J. Solid-State Circuits*, vol. 32, no. 1, pp. 119–121, Jan. 1997.
- [17] J. Kil, J. Gu, and C. H. Kim, "A high-speed variation-tolerant interconnect technique for sub-threshold circuits using capacitive boosting," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 16, no. 4, pp. 456–465, Apr. 2008.
- [18] Y. Ho, Y.-S. Yang, and C. Su, "A 0.2–0.6 V ring oscillator design using bootstrap technique," in *IEEE Asian Solid-State Circuits Conf. (ASSCC) Dig. Tech. Papers*, Nov. 14–16, 2011, pp. 333–336.
- [19] X. Yuan, J. E. Park, J. Wang, E. Zhao, D. Ahlgren, T. Hook, J. Yuan, V. Chan, H. Shang, C. H. Liang, R. Lindsay, S. Park, and H. Choo, "Gate-induced-drain-leakage current in 45 nm CMOS technology," *IEEE Trans. Device Mater. Reliabil.*, vol. 8, no. 3, pp. 501–508, Sep. 2008.
- [20] H. J. M. Veendrick, "Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits," *IEEE J. Solid-State Circuits*, vol. SC-19, no. 4, pp. 468–473, Aug. 1984.
- [21] R. Ho, T. Ono, R. D. Hopkins, A. Chow, J. Schauer, F. Y. Liu, and R. Drost, "High speed and low energy capacitively driven on-chip wires," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 52–60, Jan. 2008.
- [22] E. Mensink, D. Schinkel, E. A. M. Klumperink, E. van Tuijl, and B. Nauta, "Power efficient gigabit communication over capacitively driven RC-limited on-chip interconnects," *IEEE J. Solid-State Circuits*, vol. 45, no. 2, pp. 447–457, Feb. 2010.



Yingchieh Ho (S'09) received the B.S. and M.S. degrees in electronic engineering from National Central University, Chung-Li, Taiwan, in 1999 and 2001, respectively. He is currently pursuing the Ph.D. degree at National Chiao-Tung University, Hsinchu, Taiwan.

His research interests include ultra-low voltage CMOS circuits and systems design.



Chauchin Su (M'90) received the B.S. and M.S. degrees in electrical engineering from National Chiao-Tung University, Hsinchu, Taiwan, in 1979 and 1981, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Wisconsin at Madison, Madison, in 1990.

He is now a Professor in the Department of Electrical Engineering, National Chiao-Tung University. His research interests are in the area of mixed analog and digital circuit design and testing, especially in low-power biomedical circuits and systems.