# Interconnect Accelerating Techniques for Sub-100-nm Gigascale Systems

Hong-Yi Huang, Member, IEEE, and Shih-Lun Chen, Student Member, IEEE

Abstract—This work describes new circuits, called the capacitor coupling trigger and the capacitor coupling accelerator (CCA), that reduce the long interconnect *RC* delay in sub-100-nm processes. The proposed circuits use capacitors to split the output driving paths, which eliminates the short-circuit current and thus improves the signal transition time. In addition, the capacitor coupling technique is applied to adjust the gate threshold voltage of the proposed circuits and to isolate the input signal from the output driving transistors. Simulations show that the proposed circuits are faster than the prior circuits. Furthermore, the CCA can be applied to bi-directional interfaces, multiport buses, field-programmable gate array interconnections, and complex dynamic logic circuits.

*Index Terms*—Accelerator, capacitor coupling, gigascale systems, interconnect, receivers.

### I. INTRODUCTION

THE total delay time of integrated circuits (ICs) consists of two parts: the logic gate delay and the interconnect delay. The latter is induced by the parasitic resistance and capacitance of the interconnections. The interconnections have become critical to IC performance as ICs are scaled into sub-100-nm dimensions [1]-[10]. As the semiconductor process is scaled down, the horizontal dimension of the interconnections shrinks considerably, whereas the vertical dimension decreases only slightly. As the width and spacing of the interconnections continue to decrease, the aspect ratio of the wires must be increased to prevent the interconnect resistance from increasing greatly. In addition, the reduced interconnect spacing increases the adjacent capacitance and fringing capacitance between interconnections [11], [12]. Although copper (Cu) wires and low dielectric constant (low-k) materials are used in sub-100-nm processes, the parasitic effect of the interconnections is still serious. Moreover, the trend toward system-on-a-chip (SOC) design produces more long wires between devices as chips become larger. Fig. 1(a) and (b) shows the parasitic parameters of a 10 000-$\mu$m wire with 2× and 5× minimum width and space in various semiconductor processes, respectively [6]. As shown in Fig. 1(a) and (b), although the parasitic capacitance decreases slightly, the parasitic resistance

S.-L. Chen is with the Nanoelectronics and Gigascale Systems Laboratory, Institute of Electronics, National Chiao-Tung University, Hsinchu 30050, Taiwan, R.O.C. (e-mail: slchen@ieee.org).

Digital Object Identifier 10.1109/TVLSI.2004.836311




Fig. 1. Parasitic parameters of a 10 000-$\mu$m wire in various semiconductor processes: (a) 2× minimum width and space and (b) 5× minimum width and space.

increases greatly. Hence, the *RC* time constant of a 10 000-$\mu$m wire increases as the semiconductor process is scaled down. Comparing Fig. 1(a) and (b) shows that, although wires with larger width and spacing decrease the *RC* time constant, they may consume more power and occupy more chip area. Thus, wider wires cannot be used in consumer ICs. Accordingly, the interconnect delay dominates the whole chip delay in sub-100-nm processes. Designers must focus on the interconnect delay rather than the logic gate delay in sub-100-nm gigascale systems.

Fig. 2 shows the simple interconnect model, where Rd and Cr denote the driver output resistance and the receiver input capacitance, and Rw and Cw denote the parasitic resistance and capacitance of the interconnection. When Rw is smaller than Rd, increasing the driving current by enlarging the driver reduces the delay time effectively. However, when Rw is larger than Rd, increasing the driving current is no longer effective. Fig. 3 shows the delay time of a 10 000-$\mu$m wire (2× minimum width and space) in a 70-nm process. For small drivers, the input-to-output delay time (t3) decreases as the enlargement of the driver

Manuscript received December 11, 2002; revised January 30, 2004. This work was supported by the National Science Council, Taiwan R.O.C., under Contract NSC 89-2215-E-030-005.

H.-Y. Huang is with the Very Large Scale Integration/Computer-Aided Design Laboratory, Department of Electronic Engineering, Fu-Jen Catholic University, Taipei 24205, Taiwan, R.O.C. (e-mail: hyhuang@mails.fju.edu.tw).



Fig. 2. Simple interconnect model.



Fig. 3. Delay times of a wire driven by drivers of different sizes.

increases the driving current. However, the delay saturates once the driver is more than five times the size of a unit inverter because the resistance of the metal wire dominates the total delay. Therefore, when designing a long wire in sub-100-nm gigascale systems, the receiver design must be considered rather than the driver design.
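The saturation of the delay with driver size can be illustrated with a first-order Elmore estimate of the model in Fig. 2. The sketch below is not the simulation behind Fig. 3. The factors 0.69 and 0.38 are the usual 50% delay coefficients for lumped and distributed *RC* networks, and every numerical value (`rd_unit`, `rw`, `cw`, `cr`) is a hypothetical placeholder rather than a parameter from this paper.

```python
def wire_delay(driver_size, rd_unit=10e3, rw=2e3, cw=2e-12, cr=5e-15):
    """First-order 50% delay of driver -> distributed RC wire -> receiver.

    All default values are assumed for illustration only.
    driver_size scales a unit inverter, so Rd = rd_unit / driver_size.
    """
    rd = rd_unit / driver_size
    driver_term = 0.69 * rd * (cw + cr)            # shrinks as the driver grows
    wire_term = 0.38 * rw * cw + 0.69 * rw * cr    # independent of the driver
    return driver_term + wire_term


if __name__ == "__main__":
    for size in (1, 2, 5, 10, 20, 50):
        print(f"{size:3d}x unit inverter: {wire_delay(size) * 1e9:6.2f} ns")
```

In this model the driver term falls as 1/size while the wire term stays fixed, so the total delay approaches a floor set by Rw and Cw, which is the behavior the text attributes to long resistive wires.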

Several methods of reducing the interconnect delay have been reported. The aluminum used in the interconnections can be replaced with less resistive metals such as copper, and substances with a lower dielectric constant (low-k) can be used as interlevel dielectric materials [13]-[20]. However, the interconnect delay remains serious even with copper and low-k processes. Inserting repeaters (buffers) also reduces the interconnect delay [21]–[24], but repeaters are large devices and may consume much power. Using a special receiver [25]–[27] or inserting an accelerator (booster or transparent repeater) [27]-[29] also reduces the interconnect delay efficiently. The transient sensitive trigger (TST) and the transient sensitive accelerator (TSA) [27] circuits exhibit a threshold voltage drop problem during transition, which increases the delay time [30]. The booster and the transparent repeater (TR) cannot detect the signal transition quickly, which lengthens their delay [28], [29].

This work presents new schemes called the capacitor coupling trigger (CCT) and the capacitor coupling accelerator (CCA) to accelerate the signals on long wires [31]. The proposed CCA can be applied to bi-directional interfaces, multiport buses, field-programmable gate array (FPGA) interconnections, and complex dynamic logic circuits. The rest of this paper is organized as follows. Sections II and III describe the capacitor coupling triggers and the capacitor coupling accelerator, respectively. Section IV presents the simulation results and discussions. Conclusions are finally drawn in Section V.



Fig. 4. Concept behind capacitor coupling technique.





Fig. 6. Waveforms of HCCT.

### **II. CAPACITOR COUPLING TRIGGERS**

Fig. 4 shows the capacitor coupling technique. Through the capacitor coupling effect, a transition of Vin injects charge into signal Vout. The relationship between $\Delta V_{\text{in}}$ and $\Delta V_{\text{out}}$ is as follows:

$$\Delta V_{\text{out}} = \frac{C1}{C1 + C2}\,\Delta V_{\text{in}}.$$

It is seen that $\Delta V_{\text{out}}$ is proportional to $\Delta V_{\text{in}}$ and increases with C1. If C2 is fixed, a target $\Delta V_{\text{out}}$ can be achieved by sizing the capacitor C1. A split-path methodology is used to eliminate the short-circuit current and improve the transient speed [32], [33]. In this work, the capacitor coupling technique is applied to adjust the gate threshold voltage of the proposed circuits and to isolate the input signal from the output driving transistors [34].
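The sizing step implied by the coupling relation can be sketched numerically. The helper names `delta_vout` and `size_c1` and the example values are hypothetical, introduced only for illustration. The second function simply solves the divider relation for C1 and assumes an ideal divider with no other charge loss.

```python
def delta_vout(c1, c2, delta_vin):
    """Voltage step coupled onto the output node: C1 / (C1 + C2) * dVin."""
    return c1 / (c1 + c2) * delta_vin


def size_c1(c2, delta_vin, target_delta_vout):
    """Solve the divider relation for C1 given a target output step.

    A capacitive divider cannot amplify, so the ratio target/dVin
    must lie strictly between 0 and 1.
    """
    ratio = target_delta_vout / delta_vin
    if not 0.0 < ratio < 1.0:
        raise ValueError("target step must lie between 0 and the input step")
    return c2 * ratio / (1.0 - ratio)


if __name__ == "__main__":
    # Assumed example: C2 = 10 fF, a 1.0-V falling input step, 0.6-V target step.
    c1 = size_c1(10e-15, -1.0, -0.6)
    print(f"C1 = {c1 * 1e15:.1f} fF, check dVout = {delta_vout(c1, 10e-15, -1.0):.2f} V")
```

Because the ratio C1/(C1 + C2) approaches 1 only asymptotically, each further increase in C1 yields a smaller gain in the coupled step.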

Fig. 5 shows the high-threshold capacitor coupling trigger (HCCT). The driving signals A and B of the output stage devices P4 and N4 are isolated from the input signal IN by the capacitors C1 and C2, respectively. Fig. 6 shows the waveforms of the HCCT. Initially, signals IN and OUTB are at $V_{DD}$. Signal OUT is at GND. Transistors P1, P2, and P3 are turned off and transistors N1 and N3 are turned on. The gate and the drain of transistor



Fig. 7. Characteristic of hysteresis.

N2 are both at $V_{DD}$. Transistor N2 is not turned on at this moment. At time t1, signal IN starts to make a high-to-low transition. Signal C does not receive signal IN immediately. After period $\Delta$t1, signal IN is pulled down to $V_{DD}$ − Vtn at time t2. Vgs of transistor N2 exceeds Vtn, so transistor N2 is turned on. Then, signal C starts to be pulled down through transistor N2. At the same time, signal A follows the transition of signal C according to the coupling effect. As |Vgs| of transistor P4 exceeds its threshold voltage |Vtp|, transistor P4 is turned on. Output signal OUT is pulled up to $V_{DD}$. The complementary signal OUTB is pulled down to GND. Consequently, transistors N1, N2, and N3 are turned off and transistors P1 and P3 are turned on. Finally, signals A and C are pulled up to $V_{DD}$. Then, transistor P4 is turned off. The initial conditions of the internal nodes A, B, C, and D are set up. The HCCT is ready to receive a low-to-high transition of the signal IN.

At time t4, signal IN starts to make a low-to-high transition. Signal D does not receive signal IN immediately. After period $\Delta$t2, signal IN is pulled up to |Vtp| at time t5. |Vgs| of transistor P2 exceeds |Vtp|, so P2 is turned on. Then, signal D starts to be pulled up through transistor P2. At the same time, signal B follows the transition of signal D according to the coupling effect. When Vgs of transistor N4 exceeds its threshold voltage Vtn, transistor N4 is turned on. The output signal OUT is pulled down to GND. The complementary signal OUTB is pulled up to $V_{DD}$. Consequently, transistors P1, P2, and P3 are turned off and transistors N1 and N3 are turned on. Finally, signals B and D are pulled down to GND. Then, transistor N4 is turned off. The initial conditions of the internal nodes A–D are set up. The HCCT is ready to receive a high-to-low transition of the signal IN.

A transition of signal IN either pulls node A down or pulls node B up. Transistors P4 and N4 do not turn on simultaneously while the incoming signal is received. Therefore, the short-circuit current of the output stage is eliminated. The HCCT exhibits a hysteresis characteristic like that of a Schmitt trigger circuit, as shown in Fig. 7. Sizing the coupling capacitors allows the transition points VL and VH of the hysteresis to be designed so that the HCCT is highly immune to noise. The HCCT can be applied as a wire receiver to reduce the effect of signal noise.
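The noise immunity that hysteresis provides can be illustrated with a purely behavioral model. The sketch below is a generic inverting receiver with two thresholds, not a transistor-level model of the HCCT. The function name, the threshold values, and the mapping of `v_l` and `v_h` onto VL and VH of Fig. 7 are assumptions made for illustration.

```python
def hysteresis_receiver(samples, v_l, v_h, out_high=False):
    """Generic inverting receiver with hysteresis thresholds v_l < v_h.

    The output goes low once the input rises above v_h, goes high once the
    input falls below v_l, and holds its previous value in between.
    """
    if not v_l < v_h:
        raise ValueError("v_l must be below v_h")
    outputs = []
    for v in samples:
        if v > v_h:
            out_high = False   # input clearly high -> inverted output low
        elif v < v_l:
            out_high = True    # input clearly low -> inverted output high
        outputs.append(out_high)
    return outputs


if __name__ == "__main__":
    # Assumed waveform: a falling edge with a small noise bump around mid-rail.
    wave = [1.0, 0.8, 0.55, 0.45, 0.55, 0.2, 0.4, 0.6, 0.9]
    print(hysteresis_receiver(wave, v_l=0.3, v_h=0.7))
```

With a single threshold at 0.5 the bump from 0.45 back to 0.55 would toggle the output twice. With the two thresholds above, the output changes only after the input has clearly crossed `v_l` or `v_h`.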

Fig. 8 presents the low-threshold capacitor coupling trigger (LCCT), which is used to detect a small transition of the input signal and causes a fast transition of the output signal. The



Fig. 8. LCCT.



Fig. 9. Waveforms of the LCCT.

HCCT and the inverter shown in Fig. 8 are used to generate delayed control signals CO1 and CO2. Fig. 9 shows the waveforms of the LCCT. Initially, the signals IN and CO1 are at $V_{DD}$. The signals OUT and CO2 are at GND. Transistors N1, N3, N6, P2, and P5 are turned on. The other devices are turned off. At time t1, signal IN starts to make a high-to-low transition. Signal C immediately follows the transition of signal IN through transistor N2. At the same time, signal A follows the transition of signal C according to the coupling effect. As |Vgs| of transistor P4 exceeds its threshold voltage |Vtp|, the output node OUT is pulled up through transistor P4. Then transistor N5 is turned on. Transistors N5 and N6 provide a pull-down path for signal A. The positive feedback provided by transistors N5 and N6 to node A generates a larger |Vgs| for transistor P4. Therefore, the driving current of transistor P4 is increased. The output signal OUT is pulled up to $V_{DD}$. The internal control signals CO1 and CO2 are delayed by an HCCT and an inverter. Transistors N1, N3, N6, P2, and P5 are then turned off. Finally, nodes A and C are pulled up to $V_{DD}$. Then transistor P4 is turned off. The initial conditions of the internal nodes for the subsequent transition are set up. The LCCT is ready to receive a low-to-high transition of signal IN.

At time t3, signal IN starts to make a low-to-high transition. Signal D immediately follows the transition of signal IN through the transistor N2. At the same time, signal B follows the transition of signal D according to the coupling effect. When Vgs of



Fig. 10. Capacitor coupling accelerator.



Fig. 11. Operation of the CCA (steady state "high").

transistor N4 exceeds its threshold voltage Vtn, the output node OUT is pulled down through transistor N4. Then transistor P5 is turned on. Transistors P5 and P6 provide a pull-up path for signal B. The positive feedback provided by transistors P5 and P6 to node B generates a larger Vgs for transistor N4. Therefore, the driving current of transistor N4 is increased. The output signal OUT is pulled down to GND. The control signals CO1 and CO2 have a delay induced by the HCCT and the inverter. Transistors P1, P3, P6, N2, and N5 are then turned off. Finally, nodes B and D are pulled down to GND. Then, transistor N4 is turned off. The initial conditions of the internal nodes are set up for the subsequent transition. The LCCT is ready to receive a high-to-low transition of signal IN.

### **III. CAPACITOR COUPLING ACCELERATOR**

Fig. 10 shows the capacitor coupling accelerator (CCA). The CCA is connected to the midpoint of a long wire. Fig. 11 depicts the steady-state operation of the CCA while the interconnect signal is at $V_{DD}$. Transistors N1, N3, N6, P2, and P7 are turned on. Fig. 12 presents the operation of the CCA while the interconnect signal makes a high-to-low transition. Node C receives the transition of the interconnect signal through transistor P2. At the same time, node A gets a voltage transition through coupling capacitor C1. Then, node E is pulled up to $V_{DD}$ through transistor P4. Transistor N8 is turned on, which generates a pull-down path for the interconnect signal. Therefore,



Fig. 12. Operation of the CCA (high-to-low transition).



Fig. 13. Operation of the CCA (steady state "low").

the high-to-low transition of the interconnect signal is accelerated.

Fig. 13 presents the steady-state operation of the CCA while the interconnect signal is at GND. Transistors P1, P3, P6, N2, and N7 are turned on. Fig. 14 presents the operation of the CCA while the interconnect signal makes a low-to-high transition. Node D receives the transition of the interconnect signal through transistor N2. At the same time, node B gets a voltage transition through the coupling capacitor C2. Then, node F is pulled down to GND through transistor N4. Transistor P8 is turned on, which generates a pull-up path for the interconnect signal. Therefore, the low-to-high transition of the interconnect signal is accelerated. When the CCA detects the transition of the interconnect signal, it supplies a large driving current to the midpoint of the long wire. The CCA thus improves the rise/fall time of a long wire and reduces the interconnect delay. The CCA operates without an external control signal and consumes no dc power. Moreover, it can be applied to the transmission of bi-directional signals, as shown in Fig. 15.
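The benefit of detecting a transition early and then adding drive current can be illustrated with a deliberately crude lumped model. This is not the CCA circuit. It assumes a single *RC* node charging toward $V_{DD}$, an assist path of resistance `r_assist` that switches on once the node has moved by `detect_fraction` of the swing, and no turn-off behavior. All names and values are hypothetical.

```python
import math


def time_to_half_swing(r, c, detect_fraction=None, r_assist=None):
    """Time for a lumped RC node charging toward VDD to reach VDD/2.

    Without an assist path: t = R*C*ln(2).
    With an assist path (assumed model): the node charges through R alone
    until it has moved by detect_fraction of the swing, then through
    R in parallel with r_assist until it reaches half swing.
    """
    tau = r * c
    if detect_fraction is None:
        return tau * math.log(2.0)
    if r_assist is None or r_assist <= 0.0:
        raise ValueError("r_assist must be a positive resistance")
    if not 0.0 < detect_fraction < 0.5:
        raise ValueError("detect_fraction must lie between 0 and 0.5")
    t_detect = tau * math.log(1.0 / (1.0 - detect_fraction))
    r_eff = r * r_assist / (r + r_assist)
    t_assisted = r_eff * c * math.log((1.0 - detect_fraction) / 0.5)
    return t_detect + t_assisted


if __name__ == "__main__":
    plain = time_to_half_swing(2e3, 2e-12)
    helped = time_to_half_swing(2e3, 2e-12, detect_fraction=0.2, r_assist=1e3)
    print(f"without assist: {plain * 1e9:.2f} ns, with assist: {helped * 1e9:.2f} ns")
```

In this toy model the time saved grows when the detection point is lowered or the assist resistance is reduced, which mirrors the qualitative argument for a low detection threshold and a strong output stage.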

### **IV. SIMULATION RESULTS AND DISCUSSION**

This section reports and discusses the simulation results. The wire parameters in a 90-nm process are given in [20] and [35]. In addition, the threshold voltages and gate-oxide



Fig. 14. Operation of the CCA (low-to-high transition).



Fig. 15. Use of CCA in the bidirectional interface circuit.



Fig. 16. Delay of the HCCT for various coupling capacitor sizes.

thickness of a 0.18-$\mu$m device model are modified to represent a 90-nm process for the simulation of these circuits.

#### A. Delay Versus Coupling Capacitor of the HCCT

Fig. 16 shows the simulation result of the HCCT. The delay time is measured from the input node to the output node of the HCCT. Increasing the size of the coupling capacitor is seen to reduce the delay time. Therefore, the gate threshold of the HCCT can be adjusted by sizing the coupling capacitor.

#### B. Improvement of Rise/Fall Time

In the simulation of driving a long interconnection, two static inverters are used as the driver and the receiver, respectively. The CCA, TSA [27], booster [28], and TR [29] are each connected in turn to the midpoint of the long wire. The wire is modeled by a distributed *RC* $\pi$-model [25]. The signal waveforms at the midpoint of a 10 000-$\mu$m wire are shown in Fig. 17, where "Normal" denotes the wire without any accelerating technique. The CCA, TSA, booster, and TR all improve the rise/fall time of the interconnect signal, so the delay time of the long wire is also reduced. However, the TSA suffers from a Vt-drop [30], and the booster and the transparent repeater cannot detect the wire signal quickly. In addition, the output driving capability of the TSA and the booster is reduced by their stacked devices. Hence, the CCA accelerates the signal better than the TSA, the booster, and the transparent repeater.

#### C. Position of the CCA

Fig. 18 presents the simulation results for various positions of the CCA. Lx denotes the distance from the driver to the CCA and L denotes the length of the interconnection. The best position of the CCA is at 0.3L. If the CCA is placed closer to the driver than 0.3L, it detects the transition very quickly; however, the large parasitic *RC* of the long wire remaining after the CCA still dominates the total delay. If the CCA is placed near the receiver, it cannot detect the signal transition quickly.

#### D. Delay Comparison of the Accelerators

Fig. 19 compares the delays obtained using the CCA, the TSA, the booster, the TR, and the repeater. The delay is measured from the input node of the driver to the output node of the receiver. The CCA achieves the best performance of all.

Fig. 20 compares the delay time of a 10 000-$\mu$m wire obtained using various numbers of CCAs and repeaters. The positions of the CCAs and repeaters are optimized in this simulation. The



Fig. 17. Waveforms at the middle point of a long wire.



Fig. 18. Simulation results for various positions of CCA.



Fig. 19. Comparisons of the delay time using CCA, TSA, booster, TR, and repeater.

delay obtained using a CCA is less than that obtained using a repeater. As the number of CCAs increases, the delay may saturate. Once the number of repeaters exceeds the optimum value, the delay increases.
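The existence of an optimum repeater count follows from a standard first-order argument, sketched below. This is a textbook Elmore-style estimate, not the simulation behind Fig. 20, and it does not model the CCA. The function name and all component values (`rw`, `cw`, `r_buf`, `c_buf`) are hypothetical.

```python
def repeated_wire_delay(n_segments, rw=2e3, cw=2e-12, r_buf=1e3, c_buf=100e-15):
    """First-order 50% delay of a wire split into n equal buffered segments.

    Each segment is driven by an identical buffer (output resistance r_buf)
    and loaded by the input capacitance c_buf of the next stage.
    n_segments = 1 is the unrepeated wire; all defaults are assumed values.
    """
    r_seg = rw / n_segments
    c_seg = cw / n_segments
    per_stage = (0.69 * r_buf * (c_seg + c_buf)
                 + 0.38 * r_seg * c_seg
                 + 0.69 * r_seg * c_buf)
    return n_segments * per_stage


if __name__ == "__main__":
    for n in range(1, 13):
        print(f"{n:2d} segments: {repeated_wire_delay(n) * 1e9:.3f} ns")
    best = min(range(1, 13), key=repeated_wire_delay)
    print("minimum delay at", best, "segments")
```

The wire term falls as 1/n while the buffer term grows linearly with n, so the total has a single minimum. Beyond it, each added repeater costs more in its own delay than it saves on the wire.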



Fig. 20. Delay time using different numbers of CCAs and repeaters.

#### E. Delay Comparison of the LCCT and the TST

Fig. 21 compares the delay time using different receivers. The delay time is measured from the driver input to the receiver output. The LCCT is faster than the TST [26] because the LCCT includes a positive feedback path to the input node of the output driving stage. Furthermore, the LCCT has no Vt-drop problem.

#### F. Noise Consideration

Because of the parasitic line-to-line capacitance and mutual inductance, the main noise source for interconnections in digital circuits is the crosstalk effect. Fig. 22 shows the cross section view of three parallel signal lines. In Fig. 22, Cc is the coupling capacitance between the signal line and an adjacent line, Cgs is the parasitic capacitance of the signal line to the ground plate, and Cga is the parasitic capacitance of the adjacent line to the ground plate. If the signals of the three lines shown in Fig. 22 make transitions at the same time, the coupling capacitance Cc can be modeled as part of the capacitance of the signal line to the ground plate. As the



Fig. 21. Comparisons of delay time using different receivers.



Fig. 22. Cross section view of three parallel-signal lines.


TABLE I SIMULATION RESULTS OF CROSSTALK EFFECT

| With CCA (ns) | Without CCA (ns) |
|---------------|------------------|
| 2.99          | 3.65             |
| 3.49          | 4.70             |
| 4.12          | 5.75             |
| -14.3%~18.0%  | -22.3%~18.3%     |

transitions of the adjacent line and that of the signal line are in phase, the parasitic capacitance of the signal line to ground is decreased. Otherwise, as the transitions of the adjacent lines and that of the signal line are out-of-phase, the parasitic capacitance of the signal line to ground is increased. Table I shows the simulated crosstalk effect induced by neighboring signals. The three parallel wires are modeled by a distributed *RC* model, and the wire parameters are again taken from [20] and [35]. In this simulation, the signals of the three wires make transitions at the same time. When the transitions of the adjacent lines and that of the signal line are in phase, the delay is decreased. When the transitions of the aggressors and that of the signal line are out-of-phase, the parasitic capacitance of the signal line to ground is increased, and so is the delay. As shown in Table I, when the adjacent wires are out-of-phase with the signal line, inserting a CCA into a 10 000-$\mu$m wire reduces the delay by 1.63 ns. Moreover, the delay variation with the CCA is smaller than that without it.
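The change in effective ground capacitance with neighbor activity is commonly captured with a switching (Miller) factor on each coupling capacitance. The sketch below uses the idealized factors 0, 1, and 2 for in-phase, quiet, and out-of-phase neighbors, which assume equal and simultaneous transitions. These factors, the function name, and the capacitance values are assumptions for illustration and are not taken from the simulation behind Table I.

```python
# Idealized switching factors applied to each coupling capacitance Cc (assumed).
MILLER_FACTOR = {"in_phase": 0.0, "quiet": 1.0, "out_of_phase": 2.0}


def effective_ground_capacitance(c_gs, c_c, neighbor_activity):
    """Effective capacitance of the signal line to ground.

    c_gs: capacitance of the signal line to the ground plate.
    c_c:  coupling capacitance to one adjacent line.
    neighbor_activity: one entry per adjacent line, each a key of MILLER_FACTOR.
    """
    return c_gs + sum(MILLER_FACTOR[a] * c_c for a in neighbor_activity)


if __name__ == "__main__":
    c_gs, c_c = 100e-15, 50e-15   # assumed values
    for case in ("in_phase", "quiet", "out_of_phase"):
        c_eff = effective_ground_capacitance(c_gs, c_c, [case, case])
        print(f"{case:13s}: {c_eff * 1e15:.0f} fF")
```

Since the wire delay scales with the capacitance being charged, the spread between the in-phase and out-of-phase cases translates directly into the delay variation discussed above.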

If the signal line is in a steady state while the signals of the adjacent lines make transitions, the CCA and the receiver may detect erroneous signals on the wire. The spacing between wires can be increased to decrease the parasitic coupling capacitance, and shielding lines can be added to decrease the noise from other signal lines. However, increasing the line-to-line spacing and adding shielding lines may occupy more chip area. Alternatively, a keeper can be added on the signal line to increase the noise immunity, and the HCCT or a Schmitt trigger circuit can be used as the receiver to reject noise.

TABLE II SIMULATION RESULTS OF POWER CONSUMPTION

|             | Driver S (µW) | Driver D (µW) | Receiver S (µW) | Receiver D (µW) | CCA S (µW) | CCA D (µW) | Total S (µW) | Total D (µW) |
|-------------|---------------|---------------|-----------------|-----------------|------------|------------|--------------|--------------|
| With CCA    | 1.72          | 40.64         | 5.10            | 4.84            | 0          | 31.29      | 6.82         | 76.77        |
| Without CCA | 0.71          | 68.35         | 7.45            | 4.83            | 0          | 0          | 8.15         | 73.18        |

S: Short-circuit power, D: Dynamic power.

#### G. Power Consumption

In CMOS digital circuits, static power dissipation due to leakage current can be ignored. Hence, dynamic power consumption and short-circuit power consumption account for the power consumed in digital circuits. Dynamic power is due to the charging and discharging of capacitors, so it is proportional to the load capacitance if the operating frequency and the power supply voltage are fixed. Thus, the dynamic power consumption is essentially the same whether or not the wire has a CCA. Short-circuit power consumption, which is due to direct-path currents while a logic gate makes a transition, is a strong function of the ratio between the input and output signal slopes. When the output load capacitance is large or the rise/fall time of the input signal is slow, the short-circuit power consumption becomes large. Table II shows the simulation results for power consumption. In this simulation, the enhanced technique of [36] is used to simulate the power consumption of the circuits at 50 MHz. The wire length is 10 000 $\mu$m and the wire parameters are again taken from [20] and [35]. The total power consumption includes the power dissipated in the wire, the driver, the receiver, and the CCA. As shown in Table II, because the rise/fall time of the wire signal with the CCA is faster than that without it, the driver of the wire with the CCA consumes more short-circuit power than the driver of the wire without it. Conversely, the receiver of the wire without the CCA consumes more short-circuit power than the receiver of the wire with it. Although the short-circuit power of the driver and the receiver changes with the presence of the CCA, the dynamic power dominates the total power because of the long wire. As shown in Table II, the total power with the CCA is slightly larger than that without it because extra dynamic power is consumed in the CCA. In conclusion, using the CCA decreases the wire delay at the cost of only a slight increase in power.
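The size of the power overhead can be checked directly from the totals in Table II. The short script below only adds the short-circuit and dynamic totals reported there. The combined figures and the percentage are derived arithmetic, not values stated in the paper, and the dictionary layout is an illustrative choice.

```python
# Total short-circuit (S) and dynamic (D) power from Table II, in microwatts.
TABLE_II_TOTALS = {
    "with_cca": {"short_circuit": 6.82, "dynamic": 76.77},
    "without_cca": {"short_circuit": 8.15, "dynamic": 73.18},
}


def total_power(row):
    """Sum of the short-circuit and dynamic power of one table row (µW)."""
    return row["short_circuit"] + row["dynamic"]


def overhead_percent(totals=TABLE_II_TOTALS):
    """Relative increase in total power when the CCA is inserted."""
    with_cca = total_power(totals["with_cca"])
    without_cca = total_power(totals["without_cca"])
    return 100.0 * (with_cca - without_cca) / without_cca


if __name__ == "__main__":
    for name, row in TABLE_II_TOTALS.items():
        print(f"{name:12s}: {total_power(row):.2f} uW")
    print(f"overhead with CCA: {overhead_percent():.1f} %")
```

The lower short-circuit power with the CCA partly offsets the extra dynamic power it consumes, so the net increase stays within a few percent of the total.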

### V. CONCLUSION

This paper presents the capacitor coupling trigger and the capacitor coupling accelerator circuits to reduce the long interconnection *RC* delay in sub-100-nm processes. Using the CCA, the delay time of a 10 000-$\mu$m wire can be made 22.6%–33.6% lower than that obtained using the TSA, the booster, and the TR. The new techniques require no external synchronization signals. The LCCT is very efficient as a long-wire receiver in sub-100-nm processes. The HCCT can function as a Schmitt trigger circuit with an adjustable gate threshold. Considering the tradeoff between noise and speed, the CCA can be used to reduce the long wire delay and the HCCT can be used as the receiver to increase the noise immunity. In addition to accelerating long wire signals, the CCA can be applied to the transmission of bi-directional signals, multiport buses, FPGA interconnections, and complex dynamic logic circuits. In conclusion, the proposed circuits are suitable for gigascale systems in sub-100-nm processes.

### REFERENCES

- [1] K. C. Saraswat and F. Mohammadi, "Effect of scaling of interconnections on the time delay of VLSI circuits," *IEEE J. Solid-State Circuits*, vol. SC-17, pp. 275–280, Apr. 1982.
- [2] M. Bohr, "Interconnect scaling—The real limiter to high performance ULSI," in *IEDM Tech. Dig.*, 1995, pp. 241–244.
- [3] S. Asai and Y. Wada, "Technology challenges for integration near and below 0.1 µm," Proc. IEEE, vol. 85, pp. 505–519, Apr. 1997.
- [4] Y. Taur et al., "CMOS scaling into nanometer regime," Proc. IEEE, vol. 85, pp. 486–504, Apr. 1997.
- [5] J. A. Davis *et al.*, "Interconnect limits on gigascale integration (GSI) in the 21st century," *Proc. IEEE*, vol. 89, pp. 305–324, Mar. 2001.
- [6] J. Cong, "An interconnect centric design flow for nanometer technologies," *Proc. IEEE*, vol. 89, pp. 505–527, Apr. 2001.
- [7] R. Ho, K. W. Mai, and M. A. Horowitz, "The future of wires," *Proc. IEEE*, vol. 89, pp. 490–504, Apr. 2001.
- [8] A. Deutsch *et al.*, "On-chip wiring design challenges for gigahertz operation," *Proc. IEEE*, vol. 89, pp. 529–555, Apr. 2001.
- [9] R. H. Havemann and J. Hutchby, "High-performance interconnects: An integration overview," *Proc. IEEE*, vol. 89, pp. 586–600, May 2001.
- [10] K. Banerjee, S. J. Souri, P. Kapur, and K. C. Saraswat, "3D-ICs: A novel chip design for improving deep-submicrometer interconnect performance and system-on-chip integration," *Proc. IEEE*, vol. 89, pp. 602–633, May 2001.
- [11] Y. Ushiku, H. Ono, and N. Shigyo, "A three-level wiring capacitance analysis for VLSI's using a three-dimensional simulator," in *IEDM Tech. Dig.*, 1988, pp. 340–343.
- [12] W. H. Kao, C.-Y. Lo, M. Basel, and R. Singh, "Parasitic extraction: Current state of the art and future trends," *Proc. IEEE*, vol. 89, pp. 729–739, May 2001.
- [13] B. Zhao, "Advanced interconnect systems for ULSI technology," in *IEEE Int. Conf. Solid-State Integrated Circuits*, 1998, pp. 43–46.
- [14] D. Sylvester, C. Hu, O. S. Nakagawa, and S.-Y. Oh, "Interconnect scaling: Signal integrity and performance in future high-speed CMOS designs," in *Symp. VLSI Tech.*, 1998, pp. 42–43.
- [15] A. K. Stamper, T. L. McDevitt, and S. L. Luce, "Sub-0.25-micron interconnection scaling: Damascene copper versus subtractive aluminum," in *IEEE Advanced Semiconductor Manufacturing Conf. Workshop*, 1998, pp. 337–346.
- [16] M. Naik *et al.*, "Process integration of double level copper-low k (k = 2.8) interconnect," in *IEEE Int. Conf. Interconnect Tech.*, 1999, pp. 181–183.
- [17] S. M. Jang *et al.*, "Advance Cu/low-k (k = 2.2) multilevel interconnect for 0.10/0.07-$\mu$m generation," in *Symp. VLSI Technology*, 2002, pp. 18–19.
- [18] M. Miyamoto, T. Takeda, and T. Furusawa, "High-speed and low-power interconnect technology for sub-quarter-micron ASIC's," *IEEE Trans. Electron Devices*, vol. 44, pp. 250–256, Feb. 1997.
- [19] T. I. Bao *et al.*, "90 nm generation Cu/CVD low-k (k < 2.5) interconnect technology," in *IEDM Tech. Dig.*, 2002, pp. 583–586.
- [20] C. C. Wu *et al.*, "A 90-nm CMOS device technology with high-speed, general-purpose, and low-leakage transistors for system on chip applications," in *IEDM Tech. Dig.*, 2002, pp. 65–68.
- [21] S. Dhar and M. A. Franklin, "Optimum buffer circuits for driving long uniform lines," *IEEE J. Solid-State Circuits*, vol. 26, pp. 32–40, Jan. 1991.
- [22] Y. Jiang, S. S. Sapatnekar, and E. Sarto, "Interleaving buffer insertion and transistor sizing into a single optimization," *IEEE Trans. VLSI Syst.*, vol. 6, pp. 625–633, Dec. 1999.
- [23] V. Adler and E. G. Friedman, "Uniform repeater insertion in RC trees," IEEE Trans. Circuits Syst. I, vol. 47, pp. 1515–1523, Oct. 2000.

- [24] Y. I. Ismail, E. G. Friedman, and J. L. Neves, "Repeater insertion in tree structured inductive interconnect," *IEEE Trans. Circuits Syst. II*, vol. 48, pp. 471–481, May 2001.
- [25] H. Zhang, V. George, and J. M. Rabaey, "Low-swing on-chip signaling techniques: Effectiveness and robustness," *IEEE Trans. VLSI Syst.*, vol. 8, pp. 264–272, Jun. 2000.
- [26] R. Colshan and B. Jaroun, "A novel reduced swing CMOS BUS interface circuit for high speed low power VLSI systems," in *IEEE Int. Symp. Circuits Syst.*, vol. 4, 1994, pp. 351–354.
- [27] T. Iima, M. Mizuno, T. Horiuchi, and M. Yamashina, "Capacitor coupling immune, transient sensitive accelerator for resistive interconnect signals of subquarter micron ULSI," *IEEE J. Solid-State Circuits*, vol. 31, pp. 531–536, Apr. 1996.
- [28] R. M. Secareanu and E. G. Friedman, "Transparent repeaters," in ACM Great Lakes Symp. VLSI, 2000, pp. 63–66.
- [29] A. Nalamalpu, S. Srinivasan, and W. P. Burleson, "Boosters for driving long on-chip interconnects—Design issues, interconnect synthesis, and comparison with repeaters," *IEEE Trans. Computer-Aided Design*, vol. 21, pp. 50–62, Jan. 2002.
- [30] H.-Y. Huang and S.-L. Chen, "High-speed receivers for on-chip interconnections in deep-submicron process," in *IEEE Int. Conf. Electronics, Circuits, and Syst.*, 2002, pp. 769–772.
- [31] —, "Threshold triggers and accelerator for deep submicron interconnection," in *IEEE Asia–Pacific Conf. Circuits and Syst.*, 2002, pp. 143–146.
- [32] C. Yoo, "A CMOS buffer without short-circuit power consumption," *IEEE Trans. Circuits Syst. II*, vol. 47, pp. 935–937, Sept. 2000.
- [33] K.-H. Cheng, W.-B. Yang, and H.-Y. Huang, "The charge-transfer feedback-controlled split-path CMOS buffer," *IEEE Trans. Circuits Syst. II*, vol. 46, pp. 346–348, Mar. 1999.
- [34] T. Kawahara, M. Horiguchi, J. Etoh, T. Sekiguchi, K. Kimura, and M. Aoki, "Low-power chip interconnection by dynamic termination," *IEEE J. Solid-State Circuits*, vol. 30, pp. 1030–1034, Sept. 1995.
- [35] International Technology Roadmap for Semiconductors, 2001.
- [36] G. Y. Yacoub and W. H. Ku, "An enhanced technique for simulating short-circuit power dissipation," *IEEE J. Solid-State Circuits*, vol. 24, pp. 847–848, June 1989.



Hong-Yi Huang (S'89–M'94) was born in Taiwan, R.O.C., in 1965. He received the B.S. degree in nuclear engineering from the National Tsing-Hua University, Hsinchu, Taiwan, R.O.C., in 1987, and the M.S. and Ph.D. degrees from the Institute of Electronics, National Chiao-Tung University, Hsinchu, Taiwan, R.O.C., in 1989 and 1994, respectively.

From 1994 to 1999, he was with the Industrial Technology Research Institute (ITRI), where he worked in mixed-signal integrated circuits design. In 1999, he joined the Department of Electronic Engineering, Fu-Jen Catholic University, Taipei, Taiwan, R.O.C. His research interests include high-speed and low-power low-voltage integrated circuits and systems, embedded memory, analog, and communication integrated circuits. He holds over 15 patents on VLSI circuits.



Shih-Lun Chen (S'02) was born in Taipei, Taiwan, R.O.C., in 1976. He received the B.S. and M.S. degrees from the Department of Electronic Engineering, Fu-Jen Catholic University, Taiwan, R.O.C., in 1999 and 2001, respectively, and is currently working toward the Ph.D. degree at the Institute of Electronics, National Chiao-Tung University, Hsinchu, Taiwan, R.O.C.

His current research interests include the high-speed and mixed-voltage I/O interface, analog, and digital circuits.