*MUX1*, whose select signal does not contribute to the critical path. This is because the adder in the BZ-FAD, in contrast to the conventional architecture, can begin its work independently of multiplexer *MUX1*. In fact, while the adder is performing the addition, there is enough time for the ring counter and multiplexer $M1$ to deliver the value of the next hot bit. All delays in this path are shorter than the adder delay and, hence, do not increase the delay of BZ-FAD. The synthesis timing reports estimate the critical path delays of the BZ-FAD and the conventional multipliers to be 9.76 and 9.74 ns, respectively, which agrees with the above discussion. The slight difference between the reported delays arises because the input clock signals of the *Feeder* and *Bypass* registers pass through a NAND and a NOR gate in the BZ-FAD architecture. For SPST (synthesized at the gate level), the critical path delay is about 25 ns.

## V. SUMMARY AND CONCLUSION

In this paper, a low-power architecture for shift-and-add multipliers was proposed. The modifications to the conventional architecture included removing the shift of the $B$ register (in $A \times B$), feeding $A$ directly to the adder, bypassing the adder whenever possible, using a ring counter instead of a binary counter, and removing the partial product shift. The results showed an average power reduction of 30% for the proposed architecture. We also compared our multiplier with SPST [6], a low-power tree-based array multiplier. The comparison showed that the power saving of BZ-FAD was only 6% lower than that of SPST, whereas the SPST area was five times larger than that of BZ-FAD. Thus, for applications where small area and high speed are important concerns, BZ-FAD is an excellent choice.
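To make the bypass idea concrete, the following is a minimal behavioral sketch in Python, not the BZ-FAD register-transfer design: it multiplies two unsigned operands by scanning the bits of $B$ with a one-hot pointer (standing in for the ring counter) and uses the adder only when the selected bit is 1. The function name `shift_add_multiply` and its returned addition count are illustrative assumptions, and the sketch does not model the removal of the $B$ and partial-product shifts.

```python
from typing import Tuple


def shift_add_multiply(a: int, b: int, width: int) -> Tuple[int, int]:
    """Unsigned shift-and-add multiplication with adder bypass on zero bits.

    Returns (product, additions); `additions` counts the iterations in which
    the adder is actually used. Iterations whose selected bit of b is 0
    bypass the adder and leave the accumulator unchanged.
    """
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must be unsigned width-bit integers")
    product, additions = 0, 0
    hot = 1  # one-hot pointer standing in for the ring counter
    for i in range(width):
        if b & hot:              # selected bit is 1: the adder is used
            product += a << i    # add A weighted by 2**i
            additions += 1
        hot <<= 1                # advance the hot bit
    return product, additions
```

For example, `shift_add_multiply(13, 10, 4)` returns `(130, 2)`: $B = 1010_2$ contains two 1s, so the adder is bypassed in two of the four iterations.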

Additionally, we proposed a low-power architecture for ring counters based on partitioning the counter into blocks of flip-flops that are clock gated by a special clock gating structure whose complexity is independent of the block size. The simulation results showed that, compared with the conventional architecture, the proposed architecture reduced the power consumption by more than 75% for the 64-bit counter.
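The benefit of block-level clock gating can be estimated, under a simple assumed gating policy, by counting how many flip-flops receive a clock edge per cycle. The sketch below assumes that a block is clocked only while it holds the hot bit, plus the next block on the cycle in which the hot bit crosses the block boundary. The paper's actual gating structure is not modeled, and the count ignores the gating logic itself, so it is an activity count rather than a power estimate; the function name is hypothetical.

```python
from typing import Tuple


def clocked_flipflops_per_cycle(n_bits: int, block_size: int) -> Tuple[float, float]:
    """Average flip-flops clocked per cycle: (ungated ring counter, block-gated).

    Assumed gating policy (illustrative only): a block is clocked while it
    holds the hot bit; on the cycle in which the hot bit leaves a block's last
    flip-flop, the next block (wrapping around) is clocked as well.
    """
    if block_size <= 0 or n_bits % block_size != 0:
        raise ValueError("n_bits must be a positive multiple of block_size")
    n_blocks = n_bits // block_size
    gated_total = 0
    for hot in range(n_bits):                 # one cycle per hot-bit position
        blk = hot // block_size
        active = {blk}
        if hot % block_size == block_size - 1:
            active.add((blk + 1) % n_blocks)  # next block receives the bit
        gated_total += len(active) * block_size
    return float(n_bits), gated_total / n_bits
```

For a 64-bit counter split into 8-bit blocks, `clocked_flipflops_per_cycle(64, 8)` returns `(64.0, 9.0)`: under this policy, on average 9 of the 64 flip-flops are clocked per cycle instead of all 64.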

#### **REFERENCES**

- [1] A. Chandrakasan and R. Brodersen, "Low-power CMOS digital design," *IEEE J. Solid-State Circuits*, vol. 27, no. 4, pp. 473–484, Apr. 1992.
- [2] N.-Y. Shen and O. T.-C. Chen, "Low-power multipliers by minimizing switching activities of partial products," in *Proc. IEEE Int. Symp. Circuits Syst.*, May 2002, vol. 4, pp. 93–96.
- [3] O. T. Chen, S. Wang, and Y.-W. Wu, "Minimization of switching activities of partial products for designing low-power multipliers," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 11, no. 3, pp. 418–433, Jun. 2003.
- [4] B. Parhami, *Computer Arithmetic: Algorithms and Hardware Designs*, 1st ed. Oxford, U.K.: Oxford Univ. Press, 2000.
- [5] V. P. Nelson, H. T. Nagle, B. D. Carroll, and J. D. Irwin, *Digital Logic Circuit Analysis & Design*. Englewood Cliffs, NJ: Prentice-Hall, 1996.
- [6] K.-H. Chen and Y.-S. Chu, "A low-power multiplier with the spurious power suppression technique," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 15, no. 7, pp. 846–850, Jul. 2007.
- [7] K. H. Chen, K. C. Chao, J. I. Guo, J. S. Wang, and Y. S. Chu, "An efficient spurious power suppression technique (SPST) and its applications on MPEG-4 AVC/H.264 transform coding design," in *Proc. IEEE Int. Symp. Low Power Electron. Des.*, 2005, pp. 155–160.
- [8] K. H. Chen, Y. M. Chen, and Y. S. Chu, "A versatile multimedia functional unit design using the spurious power suppression technique," in *Proc. IEEE Asian Solid-State Circuits Conf.*, 2006, pp. 111–114.
- [9] J. S. Wang, C. N. Kuo, and T. H. Yang, "Low-power fixed-width array multipliers," in *Proc. IEEE Symp. Low Power Electron. Des.*, 2004, pp. 307–312.
- [10] O. Chen, S. Wang, and Y. W. Wu, "Minimization of switching activities of partial products for designing low-power multipliers," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 11, no. 3, pp. 418–433, Jun. 2003.
- [11] Z. Huang and M. D. Ercegovac, "High-performance low-power left-to-right array multiplier design," *IEEE Trans. Comput.*, vol. 54, no. 2, pp. 272–283, Mar. 2005.
- [12] H. Lee, "A power-aware scalable pipelined Booth multiplier," in *Proc. IEEE Int. SOC Conf.*, 2004, pp. 123–126.

## **A Unified Detection Scheme for Crosstalk Effects in Interconnection Bus**

Katherine Shu-Min Li, Chung-Len Lee, Chauchin Su, and Jwu E Chen

*Abstract—***For very deep sub-micrometer VLSI, crosstalk has become an important issue affecting the performance and signal integrity of circuits. Two crosstalk fault effects, namely, glitch and crosstalk-induced delay, in the system-on-chip (SOC) interconnect bus are analyzed, and a unified scheme to detect them is proposed and demonstrated in this paper. The crosstalk-induced delay is found to be the superposition of the induced glitch and the applied signal on the victim line, and this effect is the more important one for circuit performance. A pulse detector with an adjustable detection threshold is proposed to detect glitches and, consequently, the induced delay. Several issues affecting the yield of the proposed testing scheme are discussed, and Monte Carlo simulations are conducted to show the feasibility of the scheme.**

*Index Terms—***Crosstalk, delay, glitch, interconnect.**

## I. INTRODUCTION

Crosstalk-induced noise has been attracting increasing attention as the spacing between lines decreases and coupling capacitances increase in the interconnection structures of deep sub-micrometer VLSI. This noise affects circuit performance in two ways: glitches, which may be captured by end latches and produce erroneous logic values, and increased signal propagation delay [1], [2]. These *crosstalk* issues should be considered for performance during the design stage, and they should be tested for after manufacturing.

Crosstalk noise on interconnect lines has been modeled and analyzed in many previous works. For example, it was studied by treating the interconnect lines as coupled lossy transmission lines [3], [4] and analyzed numerically [5], [6]. Simulation models for interconnect lines were also reported [7]. Simplified lumped resistance–capacitance (*RC*) models for studying crosstalk noise were proposed and analyzed by many authors [1], [2], [8], [9]. Other crosstalk issues, including fault avoidance, test generation, test set evaluation, and built-in

Manuscript received March 02, 2007; revised July 09, 2007. First published January 06, 2009; current version published January 14, 2009.

K. S.-M. Li is with the Department of Computer Science and Engineering, National Sun Yat-Sen University, Kaohsiung 80424, Taiwan (e-mail: smli@cse.nsysu.edu.tw; gis88554@cis.nctu.edu.tw).

C.-L. Lee and C. Su are with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan (e-mail: cllee@mail.nctu.edu.tw; ccsu@cn.nctu.edu.tw).

J.-E Chen is with the Department of Electrical Engineering, National Central University, Chungli, Taiwan (e-mail: jechen@ee.ncu.edu.tw).

Digital Object Identifier 10.1109/TVLSI.2008.2004548



Fig. 1. Circuit model for the crosstalk analysis.

self-test (BIST), have been reported in [10]–[14]. In addition, various on-chip circuits for measuring crosstalk effects have been proposed [15], [16], including circuits that measure glitch amplitude [15] or characterize crosstalk effects [16]. A comprehensive review of signal integrity problems, including crosstalk modeling, analysis, and measurement, was given in [17].

In general, the two crosstalk effects, i.e., induced glitches and induced delay, require different fault detection techniques. A glitch can easily be detected by a pulse detector. Delay fault detection, on the other hand, requires more complicated at-speed test schemes, or special measurement circuits if timing measurement is involved [17], which complicates the testing process and increases the testing cost. In particular, on-chip measurement of the delay effect is rather difficult and needs special circuitry [17]. In this paper, we investigate the relationship between the crosstalk glitch and the induced delay and identify their respective importance to circuit performance. We then show that crosstalk-induced delay can be tested by detecting glitches whose amplitudes exceed a predetermined detection threshold. The proposed method is implemented with a very simple circuit, and crosstalk-induced delay faults are detected in the same way as glitches. This result greatly facilitates on-chip detection of crosstalk delay.

## II. CIRCUIT MODEL FOR CROSSTALK

Coupling crosstalk effects are mainly caused by parasitic capacitances between neighboring interconnection lines. Fig. 1 shows the *bus circuit model*, in which adjacent wires run in parallel. The middle wire is the *victim* net, while the other two wires are the *aggressor* nets. The wires are driven by inverters serving as buffers, with characteristic "ON" resistances $R_{\text{on}-a1}$, $R_{\text{on}-a2}$, and $R_{\text{on}-v}$, respectively. Each wire has a unit-length wire resistance ($R_{a1}$, $R_{a2}$, and $R_v$) and capacitance ($C_{a1}$, $C_{a2}$, and $C_v$). A coupling capacitance ($C_{c1}$, $C_{c2}$) between two adjacent wires causes the crosstalk effects. The output of each wire is connected to an end inverter, which also serves as a buffer. These end inverters provide the loads $C_{L1}$, $C_{L2}$, and $C_{Lv}$ of the wires, respectively. The wires are assumed to be homogeneous, i.e., $R_{a1} = R_{a2} = R_v = R_w$, $C_{a1} = C_{a2} = C_v = C_w$, $C_{L1} = C_{L2} = C_{Lv} = C_L$, and $C_{c1} = C_{c2}$. All aggressor signals are assumed to be synchronized in order to maximize the crosstalk effects; skew effects among the aggressor lines are therefore not considered here (they are examined in Section V-A). In the following analysis, we simulate the circuit model of Fig. 1 with the TSMC 0.18-$\mu$m 1.8/3.3-V 1P6M technology. The Metal 5 layer is used in the simulation; the wire length is set to 1 mm, and the wire width and spacing are both set to 0.28 $\mu$m, the minimum values allowed in the technology. A distributed *RC* model is used to characterize the circuit behavior. The power supply ($V_{\text{DD}}$) is set to 1.8 V, and all transistors are minimum sized.



Fig. 2. Superposition of crosstalk-induced delay.


## III. RELATIONSHIP BETWEEN GLITCHES AND DELAYS


Fig. 2 shows the SPICE simulation results for the circuit in Fig. 1 with the applied inputs. The curves are taken at point $B$ (i.e., the far end of the victim line). Two sets of simulations have been carried out: one with the nominal coupling capacitance (upper figure) and the other with a coupling capacitance four times larger (lower figure). The curve marked with "$+$" is the response of the victim line when only a rising transition is applied to its input, and the curve marked with "$\times$" is the response of the same victim line with only the crosstalk glitch effect considered, i.e., with a static "0" applied to the victim line. The curve marked with "$*$" is the signal on the victim line with both excitations considered, i.e., the victim line is affected by the coupling effect and by its own rising input transition. In each case, the curve ($*$) is exactly the superposition of the first ($+$) and second ($\times$) curves. This means that the delayed victim waveform is in fact the crosstalk-induced glitch superimposed on the original response of the victim line.

This result is expected, since the interconnect part of the circuit is linear and the superposition principle therefore holds. The larger the glitch, the larger the induced delay. There is a monotonic relationship between the induced glitch and the induced delay, as illustrated in Fig. 3: as the amplitude of the induced glitch increases, the induced delay also increases. The induced delay is calculated as follows. The propagation delay of a signal is defined as the time between the input signal reaching 50% of its final value and the output signal reaching 50% of its final value. The induced delay is thus the difference between the propagation delays with and without excessive coupling effects.
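The superposition argument and the 50% delay definition can be checked numerically on a linear stand-in for Fig. 1. The sketch below models the three wires as coupled *RC* ladders (lumped segments approximating the distributed line), replaces each inverter driver by an ideal source behind a fixed resistance, and integrates with backward Euler. All element values (`r_on`, `r_w`, `c_w`, `c_c`, `c_l`), the ramp timing, and the function names are illustrative assumptions rather than the TSMC 0.18-$\mu$m extraction used in the paper, so only the qualitative behavior is meaningful.

```python
import numpy as np

VDD = 1.8  # supply voltage used in the paper's simulations (V)


def coupled_rc_bus(n_seg, r_on, r_w, c_w, c_c, c_l):
    """G and C matrices of a three-wire coupled RC ladder (Fig. 1 topology).

    Wire 0 = aggressor 1, wire 1 = victim, wire 2 = aggressor 2; node k of
    wire w has index w * n_seg + k. Each driver is modeled as an ideal voltage
    source behind a linear resistance r_on (an assumption: the paper uses
    CMOS inverters). Returns (G, C, g_src), g_src being the source conductance.
    """
    n = 3 * n_seg
    G, C = np.zeros((n, n)), np.zeros((n, n))

    def stamp(m, i, j, val):  # two-terminal element between nodes i and j
        m[i, i] += val
        m[j, j] += val
        m[i, j] -= val
        m[j, i] -= val

    g_src = 1.0 / (r_on + r_w / n_seg)  # driver plus first wire segment
    for w in range(3):
        base = w * n_seg
        G[base, base] += g_src
        for k in range(n_seg):
            C[base + k, base + k] += c_w / n_seg        # wire-to-ground cap
            if k + 1 < n_seg:
                stamp(G, base + k, base + k + 1, n_seg / r_w)
        C[base + n_seg - 1, base + n_seg - 1] += c_l    # receiver load
    for k in range(n_seg):                              # coupling capacitors
        stamp(C, k, n_seg + k, c_c / n_seg)
        stamp(C, 2 * n_seg + k, n_seg + k, c_c / n_seg)
    return G, C, g_src


def victim_far_end(sources, n_seg=10, r_on=1e3, r_w=300.0, c_w=100e-15,
                   c_c=40e-15, c_l=5e-15, t_end=4e-9, dt=1e-12):
    """Backward-Euler transient; sources[w](t) drives wire w.

    Starts from the DC state set by the sources at t = 0 and returns (t, v),
    v being the voltage at the victim's far end (point B).
    """
    G, C, g_src = coupled_rc_bus(n_seg, r_on, r_w, c_w, c_c, c_l)
    steps = int(round(t_end / dt))
    t = np.arange(steps + 1) * dt
    m_inv = np.linalg.inv(C / dt + G)
    x = np.repeat([float(s(0.0)) for s in sources], n_seg)  # DC initial state
    v = np.empty(steps + 1)
    v[0] = x[2 * n_seg - 1]
    b = np.zeros(3 * n_seg)
    for i in range(1, steps + 1):
        for w in range(3):
            b[w * n_seg] = g_src * sources[w](t[i])
        x = m_inv @ (C @ x / dt + b)
        v[i] = x[2 * n_seg - 1]
    return t, v


def ramp(v0, v1, t0=50e-12, tr=50e-12):
    """Linear transition from v0 to v1 starting at t0 with duration tr."""
    return lambda t: v0 + (v1 - v0) * min(max((t - t0) / tr, 0.0), 1.0)


def t_cross(t, v, level):
    """First time the waveform v rises through `level` (linear interpolation)."""
    i = int(np.argmax(v >= level))
    if v[i] < level:
        raise ValueError("waveform never reaches level")
    if i == 0:
        return float(t[0])
    return float(t[i - 1] + (level - v[i - 1]) * (t[i] - t[i - 1]) / (v[i] - v[i - 1]))


def crosstalk_case(c_c):
    """Return (superposition error, glitch peak, induced delay) for coupling c_c."""
    hi, lo = (lambda t: VDD), (lambda t: 0.0)
    rise, fall = ramp(0.0, VDD), ramp(VDD, 0.0)
    t, v_plus = victim_far_end([hi, rise, hi], c_c=c_c)      # "+" curve
    _, v_glitch = victim_far_end([fall, lo, fall], c_c=c_c)  # "x" curve
    _, v_both = victim_far_end([fall, rise, fall], c_c=c_c)  # "*" curve
    err = float(np.max(np.abs(v_both - (v_plus + v_glitch))))
    peak = float(np.max(np.abs(v_glitch)))
    # Both cases share the same victim input ramp, so its 50% point cancels.
    induced = t_cross(t, v_both, 0.5 * VDD) - t_cross(t, v_plus, 0.5 * VDD)
    return err, peak, induced
```

Because the model is linear, `crosstalk_case` should report a superposition error at floating-point level, and quadrupling `c_c` (mirroring the two panels of Fig. 2) increases both the glitch peak and the induced delay in this model.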

The monotonic relationship suggests a unified crosstalk detection approach: if the induced glitch is detected, the induced delay is detected as well. For example, in Fig. 3, if the specified delay of the interconnection line is 200 ps, a delay violation can be detected by checking for induced glitches with amplitudes higher than 0.91 V. We can devise a detector to detect



Fig. 3. Monotonic relationships between the peak of the induced glitch and the induced delay.



Fig. 4. PD with an adjustable threshold by W/L ratio of INV1.

the induced glitches; once these glitches are detected, the corresponding induced delays are detected as well. Thus, a delay fault can be detected without actually measuring the delay.
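Given a characterized glitch-peak/induced-delay curve such as Fig. 3, choosing the detector threshold for a delay specification amounts to reading the curve backwards. The minimal sketch below does this by linear interpolation between samples; the function name and the sample points in the usage example are hypothetical placeholders, not data read from Fig. 3.

```python
import numpy as np


def glitch_threshold_for_delay(delay_spec, peaks, delays):
    """Glitch-amplitude threshold corresponding to a delay specification.

    `peaks` (V) and `delays` (any consistent unit) sample a monotonically
    increasing glitch-peak/induced-delay curve; the inverse is read off by
    linear interpolation between the samples.
    """
    peaks = np.asarray(peaks, dtype=float)
    delays = np.asarray(delays, dtype=float)
    if np.any(np.diff(peaks) <= 0) or np.any(np.diff(delays) <= 0):
        raise ValueError("the characterized curve must be strictly increasing")
    if not delays[0] <= delay_spec <= delays[-1]:
        raise ValueError("delay_spec lies outside the characterized range")
    return float(np.interp(delay_spec, delays, peaks))
```

With hypothetical samples `peaks = [0.2, 0.5, 0.8, 1.1]` (V) and `delays = [30, 90, 160, 250]` (ps), `glitch_threshold_for_delay(125, peaks, delays)` returns approximately 0.65 V. Refusing to extrapolate keeps the threshold within the characterized range.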

## IV. PULSE DETECTOR WITH DETECTION THRESHOLD

A glitch is a pulse, and it can be detected by a pulse detector. To detect a crosstalk-induced glitch with a given peak value (amplitude), the detection threshold of the detector must be adjustable. A pulse detector (PD) is shown in Fig. 4. The detector consists of two major components: an inverter ($\text{INV}_1$) that determines the detection threshold voltage ($V_{\text{det}}$), and a pseudo-static latch (the remaining part) that is locked to "1" once a pulse is detected. The latch is enabled by $V_{\text{DD}}$, which is controlled by a pMOS pass transistor. The two pass transistors are controlled by $\text{INV}_1$ and serve as a multiplexer. The detection threshold $V_{\text{det}}$ of the pulse detector is determined by the W/L values of the pull-down nMOS and/or pull-up pMOS transistors in $\text{INV}_1$. The latch can be cleared by a reset input. Whenever a glitch with an amplitude higher than the detection threshold $V_{\text{det}}$ arrives, the pMOS pass transistor is turned on and the latch output $q$ is set to "1", indicating that a glitch has been detected.
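At the behavioral level, the PD acts as a threshold comparator feeding a set/reset latch. The sketch below abstracts away the transistors entirely (the class, method, and helper names are hypothetical): `q` is set the first time a sample exceeds `v_det` and stays set until `reset()` is called, mirroring the latch output that stays locked at "1".

```python
from typing import Iterable


class PulseDetector:
    """Behavioral model of a threshold pulse detector with a latching output."""

    def __init__(self, v_det: float) -> None:
        self.v_det = v_det  # detection threshold (V), set by INV1 sizing
        self.q = 0          # latch output; 1 means "glitch detected"

    def reset(self) -> None:
        """Clear the latch (the PD's reset input)."""
        self.q = 0

    def sample(self, v_in: float) -> int:
        """Apply one input sample; the latch locks to 1 once v_in > v_det."""
        if v_in > self.v_det:
            self.q = 1
        return self.q


def detects(waveform: Iterable[float], v_det: float) -> bool:
    """Return True if any sample of the waveform trips a freshly reset detector."""
    pd = PulseDetector(v_det)
    for v in waveform:
        pd.sample(v)
    return pd.q == 1
```

For instance, with `v_det = 0.91`, the sample sequence `[0.0, 0.6, 0.92, 0.2]` is detected, while `[0.0, 0.6, 0.9, 0.2]` is not; the output remains "1" after the glitch has passed, which is what lets a slow tester read the result later.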

The transient behavior of the PD, obtained by SPICE simulation, is illustrated in Fig. 5. The upper figure shows the input waveform (*in*), which is a glitch generated by 1-mm Metal 5 wires with quadruple coupling as in Fig. 2, and the output of $\text{INV}_1$ (*in*\_). Signal *in*\_ is the inverse of *in*, and since the swing of *in*\_ is large enough to temporarily turn off the nMOS pass transistor while turning on the pMOS pass transistor, the latch output ($q$) changes state successfully, as shown in the lower part of the figure.

Fig. 6 shows the simulated threshold of the detected pulse amplitude ($V_{\text{det}}$) versus $(W/L)_{\text{pMOS}}/(W/L)_{\text{nMOS}}$ of $\text{INV}_1$ in the pulse detector. Both pass transistors are minimum sized, with $W/L = 0.22\,\mu\text{m}/0.18\,\mu\text{m}$. In the gates other than $\text{INV}_1$, all nMOS transistors are minimum sized with $L_n = 0.18\,\mu\text{m}$ and








Fig. 5. Simulated transient waveform of the PD.



Fig. 6. Simulated detection threshold voltage ($V_{\text{det}}$) versus the W/L ratio of the PD.

$W_n = 0.22\,\mu\text{m}$, while in the logic gates the channel widths of the pMOS transistors are increased to equalize the rising and falling transition times: $L_p = 0.18\,\mu\text{m}$ and $W_p = 0.50\,\mu\text{m}$. In $\text{INV}_1$, $L_n = L_p = 0.18\,\mu\text{m}$, while $W_n$ and $W_p$ are varied to create different detection thresholds ($V_{\text{det}}$). The figure shows that the detection threshold of the pulse detector can be adjusted by changing the W/L ratio of the pull-up (pMOS) and pull-down (nMOS) transistors in $\text{INV}_1$.
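For intuition about the trend in Fig. 6, the textbook long-channel (square-law) expression for an inverter's switching threshold can be used, under the simplifying assumption that $V_{\text{det}}$ tracks the switching threshold $V_M$ of $\text{INV}_1$. Equating the saturation currents of the two transistors gives $V_M = (V_{Tn} + r(V_{\text{DD}} - |V_{Tp}|))/(1 + r)$ with $r = \sqrt{(\mu_p/\mu_n)\,(W/L)_{\text{pMOS}}/(W/L)_{\text{nMOS}}}$. The threshold voltages and mobility ratio below are assumed placeholder values, and 0.18-$\mu$m devices are velocity saturated, so this is only a qualitative sketch, not a substitute for the SPICE data.

```python
import math


def inverter_switching_threshold(ratio: float, vdd: float = 1.8, vtn: float = 0.45,
                                 vtp: float = 0.45, mobility_ratio: float = 0.3) -> float:
    """First-order switching threshold V_M of a CMOS inverter (long-channel model).

    ratio = (W/L)_pMOS / (W/L)_nMOS and mobility_ratio = mu_p / mu_n; vtp is |V_Tp|.
    Obtained from k_n (V_M - V_Tn)**2 = k_p (V_DD - V_M - |V_Tp|)**2.
    The default threshold and mobility values are placeholders, not TSMC data.
    """
    r = math.sqrt(mobility_ratio * ratio)  # sqrt(k_p / k_n)
    return (vtn + r * (vdd - vtp)) / (1.0 + r)
```

Since $dV_M/dr = (V_{\text{DD}} - |V_{Tp}| - V_{Tn})/(1 + r)^2 > 0$, widening the pMOS relative to the nMOS raises the threshold, which is the direction of adjustment described above.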

In general, many simulations are needed to determine the proper W/L ratios of the MOS transistors for a given detection threshold $V_{\text{det}}$. To facilitate the design, a procedure similar to binary search can be used to determine the ratio $r = (W/L)_{\text{pMOS}}/(W/L)_{\text{nMOS}}$ for the target $V_{\text{det}}$, except that the geometric mean is used instead of the arithmetic mean in each iteration, so that the search interval is halved on a logarithmic scale of $r$.
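The geometric-mean search can be sketched as follows. `vdet_of_ratio` stands for one simulation of the PD at a given ratio $r$ and is assumed to increase monotonically with $r$ (as in Fig. 6); the function name, the bracket `[r_lo, r_hi]`, and the stopping tolerance are illustrative choices.

```python
import math
from typing import Callable


def find_ratio(v_target: float, vdet_of_ratio: Callable[[float], float],
               r_lo: float = 0.01, r_hi: float = 100.0, tol: float = 1e-4,
               max_iter: int = 100) -> float:
    """Bisection on r = (W/L)_pMOS / (W/L)_nMOS using the geometric mean.

    vdet_of_ratio(r) must increase with r (in practice, one SPICE run per call).
    Splitting at sqrt(r_lo * r_hi) halves the interval in log(r).
    """
    if not vdet_of_ratio(r_lo) <= v_target <= vdet_of_ratio(r_hi):
        raise ValueError("target threshold is not bracketed by [r_lo, r_hi]")
    for _ in range(max_iter):
        r_mid = math.sqrt(r_lo * r_hi)
        if vdet_of_ratio(r_mid) < v_target:
            r_lo = r_mid   # threshold too low: need a larger pMOS/nMOS ratio
        else:
            r_hi = r_mid
        if r_hi / r_lo < 1.0 + tol:
            break
    return math.sqrt(r_lo * r_hi)
```

With a toy monotonic model such as `lambda r: 1.8 * r / (1 + r)` in place of SPICE, `find_ratio(0.9, ...)` converges to $r \approx 1$ in under 20 iterations. The geometric mean suits this problem because plausible ratios span several decades, where an arithmetic midpoint would spend most iterations at the large end.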

The PD itself is a load on the victim line and hence may affect the pulse detection result. A PD with a very large or very small



Fig. 7. Delay (picoseconds) versus skews: (a) 3-D waveforms and (b) 3-D mesh.

detection threshold requires a very wide nMOS or pMOS transistor, which presents a large load capacitance. Such detectors should therefore be avoided.

## V. SOME CONSIDERATIONS FOR UNIFIED DETECTION SCHEME

### *A. Effect of Skew Between Aggressor and Victim Signals*

In Sections II and III, it was assumed that the excitations from the aggressor lines are synchronous and that the falling excitation signals coincide with the rising edge of the victim line signal, i.e., there is no skew between the signals on the lines. In practice, the signals on the aggressor lines and on the victim line may not coincide. One might think that the induced delay on a victim line is maximized when the falling edges of the aggressor excitation signals and the rising edge of the victim line signal occur at the same time. However, according to [18], the maximum delay actually occurs when there is a small skew between these two signals. To investigate how the relationship between the peak amplitude of the induced glitch and the induced delay is affected by the skew between the aggressor lines and the victim line, we simulated the three-wire system in Fig. 1 with different skews between the aggressor lines and the victim line. The Metal 5 layer is used in the simulation, the wire length is set to 1 mm, and the wire

Fig. 8. Glitch (Volts) versus skews: (a) 3-D waveforms and (b) 3-D mesh.

width and spacing are both set to 0.28 $\mu$m. For the output buffers, all nMOS transistors are minimum sized ($W/L = 0.22/0.18$, in $\mu$m), while the pMOS transistors are sized 0.5/0.18. To maximize the induced glitch, the transistor widths in the input buffers are set to $10\times$ the minimum values. The results are given in Figs. 7 and 8.

Let the skew between aggressor 1 (aggressor 2) and the victim be denoted as $SK_1$ ($SK_2$). In the simulation, we set $-80\ \text{ps} \le SK_1, SK_2 \le +80\ \text{ps}$. For skews outside this range, the coupling effects on signal delay are negligible. Fig. 7 plots the crosstalk-induced delay on the victim net, shown on the $z$-axis in picoseconds, versus the aggressor signal skews ($SK_1$ and $SK_2$) in two forms. Fig. 7(a) shows 17 curves for $SK_1 = -80, -70, \ldots, +80$ ps. In each curve, $SK_1$ is fixed while $SK_2$ is varied from $-80$ ps to $+80$ ps. It can be seen that, in each curve, the induced delay increases as $SK_2$ moves from $-80$ ps toward $+80$ ps, and the maximum value occurs when $SK_2$ is slightly larger than 0, as reported in [17]. Fig. 7(b) is a 3-D mesh plot of the same data. Fig. 8 plots the glitch amplitude (in volts) versus the skews. It can be seen that, in both figures, the plot is symmetric with respect to $SK_1 = SK_2$. Fig. 8 also shows that the maximum glitch amplitude always occurs when the aggressor signals are synchronized (i.e., $SK_1 = SK_2$). The reason is that the glitch results from the coupling effect alone, and the coupling effect is maximized when both aggressor signals arrive at



Fig. 9. Induced delay versus the peak of the induced glitch for three different cases: 1) $SK_1 = SK_2 = 0$; 2) $SK_1 = SK_2 = -80$ ps; and 3) $SK_1 = 0$, $SK_2 = 45$ ps.

the same time. On the other hand, the maximum delay is affected by the coupling effect as well as the skews. For a fixed  $SK_1$ , the maximum delay occurs when  $SK_2$  is positive. In our experiments, for a fixed  $SK_1$ , the delay is maximized when  $SK_2$  is about  $+30$  to  $+50$  ps.

The results are summarized in Fig. 9 for three different cases: 1) $SK_1 = SK_2 = 0$; 2) $SK_1 = SK_2 = -80$ ps (i.e., both aggressor signals arrive 80 ps before the victim line signal); and 3) $SK_1 = 0$, $SK_2 = 45$ ps (i.e., aggressor 1 has zero skew and aggressor 2 has a 45-ps skew with respect to the victim line). Case 1) is the case in which the aggressor signals coincide with the victim signal. Case 2) produces the maximum glitch but a small delay, while Case 3) produces the maximum delay but a small glitch. Thus, the relationship between the amplitude of the induced glitch and the induced delay spreads into a band instead of a single curve.

The spread of the curves in Fig. 9 depends on the skews of the aggressor lines. However, for an interconnect bus system, it can safely be assumed that these skews are not large, since the bits of a bus usually switch their states at the same time. Also, in test mode, it is relatively easy to make every bit of a bus switch at roughly the same time.

In the previous simulations, the aggressor drivers were assumed to have the same driving strength. When the driver sizes differ, a compact model [19] can be used for delay analysis.

### *B. Process Variation Effect on Pulse Detector*

In Section IV, the detection threshold of the proposed pulse detector is adjusted by the W/L ratio of the input inverter. These W/L values, along with other circuit parameters, can be affected by manufacturing process variations. Hence, the relationship between the threshold of the detected pulse amplitude ($V_{\text{det}}$) and the ratio $(W/L)_{\text{pMOS}}/(W/L)_{\text{nMOS}}$ of the pulse detector is also evaluated by Monte Carlo simulation. In the simulation, all circuit parameters of the pulse detector are allowed to vary by 10% with respect to their nominal values. The results are shown in Fig. 10: the relationship becomes a band. A designer should consider the worst-case scenario to achieve sufficient fault coverage, as discussed in Section VI.
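A band like that of Fig. 10 can be imitated cheaply by substituting the textbook long-channel switching-threshold expression $V_M = (V_{Tn} + r(V_{\text{DD}} - |V_{Tp}|))/(1 + r)$, with $r = \sqrt{(\mu_p/\mu_n)(W/L)_{\text{pMOS}}/(W/L)_{\text{nMOS}}}$, for SPICE. In this sketch, each device parameter is drawn uniformly within ±10% of an assumed nominal value; the uniform distribution, the nominal values, the assumption that $V_{\text{det}}$ tracks $V_M$, and the function name are all hypothetical.

```python
import numpy as np


def vdet_band(ratio, n_trials=10_000, spread=0.10, seed=0,
              vdd=1.8, vtn=0.45, vtp=0.45, mobility_ratio=0.3):
    """Monte Carlo (min, max) of the first-order V_M for one nominal W/L ratio.

    Each device parameter is drawn uniformly within +/-spread of its nominal
    value (an assumed distribution); vdd is held fixed. Returns volts.
    """
    rng = np.random.default_rng(seed)

    def vary(nominal):
        return nominal * rng.uniform(1.0 - spread, 1.0 + spread, n_trials)

    wl_ratio = vary(ratio) / vary(1.0)     # (W/L)_p and (W/L)_n vary independently
    r = np.sqrt(vary(mobility_ratio) * wl_ratio)
    vm = (vary(vtn) + r * (vdd - vary(vtp))) / (1.0 + r)
    return float(vm.min()), float(vm.max())
```

Sweeping `ratio` and plotting the returned minima and maxima traces a band rather than a single curve; the worst-case edge of the band is what a designer would use when picking the ratio.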

## VI. MONTE CARLO SIMULATION ON THE UNIFIED DETECTION SCHEME CONSIDERING PROCESS VARIATION

In Fig. 11, we show the overkill and escape probabilities versus the (W/L) ratio of the pulse detector. An "overkill" occurs when a good circuit is identified as faulty by the PD, while an "escape" occurs when a faulty circuit is identified as fault-free. Monte Carlo simulations are carried out to show the effect of process variation on both the overkill and escape probabilities. Either process variation or signal skews may cause an overkill or escape condition. For example, if the threshold



Fig. 10. Monte Carlo simulation of the threshold of detected pulse amplitude  $(V_{\text{det}})$  with respect to the W/L ratio of the pulse detector.



Fig. 11. Monte Carlo simulation of the escape probability and overkill probability with respect to (W/L) ratio.

is chosen according to the case with the maximum induced delay but the lowest peak of the induced glitch, i.e., Case 3) in Section V-A, "*overkills*" may occur. As an example, assume that we want to detect an induced delay of 100 ps. According to Fig. 9, the PD threshold is then set to about 0.55 V (the curve marked with $\times$). However, if both $SK_1$ and $SK_2$ are 0 (the curve marked with $\blacklozenge$), a 0.55-V pulse corresponds to an induced delay smaller than 100 ps, so a circuit that meets the delay specification is flagged as faulty: an overkill. Conversely, if the threshold is chosen according to Case 2) in Section V-A, "*escapes*" may occur. Fig. 11 shows the simulated probabilities of overkills and escapes versus the $(W/L)_{\text{pMOS}}/(W/L)_{\text{nMOS}}$ ratio of the pulse detector. With the ratio set to 1, the detection threshold $V_{\text{det}}$ is 0.91 V, which corresponds to a delay of about 200 ps. When the W/L ratio decreases, so does the detection threshold $V_{\text{det}}$; the probability of overkills increases, while the probability of escapes decreases. On the other hand, a larger $(W/L)_{\text{pMOS}}/(W/L)_{\text{nMOS}}$ ratio produces a detector with a higher threshold $V_{\text{det}}$, which increases the probability of escapes but decreases the probability of overkills.
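The overkill/escape trade-off can be illustrated with a toy Monte Carlo experiment. The sketch below assumes a linear glitch-delay relation anchored on the 0.91-V/200-ps point quoted above, widens it into a band with a random skew factor, and perturbs the PD threshold to mimic process variation; the linear shape, the uniform distributions, the spreads, the sample size, and the function name are all hypothetical. Overkill is estimated as the fraction of good circuits that are flagged, and escape as the fraction of faulty circuits that pass.

```python
import numpy as np


def overkill_escape(v_det_nominal, d_spec=200.0, n=100_000, seed=1,
                    volts_per_ps=0.91 / 200.0, skew_spread=0.2, pd_spread=0.05):
    """Return (P(flagged | good), P(passed | faulty)) for a toy population.

    Hypothetical model: glitch peak = delay * volts_per_ps * skew factor, and
    the PD threshold varies uniformly by +/-pd_spread around its nominal value.
    """
    rng = np.random.default_rng(seed)
    delay = rng.uniform(0.0, 300.0, n)                            # induced delay (ps)
    skew_factor = rng.uniform(1 - skew_spread, 1 + skew_spread, n)
    glitch = delay * volts_per_ps * skew_factor                   # peak glitch (V)
    v_det = v_det_nominal * rng.uniform(1 - pd_spread, 1 + pd_spread, n)
    flagged = glitch > v_det
    faulty = delay > d_spec
    overkill = float(np.mean(flagged[~faulty]))
    escape = float(np.mean(~flagged[faulty]))
    return overkill, escape
```

Because the same seed yields the same population, lowering `v_det_nominal` (e.g., from 1.1 V to 0.7 V) can only add flagged circuits, reproducing the qualitative trend of Fig. 11: more overkills and fewer escapes.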

## VII. CONCLUSION

We analyzed the crosstalk effects on the interconnect bus in very deep submicrometer SOC VLSI circuits and proposed a unified detection scheme to test for glitches and crosstalk-induced delay at the same time. We found that the crosstalk-induced delay fault is in fact the superposition of the crosstalk-induced glitch and the original applied signal on the victim line. For the unified detection scheme, we proposed a pulse detector whose detection threshold is set by the designer through transistor sizing, so that it also detects induced delays whose corresponding glitch amplitudes exceed the given detection threshold. Several issues regarding the detection scheme were discussed, such as the amplitude of the induced glitch, the skews between the excitation signals applied to the aggressor lines and the victim line, and process variations in the PD.

#### **REFERENCES**

- [1] A. Rubio, N. Itazaki, X. Xu, and K. Kinoshita, "An approach to the analysis and detection of crosstalk faults in digital VLSI circuits," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 13, no. 3, pp. 387–395, Mar. 1994.
- [2] W. Chen, S. Gupta, and M. A. Breuer, "Analytic models for crosstalk delay and pulse analysis under non-ideal inputs," in *Proc. Int. Test Conf.*, 1997, pp. 809–818.
- [3] A. E. Zain and S. Chowdhury, "An analytical method for finding the maximum crosstalk in lossless-coupled transmission lines," in *Proc. Int. Conf. Comput.-Aided Des.*, 1992, pp. 443–448.
- [4] S. L. Manney, M. S. Nakhla, and Q. Zhang, "Analysis of non-uniform, frequency dependent high-speed interconnects using numerical inversion of Laplace transform," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 13, no. 12, pp. 1513–1525, Dec. 1994.
- [5] H. You and M. Soma, "Crosstalk and transient analysis of high-speed interconnects and packages," *IEEE Trans. Solid-State Circuits*, vol. 26, no. 3, pp. 319–329, Mar. 1991.
- [6] H. You and M. Soma, "Crosstalk analysis of interconnect lines and packages in high-speed integrated circuits," *IEEE Trans. Circuits Syst.*, vol. 37, no. 8, pp. 1019–1026, Aug. 1990.
- [7] K. J. Chang, N. H. Chang, S. Y. Oh, and K. Lee, "Parameterized SPICE subcircuits for multilevel interconnect modeling and simulation," *IEEE Trans. Circuits Syst.*, vol. 39, no. 11, pp. 779–789, Nov. 1992.
- [8] F. Moll and A. Rubio, "Spurious signals in digital CMOS VLSI circuits: A propagation analysis," *IEEE Trans. Circuits Syst.*, vol. 39, no. 10, pp. 749–752, Oct. 1992.
- [9] M. Cuviello, S. Dey, X. Bai, and Y. Zhao, "Fault modeling and simulation for crosstalk in system-on-chip interconnects," in *Proc. Int. Conf. Comput.-Aided Des.*, 1999, pp. 297–303.
- [10] A. Vittal and M. Marek-Sadowska, "Crosstalk reduction for VLSI," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 16, no. 3, pp. 290–298, Mar. 1997.
- [11] K. T. Lee, C. Nordquist, and J. A. Abraham, "Automatic test pattern generation for crosstalk glitches in digital circuits," in *Proc. VLSI Test Symp.*, 1998, pp. 34–39.
- [12] Y. Zhao and S. Dey, "Fault-coverage analysis techniques of crosstalk in chip interconnects," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 22, no. 6, pp. 770–782, Jun. 2003.
- [13] T. Sakurai, "Closed-form expressions for interconnection delay, coupling, and crosstalk in VLSI's," *IEEE Trans. Electron Devices*, vol. 40, no. 1, pp. 118–124, Jan. 1993.
- [14] K. Sekar and S. Dey, "LI-BIST: A low-cost self-test scheme for SoC logic cores and interconnects," in *Proc. VLSI Test Symp.*, 2002, pp. 417–422.
- [15] J. A. Sainz, M. Roca, R. Munoz, J. A. Maiz, and L. A. Aguado, "A crosstalk sensor implementation for measuring interferences in digital CMOS VLSI circuits," in *Proc. On-Line Test. Workshop*, 2000, pp. 45–51.
- [16] F. Caignet, S. Delmas-Bendhia, and E. Sicard, "On the measurement of crosstalk in integrated circuits," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 8, no. 5, pp. 606–609, Oct. 2000.
- [17] F. Caignet, S. Delmas-Bendhia, and E. Sicard, "The challenge of signal integrity in deep-submicrometer CMOS technology," *Proc. IEEE*, vol. 89, no. 4, pp. 556–573, 2001.
- [18] S. Delmas-Bendhia, F. Caignet, E. Sicard, and M. Roca, "On-chip sampling in CMOS integrated circuits," *IEEE Trans. Electromag. Compatib.*, vol. 41, no. 4, pp. 403–406, Nov. 1999.
- [19] J. L. Rossello and J. Segura, "A compact model to identify delay faults due to crosstalk," in *Proc. Des., Autom. Test Eur.*, 2006, vol. 1, pp. 6–10.

## **CMOS Driver-Receiver Pair for Low-Swing Signaling for Low Energy On-Chip Interconnects**

José C. García Montesdeoca, Juan A. Montiel-Nelson, and Saeid Nooshabadi

*Abstract—***This paper describes the design of symmetric low-swing driver-receiver pairs (*mj-sib*) and (*mj-db*) for driving signals on global interconnect lines. The proposed signaling schemes were implemented in a 1.0-V 0.13-$\mu$m CMOS technology for signal transmission along a 10-mm wire with an extra fan-out load of 2.5 pF (on the wire). The *mj-sib* and *mj-db* schemes reduce delay by up to 47% and 38% and energy-delay product by up to 34% and 49%, respectively, compared with counterpart symmetric and asymmetric low-swing signaling schemes. Another key advantage of the proposed signaling schemes is that they require only one power supply and one threshold voltage, which significantly reduces design complexity. This paper also confirms the relative reliability benefits of the proposed signaling techniques through a signal-to-noise ratio (SNR) analysis.**

*Index Terms—***Bus drivers, bus receivers, digital CMOS, interconnect signaling, level converters, low energy, low-voltage, performance tradeoffs.**

## I. INTRODUCTION

An ever-increasing share of the energy budget of integrated circuits is consumed by interconnect wires (buses, global clocks, and timing signals) and the associated driver and receiver circuitry. In some gate-array design styles, the interconnect wires account for up to 40% [1] of the total on-chip power dissipation. In field-programmable gate array (FPGA) fabrics, the reported share of power dissipated in interconnect wires is up to 90% [2]. Interconnect is also a dominant factor in chip performance and robustness [3], [4].

Reducing the voltage swing of the signal on the wire is the most effective way to reduce power and improve energy $\times$ delay efficiency on global interconnects. However, reducing the voltage swing generally comes at the expense of reduced reliability and performance and of increased driver and receiver complexity. Signal reliability and integrity effects include interconnect delay, crosstalk, transmission line effects, substrate coupling, power supply integrity, and noise-on-delay effects [4].
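A first-order calculation shows why swing reduction is so effective. When a driver powered from $V_{\text{supply}}$ charges a capacitive load $C$ from 0 to $V_{\text{swing}}$, it draws a charge $C V_{\text{swing}}$ from the supply, so the energy per rising transition is $C V_{\text{swing}} V_{\text{supply}}$. The sketch compares full swing, low swing from the single main supply, and low swing from a dedicated low supply; the 0.5-V swing is an assumed value, the 2.5-pF load and 1.0-V supply are borrowed from the abstract purely for illustration, and wire capacitance is ignored.

```python
def switching_energy(c_load: float, v_swing: float, v_supply: float) -> float:
    """Energy (J) drawn from the supply per 0 -> v_swing transition of c_load.

    The driver delivers charge c_load * v_swing from a source at v_supply.
    """
    return c_load * v_swing * v_supply


C_LOAD = 2.5e-12   # F, illustrative load borrowed from the abstract
VDD = 1.0          # V, main supply
V_LOW = 0.5        # V, assumed low swing

full = switching_energy(C_LOAD, VDD, VDD)          # conventional full swing
low_single = switching_energy(C_LOAD, V_LOW, VDD)  # low swing, main supply only
low_dual = switching_energy(C_LOAD, V_LOW, V_LOW)  # low swing, dedicated supply
print(f"{full * 1e12:.3f} pJ, {low_single * 1e12:.3f} pJ, {low_dual * 1e12:.3f} pJ")
```

Halving the swing halves the energy when only the main supply is available and quarters it with a dedicated low-swing supply; the quadratic saving is what motivates the extra supplies whose drawbacks are listed next.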

Most low-swing techniques to date [1], [5] rely on an extra power supply or reference voltage or on a multiple-threshold process technology, and they incur a large area penalty and require multiple wires per signal when differential signaling is employed [6]. They also suffer from large short-circuit currents, long propagation delays, and high power dissipation [1], [5]. Because of the reduced voltage swing, drivers for low-swing signaling schemes generally do not provide sufficient driving capability for larger loads. To improve the driving capability, some driver circuits rely on bootstrapping techniques [7], [8]. However, these circuits require extra bootstrapping capacitors and generally need access to the well terminals, which may not be readily available in many digital CMOS processes.

Manuscript received May 08, 2007; revised October 15, 2007. First published January 06, 2009; current version published January 14, 2009.

J. C. García Montesdeoca and J. A. Montiel-Nelson are with the Institute for Applied Microelectronics, University of Las Palmas de Gran Canaria, E-35017 Las Palmas de Gran Canaria, Spain (e-mail: jcgarcia@iuma.ulpgc.es; montiel@iuma.ulpgc.es).

S. Nooshabadi is with the Department of Information and Communications, Gwangju Institute of Science and Technology, GIST, Gwangju, Republic of Korea (e-mail: saeid@gist.ac.kr).

Digital Object Identifier 10.1109/TVLSI.2008.2004549