No difficulties with numerical stability or convergence have been observed when the model is used with WATAND. On the other hand, expected thermal instabilities occur with some circuits and can be studied with the self-heating model.

It should be noted that the increase in model complexity over the standard GP model causes some increase in CPU time. For this work with the current mirror, CPU time for a single dc solution was a fraction of a second with the standard model versus a little over one second with the self-heating model. No rigorous study of CPU time was done since such studies depend upon the simulator and techniques used with it [8].

The self-heating capability adds two nodes and two current variables to the basic GP model for an increase of four variables for each BJT. The number of variables is six for the standard GP model versus ten for the self-heating model.

## VI. CONCLUSION

This paper has presented a self-heating Gummel-Poon BJT model and has demonstrated it with discrete and IC BJT current-mirror circuits. The discrete simulations showed error with respect to experimental measurements of 6.1% or less. In contrast, the standard GP model produced errors up to 84%. The IC current-mirror simulation showed the expected current tracking with the self-heating model, which also calculated the transistors' junction temperatures.

This work shows, as others have also observed [2], [4], that including self-heating effects in the modeling of devices like the BJT can mean the difference between a good simulation and totally incorrect results.

## REFERENCES

- [1] I. E. Getreu, *Modeling the Bipolar Transistor*. New York: Elsevier Scientific, 1978.
- [2] K. Fukahori and P. R. Gray, "Computer simulation of integrated circuits in the presence of electrothermal interaction," *IEEE J. Solid-State Circuits*, vol. SC-11, pp. 834–846, Dec. 1976.
- [3] M. Latif and P. R. Bryant, "Network analysis approach to multidimensional modeling of transistors including thermal effects," *IEEE Trans. Computer-Aided Design*, vol. CAD-1, pp. 94-101, Apr. 1982.
- [4] R. S. Vogelsong and C. Brzezinski, "Extending SPICE for electrothermal simulation," in *Proc. IEEE Custom Integrated Circuits Conf.*, May 1989, pp. 21.4.1-21.4.4.
- [5] I. N. Hajj, K. Singhal, J. Vlach, and P. R. Bryant, "WATAND-A program for the analysis and design of linear and piecewise-linear networks," in *Proc. 16th Midwest Symp. Circuit Theory* (Waterloo, Ont., Canada), vol. 1, Apr. 1973, pp. VI.4.1-VI.4.9.
- [6] P. R. Bryant, H. J. Strayer, and M. Vlach, "Watand User's Manual V1.11.00," Univ. Waterloo, Waterloo, Ont., Canada, Tech. Rep. UWEE 87-01, Sept. 1987.
- [7] F.-Q. Ye, "A BJT model with self heating for WATAND computer simulation," M.S. thesis, Youngstown State Univ., Youngstown, OH, June 1990.
- [8] H. J. Strayer, D. J. Roulston, and P. R. Bryant, "DC solution speed in piecewise linear network analysis programs," *Electron. Lett.*, vol. 22, pp. 165–166, Jan. 30, 1986.

# Latched CMOS Differential Logic (LCDL) for Complex High-Speed VLSI

Chung-Yu Wu and Kuo-Hsing Cheng

Abstract —A new CMOS differential logic, called the latched CMOS differential logic (LCDL), is proposed and analyzed. LCDL circuits can implement a complex combinational logic function in a single gate, and form the pipeline structure as well. It is shown that the LCDL with a fan-in number between 6 and 15 has the highest operation speed among those differential logic circuits. It is also free from charge-sharing, clock-skew, and race problems. Experimental results have verified the high speed and race-free performance of the proposed LCDL.

Manuscript received December 5, 1990; revised April 23, 1991. The authors are with the Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University, Hsin-Chu, Taiwan, Republic of China. IEEE Log Number 9101785.

## I. INTRODUCTION

The implementation of large and complex combinational logic functions using differential CMOS logic circuits [1]-[4] has certain advantages over that using the conventional single-ended-output CMOS logic. The differential CMOS logic circuits simultaneously provide complementary function outputs, thus offering a high logic flexibility. They could reduce wiring complexity and improve packing density and speed [5].

Among the proposed differential CMOS logic circuits [1]–[3], static cascode voltage switch logic (CVSL) [1] improves the packing density by cascoding NMOS transistors into differential logic tree networks within a single gate. Clocked CVSL [1], a variation of static CVSL, was proposed for dynamic circuits. The sample-set differential logic (SSDL) [2] is derived from the dynamic CVSL. It improves the operation speed by adding a latching sense amplifier between the two output nodes, but it has static power dissipation in the sample phase. Enabled/disabled CMOS differential logic (ECDL) [3] was proposed to implement iterative networks, but its applications are limited by its low driving capability. Design procedures for differential cascode voltage switch (DCVS) circuits were also developed to contract the differential trees for the implementation of random logic functions [4].

A new differential CMOS dynamic logic called the latched CMOS differential logic (LCDL) is proposed in this work. The LCDL has two basic versions, the pseudo-two-phase LCDL and the pseudo-one-phase LCDL. LCDL circuits can implement a complex combinational logic function in a single gate and achieve high operation speed and high driving capability without static power dissipation. Moreover, the pipeline structure of the LCDL circuits is shown to have no charge-sharing problem and no race problem.

## II. CIRCUIT TECHNIQUES AND CLOCKING STRATEGY

## A. Pseudo-Two-Phase LCDL

The schematic diagram of the pseudo-two-phase LCDL circuit is shown in Fig. 1. It consists of five major components: 1) the differential cascode NMOS logic tree, which performs the complex logic function; 2) the five-transistor latching sense amplifier M1-M5; 3) two precharge transistors M6 and M7; 4) the control MOS transistors M9 and M10, which isolate the sense amplifier from the NMOS logic tree, and the control transistor M8, which activates the logic tree; and 5) the dynamic clocked CMOS (C<sup>2</sup>MOS) output latches [6] M11-M18, which enable the LCDL to form the pipeline connection. The clock timing diagram of the pseudo-two-phase LCDL is shown in Fig. 1(b).

Fig. 1. (a) Schematic diagram of the pseudo-two-phase LCDL circuit. (b) Clock timing waveforms of the pseudo-two-phase LCDL circuit. (c) The pipeline connection of the pseudo-two-phase LCDL.

There are four kinds of operations, namely, precharge, sample, set, and hold. When the clocks $\overline{\phi}_1$ and $\phi_2$ are low in this gate, transistors M6, M7, M9, and M10 are turned on and nodes A and B are precharged to $V_{DD}$. The gate is operated in the precharge phase. In this phase, the previous output states are held at Q and $\overline{Q}$ by the C<sup>2</sup>MOS output latches [6], [7]. Because transistor M8 is turned off, the path from $V_{DD}$ to ground is off and there is no static power dissipation in this phase. In the sample phase, clock $\overline{\phi}_1$ rises to high. Transistors M6 and M7 are turned off and M8 is turned on. A path between node A or B and the ground is formed. It pulls that node toward ground while the other node is held at $V_{DD}$. This generates a small voltage difference between nodes A and B. When $\overline{\phi}_1$ and $\phi_2$ are high and $\overline{\phi}_2$ is low, this gate is operated in the set phase. Transistor M5 is turned on to activate the sense amplifier whereas transistors M9 and M10 are turned off to isolate nodes A and B from the NMOS complex logic tree. This decreases the capacitive loading of the sense amplifier. Through the regeneration of the sense amplifier, node A or B with a lower voltage is discharged quickly to ground and the other node is pulled up to $V_{DD}$. At the same time, the voltages at nodes A and B are read out to nodes $\overline{Q}$ and Q. As $\phi_2$ goes low, this gate is in the hold phase, and the output data Q and $\overline{Q}$ are held by the C<sup>2</sup>MOS latches until the next set phase.
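The clock-to-transistor relationships described above can be summarized in a small behavioral sketch. The gate polarities used here are inferred from the text (M6 and M7 conduct when $\overline{\phi}_1$ is low, M8 when $\overline{\phi}_1$ is high, M5 when $\phi_2$ is high, and M9 and M10 when $\overline{\phi}_2$ is high); they are not read from the schematic, and the function names are hypothetical. The hold phase refers to the output latches and overlaps the precharge and sample phases of the internal nodes, so it is not classified separately.

```python
def switch_states(phi1_bar: int, phi2: int, phi2_bar: int) -> dict:
    """Return which transistor groups conduct for the given clock levels (1 = high).

    Polarities are an inference from the prose description, not from Fig. 1(a).
    """
    return {
        "precharge_M6_M7": phi1_bar == 0,   # precharge nodes A and B to VDD
        "tree_enable_M8": phi1_bar == 1,    # connect the logic tree to ground
        "isolation_M9_M10": phi2_bar == 1,  # connect nodes A/B to the logic tree
        "sense_amp_M5": phi2 == 1,          # activate the latching sense amplifier
    }


def internal_phase(phi1_bar: int, phi2: int, phi2_bar: int) -> str:
    """Classify the phase of the internal nodes A and B for nominal clock levels."""
    s = switch_states(phi1_bar, phi2, phi2_bar)
    if s["precharge_M6_M7"] and s["isolation_M9_M10"] and not s["sense_amp_M5"]:
        return "precharge"
    if s["tree_enable_M8"] and s["isolation_M9_M10"] and not s["sense_amp_M5"]:
        return "sample"
    if s["tree_enable_M8"] and s["sense_amp_M5"] and not s["isolation_M9_M10"]:
        return "set"
    return "other"  # combinations that only occur under clock skew


if __name__ == "__main__":
    for levels in [(0, 0, 1), (1, 0, 1), (1, 1, 0)]:
        print(levels, internal_phase(*levels))
```

Any clock combination outside the three nominal ones is reported as "other"; such combinations arise only when the clocks are skewed.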

The pipeline connection of the pseudo-two-phase LCDL is depicted in Fig. 1(c). It contains two stages, the $\phi_1$ stage and the $\phi_2$ stage. The circuit structure and the clock timing of the $\phi_2$ stage are the same as those shown in Fig. 1(a). The schematic diagram of the $\phi_1$ stage is the same as that of the $\phi_2$ stage, but with the clock signals $\overline{\phi}_1$, $\phi_2$, and $\overline{\phi}_2$ being replaced by $\overline{\phi}_2$, $\phi_1$, and $\overline{\phi}_1$, respectively. When the $\phi_1$ stage is operated in the precharge phase, its previous stage, the $\phi_2$ stage, is operated in the set phase. At the end of the precharge phase, the inputs of the $\phi_1$ stage have already been set by the $\phi_2$ stage and remain unchanged afterwards. Thus, the $\phi_1$ stage.

#### IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 26, NO. 9, SEPTEMBER 1991



Fig. 2. (a) Timing diagram showing the clock skew when clocks  $\phi_2$  and  $\overline{\phi}_2$  lead (lag)  $\phi_1$  and  $\overline{\phi}_1$ . (b) Timing diagram showing the clock skew when  $\overline{\phi}_2$  lags  $\phi_2$ . (c) Timing diagram showing the clock skew when  $\overline{\phi}_2$  leads  $\phi_2$ .

The pseudo-two-phase LCDL is also completely free from the clock-skew problem. To show this, all possible clock-skew cases are considered and analyzed. First, a propagation delay between clocks $\phi_1$ and $\phi_2$ causes the clock skew shown in Fig. 2(a), in which $\phi_2$ and $\overline{\phi}_2$ lead (or lag) $\phi_1$ and $\overline{\phi}_1$. This skew decreases the duration of the sample phase of the $\phi_2$ ($\phi_1$) stage from T to t1 (t4). This problem can be solved by designing a suitable clock cycle with enough tolerance. Thus this kind of clock skew does not affect normal operations.
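As a rough illustration of the tolerance argument, the following sketch checks whether a shortened sample phase is still long enough. It assumes that the sample window shrinks by exactly the skew magnitude, which the text does not state explicitly. The function names, the required minimum sample time, and all numbers are hypothetical.

```python
def remaining_sample_window(nominal_ns: float, skew_ns: float) -> float:
    """Sample-phase duration left after skew (assumed to shrink by |skew|)."""
    # A lead shortens the window of one stage and a lag that of the other,
    # so only the magnitude of the skew matters in this simplified view.
    return max(0.0, nominal_ns - abs(skew_ns))


def skew_tolerated(nominal_ns: float, skew_ns: float, required_ns: float) -> bool:
    """True if the shortened sample phase still meets the required minimum."""
    return remaining_sample_window(nominal_ns, skew_ns) >= required_ns


if __name__ == "__main__":
    # Hypothetical numbers: 10-ns nominal sample phase, 5 ns needed by the gate.
    print(skew_tolerated(10.0, 3.0, 5.0))
    print(skew_tolerated(10.0, -8.0, 5.0))
```

Under this simplification, designing the clock cycle "with enough tolerance" amounts to choosing the nominal sample duration so that the check passes for the worst expected skew.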

Another type of skew is shown in Fig. 2(b), where clock $\overline{\phi}_2$ lags $\phi_2$. Since the $\phi_2$ stage contains both clocks $\phi_2$ and $\overline{\phi}_2$ and the $\phi_1$ stage does not, this skew problem affects the $\phi_2$ stage only. As shown in Fig. 2(b), both $\phi_2$ and $\overline{\phi}_2$ are high during time t1. This turns on transistor M5 in Fig. 1(a) and activates the sense amplifier. Meanwhile, transistors M9 and M10 are also turned on, so the sense amplifier sees the loading of the NMOS complex logic tree. This slows down the regeneration of the sense amplifier, but does not affect the correct result. During time t2, transistors M9 and M10 are turned off as in the normal operation. Thus the regeneration rate of the sense amplifier increases to the normal value and the voltages at nodes A and B are read out to nodes $\overline{Q}$ and Q. During time t3, both $\phi_2$ and $\overline{\phi}_2$ are low. The NMOS transistors of the dynamic C<sup>2</sup>MOS output latches are turned off while the PMOS transistors are still on. This does not alter the voltages at nodes $\overline{Q}$ and Q. After t3, Q and $\overline{Q}$ can still latch the previous results. This proves the immunity of the circuit to this type of clock skew.

Fig. 3. (a) Schematic diagram and the corresponding clock timing of the pseudo-one-phase LCDL circuit. (b) The pipeline connection of the pseudo-one-phase LCDL.

Fig. 2(c) shows the third type of clock skew, where $\phi_2$ lags $\overline{\phi}_2$. During t1, both $\phi_2$ and $\overline{\phi}_2$ are low. Transistors M9 and M10 are turned off and transistor M5 is also off. Thus the small voltage difference generated in the sample phase is still held at nodes A and B. During time t2, transistor M5 is turned on and activates the sense amplifier as in the normal operation. At this time the voltages of nodes A and B are read to output nodes $\overline{Q}$ and Q. During time t3, both $\phi_2$ and $\overline{\phi}_2$ are high. The loading of the NMOS complex logic tree is added to the sense amplifier. This does not affect the output voltages since definite results have been generated. Meanwhile, the C<sup>2</sup>MOS latches still hold the same results. Thus, this type of skew does not cause a logic fault.

From the above analysis, it is evident that the pseudo-two-phase LCDL has no clock-skew and race problems.

## B. Pseudo-One-Phase LCDL

The pseudo-one-phase LCDL circuit is shown in Fig. 3. As compared to the pseudo-two-phase LCDL, it has a simpler clock scheme and fewer MOS transistors. Moreover, it requires only one clock line in each gate. The pseudo-one-phase LCDL has two phases of operations, namely, the precharge phase and the evaluation phase. As the clock signal goes low, M6 and M7 are turned on and this circuit is in the precharge phase. Nodes A and B are precharged to $V_{DD}$. Meanwhile, Q and $\overline{Q}$ are held in the previous output state by the modified C<sup>2</sup>MOS output latches [7]. In the evaluation phase, the clock signal rises to high and M5, M8, M10, and M13 are turned on. A path exists from node A or B to ground through one side of the differential NMOS cascode tree. This leads to a voltage difference between nodes A and B, which causes the sense amplifier to trip. Thus, the node (A or B) with the lower voltage is discharged rapidly to ground while the other node remains at $V_{DD}$.

Fig. 4. The SPICE-simulated minimum clock periods of the NORA-type pipeline structures with a multi-input NAND gate in each stage for the clocked CVSL, two-stage clocked CVSL, SSDL, ECDL, pseudo-two-phase LCDL, and pseudo-one-phase LCDL circuits.

Fig. 5. Chip photograph of the fabricated 15-input pseudo-two-phase LCDL NAND gate.

The pipeline connection of the pseudo-one-phase LCDL is shown in Fig. 3(b). The circuit schematic diagram and the corresponding clock timing of the  $\phi$  stage are the same as those shown in Fig. 3(a). The circuit structure of the  $\overline{\phi}$  stage is similar to that of the  $\phi$  stage but with the clock signal  $\phi$  replaced by  $\overline{\phi}$ . Based upon similar considerations in the pseudo-two-phase LCDL, it can be shown that this circuit also has no charge-sharing problem and is free from clock-skew and race problems.
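The precharge and evaluation behavior of a single pseudo-one-phase gate can be sketched as a purely logical model. This is an illustrative abstraction, not the circuit itself: the class name is hypothetical, the choice of which node is discharged for a true function value is an assumption, the output latch is assumed to be transparent during evaluation, and the initial output state is arbitrary.

```python
class PseudoOnePhaseGateModel:
    """Logic-level sketch of one pseudo-one-phase LCDL gate (hypothetical model)."""

    def __init__(self, logic_fn):
        self.logic_fn = logic_fn   # function of the input list, returns truthy/falsy
        self.node_a = True         # True stands for a node at VDD
        self.node_b = True
        self.q = False             # arbitrary initial latch state
        self.q_bar = True

    def step(self, clock_high: bool, inputs):
        if not clock_high:
            # Precharge phase: both internal nodes go to VDD, outputs are held.
            self.node_a = True
            self.node_b = True
        else:
            # Evaluation phase: one side of the tree conducts and the sense
            # amplifier drives that node to ground (assumed: node A when f is true).
            f = bool(self.logic_fn(inputs))
            self.node_a = not f
            self.node_b = f
            # Assumed: the output latch passes the result during evaluation.
            self.q, self.q_bar = f, not f
        return self.q, self.q_bar


if __name__ == "__main__":
    nand = PseudoOnePhaseGateModel(lambda xs: not all(xs))
    print(nand.step(True, [1, 1, 1]))   # evaluate
    print(nand.step(False, [0, 0, 0]))  # precharge, outputs held
    print(nand.step(True, [0, 1, 1]))   # evaluate again
```

The model captures only the sequencing (precharge, evaluate, hold); it says nothing about timing, charge sharing, or the regeneration speed of the sense amplifier.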






Fig. 6. Photograph of input voltage, output voltage, and clock waveforms of the fabricated LCDL NAND gate shown in Fig. 5: (a) when operated by normal clocks, and (b) when a 5-ns clock skew occurs so that  $\phi_2$  lags  $\overline{\phi}_2$ .

### III. COMPARISON AND EXPERIMENTAL VERIFICATION

#### A. Comparisons

The speed comparisons of the NORA-type pipeline structures with a multi-input NAND gate in each stage for the clocked CVSL, two-stage clocked CVSL, ECDL, SSDL, pseudo-two-phase LCDL, and pseudo-one-phase LCDL are shown in Fig. 4, where SPICE-simulated minimum clock periods as a function of the fan-in number of the NAND gate are plotted. The logic gate has a unity fan-out number and a 0.2-pF output capacitive load. The 0.2 pF is equivalent to a fan-out number of 8. These SPICE simulation results are based upon the device parameters of the 1.2-µm CMOS process. Since the clocked CVSL can form a multistage domino-type structure in each pipeline section [5], it is also separated into two stages in the comparison of Fig. 4. For example, the 12-input NAND gate is separated into three 4-input NAND gates and one 3-input NAND gate. These simulation results in Fig. 4 are denoted as the two-stage clocked CVSL. Generally, multistage clocked CVSL is faster than the clocked CVSL, but the device count is higher.
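The 12-input decomposition mentioned above can be checked exhaustively at the logic level. The sketch below assumes that the second-stage 3-input NAND gate is driven by the complementary (AND) outputs of the three first-stage differential gates; the text gives only the gate counts, so this wiring and the function names are assumptions.

```python
from itertools import product


def diff_nand(bits):
    """Differential NAND gate: returns (q, q_bar) = (NAND, AND) of the inputs."""
    and_value = all(bits)
    return (not and_value, and_value)


def two_stage_nand12(bits):
    """12-input NAND built from three 4-input gates and one 3-input gate."""
    assert len(bits) == 12
    first_stage = [diff_nand(bits[i:i + 4]) for i in (0, 4, 8)]
    # Assumed wiring: the complementary outputs (AND values) feed the second stage.
    and_outputs = [q_bar for (_q, q_bar) in first_stage]
    return diff_nand(and_outputs)[0]


def decomposition_is_equivalent() -> bool:
    """Compare the two-stage structure with a single 12-input NAND for all inputs."""
    return all(
        two_stage_nand12(bits) == diff_nand(bits)[0]
        for bits in product((0, 1), repeat=12)
    )


if __name__ == "__main__":
    print(decomposition_is_equivalent())
```

Because every differential gate provides both output polarities, no extra inverters are needed between the two stages in this sketch.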

Fig. 7. (a) Chip photograph of the fabricated ten-input pseudo-one-phase LCDL NAND gate. (b) Photograph of input voltage, output voltage, and clock waveforms of the fabricated ten-input pseudo-one-phase LCDL NAND gate.

From Fig. 4 it is seen that the operation speed of the LCDL is the fastest in complex logic applications with a fan-in number smaller than 15 and greater than 6. For those gates with low fan-in numbers, the SSDL's speed is similar to the LCDL's, but the SSDL has considerable dc power dissipation. Moreover, for those gates with a fan-in number less than 6, the LCDL has no major benefits because of the required overhead devices. For those gates with a fan-in number beyond 15, the two-stage clocked CVSL tends to be faster than the LCDL. Thus, in this case, the multistage design in LCDL could be considered as a compromise between speed and device count.
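The fan-in guidance above can be restated as a simple selection helper. This is only a paraphrase of the comparison for the simulated conditions: the function name and return labels are hypothetical, and treating the boundary values 6 and 15 as belonging to the LCDL range is an assumption.

```python
def suggested_style(fan_in: int) -> str:
    """Restate the fan-in guidance of the speed comparison (boundaries assumed inclusive)."""
    if fan_in < 1:
        raise ValueError("fan-in must be a positive integer")
    if fan_in < 6:
        # Overhead devices leave the LCDL with no major benefit here.
        return "no major LCDL benefit"
    if fan_in <= 15:
        return "LCDL"
    # Beyond 15 the two-stage clocked CVSL tends to be faster; a multistage
    # LCDL is the suggested compromise between speed and device count.
    return "two-stage clocked CVSL or multistage LCDL"


if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, suggested_style(n))
```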

#### B. Experimental Verification

Several experimental circuits were designed and fabricated to verify part of the simulated results on the proposed logic circuits. The experimental chips were fabricated in a 3.5-$\mu$m, single-metal, single-poly, p-well CMOS process. The test circuit of the pseudo-two-phase LCDL is a 15-input NAND gate. Fig. 5 shows the chip photograph of this test circuit. From the measurement results shown in Fig. 6(a), this test circuit can work at a clock period of 24 ns. This result is in agreement with the SPICE-simulated minimum clock period of 24 ns. As shown in Fig. 6(b), the fabricated 15-input pseudo-two-phase LCDL NAND gate can also work correctly even if there is a 5-ns clock skew so that $\overline{\phi}_2$ lags $\phi_2$ and the pulse widths of $\phi_2$ and $\overline{\phi}_2$ are only 6 ns. This verifies that the pseudo-two-phase LCDL circuit has no clock-skew problem.

Fig. 7(a) shows the chip photograph of the fabricated pseudo-one-phase LCDL ten-input NAND gate. Fig. 7(b) shows the measurement results of the fabricated ten-input pseudo-one-phase LCDL NAND gate. It is seen that this circuit can work with a clock rate of more than 60 MHz, which is consistent with the SPICE-simulated maximum clock rate of 62.5 MHz.
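Since the results above are quoted partly as clock periods and partly as clock rates, a short conversion helps when comparing them. The helper names are hypothetical; the figures used are the ones quoted in the text.

```python
def period_ns_from_mhz(rate_mhz: float) -> float:
    """Clock period in nanoseconds for a clock rate in megahertz."""
    return 1000.0 / rate_mhz


def rate_mhz_from_ns(period_ns: float) -> float:
    """Clock rate in megahertz for a clock period in nanoseconds."""
    return 1000.0 / period_ns


if __name__ == "__main__":
    # Simulated maximum rate of the ten-input pseudo-one-phase gate.
    print(period_ns_from_mhz(62.5))          # 16.0
    # Measured clock period of the 15-input pseudo-two-phase gate.
    print(round(rate_mhz_from_ns(24.0), 2))  # 41.67
```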

## IV. CONCLUSIONS

In this paper, new differential CMOS logic circuits called the latched CMOS differential logic (LCDL) circuits are proposed and analyzed. The circuits can implement complex random logic functions and achieve a high operation speed. Moreover, the proposed logic circuits have no static-power-dissipation, no charge-sharing, and no clock-skew problems in the pipeline structure. The performance of the proposed LCDL circuits has been partly verified through an experimental chip.

#### References

- [1] L. G. Heller, W. R. Griffin, J. W. Davis, and N. G. Thoma, "Cascode voltage switch logic: A differential CMOS logic family," in *ISSCC Dig. Tech. Papers*, Feb. 1984, pp. 16–17.
- [2] T. A. Grotjohn and B. Hoefflinger, "Sample-set differential logic (SSDL) for complex high-speed VLSI," *IEEE J. Solid-State Circuits*, vol. SC-21, pp. 367-369, Apr. 1986.
- [3] S.-L. Lu, "Implementation of iterative networks with CMOS differential logic," *IEEE J. Solid-State Circuits*, vol. 23, pp. 1013-1017, Aug. 1988.
- [4] K. M. Chu and D. L. Pulfrey, "Design procedures for differential cascode voltage switch circuits," *IEEE J. Solid-State Circuits*, vol. SC-21, pp. 1082-1087, Dec. 1986.
- [5] K. M. Chu and D. L. Pulfrey, "A comparison of CMOS circuit techniques: Differential cascode voltage switch logic versus conventional logic," *IEEE J. Solid-State Circuits*, vol. SC-22, pp. 528-532, Aug. 1987.
- [6] Y. Suzuki, K. Odagawa, and T. Abe, "Clocked CMOS calculator circuitry," *IEEE J. Solid-State Circuits*, vol. SC-8, pp. 462–469, Dec. 1973.
- [7] J. Yuan, I. Karlsson, and C. Svensson, "A true single-phase-clock dynamic CMOS circuit technique," *IEEE J. Solid-State Circuits*, vol. SC-22, pp. 899-901, Oct. 1987.
- [8] J. A. Pretorius, A. S. Shubat, and C. A. T. Salama, "Latched domino CMOS logic," *IEEE J. Solid-State Circuits*, vol. SC-21, pp. 514-522, Aug. 1986.