
# BRIEF PAPER A Low-Power Level-Converting Double-Edge-Triggered Flip-Flop Design\*

## Li-Rong WANG<sup>†a)</sup>, Member, Kai-Yu LO<sup>†</sup>, and Shyh-Jye JOU<sup>†</sup>, Nonmembers

key words: double-edge-triggered, flip-flop, level-converting, sense amplifier, mixed threshold voltage

## 1. Introduction

Reducing power consumption is a key design goal for present semiconductor chips [1], and many techniques have been proposed to achieve it. Among them, the clustered voltage scaling (CVS) technique trades speed for power by using a lower supply voltage on noncritical paths [2]–[4]. In the CVS scheme, a level converter is required wherever a gate operating at the lower voltage (VDDL) drives a gate operating at the higher voltage (VDDH). If the voltage crossing occurs on a clock cycle boundary, the level converter can be merged into the pipelining flip-flop, which reduces both its delay and its power overhead. Therefore, incorporating a level converter into a flip-flop is an appropriate way to reduce the power consumption and delay overheads [5].

In addition to scaling the supply voltage, a double-edge clocking technique can be used to halve the clock power. A double-edge-triggered flip-flop (DETFF) [6]–[8] provides the same data throughput as a single-edge-triggered flip-flop (SETFF) at only half the clock rate [9], [10]. This is because a DETFF samples data at both the rising and falling edges of the clock, so that at a given clock frequency it handles twice the data rate of an SETFF. DETFFs were previously considered less attractive because they require an ideal 50% duty cycle. Recently, however, simple and efficient duty cycle correction designs have been proposed, for example in [11], which ease this stringent requirement. Moreover, the mixed-threshold voltage (MVT) CMOS design technique allows varying thresholds within a logic gate to reduce the power consumption [12]–[14].

Manuscript received January 3, 2013.

Manuscript revised April 30, 2013.

<sup>†</sup>The authors are with the Department of Electronics Engineering, National Chiao Tung University, HsinChu 30010, Taiwan, R.O.C.

\*This work was supported in part by National Chip Implementation Center and TSMC university shuttle program.

a) E-mail: lrwang.ee92g@nctu.edu.tw

DOI: 10.1587/transele.E96.C.1351

This paper describes a DETFF design that adopts a sense amplifier (SA) architecture. The architecture implicitly incorporates a level-converting capability, so the flip-flop can function at a lower clock rate with a lower data input voltage. The flip-flop is also implemented with the MVT process on its critical path to show that its power can be reduced further.

## 2. Level-Converting Double-Edge-Triggered Flip-Flops

## 2.1 Conventional Differential Static DETFF

**Fig. 1** (a) Conventional DS\_DET; and (b) input and clock parts connected to VDDL to build an implicit level-converting DS\_DET.

Figure 1(a) shows the conventional differential static DETFF (DS\_DET) [7], which comprises two differential SETFFs connected in parallel. Each SETFF has a pair of back-to-back inverters as the master stage and a C<sup>2</sup>MOS tri-state buffer as the slave stage. The behavior of the master stage is similar to that of a traditional 6T SRAM cell. When CK/CKb is at logic 0/1, Master\_1/Master\_2 is active/inactive because N5/N6 is turned on/off, respectively. Depending on the inputs D and Db, node J is evaluated as Db, and node K retains the originally stored data. Furthermore, Slave\_1/Slave\_2 is inactive/active, and Q is determined by the output of Slave\_2, which is the stored data. When the clock makes a positive edge transition, CK/CKb goes to logic 1/0, and the roles of SETFF\_1 and SETFF\_2 are exchanged: Q is now determined by Slave\_1, and the data stored at node J are transported to the output. When the clock makes a negative transition, the roles are exchanged again, so the two SETFFs alternate between sampling and transporting. The main advantages of this configuration are that it avoids stacking PMOS transistors over NMOS transistors and that it prevents floating nodes. In addition, because of its push-pull characteristics, the back-to-back inverter pair has the potential to provide an embedded voltage level-converting capability. However, this type of DETFF has a disadvantage: a relatively large crossover current exists in the internal nodes, causing significant delays and high power consumption. Moreover, when it is used as an embedded level converter, low-voltage inputs reduce the ability of the NMOS transistors to pull down the internal nodes. The contention between the PMOS pull-up devices and the NMOS pull-down devices hinders switching during state transitions and therefore increases the delay. The contention is aggravated when the voltages of the clock and input (VDDL) are significantly lower than the supply voltage of the DETFF (VDDH), possibly leading to a sampling malfunction.
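The alternating sampling and transporting behavior described above can be illustrated with a small behavioral model. This is only a sketch under simplifying assumptions: the class `BehavioralDETFF`, the helper `simulate`, and the stimulus are hypothetical names and values introduced for illustration, signal inversions (such as node J holding Db) are ignored, and no transistor-level effect such as contention or crossover current is modeled.

```python
# Behavioral sketch only: two level-sensitive storage paths plus an output
# selector. It reproduces the sampling order, not the transistor-level circuit.

class BehavioralDETFF:
    def __init__(self):
        self.j = 0  # path 1: tracks D while CK = 0, holds while CK = 1
        self.k = 0  # path 2: tracks D while CK = 1, holds while CK = 0
        self.q = 0

    def step(self, ck, d):
        """Apply one (CK, D) sample and return Q."""
        if ck == 0:
            self.j = d       # path 1 is transparent
            self.q = self.k  # output is driven by the holding path 2
        else:
            self.k = d       # path 2 is transparent
            self.q = self.j  # output is driven by the holding path 1
        return self.q


def simulate(samples):
    """Run a list of (CK, D) samples through a fresh flip-flop model."""
    ff = BehavioralDETFF()
    return [ff.step(ck, d) for ck, d in samples]


# Hypothetical stimulus: D changes while the clock is stable.
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (0, 1), (1, 1)]
print(simulate(stimulus))
```

In this model Q changes only when CK changes, and each clock edge transfers the value of D that was present just before that edge, which is the double-edge-triggered behavior the parallel SETFF arrangement is meant to provide.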

In addition, DS\_DET can be transformed into a level-converting DETFF by connecting the power supply of DS\_DET to VDDH and the supply of the clock and data input buffers/inverters to VDDL, as shown in Fig. 1(b), where the shaded gates are powered by VDDL. The sampled data voltage is thus shifted to VDDH in the master stage. However, this poses a potential risk when the flip-flop performs level conversion in a CVS design: a direct current (DC) path may exist between the two slave stages when low-voltage clock signals are used as control signals. The top-left corner of Fig. 1(b) shows the detailed circuit of the tri-state buffers for this case. When node J is in a low state (i.e., GND), the PMOS transistor MP1 is turned on. Transistor MP2 may also be turned on if (VDDH − VDDL) exceeds the threshold voltage of MP2. Therefore, if K is in a high state (i.e., VDDH), MN3 and MN4 may also be turned on. A DC path from VDDH to ground through MP1, MP2, MN3, and MN4 then forms, which consumes extra power.
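The turn-on condition for MP2 can be checked numerically. The sketch below is a first-order illustration only: the function name `mp2_may_conduct` is hypothetical, and using the 0.33 V standard threshold quoted in Sect. 3.2 as the magnitude of the PMOS threshold is an assumption, since the paper does not state separate NMOS and PMOS values. Body effect and subthreshold conduction are ignored.

```python
def mp2_may_conduct(vddh, vddl, vtp_abs):
    """First-order check: a PMOS with its source at VDDH and its gate driven
    only up to VDDL sees a source-gate voltage of (VDDH - VDDL) and may turn
    on when that exceeds the threshold magnitude."""
    return (vddh - vddl) > vtp_abs


# Operating point from Sect. 3.2; the threshold magnitude is an assumption.
print(mp2_may_conduct(vddh=1.2, vddl=0.84, vtp_abs=0.33))

# A hypothetical smaller voltage gap for comparison.
print(mp2_may_conduct(vddh=1.2, vddl=1.0, vtp_abs=0.33))
```

Under these assumptions the 1.2 V/0.84 V operating point leaves a 0.36 V gap, which is slightly above 0.33 V, so the DC path described above cannot be ruled out.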

## 2.2 Sense-Amplifier-Based Level-Converting DETFF

To produce an efficient level converter with lower power consumption and delay overheads, a sense-amplifier-based DETFF (SA\_DET) that integrates a level-converting circuit is first developed, as shown in Fig. 2. Its structure is the same as that of DS\_DET, except that the inverters of the master stages are replaced with sense amplifiers. When CK/CKb is at logic 0/1 and D/Db is at logic 1/0, Master\_1/Master\_2 is active/inactive. N1 is turned off to disable the discharge path of node I. N2 is turned on to pull down node J, and P1 is subsequently turned on to charge node I. The resulting positive feedback loop turns off the corresponding PMOS P2, which terminates the charging.

**Fig. 2** Sense-amplifier-based level-converting DETFF SA\_DET.

**Fig. 3** The proposed level-converting DETFF: (a) SA\_DET\_T; and (b) standard-V<sub>t</sub> cells substituted with LVT cells to construct SA\_DET\_TM.

This design has two advantages: (1) two NMOS transistors are eliminated, which saves both area and power; and (2) the positive feedback configuration turns the transistors on and off quickly, so that the internal nodes I and J are charged and discharged quickly. This shortens the transparency interval during state transitions, easing the relatively large crossover current noted for DS\_DET in Sect. 2.1. Thus, this circuit operates at lower power.

## 2.3 Sense-Amplifier-Based Level-Converting DETFF Adopting Transmission Gates

The SA\_DET above can be further improved, as proposed in Fig. 3(a), to eliminate the DC-path leakage problem illustrated in Fig. 1(b), thus enabling it to function correctly with lower data and clock input voltages. In this SA\_DET\_T design, the SA-based level-conversion stage is moved to the slave stage, and the master stage comprises a pair of transmission gates controlled by CK and CKb. No DC path appears because the data inputs, clock signals, and power supply of the master stages are all connected to VDDL. The wired-OR compaction of the SAs is possible because only one SA is active at a time. An additional benefit of SA\_DET\_T is that it resolves the charge-sharing problem that input data glitches cause in SA\_DET. In SA\_DET, when an internal node is at logic "0" while the master stage is floating and inactive, charge sharing results if the input data change during this inactive period. SA\_DET\_T instead dispatches the original data input to separate phases (D1/Db1/D2/Db2), which ensures a current path to ground that keeps the internal nodes from floating and thus eliminates the charge-sharing problem.

## 2.4 Mixed-V<sub>t</sub> Approach for Low-Power Designs

Circuits implemented with MVT transistors dissipate less power than those with only single-V<sub>t</sub> (SVT) transistors while offering comparable performance [14]. Combining the MVT and CVS techniques can reduce power or enhance circuit performance further [13]. Therefore, the MVT concept can be applied to SA\_DET\_T to form the SA\_DET\_TM circuit, as shown in Fig. 3(b). The inverters and transmission gates of the sampling stage, which are powered by VDDL, are replaced with low-V<sub>t</sub> (LVT) devices. In the SA latches, NMOS transistors N1 to N6 are replaced with LVT transistors. Moreover, the transistors of the output inverter are replaced with LVT transistors to enhance performance further.

## 3. Implementation and Simulation Results

## 3.1 DETFF Implementations

The proposed sense-amplifier-based DETFFs are implemented using both SVT and MVT technology. The preliminary sizing strategy is as follows. The width of the NMOS transistor, $w_n$, is set as the parameter of interest. The size of the PMOS located on the critical path is maintained at a fixed ratio to $w_n$. This ratio is determined by balancing the rising and falling edges of the output waveform of a typical inverter. For comparison, a level-converting version of DS\_DET [7] is also implemented.

## 3.2 Test Bench and Simulations

The simulation results are obtained from HSPICE simulations in the TSMC 130-nm CMOS process under the typical-corner operating condition. The reference VDDL-to-VDDH ratio is 70% [4]; therefore, VDDH is 1.2 V and VDDL is 0.84 V, at a temperature of 25°C. The standard V<sub>t</sub> is 0.33 V, and the low V<sub>t</sub> is 0.24 V. Additionally, input driving inverters are designed to provide realistic clock and data signals to the DETFFs, and fan-out-of-four (FO4) inverters are used as the standard load of the DETFFs. All DETFFs are operated at a 1 Gb/s data rate with a 500 MHz clock rate.
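The relation between the data rate and the clock rate in this test bench follows from the number of sampling edges per clock cycle. A minimal sketch, in which the function name `clock_hz_for_data_rate` is hypothetical and one data sample per active edge is assumed:

```python
def clock_hz_for_data_rate(data_rate_hz, edges_per_cycle):
    """Clock frequency needed when one data sample is taken per active edge."""
    return data_rate_hz / edges_per_cycle


DATA_RATE_HZ = 1e9  # data rate used in the test bench

detff_clock = clock_hz_for_data_rate(DATA_RATE_HZ, edges_per_cycle=2)
setff_clock = clock_hz_for_data_rate(DATA_RATE_HZ, edges_per_cycle=1)

print(f"DETFF clock: {detff_clock / 1e6:.0f} MHz")
print(f"SETFF clock: {setff_clock / 1e6:.0f} MHz")
```

A DETFF therefore needs a 500 MHz clock for this data rate, whereas an SETFF handling the same data would need a clock at twice that frequency.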

Three timing parameters are specified. The first is the setup time of a register ($T_{su}$), and the second is the delay between the clock edge and the output edge ($T_{cq}$). The method introduced in [15] is used to measure $T_{su}$ and $T_{cq}$. The third, the insertion delay $T_{ins}$ of a register, is the sum of $T_{su}$ and $T_{cq}$ and represents the delay overhead caused by the level-converting and pipelining register. To elucidate the energy efficiency of the proposed circuit, the power-delay product (PDP) is selected as the figure of merit for comparing all the DETFFs. Additionally, three voltage parameters are defined: $V_{\text{clock}}$, $V_{\text{data}}$, and $V_{\text{out}}$ are the clock input voltage, data input voltage, and output voltage, respectively. In Experiment 1, all DETFFs and their inputs are supplied by VDDH. Experiments 2 and 3 focus on the level-converting function in a CVS environment. In Experiment 2, $V_{\text{data}}$ is changed to VDDL, and the output is ideally converted to VDDH. In Experiment 3, $V_{\text{clock}}$ is also lowered to VDDL. All simulation results are shown in Table 1.
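The two derived metrics can be written out explicitly. The sketch below assumes that the PDP in Table 1 is the product of the average power and the insertion delay, which is consistent with the tabulated values; the function names are hypothetical, and the $T_{su}$/$T_{cq}$ split in the first call is made up for illustration because the paper reports only $T_{ins}$.

```python
def insertion_delay_ps(t_su_ps, t_cq_ps):
    """T_ins = T_su + T_cq, all in picoseconds."""
    return t_su_ps + t_cq_ps


def pdp_fj(p_avg_uw, t_ins_ps):
    """Power-delay product in femtojoules.

    1 uW * 1 ps = 1e-18 J = 1e-3 fJ, hence the scale factor.
    """
    return p_avg_uw * t_ins_ps * 1e-3


# Hypothetical setup and clock-to-output times, for illustration only.
print(insertion_delay_ps(t_su_ps=50.0, t_cq_ps=150.0))

# DS_DET in Experiment 1, using P_avg and T_ins from Table 1.
print(round(pdp_fj(p_avg_uw=32.93, t_ins_ps=209.25), 2))
```

With the Table 1 entries for DS\_DET in Experiment 1 (32.93 µW and 209.25 ps), this product rounds to the tabulated 6.89 fJ.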

Table 1 indicates that DS\_DET has the highest power consumption of all the DETFFs. In Experiment 1, SA\_DET and SA\_DET\_T reduce the power consumption relative to DS\_DET by 29% and 50%, respectively. In Experiment 3, they achieve power savings of 35% and 57%, respectively. Unlike DS\_DET and SA\_DET, SA\_DET\_T inhibits static currents when VDDL clock signals drive PMOS transistors powered by VDDH. This leads to a greater power reduction and, subsequently, to the smallest PDP among the SVT designs. In Experiments 1 to 3, SA\_DET\_T reduces the PDP relative to DS\_DET by 47%, 59%, and 64%, respectively. With the MVT technique, the DETFF offers further improvements in power consumption and delay: Table 1 shows that SA\_DET\_TM offers PDP reductions of 56%, 72%, and 78%, respectively, relative to DS\_DET.
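The percentages quoted above can be recomputed from the PDP and power columns of Table 1. The following sketch copies those values from the table; the dictionary layout and the helper name `reduction_pct` are hypothetical choices made for illustration.

```python
# PDP values in fJ from Table 1, one entry per experiment (Ex1, Ex2, Ex3).
PDP_FJ = {
    "DS_DET":    [6.89, 9.72, 14.78],
    "SA_DET":    [4.70, 6.01, 8.61],
    "SA_DET_T":  [3.64, 4.01, 5.27],
    "SA_DET_TM": [3.02, 2.74, 3.31],
}

# Average power in uW from Table 1 for Experiments 1 and 3.
P_AVG_UW = {
    "DS_DET":   {"Ex1": 32.93, "Ex3": 36.13},
    "SA_DET":   {"Ex1": 23.26, "Ex3": 23.45},
    "SA_DET_T": {"Ex1": 16.50, "Ex3": 15.45},
}


def reduction_pct(value, baseline):
    """Reduction relative to the baseline, rounded to a whole percent."""
    return round(100 * (1 - value / baseline))


for design in ("SA_DET_T", "SA_DET_TM"):
    reductions = [
        reduction_pct(v, b) for v, b in zip(PDP_FJ[design], PDP_FJ["DS_DET"])
    ]
    print(design, "PDP reduction (%):", reductions)

for exp in ("Ex1", "Ex3"):
    base = P_AVG_UW["DS_DET"][exp]
    print(exp, "power reduction (%):",
          reduction_pct(P_AVG_UW["SA_DET"][exp], base),
          reduction_pct(P_AVG_UW["SA_DET_T"][exp], base))
```

Taking DS\_DET as the 100% baseline in each experiment reproduces the figures stated in the text.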

Furthermore, VDDH sweeps are simulated. The dependencies of the power consumption and insertion delay on VDDH are shown in Figs. 4(a)–(c). For example, Fig. 4(c) shows the simulation results for $V_{clock}/V_{data} = VDDL/VDDL$, where VDDL ranges from 0.63 V to 0.84 V and $V_{out}$ is converted upward to VDDH, which ranges from 0.9 V to 1.2 V. Note that DS\_DET cannot work properly when VDDH = 0.9 V. The results show that DS\_DET is slow and power-hungry. SA\_DET\_T and SA\_DET\_TM offer competitive performance because of their good speed properties. The delay of SA\_DET\_T approaches that of SA\_DET in the case of $V_{clock}/V_{data} = VDDL/VDDL$, rendering them suitable for the voltage conversion scenario.
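The VDDL values in the sweep follow from the 70% VDDL-to-VDDH ratio used in the test bench. A brief sketch, in which the helper name `vddl_for` and the intermediate sweep points are assumptions (the text gives only the end points of the sweep):

```python
def vddl_for(vddh, ratio=0.70):
    """VDDL at a fixed VDDL-to-VDDH ratio, rounded to millivolts."""
    return round(vddh * ratio, 3)


# End points are from the text; the 0.1 V step in between is hypothetical.
for vddh in (0.9, 1.0, 1.1, 1.2):
    print(f"VDDH = {vddh:.1f} V -> VDDL = {vddl_for(vddh):.2f} V")
```

The end points 0.9 V and 1.2 V map to 0.63 V and 0.84 V, matching the VDDL range stated for Fig. 4(c).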

| Design Name | Ex1 P<sub>avg</sub> (µW) | Ex1 T<sub>ins</sub> (ps) | Ex1 PDP (fJ) | Ex2 P<sub>avg</sub> (µW) | Ex2 T<sub>ins</sub> (ps) | Ex2 PDP (fJ) | Ex3 P<sub>avg</sub> (µW) | Ex3 T<sub>ins</sub> (ps) | Ex3 PDP (fJ) |
|-------------|--------------|---------------|-------------|--------------|---------------|-------------|--------------|---------------|--------------|
| DS_DET [7]  | 32.93 (100%) | 209.25 (100%) | 6.89 (100%) | 34.61 (100%) | 280.70 (100%) | 9.72 (100%) | 36.13 (100%) | 409.10 (100%) | 14.78 (100%) |
| SA_DET      | 23.26 (71%)  | 202.10 (97%)  | 4.70 (68%)  | 23.69 (68%)  | 253.60 (90%)  | 6.01 (62%)  | 23.45 (65%)  | 367.10 (90%)  | 8.61 (58%)   |
| SA_DET_T    | 16.50 (50%)  | 220.47 (105%) | 3.64 (53%)  | 15.68 (45%)  | 255.63 (91%)  | 4.01 (41%)  | 15.45 (43%)  | 340.80 (83%)  | 5.27 (36%)   |
| SA_DET_TM   | 16.43 (50%)  | 183.99 (88%)  | 3.02 (44%)  | 13.86 (40%)  | 197.89 (70%)  | 2.74 (28%)  | 13.26 (37%)  | 249.80 (61%)  | 3.31 (22%)   |

**Table 1** Comparison of DETFFs for delay, power consumption, and PDP under various supply voltages. Ex1: $V_{\rm clock}/V_{\rm data}$ = 1.2 V/1.2 V; Ex2: $V_{\rm clock}/V_{\rm data}$ = 1.2 V/0.84 V; Ex3: $V_{\rm clock}/V_{\rm data}$ = 0.84 V/0.84 V. Percentages in parentheses are relative to DS\_DET.

**Fig. 4** Power consumption and insertion delay dependencies of the supply voltage: (a) $V_{clock}/V_{data} = VDDH/VDDH$; (b) $V_{clock}/V_{data} = VDDH/VDDL$; and (c) $V_{clock}/V_{data} = VDDL/VDDL$.

## 4. Conclusion

This paper has proposed a new implicitly level-converting double-edge-triggered flip-flop (DETFF) based on the SA latch structure. Because of the positive feedback of the sense amplifier, it has less delay and lower power consumption. In simulation, when implemented in a 130-nm single-$V_t$ process with 0.84 V clock and data inputs, it achieves a 64% power-delay product (PDP) improvement over the classic DETFF design. When implemented in the same technology with mixed $V_t$, it achieves a 78% improvement in PDP. Thus, it is suitable for low-power and low-voltage applications.

## References

- [1] D. Perlmutter, "Sustainability in silicon and systems development," IEEE ISSCC Dig. Tech. Papers, pp.31–35, Feb. 2012.
- [2] K. Usami, M. Igarashi, F. Minami, T. Ishikawa, M. Kanazawa, M. Ichida, and K. Nogami, "Automated low-power technique exploiting multiple supply voltages applied to a media processor," IEEE J. Solid-State Circuits, vol.33, no.3, pp.463–472, March 1998.
- [3] M. Hamada, M. Takahashi, H. Arakida, A. Chiba, T. Terazawa, T. Ishikawa, M. Kanazawa, M. Igarashi, K. Usami, and T. Kuroda, "A top-down low power design technique using clustered voltage scaling with variable supply-voltage scheme," Proc. IEEE Custom Integrated Circuits Conference, pp.495–498, May 1998.
- [4] F. Ishihara, F. Sheikh, and B. Nikolić, "Level conversion for dual-supply systems," IEEE Trans. Very Large Scale Integration Systems, vol.12, no.2, pp.185–194, Feb. 2004.
- [5] P. Zhao, J. McNeely, P. Golconda, M. Bayoumi, R. Barcenas, and W. Kuang, "Low-power clock branch sharing double-edge triggered flip-flop," IEEE Trans. Very Large Scale Integration Systems, vol.15, no.3, pp.338–344, March 2007.
- [6] R. Hossain, L. Wronski, and A. Albicki, "Low power design using double edge triggered flip-flops," IEEE Trans. Very Large Scale Integration Systems, vol.2, no.2, pp.261–265, June 1994.
- [7] W. Chung, T. Lo, and M. Sachdev, "A comparative analysis of low-power low-voltage dual-edge-triggered flip-flops," IEEE Trans. Very Large Scale Integration Systems, vol.10, no.6, pp.913–918, Dec. 2002.
- [8] C. Kim and S. Kang, "A low-swing clock double-edge triggered flip-flop," IEEE J. Solid-State Circuits, vol.37, no.5, pp.648–652, May 2002.
- [9] P. Zhao, J. McNeely, P. Golconda, S. Venigalla, N. Wang, M. Bayoumi, W. Kuang, and L. Downey, "Low-power clocked-pseudo-NMOS flip-flop for level conversion in dual supply systems," IEEE Trans. Very Large Scale Integration Systems, vol.17, no.9, pp.1196– 1202, Sept. 2009.
- [10] Y. Min, C. Jeong, K. Kim, W. Choi, J. Son, C. Kim, and S. Kim, "A 0.31–1 GHz fast-corrected duty-cycle corrector with successive approximation register for DDR DRAM applications," IEEE Trans. Very Large Scale Integration Systems, vol.20, no.8, pp.1524–1528, Aug. 2012.
- [11] N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, 4th ed., Addison Wesley, Boston, 2011.
- [12] L. Wei, Z. Chen, and K. Roy, "Mixed-V<sub>th</sub> (MVT) CMOS circuit design methodology for low power applications," Proc. 36th Design Automation Conference, pp.430–435, June 1999.
- [13] A. Srivastava and D. Sylvester, "Minimizing total power by simultaneous V<sub>dd</sub>/V<sub>th</sub> assignment," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol.23, no.5, pp.665–677, May 2004.
- [14] L.R. Wang, Y.W. Chiu, C.L. Hu, M.H. Tu, S.J. Jou, and C.L. Lee, "A reconfigurable MAC architecture implemented with mixed-V<sub>t</sub> standard cell library," Proc. IEEE International Symposium on Circuits and Systems, pp.3426–3429, May 2008.
- [15] B. Nikolić, V. Oklobdžija, V. Stojanović, W. Jia, J. Chiu, and M. Leung, "Improved sense-amplifier-based flip-flop: Design and measurements," IEEE J. Solid-State Circuits, vol.35, no.6, pp.876–883, June 2000.