## Chapter 1 Introduction

In the deep-submicron era, power dissipation is becoming a major design constraint, not only in portable applications but also in high-performance VLSI systems. In a highly synchronous system, the clock network accounts for 20% to 45% of the overall system power dissipation, and the clocked storage elements consume about 90% of the clock network power. If we can design a low-power clocked storage element (for example, one that consumes 20% less power), the total system power drops by roughly 4% to 8% (0.2 × 0.9 × 20% to 45%).

The clocked storage elements also have a great effect on system performance. In a 2 GHz processor, the clock cycle is only 500 ps. The typical D-to-Q delay of a clocked storage element is on the order of 100 ps to 150 ps, which occupies 20% to 30% of the cycle. If we can design a faster clocked storage element (for example, with a D-to-Q delay of 80 ps to 100 ps), we can obtain a 10% to 15% performance improvement. As a result, high-performance, low-power clocked storage elements are a key concern in digital circuit design. In this thesis, I focus on the design of low-power, high-performance clocked storage elements and propose several new circuits.

Chapter 2 discusses low-power digital circuit design concepts, with emphasis on device characteristics and circuit design concepts. Device-related topics such as channel length, area, speed, leakage, wire delay, and the MTCMOS technique are discussed, and several circuit design concepts are also introduced.

Pulsed latches are discussed and designed in chapter 3. A single edge-triggered pulsed latch (LSCCFF) is proposed. This flip-flop uses the low-swing voltage technique, the conditional capture technique, and the stack technique to improve the delay chain in the pulsed latch, reducing both dynamic and static power. A double edge-triggered latch (CPDFF) is also presented. This flip-flop uses the conditional precharge technique to reduce dynamic power consumption, and it can cut the clock power in half without a loss in throughput.

Clock gating techniques are discussed in chapter 4. A new gating circuit is presented that combines the clock gating, signal gating, and pipeline gating techniques. The scan-retention mechanism is also discussed in chapter 4. The scan mechanism is used to test the function of a chip, and the data retention mechanism preserves the stored value in sleep mode. The CPDFF is redesigned with the scan-retention mechanism and simulated.

In chapter 5, a reconfigurable FIFO cell is designed for low power and high performance. It can reconfigure its memory size for different input data lengths to avoid redundant power consumption, and it also supports size expansion.

Finally, the overall results are presented in chapter 6.



## Chapter 2

# Low Power Digital Circuit Design Concepts and Overview of Flip-Flop and First-In-First-Out Register Files

## 2.1 Introduction

Although analog ICs, RF ICs, and wireless applications have become hot research topics in recent years, digital circuits still make up the major part of a chip. In a conventional IC, about 90% of the area is occupied by digital circuits. To be a good IC designer, one has to study the basic digital circuit design concepts first.

In this chapter, some digital circuit design concepts are discussed. Digital circuit design can be considered at several layers: the system layer, algorithm layer, architecture layer, circuit layer, and device layer. This chapter emphasizes only the circuit layer and the device layer.

Section 2.2 presents some device characteristics and discusses the basic concepts of device size, power, and performance [1]. Section 2.3 studies several circuit design techniques that are used in the later chapters.

The flip-flop is the most common clocked element in synchronous systems. It samples and stores data at the clock edges. It keeps the system operating in order and increases the system throughput through pipelining. An overview of flip-flops is given in section 2.4, and new flip-flop circuits are designed in chapter 3.

The first-in-first-out register file (FIFO) is a special type of memory. The major difference between a FIFO and a conventional random access memory (RAM) is the access order. In a FIFO cell, data is written into the memory in order and read out in the same order, so data stored earlier is read out earlier. This access method shortens the access time and reduces the complexity of the decoder block. Section 2.5 gives an overview of the FIFO cell, and a new reconfigurable FIFO is designed in chapter 5.

## 2.2 Device Characteristic

## 2.2.1 Basic device characteristics

As technology scales in modern VLSI design, more and more transistors are integrated into a chip, making it possible to embed more and more function blocks in a system. Moore's Law predicts that semiconductor technology doubles its effectiveness every 18 months, and the prediction still holds.

Figure 2.1 shows the transistor counts of the Intel Pentium series. The counts grow exponentially, which indirectly supports Moore's Law.



Figure 2.2 plots the integration density of memory as a function of time. As shown in the figure, the integration complexity doubles approximately every 1 to 2 years. As a result, memory density has increased by more than a thousandfold since 1970.



Figure 2.3 shows the evolution in speed. The clock frequency doubles every three years and is still increasing. As of now, these trends still follow Moore's Law and show no signs of slowing down.

#### 2.2.2 Power issue

In the deep-submicron era, integrating more and more transistors into one chip brings out the power problem. Figure 2.4 shows the trend in processor power consumption. The power consumption is increasing exponentially and may reach 18 kW by 2008. High power consumption shortens battery life in portable applications and makes it difficult to cool the system. To resolve the thermal problem, more and more low-power techniques and technologies have been proposed. Some of them are introduced in section 2.3.



Figure 2.5 shows the leakage power problem in the deep-submicron era. The leakage current increases as the technology scales down and wastes a significant amount of power. To suppress the sub-threshold leakage and reduce the standby power, the reverse body-bias technique, the power gating technique, and the MTCMOS technique have been proposed.



### 2.2.3 Wire delay

As the technology scales down, the gate delay shrinks every generation and becomes comparable to the wire delay, so the wire delay is becoming a dominant term that can no longer be neglected. Interconnect problems reduce reliability and affect performance and power consumption. Figure 2.6 shows wire models composed of resistance, capacitance, and inductance. These models can be used to estimate the interconnect effects in a circuit.



Fig. 2.6 Wire Models



Fig. 2.7 Elmore delay model

The Elmore model is more accurate than the lumped model; it is illustrated in figure 2.7. The equivalent resistance and delay are given below:

$$R_{ik} = \sum_{j} R_j, \quad R_j \in \left[ path(s \to i) \cap path(s \to k) \right]$$

$$\tau_{Di} = \sum_{k=1}^{N} C_k R_{ik}$$
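The two Elmore formulas above can be evaluated mechanically on any RC tree. The following is a minimal sketch, assuming the tree is described by a parent index per node; the function name `elmore_delays` and the three-node example values are hypothetical and only serve as an illustration.

```python
def elmore_delays(parent, R, C):
    """Elmore delay at every node of an RC tree.

    parent[k] is the index of the node feeding node k (-1 = the source s),
    R[k] is the resistance of the branch entering node k,
    C[k] is the capacitance attached to node k.
    """
    n = len(parent)

    def path(i):
        # Set of branches (named by their end node) on the path s -> i
        nodes = set()
        while i != -1:
            nodes.add(i)
            i = parent[i]
        return nodes

    paths = [path(i) for i in range(n)]
    delays = []
    for i in range(n):
        tau = 0.0
        for k in range(n):
            # R_ik: resistance shared by path(s -> i) and path(s -> k)
            r_ik = sum(R[j] for j in paths[i] & paths[k])
            tau += C[k] * r_ik
        delays.append(tau)
    return delays


# Hypothetical tree: node 0 hangs off the source, nodes 1 and 2 branch from node 0
parent = [-1, 0, 0]
R = [1.0, 2.0, 3.0]
C = [1.0, 1.0, 1.0]
delays = elmore_delays(parent, R, C)
print(delays)
```

Only the resistance shared with the path to node i is charged against each capacitor, which is why the side branch to node 2 adds to the delay at node 1 through the common resistance of branch 0 alone.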

A long wire with no branches is modeled in figure 2.8. The wire can be divided into N equal-length segments, and the equivalent model is used to compute the delay. The resulting delay is also shown in figure 2.8.



Fig. 2.8 Non-branched RC model
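As a hedged illustration of the non-branched model, the sketch below applies the Elmore sum to a wire of total resistance R and capacitance C split into N equal segments. Under that assumption the sum has the well-known closed form RC(N+1)/(2N), which tends to RC/2 for large N. The function names are hypothetical.

```python
def ladder_delay(R, C, N):
    """Elmore delay at the far end of a wire split into N equal RC segments."""
    r, c = R / N, C / N
    # Capacitor k sees the resistance of the first k segments
    return sum(c * (k * r) for k in range(1, N + 1))


def ladder_delay_closed_form(R, C, N):
    """Closed form of the same sum: RC * (N + 1) / (2N)."""
    return R * C * (N + 1) / (2 * N)


for n_segments in (1, 2, 10, 1000):
    print(n_segments, ladder_delay(1.0, 1.0, n_segments))
```

With N = 1 the result is the lumped value RC, and as N grows it approaches RC/2, consistent with the distributed RC behavior discussed in this section.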

The step responses for different values of N are simulated. The results are listed in table 2.1 and illustrated in figure 2.9. They show that the distributed RC model is closer to the real behavior. The error of the lumped RC model can be reduced by multiplying it by a weighting factor.

| Voltage range                     | Lumped RC  | Distributed RC |
|-----------------------------------|------------|----------------|
| $0 \rightarrow 50\%$ ($t_p$)      | $0.69\,RC$ | $0.38\,RC$     |
| $0 \rightarrow 63\%$ ($\tau$)     | $RC$       | $0.5\,RC$      |
| $10\% \rightarrow 90\%$ ($t_r$)   | $2.2\,RC$  | $0.9\,RC$      |
| $0\% \rightarrow 90\%$            | $2.3\,RC$  | $1.0\,RC$      |

Table 2.1 Step response of lumped and distributed RC model
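The lumped column of Table 2.1 can be checked from the single-pole step response v(t) = 1 - exp(-t/RC). The sketch below assumes that response and reports times in units of RC; the helper name `lumped_time` is hypothetical. The distributed column is not derived here and is taken from the table as given.

```python
import math


def lumped_time(v_from, v_to):
    """Time (in units of RC) for a single-pole step response to go from
    fraction v_from to fraction v_to of the final value."""
    return math.log((1.0 - v_from) / (1.0 - v_to))


t_50 = lumped_time(0.0, 0.5)                   # ln 2
t_63 = lumped_time(0.0, 1.0 - math.exp(-1.0))  # one time constant
t_rise = lumped_time(0.1, 0.9)                 # ln 9
t_90 = lumped_time(0.0, 0.9)                   # ln 10
print(t_50, t_63, t_rise, t_90)
```

The ratio between the two columns (for example 0.38/0.69 for the 50% point) is one way to read the weighting factor applied to the lumped model.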



## 2.3 Circuit Design Concepts and techniques

### 2.3.1 Buffering

In digital circuit design, it is often necessary to drive a load with a large capacitance. Driving the load directly usually consumes more power and takes a long time. The buffering technique introduced in this section uses an inverter chain as a buffer to provide a larger driving current, which can reduce both the power and the delay.



Fig. 2.10 Inverter buffer chain to drive an output load

Figure 2.10 shows the buffer chain. It uses an N-stage buffer to drive an output load with capacitance  $C_L$ . The propagation delay of each inverter stage is described below, where  $t_{p,j}$  is the propagation delay of the j-th stage,  $t_{p0}$  is the intrinsic delay of an unloaded inverter, and  $t_p$  is the total propagation delay.

$$t_{p} = t_{p,1} + t_{p,2} + \dots + t_{p,N}$$

$$t_{p,j} \sim R_{unit} C_{unit} \left( 1 + \frac{C_{gin,j+1}}{\gamma C_{gin,j}} \right)$$

$$t_{p} = \sum_{j=1}^{N} t_{p,j} = t_{p0} \sum_{j=1}^{N} \left( 1 + \frac{C_{gin,j+1}}{\gamma C_{gin,j}} \right), \ C_{gin,N+1} = C_{L}$$

The delay equation has N - 1 unknowns,  $C_{gin,2}$  through  $C_{gin,N}$ . To minimize the delay, we take the N - 1 partial derivatives and set them to zero, which gives  $C_{gin,j+1}/C_{gin,j} = C_{gin,j}/C_{gin,j-1}$ . The size of each stage is therefore the geometric mean of its two neighbors, as shown below. This also means that every stage has the same effective fanout ( $C_{out}/C_{in}$ ) and the same propagation delay.

$$C_{gin,j} = \sqrt{C_{gin,j-1}C_{gin,j+1}}$$

Assume that each stage is scaled up by the same factor f and therefore has the same effective fanout f. The effective fanout of each stage is then:

$$f^{N} = F = C_{L} / C_{gin,1}$$
$$f = \sqrt[N]{F}$$

Combining these equations, the minimum path delay is shown:

$$t_p = N t_{p0} \left( 1 + \sqrt[N]{F} / \gamma \right)$$

For a given load  $C_L$  and a given input capacitance  $C_{in}$ , the optimal sizing factor f can be found as follows.

$$C_L = F \cdot C_{in} = f^N C_{in}, \quad N = \frac{\ln F}{\ln f}$$

Combining these equations, the delay formula is obtained.

$$t_{p} = Nt_{p0} \left( F^{1/N} / \gamma + 1 \right) = \frac{t_{p0} \ln F}{\gamma} \left( \frac{f}{\ln f} + \frac{\gamma}{\ln f} \right)$$
$$\frac{\partial t_{p}}{\partial f} = \frac{t_{p0} \ln F}{\gamma} \cdot \frac{\ln f - 1 - \gamma/f}{\ln^{2} f} = 0$$

Figure 2.11 plots the function f/ln(f) against the effective fanout f for  $\gamma = 0$ ; the optimum fanout is *e*. When  $\gamma = 1$ , the optimum fanout is about 3.6.



Figure 2.12 shows an example of choosing the number of stages and the effective fanout.
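The optimum fanout condition ln f - 1 - γ/f = 0 has no closed form for γ > 0, but it can be rewritten as f = exp(1 + γ/f) and solved by fixed-point iteration. The sketch below does this and also picks the integer stage count that minimizes the chain delay formula of this section. The function names and the overall fanout F = 64 are hypothetical illustration values, not the example of figure 2.12.

```python
import math


def optimal_fanout(gamma, iterations=100):
    """Solve ln(f) - 1 - gamma/f = 0 by iterating f = exp(1 + gamma/f)."""
    f = math.e
    for _ in range(iterations):
        f = math.exp(1.0 + gamma / f)
    return f


def chain_delay(F, N, gamma, tp0=1.0):
    """t_p = N * t_p0 * (1 + F**(1/N) / gamma); requires gamma > 0."""
    f = F ** (1.0 / N)
    return N * tp0 * (1.0 + f / gamma)


def best_stage_count(F, gamma, max_stages=20):
    """Integer number of stages with the smallest chain delay."""
    return min(range(1, max_stages + 1), key=lambda N: chain_delay(F, N, gamma))


f_opt_0 = optimal_fanout(0.0)         # gamma = 0
f_opt_1 = optimal_fanout(1.0)         # gamma = 1
n_best = best_stage_count(64.0, 1.0)  # assumed overall fanout F = 64
print(f_opt_0, f_opt_1, n_best)
```

Because N must be an integer, the per-stage fanout actually used is F^(1/N) for the chosen N, which generally differs slightly from the continuous optimum.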



#### 2.3.2 Stack Effect

The stack technique uses an NMOS or PMOS transistor to gate the combinational logic from Vdd or ground, as illustrated in figure 2.13, where a simple inverter serves as the combinational logic. Figure 2.13(1) shows the typical inverter, figure 2.13(2) the p-type stack transistor, figure 2.13(3) the n-type stack transistor, and figure 2.13(4) the np-type stack transistor. The simulation results of the stack technique are listed in table 2.2 and illustrated in figure 2.14. The data is simulated with HSPICE in TSMC 0.1 µm technology.



Fig.2.13 Stack technique for inverter (1) typical inverter (2)p-stack (3) n-stack (4) np-stack



Fig. 2.14 Normalized results of the Stack Technique

|             | Static power, input 0 | Static power, input 1 | Delay, 0→1   | Delay, 1→0   | Avg. power   | Avg. delay   | PDP          |
|-------------|-----------------------|-----------------------|--------------|--------------|--------------|--------------|--------------|
| 1. inverter | 1.572940E-08          | 4.032679E-08          | 5.653149E-12 | 5.916880E-12 | 2.802810E-08 | 5.785015E-12 | 1.621429E-19 |
| 2. p-stack  | 7.269466E-09          | 3.111979E-09          | 5.871837E-12 | 9.766997E-12 | 5.190723E-09 | 7.819417E-12 | 4.058842E-20 |
| 3. n-stack  | 1.532137E-09          | 7.266838E-09          | 7.802740E-12 | 6.042182E-12 | 4.399488E-09 | 6.922461E-12 | 3.045528E-20 |
| 4. np-stack | 1.520219E-09          | 2.777810E-09          | 8.234439E-12 | 1.004955E-11 | 2.149015E-09 | 9.141995E-12 | 1.964628E-20 |

Table 2.2 The Simulation Result of the Stack technique

As the results in table 2.2 show, the stack technique can reduce the leakage power by 82% to 93% at a small performance penalty. It is an efficient way to suppress the sub-threshold leakage current.
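The derived columns of Table 2.2 can be recomputed from the four measured columns. The sketch below assumes the average power is the mean of the two static-power columns, the average delay is the mean of the two transition delays, and PDP is their product; these assumptions reproduce the tabulated values. The dictionary layout and function name are hypothetical.

```python
# (static power input 0, static power input 1, delay 0->1, delay 1->0) from Table 2.2
rows = {
    "inverter": (1.572940e-08, 4.032679e-08, 5.653149e-12, 5.916880e-12),
    "p-stack": (7.269466e-09, 3.111979e-09, 5.871837e-12, 9.766997e-12),
    "n-stack": (1.532137e-09, 7.266838e-09, 7.802740e-12, 6.042182e-12),
    "np-stack": (1.520219e-09, 2.777810e-09, 8.234439e-12, 1.004955e-11),
}


def summarize(p0, p1, d01, d10):
    """Return (average power, average delay, power-delay product)."""
    avg_power = (p0 + p1) / 2.0
    avg_delay = (d01 + d10) / 2.0
    return avg_power, avg_delay, avg_power * avg_delay


summary = {name: summarize(*values) for name, values in rows.items()}
base_power, base_delay, _ = summary["inverter"]
for name, (power, delay, pdp) in summary.items():
    reduction = 1.0 - power / base_power   # average leakage saved vs. the plain inverter
    slowdown = delay / base_delay          # average delay relative to the plain inverter
    print(f"{name:9s} saved={reduction:.3f} delay x{slowdown:.2f} PDP={pdp:.6e}")
```

This comparison uses the averaged static power only; per-input comparisons give different percentages for each configuration.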

## 2.3.3 MTCMOS design and Power gating

The leakage current increases because the threshold voltage is lowered to raise the operating speed. To retain high performance and reduce the sub-threshold leakage current at the same time, the MTCMOS technique has been proposed [2][3][4]. The MTCMOS technique uses two kinds of transistors: high-Vt transistors and low-Vt transistors. Low-Vt transistors are used in the critical path of the circuit to retain the operating speed, and high-Vt transistors are used in the non-critical paths to reduce the leakage current.

|                 | PMOS (10/0.1) Vtp | NMOS (10/0.1) Vtn |
|-----------------|-------------------|-------------------|
| Standard Vt (V) | -0.289            | 0.249             |
| High Vt (V)     | -0.37866          | 0.3396            |
| Low Vt (V)      | -0.296            | 0.252             |

Table 2.3 The comparison of different Vt devices

Table 2.3 shows the threshold voltages in the single-Vt technology and the dual-Vt technology. The results are simulated in TSMC 100 nm technology. The threshold voltage of the standard-Vt device is close to that of the low-Vt device.



The power gating technique is widely used in modern digital circuit design [5][6]. It is illustrated in figure 2.15. It uses stacked transistors to gate the leakage current of the logic circuit. The circuit operates in two modes: active mode and sleep mode. In active mode, the SL signal is held high to enable the logic function. In sleep mode, the SL signal is disabled and the leakage current is reduced by the stack effect.

#### 2.3.4 Transistor size optimum

In digital circuit design, the size of each transistor can greatly affect performance and power consumption. Deciding the optimum transistor size is therefore important.

We can increase the width of the transistors in the critical path. This increases the operating speed but also the power. Alternatively, we can decrease the transistor width to reduce the parasitic capacitance, which shortens the time spent charging and discharging. However, it also reduces the drive current and slows down the circuit. Selecting the most efficient way to optimize a circuit is not easy.

HSPICE can also be used to find the optimal point. We set a target, constrain the allowed range of transistor widths, and add any other restrictions. The CAD tool then finds the optimal values automatically.

## 2.3.5 Unbalanced latch

In digital circuit design, the latch is widely used to store data or to keep the value of a node stable. The latch cell is composed of two cross-coupled inverters, as illustrated in figure 2.16.



This structure holds the data and stabilizes the node value against glitches. However, the loop also forms a feedback path to the input node, which slows down the operation and increases the power consumption. An unbalanced latch can be used to resolve this problem. It uses a larger inverter in the forward direction and a smaller one in the backward direction. The stronger forward inverter speeds up the charging and discharging of the nodes.

## 2.3.6 Reducing effective capacitance technique

The dynamic power consumption is proportional to the effective capacitance, so reducing the effective capacitance is an efficient way to reduce the dynamic power. For example, the clock tree usually has a large parasitic capacitance. If we separate the clock tree into a global clock and local clocks, we can save a lot of power in switching the clock signal. This method is used in the clock gating technique [7][8]. Another example is the separated bitline used in memory cells [9][10]. The separated bitline technique divides the whole bitline into several segments and uses the decoder to select which one is activated. It reduces the effective bitline capacitance and saves read/write power.
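A rough way to see the effect of splitting the clock tree is the standard dynamic power expression P = α·C·Vdd²·f. The sketch below is only an illustration under assumed numbers: the capacitance values, the supply voltage, the clock frequency, and the fraction of time the local clock is enabled are all hypothetical, and the function names are not from any tool.

```python
def dynamic_power(alpha, c_eff, vdd, freq):
    """Standard dynamic power estimate: alpha * C * Vdd^2 * f."""
    return alpha * c_eff * vdd ** 2 * freq


def gated_clock_power(c_global, c_local, enable_fraction, vdd, freq):
    """Clock power when the global part always toggles and the local part
    toggles only during the fraction of cycles in which it is enabled."""
    c_eff = c_global + enable_fraction * c_local
    return dynamic_power(1.0, c_eff, vdd, freq)


# Hypothetical values: 1 pF global clock, 3 pF local clock, 1.0 V, 1 GHz
ungated = gated_clock_power(1e-12, 3e-12, 1.0, 1.0, 1e9)
gated = gated_clock_power(1e-12, 3e-12, 0.25, 1.0, 1e9)
print(ungated, gated, gated / ungated)
```

The same expression applies to a segmented bitline, with the selected segment playing the role of the enabled local capacitance.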

## 2.4 Overview of Flip-Flops

In digital circuit design, flip-flops are widely used as the leaf elements of the clock tree. They sample and store data and send it to the combinational circuit of the next stage when the next clock edge arrives. In high-performance applications, pipelining is widely used to increase the clock frequency and the throughput. As a result, flip-flops consume a large share of the power and occupy a large area in a digital system.

The traditional flip-flop is the master-slave flip-flop [11][12][13][14]. It is composed of two latches that are triggered on opposite clock levels. When the master latch is active and sampling data, the slave latch keeps its data and sends it to the next stage. Similarly, when the slave latch is active and sampling data from the master, the master latch keeps its data.

The pulsed latch has been proposed to replace the master-slave flip-flop and increase system performance [15][16][17][18][19]. It uses a pulse generator to produce a short pulse and activates the latch with that pulse, which makes the latch work as a flip-flop. The advantages of the pulsed latch are its shorter setup time and its soft clock edge property.

In recent years, there has been an increasing demand for high-speed digital circuits with low power consumption, and the double edge-triggered (DET) flip-flop has received much attention. Compared with a conventional single edge-triggered flip-flop, a DET flip-flop maintains the same throughput at half the clock frequency. It can therefore save half of the clock tree power [20][21][22][23][24][25].

The flip-flop is also used in many other applications, for example the clock gating technique and the scan chain used for testing. Clock gating is a convenient circuit technique for reducing the clock tree power [26][27][28]. In the SoC era, testing the function of a system has become a serious problem. The scan chain is one methodology for resolving the testing problem [29][30][31][32][33].

As a result, the flip-flop has become a significant element in digital circuits, and designing an efficient flip-flop is an important issue in modern IC design.

## 2.5 Overview of First-In-First-Out Register File

In recent years, first-in-first-out (FIFO) data storage has been in great demand for many applications, especially in telecommunication LSIs. A FIFO is usually implemented with a shift register or a register file when a small storage capacity is required. When a larger storage capacity is needed, an SRAM-like FIFO can be used to improve area efficiency and reduce power consumption. A FIFO works as a special serial access memory. It uses sequential read/write access instead of random read/write access to attain a faster response time. In a communication system, data is always transferred sequentially, so a FIFO can increase the performance efficiently [34][35][36][37][38][39][40][41].

FIFO is widely used to buffer data in many applications. The buffer memory could provide a delay time to make the function units work successfully.

In the asynchronous-transfer-mode (ATM) switch, a high speed and low power FIFO chip is required [36][42]. The FIFO chip could be used to buffer the packet data. Furthermore, it could also be used as a serial to parallel converter. The LAN controller is a similar application of FIFO [43].

In communication systems, the FIFO chip plays an important role in the transmission function unit. Excess transmitted or received data has to be buffered in a FIFO chip. Using a FIFO chip with a proper storage capacity can increase the performance efficiently [44][45].

In the network-on-chip (NoC) era, many function blocks can be integrated into a chip and connected by a bus. However, each function block may operate at a different clock rate, which causes communication problems between blocks running at different clock frequencies. An asynchronous FIFO chip can be inserted to resolve these problems [38][39][46]. The asynchronous FIFO provides an interface between function blocks with different clock rates, and it can save unnecessary block power.

As a result, a low-power, high-performance FIFO chip is required in modern IC design. A size-reconfigurable FIFO cell can decrease the area cost and reduce the power consumption efficiently.

## 2.6 Conclusion



In this chapter, some circuit design concepts have been discussed. These concepts serve as the basis for the later designs. Cell area, performance, and power consumption are the competing goals in digital IC design, and wire delay is a key concern in circuit layout.

Some circuit techniques have also been discussed. The buffering technique reduces the propagation delay in circuits with a large load capacitance. The MTCMOS technique suppresses the leakage current while retaining performance. The stack effect and the power gating technique reduce the leakage current to save static power. Transistor size optimization is applied in circuit-level design. The unbalanced latch is a storage element that can be used in flip-flop design. Reducing the effective capacitance reduces the dynamic power.

Low-power circuit design has become one of the most important research topics in IC design. Low-power flip-flops and low-power FIFOs will play important roles in the SoC generation. In this thesis, two new low-power flip-flops are developed in chapter 3, and a low-power reconfigurable FIFO is presented in chapter 5.



# Chapter 3 Low Power Pulsed Latch Design

## 3.1 Introduction

As technology scales in modern VLSI design, it is possible to build chips consisting of hundreds of millions of transistors. The demand for System-on-Chip (SoC) design is met by achieving higher performance and integrating more components into a chip. As the clock frequency and the number of transistors increase, the ever higher power consumption has become an obstacle to high-performance design. Portable applications such as notebook PCs, PDAs, and mobile phones are now widely used, and higher power consumption shortens their battery lifetime. Power dissipation is becoming a limiting factor in both high-performance and mobile applications.

One way to achieve low power consumption is to lower the supply voltage. This reduces power efficiently because the power dissipation is proportional to the square of the supply voltage. The penalty, however, is an increased propagation delay, which is proportional to  $\frac{V_{dd}}{(V_{dd} - V_t)^2}$ . This greatly degrades the performance of the system. The lost performance can be compensated by pipelining and parallelism, but pipelining increases the latency in cycles and parallelism requires additional hardware.

Another way to realize low power consumption is to lower the operating frequency, since the power dissipation is proportional to the operating frequency. However, a lower frequency decreases the performance and throughput.
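The trade-off between power and delay under supply scaling can be sketched numerically from the two proportionalities stated above: power scales with Vdd², and delay scales with Vdd/(Vdd - Vt)². The supply and threshold values below are assumed for illustration only, and the function names are hypothetical.

```python
def relative_power(vdd, vdd_ref):
    """Dynamic power relative to the reference supply (P ~ Vdd^2)."""
    return (vdd / vdd_ref) ** 2


def relative_delay(vdd, vdd_ref, vt):
    """Propagation delay relative to the reference supply,
    using delay ~ Vdd / (Vdd - Vt)^2."""
    def delay(v):
        return v / (v - vt) ** 2
    return delay(vdd) / delay(vdd_ref)


# Assumed values: supply lowered from 1.0 V to 0.8 V with Vt = 0.25 V
power_ratio = relative_power(0.8, 1.0)
delay_ratio = relative_delay(0.8, 1.0, 0.25)
print(power_ratio, delay_ratio)
```

Under these assumed numbers the power drops to 64% of the reference while the delay grows by roughly half, which is the penalty that pipelining or parallelism would have to recover.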

In a highly synchronous system, 20% to 45% of the overall power dissipation is contributed by the clocking network, and 90% of that is consumed by the flip-flops and the last branches of the clock distribution network [1]. Latches and flip-flops are pervasive in sequential circuit design, especially in pipelined circuits, signal processing, and communication systems. High performance, low power consumption, and robustness are the basic requirements for these storage elements.

Transparent latches are usually fast and small, but they cause some problems. A race condition arises from the feedback path from the latch output to the latch input, or from an overlap between the clock and its complement; in a race condition, the latch output is directly influenced by the input data [2].

One way to avoid a race condition is to use an edge-triggered flip-flop, which samples the data only at the clock edges. However, the classical master-slave flip-flop is composed of two cascaded transparent latches controlled by the true and inverted clocks. Because of this, the master-slave flip-flop consumes more power and is slower than a transparent latch [3].

Recently, edge-triggered pulsed latches have been designed to reduce the latency caused by flip-flops. They use a narrow, locally generated pulse to trigger the transparent latch instead of the normal clock signal with a 50% duty cycle. The transparent latch is active only during the narrow pulse, so it works as an edge-triggered flip-flop.

In this chapter, I will propose a new pulse latch, and compare the power and performance with other flip-flop.

## 3.2 Flip-flop Characterization Metrics

## 3.2.1 Timing Metrics

Flip-flops and latches are crucial elements of a design from both a delay and energy standpoint. The basic flip-flop timing parameters are setup time, hold time, and clock to Q delay. These are the indexes to judge the performance of the flip-flop.

Setup time is the time for which the data input must be stable before the clock edge in order to be latched correctly by the flip-flop. Hold time is the time for which the data input must remain stable after the clock edge. The clock-to-Q delay is the delay from the active clock edge to the output.

A digital system has to satisfy equations (1) and (2) to avoid timing violations. Equation (1) gives the hold time margin: the sum of the clock-to-Q delay and the logic delay must be greater than or equal to the sum of the hold time and the relative clock skew. Equation (2) gives the setup time margin: the sum of the clock cycle time and the relative clock skew must be greater than or equal to the sum of the setup time, the clock-to-Q delay, and the logic delay. With the symbols defined under Fig 3.1, and taking Tc as the relative clock skew, these read:

$$T_q + T_d \geq T_{hold} + T_c \qquad (1)$$

$$T_{cycle} + T_c \geq T_{setup} + T_q + T_d \qquad (2)$$

A simple example for examining the hold time and setup time margins is shown in Fig 3.1.



Fig 3.1 Definition of setup time and hold time

- Tq : clock to Q delay time
- Td : combinational logic delay time
- Tc : clock delay time
- Thold: hold time
- Tsetup : setup time
- Tcycle : cycle period time
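The hold and setup conditions described in words above can be turned into a small margin check. The sketch below treats Tc as the relative clock skew and uses hypothetical timing values in picoseconds; a non-negative margin means the corresponding condition is met.

```python
def hold_margin(tq, td, thold, tc):
    """Hold condition: Tq + Td >= Thold + Tc. Returns the slack."""
    return (tq + td) - (thold + tc)


def setup_margin(tcycle, tc, tsetup, tq, td):
    """Setup condition: Tcycle + Tc >= Tsetup + Tq + Td. Returns the slack."""
    return (tcycle + tc) - (tsetup + tq + td)


def min_cycle_time(tc, tsetup, tq, td):
    """Smallest cycle time that still satisfies the setup condition."""
    return tsetup + tq + td - tc


# Hypothetical values in ps
tq, td, tc, thold, tsetup, tcycle = 100, 300, 30, 20, 50, 500
print(hold_margin(tq, td, thold, tc))
print(setup_margin(tcycle, tc, tsetup, tq, td))
print(min_cycle_time(tc, tsetup, tq, td))
```

A shorter clock-to-Q delay or setup time directly lowers the minimum cycle time, which is why these parameters are used as the performance indexes of a flip-flop.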

## 3.2.2 Energy Metrics

Another key aspect of flip-flop design is power consumption. A flip-flop usually consumes different amounts of power for different input transitions (0-0, 0-1, 1-0, 1-1), so I define the switching probability of the flip-flop input as the "switching activity" $\alpha$. I applied several input patterns to test the power efficiency of the flip-flops. The pattern 0101010101 ($\alpha$ = 1), in which the input data is always switching, reflects the maximum active power consumption. The patterns 0000000000 and 1111111111 ($\alpha$ = 0) are static patterns; their power consumption reflects the precharge power of the internal nodes, the leakage power, and the clock power. The other patterns, 0000100000 ($\alpha$ = 0.2), 0001000100 ($\alpha$ = 0.4), and 0010010010 ($\alpha$ = 0.6), reflect the power consumption at the switching rates found in various applications. The switching activity of a flip-flop differs from one application to another. For example, the switching activity of a 4-bit counter is about 0.53; a flip-flop used in clock gating has a switching rate of about 1 (non-blocked) or 0 (blocked); the flip-flop shift array used in a FIFO (first-in-first-out register) has a low switching rate of less than 0.1; and a typical switching factor for the instruction fetch in a RISC microprocessor is about 0.35. Choosing a flip-flop that suits the application is therefore one way to meet a low-power target.
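The switching activity values quoted for the test patterns (1, 0, 0.2, 0.4, 0.6) can be reproduced by counting input toggles per clock cycle. The sketch below assumes each 10-bit pattern is applied repeatedly, so the last bit is followed by the first bit again; that cyclic assumption and the function name are mine, not stated in the text.

```python
def switching_activity(pattern):
    """Fraction of cycles in which the input toggles, assuming the
    pattern repeats cyclically."""
    n = len(pattern)
    toggles = sum(pattern[i] != pattern[(i + 1) % n] for i in range(n))
    return toggles / n


for bits in ("0101010101", "0000000000", "1111111111",
             "0000100000", "0001000100", "0010010010"):
    print(bits, switching_activity(bits))
```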

#### 3.2.3 Other Design Issue



## 3.3 Conventional edge-triggered latch

Flip-flops are widely used in synchronous, high-performance system design. Many kinds of flip-flops have been designed to meet different system requirements. This section gives an overview of the conventional flip-flops and illustrates their advantages and disadvantages [4][5].

## 3.3.1 Master - Slave Flip - Flop

The master-slave flip-flop is the typical edge-triggered flip-flop designed to be immune to race conditions. It is composed of two latches: the master is sensitive to logic level 0 of the clock, and the slave is sensitive to logic level 1. When the clock is low, the master is active and stores the input data, while the slave keeps the previous data in its feedback loop. When the clock is high, the master's state is frozen by the clock, and the slave is active and takes in the data from the master latch within a short time. Because of this cascaded structure, the pair works as an edge-triggered flip-flop.





Fig 3.2 Master-Slave Flip-flop: (a) TGFF (b) C<sup>2</sup>MOS-FF
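The master-slave operation described in this section can be illustrated with a purely behavioral model. The sketch below ignores all circuit-level effects (delays, clock overlap, transistor structure) and assumes a master that is transparent while the clock is low and a slave that is transparent while the clock is high; the class name and the input sequence are hypothetical.

```python
class MasterSlaveFF:
    """Behavioral model of a positive-edge master-slave flip-flop."""

    def __init__(self):
        self.master = 0
        self.slave = 0

    def step(self, clk, d):
        if clk == 0:
            # Master transparent, slave holds the previous value
            self.master = d
        else:
            # Master frozen, slave passes the value captured by the master
            self.slave = self.master
        return self.slave


ff = MasterSlaveFF()
# (clk, d) samples: d changes while clk is high must not reach the output
stimulus = [(0, 1), (1, 0), (1, 1), (0, 0), (1, 1)]
outputs = [ff.step(clk, d) for clk, d in stimulus]
print(outputs)
```

The output follows only the value of D that was present just before the clock went high, which is the edge-triggered behavior the cascaded latches are meant to provide.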

Figure 3.2 (a) shows the transmission-gate master-slave flip-flop (TGFF), which is used in the PowerPC 603 [6]. The TGFF uses transmission gates, driven by opposite clock levels, to isolate the master part from the slave part. The transmission gates also speed up the data pass-through and reduce the power consumption. Another feature is that the TGFF adds input gate isolation, which gives better noise immunity. The advantages of the TGFF are a short direct path and a low-power feedback, but it has a large clock load, which affects the power consumption of the clock tree.

Using transmission gates as the isolation circuit has several merits, but it also causes some problems. The TGFF may suffer a race condition when the true clock and the inverted clock overlap, as illustrated in figure 3.3. If the two control clocks overlap at 1 - 1, the data may pass through the NMOS gates from input to output. Similarly, the data may pass through the PMOS gates from input to output when the clocks overlap at 0 - 0.



(b) The clock with 0 - 0 overlap

Fig. 3.3 The race condition in TGFF

To overcome the race problem, the TGFF is improved into the C<sup>2</sup>MOS flip-flop [7], which replaces the transmission gates and inverters with C<sup>2</sup>MOS circuits. Compared with the TGFF, the C<sup>2</sup>MOS flip-flop is slower but more robust. There are other improved versions of the TGFF, e.g. [8][9]. They improve the performance or reduce the power consumption, but they are usually less robust.

#### 3.3.2 Pulse Triggered Latch

Compared with latches, the master-slave flip-flop has a larger clock load and a larger latency. The larger clock load increases the capacitance of the clock tree and therefore the power consumption of the overall system, and the larger latency reduces the performance. The pulse-triggered latch is proposed to resolve these problems in high performance systems.

Instead of using the master-slave structure, the pulsed latch works as an edge-triggered flip-flop by generating an explicit window in which the transition is allowed. In general there are two ways to generate this window. One is to generate a real pulse with a delay inverter chain and a NAND gate; the other is to generate a virtual pulse, which works like a real pulse, with a delay inverter chain and stacked NMOS transistors. Both are illustrated in figure 3.4.



Fig. 3.4 Pulse Generator
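The real-pulse scheme can be sketched with a small discrete-time model. This is only an illustration under stated assumptions: the delay chain is treated as an ideal, inverting delay of `delay` samples, the NAND output is re-inverted so that the pulse is active high, and `delayed` and `real_pulse` are hypothetical helper names rather than circuits from the cited designs.

```python
def delayed(signal, delay, initial=0):
    """Shift a sampled waveform right by `delay` samples (ideal delay chain)."""
    return [initial] * delay + signal[:len(signal) - delay]


def real_pulse(clk, delay):
    """Active-high pulse = clk AND NOT(delayed clk).

    Assumption: the inverter chain has an odd number of stages, so the
    delayed clock arrives inverted at the NAND gate.
    """
    clk_d = delayed(clk, delay)
    return [c & (1 - d) for c, d in zip(clk, clk_d)]


# Two clock periods sampled at 8 points per period.
clk = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
pulse = real_pulse(clk, delay=2)
print(pulse)
```

In this model the pulse appears only after each rising clock edge, and its width equals the delay of the chain, which is the transparency window of the latch.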

The transparency window greatly reduces the complexity of the locking mechanism and strengthens the robustness against uncertainty in the clock arrival time. In addition, the negative setup time of the pulsed latch allows a critical path to borrow time from the next cycle. This feature is known as the soft clock edge property.

The hybrid latch flip-flop (HLFF) [10], shown in figure 3.5 (a), is one of the fastest flip-flops. It is the first latch to use the pulsed latch mechanism to achieve a high performance design. The HLFF consists of two stages: the first stage is a 3-input NAND gate coupled to the clock, the input, and the delayed clock, and the second stage is a static latch that works as a C<sup>2</sup>MOS latch. The node X of the HLFF is always precharged except during the transparency window. If the input D is high in the evaluation phase, the node X is discharged through the pull-down path, and the PMOS of the second stage charges the output to logic level 1. If the input D is low in the evaluation phase, the NAND gate keeps the node X at logic level 1, and the output is pulled down to logic level 0 through the pull-down path of the second stage.

The semi-dynamic flip-flop (SDFF) [11] [12], shown in figure 3.5 (b), is proposed as an improvement on the HLFF. The HLFF samples the input data when both the clock and the delayed clock are high. The SDFF samples the data when the clock is high and the delayed clock is low. It uses a NAND gate coupled to the node X and the delayed clock to generate a sampling window at the positive clock edge. This structure shortens the sampling window by about one inverter delay, which means a shorter hold time and better input noise rejection.



The HLFF and SDFF achieve high performance, but the penalty is a higher power consumption. The node X of these two flip-flops is always precharged, even if the logic level of the output is the same as that of the input. This unnecessary switching makes their power consumption larger than that of the master-slave flip-flop. Several circuits have been proposed to reduce the unnecessary switching power, such as [13] [14] [18] [22]. They use the conditional capture technique to reduce the power of the HLFF. The conditional capture technique is described in a later section.

The K-6 dual-rail edge-triggered latch [15], shown in figure 3.6, is based on the HLFF. This latch acts as a buffer stage between static and dynamic logic, and converts a single-ended signal into dual-rail signals suitable for dynamic applications. Further, it acts as a flip-flop for a brief period determined by the reset path. It is very fast but has a very high power consumption.



Fig 3.6 K-6 Dual Rail Edge triggered Latch

#### 3.3.3 Low Power Flip-Flop

In modern VLSI design, low power is more and more important in every application, especially in mobile applications. The clock power usually accounts for a large part of the total power of a synchronous system. The flip-flop is the clocked element used in every pipeline stage. Its clock load affects the power of the clock tree, and its internal power affects the system power because of the large number of flip-flops in the system. Reducing the power of the flip-flop is therefore the first step in reducing the total power of the system. The low power flip-flop design is discussed in this section.

#### 3.3.3.1 Reduced Clock Swing Flip-Flop

One way to reduce the power consumption of the clocking circuitry is to reduce the clock swing of the flip-flop, because the clock power scales with the supply voltage of the clock system. The half-swing flip-flop (HSFF) [16], shown in figure 3.7, is the typical design that uses the reduced clock swing technique. The clock swing is reduced to half of VDD. The HSFF operates with four half-swing clocks: the upper two clocks are fed to the PMOS transistors, and the lower two are fed to the NMOS transistors. This flip-flop can save 75% of the power in the clocking circuitry. The penalties for this large power saving are a lower operation speed, a more complex clock circuitry, and a clocking scheme that suffers from skew problems.



Fig. 3.7 Half Swing Flip-flop (HSFF)
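The 75% figure quoted for the HSFF is consistent with a clock power that grows with the square of the clock swing. The sketch below is a minimal check under that assumption (power modeled as C·V²·f); the function name `clock_power` and the values of the load capacitance, VDD, and frequency are illustrative assumptions, not data from [16].

```python
def clock_power(c_load, v_swing, freq):
    """Dynamic clock power, assuming energy per cycle of C * V_swing^2."""
    return c_load * v_swing ** 2 * freq


VDD = 1.8        # assumed supply voltage in volts
C_CLK = 1e-12    # assumed clock load in farads
F_CLK = 200e6    # assumed clock frequency in hertz

full_swing = clock_power(C_CLK, VDD, F_CLK)
half_swing = clock_power(C_CLK, VDD / 2, F_CLK)
saving = 1 - half_swing / full_swing
print(f"clock power saving with half swing: {saving:.0%}")
```

The saving does not depend on the assumed capacitance or frequency, since both cancel in the ratio.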

The reduced clock swing flip-flop (RCSFF) [17], shown in figure 3.8, is proposed to reduce the clock power with less loss in performance. The RCSFF is modified from the sense-amplifier-based flip-flop [18]. It uses only one reduced-swing clock, which swings between zero and Vck. Because the reduced-swing clock cannot cut off the precharge PMOS transistors P1 and P2, the leakage of these two transistors has to be reduced by applying an n-well bias voltage. This requires additional area to separate the n-well of the clock-driven PMOS transistors P1 and P2 from that of the other transistors. Other techniques that improve the power consumption or performance of the sense-amplifier-based flip-flop are presented in [19] [20] [21].



Fig 3.8 Reduced Clock Swing FF (RCSFF)

The NAND-type keeper flip-flop (NKDFF) [22] is proposed with a simpler configuration than the designs above. This flip-flop is similar to the HLFF. It uses a NAND gate to reduce the unnecessary transitions of the node X, which saves dynamic power. Another feature is the reduced-swing clock, which swings between zero and Vck. Because all the clock-driven transistors are NMOS transistors, no additional n-well region is needed. The flip-flop saves a large amount of clock power with a tolerable loss in performance.



Fig. 3.9 NAND-type Keeper Flip-Flop (NKDFF)

#### 3.3.3.2 Conditional Capture Flip-Flop

Another way to reduce the power consumption is the conditional capture technique, which removes the unnecessary switching of the internal nodes and thereby reduces the dynamic power consumption. There are two ways to achieve conditional capture. The first is the clock gating technique, which gates the input clock of the flip-flop when the input is the same as the output. The second is the data gating technique, which suppresses the transitions of the internal nodes when the input is unchanged.

The clock-on-demand flip-flop (COD-FF) [23], shown in figure 3.10, is one of the flip-flops that use the clock gating technique. It uses an XNOR gate to determine whether a transition occurs, and generates the enable signals CKP and CKN so that the data is transmitted from input to output. After a successful transition, the XNOR gate and the pulse generator disable the enable signals CKP and CKN until the next transition occurs. Other flip-flops designed with clock gating are presented in [24] [25].





The differential conditional capturing energy recovery flip-flop (DCCER) [26] uses the data gating technique to reduce the unnecessary transitions of the internal nodes. It operates similarly to a dynamic flip-flop. The DCCER flip-flop uses two precharge PMOS transistors and a NAND-based set/reset latch as the storage mechanism, and uses NMOS transistors controlled by feedback signals from the output to gate the discharge of the set/reset nodes.



Fig. 3.11 Differential Conditional capturing Energy Recovery Flip-Flop (DCCER)

Other conditional capture flip-flops have been proposed in [5] [9] [13] [14] [27] [28] [29]; they use different circuit techniques to reduce the power of the unnecessary transitions.

The clock gating technique can efficiently remove the greater part of the total power of the flip-flop, but it has two disadvantages. The first is the penalty of the gating circuit, which usually decreases the performance and consumes more dynamic power during a transition. The second is the setup time penalty: in order to activate the gating circuit, the input data has to arrive before the clock edge, which undermines the advantage of the pulsed latch, namely its zero or negative setup time. The data gating technique saves less power than the clock gating technique, but it can usually be designed with no performance loss. On the other hand, the delay chain of the pulsed latch still switches with every clock transition, so it consumes dynamic power even if there is no transition between input and output.

## 3.4 Proposed Edge-Triggered Latch Design and Simulation Result

In this section, a new low power flip-flop is proposed, and its characterization metrics are compared with those of conventional flip-flops.

#### 3.4.1 Proposed Flip-Flop

In general, the pulsed latch has a lower clock load, which reduces the power consumption of the clock tree. However, the pulsed latch pays the penalty of the pulse generator. The pulse generator usually uses a delay inverter chain to produce the transition window. The delay chain switches whenever the clock switches, so it consumes a large part of the total power even if the data has no transition. In deep-submicron IC design, leakage has become the largest part of the static power consumption. The inverter delay chain of the pulsed latch provides additional leakage paths from the supply voltage to ground and therefore increases the static power consumption.

In order to design a low power flip-flop, I improve the delay chain of the pulsed latch and combine the reduced swing technique with the conditional capture technique to reduce both the dynamic power and the static power of the flip-flop.

#### 3.4.1.1 Static Power Reduction

In order to reduce the leakage current and static power consumption of the delay chain, the stacked transistor technique is used. This technique has been discussed in chapter 2. Because the delay chain of the pulsed latch only has to generate a transparent window, a slight loss in performance can be tolerated. I choose the n-type stack technique, which reduces the static power by 85% with the least loss in performance (about 18%).

#### 3.4.1.2 Dynamic Power Reduction

Using a reduced swing clock is one way to reduce the dynamic power, but it needs an additional voltage island for the clock. If the system is designed for only one supply voltage, the reduced swing clock increases the complexity of the chip. In this section, I propose a reduced swing inverter chain based on a circuit technique that does not need an additional supply voltage.

A reduced swing inverter has been proposed in [30]. It uses the property that an NMOS transistor can only transmit a weak logic 1 (Vdd – Vt) to make the inverter swing between ground and Vdd – m\*Vt. The weight index m is the number of NMOS transistors inserted into the inverter. The reduced swing inverter is illustrated in figure 3.14.



Fig. 3.12 Reduced Swing Inverter

This circuit reduces the swing voltage of the inverter, but it has a crucial problem. The swing is reduced in order to reduce the power consumption, yet the reduced swing inverter consumes more static power. For example, cascade two reduced swing inverters in series and apply a logic level 0 to the input of the chain. The output of the first stage, which is also the input of the second stage, is a weak logic level 1 (Vdd – m\*Vt). This weak level 1 can still drive the second stage to produce a logic level 0, but it cannot cut off the PMOS transistor of the second stage (Vsg > 0), so a direct current path from Vdd to ground is formed. The circuit consumes large power in this unstable state.

A new reduced swing inverter chain is proposed, as illustrated in figure 3.15. An NMOS transistor is inserted between the PMOS transistor of the inverter and Vdd, and these NMOS transistors can be driven by a control signal Vg. This structure has two advantages: first, it uses the n-type stack technique to reduce the static power; second, it reduces the swing voltage of the inverter chain. The simulation result is shown in table 3.2 and illustrated in figure 3.16. According to the simulation result, the proposed chain reduces the static power by 56% and the dynamic power by 25% in the delay chain.



Fig. 3.13 Proposed Low Swing Inverter Chain

|                              | Static Power, Input 0 | Static Power, Input 1 | Dynamic Power, 0>1 | Dynamic Power, 1>0 | Average Static Power | Average Dynamic Power |
|------------------------------|-----------------------|-----------------------|--------------------|--------------------|----------------------|-----------------------|
| Typical inverter chain       | 7.192E-08             | 9.876E-08             | 1.847E-06          | 1.791E-06          | 8.534E-08            | 1.819E-06             |
| Reduced swing inverter chain | 4.242E-08             | 3.339E-08             | 1.440E-06          | 1.315E-06          | 3.790E-08            | 1.377E-06             |



Table 3.1 The Simulation Result of the Reduced Swing Inverter Chain

Inverter Chain and Low Swing Inverter Chain
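As a cross-check of the tabulated averages and of the quoted savings, the sketch below recomputes them from the per-condition entries. It assumes that the garbled first entry of the typical chain reads 7.192E-08 and that its dynamic 1>0 entry reads 1.791E-06, which is what the tabulated averages imply; the helper names are hypothetical.

```python
# Per-condition power of each chain, in the unit used by the table.
typical = {"static": (7.192e-08, 9.876e-08), "dynamic": (1.847e-06, 1.791e-06)}
reduced = {"static": (4.242e-08, 3.339e-08), "dynamic": (1.440e-06, 1.315e-06)}


def average(values):
    """Arithmetic mean of the two input conditions or transitions."""
    return sum(values) / len(values)


def saving(baseline, improved):
    """Relative saving of `improved` with respect to `baseline`."""
    return 1 - improved / baseline


static_saving = saving(average(typical["static"]), average(reduced["static"]))
dynamic_saving = saving(average(typical["dynamic"]), average(reduced["dynamic"]))

print(f"static power saving:  {static_saving:.1%}")
print(f"dynamic power saving: {dynamic_saving:.1%}")
```

With these readings the static saving comes out at roughly 56% and the dynamic saving at about a quarter, in line with the figures quoted in the text.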

#### 3.4.1.3 Low Swing Conditional Capture Flip-Flop Design

The low swing conditional capture flip-flop is proposed and illustrated in figure 3.17. In order to remove the dynamic power consumed by redundant transitions of the internal nodes, I use the clock gating technique described in section 3.3.3.2. An XOR gate determines whether a transition occurs. If the input differs from the output, the XOR gate generates a control signal Vg at logic level 1 to enable the clock delay chain. The active delay chain then generates a delayed clock that forms the sampling window and activates the pulsed latch to sample the data. After the data transition, the XOR gate detects that the output value is the same as the input value and disables the delay chain by driving the control signal Vg to logic level 0.
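The gating decision can be illustrated with a purely behavioral model. This is a sketch under simplifying assumptions, not the transistor-level circuit: timing, setup and hold windows, and the reduced swing are ignored, and the function name `simulate_conditional_capture` is hypothetical. It only shows that the delay chain is enabled when the XOR of input and output is 1.

```python
def simulate_conditional_capture(data_at_edges, q_init=0):
    """Behavioral model of XOR-based clock gating.

    `data_at_edges` holds the value of D at each active clock edge.
    Returns the output trace and the number of edges at which the
    delay chain was enabled (Vg = 1).
    """
    q = q_init
    q_trace = []
    activations = 0
    for d in data_at_edges:
        vg = d ^ q             # XOR comparator: 1 only if D differs from Q
        if vg:
            activations += 1   # delay chain enabled, sampling window opens
            q = d              # latch captures the new data
        q_trace.append(q)
    return q_trace, activations


q_trace, activations = simulate_conditional_capture([1, 1, 1, 0, 0, 1, 1, 1])
print(q_trace, activations)
```

For this input pattern the output follows the data at every edge, while the delay chain is enabled at only three of the eight edges.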

In the delay chain designed in section 3.4.1.2, the gates of the NMOS transistors Mn4 and Mn5 are connected to Vg; in the flip-flop design, however, they are connected to Vdd. The reason is that the delayed clock clkd may not discharge to logic level 0 when Vg is disabled by the XOR gate. A delayed clock clkd at a weak logic level 1 may latch wrong input data, which increases the hold time of the flip-flop. The gates of these two NMOS transistors are therefore connected to Vdd to prevent a hold time violation.



Fig. 3.15 Proposed Low Swing Conditional Capture Flip-Flop (LSCCFF)

The main body of the flip-flop is a dual-rail static edge-triggered latch. When the input data is logic level 1 and a sampling window is generated, the node x is discharged through the pull-down path Mn6, Mn1, Mn2. The node y is charged by Mp2, and the outputs Q and $\overline{Q}$ change after one inverter delay. When the input data is logic level 0 and a sampling window is generated, the node y is discharged through the pull-down path Mn7, Mn1, Mn2. The node x is charged by Mp1, and the outputs Q and $\overline{Q}$ change after one inverter delay. The typical operation waveform of the flip-flop is illustrated in figure 3.18.



#### 3.4.2 Simulation Results and Comparisons

#### 3.4.2.1 Basic Comparison of Design Metrics

The comparison of the LSCCFF with other flip-flops is shown in table 3.3. The results are simulated with the TSMC 0.18 µm technology at 200 MHz. With the conditional capture technique, the LSCCFF has a setup time of 200 ps and a hold time of 115 ps. The transistor sizing is optimized for both speed and power consumption.

The first column lists the flip-flops to be compared. The TGFF, HLFF, SAFF, and DCCER have been introduced in section 3.3. The dsETL and stETL are the dual-rail edge-triggered latches presented in [27]. The second and third columns show the number of transistors and the total channel width of the transistors, which indicate the layout area and complexity of the flip-flop. The fourth column is the average clock-to-Q delay of the 0 -> 1 and 1 -> 0 transitions. The following three columns show the power consumption at different switching rates (α = 0.2, 0.4, 0.6), as defined in section 3.2.2. The final three columns show the power-delay product at the same switching rates (α = 0.2, 0.4, 0.6). A detailed comparison over the switching activity with a pseudo-random pattern is illustrated in figure 3.19.

According to the results, the LSCCFF saves more than 95% of the static power compared with the TGFF. At a low switching activity (< 0.4), the LSCCFF is more power efficient than the others. At a higher switching activity (> 0.5), the LSCCFF still consumes less power than most of the others. Overall, the LSCCFF has the best power-delay product (PDP) among the pulsed latches, and a better PDP than the master-slave flip-flop at low switching activity.

| Flip-flop | Number of transistors | Total width (µm) | Average CLK to Q delay (ps) | Power (µW), α=0.2 | Power (µW), α=0.4 | Power (µW), α=0.6 | PDP (fJ), α=0.2 | PDP (fJ), α=0.4 | PDP (fJ), α=0.6 |
|-----------|-----------------------|------------------|-----------------------------|-------------------|-------------------|-------------------|-----------------|-----------------|-----------------|
| TGFF      | 18                    | 16.8             | 107.1                       | 11.0              | 14.5              | 17.7              | 1.2             | 1.6             | 1.9             |
| HLFF      | 20                    | 28.4             | 73.7                        | 27.6              | 33.5              | 39.8              | 2.0             | 2.5             | 2.9             |
| SAFF      | 18                    | 19.6             | 190.2                       | 21.3              | 22.2              | 23.2              | 4.1             | 4.2             | 4.4             |
| dsETL     | 18                    | 23.8             | 124.6                       | 23.4              | 30.6              | 37.4              | 2.9             | 3.8             | 4.7             |
| DCCER     | 26                    | 26.6             | 253.3                       | 25.2              | 31.5              | 37.7              | 6.4             | 8.0             | 9.5             |
| stETL     | 16                    | 17               | 123.5                       | 16.6              | 20.7              | 24.7              | 2.1             | 2.6             | 3.1             |
| LSCCFF    | 25                    | 18.6             | 138.1                       | 7.1               | 13.8              | 20.8              | 1.0             | 1.9             | 2.9             |

Table 3.2 Comparison of Flip-Flops
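The PDP columns can be reproduced from the delay and power columns. The sketch below does this for α = 0.2; it assumes that the PDP is the product of the tabulated power (µW) and the average clock-to-Q delay (ps), expressed in femtojoules, which matches the tabulated values after rounding. The function name `pdp_fj` is a hypothetical helper.

```python
# Average clock-to-Q delay (ps) and power at alpha = 0.2 (uW) from the table.
delay_ps = {"TGFF": 107.1, "HLFF": 73.7, "SAFF": 190.2, "dsETL": 124.6,
            "DCCER": 253.3, "stETL": 123.5, "LSCCFF": 138.1}
power_uw = {"TGFF": 11.0, "HLFF": 27.6, "SAFF": 21.3, "dsETL": 23.4,
            "DCCER": 25.2, "stETL": 16.6, "LSCCFF": 7.1}


def pdp_fj(p_uw, t_ps):
    """Power-delay product in femtojoules: 1 uW * 1 ps = 1e-18 J = 1e-3 fJ."""
    return p_uw * t_ps * 1e-3


pdp = {name: round(pdp_fj(power_uw[name], delay_ps[name]), 1) for name in delay_ps}
best = min(pdp, key=pdp.get)

for name, value in pdp.items():
    print(f"{name:7s} {value:.1f} fJ")
print("lowest PDP at alpha = 0.2:", best)
```

The same function applied to the α = 0.4 and α = 0.6 power columns gives the remaining PDP entries.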



Switching Activity

#### 3.4.2.2 Comparison with Different Application

In this section, I use some simple real applications to measure the power consumption with different flip-flops. The simulation result is shown in table 3.4 and illustrated in figure 3.20 and figure 3.21. The results are simulated with the TSMC 0.18 µm technology at 250 MHz.

| Power              | TGFF      | HLFF      | SAFF      | dsETL     | DCCER     | stETL     | LSCCFF    |
|--------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| (a) 8-bit counter  | 1.535E-04 | 4.124E-04 | 2.709E-04 | 3.743E-04 | 2.836E-04 | 4.001E-04 | 1.406E-04 |
| (b) Shift register | 1.686E-04 | 4.020E-04 | 2.567E-04 | 2.286E-04 | 2.311E-04 | 2.286E-04 | 9.264E-05 |

Table 3.3 Comparison of Power in Different Application

The simulation results are consistent with the data in section 3.4.2.1. In the application with the higher switching activity, the LSCCFF-based counter gains the smaller power saving, from 8% to 66%. In the application with the lower switching rate, the LSCCFF-based shift register gains the larger power saving, from 45% to 77%. These results show that the LSCCFF is the most power efficient of the compared flip-flops and is suitable for low power applications.
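The saving ranges quoted above follow directly from the application table. The sketch below recomputes them, assuming that the first power row belongs to the 8-bit counter and the second to the shift register; the helper name `saving_range` is hypothetical.

```python
counter = {"TGFF": 1.535e-04, "HLFF": 4.124e-04, "SAFF": 2.709e-04,
           "dsETL": 3.743e-04, "DCCER": 2.836e-04, "stETL": 4.001e-04,
           "LSCCFF": 1.406e-04}
shifter = {"TGFF": 1.686e-04, "HLFF": 4.020e-04, "SAFF": 2.567e-04,
           "dsETL": 2.286e-04, "DCCER": 2.311e-04, "stETL": 2.286e-04,
           "LSCCFF": 9.264e-05}


def saving_range(powers, proposed="LSCCFF"):
    """Smallest and largest relative saving of `proposed` over the others."""
    reference = powers[proposed]
    savings = [1 - reference / p for name, p in powers.items() if name != proposed]
    return min(savings), max(savings)


counter_range = saving_range(counter)
shifter_range = saving_range(shifter)
print("counter: {:.0%} to {:.0%}".format(*counter_range))
print("shifter: {:.0%} to {:.0%}".format(*shifter_range))
```

In both applications the smallest saving is relative to the TGFF and the largest is relative to the HLFF.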







In deep-submicron IC design, gate leakage and sub-threshold leakage account for the greater part of the static power. The MTCMOS technique is used to reduce the static power with no loss in performance. The LSCCFF can also be designed with the MTCMOS technique to reduce the leakage power in deep-submicron designs. The LSCCFF with MTCMOS is shown in figure 3.22.

In the MTCMOS-based LSCCFF, I use low threshold voltage devices in the critical path to increase the operation speed, and high threshold voltage devices in the non-critical paths and leakage paths to reduce both the dynamic and the static power consumption. Accordingly, the precharge PMOS transistor pair, M1 and M2, and the inverter pair, I1 and I2, use high Vt devices to reduce the leakage through the precharge transistors. In the delay chain, because the chain usually holds the node data in the non-transition condition, as shown in the figure, the three transistors M3, M4, and M5 also use high Vt devices to reduce the leakage power when there is no transition.



The simulation result is shown in table 3.5. It compares the normal design and the MTCMOS design in the TSMC 0.1 µm multi-threshold voltage technology. As shown in the table, the MTCMOS design gains a 54% static power saving and an 8% dynamic power saving, and reduces the clock-to-Q delay by 25%. The simulation result is illustrated in figure 3.23.

|           | Static Power | Dynamic Power | Clock To Q, 0>1 | Clock To Q, 1>0 | Average Clock To Q |
|-----------|--------------|---------------|-----------------|-----------------|--------------------|
| Dual Vt   | 1.257E-07    | 1.239E-06     | 3.522E-11       | 3.935E-11       | 3.729E-11          |
| Single Vt | 2.722E-07    | 1.349E-06     | 4.611E-11       | 5.257E-11       | 4.934E-11          |

Table 3.4 Comparison of LSCCFF in dual Vt and single Vt



The low power single edge-triggered flip-flop design can reduce the load of the clock tree and the power of the clocked elements at the end branches. Since high performance is emphasized today, the operation frequency has to be raised to increase the throughput of the system. However, the dynamic power consumption is proportional to the operation frequency, so a high operation frequency burdens the total power.

Therefore, the double edge-triggered flip-flop (DETFF) is proposed. It can achieve the same throughput at half of the operation frequency, and gain a 50% saving in dynamic clock power. The power and performance can also be traded off to find the best balance point that meets both the high performance and the low power requirements.
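The 50% figure can be made explicit with the usual first-order expression for dynamic clock power. This is a sketch under the assumption that the clock load $C_{clk}$ and the supply voltage $V_{dd}$ are the same for the single and the double edge-triggered design; these symbols are introduced here for illustration only.

$$
P_{clk} = C_{clk} V_{dd}^{2} f_{clk}, \qquad
P_{clk}^{DET} = C_{clk} V_{dd}^{2} \frac{f_{clk}}{2} = \frac{1}{2} P_{clk}
$$

Because the DETFF samples data on both edges, the data rate at $f_{clk}/2$ is $2 \cdot f_{clk}/2 = f_{clk}$, so the throughput is unchanged while the dynamic clock power is halved.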

#### 3.5.1 Double Edge-Triggered Flip-Flop Design Methodology

A classical double edge-triggered flip-flop is illustrated in figure 3.23. It is implemented with two opposite level-sensitive latches (or two opposite edge-triggered flip-flops) and a multiplexer. As shown in the figure, D1 is a positive level-sensitive latch (or positive edge-triggered flip-flop) and D2 is a negative level-sensitive latch (or negative edge-triggered flip-flop), and the multiplexer uses the clock signal to select which latch (or flip-flop) transmits the data from its internal node to the output Q. When the clock is high, the positive latch (or flip-flop) D1 is activated to store data from the input, and the negative latch (or flip-flop) D2 transmits data from its internal node to the output Q. When the clock is low, the roles are exchanged: D2 is activated to store data from the input, and D1 transmits data from its internal node to the output Q.



Fig.3.22 Classical DETFF Design
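The latch-and-multiplexer structure can be illustrated with a behavioral model. The sketch below is a simplification that ignores all delays and assumes ideal level-sensitive latches; the class name `ClassicalDETFF` and its `step` method are hypothetical and only mirror the roles of D1, D2, and the multiplexer.

```python
class ClassicalDETFF:
    """Two opposite level-sensitive latches followed by a clocked multiplexer."""

    def __init__(self):
        self.d1 = 0  # positive level-sensitive latch (transparent when clk = 1)
        self.d2 = 0  # negative level-sensitive latch (transparent when clk = 0)

    def step(self, clk, d):
        """Apply one sample of (clk, d) and return the output Q."""
        if clk:
            self.d1 = d        # D1 follows the input while the clock is high
        else:
            self.d2 = d        # D2 follows the input while the clock is low
        # The multiplexer selects the latch that is currently holding its data.
        return self.d2 if clk else self.d1


ff = ClassicalDETFF()
clk = [0, 0, 1, 1, 0, 0, 1, 1]
data = [1, 1, 1, 0, 0, 1, 1, 0]
q = [ff.step(c, d) for c, d in zip(clk, data)]
print(q)
```

In this trace Q changes only at the clock edges, rising and falling alike, and takes the value that D had just before the edge.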

Another type of double edge-triggered flip-flop uses only one latch (or flip-flop). The classical schematic is shown in figure 3.24. The DETFF uses a pulse generator to generate a pulse that activates the latch (or flip-flop). It works like the single edge-triggered pulsed latch, but it generates a pulse at each clock edge. As shown in figure 3.24, the clock and the inverted delayed clock are sent into the pulse generator, which generates a pulse train at twice the clock frequency. The pulse signal is sent into the level-sensitive latch or edge-triggered flip-flop, so that the circuit works as a double edge-triggered flip-flop.





Fig.3.23 Pulsed based Double edge-triggered Flip-flop
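The pulse generation at both clock edges can be sketched in the same discrete-time style. The model below assumes that the pulse generator behaves like an XOR of the clock and its delayed copy (equivalently, an XNOR of the clock and the inverted delayed clock) and that the delay chain is an ideal delay; the function name `dual_edge_pulse` is hypothetical.

```python
def dual_edge_pulse(clk, delay, initial=0):
    """Pulse at every clock edge: clk XOR (clk delayed by `delay` samples)."""
    clk_d = [initial] * delay + clk[:len(clk) - delay]
    return [c ^ d for c, d in zip(clk, clk_d)]


# Two clock periods sampled at 8 points per period.
clk = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
pulse = dual_edge_pulse(clk, delay=1)
print(pulse)
```

The waveform contains two rising edges and one falling edge, and the model produces one pulse for each of them, so the pulse rate is twice the clock frequency.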

With the designs above, different variations of the double edge-triggered flip-flop can be built by using different types of flip-flop or latch. The TSPC-based DETFF, DSTC-based DETFF, transmission-gate-based DETFF, and differential DETFF use opposite level-sensitive latches to implement the DETFFs [31][32]. The PDET [33], SLDET [34], ep-DSFF [35], LSDFF [36], and DPDET [37] use a pulse generator to implement the double edge-triggered flip-flop.


#### 3.5.2 New Double Edge-Triggered Latch Design

A new conditional precharge double edge-triggered flip-flop (CPDFF) is proposed in this section. It is modified from the single edge-triggered hybrid latch, as shown in figure 3.25.



Fig. 3.24 Conditional Precharge Double Edge-Triggered Flip-Flop(CPDFF)

The HLFF suffers from high power consumption, so a conditional precharge technique is proposed in [13] [14] to resolve this problem. In this DETFF design, the conditional precharge technique is used to save dynamic power. The precharge PMOS transistors, M1 ~ M5, are controlled by the output signal. The precharge PMOS transistors are allowed to charge the node X only when the output is low, which avoids the unnecessary precharge power consumed at the node X in the HLFF. The pull-down NMOS transistors, M8 ~ M11, provide two pull-down paths to discharge the internal nodes at the clock edges. The timing diagram of their control signals is illustrated in figure 3.25(b). They generate a transparent window at both clock edges, so that the latch can transmit data from the input D to the output Q. The inverter pair, I1 and I2, keeps the output at a stable logic level.

When the input D is high, the NMOS transistor M12 turns on and M13 turns off. This provides a discharge path for the node X when a transparent window is generated, and the PMOS transistor M7 charges the output Q to logic level 1. When the input D is low, the NMOS transistor M12 turns off and M13 turns on. The node X is charged by the precharge PMOS transistors, M1 ~ M5, and kept at a high logic level. The output Q is discharged through the pull-down path when a transparent window occurs.




#### 3.5.3 Simulation Results and Comparisons

The proposed CPDFF has a zero setup time and a hold time of 95 ps. The comparison of the CPDFF with other double edge-triggered flip-flops is shown in table 3.6 and illustrated in figure 3.27. The SLDET is the single-latch double edge-triggered flip-flop [34]. The ep-DSFF is the explicit-pulsed static hybrid flip-flop [35]. The DPDET is the dual-pulse-clock double edge-triggered D flip-flop [37]. The results are simulated with the TSMC 0.18 µm technology at 200 MHz. The static power is the average power of one non-transition cycle. The dynamic power is the average power of two successive transition cycles. The PDP is the power-delay product, a major index of flip-flop performance.

|         | Number of Transistors | Static Power (W) | Dynamic Power (W) | Clock To Q (s), 0>1 | Clock To Q (s), 1>0 | Average Clock To Q (s) | PDP (J)   |
|---------|-----------------------|------------------|-------------------|---------------------|---------------------|------------------------|-----------|
| CPDFF   | 27                    | 2.135E-05        | 5.134E-05         | 1.394E-10           | 1.490E-10           | 1.442E-10              | 7.403E-15 |
| SLDET   | 22                    | 5.288E-05        | 6.133E-05         | 1.903E-10           | 5.628E-10           | 3.766E-10              | 2.309E-14 |
| ep-DSFF | 28                    | 3.651E-05        | 5.560E-05         | 3.488E-10           | 3.142E-10           | 3.315E-10              | 1.843E-14 |
| DPDET   | 24                    | 1.001E-04        | 1.065E-04         | 1.566E-10           | 2.221E-10           | 1.893E-10              | 2.016E-14 |

Table 3.5 Comparison of DETFFs

As shown in figure 3.27, the proposed CPDFF saves 42% ~ 79% of the static power and 8% ~ 52% of the dynamic power, and its average clock-to-Q delay is 24% ~ 72% less than that of the other DETFFs. The PDP of the CPDFF is 60% ~ 68% less than that of the others. The proposed CPDFF is the fastest and consumes the lowest power among these DETFFs, and is an excellent choice for high performance and low power applications.



Fig. 3.26 Comparison of DETFF

#### 3.5.4 DETFF Design with MTCMOS Technique

The proposed double edge-triggered flip-flop can also be designed with the MTCMOS technique to reduce its power and increase its performance. The improved circuit is illustrated in figure 3.28. The simulation result of the MTCMOS-based CPDFF, obtained with the TSMC 0.1 µm multi-threshold voltage technology, is shown in table 3.7.



Fig. 3.27 CPDFF with MTCMOS

As in the design of section 3.4.3, I use high threshold voltage devices in the non-critical paths and leakage paths to reduce the power consumption, and low threshold voltage devices for the other devices to improve the speed. The precharge PMOS transistors, M1 ~ M6, and the inverters I1 ~ I3 therefore use high Vt devices. The PMOS transistor M7 is a leakage device, but it is also on the critical path, so I use a low Vt device for it to avoid a loss in performance.

As shown in table 3.7, the MTCMOS-based CPDFF saves 8% of the static power and 2% of the dynamic power, and reduces the clock-to-Q delay by 39%. The simulation result is illustrated in figure 3.30.

|           | Static Power | Dynamic Power | Clock To Q, 0>1 | Clock To Q, 1>0 | Average Clock To Q |
|-----------|--------------|---------------|-----------------|-----------------|--------------------|
| Dual Vt   | 1.835E-06    | 3.810E-06     | 5.933E-11       | 6.495E-11       | 6.214E-11          |
| Single Vt | 1.994E-06    | 3.857E-06     | 8.296E-11       | 1.205E-10       | 1.017E-10          |

Table 3.6 Comparison of CPDFF in dual Vt and single Vt



## 3.6 Conclusion

Several conventional flip-flops are introduced and simulated in this chapter. The design issues and low power techniques for flip-flop design are examined and simulated. The characterization metrics of the flip-flop offer criteria for evaluating the performance and power consumption of the flip-flops.

The delay chain designed in section 3.4.1 can also be used to improve the power consumption of the HLFF, where it gains a 30% dynamic power saving with a 25% performance loss.

The MTCMOS technique, the low swing voltage technique, and the conditional capture technique are important methodologies for low power flip-flop design. A new low power flip-flop is proposed; it is suitable for low power applications with a low data switching activity.

By using the double edge-triggered flip-flop, a digital system can operate at half of the clock frequency with no loss in throughput, which saves 50% of the power of the clock circuitry. In other words, we can keep the same clock frequency to double the throughput, or choose a proper clock frequency to improve both power consumption and performance. The DDR RAM (Double Data Rate RAM) is a typical application of the double edge-triggered latch.
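The 50% figure follows from the usual switching-power relation. As a sketch, assuming that the clock-network capacitance and the supply voltage are unchanged by the choice of flip-flop (the symbols $C_{clk}$, $V_{DD}$, and $f_{clk}$ are introduced here for illustration only):

$$
P_{clk} = C_{clk} V_{DD}^{2} f_{clk}, \qquad
P_{clk}^{DET} = C_{clk} V_{DD}^{2} \cdot \frac{f_{clk}}{2} = \frac{1}{2} P_{clk}
$$

In practice the double edge-triggered flip-flop may present a different clock load than a single edge-triggered one, so the actual saving depends on the circuit.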

Choosing the flip-flop that meets the requirement of the digital design, either high performance or low power consumption, is an important design consideration.



## **Chapter 4**

# Low Power Clock Gating techniques and Scan – Retention Mechanism

### 4.1 Introduction

In the last chapter, I discussed the basic design of flip-flops and proposed two new types of low power flip-flops. In this chapter, I discuss some applications of flip-flops and design the corresponding circuits.

In a high performance and highly synchronous system, a large number of flip-flops and latches are used to sample and store data at the clock edge in order to increase the throughput and the clock rate. As a result, the clock tree consumes a significant part of the total system power, because it switches a large capacitance every clock cycle. In order to reduce the clock power for low power applications, clock gating techniques are widely used. In section 4.2, I discuss the conventional gating techniques and propose a new gating circuit that combines the three gating techniques [1][2][3][4][5][6].

In the deep-submicron generation, millions of transistors and more and more functionality are integrated into a chip. How to test whether the chip functions correctly has become a major concern in the modern semiconductor industry. The scan technique was developed to solve this problem with a slight overhead. The scan transistors that are embedded in the flip-flops provide a data path for inputting the test data. The scan mechanism is discussed in section 4.3.1 [7][8][9][10][11][12][13].

The data retention technique is used to keep the valid data in sleep mode, when the supply of the chip is turned off to save power. The retention mechanism is discussed in section 4.3.2 [14]. Finally, a scan – retention mechanism that combines the scan technique and the retention mechanism is discussed in section 4.3.3 [15][16].

## 4.2 Clock Gating Techniques

#### 4.2.1 Basic Clock gating technique

In the deep-submicron era, more and more transistors and functional blocks are integrated into a chip. The clock tree of the chip therefore becomes more complicated and carries a large parasitic capacitance. Because the clock signal switches every cycle, this parasitic capacitance is charged and discharged every cycle and consumes a significant amount of power. Hence, the clocking network contributes 20% ~ 45% of the overall power dissipation, and 90% of that is consumed by the clocked storage elements of the clock distribution network.

Two conventional approaches are used to save the power of the clock tree. One is to design a low power clocked storage element to reduce the branch power; the other is to use clock gating techniques to reduce the active parasitic capacitance of the tree and turn off the unnecessary parts of the clock tree. Reducing the clock tree power with low power flip-flop design has been discussed in chapter 3. The clock gating technique is discussed in this section.

Figure 4.1 shows the conventional H-type clock tree distribution. It balances the wire delay of the clock tree to reduce the clock skew. Block A, block B, block C, and block D are different function blocks of the chip, and the single clock signal switches every cycle. In each switching cycle, the whole chip is activated, and the loading capacitance of the clock tree is formed from all the blocks. This large loading capacitance causes large power consumption.



Fig 4.1 Clock tree distribution

Figure 4.2 shows the conventional clock gating technique. Compared with figure 4.1, the circuit adds AND gates at the gating nodes. The AND gates gate the clock with the control signals cg1, cg2, cg3, and cg4. When a block does not need to be activated, the corresponding control signal is turned off, so that the clock does not charge or discharge the loading capacitance of that block, and the switching power of the clock tree is saved. For example, if only two of the blocks have to be activated, we can save 50% of the clock power by using the clock gating technique. Another advantage is the reduced loading capacitance. The AND gates separate the clock tree into two parts, the global clock and the local clocks. The separation makes the active parasitic capacitance smaller, which also reduces the switching power of the clock tree.
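The 50% example can be checked with the switching-power relation P = C·Vdd²·f applied to the branches whose clock is running. The following is a minimal sketch, assuming four blocks with equal local clock capacitance and ignoring the global clock trunk and the AND gates themselves; the function name and all numeric values are illustrative and are not taken from the simulations in this thesis.

```python
def clock_tree_power(block_caps, enables, vdd, freq):
    """Switching power of the local clock branches, P = C * Vdd^2 * f.

    block_caps: local clock capacitance of each block (farads)
    enables:    control signals cg1..cgN (True = block clock running)
    """
    active_cap = sum(c for c, en in zip(block_caps, enables) if en)
    return active_cap * vdd ** 2 * freq


# Hypothetical values: four equal blocks of 1 pF, 1.0 V supply, 1 GHz clock.
caps = [1e-12] * 4
ungated = clock_tree_power(caps, [True] * 4, vdd=1.0, freq=1e9)
gated = clock_tree_power(caps, [True, True, False, False], vdd=1.0, freq=1e9)
saving = 1 - gated / ungated  # fraction of local clock power saved
print(f"saving = {saving:.0%}")
```

With unequal block capacitances, the saving is weighted by the capacitance of the blocks that are turned off rather than by their count.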



Fig. 4.3 Clock gating methodology: (a) normal latch, (b) latch with gating device, (c) normal dynamic circuit, (d) dynamic circuit with gating device

Figure 4.3 shows the clock gating methodology. Figure 4.3(b) is the latch circuit with the clock gating technique. The clock signal drives only the gate capacitance of the NAND gates, and the global clock drives the equivalent capacitance, gClk. In some high performance applications, dynamic circuits are needed to increase the operation speed. Figure 4.3(d) is the dynamic circuit with the clock gating technique. The dynamic circuit operates normally with a penalty of only one gate delay.

#### 4.2.2 Signal gating technique

The signal gating technique is an advanced form of the clock gating technique. It is usually used in particular applications and implemented at the algorithm level. The conventional signal gating technique serves two purposes. First, it gates unnecessary input data and saves the dispensable dynamic power of the block. Second, it prevents unnecessary glitches and transitions of the input signal. Glitches on the input data may activate the combinational circuit and waste switching power.

Because signal gating is applied at the algorithm level, it is application-dependent. In other words, the signal gating circuit has to be redesigned for each application and does not have a fixed circuit like the clock gating circuit.



Fig. 4.4 Precision distribution of Add/Sub.

For example, figure 4.4 shows the precision distribution of a 32-bit adder. The probability is clearly concentrated on low-precision additions. We can therefore design a decision circuit to determine the number of bits on which the adder operates. When a low-precision addition is performed, the sign bits of the adder can be gated with the signal gating technique to save unnecessary power.
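A behavioural sketch of such a decision circuit is given below. It assumes two's complement operands and a hypothetical split at 16 bits; the names `is_low_precision` and `gated_add` are illustrative and do not describe the circuit used in this work. An operand is treated as low precision when its upper bits only repeat the sign, in which case the upper part of the adder does not need to switch.

```python
WIDTH = 32       # adder width from the text
LOW_BITS = 16    # hypothetical split point between low and high parts
MASK = (1 << WIDTH) - 1


def is_low_precision(x):
    """True when bits [WIDTH-1 : LOW_BITS-1] of x are all 0 or all 1,
    i.e. the upper part carries only sign-extension information."""
    upper = (x & MASK) >> (LOW_BITS - 1)
    all_ones = (1 << (WIDTH - LOW_BITS + 1)) - 1
    return upper == 0 or upper == all_ones


def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's complement number."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def gated_add(a, b):
    """Return (sum mod 2**WIDTH, upper_part_gated)."""
    if is_low_precision(a) and is_low_precision(b):
        # Only the low part switches; the result is sign-extended afterwards.
        low_sum = to_signed(a, LOW_BITS) + to_signed(b, LOW_BITS)
        return low_sum & MASK, True
    return (a + b) & MASK, False
```

For instance, `gated_add(5, (-3) & MASK)` takes the gated path, while `gated_add(0x12345678, 1)` falls back to the full-width addition.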

In a combinational adder, the output is not stable before the valid data is available. The glitches of the unstable output make the next stage operate on wrong input data and consume redundant power. The signal gating technique can be used to cancel these redundant glitches and save dynamic power.

#### 4.2.3 Pipelining gating technique

The pipeline gating technique can be regarded as a special case of the signal gating technique. It is used in data path architectures. Pipelining is used to increase the performance and throughput of a highly sequential circuit. The pipeline gating technique gates the clock of each idle pipeline stage and saves the switching power of the idle stages.

Figure 4.5 shows the operation diagram of pipeline gating. An additional clock generator is used to produce the controlled clock. When the input data does not need to be passed to the next stage, the corresponding clock is turned off to save the dynamic power of the logic blocks. In the example of figure 4.5, input data with the value '00' generates an invalid clock that is sent into the data path. In other words, the logic block is gated when the input signal is '00', which is similar to signal gating.
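The behaviour described above can be sketched as a cycle-level model. This is only an illustration under the assumption that '00' is the idle code, as in the example; `stage_clock_enable`, `run_stage`, and the stand-in logic function are hypothetical names, not part of the proposed circuit.

```python
def stage_clock_enable(data_in, idle_code="00"):
    """Hypothetical clock-generator rule: disable the stage clock for idle input."""
    return data_in != idle_code


def run_stage(inputs, logic):
    """Feed a sequence of inputs through one gated pipeline stage.

    The stage register only updates (and the logic block only toggles)
    on cycles whose clock is enabled. Returns the register value per
    cycle and the number of clock pulses actually delivered.
    """
    reg, trace, pulses = None, [], 0
    for d in inputs:
        if stage_clock_enable(d):
            reg = logic(d)
            pulses += 1
        trace.append(reg)
    return trace, pulses


# Stand-in logic block: reverse the two input bits.
trace, pulses = run_stage(["01", "00", "00", "11"], logic=lambda d: d[::-1])
```

In this run the stage receives two clock pulses instead of four, and the register holds its previous value during the two idle cycles.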



Figure 4.6 shows the data path pipeline diagram of a superscalar microprocessor. Not every stage has a gating circuit. For example, the fetch block and the decoder block are always working whenever data is input, so they do not have to be gated with the pipeline gating technique. The execute block and the memory block do not work for every instruction, so they can be gated to save power in the redundant clock cycles.





In this section, I propose a gating circuit that combines the three kinds of gating techniques above. The proposed circuit is illustrated in figure 4.7. The clk signal is the global clock signal, and it activates the flip-flop in a transition state. The pclk signal is the signal gating control signal, and it gates the flip-flop in the pipeline gating state or the signal gating state. The gclk signal is the clock gating control signal, and it gates a large-scale function block so that a larger block can be turned off. The buffer delay of the input data is used to cancel glitch transitions in an unstable state.

The K-map in figure 4.7 describes the logic function of the input data D and the signal gating control signal pclk. The flip-flop is activated by the pclk signal only when the input, the delayed input, and the inverted output have the same value. The input D of the flip-flop also changes only when the input and the delayed input are the same.
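The activation condition stated above can be written as a small truth table. The sketch below only restates that condition in code; the function name `pclk_enable` is an assumption, and the model does not include gclk or the actual delay element.

```python
from itertools import product


def pclk_enable(d, d_delayed, q):
    """Activate the flip-flop only when D, delayed D and inverted Q agree."""
    q_bar = 1 - q
    return int(d == d_delayed == q_bar)


# Truth table over (D, delayed D, Q); only two of eight rows enable the clock.
table = {bits: pclk_enable(*bits) for bits in product((0, 1), repeat=3)}
enabled_rows = [bits for bits, en in table.items() if en]
```

The two enabling rows are the cases in which the input is stable (D equals its delayed copy) and differs from the stored output, so a glitch on D or an input equal to Q does not clock the flip-flop.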



Fig. 4.7 Proposed Gating technique



Fig. 4.8 Clock gating in each layer

In this section, I have discussed three kinds of gating techniques, and each can be used in a different way. Figure 4.8 is a brief diagram that illustrates each gating technique in a different layer of the chip. In the SoC era, several function blocks are integrated into a chip. The overall chip is activated by a global clock signal, and the clock gating technique is used to gate the clock power of the different function blocks. Each function block is gated and generates a local clock inside the block. Every function block has its own logic function, and every logic function can be gated with the signal gating technique or the pipeline gating technique to save the dynamic power of the logic.

## 4.3 Scan-Retention Mechanism

#### 4.3.1 Scan Mechanism

In the deep-submicron era, more and more transistors are integrated into one chip, which makes it challenging to test the function of the chip. To solve this problem, the BIST (Built-In Self-Test) technique is used to make testing more convenient. The BIST is an additional circuit that is embedded in the chip and designed specifically for testing. Its purpose is to reduce the test time with a small area penalty. A complete BIST circuit contains many parts, for example a random pattern generator, a response analyzer, a scan chain, and a test controller. In this section, I only discuss the scan chain circuit that is based on typical flip-flops.

Figure 4.9 shows the brief operation diagram of the scan chain mechanism. The scan chain connects all the flip-flops of the chip through an additional scan path. In the normal operation mode, the data passes through the general function path. Only in the test mode is the input data transmitted through the scan path. The serial scan chain provides a simple data path to transmit the test pattern from only one input pad, so the chip can be tested with a small pad penalty.

Figure 4.9 also shows the operation waveform of the scan mechanism. At the beginning of the test mode, the test patterns that are generated by the random pattern generator are input from the scan\_in port. The scan-in operation continues to input data for N cycles, where N is the total number of scannable flip-flops on the chip. After the test pattern has been scanned into all the flip-flops, the test control signal is turned off and the chip operates in the normal mode for one cycle. In the normal operation mode, the output data are processed by the combinational circuits and transported to the flip-flops of the next stage. In the following cycles, the operation mode returns to the test mode and the scan-out operation is activated. The calculated output data are transmitted to the response analyzer through the scan-out path for N cycles. Finally, the response analyzer analyzes the calculated data and determines whether the function is correct. These test steps are repeated until all the patterns have been tested and analyzed.
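The three phases above (N scan-in cycles, one capture cycle, N scan-out cycles) can be sketched as a behavioural model. The model below is an illustration only: the chain is a Python list, the combinational circuit is passed in as a function, and the `expected` argument stands in for the reference used by the response analyzer. All names are hypothetical.

```python
def scan_shift(chain, bits_in):
    """Shift bits into the chain one per cycle; return (new chain, bits shifted out)."""
    chain = list(chain)
    out = []
    for b in bits_in:
        out.append(chain[-1])      # scan_out is taken from the last flip-flop
        chain = [b] + chain[:-1]   # scan_in feeds the first flip-flop
    return chain, out


def run_scan_test(pattern, comb_logic, expected):
    """Apply one test pattern and compare the scanned-out response."""
    n = len(pattern)
    chain, _ = scan_shift([0] * n, pattern)      # test mode: N scan-in cycles
    chain = comb_logic(chain)                    # normal mode: one capture cycle
    _, response = scan_shift(chain, [0] * n)     # test mode: N scan-out cycles
    return response == expected, response


# Stand-in combinational circuit: invert every bit.
ok, response = run_scan_test([1, 1, 0], lambda c: [1 - b for b in c], [0, 0, 1])
```

In a real test flow the scan-out of one pattern is often overlapped with the scan-in of the next one; the sketch keeps the phases separate for clarity.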





Fig. 4.10 Typical Scannable Master-Slave Flip-Flop

Figure 4.10 shows the diagram of a typical scannable master-slave flip-flop. The scannable flip-flop adds two ports, scan\_in and scan\_out. The scan\_in port is connected to the master latch through a multiplexer that is controlled by a scan clock. The scan\_out port is connected to the slave latch and to the scan\_in port of the next stage. The scan\_in port and the data input port are selected by the scan clock and the global clock. In the normal operation mode, the scan clock is turned off and the flip-flop is activated by the global clock. In the test mode, the global clock is turned off and the flip-flop samples the data from the scan\_in port with the scan clock.

Many kinds of scannable flip-flops have been designed, and they are all designed with a small area overhead.

#### 4.3.2 Data Retention Mechanism

In section 4.2, I discussed the clock gating technique. This technique gates the clock signal of the clocked elements, but the supply voltage of the logic block is still applied. The power gating technique gates the supply voltage of the unused block, so it can reduce both the dynamic and the static power dissipation. The power gating technique has been discussed in chapter 2.

The power gating technique is used to meet low power requirements. However, when the circuit operates in the sleep mode, the stored value of the flip-flop is destroyed because the leakage current gradually discharges the internal gated nodes. The circuit then fails to recover the stored value when it returns from the sleep mode to the active mode. The data retention mechanism is proposed to solve this problem.



Fig. 4.11 Operation of Balloon circuit

Figure 4.11 shows the operation diagram of the balloon circuit. The balloon circuit uses an additional latch to store the value of the flip-flop in the sleep mode. The signal tg1 is usually the global clock signal, and the signals tg2 and tg3 are the sleep mode control signals. In the normal active mode, the control signals tg2 and tg3 are disabled and tg1 is active to operate the flip-flop. In the sleep-in mode, the value of the flip-flop is stored into the additional latch, so tg1 and tg2 are active to transmit the data to the latch. In the sleep mode, the supply of the flip-flop is gated and the data is preserved in the latch, so tg3 is active to retain the value, and tg1 and tg2 are disabled to save power. The sleep-out mode is the reverse of the sleep-in mode: the value of the latch is read out to recover the value of the flip-flop, so tg2 and tg3 are active to transmit the stored data to the flip-flop.
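The four modes can be summarised as a control table with a simple behavioural model. This is a sketch under assumptions: the text does not state tg3 in the sleep-in mode or tg1 in the sleep-out mode, so both are assumed to be disabled here, and the `step` function is a hypothetical abstraction rather than the transistor-level behaviour.

```python
# Control-signal states per operating mode (1 = active, 0 = disabled).
# tg3 in "sleep_in" and tg1 in "sleep_out" are assumptions, see above.
BALLOON_CONTROL = {
    "active":    {"tg1": 1, "tg2": 0, "tg3": 0},
    "sleep_in":  {"tg1": 1, "tg2": 1, "tg3": 0},
    "sleep":     {"tg1": 0, "tg2": 0, "tg3": 1},
    "sleep_out": {"tg1": 0, "tg2": 1, "tg3": 1},
}


def step(mode, ff, balloon):
    """Return (ff, balloon) after one step in the given mode; None = value lost."""
    c = BALLOON_CONTROL[mode]
    if c["tg1"] and c["tg2"]:          # sleep in: copy flip-flop into balloon latch
        balloon = ff
    elif c["tg3"] and not c["tg2"]:    # sleep: supply gated, flip-flop node decays
        ff = None
    elif c["tg2"] and c["tg3"]:        # sleep out: restore flip-flop from balloon
        ff = balloon
    return ff, balloon


state = (1, None)  # flip-flop holds 1, balloon latch not yet written
for mode in ("active", "sleep_in", "sleep", "sleep_out"):
    state = step(mode, *state)
```

After the full sequence the flip-flop holds its original value again, even though it was lost during the sleep step.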

Figure 4.12 shows the master-slave flip-flop with the balloon circuit. Compared to the plain master-slave flip-flop, the balloon DFF adds a balloon circuit that stores the data in the sleep mode. The data retention flip-flop is usually designed with the MTCMOS technique. High Vt devices can be used in the gating devices to reduce the leakage current, and low Vt devices can be used in the critical path to reduce the critical delay. As a result, high Vt devices are usually used in the balloon circuit of a data retention flip-flop to reduce the leakage current in the sleep mode, which also reduces the static power consumption. The main body of the flip-flop is constructed with low Vt devices, which limits the performance degradation caused by the power gating device.



#### 4.3.3 Scan – Retention Mechanism

In section 4.3.1 and section 4.3.2, I discussed the scan mechanism and the data retention mechanism. The scan mechanism is used for BIST, and the data retention mechanism is used to meet low power requirements. Both are important issues in modern digital circuit design. In order to design a circuit with both techniques, V. Zyuban and S. V. Kosonocky proposed a new scan-retention mechanism in [15].

The scan-retention mechanism is briefly illustrated in figure 4.13. The scannable flip-flop and the data retention flip-flop have a common characteristic: both use a latch to store data. The scan-retention flip-flop uses the additional latch both to store the retention data and to transmit data to the scan out.



Figure 4.13 also shows the operation waveform of the scan-retention mechanism. The clock C is the global clock that activates the flip-flop in the normal mode. The clock B is the slave latch control clock that is used to store the data or scan the data out. The clock A is the master latch control clock that is used to recover the data or scan the data in. The restore signal controls the multiplexer that selects between the scan-in operation and the recovery operation. The waveform in figure 4.13 illustrates the data retention operation. At the beginning of the operation, the clock B is active to store the data into the latch. In the sleep mode, the supply is turned off and the internal nodes gradually discharge to an undefined logic level. In the retention operation, the clock A is active to recover the data from the latch to the flip-flop. Finally, the waveform shows the normal operation, in which the flip-flop samples data at every clock edge of the clock C.

#### 4.3.4 Proposed Scan – Retention DETFF

In chapter 3, I proposed a new double edge-triggered flip-flop, and in section 4.3 I introduced the scan-retention mechanism. In this section, I redesign the proposed CPDFF with the scan-retention function.

Figure 4.14 shows the circuit diagram of the scan-retention CPDFF. The inverters, I1 ~ I4, form the additional latch, and they also provide the scan-out path. The NMOS transistors, M1 ~ M8, are used as the selective pull-down paths. The control signals are the same as the ones in figure 4.13. The selective pull-down circuit that is controlled by the restore signal decides the operation of the flip-flop. When the restore signal is active, the flip-flop works in the recovery operation, and the stored data is restored into the slave latch. When the restore signal is disabled, the flip-flop works in the scan-in operation, and the slave latch stores the data from the scan\_in port.



Fig. 4.14 CPDFF with Scan-Retention Mechanism



Fig. 4.15 Operation Waveform of Proposed Scan-retention DETFF

Figure 4.15 and figure 4.16 show the waveforms of the data retention operation. Figure 4.15 first shows the normal operation of the double edge-triggered flip-flop, in which the flip-flop samples the input data at every clock edge. Later, the flip-flop operates in the store-in mode to store the data into the storage latch. In the sleep mode, the values of the input data and output data are destroyed, as the figure clearly shows.

The time span in figure 4.16 is about 6 µs later than that in figure 4.15. It shows that the internal nodes of the flip-flop have been discharged to logic level 0. Figure 4.16 also shows the retention operation of the flip-flop. When the sleep mode is disabled and the clock A is active, the data is restored from the storage latch to the flip-flop.



When the flip-flop works in the test mode, the clock A and the clock B are active alternately to repeat the scan-in and scan-out operations. Then the clock C is active to operate the flip-flop in the normal mode and calculate the function results. The detailed operation has been discussed in section 4.3.1. The proposed flip-flop is suitable for double clock rate systems that require the scan-retention feature.

## 4.4 Conclusion

In this chapter, some applications of flip-flops are discussed. The clock gating technique is used in the algorithm layer of the chip to reduce power consumption. It can be implemented easily with a small overhead, and it reduces the power consumption efficiently by separating the parasitic capacitance and turning off the unused blocks. Signal gating can be designed specifically for different applications, and it cancels signal glitches and reduces redundant dynamic power. Pipeline gating can be used in the data path to save the power of unnecessary logic blocks, at the cost of an overhead for constructing the complicated control logic.

A new gating circuit is proposed. It combines the three kinds of gating techniques and is suitable for most applications.

The scan chain technique is used for testing the function of the chip. The data retention flip-flop uses a data recovery scheme to restore the data when the power gating technique is applied. The scan-retention mechanism, which combines the two techniques to realize a scan-retention flip-flop circuit, is discussed. The proposed double edge-triggered flip-flop is redesigned with the scan-retention function. The scan-retention CPDFF is suitable for low power, testable systems.

## Chapter 5

# Low Power and Reconfigurable First-In-First-Out Register File (FIFO) Design

## 5.1 Introduction

High speed and low power semiconductor memory is a key component in communication systems and digital signal processing systems. A large portion of the silicon area of many digital applications is used to store data values and program instructions.

Most memories belong to the random access class, which means that memory locations can be read or written in a random order. These memories are called RAM (random access memory); the SRAM (static random access memory) and the DRAM (dynamic random access memory) are the representative ones in this class.

Some memory types restrict the order of access, which results in faster access times, smaller area, or a memory with a special functionality. Examples are the serial access memories: the FIFO (first-in-first-out) [1][2][3][4][5][6][7][8], the LIFO (last-in-first-out, usually used as the stack) [9], and the shift register. In most communication systems and digital signal processing systems, the data is input and output sequentially, and random access is not required. The FIFO and the LIFO are therefore becoming more and more important in modern applications.

In this chapter, a low power and reconfigurable FIFO register file is proposed. It can operate at 1.11 GHz and consumes several mW of dynamic power.

## 5.2 Proposed Reconfigurable FIFO design

FIFO cells are widely used as data buffers in communication networks and DSP chips. In the future, the FIFO will also play an important role as the bus data buffer in the NoC (Network on Chip) and SoC (System on Chip) generations. In order to satisfy the diverse requirements of data bandwidth and storage volume in different applications, a flexible size configuration is proposed in this chapter. This feature allows the FIFO IP to be used more efficiently. We do not have to design a new FIFO cell for every new application; we only have to decide the memory size that we need and apply the appropriate control signals to get the required memory cells. This shortens the time to market and greatly reduces the research cost.

#### 5.2.1 Overall FIFO Design Concept

In this chapter, size reconfiguration and size expansion are the main concepts of the FIFO design. Size reconfiguration requires that the FIFO cell can vary its effective memory storage by using different control signals. Size expansion requires that a large-scale FIFO cell can be built by composing several basic FIFO cells in series or in parallel.

The proposed FIFO cell needs more input and output pins than an ordinary FIFO cell. The input and output pins of the proposed FIFO cell are described in figure 5.1, and a 128 words × 8 bits FIFO is designed in this work. The write\_data[0:7] are the input data from the previous stage that are to be stored in the FIFO cell. The write\_enable is a control signal that notifies the FIFO cell that the write\_data is valid. The read\_data[0:7] are the output data that are read out from the FIFO cell to the subsequent stage. The read\_enable is a control signal that notifies the FIFO cell to read out the stored data. The master/slave signal and control[1:0] are the control signals that are used for size expansion and size reconfiguration. The full/empty signals are the flag signals that indicate the storage status of the FIFO cell. When all the memory cells of the FIFO hold valid data, the full flag changes to level 1 and forbids the write operation of the FIFO until the stored data are read out. When all the memory cells of the FIFO are empty, the empty flag changes to level 1 and forbids the read operation of the FIFO until the previous stage writes new data into the memory. The four signals, WFFO, RFFO, WFFI, and RFFI, are the feedback loop signals of the address pointers that are designed for the size expansion operation. The final four signals, F\_flag\_out, E\_flag\_out, F\_flag\_in, and E\_flag\_in, are the control signals that are used to generate the full/empty flag signals. The detailed operation of the FIFO will be described in section 5.3.1.



Figure 5.2 is the internal cell view of the proposed FIFO. The dual port RAM cell is the main storage block of the FIFO. The address pointers are used as the decoder to select the proper wordline when a read or write operation is activated. The initial pulse block generates an initial pulse to activate the address pointer at the beginning of the read operation and the write operation. The sense amplifier and the bias circuit are general blocks that are used to shorten the readout time of the memory cell. Each block is described in detail in the following sections, and the detailed operation of the FIFO cell is described in section 5.3.1.



#### 5.2.2 Dual Port RAM Cell Design

The proposed FIFO design uses the dual port RAM cell [10]. With separate ports for the read operation and the write operation, the FIFO cell can write data in and read data out in the same cycle. Figure 5.3 shows the general dual port RAM cells, which are all SRAM-like memory cells. The RAM cell in figure 5.3(a) consists of a flip-flop and two pairs of access transistors. The flip-flop stores the valid data, and the two pairs of transistors form the separate read port and write port. The RAM cell in figure 5.3(b) is a modification of figure 5.3(a) in which the read port is changed to a gate-driven type. This type of read port consumes less power but needs a larger area, and it has to be operated with a specific bias circuit and sense amplifier. Therefore I use the traditional cell (figure 5.3(a)) so that it works with the bias circuit and sense amplifier in section 5.2.3.



#### 5.2.3 Sense Amplifier & Bias Circuit

The bitline node always has a large parasitic capacitance because many words are connected to it. This causes large power consumption and increases the access time when data is written in or read out. The traditional sense amplifier works in the voltage sensing mode, and the large parasitic capacitance seriously affects its access time and access power. In order to reduce the influence of the parasitic capacitance, a current mode sense amplifier is used in this work.

The clamped bitline sense amplifier (CBL) [11][12] is illustrated in figure 5.4. It has a fast response time that is almost independent of the bitline parasitic capacitance. This is achieved because it relocates the nodes that have to be charged or discharged in a read operation instead of charging or discharging the bitline nodes directly. Besides the fast response time, the clamped bitline sense amplifier also consumes less power. The two nodes, bitline and inverse bitline, are always biased at the virtual ground, as shown in figure 5.4. Their voltage levels therefore vary much less than in other conventional sense amplifiers, which allows the CBL to avoid the large switching power on the bitline nodes.


#### 5.2.4 Address Pointer Design

In a serial access memory, the address pointer is used instead of the decoder of a random access memory to achieve a higher operation speed [13]. The address pointer is composed of a serial chain of flip-flops. In order to reduce the power and area, a low power and small area flip-flop has to be designed.

The address pointer is illustrated in figure 5.5. When the FIFO cell is activated, the address pointer works like a shift register to select the proper wordline for writing or reading data. The selected wordline stays at level 1, and the others stay at level 0. In other words, the address pointer stays at logic level 0 except for the one selected wordline, which means that the address pointer has a low switching activity. A new conditional capture flip-flop has been designed in section 3.4, and it is illustrated in figure 5.6. It is suitable for the low switching activity application in this work, and it can save about 90% of the power of the address pointer.
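The low switching activity of the pointer can be seen in a small behavioural model. The sketch below treats the pointer as a one-hot shift register with 128 stages, matching the 128-word FIFO of this work; the function names are illustrative, and the model says nothing about the internal power of each flip-flop.

```python
def make_pointer(n):
    """One-hot pointer: wordline 0 selected after the initial pulse."""
    return [1] + [0] * (n - 1)


def advance(pointer):
    """Shift the single '1' to the next wordline, wrapping at the end."""
    return [pointer[-1]] + pointer[:-1]


def toggles(before, after):
    """Number of flip-flop outputs that change in one step."""
    return sum(a != b for a, b in zip(before, after))


p0 = make_pointer(128)
p1 = advance(p0)
changed = toggles(p0, p1)  # only the old and the new selected stage change
```

In every step only two of the 128 outputs change, while every stage still receives the clock, which is why a flip-flop that avoids internal switching when its data is unchanged fits this application.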



Fig. 5.5 Address Pointer Design



Fig. 5.6 Proposed Low Swing Conditional Capture Flip-Flop (LSCCFF)

On the other hand, in order to design a reconfigurable FIFO cell, an improved address pointer is proposed. Figure 5.7 shows the improvement of the address pointer. In figure 5.7 (a), a traditional address pointer is built by connecting the flip-flops in series. The cyclic flip-flop chain is cut open to form two end points for the size expansion application. The end point FFO is the output pin of the serial flip-flop chain, and FFI is the input pin of the serial flip-flop chain. In the single-chip normal operation state, FFI and FFO should be linked together to restore the cyclic flip-flop chain.

Figure 5.7 (b) shows the reconfigurable flip-flop chain. In this work, the chain is divided into four groups with sizes of 16 bits, 16 bits, 32 bits, and 64 bits. A 4-to-1 multiplexer that is controlled by control[1:0] selects the cyclic path that we want. It is also necessary to add a multiplexer between every two groups to avoid consuming power in the unused serial flip-flop cells. The proposed architecture can adjust the effective memory size for different applications and save unnecessary power consumption.
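One way to read this grouping is that closing the loop after the first, second, third, or fourth group gives effective depths of 16, 32, 64, or 128 words. The sketch below models that reading; the mapping from control[1:0] values to depths is an assumption made for illustration and is not specified in the text above.

```python
from itertools import accumulate

GROUP_SIZES = [16, 16, 32, 64]            # flip-flop groups from the text
DEPTHS = list(accumulate(GROUP_SIZES))    # cumulative depths: 16, 32, 64, 128


def effective_depth(control):
    """Hypothetical encoding: control = 0..3 closes the loop after group 0..3."""
    return DEPTHS[control]


def next_address(addr, control):
    """Advance the pointer; the multiplexer feeds the last active stage back to stage 0."""
    return (addr + 1) % effective_depth(control)
```

For example, with `control = 0` the pointer wraps from address 15 back to 0, while with `control = 1` it continues into the second group.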



#### 5.2.5 Initial Circuit Design

The initial circuit block is a pulse generator that generates a short pulse to activate the address pointer. It is illustrated in figure 5.8. The rst signal is the active-low reset signal, and it resets the block state to logic level 0. The master/slave control determines the operation mode of the FIFO cell. If the FIFO cell operates in the single mode, the M/S control signal is held at logic level 1. If the FIFO cells operate in the serial mode, the M/S control signal of the first cell is set to master, and the others are set to slave.



#### 5.2.6 Flag Logic Design

The flag logic block is the most important part of a serial access memory design, and it usually determines the performance that the FIFO cell can achieve. Figure 5.9 (a) shows an overview of the flag logic design in a FIFO block. The counters record the positions of the address pointers. In order to tell which state the FIFO is in, each counter uses an additional bit to record the overflow state. If both the overflow bit and the value of the read counter are equal to those of the write counter, an empty state has occurred. If the values of the two counters are equal but the overflow bits differ, a full state has occurred. In other words, the flag logic block is a comparison logic composed of comparators and decision circuits.
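The comparison performed by the flag logic can be written down as a short behavioral sketch. It follows the common convention for counters with one extra overflow (wrap) bit: equal values with equal overflow bits mean empty, and equal values with different overflow bits mean full. The function name and argument layout are assumptions for illustration, not the circuit of Figure 5.9.

```python
def fifo_flags(read_count, write_count, depth_bits):
    """Return (empty, full) for counters that carry one extra overflow bit.

    read_count / write_count are (depth_bits + 1)-bit integers; the top bit
    is the overflow bit and the low depth_bits bits are the pointer value.
    """
    mask = (1 << depth_bits) - 1
    same_value = (read_count & mask) == (write_count & mask)
    same_overflow = ((read_count >> depth_bits) & 1) == ((write_count >> depth_bits) & 1)
    empty = same_value and same_overflow
    full = same_value and not same_overflow
    return empty, full


# 128-word FIFO: 7 pointer bits plus one overflow bit
print(fifo_flags(0, 0, 7))    # no writes yet
print(fifo_flags(0, 128, 7))  # 128 writes, no reads
print(fifo_flags(5, 9, 7))    # partially filled
```

When the two pointer values differ, neither flag is raised in this model, which corresponds to a partially filled memory.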

Figure 5.9 (b) shows the traditional variable reference-value flag logic design scheme. It connects the decision logic and the comparison logic in series. The advantage of this design is that it is regular and easy to implement. However, the serial architecture makes the critical path delay proportional to the size of the counter, which limits the operation speed when a large FIFO cell is designed.

Figure 5.9 (c) shows the improved circuit of the flag logic block. This work uses a parallel design instead of the serial one, so the critical path delay is proportional to the logarithm of the counter size. This increases the operation speed effectively. The reconfigurable design is also shown in Figure 5.9 (c). The signals ct1 and ct2 are the control signals that select the flag logic decision path and the effective memory storage size. When a smaller memory size is activated, a lower bit should be assigned as the full/empty decision bit, and the decision circuit should be changed to fit the different memory size.
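The difference between a delay that grows linearly with the counter size and one that grows with its logarithm can be seen in a simple gate-depth count. The sketch below is a generic model of an equality comparator built from two-input AND gates, not the circuit of Figure 5.9; both function names are hypothetical, and the per-bit XNOR stage is counted as common to both variants.

```python
def serial_equal(a_bits, b_bits):
    """Equality through a ripple chain of 2-input ANDs; returns (equal, and_levels)."""
    bit_eq = [a == b for a, b in zip(a_bits, b_bits)]  # one XNOR per bit
    result, levels = bit_eq[0], 0
    for eq in bit_eq[1:]:
        result = result and eq  # each AND waits for the previous one
        levels += 1
    return result, levels


def tree_equal(a_bits, b_bits):
    """Equality through a balanced tree of 2-input ANDs; returns (equal, and_levels)."""
    level = [a == b for a, b in zip(a_bits, b_bits)]   # one XNOR per bit
    depth = 0
    while len(level) > 1:
        nxt = [level[i] and level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd element passes through unchanged
        level, depth = nxt, depth + 1
    return level[0], depth


a = [1, 0, 1, 1, 0, 0, 1, 0]
print(serial_equal(a, a))  # AND levels grow linearly with the width
print(tree_equal(a, a))    # AND levels grow with log2 of the width
```

For an 8-bit comparison this model gives 7 AND levels for the ripple chain and 3 for the tree; the gap widens as the counter grows.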

Figure 5.9 (d) shows how the full/empty flags are generated when several chips are connected in the serial mode. The E\_flag\_in signal of the master chip should be tied to logic level 1, and the E\_flag\_in signal of each slave chip is connected to the E\_flag\_out signal of the previous stage. Each FIFO cell generates its own local flag signals, and the final value of the flag signal is computed from all the local flag signals by a chain of serial AND gates, as shown in Figure 5.9 (d).



Fig. 5.9 Reconfigurable Flag Logic Design
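The serial-mode flag chain can be modeled in a few lines. This is a minimal sketch that assumes each stage computes its flag output as the AND of its flag input and its local flag; that per-stage relation and the function name are assumptions made for illustration.

```python
def chained_flag(local_flags):
    """Propagate a flag through the master/slave chain.

    local_flags lists the local flag of each chip, master first.
    The master flag input is tied to logic 1, as described in the text.
    """
    flag = True                  # flag input of the master chip
    for local in local_flags:
        flag = flag and local    # assumed: flag_out = flag_in AND local flag
    return flag


print(chained_flag([True, True]))   # both chips report the condition
print(chained_flag([True, False]))  # one chip does not
```

With this relation the global empty flag is raised only when every chip in the chain is empty, and the same holds for the full flag.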

#### 5.2.7 Reconfigurable Power Gating Design

A reconfigurable power gating technique is also used in this work. A brief diagram is shown in Figure 5.10. By using a PMOS pass-gate transistor that is biased by the reconfigurable control signals, we can turn off the supply voltage of the unnecessary RAM cells and flip-flop chains. The power gating technique reduces both the dynamic power and the static power of the chip effectively.



### 5.3 Simulation Result

#### 5.3.1 Simulation Result and Waveforms

The simulation results of the proposed FIFO cell are shown and discussed in this section. The results are simulated with the TSMC 0.13 um technology file at a supply voltage of 1.2 V.

A 128 words X 8 bits FIFO cell is designed with the TSMC 0.13 um 1.2 V technology. The maximum operation frequency of the cell is 1.11 GHz. The power dissipation is 3.15 mW for the read operation, 2.15 mW for the write operation, and 4.65 mW for simultaneous read and write operation. The static power dissipation of the cell is 65.45 uW with the power gating technique. The simulation results are summarized in Table 5.1.

| FIFO Memory Size                     | 128 Words X 8 Bits |
|--------------------------------------|--------------------|
| Process Technology                   | TSMC 0.13 um       |
| Supply Voltage                       | 1.2 V              |
| Max. Operation Frequency             | 1.11 GHz           |
| Average Read Power at 1.11 GHz       | 3.146335 mW        |
| Average Write Power at 1.11 GHz      | 2.145781 mW        |
| Average Read&Write Power at 1.11 GHz | 4.653373 mW        |
| Static Power at 1.11 GHz             | 65.45098 uW        |

Table 5.1 The Simulation Result of the FIFO Cell

Figure 5.11 shows the operation waveform of the FIFO cell. The clk signal is the clock of the chip. The rw[127:0] and ww[127:0] signals are the wordline positions of the read and write address pointers, and they show which wordline is activated. The empty and full signals are the flag signals generated by the flag logic block. The input[7:0] signal is the 8-bit input bus, and output[7:0] is the output bus. The wen signal is the local write enable signal. When the write enable signal arrives from the previous stage, wen is generated as a short pulse to activate the write operation if the memory isn't full (the full flag signal isn't active). The ren signal is the local read enable signal. When the read enable signal arrives from the next stage, ren is generated as a short pulse to activate the read operation if the memory isn't empty (the empty flag signal isn't active).





The pin connections of the FIFO cell are illustrated in Figure 5.12, where the cell is operated in the single chip mode. At the beginning of the operation, a negative reset signal is activated to reset the whole circuit. The counters and the output nodes of the flip-flops in the chip are reset to 0, and the initial circuit is reset to logic level 1. When the write / read enable signal is applied to the chip, the local enable signal wen / ren is generated and transmitted to the address pointer, the initial circuit, and the input circuitry. The initial circuit sends a short pulse into the address pointer, which then works as a shift-register-like circuit and shifts the pulse each time the wen / ren signal is activated. The output of the address pointer is connected to the wordline node, and it activates the selected RAM cell to perform the read / write operation.

When a write / read operation is performed, the value of the counter is incremented and the flag logic block is activated to judge the storage occupancy of the memory. If all the memory cells hold valid data, the flag logic circuit activates the full flag signal to deny further write operations. If all the memory cells are unoccupied, the flag logic circuit generates the empty flag signal to deny read operations. If the address pointer points at the last wordline of the memory, the feedback branch provides the cyclic path to continue with the next operation. For example, in Figure 5.11 the chip is configured to activate only 16 words X 8 bits of memory cells. When the 16 words hold valid data, the full signal is activated and stops the write operation until a read operation is performed, and the next active wordline wraps back to the first wordline.



Fig. 5.12 The operation of Single Chip mode
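The 16-word example can be reproduced with a small behavioral model. This is a minimal sketch, not the circuit itself: the class name is hypothetical, and a single occupancy counter is used as a modeling shortcut in place of the two counters with overflow bits.

```python
class FifoModel:
    """Behavioral model of a cyclic-pointer FIFO with full/empty flags."""

    def __init__(self, depth=16):
        self.depth = depth
        self.mem = [None] * depth
        self.wptr = 0   # wordline selected by the write pointer
        self.rptr = 0   # wordline selected by the read pointer
        self.count = 0  # number of valid words (modeling shortcut)

    @property
    def full(self):
        return self.count == self.depth

    @property
    def empty(self):
        return self.count == 0

    def write(self, data):
        """Store data unless the full flag denies the write."""
        if self.full:
            return False
        self.mem[self.wptr] = data
        self.wptr = (self.wptr + 1) % self.depth  # cyclic path back to wordline 0
        self.count += 1
        return True

    def read(self):
        """Return the oldest word, or None if the empty flag denies the read."""
        if self.empty:
            return None
        data = self.mem[self.rptr]
        self.rptr = (self.rptr + 1) % self.depth
        self.count -= 1
        return data


fifo = FifoModel(depth=16)
for word in range(16):
    fifo.write(word)
print(fifo.full, fifo.write(99))  # full flag set, 17th write denied
print(fifo.read(), fifo.write(99), fifo.mem[0])
```

After one read frees a location, the next write is accepted and lands on the first wordline again, which is the wrap-around behavior described for the 16-word configuration.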

#### 5.3.2 Simulation Result of the Reconfigurable Feature

One special feature of this work is the reconfigurable design. We can vary the valid storage capacity of the memory for different applications by changing the input control signals, and we can use the PMOS gating devices to reduce the dynamic power and static power of the unused memory cells. Table 5.2 shows the simulation results of the reconfigurable design, which are also illustrated in Figure 5.13.

| Size      | Static (W)  | Write (W)   | Read (W)    | Read & Write (W) |
|-----------|-------------|-------------|-------------|------------------|
| 16 Words  | 4.47123E-05 | 2.06839E-03 | 3.12769E-03 | 4.61596E-03      |
| 32 Words  | 4.74125E-05 | 2.13789E-03 | 3.12992E-03 | 4.63711E-03      |
| 64 Words  | 5.34458E-05 | 2.14188E-03 | 3.13527E-03 | 4.64583E-03      |
| 128 Words | 6.54510E-05 | 2.14578E-03 | 3.14634E-03 | 4.65337E-03      |

Table 5.2 The Comparison of the FIFO cell with different size



According to the results, compared with the full 128 words configuration, we could save 31.7% of the static power consumption and 0.6% to 3.6% of the dynamic power consumption with the 16 words configuration; 27.6% of the static power consumption and a little dynamic power consumption with the 32 words configuration; and 18.3% of the static power consumption and a little dynamic power consumption with the 64 words configuration. As a result, this work saves static power consumption effectively and reduces dynamic power consumption slightly.
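The savings can be recomputed directly from the values in Table 5.2, taking the 128 words configuration as the reference. The dictionary and function names below are only illustrative; the numbers are copied from the table and are taken to be in watts.

```python
# Table 5.2 values: (static, write, read, read & write)
TABLE_5_2 = {
    16:  (4.47123e-05, 2.06839e-03, 3.12769e-03, 4.61596e-03),
    32:  (4.74125e-05, 2.13789e-03, 3.12992e-03, 4.63711e-03),
    64:  (5.34458e-05, 2.14188e-03, 3.13527e-03, 4.64583e-03),
    128: (6.54510e-05, 2.14578e-03, 3.14634e-03, 4.65337e-03),
}
COLUMNS = ("static", "write", "read", "read & write")


def saving_percent(size, column):
    """Power saved relative to the 128 words configuration, in percent."""
    idx = COLUMNS.index(column)
    reference = TABLE_5_2[128][idx]
    return 100.0 * (1.0 - TABLE_5_2[size][idx] / reference)


for size in (16, 32, 64):
    row = ", ".join(f"{c}: {saving_percent(size, c):.1f}%" for c in COLUMNS)
    print(f"{size} words -> {row}")
```

With these table values the static saving is about 31.7% for 16 words, 27.6% for 32 words, and 18.3% for 64 words, while the dynamic savings for 16 words range from about 0.6% (read) to 3.6% (write).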

#### 5.3.3 Size Expansion of Proposed FIFO Register File

Another feature of this work is the size expansion design. The proposed FIFO is easily expanded in width and depth. Figure 5.14 shows a width expansion example with two FIFO cells, and Figure 5.15 shows a depth expansion example with two FIFO cells.



Fig. 5.14 The Operation of the Multi-Chip Parallel mode

By connecting two FIFO cells in parallel, the FIFO cells can buffer input data with a multiple of the bus bandwidth. As shown in Figure 5.14, the two FIFO cells receive the same signals (clock, write enable, read enable, and control[1:0]) so that they work synchronously. This also makes each FIFO cell generate the same local flag signals, which ensures the same operation condition. Each FIFO cell works as in the single chip mode, and synchronous input data leads to synchronous output data.



Fig. 5.15 The Operation of the Multi-Chip Serial mode

By connecting two FIFO cells in series, the FIFO cells can buffer input data with a multiple of the storage size of a single memory. The two FIFO cells receive the same signals (clock, write enable, read enable, write data[7:0], and read data[7:0]) to synchronize them. The two chips are activated in different chip modes: one operates as the master, and the other operates as the slave. The major difference between the two operation modes is the initial state of the circuitry when the FIFO is reset at the beginning of the operation. As shown in Figure 5.15, the connection circuits of the two cells form two feedback loops. The loops are used to generate a unique flag signal and to activate a unique wordline so that the cells operate correctly. The detailed flag generation method is described in Section 5.2.6. These circuits make the FIFO cells operate correctly in the multi-chip serial mode.

It should be noted that the control signals of the two chips can be given different values. In other words, the reconfigurable feature can also be used in the multi-chip serial mode and the multi-chip parallel mode, which makes a variety of memory size configurations possible.

### 5.4 Conclusion

In this chapter, a 128 words X 8 bits reconfigurable FIFO (First-In-First-Out) cell is designed and discussed. The reconfigurable design can change the valid size of the memory to satisfy the different requirements of each application. Another feature is the size expansion design, which makes it easy to combine several cells into a larger one to expand the width and depth of the FIFO memory storage.


A parallel architecture of the flag logic is designed to shorten the critical path delay and increase the operation speed. The FIFO also uses low power flip-flops and the reconfigurable power gating technique to reduce both the dynamic power and the static power.

Finally, we briefly discuss the LIFO (Last-In-First-Out) register architecture and design. The LIFO is also a serial access memory, and it is usually used as a stack memory.

# Chapter 6 Conclusions and Future works

## 6.1 Conclusions

This thesis has presented several clocked storage element designs, each intended for a different purpose.

The LSCCFF is designed with the conditional capture technique, so it benefits low switching activity applications. An LSCCFF-based 8-bit counter consumes $8\% \sim 66\%$ less dynamic power than the others, and an LSCCFF-based shift register consumes $45\% \sim 77\%$ less dynamic power than the others.

The CPDFF is designed to sample the input data at both clock edges. It is suitable for low power or double data rate applications. In a constant throughput rate system, it is possible to reduce the clock power by half by replacing single edge-triggered flip-flops with double edge-triggered flip-flops. In a constant clock rate system, the CPDFF can be used to double the throughput of the system. The proposed CPDFF is also compared with other double edge-triggered flip-flops, and it consumes $8\% \sim 52\%$ less dynamic power than the others.
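The halving of the clock power at constant throughput follows from the usual first-order dynamic power model. The symbols below are assumptions introduced for this illustration: $\alpha$ is the switching activity of the clock network, $C_{clk}$ its switched capacitance, $V_{DD}$ the supply voltage, and $f_{clk}$ the clock frequency of the single edge-triggered system. The relation assumes that the capacitance and the voltage swing are unchanged when the flip-flops are replaced.

$$
P_{clk} = \alpha \, C_{clk} \, V_{DD}^{2} \, f_{clk}
\quad\Longrightarrow\quad
P_{clk}^{DET} = \alpha \, C_{clk} \, V_{DD}^{2} \, \frac{f_{clk}}{2} = \frac{P_{clk}}{2}
$$

Because a double edge-triggered flip-flop captures data on both edges, the same data rate is reached at half the clock frequency, which is where the factor of one half comes from.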

In Chapter 4, clock gating techniques are discussed. Clock gating has been widely used in low power applications because it is easy to implement and saves power efficiently. A hybrid clock gating circuit is presented, which is suitable for low power microprocessors. A scan-retention CPDFF is presented and simulated. The scan feature is used for function testing, and the data retention feature is used in the sleep mode for low power applications.

In Chapter 5, a reconfigurable FIFO cell is designed and simulated. The FIFO is implemented with good size expansion ability, so it is easy to expand the memory storage depth and the width of the input/output data bus. The reconfigurable feature is used to change the effective memory storage size, and it can also turn off the unused gates to reduce the power consumption. The FIFO cell can be easily applied in a memory compiler program.

## 6.2 Future Works

Several related topics could be researched further in the future.

In the SoC era, more and more function units are embedded into a chip, and each function unit may operate with a different supply voltage. The voltage island concept has become a hot topic in recent years. In order to resolve the interface problem between function blocks with different supply voltages, the level converter flip-flop has been presented. The level converter flip-flop embeds the level conversion function, so it resolves this problem with a small overhead.

The second topic is the comparison between pipelined systems and parallel systems. In Chapter 4, I discussed some clock gating techniques that are used in pipelined systems. It is possible to realize a simple microprocessor with the hybrid gating circuit and compare the efficiency of the two systems.

In a multi-frequency system, an asynchronous FIFO cell can be used to buffer the data between two systems that operate at different frequencies. The read clock and write clock of the FIFO cell are separate and operate asynchronously. Such a FIFO cell could also be used in the NoC (Network-on-Chip) era.

The pipeline gating system, the level converter flip-flop, and the asynchronous FIFO cell may become the next research topics.

# Reference

## **Reference of Chapter 2**

[2.1] Jam M. Rabaey : "Digital Integrated Circuits : A Design Perspective" 1996.

[2.2] Shigematsu, S.; Mutoh, S.; Matsuya, Y.; Tanabe, Y.; Yamada, J.;: "A 1-V *high-speed MTCMOS circuit scheme for power-down application circuits*" Solid-State Circuits, IEEE Journal of , Volume: 32 , Issue: 6 , June 1997 Pages:861 – 869

[2.3] Mutoh, S.; Douseki, T.; Matsuya, Y.; Aoki, T.; Yamada, J.: "1V high-speed digital circuit technology with 0.5µm multi-threshold CMOS" ASIC
 Conference and Exhibit, 1993. Proceedings., Sixth Annual IEEE
 International, 27 Sept.-1 Oct. 1993 Pages:186 - 189

[2.4] Harada, M.; Douseki, T.; Tsuchiya, T.;: "Suppression of threshold voltage variation in MTCMOS/SIMOX circuit operating below 0.5 V" VLSI Technology, 1996. Digest of Technical Papers. 1996 Symposium on , 11-13 June 1996 Pages:96 - 97

[2.5] Shigematsu, S.; Mutoh, S.; Matsuya, Y.; Tanabe, Y.; Yamada, J.;: "A 1-V high-speed MTCMOS circuit scheme for power-down application circuits" Solid-State Circuits, IEEE Journal of , Volume: 32 , Issue: 6 , June 1997 Pages:861 - 869

[2.6] Mutoh, S.; Shigematsu, S.; Gotoh, Y.; Konaka, S.;: "Design method of *MTCMOS power switch for low-voltage high-speed LSIs*" Design Automation Conference, 1999. Proceedings of the ASP-DAC '99. Asia and South Pacific , 18-21 Jan. 1999 Pages:113 - 116 vol.1

[2.7] Kapadia, H.; Benini, L.; De Micheli, G.;: "Reducing switching activity on datapath buses with control-signal gating" Solid-State Circuits, IEEE Journal of, Volume: 34, Issue: 3, March 1999 Pages:405 - 414 [2.8] Hai Li; Bhunia, S.; Chen, Y.; Vijaykumar, T.N.; Roy, K.;: "Deterministic clock gating for microprocessor power reduction" The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003.
 Proceedings., 8-12 Feb. 2003 Pages:113 - 122

[2.9] Hee-Bok Kang; Hun-Woo Kye; Geun-II Lee; Je-Hoon Park; Jun-Hwan Kim; Seaung-Suk Lee; Suk-Kyoung Hong; Young-Jin Park; Jin-Yong Chung;: *"A hierarchy bitline boost scheme for sub-1.5 V operation and short precharge time on high density FeRAM"* Solid-State Circuits Conference, 2002. Digest of Technical Papers. ISSCC. 2002 IEEE International , Volume: 1, 3-7 Feb. 2002 Pages:158 - 159 vol.1

[2.10] Margala, M.; *"Low-power SRAM circuit design"* Memory Technology, Design and Testing, 1999. Records of the 1999 IEEE International Workshop on , 9-10 Aug. 1999 Pages:115 - 122

[2.11] Gerosa, G.; Gary, S.; Dietz, C.; Dac Pham; Hoover, K.; Alvarez, J.; Sanchez, H.; Ippolito, P.; Tai Ngo; Litch, S.; Eno, J.; Golab, J.; Vanderschaaf, N.; Kahle, J.: "*A 2.2 W, 80 MHz superscalar RISC microprocessor*" Solid-State Circuits, IEEE Journal of , Volume: 29 , Issue: 12 , Dec. 1994 Pages:1440 - 1454

[2.12] Suzuki, Y.; Odagawa, K.; Abe, T.: "Clocked CMOS calculator circuitry" Solid-State Circuits, IEEE Journal of , Volume: 8, Issue: 6, Dec 1973
Pages:462 - 469

[2.13] Qiu Xiaohai; Chen Hongyi: "Discussion on the low-power CMOS latches and flip-flops" Solid-State and Integrated Circuit Technology, 1998.
Proceedings. 1998 5th International Conference on , 21-23 Oct. 1998
Pages:477 - 480

[2.14] Markovic, D.; Nikolic, B.; Brodersen, R.W.: *"Analysis and design of low-energy flip-flops"* Low Power Electronics and Design, International Symposium on, 2001., 6-7 Aug. 2001 Pages:52 – 55

[2.15] Partovi, H.; Burd, R.; Salim, U.; Weber, F.; DiGregorio, L.; Draper, D.: *"Flow-through latch and edge-triggered flip-flop hybrid elements"* 

Solid-State Circuits Conference, 1996. Digest of Technical Papers. 43rd ISSCC., 1996 IEEE International , 8-10 Feb. 1996 Pages:138 - 139

[2.16] Klass, F.: "Semi-dynamic and dynamic flip-flops with embedded *logic*" VLSI Circuits, 1998. Digest of Technical Papers. 1998 Symposium on , 11-13 June 1998 Pages:108 – 109

[2.17] Klass, F.; Amir, C.; Das, A.; Aingaran, K.; Truong, C.; Wang, R.; Mehta, A.; Heald, R.; Yee, G.: *"A new family of semidynamic and dynamic flip-flops with embedded logic for high-performance processors"* Solid-State Circuits, IEEE Journal of , Volume: 34 , Issue: 5 , May 1999 Pages:712 - 716

[2.18] Nedovic, N.; Oklobdzija, V.G.: *"Hybrid latch flip-flop with improved power efficiency"* Integrated Circuits and Systems Design, 2000. Proceedings. 13th Symposium on , 18-24 Sept. 2000 Pages:211 - 215

ATHILLER,

[2.19] Nedovic, N.; Aleksic, M.; Oklobdzija, V.G.: "Conditional techniques for low power consumption flip-flops" Electronics, Circuits and Systems, 2001.
ICECS 2001. The 8th IEEE International Conference on , Volume: 2 , 2-5 Sept. 2001 Pages:803 - 806 vol.2

[2.20] Mishra, S.M.; Rofail, S.S.; Yeo, K.-S.: "High performance double edge-triggered flip-flop using a merged feedback technique" Circuits, Devices and Systems, IEE Proceedings [see also IEE Proceedings G- Circuits, Devices and Systems], Volume: 147, Issue: 6, Dec. 2000 Pages:363 - 368

[2.21] Pontikakis, B.; Nekili, M.: "A novel double edge-triggered pulse-clocked TSPC D flip-flop for high-performance and low-power VLSI design applications" Circuits and Systems, 2002. ISCAS 2002. IEEE International Symposium on , Volume: 5 , 26-29 May 2002 Pages:V-101 -V-104 vol.5

[2.22] Johnson, T.A.; Kourtev, I.S.: "A single latch, high speed double-edge triggered flip-flop (DETFF)" Electronics, Circuits and Systems, 2001. ICECS 2001. The 8th IEEE International Conference on , Volume: 1, 2-5 Sept. 2001 Pages:189 - 192 vol.1

[2.23] Tschanz, J.; Narendra, S.; Zhanping Chen; Borkar, S.; Sachdev, M.; Vivek De: "Comparative delay and energy of single edge-triggered and dual edge-triggered pulsed flip-flops for high-performance microprocessors" Low Power Electronics and Design, International Symposium on, 2001., 6-7 Aug. 2001 Pages:147 - 152

[2.24] Kang, S.M.S.: *"Elements of low power design for integrated systems"* Low Power Electronics and Design, 2003. ISLPED '03. Proceedings of the 2003 International Symposium on , 25-27 Aug. 2003 Pages:205 - 210

[2.25] Kuo-Hsing Cheng; Yung-Hsiang Lin: **"A dual-pulse-clock double edge** *triggered flip-flop for low voltage and high speed application"* Circuits and Systems, 2003. ISCAS '03. Proceedings of the 2003 International Symposium on , Volume: 5 , 25-28 May 2003 Pages:V-425 - V-428 vol.5

[2.26] Zhijun Huang; Ercegovac, M.D.;: "On signal-gating schemes for *low-power adders*" Signals, Systems and Computers, 2001. Conference Record of the Thirty-Fifth Asilomar Conference on , Volume: 1 , 4-7 Nov. 2001 Pages:867 - 871 vol.1

[2.27] Hai Li; Bhunia, S.; Yiran Chen; Roy, K.; Vijaykumar, T.N.;: "*DCG: deterministic clock-gating for low-power microprocessor design*" Very Large Scale Integration (VLSI) Systems, IEEE Transactions on , Volume: 12 , Issue: 3 , March 2004 Pages:245 - 254

[2.28] Hai Li; Bhunia, S.; Chen, Y.; Vijaykumar, T.N.; Roy, K.;: "Deterministic clock gating for microprocessor power reduction" The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003.
 Proceedings., 8-12 Feb. 2003 Pages:113 - 122

[2.29] Edirisooriya, G.; Edirisooriya, S.;: **"Scan chain fault diagnosis with fault dictionaries"** Circuits and Systems, 1995. ISCAS '95., 1995 IEEE International Symposium on , Volume: 3 , 28 April-3 May 1995 Pages:1912 - 1915 vol.3

[2.30] Chang, D.; Lee, M.T.-C.; Cheng, K.-T.; Marek-Sadowska, M.;: *"Functional scan chain testing"* Design, Automation and Test in Europe, 1998., Proceedings, 23-26 Feb. 1998 Pages:278 - 283

[2.31] Nicolici, N.; Al-Hashimi, B.M.;: *"Multiple scan chains for power minimization during test application in sequential circuits"* Computers, IEEE Transactions on , Volume: 51 , Issue: 6 , June 2002 Pages:721 - 734

[2.32] Zyuban, V.;: "Optimization of scannable latches for low energy"
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on , Volume:
11, Issue: 5, Oct. 2003 Pages:778 - 788

[2.33] Zyuban, V.; Kosonocky, S.V.;: "Low power integrated scan-retention mechanism" Low Power Electronics and Design, 2002. ISLPED '02.
Proceedings of the 2002 International Symposium on , 12-14 Aug. 2002
Pages:98 - 102

[2.34] Shibata, N.; Watanabe, M.; Tanabe, Y.: "A current-sensed high-speed and low-power first-in-first-out memory using a wordline/bitline -swapped dual-port SRAM cell" Solid-State Circuits, IEEE Journal of, Volume: 37, Issue: 6, June 2002 Pages:735 - 750

[2.35] Yazawa, M.; Hosotani, S.; Imamura, Y.; Amishiro, H.; Okada, K.;: "A dynamic voltage sensing FIFO suitable for multi-format video systems"
Custom Integrated Circuits Conference, 1995., Proceedings of the IEEE 1995, 1-4 May 1995 Pages:159 – 162

[2.36] Kawasaki, H.; Long, S.I.;: "A low-power 128×1-bit GaAs FIFO for ATM packet switcher" Solid-State Circuits, IEEE Journal of , Volume: 31 , Issue: 10 , Oct. 1996 Pages:1547 - 1555

[2.37] Fenstermaker, L.R.; O'Conner, K.J.;: "A low-power generator-based *FIFO using ring pointers and current-mode sensing*" Solid-State Circuits Conference, 1993. Digest of Technical Papers. 40th ISSCC., 1993 IEEE International , 24-26 Feb. 1993 Pages:242 - 243, 295

[2.38] Gholipour, M.; Afzali-Kusha, A.; Nourani, M.; Khademzadeh, A.;: "An efficient asynchronous pipeline FIFO for low-power applications" Circuits and Systems, 2002. MWSCAS-2002. The 2002 45th Midwest Symposium on , Volume: 2 , 4-7 Aug. 2002 Pages:II-481 - II-484 vol.2

[2.39] Brackenbury, M.L.L.;: "An instruction buffer for a low-power DSP"

Advanced Research in Asynchronous Circuits and Systems, 2000. (ASYNC 2000) Proceedings. Sixth International Symposium on , 2-6 April 2000 Pages:176 - 186

[2.40] Feldman, A.R.; Van Duzer, T.;: *"Hybrid Josephson-CMOS FIFO"* Applied Superconductivity, IEEE Transactions on , Volume: 5 , Issue: 2 , Jun 1995 Pages:2648 - 2651

[2.41] Wyland, D.;: "New features in synchronous FIFOs" WESCON/'93.Conference Record, , 28-30 Sept. 1993 Pages:580 - 585

[2.42] Jakobsen, J.;: "GaAs multiplexer chip for ATM switching" Gallium Arsenide Integrated Circuit (GaAs IC) Symposium, 1995. Technical Digest 1995., 17th Annual IEEE , 29 Oct.-1 Nov. 1995 Pages:35 - 38

[2.43] Kanuma, A.; Yaguchi, T.; Tanaka, K.; Katsumata, E.; Fujimoto, K.; Miyazawa, Y.; Iida, S.I.; Yamamoto, T.;: "A CMOS 510 K-transistor single-chip token-ring LAN controller (TRC) compatible with IEEE802.5 MAC protocol" Solid-State Circuits, IEEE Journal of , Volume: 25, Issue: 1, Feb. 1990 Pages:132 – 141

[2.44] Cao, J.; Green, M.; Momtaz, A.; Vakilian, K.; Chung, D.; Keh-Chee Jen; Caresosa, M.; Wang, X.; Wee-Guan Tan; Yijun Cai; Fujimori, L.; Hairapetian, A.;: *"OC-192 transmitter and receiver in standard 0.18-/spl mu/m CMOS"*Solid-State Circuits, IEEE Journal of , Volume: 37 , Issue: 12 , Dec. 2002
Pages:1768 - 1780

1896

[2.45] Garrett, D.; Stan, M.; "Power reduction techniques for a spread spectrum based correlator" Low Power Electronics and Design, 1997.
Proceedings., 1997 International Symposium on , 18-20 Aug. 1997 Pages:225 - 230

[2.46] Chattopadhyay, A.; Zilic, Z.;: *"High speed asynchronous structures for inter-clock domain communication"* Electronics, Circuits and Systems, 2002. 9th International Conference on , Volume: 2, 15-18 Sept. 2002
Pages:517 - 520 vol.2

## **Reference of Chapter 3**

[3.1] SAKURAI, T., and KURODA, T. : *"Low-Power Circuit design for multimedia CMOS VLSI's."* Proceedings of conference on Synthesis and system integration mixed technology (SASIMI), 1996, pp. 3-10

[3.2] Jam M. Rabaey : "Digital Integrated Circuits : A Design Perspective" 1996, pp.338 - 339

[3.3] Jam M. Rabaey : "Digital Integrated Circuits : A Design Perspective" 1996, pp.338 - 353

[3.4] Stojanovic, V.; Oklobdzija, V.G.; Bajwa, R.: "A unified approach in the analysis of latches and flip-flops for low-power systems" Low Power Electronics and Design, 1998. Proceedings. 1998 International Symposium on , 10-12 Aug. 1998 Pages:227 - 232

[3.5] Markovic, D.; Nikolic, B.; Brodersen, R.W.: *"Analysis and design of low-energy flip-flops"* Low Power Electronics and Design, International Symposium on, 2001., 6-7 Aug. 2001 Pages:52 - 55

[3.6] Gerosa, G.; Gary, S.; Dietz, C.; Dac Pham; Hoover, K.; Alvarez, J.; Sanchez, H.; Ippolito, P.; Tai Ngo; Litch, S.; Eno, J.; Golab, J.; Vanderschaaf, N.; Kahle, J.: "*A 2.2 W, 80 MHz superscalar RISC microprocessor*" Solid-State Circuits, IEEE Journal of , Volume: 29 , Issue: 12 , Dec. 1994 Pages:1440 - 1454

[3.7] Suzuki, Y.; Odagawa, K.; Abe, T.: "Clocked CMOS calculator circuitry" Solid-State Circuits, IEEE Journal of , Volume: 8 , Issue: 6 , Dec 1973
Pages:462 - 469

[3.8] Qiu Xiaohai; Chen Hongyi: "Discussion on the low-power CMOS latches and flip-flops" Solid-State and Integrated Circuit Technology, 1998.
Proceedings. 1998 5th International Conference on , 21-23 Oct. 1998
Pages:477 - 480

[3.9] Markovic, D.; Nikolic, B.; Brodersen, R.W.: "Analysis and design of

*low-energy flip-flops"* Low Power Electronics and Design, International Symposium on, 2001., 6-7 Aug. 2001 Pages:52 – 55

[3.10] Partovi, H.; Burd, R.; Salim, U.; Weber, F.; DiGregorio, L.; Draper, D.: *"Flow-through latch and edge-triggered flip-flop hybrid elements"* Solid-State Circuits Conference, 1996. Digest of Technical Papers. 43rd ISSCC., 1996 IEEE International , 8-10 Feb. 1996 Pages:138 - 139

[3.11] Klass, F.: "Semi-dynamic and dynamic flip-flops with embedded *logic*" VLSI Circuits, 1998. Digest of Technical Papers. 1998 Symposium on , 11-13 June 1998 Pages:108 – 109

[3.12] Klass, F.; Amir, C.; Das, A.; Aingaran, K.; Truong, C.; Wang, R.; Mehta, A.; Heald, R.; Yee, G.: *"A new family of semidynamic and dynamic flip-flops with embedded logic for high-performance processors"* Solid-State Circuits, IEEE Journal of , Volume: 34 , Issue: 5 , May 1999 Pages:712 - 716

[3.13] Nedovic, N.; Oklobdzija, V.G.: *"Hybrid latch flip-flop with improved power efficiency"* Integrated Circuits and Systems Design, 2000.
Proceedings. 13th Symposium on , 18-24 Sept. 2000 Pages:211 - 215

[3.14] Nedovic, N.; Aleksic, M.; Oklobdzija, V.G.: "Conditional techniques for low power consumption flip-flops" Electronics, Circuits and Systems, 2001.
ICECS 2001. The 8th IEEE International Conference on , Volume: 2 , 2-5 Sept. 2001 Pages:803 - 806 vol.2

[3.15] Draper, D.; Crowley, M.; Holst, J.; Favor, G.; Schoy, A.; Trull, J.; Ben-Meir, A.; Khanna, R.; Wendell, D.; Krishna, R.; Nolan, J.; Mallick, D.; Partovi, H.; Roberts, M.; Johnson, M.; Lee, T.: "Circuit techniques in a 266-MHz MMX-enabled processor" Solid-State Circuits, IEEE Journal of , Volume: 32, Issue: 11, Nov. 1997 Pages:1650 - 1664

[3.16] Kojima, H.; Tanaka, S.; Sasaki, K:; *"Half-swing clocking scheme for* **75% power saving in clocking circuitry**" Solid-State Circuits, IEEE Journal of , Volume: 30 , Issue: 4 , April 1995 Pages:432 - 435

[3.17] Kawaguchi, H.; Sakurai, T.: "A reduced clock-swing flip-flop (RCSFF)

*for 63% power reduction"* Solid-State Circuits, IEEE Journal of , Volume: 33 , Issue: 5 , May 1998 Pages:807 - 811

[3.18] Matsui, M.; Hara, H.; Uetani, Y.; Lee-Sup Kim; Nagamatsu, T.; Watanabe, Y.; Chiba, A.; Matsuda, K.; Sakurai, T.: **"A 200 MHz 13 mm<sup>2</sup> 2-D DCT macrocell using sense-amplifying pipeline flip-flop scheme"** Solid-State Circuits, IEEE Journal of , Volume: 29 , Issue: 12 , Dec. 1994 Pages:1482 - 1490

[3.19] Nikolic, B.; Stojanovic, V.; Oklobdzija, V.G.; Wenyan Jia; Chiu, J.; Leung,
M.: *"Sense amplifier-based flip-flop"* Solid-State Circuits Conference, 1999.
Digest of Technical Papers. ISSCC. 1999 IEEE International , 15-17 Feb. 1999
Pages:282 - 283

[3.20] Nikolic, B.; Oklobdzija, V.G.; Stojanovic, V.; Wenyan Jia; James Kar-Shing Chiu; Ming-Tak Leung, M.: *"Improved sense-amplifier-based flip-flop: design and measurements"* Solid-State Circuits, IEEE Journal of, Volume: 35, Issue: 6, June 2000 Pages:876 - 884

[3.21] Jin-Cheon Kim; Young-Chan Jang; Hong-June Park: "CMOS sense amplifier-based flip-flop with two N-C<sup>2</sup>MOS output latches" Electronics Letters , Volume: 36 , Issue: 6 , 16 March 2000 Pages:498 - 500

1111

[3.22] Tokumasu, M.; Fujii, H.; Ohta, M.; Fuse, T.; Kameyama, A.: "A new reduced clock-swing flip-flop: NAND-type keeper flip-flop (NDKFF)" Custom Integrated Circuits Conference, 2002. Proceedings of the IEEE 2002, 12-15 May 2002 Pages:129 - 132

[3.23] Hamada, M.; Terazawa, T.; Higashi, T.; Kitabayashi, S.; Mita, S.; Watanabe, Y.; Ashino, M.; Hara, H.; Kuroda, T.: *"Flip-flop selection technique for power-delay trade-off"* Solid-State Circuits Conference, 1999. Digest of Technical Papers. ISSCC. 1999 IEEE International , 15-17 Feb. 1999 Pages:270 - 271

[3.24] Strollo, A.G.M.; Napoli, E.; De Caro, D.: "New clock-gating techniques for low-power flip-flops" Low Power Electronics and Design, 2000. ISLPED
'00. Proceedings of the 2000 International Symposium on , 26-27 July 2000
Pages:114 – 119

[3.25] Xia, Y.; Almaini, A.E.A.: *"Differential CMOS edge-triggered flip-flop with clock-gating"* Electronics Letters , Volume: 38 , Issue: 1 , 3 Jan. 2002 Pages:9 - 11

[3.26] Cooke, M.; Mahmoodi-Meimand, H.; Roy, K.: **"Energy recovery clocking scheme and flip-flops for ultra low-energy applications"** Low Power Electronics and Design, 2003. ISLPED '03. Proceedings of the 2003 International Symposium on , 25-27 Aug. 2003 Pages:54 - 59

[3.27] Li Ding; Mazumder, P.; Srinivas, N.: "*A dual-rail static edge-triggered latch*" Circuits and Systems, 2001. ISCAS 2001. The 2001 IEEE International Symposium on , Volume: 2 , 6-9 May 2001 Pages:645 - 648 vol. 2

[3.28] Bai-Sun Kong; Sam-Soo Kim; Young-Hyun Jun: "**Conditional-capture flip-flop for statistical power reduction**" Solid-State Circuits, IEEE Journal of , Volume: 36 , Issue: 8 , Aug. 2001 Pages:1263 – 1271

[3.29] Zhao, P.; Darwish, T.K.; Bayoumi, M.A.: *"High-Performance and Low-Power Conditional Discharge Flip-Flop"* Very Large Scale Integration (VLSI) Systems, IEEE Transactions on , Volume: 12 , Issue: 5 , May 2004 Pages:477 – 484

[3.30] Rjoub, A.; Koufopavlou, O.; Nikolaidis, S.: *"Low-power/low-swing domino CMOS logic"* Circuits and Systems, 1998. ISCAS '98. Proceedings of the 1998 IEEE International Symposium on , Volume: 2 , 31 May-3 June 1998 Pages:13 - 16 vol.2

[3.31] Mishra, S.M.; Rofail, S.S.; Yeo, K.S.: "Design of high performance double edge-triggered flip-flops" Circuits, Devices and Systems, IEE Proceedings [see also IEE Proceedings G- Circuits, Devices and Systems], Volume: 147, Issue: 5, Oct. 2000 Pages:283 - 290

[3.32] Mishra, S.M.; Rofail, S.S.; Yeo, K.-S.: *"High performance double edge-triggered flip-flop using a merged feedback technique"* Circuits, Devices and Systems, IEE Proceedings [see also IEE Proceedings G- Circuits, Devices and Systems], Volume: 147, Issue: 6, Dec. 2000 Pages:363 - 368

[3.33] Pontikakis, B.; Nekili, M.: "A novel double edge-triggered pulse-clocked TSPC D flip-flop for high-performance and low-power VLSI design applications" Circuits and Systems, 2002. ISCAS 2002. IEEE International Symposium on , Volume: 5 , 26-29 May 2002 Pages:V-101 - V-104 vol.5

[3.34] Johnson, T.A.; Kourtev, I.S.: "A single latch, high speed double-edge triggered flip-flop (DETFF)" Electronics, Circuits and Systems, 2001. ICECS 2001. The 8th IEEE International Conference on , Volume: 1 , 2-5 Sept. 2001 Pages:189 - 192 vol.1

[3.35] Tschanz, J.; Narendra, S.; Zhanping Chen; Borkar, S.; Sachdev, M.; Vivek De: "Comparative delay and energy of single edge-triggered and dual edge-triggered pulsed flip-flops for high-performance microprocessors" Low Power Electronics and Design, International Symposium on, 2001., 6-7 Aug. 2001 Pages:147 - 152

[3.36] Kang, S.M.S.: *"Elements of low power design for integrated systems"* Low Power Electronics and Design, 2003. ISLPED '03. Proceedings of the 2003 International Symposium on , 25-27 Aug. 2003 Pages:205 - 210

[3.37] Kuo-Hsing Cheng; Yung-Hsiang Lin: *"A dual-pulse-clock double edge triggered flip-flop for low voltage and high speed application"* Circuits and Systems, 2003. ISCAS '03. Proceedings of the 2003 International Symposium on , Volume: 5 , 25-28 May 2003 Pages:V-425 - V-428 vol.5

## **Reference of Chapter 4**

[4.1] Zhijun Huang; Ercegovac, M.D.;: "On signal-gating schemes for low-power adders" Signals, Systems and Computers, 2001. Conference Record of the Thirty-Fifth Asilomar Conference on , Volume: 1 , 4-7 Nov. 2001 Pages:867 - 871 vol.1

[4.2] Kapadia, H.; Benini, L.; De Micheli, G.;: *"Reducing switching activity on datapath buses with control-signal gating"* Solid-State Circuits, IEEE Journal of , Volume: 34 , Issue: 3 , March 1999 Pages:405 - 414

[4.3] Huang, Z.; Ercegovac, M.D.;: *"Two-dimensional signal gating for low-power array multiplier design"* Circuits and Systems, 2002. ISCAS 2002. IEEE International Symposium on , Volume: 1 , 26-29 May 2002 Pages:I-489 - I-492 vol.1

[4.4] Xanthopoulos, T.; Chandrakasan, A.P.;: "A low-power IDCT macrocell for MPEG-2 MP@ML exploiting data distribution properties for minimal activity" Solid-State Circuits, IEEE Journal of , Volume: 34 , Issue: 5 , May 1999 Pages:693 - 703

[4.5] Hai Li; Bhunia, S.; Yiran Chen; Roy, K.; Vijaykumar, T.N.;: "*DCG: deterministic clock-gating for low-power microprocessor design*" Very Large Scale Integration (VLSI) Systems, IEEE Transactions on , Volume: 12, Issue: 3, March 2004 Pages:245 - 254

[4.6] Hai Li; Bhunia, S.; Chen, Y.; Vijaykumar, T.N.; Roy, K.;: "Deterministic clock gating for microprocessor power reduction" The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings., 8-12 Feb. 2003 Pages:113 - 122

[4.7] Edirisooriya, G.; Edirisooriya, S.;: **"Scan chain fault diagnosis with fault dictionaries"** Circuits and Systems, 1995. ISCAS '95., 1995 IEEE International Symposium on , Volume: 3 , 28 April-3 May 1995 Pages:1912 - 1915 vol.3

[4.8] Yuejian Wu;: "Diagnosis of scan chain failures" Defect and Fault Tolerance in VLSI Systems, 1998. Proceedings., 1998 IEEE International Symposium on , 2-4 Nov. 1998 Pages:217 - 222

[4.9] Chang, D.; Lee, M.T.-C.; Cheng, K.-T.; Marek-Sadowska, M.;: *"Functional scan chain testing"* Design, Automation and Test in Europe, 1998., Proceedings, 23-26 Feb. 1998 Pages:278 - 283

[4.10] Nicolici, N.; Al-Hashimi, B.M.;: *"Multiple scan chains for power minimization during test application in sequential circuits"* Computers, IEEE Transactions on , Volume: 51 , Issue: 6 , June 2002 Pages:721 - 734

[4.11] Zyuban, V.;: "Optimization of scannable latches for low energy" Very Large Scale Integration (VLSI) Systems, IEEE Transactions on , Volume: 11, Issue: 5, Oct. 2003 Pages:778 - 788

[4.12] Xiaodong Zhang; Roy, K.;: "Power reduction in test-per-scan BIST"
On-Line Testing Workshop, 2000. Proceedings. 6th IEEE International , 3-5
July 2000 Pages:133 - 138

[4.13] Rosinger, P.M.; Al-Hashimi, B.M.; Nicolici, N.;: "Scan architecture for shift and capture cycle power reduction" Defect and Fault Tolerance in VLSI Systems, 2002. DFT 2002. Proceedings. 17th IEEE International Symposium on , 6-8 Nov. 2002 Pages:129 - 137

[4.14] Shigematsu, S.; Mutoh, S.; Matsuya, Y.; Tanabe, Y.; Yamada, J.;: "A 1-V high-speed MTCMOS circuit scheme for power-down application circuits" Solid-State Circuits, IEEE Journal of , Volume: 32 , Issue: 6 , June 1997 Pages:861 – 869

[4.15] Zyuban, V.; Kosonocky, S.V.;: "Low power integrated scan-retention mechanism" Low Power Electronics and Design, 2002. ISLPED '02.
Proceedings of the 2002 International Symposium on , 12-14 Aug. 2002
Pages:98 - 102

[4.16] Louis L. Hsu, Fishkill; Wei Hwang, Armonk, both of NY (US); Stephen V.
Kosonocky, Darien, CT(US); Li-Kong Wang, Montvale, NJ(US);: "Data Retention Registers" United States Patent, Patent No. US6,437,623 B1. Aug. 20, 2002

## **Reference of Chapter 5**

[5.1] Shibata, N.; Watanabe, M.; Tanabe, Y.: "A current-sensed high-speed and low-power first-in-first-out memory using a wordline/bitline-swapped dual-port SRAM cell" Solid-State Circuits, IEEE Journal of, Volume: 37, Issue: 6, June 2002 Pages:735 - 750

[5.2] Yazawa, M.; Hosotani, S.; Imamura, Y.; Amishiro, H.; Okada, K.;: "A dynamic voltage sensing FIFO suitable for multi-format video systems"
Custom Integrated Circuits Conference, 1995., Proceedings of the IEEE 1995, 1-4 May 1995 Pages:159 – 162

[5.3] Kawasaki, H.; Long, S.I.;: "A low-power 128×1-bit GaAs FIFO for ATM packet switcher" Solid-State Circuits, IEEE Journal of , Volume: 31 , Issue: 10 , Oct. 1996 Pages:1547 - 1555

[5.4] Fenstermaker, L.R.; O'Conner, K.J.;: "A low-power generator-based FIFO using ring pointers and current-mode sensing" Solid-State Circuits Conference, 1993. Digest of Technical Papers. 40th ISSCC., 1993 IEEE International , 24-26 Feb. 1993 Pages:242 - 243, 295

[5.5] Gholipour, M.; Afzali-Kusha, A.; Nourani, M.; Khademzadeh, A.;: "An efficient asynchronous pipeline FIFO for low-power applications" Circuits and Systems, 2002. MWSCAS-2002. The 2002 45th Midwest Symposium on , Volume: 2 , 4-7 Aug. 2002 Pages:II-481 - II-484 vol.2

[5.6] Brackenbury, M.L.L.;: "An instruction buffer for a low-power DSP"
Advanced Research in Asynchronous Circuits and Systems, 2000. (ASYNC 2000) Proceedings. Sixth International Symposium on , 2-6 April 2000
Pages:176 - 186

[5.7] Feldman, A.R.; Van Duzer, T.;: *"Hybrid Josephson-CMOS FIFO"* Applied Superconductivity, IEEE Transactions on , Volume: 5 , Issue: 2 , Jun 1995 Pages:2648 - 2651

[5.8] Wyland, D.;: "New features in synchronous FIFOs" WESCON/'93. Conference Record, 28-30 Sept. 1993 Pages:580 - 585

[5.9] McDonnell, M.; Winters, K.;: "A dynamically allocated CMOS dual-LIFO register stack" Solid-State Circuits, IEEE Journal of, Volume: 25, Issue: 5, Oct. 1990 Pages:1287 - 1290

[5.10] Wang, H.; Liu, P.C.; Lau, K.T.;: *"Low power dual-port CMOS SRAM macro design"* Electronics Letters , Volume: 32 , Issue: 15 , 18 July 1996 Pages:1354 - 1356

[5.11] Blalock, T.N.; Jaeger, R.C.;: "A high-speed clamped bit-line current-mode sense amplifier" Solid-State Circuits, IEEE Journal of, Volume: 26, Issue: 4, April 1991 Pages:542 - 548

[5.12] Chrisanthopoulos, A.; Moisiadis, Y.; Tsiatouhas, Y.; Arapoyanni, A.;: *"Comparative study of different current mode sense amplifiers in submicron CMOS technology"* Circuits, Devices and Systems, IEE Proceedings [see also IEE Proceedings G- Circuits, Devices and Systems], Volume: 149, Issue: 3, June 2002 Pages:154 - 158

[5.13] Wang, H.; Liu, P.C.;: "Double-edge-triggered address pointer for low-power high-speed FIFO memories" Electronics Letters, Volume:
33, Issue: 5, 27 Feb. 1997 Pages:387 - 389

# Vita

## **PERSONAL INFORMATION**

| Birthdate:      | January 25, 1979                      |
|-----------------|---------------------------------------|
| Birthplace:     | Taipei, Taiwan, R.O.C                 |
| Address:        | Department of Electronics Engineering |
|                 | National Chiao-Tung University        |
|                 | 1001 Ta-Hsueh Road                    |
|                 | Hsinchu, Taiwan, 30050, R.O.C         |
| E-Mail Address: | bluemoody.ee86@nctu.edu.tw            |

## **EDUCATION**

| B.S. | [2002] | Department of Electronics Engineering, National Chiao-Tung University |
|------|--------|------------------------------------------------------------------------|
| M.A. | [2004] | Institute of Electronics, National Chiao-Tung University               |