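# National Chiao Tung University

### Department of Electronics Engineering and Institute of Electronics, Master's Program

### Master's Thesis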

給低功率元件式設計使用的 改良型叢聚式電壓調降法

Improved Clustered Voltage Scaling for Low Power Cell-Based Design

Student: Shwang-Yi Tan (譚雙議)

Advisor: Prof. Hung-Ming Chen (陳宏明)

July 2005 (Year 94 of the Republic of China)

#### 給低功率元件式設計使用的改良型叢聚式電壓調降法

#### Improved Clustered Voltage Scaling for Low Power

#### Cell-Based Design

Student: Shwang-Yi Tan (譚雙議)

Advisor: Professor Hung-Ming Chen (陳宏明)



Submitted to Department of Electrical Engineering & Institute of Electronics

College of Electrical Engineering and Computer Science

National Chiao Tung University in Partial Fulfillment of the Requirements for the Degree of Master in Electronics Engineering

July 2005 Hsinchu, Taiwan, Republic of China

July 2005 (Year 94 of the Republic of China)

給低功率元件式設計使用的改良型叢聚式電壓調降法

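#### Student: Shwang-Yi Tan (譚雙議)

#### Advisor: Prof. Hung-Ming Chen (陳宏明)

Master's Thesis, Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University

#### Chinese Abstract

As semiconductor processes advance and device dimensions shrink, integrated circuits can operate at higher frequencies and achieve better performance. However, a higher operating frequency means a more serious power dissipation problem, and scaling down also increases leakage current. Power consumption makes battery-powered products harder to design, and it also affects the time to market, cost, and reliability of ordinary products.

CVS (Clustered Voltage Scaling) is an effective way to reduce the power of integrated circuits. CVS takes the excess timing slack in a circuit and trades it for lower power consumption. CVS-based power-saving methods have been studied for years. In this thesis we propose an improved CVS method, Bilateral CVS (BCVS), and use experimental data to examine why our improvements are effective.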

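#### Acknowledgments

For the completion of this thesis, I am of course most grateful to my advisor, Professor Hung-Ming Chen. Over the past two years he has given us boundless patience and care, guiding us step by step into the research field of VLSI physical design, so that we grew from being ignorant and unfamiliar to gradually grasping more of the ingenious techniques and concepts of physical-level optimization. Professor Chen is a rare mentor and friend who has looked after us thoroughly in both scholarship and daily life, and I am honored to have learned to do research under him.

I also thank the oral defense committee members, Professor 李毅郎, Professor 江蕙如, and Dr. 黃俊銘, for their guidance and inspiration. During the defense they made me realize that my understanding of my own research field was not yet thorough enough and that my thinking was not yet comprehensive enough.

I also thank my roommates and classmates for their generous help, which allowed me to complete the thesis smoothly.

Finally, I am very grateful to my family for their support. When I ran into setbacks in my research, their care and encouragement gave me the strength to start again.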

### Improved Clustered Voltage Scaling for Low Power Cell-Based Design

Prepared by Shwang-Yi Tan

Directed by Prof. Hung-Ming Chen

for the Degree of Master of Science

Department of Electronics Engineering

National Chiao Tung University

Hsinchu, Taiwan 300, R.O.C.

E-mail: sytan.ee92g@nctu.edu.tw

#### Abstract

As semiconductor technologies progress by scaling down the feature size, integrated circuits can operate at higher frequencies and achieve higher performance. However, increasing the operating frequency worsens power dissipation; moreover, scaling down causes larger leakage current. Power consumption increases the design difficulty of battery-powered applications, and it also affects ordinary designs in terms of time to market, cost, and reliability.


CVS (Clustered Voltage Scaling) is an effective way to reduce IC power consumption. CVS utilizes the excess timing slack inside a circuit and trades it for power reduction. Methods based on CVS for saving power have been studied for years. We propose an improved CVS method, Bilateral CVS (BCVS). BCVS is a general Clustered Voltage Scaling method which subsumes both CVS and ECVS. In this thesis, we also use experimental results to discuss why our improvements work.

### Contents

| No. | Title                                              |
|-----|----------------------------------------------------|
| 1   | Introduction                                       |
| 1.1 | Organization of This Thesis                        |
| 2   | Clustered Voltage Scaling (CVS) and the Extensions |
| 2.1 | Clustered Voltage Scaling (CVS)                    |
| 2.2 | Extended Clustered-Voltage-Scaling (ECVS)          |
| 2.3 | Greedy-ECVS (GECVS)                                |
| 2.4 | Problem Formulation                                |
| 3   | Bilateral Clustered Voltage Scaling (BCVS)         |
| 3.1 | Motivation of BCVS                                 |
| 3.2 | Wave Front Propagation                             |
| 3.3 | Priority Criterion                                 |
| 3.4 | BCVS algorithm                                     |
| 4   | Experimental Results                               |
| 4.1 | Experimental Setup and Modeling                    |
| 4.2 | Results and Discussions                            |
| 5   | Conclusions and Future Works                       |



### List of Figures

| Figure | Caption                                    |
|--------|--------------------------------------------|
| 2.1    | Output timing distribution of some design  |
| 2.2    | Cells with different supply voltage [12]   |
| 2.3    | Static Weakly-ON Leakage Current [12]      |
| 2.4    | Extended CVS (ECVS) [12]                   |
| 2.5    | Labeling the level in a logic circuit [12] |
| 3.1    | Bilateral wave fronts                      |
| 3.2    | BCVS algorithm                             |
| 3.3    | BCVS flow diagram                          |

### List of Tables

| Table | Caption                          |
|-------|----------------------------------|
| 4.1   | Descriptions of testing circuits |
| 4.2   | Setup conditions                 |
| 4.3   | Results of setup 1               |
| 4.4   | Results of setup 2               |
| 4.5   | Results of setup 3               |
| 4.6   | Results of setup 4               |
| 4.7   | Results of setup 5               |
| 4.8   | Results of setup 6               |

# Chapter 1 Introduction

Power dissipation is now an important design parameter for microelectronic circuits, especially in portable computing devices and personal communication applications. A design may be considered of little value simply because it consumes too much power. In the consumer electronics market, battery life is a key requirement that drives purchasing decisions. Battery technology advances slowly, yet the demand for built-in high-performance functions such as MPEG decoders keeps increasing, so the demand for low-power operation seems insatiable. Low-power design techniques are therefore more significant than ever.

Because of this demand for lower power dissipation, we study methodologies for low-power design automation. In this thesis, we focus mainly on cell-based design automation and optimization, and in particular on power minimization. We have investigated prior work on Clustered Voltage Scaling (CVS), such as [10] [12] [8] [11] [3]. We have also studied the literature on relevant technologies, such as level converters [5], dual threshold voltage [2] [1] [7] [6], and voltage islands [9], as well as surveys of low-power methods [4].

### 1.1 Organization of This Thesis

In this thesis, we give an introduction to clustered voltage scaling (CVS) and relevant background in Chapter 2. In Chapter 3, we describe the major part of our research and propose the bilateral clustered voltage scaling (BCVS) algorithm. We then present experimental results in Chapter 4 to show the effectiveness of our approach. We conclude the thesis in Chapter 5 and present possible future work.



# Chapter 2 Clustered Voltage Scaling (CVS) and the Extensions

Voltage scaling is one of the most effective techniques for reducing the power consumption of CMOS circuits. However, decreasing VDD increases circuit delay. In most microprocessor or ASIC designs, the operating frequency is set by the design specification according to the target market, and the timing constraints are in turn set by the operating frequency. Designers must therefore reduce power consumption within the specified timing constraints. If the supply voltage of the whole circuit is reduced while Vth remains constant, the critical-path delay will no longer meet the timing constraints. CVS is a technique that reduces the supply voltage only partially: it exploits the excess timing slack within a circuit and trades that slack for power saving.
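The power/delay trade-off described above can be illustrated with a minimal sketch. It assumes two textbook models that are not taken from this thesis: switching power proportional to $VDD^2$, and the alpha-power-law gate delay proportional to $VDD / (VDD - V_{TH})^{\alpha}$. The function names and the value $\alpha = 1.3$ are illustrative assumptions; the voltages are those of setup 1 in Table 4.2.

```python
def dynamic_power_ratio(vdd_low: float, vdd_high: float) -> float:
    """Relative switching power of a gate at vdd_low versus vdd_high.

    Assumption: P_dyn is proportional to VDD^2, with switching activity,
    load capacitance and clock frequency unchanged.
    """
    return (vdd_low / vdd_high) ** 2


def gate_delay(vdd: float, vth: float, alpha: float = 1.3) -> float:
    """Unnormalized gate delay under the alpha-power law (assumed model)."""
    if vdd <= vth:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return vdd / (vdd - vth) ** alpha


def delay_ratio(vdd_low: float, vdd_high: float, vth: float,
                alpha: float = 1.3) -> float:
    """How much slower a gate becomes when moved from vdd_high to vdd_low."""
    return gate_delay(vdd_low, vth, alpha) / gate_delay(vdd_high, vth, alpha)


# Voltages of setup 1: VDD_H = 1.8, VDD_L = 1.2, V_TH = 0.5.
p = dynamic_power_ratio(1.2, 1.8)
d = delay_ratio(1.2, 1.8, 0.5)
print(f"power ratio {p:.3f}, delay ratio {d:.3f}")
```

Under these assumptions a gate moved to the lower supply uses less than half of its switching power but becomes slower, which is why only gates with enough timing slack are candidates for the lower supply.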

### 2.1 Clustered Voltage Scaling (CVS)

Clustered Voltage Scaling, first proposed by Usami et al. [10], is a simple and practical technique for low-power design. It is based on exploiting the excess timing slack in synchronous circuits.

As shown in Fig. 2.1, the output arrival times of a circuit are usually distributed over a range. After the supply voltage is lowered for low-power operation, the output arrival times shift to a slower range. If the required timing constraint lies within an interval, such as 5.5~6 ns in the case shown, we might fail to create a voltage island [9] to save power for this circuit.

Figure 2.1: Output timing distribution of some design.

A possible way to obtain power saving is to lower the supply voltage of only those cells that have timing slack, as shown in Fig. 2.2.



Figure 2.2: Cells with different supply voltage [12].



Figure 2.3: Static Weakly-ON Leakage Current [12].

Such optimization relies on the excess timing slack inside circuit blocks. Since most circuits have a critical path and other non-critical paths, we usually have the opportunity to reduce power consumption by means of CVS.

Note that a gate supplied by $VDD_L$ cannot directly fan out to a gate supplied by $VDD_H$. As shown in Fig. 2.3, the resulting sub-threshold current (or, even worse, a static turn-on current) would nullify the power saving. Level converters are needed to shift the signal voltage level up so that it can drive the succeeding logic gates. Unfortunately, such circuits are relatively large and power-hungry. They form the main overhead of clustered multiple-supply-voltage low-power designs when we drive $VDD_H$ gates with $VDD_L$ gates to save more power.
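The structural rule above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical netlist representation (a fan-out dictionary and a per-gate supply label `"H"` or `"L"`); it lists every connection where a $VDD_L$ gate drives a $VDD_H$ gate, that is, every place where a level converter would be required.

```python
from typing import Dict, List, Tuple


def converter_edges(fanout: Dict[str, List[str]],
                    supply: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every (driver, load) pair where a VDD_L gate drives a VDD_H gate."""
    edges = []
    for driver, loads in fanout.items():
        for load in loads:
            if supply[driver] == "L" and supply[load] == "H":
                edges.append((driver, load))
    return edges


# Hypothetical four-gate netlist: g1 drives g2 and g3, which both drive g4.
fanout = {"g1": ["g2", "g3"], "g2": ["g4"], "g3": ["g4"], "g4": []}
supply = {"g1": "L", "g2": "H", "g3": "L", "g4": "L"}
print(converter_edges(fanout, supply))  # [('g1', 'g2')]
```

An assignment is converter-free, as required by the original CVS, exactly when this list is empty. A $VDD_H$ gate driving a $VDD_L$ gate is not reported because it needs no level converter.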

Usami et al. used a specially designed flip-flop with a built-in level conversion function (the LCFF) in their CVS technique [10]. To avoid the overhead of level converters, the original CVS paper proposed an algorithm that performs a depth-first search (DFS) from each output pin backward toward the input pins to obtain a converter-free solution.

### 2.2 Extended Clustered-Voltage-Scaling (ECVS)

Usami et al. proposed two ways to improve CVS in [12]. First, they allow the insertion of level converters. As shown in Fig. 2.4, the ECVS algorithm extends the CVS algorithm with a hill-climbing capability: if the $VDD_L$ assignment to cell G3 is feasible (considering the cost of level converter insertion, if necessary) and the increase in total power consumption is within a margin, the assignment is applied. Second, they use the stage level of gates, instead of the original DFS order, to decide the order of $VDD_L$ assignments. As shown in Fig. 2.5, they label the gates with level numbers and sort by these labels to obtain the priority of $VDD_L$ assignment.

Figure 2.4: Extended CVS (ECVS) [12].

### 2.3 Greedy-ECVS (GECVS)

Srivastava et al. proposed a way to further improve ECVS in [8]. They emphasize the priority of $VDD_L$ assignment and introduce a sensitivity measure for further power minimization.

They defined the sensitivity of a gate $x$ as:

$$Sensitivity_x = \frac{\Delta Power \times Slack \ at \ gate \ output}{\Delta Delay}$$

where $\Delta Power$ is the change in power due to the move and $\Delta Delay$ is the change in arrival time at the gate output due to the move.

They pointed out that moves can be selected according to the best power saving per unit delay penalty. This is a good idea because it directly targets the basic goal of CVS: trading excess delay for power saving. Intuitively, this sensitivity measure seems to give a perfect guideline that cannot be improved. In the next chapter, we provide a better approach to further lowering power consumption in cell-based design.

### 2.4 Problem Formulation

In the following chapters, we formulate our problem as finding the largest power saving without violating the timing requirements. The objective is to trade the excess timing slack for the most power reduction. We set the timing requirement through the Back-roll ratio, which is the percentage by which the critical path delay is allowed to increase. For example, a Back-roll ratio of 10% means that the timing requirement is set to 1.1 times the critical path delay. The default Back-roll ratio is 0, that is, the timing requirement equals the critical path delay.
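As a small worked example of the Back-roll ratio, the sketch below computes the timing requirement from a critical path delay. The function name is hypothetical; the 2.00 ns delay is the critical path delay reported for the `cla` circuit in Chapter 4, and the ratios 0, 10 and 20 are those used in the setup conditions.

```python
def timing_requirement(critical_path_delay_ns: float,
                       back_roll_percent: float = 0.0) -> float:
    """Timing requirement = critical path delay * (1 + Back-roll ratio)."""
    if back_roll_percent < 0:
        raise ValueError("Back-roll ratio must not be negative")
    return critical_path_delay_ns * (1.0 + back_roll_percent / 100.0)


# Critical path delay of 2.00 ns under Back-roll ratios of 0%, 10% and 20%.
for ratio in (0, 10, 20):
    print(f"Back-roll {ratio:>2}%: {timing_requirement(2.00, ratio):.2f} ns")
```

A larger Back-roll ratio relaxes the requirement and therefore leaves more slack that can be traded for power.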



# Chapter 3 Bilateral Clustered Voltage Scaling (BCVS)

Original CVS does not require any insertion of stand-alone level converters. Therefore, it is a more practical approach than ECVS, especially when the overheads of level converters are still high. As the research and improvement in level converters progress, the overheads of level converters are lowered. We can then utilize more excess slacks by ECVS if the circuit structure and the timing specification allow. Furthermore, GECVS gives a guideline of how to trade slacks for power in an efficient way. In this chapter, we want to show an improved approach to implement Clustered Voltage Scaling.

### 3.1 Motivation of BCVS

The term "bilateral" means that we push our clusters from both the output side and the input side. The motivation is to push the clusters from both sides alternately so that there is a better chance of reaching the optimal solution. Originally, we pushed each wave front only n levels in each step. However, the experimental data show that the resulting quality deteriorates if n is small. Therefore, we let n be very large so that the optimality of each wave front is not sacrificed by the push of the other one. From many experiments we observed that this bilateral process brings little improvement for our benchmark circuits, although it performs very well on the testing circuits we designed. We ascribe this phenomenon to the difference between circuit structures. For the sake of the adaptability of our algorithm to different circuitry, we retain this mechanism.

So, we start our optimization procedure from the output side and grow the cluster as large as the slacks allow. While the wave front traverses the circuit, we mark the best movement sequence for power reduction. When it finishes, we push the other wave front from the input side in the same way. After one such iteration has been completed, we compare the results. If the solution is better than the previous best result, we re-apply the sequence of movements up to the marked position and then go on to the next iteration.

### 3.2 Wave Front Propagation

We use a wave front propagator as the engine of our optimizer. As shown in Fig. 3.1, the wave front starts from the output pins and propagates to the fan-in cells if the timing slacks allow. We also implement a reverse wave front, which behaves symmetrically to the ordinary wave front: it starts from the input pins, propagates to the fan-out cells, and automatically includes level converters if necessary.

Figure 3.1: Bilateral wave fronts.

We have designed several testing circuits to test the ability of the wave front and make sure it can find the best solution.

### 3.3 Priority Criterion

We have tried three types of propagation priority. The first one is "single key: $slack \times \frac{\Delta Power}{\Delta Delay}$, in decreasing order", which stands for the GECVS algorithm. The second is "double key: $(slack, \frac{\Delta Power}{\Delta Delay})$, both in increasing order", which is the one we propose. The last one is "double key: ($slack$, $fan_{in|out}$ number), both in increasing order". The reason for choosing the minimal fan number is that we want to perturb the slack distribution of the whole circuit as little as possible after each $VDD_L$ assignment.
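The three priority criteria can be compared side by side with a short sketch. The `Move` record, its field names, and the sample values are hypothetical; the sketch also assumes that $\Delta Power$ is stored as the power saved and $\Delta Delay$ as the delay penalty, both positive, which is a sign convention chosen here for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Move:
    """A candidate VDD_L assignment (illustrative record, not from the thesis)."""
    gate: str
    slack: float        # timing slack at the gate
    delta_power: float  # power saved by the move (assumed positive)
    delta_delay: float  # delay penalty of the move (assumed positive)
    fan: int            # fan-in or fan-out count, depending on wave direction

    @property
    def ratio(self) -> float:
        return self.delta_power / self.delta_delay


def order_gecvs(moves: List[Move]) -> List[Move]:
    """Single key: slack * dPower/dDelay, in decreasing order."""
    return sorted(moves, key=lambda m: m.slack * m.ratio, reverse=True)


def order_slack_ratio(moves: List[Move]) -> List[Move]:
    """Double key: (slack, dPower/dDelay), both in increasing order."""
    return sorted(moves, key=lambda m: (m.slack, m.ratio))


def order_slack_fan(moves: List[Move]) -> List[Move]:
    """Double key: (slack, fan number), both in increasing order."""
    return sorted(moves, key=lambda m: (m.slack, m.fan))


moves = [
    Move("a", slack=0.4, delta_power=2.0, delta_delay=0.5, fan=3),
    Move("b", slack=0.1, delta_power=3.0, delta_delay=0.5, fan=1),
    Move("c", slack=0.1, delta_power=1.0, delta_delay=0.5, fan=2),
]
for name, order in (("GECVS", order_gecvs), ("slack, ratio", order_slack_ratio),
                    ("slack, fan", order_slack_fan)):
    print(name, [m.gate for m in order(moves)])
```

On this sample the single key lets the large slack of gate `a` dominate, while both double keys visit the small-slack gates first and use the second key only to break ties in slack.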

### 3.4 BCVS algorithm

The BCVS algorithm is shown in Fig. 3.2, and the flow diagram of BCVS is shown in Fig. 3.3.

```
put all output nets into the wave front;
put all input nets into the reverse wave front;
ini_wave_front = wave_front;
ini_reverse_wave_front = reverse_wave_front;
MIN_POWER = total_power;
do {
    /* forward pass: push the wave front from the output side */
    wave_front = ini_wave_front;
    power = MIN_POWER;
    while (propagation is possible) {
        propagate from output side to input side;
        if (update_total_power() < power) {
            mark the movement in the moving sequence;
            power = total_power;
        }
    }
    undo all movements;

    /* reverse pass: push the reverse wave front from the input side */
    reverse_wave_front = ini_reverse_wave_front;
    reverse_power = MIN_POWER;
    while (reverse_propagation is possible) {
        reverse_propagate from input side to output side;
        if (update_total_power() < reverse_power) {
            mark the movement in the moving sequence;
            reverse_power = total_power;
        }
    }
    undo all reverse movements;

    /* keep the better pass if it improves on the best solution so far */
    if (power < MIN_POWER or reverse_power < MIN_POWER) {
        update MIN_POWER;
        redo corresponding movement to marked position;
        update both wave fronts (and initial wave fronts);
    }
    else break;
} while (better solution is found);
```

Figure 3.2: BCVS algorithm.
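The "mark, undo, redo to marked position" bookkeeping of Fig. 3.2 can be sketched in isolation. The sketch below assumes that each pass has already produced a trace of total power after each movement (hypothetical numbers); it does not model the circuit, timing checks, or wave front updates. The function names and the tie-breaking rule (the forward pass wins ties) are assumptions of this example.

```python
from typing import List, Optional, Tuple


def best_marked_position(start_power: float,
                         powers_after_moves: List[float]) -> Tuple[int, float]:
    """Return (number of movements to keep, total power at that point).

    A movement is marked only when it lowers the best power seen so far,
    so later movements that raise the power again are not kept.
    """
    marked, best = 0, start_power
    for count, power in enumerate(powers_after_moves, start=1):
        if power < best:
            marked, best = count, power
    return marked, best


def bilateral_step(min_power: float, forward_trace: List[float],
                   reverse_trace: List[float]) -> Optional[Tuple[str, int, float]]:
    """Pick the pass to redo, or None when neither pass improves on min_power."""
    f_mark, f_power = best_marked_position(min_power, forward_trace)
    r_mark, r_power = best_marked_position(min_power, reverse_trace)
    if min(f_power, r_power) >= min_power:
        return None  # corresponds to the "else break" branch
    if f_power <= r_power:
        return ("forward", f_mark, f_power)
    return ("reverse", r_mark, r_power)


# Hypothetical power traces of one iteration, starting from MIN_POWER = 100.
print(bilateral_step(100.0, [97.0, 95.0, 96.0, 94.5, 98.0], [99.0, 93.0, 94.0]))
# ('reverse', 2, 93.0)
```

In this example the forward pass is best after four movements (94.5) and the reverse pass after two (93.0), so the first two reverse movements would be redone and the iteration would continue from there.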



Figure 3.3: BCVS flow diagram.

# Chapter 4 Experimental Results

In this chapter, we show our experimental results and give explanation for the resulting data.

### 4.1 Experimental Setup and Modeling

Our target is to find the most power-saving solution under the condition that the maximal input-to-output arrival time over all the I/O pins remains the same. Following the problem formulation in Section 2.4, the program automatically derives the timing constraint from the result of the initial Static Timing Analysis (STA). It then trades the excess timing slack inside the circuit for the best power saving and makes sure that the timing constraint is still satisfied after each movement. For simplicity, we do not pursue uphill-climbing ability with respect to the timing constraint; instead, we focus on the strategy of exploiting all feasible movements without timing violation and then marking the most power-saving solution reached. If uphill movement support is demanded, it can be implemented with special care in the evaluation of timing requirements.
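The initial STA step can be illustrated with a minimal sketch on a combinational netlist. The data layout (a fan-out dictionary and a per-gate delay dictionary), the function name, and the five-gate example are hypothetical; the sketch gives every gate the same delay and ignores wire delay and transition times, so it is a simplification and not the analysis engine used in the experiments.

```python
from collections import deque
from typing import Dict, List


def static_timing(fanout: Dict[str, List[str]], delay: Dict[str, float],
                  back_roll_percent: float = 0.0):
    """Return (arrival, required, slack, requirement) for a combinational DAG."""
    # Topological order by Kahn's algorithm.
    fanin_count = {g: 0 for g in fanout}
    for loads in fanout.values():
        for load in loads:
            fanin_count[load] += 1
    ready = deque(g for g, n in fanin_count.items() if n == 0)
    order = []
    while ready:
        g = ready.popleft()
        order.append(g)
        for load in fanout[g]:
            fanin_count[load] -= 1
            if fanin_count[load] == 0:
                ready.append(load)
    if len(order) != len(fanout):
        raise ValueError("netlist contains a combinational loop")

    # Forward pass: latest arrival time at each gate output.
    arrival = {g: delay[g] for g in fanout}
    for g in order:
        for load in fanout[g]:
            arrival[load] = max(arrival[load], arrival[g] + delay[load])

    # Timing requirement from the critical path delay and the Back-roll ratio.
    requirement = max(arrival.values()) * (1.0 + back_roll_percent / 100.0)

    # Backward pass: latest time each gate output may settle.
    required = {g: requirement for g in fanout}
    for g in reversed(order):
        for load in fanout[g]:
            required[g] = min(required[g], required[load] - delay[load])

    slack = {g: required[g] - arrival[g] for g in fanout}
    return arrival, required, slack, requirement


# Hypothetical netlist: chain a -> b -> c -> e, plus a short side input d -> e.
fanout = {"a": ["b"], "b": ["c"], "c": ["e"], "d": ["e"], "e": []}
delay = {g: 1.0 for g in fanout}  # same delay for every gate
arrival, required, slack, requirement = static_timing(fanout, delay)
print(requirement, slack)
```

Gates on the critical path (`a`, `b`, `c`, `e`) end up with zero slack, while gate `d` has two units of slack and is therefore the only candidate for a slower, lower-power supply in this toy circuit.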

To simplify the timing analysis, we give all gates the same timing and power parameters. We omit the rise/fall transition time information at the I/O pins of each gate so as to focus on how slack and power relate to the wave propagation inside the circuitry. The reason is that our primary goal is to exploit all the feasible movements without timing violation, and our secondary goal is to mark the best sequence with the most power saving.

Table 4.1: Descriptions of testing circuits

| Circuit name | Circuit function               | # of standard cells |
|--------------|--------------------------------|---------------------|
| cla          | 128-bit carry look-ahead adder | 1911                |
| csm          | 128-bit conditional sum adder  | 1701                |
| add_bk       | 128-bit BK adder               | 1942                |
| mult32       | 32-bit Booth multiplier        | 3418                |

We ignore the portion of power that depends on rise/fall transition time; the power consumption therefore depends on the supply voltage only. In this way, we can emphasize the relationship between $VDD_L$ assignment and the resulting reduction in power consumption.

We also want to examine the sensitivity criterion proposed by GECVS. We therefore use the UMC 0.18 standard cell library and set a different leakage power for each type of cell according to this library.

Finally, we set the delay and power of level converters to be multiples of the unit gate delay and power, respectively.

### 4.2 Results and Discussions

We use some real designs as our test bench; they are listed in Table 4.1. We test them under six setup conditions, which are listed in Table 4.2.

We have observed two groups of strange data. First, in Table 4.6, the performance of GECVS seems to be too bad. The reason is that GECVS mixes the timing slack information with $\frac{\Delta Power}{\Delta Delay}$ in the key of its selection criterion, so it cannot make the right decision that "the cells with larger timing slack

| Setup No. | $VDD_H$ (V) | $VDD_L$ (V) | $V_{TH}$ (V) | Level converter delay cost (× unit gate delay) | Level converter power cost (× unit gate power) | Back-roll ratio (%) |
|-----------|---------|---------|----------|----------------------------|----------------------------|-------------------------|
| 1         | 1.8     | 1.2     | 0.5      | 1.0                        | 1.0                        | 0                       |
| 2         | 1.8     | 1.2     | 0.5      | 4.0                        | 4.0                        | 0                       |
| 3         | 1.8     | 1.2     | 0.5      | 0.0                        | 0.0                        | 0                       |
| 4         | 1.8     | 0.9     | 0.4      | 1.0                        | 1.0                        | 0                       |
| 5         | 1.8     | 1.2     | 0.5      | 1.0                        | 1.0                        | 10                      |
| 6         | 1.8     | 1.2     | 0.5      | 1.0                        | 1.0                        | 20                      |

Table 4.2: Setup conditions



Table 4.3: Results of setup 1 ($VDD_H = 1.8$, $VDD_L = 1.2$, $V_{TH} = 0.5$, level converter delay/power cost: 1.0/1.0, Back-roll = 0%)

| Circuit name | Metric                        | original | CVS  | GECVS | $(slack, \frac{\Delta Power}{\Delta Delay})$ | (slack, fan) |
|--------------|-------------------------------|----------|------|-------|----------------------------------------------|--------------|
| cla          | #cells                        | 1911     |      |       |                                              |              |
|              | critical path delay (ns)      | 2.00     |      |       |                                              |              |
|              | #H cells                      | 1911     | 1210 | 1210  | 1210                                         | 1210         |
|              | #L cells                      | 0        | 701  | 701   | 701                                          | 701          |
|              | #Level converters             | 0        | 0    | 0     | 0                                            | 0            |
|              | avg. output arrival time (ns) | 1.51     | 1.66 | 1.66  | 1.66                                         | 1.66         |
|              | power (%)                     | 1.0      | 0.84 | 0.84  | 0.84                                         | 0.84         |
| csm          | #cells                        | 1701     |      |       |                                              |              |
|              | critical path delay (ns)      | 1.50     |      |       |                                              |              |
|              | #H cells                      | 1701     | 872  | 872   | 241                                          | 231          |
|              | #L cells                      | 0        | 829  | 829   | 1460                                         | 1470         |
|              | #Level converters             | 0        | 0    | 0     | 203                                          | 196          |
|              | avg. output arrival time (ns) | 1.20     | 1.30 | 1.30  | 1.42                                         | 1.42         |
|              | power (%)                     | 1.0      | 0.74 | 0.74  | 0.65                                         | 0.64         |
| add_bk       | #cells                        | 1942     |      |       |                                              |              |
|              | critical path delay (ns)      | 1.40     |      |       |                                              |              |
|              | #H cells                      | 1942     | 721  | 930   | 721                                          | 721          |
|              | #L cells                      | 0        | 1221 | 1012  | 1221                                         | 1221         |
|              | #Level converters             | 0        | 0    | 0     | 0                                            | 0            |
|              | avg. output arrival time (ns) | 1.09     | 1.27 | 1.24  | 1.27                                         | 1.27         |
|              | power (%)                     | 1.0      | 0.71 | 0.73  | 0.71                                         | 0.71         |
| mult32       | #cells                        | 3418     |      |       |                                              |              |
|              | critical path delay (ns)      | 3.10     |      |       |                                              |              |
|              | #H cells                      | 3418     | 3085 | 2829  | 1712                                         | 1588         |
|              | #L cells                      | 0        | 333  | 589   | 1706                                         | 1830         |
|              | #Level converters             | 0        | 0    | 344   | 550                                          | 499          |
|              | avg. output arrival time (ns) | 2.73     | 2.89 | 2.94  | 2.98                                         | 2.98         |
|              | power (%)                     | 1.0      | 0.97 | 0.90  | 0.82                                         | 0.81         |

| $VDD_H=1.8, VDD_L=1.2, V_{TH}=0.5$ , Level converter delay/power cost: 4.0/4.0, Back-roll=0% |                               |          |        |              |                                                     |             |
|----------------------------------------------------------------------------------------------|-------------------------------|----------|--------|--------------|-----------------------------------------------------|-------------|
| Circuit name                                                                                 |                               | original | CVS    | GECVS        | $(\text{slack}, \frac{\Delta Power}{\Delta Delay})$ | (slack,fan) |
|                                                                                              | #cells                        | 1911     |        |              |                                                     |             |
|                                                                                              | critical path delay(ns)       | 2.00     |        |              |                                                     |             |
|                                                                                              | #H cells                      | 1911     | 1210   | 1210         | 1210                                                | 1210        |
| cla                                                                                          | #L cells                      | 0        | 701    | 701          | 701                                                 | 701         |
|                                                                                              | #Level converters             | 0        | 0      | 0            | 0                                                   | 0           |
|                                                                                              | avg. output arrival time (ns) | 1.51     | 1.66   | 1.66         | 1.66                                                | 1.66        |
|                                                                                              | power $(\%)$                  | 1.0      | 0.84   | 0.84         | 0.84                                                | 0.84        |
|                                                                                              | #cells                        | 1701     |        |              |                                                     |             |
|                                                                                              | critical path delay(ns)       | 1.50     |        |              |                                                     |             |
|                                                                                              | #H cells                      | 1701     | 872    | 872          | 872                                                 | 872         |
| csm                                                                                          | #L cells                      | 0        | 829    | 829          | 829                                                 | 829         |
|                                                                                              | #Level converters             | 0        | 0      | 0            | 0                                                   | 0           |
|                                                                                              | avg. output arrival time (ns) | 1.20     | 1.30   | 1.30         | 1.30                                                | 1.30        |
|                                                                                              | power (%)                     | 1.0      | 0.74   | 0.74         | 0.74                                                | 0.74        |
|                                                                                              | #cells                        | 1942     |        |              |                                                     |             |
|                                                                                              | critical path delay(ns)       | 1.40     |        |              |                                                     |             |
|                                                                                              | #H cells                      | 1942     | 721    | 930          | 721                                                 | 721         |
| add_bk                                                                                       | #L cells                      | 0        | 1221   | 1012         | 1221                                                | 1221        |
|                                                                                              | #Level converters             | 0        | 0      | 0            | 0                                                   | 0           |
|                                                                                              | avg. output arrival time (ns) | 1.09     | 1.27   | 1.24         | 1.27                                                | 1.27        |
|                                                                                              | power (%)                     | 1.0      | 0.71   | 0.73         | 0.71                                                | 0.71        |
|                                                                                              | #cells                        | 3418     |        |              |                                                     |             |
|                                                                                              | critical path delay(ns)       | 3.10     |        |              |                                                     |             |
|                                                                                              | #H cells                      | 3418     | 3085   | 3085         | 3085                                                | 3085        |
| mult32                                                                                       | #L cells                      | 0        | 333    | 333          | 333                                                 | 333         |
|                                                                                              | #Level converters             | 0        | 0      | 0            | 0                                                   | 0           |
|                                                                                              | avg. output arrival time (ns) | 2.73     | 2.89   | 2.89         | 2.89                                                | 2.89        |
|                                                                                              | power (%)                     | 1.0      | 0.97   | 0.97         | 0.97                                                | 0.97        |

Table 4.4: Results of setup 2

Table 4.5: Results of setup 3

| $VDD_H=1.8, VDD_L=1.2, V_{TH}=0.5$ , Level converter delay/power cost: 0.0/0.0, Back-roll=0% |                               |          |      |       |                                                     |             |  |
|----------------------------------------------------------------------------------------------|-------------------------------|----------|------|-------|-----------------------------------------------------|-------------|--|
| Circuit name                                                                                 |                               | original | CVS  | GECVS | $(\text{slack}, \frac{\Delta Power}{\Delta Delay})$ | (slack,fan) |  |
|                                                                                              | #cells                        | 3418     |      |       |                                                     |             |  |
|                                                                                              | critical path delay(ns)       | 3.10     |      |       |                                                     |             |  |
|                                                                                              | #H cells                      | 3418     | 3085 | 2307  | 1410                                                | 1214        |  |
| mult32                                                                                       | #L cells                      | 0        | 333  | 1110  | 2008                                                | 2204        |  |
|                                                                                              | #Level converters             | 0        | 0    | 411   | 447                                                 | 463         |  |
|                                                                                              | avg. output arrival time (ns) | 2.73     | 2.89 | 2.92  | 2.97                                                | 2.96        |  |
|                                                                                              | power (%)                     | 1.0      | 0.97 | 0.72  | 0.63                                                | 0.65        |  |

Table 4.6: Results of setup 4

| $VDD_H=1.8, VDD_L=0.9, V_{TH}=0.4$ , Level converter delay/power cost: 1.0/1.0, Back-roll=0% |                               |          |      |       |                                                     |             |  |
|----------------------------------------------------------------------------------------------|-------------------------------|----------|------|-------|-----------------------------------------------------|-------------|--|
| Circuit name                                                                                 |                               | original | CVS  | GECVS | $(\text{slack}, \frac{\Delta Power}{\Delta Delay})$ | (slack,fan) |  |
|                                                                                              | #cells                        | 3418     |      |       |                                                     |             |  |
|                                                                                              | critical path delay(ns)       | 3.10     |      |       |                                                     |             |  |
|                                                                                              | #H cells                      | 3418     | 3231 | 3229  | 2096                                                | 2033        |  |
| mult32                                                                                       | #L cells                      | 0        | 187  | 189   | 1322                                                | 1385        |  |
|                                                                                              | #Level converters             | 0        | 0    | 119   | 568                                                 | 561         |  |
|                                                                                              | avg. output arrival time (ns) | 2.73     | 2.94 | 2.97  | 3.05                                                | 3.04        |  |
|                                                                                              | power (%)                     | 1.0      | 0.97 | 0.95  | 0.82                                                | 0.83        |  |

Table 4.7: Results of setup 5

| $VDD_H=1.8, VDD_L=1.2, V_{TH}=0.5$ , Level converter delay/power cost: 1.0/1.0, Back-roll=10% |                               |          |      |       |                                                     |             |  |
|-----------------------------------------------------------------------------------------------|-------------------------------|----------|------|-------|-----------------------------------------------------|-------------|--|
| Circuit name                                                                                  |                               | original | CVS  | GECVS | $(\text{slack}, \frac{\Delta Power}{\Delta Delay})$ | (slack,fan) |  |
|                                                                                               | #cells                        | 3418     |      |       |                                                     |             |  |
|                                                                                               | critical path delay(ns)       | 3.10     |      |       |                                                     |             |  |
|                                                                                               | #H cells                      | 3418     | 2314 | 1940  | 1115                                                | 1148        |  |
| mult32                                                                                        | #L cells                      | 0        | 1104 | 1478  | 2303                                                | 2270        |  |
|                                                                                               | #Level converters             | 0        | 0    | 542   | 350                                                 | 361         |  |
|                                                                                               | avg. output arrival time (ns) | 2.73     | 3.16 | 3.26  | 3.24                                                | 3.23        |  |
|                                                                                               | power (%)                     | 1.0      | 0.88 | 0.77  | 0.65                                                | 0.67        |  |

Table 4.8: Results of setup 6

| $VDD_{H}=1.8, VDD_{L}=1.2, V_{TH}=0.5$ , Level converter delay/power cost: 1.0/1.0, Back-roll=20% |                               |          |      |       |                                                     |             |
|---------------------------------------------------------------------------------------------------|-------------------------------|----------|------|-------|-----------------------------------------------------|-------------|
| Circuit name                                                                                      |                               | original | CVS  | GECVS | $(\text{slack}, \frac{\Delta Power}{\Delta Delay})$ | (slack,fan) |
| mult32                                                                                            | #cells                        | 3418     |      |       |                                                     |             |
|                                                                                                   | critical path delay(ns)       | 3.10     |      |       |                                                     |             |
|                                                                                                   | #H cells                      | 3418     | 1003 | 1577  | 695                                                 | 778         |
|                                                                                                   | #L cells                      | 0        | 2415 | 1841  | 2723                                                | 2640        |
|                                                                                                   | #Level converters             | 0        | 0    | 406   | 206                                                 | 181         |
|                                                                                                   | avg. output arrival time (ns) | 2.73     | 3.39 | 3.51  | 3.43                                                | 3.56        |
|                                                                                                   | power (%)                     | 1.0      | 0.61 | 0.63  | 0.54                                                | 0.55        |

should have higher priority to $VDD_L$ assignment". Note that we disabled every power component except leakage power and held the transition time constant. GECVS therefore detected a larger $\frac{\Delta Power}{\Delta Delay}$ even though the actual delay remained constant, which explains the unexpected results. Second, in Table 4.5, the (slack,fan) set assigns many more cells to $VDD_L$ than the (slack, $\frac{\Delta Power}{\Delta Delay}$) set (2204 versus 2008), yet its final power ratio is slightly worse (0.65 versus 0.63). The reason is that the (slack,fan) criterion cannot detect differences in power saving between cells.
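As a rough check of this observation, the saving per $VDD_L$ cell can be computed from the mult32 figures in Table 4.5. The sketch below assumes that the "power (%)" row is the ratio to the original power and that level converters cost nothing in this setup, as the table header states. The helper name `saving_per_low_cell` is ours, and the two-decimal rounding in the table limits the precision of the result.

```python
# Values copied from Table 4.5 (mult32, setup 3).
# Assumption: "power" is the ratio to the original design's power.
results = {
    "(slack, dP/dD)": {"low_cells": 2008, "power": 0.63},
    "(slack, fan)": {"low_cells": 2204, "power": 0.65},
}


def saving_per_low_cell(entry):
    """Fraction of the original power saved per cell assigned to VDD_L."""
    return (1.0 - entry["power"]) / entry["low_cells"]


per_cell = {name: saving_per_low_cell(e) for name, e in results.items()}

for name, value in per_cell.items():
    print(f"{name}: {value:.2e} of original power saved per VDD_L cell")
```

Under these assumptions, the (slack, $\frac{\Delta Power}{\Delta Delay}$) set saves more power per $VDD_L$ cell, which is consistent with the explanation that fan-out alone does not distinguish cells by power saving.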

The criterion proposed by GECVS multiplies $\frac{\Delta Power}{\Delta Delay}$ by slack, which blurs the slack information: from the product alone, one cannot tell whether a cell has a large slack or a large $\frac{\Delta Power}{\Delta Delay}$. Slack carries information about the circuit topology, so it can serve as an observer of the circuit's topology and timing behavior. Our target, however, is the greatest power saving rather than the greatest slack utilization. That is why we need two keys: one to observe topology and timing, and the other to locate where the most power can be saved.

In general, if a function $\alpha$ is a good measure of the topology/timing information for the propagation algorithm, while the final target is to obtain the largest change in a function $\beta$, we should use $\alpha$ as the primary key and $d\beta/d\alpha$ as the secondary key. This is why we propose slack as the primary key and $\frac{\Delta Power}{\Delta Delay}$ as the secondary key.
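The difference between the two criteria can be illustrated with a small ordering sketch. It is not the optimizer used in this work: the `Cell` fields, the sample values, and the assumption that larger slack and larger $\frac{\Delta Power}{\Delta Delay}$ are both preferred are ours, chosen only to show how a merged product key can hide which factor is large.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    slack: float          # timing slack in ns (hypothetical values)
    dpower_ddelay: float  # power saved per unit of added delay (hypothetical)


def two_key_order(cells):
    """Primary key: larger slack first. Secondary key: larger dP/dD first."""
    return sorted(cells, key=lambda c: (-c.slack, -c.dpower_ddelay))


def product_order(cells):
    """Single merged key in the style of a slack * dP/dD product."""
    return sorted(cells, key=lambda c: -(c.slack * c.dpower_ddelay))


cells = [
    Cell("a", slack=0.40, dpower_ddelay=1.0),
    Cell("b", slack=0.10, dpower_ddelay=5.0),
    Cell("c", slack=0.40, dpower_ddelay=3.0),
]

two_key_names = [c.name for c in two_key_order(cells)]
product_names = [c.name for c in product_order(cells)]

print("two keys:", two_key_names)
print("product: ", product_names)
```

In this toy data, cell `b` has little slack but a large $\frac{\Delta Power}{\Delta Delay}$, so the product key ranks it ahead of cell `a`, whereas the two-key ordering keeps the slack information intact and uses $\frac{\Delta Power}{\Delta Delay}$ only to separate cells with equal slack.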



### Chapter 5

### **Conclusions and Future Works**

We have improved the Clustered Voltage Scaling technique by assigning a better priority (sensitivity) to each cell. Using a well-defined cost function, we have shown that our priority criterion performs better than the sensitivity defined in GECVS.

Short-circuit power contributes a large portion of total power consumption. To analyze this effect, however, we need more precise timing analysis to evaluate transition time and its sensitivity. Such work requires much more effort, especially if it is to be merged into our algorithms efficiently. Our first item of future work is therefore to upgrade our optimizer so that it can perform static timing analysis (STA) with more practical precision.

Our second item of future work is to solve the power scheme problems efficiently. We hope to integrate CVS with power scheme optimization to obtain better solutions.

### Bibliography

- [1] Anirban Basu, Sheng-Chih Lin, Vineet Wason, Amit Mehrotra, and Kaustav Banerjee. "Simultaneous Optimization of Supply and Threshold Voltages for Low-Power and High-Performance Circuits in the Leakage Dominant Era". In Proceedings IEEE/ACM Design Automation Conference, pages 884–887, 2004.
- [2] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen. "Low power CMOS digital design". *IEEE Journal of Solid-State Circuits*, 27(4):473–484, April 1992.
- [3] Chunhong Chen, Ankur Srivastava, and Majid Sarrafzadeh. "On Gate Level Power Optimization Using Dual-Supply Voltages". *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 9(5):616–629, October 2001.
- [4] Srinivas Devadas and Sharad Malik. "A Survey of Optimization Techniques Targeting Low Power VLSI Circuits". In Proceedings IEEE/ACM Design Automation Conference, pages 242–247, 1995.
- [5] K. Joe Hass and David F. Cox. "Level Shifting Interfaces for Low Voltage Logic". In Proceedings NASA Symposium on VLSI Design, pages 3.1.1–3.1.7, 2000.
- [6] James T. Kao and Anantha P. Chandrakasan. "Dual-Threshold Voltage Techniques for Low-Power Digital Circuits". *IEEE Journal of Solid-State Circuits*, 35(7):1009–1018, July 2000.

- [7] Tanay Karnik, James Tschanz, Yibin Ye, Liqiong Wei, Steven Burns, Venkatesh Govindarajulu, Vivek De, and Shekhar Borkar. "Total Power Optimization By Simultaneous Dual-Vt Allocation and Device Sizing in High Performance Microprocessor". In Proceedings IEEE/ACM Design Automation Conference, pages 486–491, 2002.
- [8] Sarvesh H. Kulkarni, Ashish N. Srivastava, and Dennis Sylvester. "A New Algorithm for Improved VDD Assignment in Low Power Dual VDD Systems". In Proceedings ACM International Symposium on Low Power Electronics and Design, pages 200–205, 2004.
- [9] David E. Lackey, Paul S. Zuchowski, Thomas R. Bednar, Scott W. Gould, Douglas W. Stout, and John M. Cohn. "Managing Power and Performance for System-on-Chip Designs using Voltage Islands". In Proceedings IEEE/ACM International Conference on Computer-Aided Design, pages 195–202, 2002.
- [10] Kimiyoshi Usami and Mark Horowitz. "Clustered Voltage Scaling Technique for Low-Power Design". In Proceedings ACM International Symposium on Low Power Design, pages 3–8, 1995.
- [11] Kimiyoshi Usami, Mutsunori Igarashi, Takashi Ishikawa, Masahiro Kanazawa, Masafumi Takahashi, Mototsugu Hamada, Hideho Arakida, Toshihiro Terazawa, and Tadahiro Kuroda. "Design Methodology of Ultra Low-power MPEG4 Codec Core Exploiting Voltage Scaling Techniques". In Proceedings IEEE/ACM Design Automation Conference, pages 483–488, 1998.
- [12] Kimiyoshi Usami, Mutsunori Igarashi, Fumihiro Minami, Takashi Ishikawa, Masahiro Kanazawa, Makoto Ichida, and Kazutaka Nogami. "Automated Low-Power Technique Exploiting Multiple Supply Voltages Applied to a Media Processor". *IEEE Journal of Solid-State Circuits*, 33(3):463–472, March 1998.