# 國立交通大學

## 電子工程學系 電子研究所

### 碩士論文

基於增強和高效的離散餘弦變換的熱模型的構建及 其應用於熱感知擺放

Enhanced and Efficient DCT Based Thermal Model Construction and Its Application to Thermal-Aware Placement

> 研 究 生: 吳永證 指導教授:陳宏明 教授

中華民國一00年七月

# 基於增強和高效的離散餘弦變換的熱模型的構建及 其應用於熱感知擺放

# Enhanced and Efficient DCT Based Thermal Model Construction and Its Application to Thermal-Aware Placement

| 研 究 生:吳永證 | Student : Yong-Zheng Wu  |
|-----------|--------------------------|
| 指導教授:陳宏明  | Advisor : Hung-Ming Chen |



Submitted to Department of Electronics Engineering and Institute of Electronics College of Electrical and Computer Engineering National Chiao Tung University in partial Fulfillment of the Requirements for the Degree of Master of Science in

**Electronics Engineering** 

July 2011 Hsinchu, Taiwan, Republic of China

中華民國一()年七月

基於增強和高效的離散餘弦變換的熱模型的構建

#### 及其應用於熱感知擺放

學生: 吳永證 指導教授: 陳宏明 教授

國立交通大學 電子工程學系 電子研究所 碩士班

#### 摘 要

在這篇論文中,我們提出了一種快速、準確的熱感知解析式擺置器 (Thermal-Aware Analytical Placer)。熱模型(Thermal model)是以格林函數(Green Function)建構,並以增強的離散餘旋變換(Enhanced-DCT)來產生完整晶片的溫度 曲線(Full-Chip Temperature Profile)。與以往其他的熱感知擺置器相比,我們的熱 模型與力平衡擺置器緊密地結合。我們提出基於二維高斯模型(2D Gaussian model)的熱散播力(Thermal Spreading Force)以及動態熱區域控制,以降低最高晶 片溫度,能優化總半周長(HPWL)和晶片上的溫度分佈。

我們的熱模型已被最新的商業工具驗證,平均偏差在 6.5%內,並加速 240 倍。比起 Capo 與 APlace2,我們的擺置器可以加速 3-4 倍並達到同樣的結果。實 驗是以 ISPD2005 年的測資(Test Bench)測試,基準高達 200 萬邏輯閘的設計。其 結果更進一步以 GSRC 計算器評估總 HPWL,並用商用軟體 ICEPAK 做溫度分 佈。

### Enhanced and Efficient DCT Based Thermal Model Construction and Its Application to Thermal-Aware Placement



Hsinchu, Taiwan 300, R.O.C.

### 2011-07

(This page will be replaced by a pdf file which is transformed from the formal cover page edited with Microsoft Word.)

### Enhanced and Efficient DCT Based Thermal Model Construction and Its Application to Thermal-Aware Placement

Student: Yong-Zheng Wu

Advisor: Prof. Hung-Ming Chen

Department of Electronics Engineering Institute of Electronics National Chiao Tung University

### ABSTRACT (Chinese)

(This page will be replaced by a pdf file which is transformed from the Chinese abstract edited with Microsoft Word.)

### Enhanced and Efficient DCT Based Thermal Model Construction and Its Application to Thermal-Aware Placement

Student: Yong-Zheng Wu

Advisor: Prof. Hung-Ming Chen

Department of Electronics Engineering Institute of Electronics National Chiao Tung University

#### ABSTRACT

In this thesis, we proposed a fast and accurate thermal aware analytical placer. Thermal model is constructed based on Green function with enhanced DCT to generate full chip temperature profile. Unlike other previous thermal aware placers, our thermal model is tightly integrated with a flat force directed placement. A thermal spreading force based on 2D Gaussian model is proposed to reduce maximum on-chip temperature with dynamic hot region size control, optimizing between total half-perimeter wirelength (HPWL) and on-chip temperature distribution.

Our thermal model is verified by the most recent commercial tool and has an average deviation within 6.5% with 240x speed up. Our placer can reach the same quality compared to Capo and APlace2 with 3-4x speed up. Experiments are tested using ISPD 2005 benchmark with up to 2 million gate design. The results are further evaluated using GSRC Bookshelf Evaluator for total HPWL, and using ICEPAK for temperature distribution.

#### ACKNOWLEDGEMENTS

First and foremost, I would like to acknowledge the advice and guidance of my professor Hung-Ming Chen. I also thank my lab mates for all their help and take care of me, especially Ph.D Snieaulu for all his encouragement and suggestions. Special thanks to professor Yu-Min Lee and his Ph.D student Pei-Yu Huang without their knowledge and assistance this thesis would not have been successful.

The major experiment tools for this study was provided by National Chip Implementation Center (CIC), Xin-Zhu, Taiwan. I would like to thank their support and their co-operation, especially Mrs. Jing-Ru Chu, the research assistant of chip stack structure and thermal analysis, without her help and tutor this experimental results could not have been finished.

Finally, I thank to my parents for supporting me throughout all my studies and thank to the National Chiao Tung Unviversity (NCTU) for providing a very good and beautiful environment.

## Contents

| A        | bstra | act (Chinese)                                    | i |
|----------|-------|--------------------------------------------------|---|
| A        | bstra | ict                                              | i |
| A        | cknov | wledgements ii                                   | i |
| Co       | onter | nts ES A                                         | V |
| Li       | st of | Tables v                                         | i |
| Li       | st of | Figures vi                                       | i |
| 1        | Intr  | roduction                                        | L |
|          | 1.1   | Previous Work                                    | 1 |
|          |       | 1.1.1 Placement                                  | 1 |
|          |       | 1.1.2 Thermal Model                              | 2 |
|          |       | 1.1.3 Thermal Aware Placement                    | 3 |
|          | 1.2   | Our Contributions                                | 4 |
| <b>2</b> | The   | ermal Model                                      | 3 |
|          | 2.1   | Analytical Thermal Model Based on Green Function | 3 |

|   |     | 2.1.1 Homogeneous Problem                  | 8  |
|---|-----|--------------------------------------------|----|
|   |     | 2.1.2 Inhomogeneous Problem                | 10 |
|   | 2.2 | Enhancement of DCT                         | 13 |
|   |     | 2.2.1 Even Extension of DFT for DCT        | 15 |
|   |     | 2.2.2 Reordering Input Data                | 16 |
| 3 | App | olication to Thermal Aware Placement       | 18 |
|   | 3.1 | Placement Algorithm                        | 19 |
|   | 3.2 | Considering Thermal Effect in Placement    | 20 |
|   |     | 3.2.1 Obtaining Thermal Anchor             | 21 |
|   |     | 3.2.2 Determine Magnitude of Thermal Force | 23 |
| 4 | Exp | perimental Results                         | 25 |
|   | 4.1 | Thermal Model                              | 25 |
|   | 4.2 | Placement                                  | 26 |
|   | 4.3 | Thermal Aware Placement                    | 27 |
| 5 | Con | nclusions                                  | 31 |

# List of Tables

| 1.1 | Previous work on thermal aware placement         |
|-----|--------------------------------------------------|
| 4.1 | Temperature profile comparison with Icepak       |
| 4.2 | Placement results in ISPD 2005 benchmarks        |
| 4.3 | Various approaches to reduce maximum temperature |

# List of Figures

| 2.1 | Summary overall thermal analysis flow chart                                | 6  |
|-----|----------------------------------------------------------------------------|----|
| 2.2 | Schematic of cross-sectional view of a VLSI chip with packaging $\ . \ .$  | 7  |
| 2.3 | Illustration of the simplified chip model                                  | 8  |
| 2.4 | The illustration of power density on meshing top surface of chip $\ . \ .$ | 11 |
| 2.5 | The illustration of butterfly algorithm for DFT                            | 14 |
| 2.6 | (a) The original input sequence. (b) The even extension of original input  | 15 |
| 2.7 | The sequence of input patterns after reordering. (a) The original          |    |
|     | input sequence. (b) The sequence of $v_n$ and $w_n$                        | 16 |
| 3.1 | Placement of cells to demonstrate interleaving mechanism between           |    |
|     | legalization and CG for adaptec1.                                          | 18 |
| 3.2 | Illustration of the legalization process.                                  | 20 |
| 3.3 | Flow Diagram of our thermal aware placement.                               | 21 |
| 3.4 | Forces exerted on a cell by thermal anchor and legalizing anchor.          |    |
|     | Thermal anchor is placed on a circle and red rectangle is the identified   |    |
|     | hot region.                                                                | 22 |
| 3.5 | An illustration of 2-D Gaussian force spread model with center posi-       |    |
|     | tioned at center of hot region.                                            | 24 |

- 4.2 Temperature Distribution before and after thermal aware placement. 30



# Chapter 1 Introduction

The demand to alleviate temperature variation and increase chip reliability increases as technology continue to advance. Recent studies have shown that chip performance and reliability can deteriorate to a certain magnitude due to temperature variation. Evidence have shown that a 10°C variation in temperature can induce 5% increase in interconnect delay, 25% increase in crosstalk-induced noise and reduce component life time by 50% [7][6].

The situation for on-chip temperature and variation will only increase as technology advances. As node size scales down, components will be placed more compact which entails higher current density per unit area. Shrinking in interconnect dimension will increase interconnect resistance which results in higher power and heat dissipation. Viewed in this light, the effect of chip temperature and variation must be addressed to when considering chip performance and reliability.

### 1.1 Previous Work

#### 1.1.1 Placement

Regarding to current state-of-the-art, placement can be categorized to three categories: partition based, linear force-directed and non-linear force directed. Partitionbased placer such as Capo[21] partition the given placement using Fidducia-Mattheyes in a top-down fashion. Although it does not require any model to approximate total HPWL, it trails behind force-directed and non-linear force directed placer in both quality and run time.

Non-linear forced directed placer such as NTUPlace3[9], mPL6[5] and APlace2[15] uses log-sum-exp model to approximate total HPWL. The log-sum-exp model is a very accurate model to approximate HPWL. However, log-sum-exp model is relatively complex which requires longer run time and is patented which requires license to use. To accelerate run time, non-linear force directed placer generally coarsen the given placement by clustering cells to reduce problem size.

Linear force directed placer such as RQL[24], simPL[16], FastPlace3[25] and KraftWerk2[22] use bound-to-bound, star-net or clique model to approximate total HPWL. The simplicity of linear force directed model implies faster run time but is less accurate compare to log-sum-exp model. However, recent development in [24] and [16] have outperformed all non-linear force directed placer in total HPWL by a notch and is approximately 3-5 times faster mmm

#### 1.1.2Thermal Model

Regarding to thermal model, most recent approaches can be categorized to numerical and analytical approach. Numerical approach generally uses finite-difference method(FDM) [12] or finite-element method(FEM) [23] by meshing given silicon substrate and then solve a set of linear equation to obtain temperature profile. FDM discretizes the partial differential equation of heat conduction and uses forwarddifference method to approximate temperature profile. FEM discretizes the given temperature for each grid point in design space and then points elsewhere is calculated using interpolation. Generally, numerical approach can achieve high accuracy at the expense of relatively long run time making it suitable for post-layout thermal verification.

Analytical approach solves the differential equation of heat problem by first generating the fundamental solution of one unit heat source and then exploit the linearity of the heat equation to obtain the general solution of overall temperature distribution [27][26][13]. In contrast to numerical approach, analytical approach can quickly obtain an approximate solution in closed form representation without an amount volume of meshing making it suitable for where approximate solution is adequate.

After two classified approaches, there is two states of analysis to be concerned, transient-state and steady-state. The transient-state is concerned about temperature in time-varying but the steady-state is interested in the stabilized of temperature in long term state. In this paper, we proposed to consider the steady-state of temperature.

### 1.1.3 Thermal Aware Placement

To implement a thermal aware placement, two main components are usually required: a thermal model to conduct full chip thermal analysis and a placement mechanism to consider the optimization between thermal effect and wirelength.

Regarding to previous works on thermal aware placement, Table 1.1 summarizes previous work on thermal aware placer based on their thermal model and placement algorithm. Tsai et al.[23] constructs lumped RC matrix to model substrate heat conduction and obtain thermal profile using FDM, then simulated annealing is applied evenly spread out hot spot. Kahng et al. [14] adopts its thermal model from [23] and integrate the model to its previously published placer, APlace [15]. Chen et al.[8] simplifies the model in [23] and applies partition based placement to consider thermal effect. Goplen et al.[11] uses FEM method to conduct full chip temperature analysis and uses linear force directed placer based on star net model to mediate thermal effect. Jing et al.[17] constructs RC equivalent matrix to model heat transfer and obtain thermal profile using FDM, Fiduccia-Mattheyses partition is then applied as their placement strategy. Bernd et al.[19] proposes a methodology to consider the impact of dynamic power density by adding additional temperature cost obtained from FDM into the iterative annealing optimization.

| Thermal Model Placement   | FEM  | FDM      |
|---------------------------|------|----------|
| Simulated Annealing       |      | [23]     |
| Partition-Based           | [11] | [8],[17] |
| Linear Force Directed     |      | [19]     |
| Non-Linear Force Directed |      | [14]     |

Table 1.1: Previous work on thermal aware placement

# 1.2 Our Contributions

In spite of past effort, thermal aware placement today still suffers deficiency in terms of (i). accuracy of full-chip analysis, (ii). quality of placement and (iii). fast execution time. The accuracy of some thermal models are unknown since it lacks evaluation with accurate model. The quality of algorithms for placement is deficient either because it greedily focuses on thermal distribution or its placement methodology is relatively naive. Some methods are impractical to deal with million gate design due to execution speed when using simulated annealing or solving complex thermal model requires long execution time. In addition, none of our surveyed thermal-aware placers has tested on million gate design.

In this thesis, we presented a thermal aware placement to reduce maximum temperature using analytical thermal model combined with linear force-directed placer. Both of our thermal model and placer are analytical making it inherently fast. The run time bottleneck of Green function based thermal analysis is in its postprocess DCT calculation and we reduced the complexity to O(NlogN) by applying even extension and input reordering algorithm. Our thermal model is verified with commercial tool ANSYS Icepak and demonstrated that the deviation of our thermal model is within 6.5% and is 242x times faster.

The thermal model is integrated with a flat (no clustering is required) global placement based on linear-force model which is 3-4x faster compare to non-linear and partitioned based placer with reasonable placement quality. We experimented our thermal-aware placement on a set of open source benchmarks released by IBM and evaluated our the thermal profile using Icepak[1].

We proposed the concept of thermal anchor combined with dynamic resizing scanning of hot region to optimize the trade-off between maximum temperature and total HPWL. The experiments have performed on three kinds of weight adjustment which are linear force model, quadratic force model and quadratic Gaussian force model to demonstrate the effectiveness of our thermal aware placement.

The following chapters are organized as follows: Chapter 2 describes the analytical thermal model and Chapter 3 depicts the application of thermal for placement. Chapter 4 presents the experimental results and finally Chapter 5 concludes the thesis.

### Chapter 2

### Thermal Model

## 2.1 Analytical Thermal Model Based on Green Function

In early stage of physical design, the primary concern of thermal analysis focuses more on its speed rather than accuracy. Therefore, we adopted our thermal model using Green function to quickly perform thermal analysis. The concept of the Green function is to first derive the fundamental solution then use linear superposition to obtain the general solution for overall chip temperature distribution.



(a) Thermal analysis flow

Figure 2.1: Summary overall thermal analysis flow chart

As shown in Figure 2.1, the general structure of this chapter is organized as

follows, first the fundamental bases solution which satisfies boundary conditions is generated from homogeneous equation. Then, the general solution can be obtained by applying generated bases solution to Green function to express power density and temperature distribution. Finally, the solution to the heat conduction based on many power sources is in a form of DCT and IDCT. Thus, the input reordering is adopted to enhance DCT and IDCT execution time. As shown in Figure 2.1,

Given the law of conservation of energy, it states that the total amount of energy in a system remains constant over time. That means the changing rate of heat energy equals to the summation of conduction heat flowing into the unit volume and its pre-existing power density. Hence, the temperature distribution inside the chip is defined as

$$\sigma \frac{\partial T(r,t)}{\partial t} = \nabla \cdot (\kappa \nabla T(r,t)) + p(r,t) \quad r \in D$$
(2.1)

which is also known as heat diffusion equation. T denotes temperature in Celsius (°C), the domain  $r = (x, y, z) \kappa$  denotes thermal conductivity  $(W/m^{\circ}C)$  and  $\sigma = c_p \rho$  has unit of  $(J/m^{3\circ}C)$ .  $c_p$  is the specific heat,  $\rho$  is the heat capacity, p(r, t) is the power density of heat source  $(W/m^3)$  and  $D = (0, L_x) * (0, L_y) * (-L_z, 0)$  are the dimensions of die.



Figure 2.2: Schematic of cross-sectional view of a VLSI chip with packaging

Figure 2.2 depicts a flip-chip model in which top and bottom surfaces of die are contacted by printed circuit board (PCB) and heat sink.  $h_p$  denotes primary heat flow to the heat sink and  $h_s$  denotes secondary heat flow to the PCB. The model is assumed that four chip sidewalls are insulated from the ambient environment illustrated in Figure 2.3. The boundary conditions of such assumption are derived as

$$\frac{\partial T(r,t)}{\partial x}\Big|_{x=0,L_x} = \frac{\partial T(r,t)}{\partial y}\Big|_{y=0,L_y} = 0$$
(2.2)

$$\kappa \frac{\partial T(r,t)}{\partial z} \mid_{z=-L_z} = h_p T(x, y, -L_z, t)$$
(2.3)

$$\kappa \frac{\partial T(r,t)}{\partial z} \mid_{z=0} = -h_s T(x,y,0,t)$$
(2.4)

Since steady-state temperature analysis is the primary concern in this thesis, temperature and power density are assumed to be stable. The temperature distribution equation (2.1) can be written as a form of inhomogeneous equation.



To solve the inhomogeneous equation and satisfy the boundary conditions, the solution bases of homogeneous equation needs to be solved to satisfy all boundary conditions. Finally, the general solution of temperature distribution in R domain can then be derived by substituting the bases solution from homogeneous equation into inhomogeneous equation (2.5).

#### 2.1.1 Homogeneous Problem

In this subsection, the objective is to find the solution which satisfies the homogeneous equation and inhomogeneous conditions in top and bottom boundary. The homogeneous equation is first defined by setting the external force to zero which correspond to the power density in (2.1). That is,

$$\sigma \frac{\partial T(r,t)}{\partial t} = \nabla \cdot (\kappa \nabla T(r,t))$$
(2.6)

Given that  $\Theta(r, t)$  is the solution to homogeneous equation, it is then substituted into homogeneous equation (2.6).

$$\frac{\partial \Theta(r,t)}{\partial t} = \gamma \frac{\partial^2 \Theta}{\partial r^2} \tag{2.7}$$

which  $\gamma$  is a constant equal to  $\kappa/\sigma$ 

 $\Theta(r,t)$  is a function of space domain R and time domain t in which space domain and time domain are independent to each other. Hence,  $\Theta(r,t)$  can be separated to two functions of variable r and t respectively.

$$\Theta(r,t) = \tau(t)\upsilon(r) \tag{2.8}$$

(2.8) substitutes to (2.7) then gather time-dependent terms on one side and space-dependent terms on the other side.

$$\frac{\tau' - \gamma \underline{v''}}{1896}$$
(2.9)

The left hand side is a function of time t and the right hand side is a function of space R. Thus, both sides of (2.9) must be equal to the same constant. By denoting the constant as  $-\alpha$  and let  $\alpha = \gamma \omega^2$  with  $\omega > 0$ .

$$\tau' + \alpha \tau = 0 \tag{2.10}$$

$$\upsilon'' + \omega^2 \upsilon = 0 \tag{2.11}$$

The fundamental solution of  $\tau(t)$  and v(r) that satisfies the homogeneous boundary conditions can then be obtained by using complex variable method.

$$\tau(t) = e^{-\alpha t} \tag{2.12}$$

$$v(r) = \cos(\omega_x x)\cos(\omega_y y)v(z) \tag{2.13}$$

For v(z), a combination of trigonometric function is applied to satisfy the inhomogeneous of top and bottom conditions. That is,

$$\upsilon(z) = \kappa \cos(\omega_z(z+L_z)) + \frac{h_p}{\omega_z} \sin(\omega_z(z+L_z))$$
(2.14)

where  $\omega_z$  satisfies

$$\frac{\kappa^2 \omega_z^2 - h_p h_s}{\kappa \omega_z (h_p + h_s)} = \cot(\omega_z L_z)$$
(2.15)

Since cosine is periodic,  $\omega L$  must be an integer multiple of  $\pi$ . The general solution of homogeneous heat equation at infinite time is expressed as the summation of numerous solution bases v(r).

$$\Theta(x, y, z, \infty) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \cos(\frac{i\pi}{L_x}x)\cos(\frac{j\pi}{L_y}y)\upsilon(z)$$
(2.16)

### 2.1.2 Inhomogeneous Problem

To solve the inhomogeneous equation subject to specific initial and boundary 1896 conditions, Green function denoted by G(r, r') is applied. Let G be the temperature distribution of a unit point power source denoted by  $\delta(r, r')$ . The complete equation of G is

$$\nabla^2 G(r, r') = -\frac{\delta(r, r')}{\kappa} \tag{2.17}$$

and the boundary conditions is

$$\frac{\partial G(r,r')}{\partial x}|_{x=0,a} = \frac{\partial G(r,r')}{\partial y}|_{y=0,b}$$
(2.18)

$$\kappa \frac{\partial G(r,r')}{\partial z}|_{z=-L_z} = h_p G(x,y,-L_z)$$
(2.19)

$$\kappa \frac{\partial G(r,r')}{\partial z}|_{z=0} = -h_s G(x,y,0) \tag{2.20}$$

where

$$\delta(r, r') = \begin{cases} 1 & r = r' \\ 0 & r \neq r' \end{cases}$$
(2.21)

The bases solution (2.16) obtained by solving homogeneous equation is a set of orthogonal bases which satisfies the characteristics of Green function with  $N_x, N_y$ and  $N_z$  truncation point. In addition, the integration of two orthogonal bases is equal to  $\delta$ . Therefore, (2.16) can be substituted to G(r, r').

$$G(r,r') = \frac{1}{\kappa\omega^2} \sum_{i=0}^{N_x-1} \sum_{j=0}^{N_y-1} \sum_{k=0}^{N_z-1} \upsilon_{ij}(x',y')\upsilon_k(z')$$
(2.22)

The general solution of T(r) is obtained by using superposition of integrals of power density distribution P(r) with G(r, r').

$$T(r,\infty) = \int_D G(r,r')P(r')dr'$$
(2.23)

Since power density is a function only of x and y direction, it may be also written in the form of superposition of orthogonal bases of x and y direction.

$$P(r) \approx P_{ij} = \int_{0}^{L_x} \int_{0}^{L_y} p(x', y') v_{ij}(x', y') dx' dy' \qquad (2.24)$$

Figure 2.4: The illustration of power density on meshing top surface of chip

In Figure 2.4, the overall chip power distribution is approximated by dividing chip into M \* N grids with power density  $p_{mn}$  in each grid. The power density in each bin is the summation of power density of contained cells. For cells that are placed on the boundary between two bins, power density of the cell is divided based on cell area covered to each bin. Following equation describes the approximated power density  $P_{ij}$  for each bin.

$$P_{ij} = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} p_{mn} \int_{mL_x/M}^{(m+1)\frac{L_x}{M}} \int_{nL_y/N}^{(n+1)\frac{L_y}{N}} v_{ij}(x',y') dx' dy'$$
  
$$= \frac{4L_x L_y}{ij\pi^2} sin(\frac{i\pi}{2L_x}) sin(\frac{j\pi}{2L_y})$$
  
$$* \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} p_{mn} cos(\frac{i\pi(2m+1)}{2M}) cos(\frac{j\pi(2n+1)}{2N})$$
  
(2.25)

The second term on the right hand side of (2.25) is a form of type-I DCT, thus, the formula can be simplified as

$$P_{ij} = C_{ij} * \hat{P}_{ij} \tag{2.26}$$

where

$$C_{ij} = \frac{4L_x L_y}{ij\pi^2} sin(\frac{i\pi}{2L_x}) sin(\frac{j\pi}{2L_y})$$
(2.27)

$$\hat{P}_{ij} = DCT[p_{mn}] \, \emptyset \tag{2.28}$$

Similar to power density distribution, temperature distribution can also be approximated by dividing into numerous small bins. To describe temperature distribution  $T_{mn}$  in each grid, (2.23) is substituted by (2.25).

$$T(r,\infty) \approx T(m,n)$$
  
=  $\int_{0}^{N_{r}} \frac{1}{\kappa\omega^{2}} \sum_{i=0}^{N_{x}-1} \sum_{j=0}^{N_{y}-1} \sum_{k=0}^{N_{z}-1} \upsilon_{ij}(x',y')\upsilon_{k}(z')P_{ij}dr'$  (2.29)

Due to the bases in z direction described in (2.29) is independent to x and y directions of  $P_{ij}$  and  $v_{ij}$ , the temperature distribution equation (2.29) can then be

derived as

$$T(m,n) = K_z \sum_{i=0}^{N_x-1} \sum_{j=0}^{N_y-1} C_{ij} \hat{P}_{ij} \int_0^{N_x} \int_0^{N_y} v_{ij}(x',y') dx' dy'$$
  
$$= K_z \sum_{i=0}^{N_x-1} \sum_{j=0}^{N_y-1} C_{ij} D_{ij} \hat{P}_{ij} \cos\left(\frac{i\pi(2m+1)}{2M}\right) \cos\left(\frac{j\pi(2n+1)}{2N}\right)$$
  
$$= K_z \sum_{i=0}^{N_x-1} \sum_{j=0}^{N_y-1} E_{ij} \hat{P}_{ij} \cos\left(\frac{i\pi(2m+1)}{2M}\right) \cos\left(\frac{j\pi(2n+1)}{2N}\right)$$
  
$$= K_z \cdot IDCT[E_{ij} \hat{P}_{ij}]$$
  
(2.30)

where

$$K_{z} = \frac{1}{\kappa\omega^{2}} \int_{0}^{N_{z}} \upsilon_{k}(z') dz'$$
(2.31)

$$E_{ij} = C_{ij} * D_{ij} \tag{2.32}$$

$$D_{ij} = \frac{4N_x N_y}{ij\pi^2} \sin(\frac{i\pi}{2N_x}) \sin(\frac{j\pi}{2N_y})$$
(2.33)

When i = 0 and j = 0, (2.32) cannot be evaluated. l'Hopital's rule is then applied to approximate the value of  $E_{ij}$  when  $i \approx 0$  and  $j \approx 0$ .  $E_{ij}$  can then be written as

$$E_{ij} = \begin{cases} \frac{\Delta x \Delta y}{\frac{4NL_y \Delta x \sin^2(\frac{j\pi}{2N})}{j^2 \pi^2}} & i = 0, j = 0\\ \frac{4ML_x \Delta y \sin^2(\frac{i\pi}{2M})}{i^2 \pi^2} & i = 0, j \neq 0\\ \frac{4ML_x \Delta y \sin^2(\frac{i\pi}{2M})}{i^2 \pi^2} & i \neq 0, j = 0\\ \frac{16MNL_x L_y \sin^2(\frac{i\pi}{2M}) \sin^2(\frac{j\pi}{2N})}{i^2 j^2 \pi^4} & i \neq 0, j \neq 0 \end{cases}$$
(2.34)

(2.30) is also a form of type-I inverse DCT (IDCT) of  $E_{ij} * \hat{P}_{ij}$ . Solving DCT and IDCT directly requires complexity of  $O(N^4)$ . However, O(NlogN) can be achieved by applying input reordering. The runtime can be further enhanced by calculating  $K_z$  and  $E_{ij}$  beforehand since both terms are independent of the geometry of power density distribution.

### 2.2 Enhancement of DCT

DFT can achieve complexity of O(NlogN) by applying butterfly algorithm, it can also be applied in DCT by operating on real data with even symmetry. Figure 2.5 is a schematic of butterfly algorithm used in DFT.



Figure 2.5: The illustration of butterfly algorithm for DFT

Given N real number of data input x(n),  $0 \le n \le N - 1$ . DCT and DFT transform can described as follows in which  $C_k$  is denoted as DCT and  $F_k$  is denoted as DFT.

$$C_k = DCT[x_n] = \sum_{n=0}^{N-1} x_n \cos\frac{\pi(2n+1)k}{18 \cdot 2N}, 0 \le k \le N-1$$
(2.35)

$$F_k = DFT[x_n] = \sum_{n=0}^{N-1} x_n \cdot e^{-j\frac{2\pi nk}{N}}, \quad 0 \le k \le N-1$$
(2.36)

Note that,  $e^{jn}$  is a Euler's formula in which j is an imaginary number.

$$e^{jn} = \cos(n) + j\sin(n) \tag{2.37}$$

By defining  $W_N = e^{-j2\pi/N}$  as twiddle factor, DFT can be expressed as

$$F_k = \sum_{n=0}^{N-1} x_n W_N^{nk}$$
(2.38)

In (2.35) and (2.36), 4N inputs are required in DFT to generate N outputs for DCT with complexity of O(4Nlog4N). Even extension method[4] can be implemented to reduce complexity to O(2Nlog2N). Further improvement to reduce complexity to O(NlogN) can be achieved by reordering input patterns [18].

### 2.2.1 Even Extension of DFT for DCT

The concept of even extension is to make the mirror input of DFT to eliminate the imaginary number generated by DFT itself.



Figure 2.6: (a) The original input sequence. (b) The even extension of original input.

 $y_n$  is a series of 2N points that is the even extension of  $x_n$ 

$$y_n = \begin{cases} x_n & 0 \le n \le N - 1\\ x_{2N-n-1} & N \le n \le 2N - 1 \end{cases}$$
(2.39)

By substituting (2.39) into (2.38), 2N output points  $F_k$  from DFT can be obtained.

$$F_k = \sum_{n=0}^{N-1} x_n W_{2N}^{nk} + \sum_{n=N}^{2N-1} x_{2N-n-1} W_{2N}^{nk}$$
(2.40)

where  $0 \le k \le 2N - 1$ 

By changing the summation variable in the second term of the right hand side and factoring out the term  $W_{2N}^{-k/2}$ , equation can be derived as 2.41. Noting that  $W_{2N}^{2kN} = 1$  for integer k and  $W_{2N}^{2nk} = W_N^{nk}$ .

$$F_k = W_{2N}^{-k/2} \sum_{n=0}^{N-1} x_n [W_{2N}^{nk} W_{2N}^{k/2} + W_{2N}^{-nk} W_{2N}^{-k/2}]$$
(2.41)

which is,

$$F_k = W_{2N}^{-k/2} \cdot 2Re[W_{2N}^{k/2} \sum_{n=0}^{N-1} x_n W_{2N}^{nk}]$$
(2.42)

or

$$F_k = W_{2N}^{-k/2} \cdot 2\sum_{n=0}^{N-1} x_n \cos\frac{\pi(2n+1)k}{2N}$$
(2.43)

By substituting C(k) in (2.35) to (2.43), the relation between DFT and DCT can be found as

$$C_k = \frac{1}{2} W_{2N}^{k/2} F_k \quad , \quad 0 \le k \le 2N - 1 \tag{2.44}$$

#### 2.2.2 Reordering Input Data

Regarding to (2.42), applying even extension to input data still have complexity of O(2Nlog2N). Further improvement can be achieved by applying input reordering which will be explained in this subsection. The concept of input reordering is to avoid extending number of input data to 2N points. Thus, instead of mirroring real number, elimination of the imaginary number can also be achieved by reordering input.



Figure 2.7: The sequence of input patterns after reordering. (a) The original input sequence. (b) The sequence of  $v_n$  and  $w_n$ .

The reordered sequence  $v_n$  and  $w_n$  can be obtained by retrieving even and odd element from  $y_n$  which also corresponds to the original input sequence  $x_n$ .

$$v_n = \begin{cases} x_{2n} & 0 \le n \le \frac{N-1}{2} \\ x_{2N-2n-1} & \frac{N+1}{2} \le n \le N-1 \end{cases}$$
(2.45)

where  $w_n$  is a reverse of  $v_n$ . v(n) and w(n) are then substituted into DFT

$$F_k = \sum_{n=0}^{N-1} v_n W_{2N}^{2nk} + \sum_{n=0}^{N-1} w_n W_{2N}^{(2n+1)k}$$
(2.46)

$$F_k = \sum_{n=0}^{N-1} v_n W_{2N}^{2nk} + \sum_{n=0}^{N-1} v_{N-n-1} W_{2N}^{2nk} W_{2N}^k$$
(2.47)

With few manipulations of summation variable and utilizing the relation of  $C_k$  and  $F_k$ , equation (2.49) can then be obtained.

$$C_k = W_{2N}^{k/2} \left[\sum_{n=0}^{N-1} v_n W_N^{nk} + \sum_{n=0}^{N-1} v_n W_N^{-nk} W_{2N}^{-k}\right]$$
(2.48)

which is,

$$C_k = \sum_{n=0}^{N-1} v_n W_N^{nk} W_{4N}^k + \sum_{n=0}^{N-1} v_n W_N^{-nk} W_{4N}^{-k}$$
(2.49)

Then, (2.49) also be written as

$$C_k = 2Re[W_{4N}^k \sum_{n=0}^{N-1} v_n W_N^{nk}]$$
(2.50)

where  $0 \le k \le N - 1$ 

DCT can then be solved by re-ordering  $C_k$  back to original input patterns using N points in(2.50) with complexity of O(NlogN). IDCT can be solved using exact same approach as forward DCT.

### Chapter 3

## Application to Thermal Aware Placement



Figure 3.1: Placement of cells to demonstrate interleaving mechanism between legalization and CG for adaptec1.

In this chapter, the application of thermal analysis for placement is presented and the general flow is depicted in Figure 3.3. The idea of force directed placement is to model each net as a spring system. A force exists between two cells and pulls two cells toward each other if two cells are connected. To construct the force model for the given placement, each multi-pin net is converted to several two-pin nets and force is calculated for each two-pin net. The force between a two-pin net is defined as follows.

$$F_{ij} = W_{ij}[(x_i - x_j)^2 + (y_i - y_j)^2]$$
(3.1)

where  $x_i, y_i$  and  $x_j, y_j$  are coordinate of cell *i* and cell *j*. The summation of all the forces for every single two-pin net is

$$\phi(x,y) = \sum_{i,j} F_{ij} = \frac{1}{2} x^T Q x + d_x^T x + \frac{1}{2} y^T Q y + d_y^T y$$
(3.2)

(3.3)

The summation of force in x and y coordinate can be solved separately and force equilibrium can be obtained by solving  $Qx = -d_x$  and  $Qy = -d_y$ . 1896  $W_{ij} = \frac{2}{(n-1)|x_i - x_j|}$ 

After matrix is built, we applied Jacobi preconditioned Conjugate Gradient method and used Polak-Ribiere as line search direction parameter to solve the problem.

#### 3.1**Placement Algorithm**

The placement strategy implemented in this thesis is based on simPL[16]. Similar to simPL, two placement algorithms are implemented. One greedily minimizes the total wirelength using Precondition Conjugate Gradient Method(CG) and the other serves as a legalizer that quickly generate a rough legalized placement. CG will try to pull every cell together to minimize wirelength and legalizer will try to pull cell to free area to remove overlap. The result generated by legalizer will act as an anchor pulling cells toward free area. The two algorithms interleave with each other, solution generated by one algorithm will serve as input to another. The process will stop when solution generated by two algorithms are converge.



Figure 3.2: Illustration of the legalization process.

### 3.2 Considering Thermal Effect in Placement

To achieve a flat thermal profile, high power cells should be evenly spread across the chip. However, greedily spreading out cells is impractical since it dramatically increases the total HPWL which will cause resulting placement unroutable. In this thesis, we addressed thermal effect by minimizing maximum on-chip temperature. Experimental results show that a smoother temperature profile can be obtained by reducing maximum on chip

The placement implemented in this thesis is a flat placer which requires no clustering of cells. A flat placer is ideal to consider thermal effect since each cell can be treated individually. To consider thermal effect, additional anchor force can be added to pull cells away from high temperature region. However, additional anchor force will increase total HPWL. Thus, the trade off between reducing maximum temperature and increase total HPWL should be carefully dealt with.



#### 3.2.1 Obtaining Thermal Anchor

After hot region is identified, an imaginary circle with radius R using center from the obtained hot region will be formed. Anchors will be created on the perimeter of circle pulling cells away from the hot region. Each cell will be pulled by an anchor that is closest in distance.

Without considering thermal effect, a cell will only suffer forces exerted from every other node it connects to and the force exerted from the pseudo anchor produced during legalization stage. Considering thermal effect, additional thermal force will be exerted on the cells within hot region. Here, we defined the anchor that serves to reduce maximum temperature as **thermal anchor** and anchor that serves to remove overlap as **legalizing anchor**. Figure 3.4 illustrates the concept of thermal anchor and legalizing anchor.



Figure 3.4: Forces exerted on a cell by thermal anchor and legalizing anchor. Thermal anchor is placed on a circle and red rectangle is the identified hot region.

To obtain the thermal anchor position for each cell, the vector to direct cell away from center is calculated.  $(x_i, y_i)$  are the coordinate of cell *i* and  $(x_o, y_o)$  are the coordinate of center point of hot region. 96

$$\vec{x} = x_i - x_o, \, \vec{y} = y_i - y_o \tag{3.4}$$

Magnitude of the vector is calculated to position thermal anchor on the perimeter of the circle. R is the radius of the circle and  $(x_t, y_t)$  are the coordinates of the thermal anchor.

$$mag = \sqrt[2]{\frac{R}{\vec{x}^2 + \vec{y}^2}}$$
(3.5)

$$x_t = x_i + mag * x_i \tag{3.6}$$

$$y_t = y_i + mag * y_i \tag{3.7}$$

The radius of circle and force exerted from each anchor determines the magnitude of perturbation of cells and how far it will move away from its original position. To reduce maximum temperature, the larger the circle is, the farther away cell is pulled away from the center.

### 3.2.2 Determine Magnitude of Thermal Force

As iteration number increases, the general structure of placement begins to form as cells becomes less congested. The force exerted by thermal anchor begin to increase such that high power density cells can be spread out more evenly. However, as structure of placement begins to form, perturbing too many cells to reduce maximum temperature will deviate the general structure of placement. To avoid perturbing too many cell and deviate cell too much from its original position, the size of hot region and radius of circle will gradually decrease in size as iteration number increase. We initially set the size of hot region to 13 bins in width and height and decrease by average of 2.5% in each iteration.

In early iterations, cells are congested in center. Anchors generated by legalizer which pull cells from congested region towards white space also serves as a force to pull cells away from hot region. Thus, we let the overlap removal force dominate thermal driven force in early stage of iteration. To adjust the proportion of force between legalizing anchor and thermal anchor, the weight of the thermal anchor  $W_{thermal}$  in iteration n is set proportional to  $n^2/N$  and weight of legalizing anchor  $W_{legal}$  is set proportional to n. Note that, N is total iteration number. Thermal anchors have negligible effect on cells in early iterations and its impact will quadratically increase as iteration number increases.

To minimize perturbation of cell and to prevent sharp increase in temperature at the perimeter of hot region, a 2D Gaussian model shown in Figure 3.5 is used to determine the magnitude of the force exerted for each thermal anchor to connecting



Figure 3.5: An illustration of 2-D Gaussian force spread model with center positioned at center of hot region.

cells. The 2-D Gaussian has the form

$$g(x,y) = Ae^{-(\frac{(x_i - x_o)^2}{2\sigma_x^2} + \frac{(y_i - y_o)^2}{2\sigma_y^2})}$$
(3.8)

 $x_o$  and  $y_o$  are the center point of the hot region.  $\sigma_x$  and  $\sigma_y$  are standard deviation in  $x_i$  and  $y_i$  direction of the cells contained within hot region. Thus, the force exerted from each thermal anchor  $F_{therm}$  at iteration n can be derived as follow

$$F_{thermX} = g(x, y) \frac{n^2}{N} \frac{1}{|x_t - xi|}$$
(3.9)

$$F_{thermY} = g(x, y) \frac{n^2}{N} \frac{1}{|y_t - yi|}$$
(3.10)

where,  $(x_t, y_t)$  is the coordinate of thermal anchor and  $(x_i, y_i)$  is the coordinate of cell *i*.

# Chapter 4 Experimental Results

In this chapter, experimental results are presented in three sections. First section presents the accuracy of the thermal model comparing to Icepak. The second presents the results of our placer comparing with partition-based placer and nonlinear force directed placer. In third section, temperature distribution and total HPWL with and without considering thermal effect is presented.

The entire thermal aware placer is written in standard C++ language and compiled using g++ 4.1.2. The experiment is conducted using Intel Xeon E5530 Quad Core machine operating at 2.4GHz. ISPD 2005 placement benchmark [2] is used for input benchmark. Final legalization and detail placement is delegated to an external binary FastDP [20]. An open-source C-code based FFT solver [10] is used in computation to generate temperature profile. All of the testcases are performed using same parameter without tuning for a specific testcase. The total HPWL is evaluated using open source GSRC bookshelf evaluator [3] and every temperature profile is evaluated with Icepak after legalization.

### 4.1 Thermal Model

To examine the accuracy of our thermal model, we reference the parameters from [12] and apply in our thermal model. Chip width and height are given in each testcases and thickness of chip is set to 0.5 mm. Thermal conductivity  $\kappa$  of silicon is set to 148  $W/(m \cdot C)$ , heat transfer rate  $h_p$  and  $h_s$  is set to 8700  $W/(m^2 \cdot C)$  and 2017  $W/(m^2 \cdot C)$  respectively. Power density of each cell is a random value ranging from 0 to 2\*10<sup>6</sup>  $W/m^2$ . Ambient temperature is set to 22.02°C and the grid number  $N_x$  and  $N_y$  used in thermal model is set to 128.

Figure 4.1 is the comparison of temperature profile obtained by running our simulation model with ANSYS Icepak. The experimental results of max temperature and average temperature running in our thermal simulation and Icepak are presented as shown in Table 4.1. On average, our thermal simulation achieves an accuracy within 6.5% deviation compare to Icepak with 240 times speedup. Note that, the deviation of temperature distribution for each testcase is calculated by taking the average of the temperature difference for each grid between temperature calculated in our thermal simulation and temperature obtained from Icepak.

|          |       |       | $\mathbf{Wit}$ | hout ' | Thern | nal     |           | With Thermal |         |        |         |           |       | Max Temp. |  |
|----------|-------|-------|----------------|--------|-------|---------|-----------|--------------|---------|--------|---------|-----------|-------|-----------|--|
|          | Our   |       |                | Icepak |       |         |           |              | Dur     | Icepak |         |           | Red   | . (%)     |  |
| #        | Max.  | Avg.  | Time(s)        | Max.   | Avg.  | Time(s) | Deviation | Max.         | Time(s) | Max.   | Time(s) | Deviation | Our   | IcePak    |  |
| adaptec1 | 64.64 | 46.82 | 0.74           | 63.78  | 50.2  | 219.71  | 8.58      | 60.21        | 0.51    | 61.62  | 247.54  | 9.22      | 5.85  | 3.39      |  |
| adaptec2 | 52.67 | 35.05 | 0.64           | 48.02  | 35.71 | 235.02  | 5.75      | 50.45        | 0.43    | 48.38  | 235.82  | 5.37      | -0.02 | -0.75     |  |
| adaptec3 | 42.56 | 31.96 | 0.68           | 45.20  | 32.83 | 236.67  | 3.43      | 44.01        | 0.67    | 42.22  | 240.66  | 3.34      | 6.10  | 6.60      |  |
| adaptec4 | 47.57 | 32.73 | 0.71           | 46.82  | 32.65 | 242.68  | 3.13      | 45.18        | 0.71    | 45.86  | 250.83  | 3.13      | 2.98  | 2.05      |  |
| bigblue1 | 74.44 | 50.12 | 0.51           | 73.41  | 55.08 | 225.46  | 10.71     | 70.77        | 0.52    | 71.36  | 234.69  | 10.53     | 3.77  | 2.79      |  |
| bigblue2 | 55.07 | 39.82 | 0.75           | 53.56  | 40.82 | 209.26  | 6.11      | 51.03        | 0.75    | 53.25  | 247.39  | 6.36      | 2.03  | 0.58      |  |
| bigblue3 | 72.49 | 36.17 | 1.24           | 71.52  | 36.48 | 218.39  | 5.62      | 65.46        | 1.35    | 64.61  | 231.09  | 5.55      | 11.29 | 9.66      |  |
| bigblue4 | 75.35 | 43.02 | 2.19           | 77.67  | 45.06 | 222.37  | 6.76      | 72.37        | 2.42    | 75.49  | 234.55  | 8.41      | 3.95  | 2.81      |  |
| Avg.     | -     | -     | 0.93           | -      | -     | 226.20  | 6.26      | -            | 0.92    | -      | 240.32  | 6.49      | 4.50  | 3.39      |  |

Table 4.1: Temperature profile comparison with Icepak

### 4.2 Placement

In this section, placement results are compared with Capo [21], APlace2 [15], mPL6 [5] and simPL [16]. Capo is a partition based placer, APlace2 is a nonlinear force-directed placer, mPL6 is currently strongest non-linear force directed placer and simPL is currently one of the strongest placer in ISPD 2005 benchmark. Our placer is on par in terms of total HPWL compare to APlace2 and outperform Capo10.5 by a nose with 3-4 times speed up. Comparing with state-of-the-art placer, our placer trail behind by 10% and is 4 times slower compare to simPL and 2 times faster compare to mPL6. However, although total HPWL trails behind state-of-theart placers, the results presented in this section demonstrates that our placer can achieve reasonable placement quality within 10% comparing to the best placers with relatively fast execution speed comparing to participated and non-linear force directed placer as be shown in Table 4.2.

| benchmarks Cel | ll # | IIDWI  |         |         | Capo 10.5 |        | APlace 2.0 |        |         | mPL6   |         |
|----------------|------|--------|---------|---------|-----------|--------|------------|--------|---------|--------|---------|
|                |      | TL M L | Runtime | HPWL    | Runtime   | HPWL   | Runtime    | HPWL   | Runtime | HPWL   | Runtime |
| adaptec1 211   | 1K   | 87.69  | 7.82    | 88.14   | 25.95     | 78.35  | 35.02      | 77.73  | 2.27    | 77.93  | 18.36   |
| adaptec2 255   | 5K   | 98.22  | 12.33   | 100.25  | 36.06     | 95.70  | 50.57      | 90.36  | 3.48    | 92.04  | 19.91   |
| adaptec3 452   | 2K   | 242.49 | 25.03   | 276.80  | 78.19     | 218.52 | 119.53     | 208.95 | 7.04    | 214.16 | 58.92   |
| adaptec4 496   | 6K   | 212.09 | 24.40   | 231.30  | 79.32     | 209.28 | 131.57     | 187.40 | 5.30    | 193.89 | 55.95   |
| bigblue1 278   | 8K   | 107.46 | 13.20   | 110.92  | 41.78     | 100.02 | 44.91      | 97.42  | 4.01    | 96.80  | 22.82   |
| bigblue2 558   | 8K   | 155.53 | 22.57   | 162.81  | 80.55     | 153.75 | 100.96     | 145.78 | 8.28    | 152.34 | 61.55   |
| bigblue3 111   | 10K  | 378.09 | 79.05   | 405.40  | 182.94    | 411.59 | 209.24     | 339.78 | 13.79   | 344.10 | 85.23   |
| bigblue4 218   | 80K  | 897.69 | 194.00  | 1016.19 | 567.15    | 871.29 | 489.05     | 808.22 | 35.80   | 829.44 | 189.83  |
| Norm           | -    | 1.00   | 1.00    | 1.07    | 3.07      | 0.97   | 3.97 -     | 0.90   | 0.26    | 0.92   | 1.89    |

Table 4.2: Placement results in ISPD 2005 benchmarks

### 4.3 Thermal Aware Placement

In Table 4.3, when weight of each thermal anchor increases linearly in each iteration, max temperature is reduced by 2.2% at the expense of 15% increase in HPWL. This is primarily due to large perturbation of cells in early iterations. By adjusting weight of thermal anchor to increase quadratically, better max temperature reduction can be obtained with less increase in total HPWL. In addition, we observed that moving cells in the hottest region is most effective to reduce maximum temperature, however, moving every single cells inside the hot region will cause temperature increase around the perimeter of hot region. Thus, we apply Gaussian force model to let cells closer to the hot region have more energy to move away while cells farther away from the hot region have less energy to move. With additional of thermal force adjustment, 4.5% reduction in max temperature with 7% increase in total HPWL.

Figure 4.1 is the temperature distribution of adaptec1 with and without considering thermal effect. Note that region in red only denote the hottest region and does not correspond to a specific temperature. Maximum temperature for Figure 4.1(c) is 5.85% less than Figure 4.1(a) with 3.5% increase in total HPWL. It can be observed that a smoother temperature distribution can be obtained by considering thermal effect. Noting that in Table 4.3, LW is the linear width and QW is Quadratic Width.

Table 4.3: Various approaches to reduce maximum temperature

|            |                                     |        |       |                                                                          |       |         |        |       | 0                        | 1      |       |         |
|------------|-------------------------------------|--------|-------|--------------------------------------------------------------------------|-------|---------|--------|-------|--------------------------|--------|-------|---------|
|            | <b>Original LW</b> increase $W = n$ |        |       | $ \mathbf{QW} \text{ increase } W = n^2/N   \mathbf{QW} + \mathbf{Gaus}$ |       |         |        |       | ıssian                   |        |       |         |
| benchmarks | Cell#                               | HPWL   | Temp. | HPWL                                                                     | Temp. | Red.(%) | HPWL   | Temp. | $\operatorname{Red}(\%)$ | HPWL   | Temp. | Red.(%) |
| adaptec1   | 211K                                | 87.69  | 63.95 | 91.40                                                                    | 60.72 | 5.05    | 89.33  | 60.43 | 5.50                     | 90.76  | 60.21 | 5.85    |
| adaptec2   | 255K                                | 98.22  | 50.44 | 115.86                                                                   | 49.25 | 2.36    | 109.94 | 52.96 | -5.00                    | 105.76 | 50.45 | -0.02   |
| adaptec3   | 452K                                | 242.49 | 46.87 | 282.51                                                                   | 49.68 | -6.00   | 274.06 | 45.02 | 3.94                     | 258.81 | 44.01 | 6.10    |
| adaptec4   | 496K                                | 212.09 | 46.57 | 219.20                                                                   | 44.43 | 4.59    | 228.01 | 45.33 | 2.66                     | 219.12 | 45.18 | 2.98    |
| bigblue1   | 278K                                | 107.46 | 73.54 | 126.37                                                                   | 71.39 | 2.92    | 123.76 | 71.25 | 3.11                     | 118.61 | 70.77 | 3.77    |
| bigblue2   | 558K                                | 155.53 | 52.09 | 160.73                                                                   | 53.22 | -2.17   | 166.15 | 51.46 | 1.21                     | 163.46 | 51.03 | 2.03    |
| bigblue3   | 1110K                               | 378.09 | 73.79 | 524.75                                                                   | 67.42 | 8.63    | 474.70 | 62.28 | 15.60                    | 439.27 | 65.46 | 11.29   |
| bigblue4   | $2180 \mathrm{K}$                   | 897.69 | 75.35 | 935.53                                                                   | 66.55 | 2.05    | 947.73 | 66.88 | 11.24                    | 909.50 | 72.37 | 3.95    |
| Norm.      | -                                   | 1.00   | 1.00  | 1.15                                                                     | 0.98  | 2.18    | 1.11   | 0.95  | 4.79                     | 1.07   | 0.95  | 4.50    |
|            |                                     |        |       | 2                                                                        |       | 18      | 96     | 75    |                          |        |       |         |





Figure 4.1: (a). Power density of adaptec1. (b). Power density of bigblue3. (c). Temperature distribution of adaptec1. (d). Temperature distribution of bigblue3. (e). Temperature distribution of adaptec1 using Icepak before thermal aware placement. (f). Temperature distribution of adaptec1 using Icepak before thermal aware placement.



Figure 4.2: Temperature Distribution before and after thermal aware placement.

# Chapter 5 Conclusions

In this thesis, we proposed a thermal aware placement using analytical thermal model and used a force directed algorithm for our placer. The analytical thermal model is implemented using Green function and temperature profile is solved using DCT with input reordering to enhance speed. Analytical thermal model can offer closed form thermal representation with negligible run time. Although analytical thermal model is less accurate but can obtain a temperature profile much faster within reasonable accuracy. Such characteristic is very suitable for thermal analysis during placement or floorplanning stage where thermal analysis does not require to be exact.

Regarding to our placement, we adopt our placer based on [16] with few modifications of our own. Although it trails behind state-of-the-art placers, experimental result shows that the quality of our placer is within reasonable range. The general structure of our placer is relatively simple allowing thermal model easily integrated.

Considering thermal effect, we proposed a new methodology by adding thermal anchors around identified hot region to reduce maximum temperature. The size of hot region is dynamic such that perturbation of cells is limited without sacrificing too much HPWL. The magnitude of the thermal anchor is calculated using a 2-D Gaussian model to obtain a smooth temperature profile.

### Bibliography

- [1] Ansys icepak. http://www.ansys.com/Products/Simulation+Technology/ Fluid+Dynamics/ANSYS+Icepak.
- [2] Ispd 2005 benchmark. http://archive.sigda.org/ispd2006/contest.html.
- [3] S. N. Adya and I. L. Markov. Executable placement utilities. http://vlsicad. eecs.umich.edu/BK/PlaceUtils/.
- [4] N. Ahmed, T. Natarajan, and K. Rao. Discrete cosine transfom. volume C-23, pages 90 – 93, jan. 1974.
- [5] T. F. Chan, J. Cong, J. R. Shinnerl, K. Sze, and M. Xie. mpl6: enhanced multilevel mixed-size placement. In *International Symposium on Physical Design*, pages 212–214, 2006.
- [6] S. Chaudhury.
- [7] D. Chen, E. Li, E. Rosenbaum, and S.-M. Kang. Interconnect thermal modeling for accurate simulation of circuit timing and reliability. volume 19, pages 197 -205, feb 2000.
- [8] G. Chen and S. Sapatnekar. Partition-driven standard cell thermal placement. In Proceedings of the 2003 international symposium on Physical design, ISPD '03, pages 75–80, New York, NY, USA, 2003. ACM.

- [9] T.-C. Chen, Z.-W. Jiang, T.-C. Hsu, H.-C. Chen, and Y.-W. Chang. Ntuplace3: An analytical placer for large-scale mixed-size designs with preplaced blocks and density constraints. volume 27, pages 1228–1240, July 2008.
- [10] M. Frigo and S. G. Johnson. Fftw. http://www.fftw.org/links.html.
- [11] B. Goplen and S. Sapatnekar. Efficient thermal placement of standard cells in 3d ics using a force directed approach. In *Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design*, ICCAD 03, pages 86–, Washington, DC, USA, 2003. IEEE Computer Society.
- B. Goplen and S. S. Sapatnekar. Efficient thermal placement of standard cells in 3d ics using a force directed approach. In Proc. Int. Conf. on Computer-Aided Design, pages 86–89, 2003.
- [13] P.-Y. Huang, C.-K. Lin, and Y.-M. Lee. Full-chip thermal analysis for the early design stage via generalized integral transforms. In *Design Automation Conference, 2008. ASPDAC 2008. Asia and South Pacific*, pages 462 –467, march 2008.
- [14] A. B. Kahng, S. mo Kang, W. Li, and B. Liu. Analytical thermal placement for vlsi lifetime improvement and minimum performance variation. In *International Conference on Computer Design*, pages 71–77, 2007.
- [15] A. B. Kahng and Q. Wang. A faster implementation of aplace. In International Symposium on Physical Design, pages 218–220, 2006.
- [16] M.-C. Kim, D.-J. Lee, and I. Markov. Simpl: An effective placement algorithm. In Computer-Aided Design (ICCAD), 2010 IEEE/ACM International Conference on, pages 649–656, nov. 2010.

- [17] J. Li and H. Miyashita. Thermal-aware placement based on fm partition scheme and force-directed heuristic. volume E89-A, pages 989–995, Oxford, UK, April 2006. Oxford University Press.
- [18] J. Makhoul. A fast cosine transform in one and two dimensions. volume 28, pages 27 – 34, feb 1980.
- [19] B. Obermeier and F. M. Johannes. Temperature-aware global placement. In Proceedings of the 2004 Asia and South Pacific Design Automation Conference, ASP-DAC '04, pages 143–148, Piscataway, NJ, USA, 2004. IEEE Press.
- [20] M. Pan, N. Viswanathan, and C. C. N. Chu. An efficient and effective detailed placement algorithm. In *International Conference on Computer Aided Design*, pages 48–55, 2005.
- [21] J. Roy. Capo: Robust and scalable open-source min-cut floorplacer. In International Symposium on Physical Design, pages 218–220, 2006.
- [22] P. Spindler, U. Schlichtmann, and F. M. Johannes. Kraftwerk2 a fast forcedirected quadratic placement approach using an accurate net model. volume 27, pages 1398–1411, 2008.
- [23] C. H. Tsai and S. M. Kang. Cell-level placement for improving substrate thermal distribution. In *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, volume 19, pages 253–266, Feb. 2000.
- [24] N. Viswanathan, G.-J. Nam, C. Alpert, P. Villarrubia, H. Ren, and C. Chu. Rql: Global placement via relaxed quadratic spreading and linearization. In *Design Automation Conference, 2007. DAC '07. 44th ACM/IEEE*, pages 453 -458, june 2007.

- [25] N. Viswanathan, M. Pan, and C. C. N. Chu. Fastplace 3.0: A fast multilevel quadratic placement algorithm with placement congestion control. In Asia and South Pacific Design Automation Conference, pages 135–140, 2007.
- [26] B. Wang and P. Mazumder. Accelerated chip-level thermal analysis using multilayer green's function. volume 26, pages 325 –344, feb. 2007.
- [27] Y. Zhan and S. Sapatnekar. High-efficiency green function-based thermal simulation algorithms. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 26(9):1661-1675, sept. 2007.

