# HiPRIME: Hierarchical and Passivity Preserved Interconnect Macromodeling Engine for RLKC Power Delivery Yu-Min Lee, *Member, IEEE*, Yahong Cao, Tsung-Hao Chen, Janet Meiling Wang, *Member, IEEE*, and Charlie Chung-Ping Chen, *Member, IEEE* Abstract—This paper proposes a general hierarchical analysis methodology, HiPRIME, to efficiently analyze RLKC power delivery systems. After partitioning the circuits into blocks, we develop and apply the IEKS (Improved Extended Krylov Subspace) method to build the multiport Norton equivalent circuits which transform all the internal sources to Norton current sources at ports. Since there are no active elements inside the Norton circuits, passive or realizable model order reduction techniques such as PRIMA can be applied. The significant speed improvement, 700 times faster than Spice with less than 0.2% error and 7 times faster than a state-of-the-art solver, InductWise, is observed. To further reduce the top-level hierarchy runtime, we develop a second-level model reduction algorithm and prove its passivity. Index Terms—Model order reduction, power distribution, power grid, signal integrity. ## I. INTRODUCTION WITH the ultra deep submicron (UDSM) technology, several features of today's chips (higher operating frequencies, larger number of transistors, smaller feature size, and lower power supply voltage) have pushed the power delivery noise analysis onto the designers' list of high priority concerns [1]–[4]. Basically, the power delivery noise consists of IR drop, Ldi/dt drop, and resonance fluctuations. The IR drop has been widely discussed and extensively studied in the literatures [5]–[7]. Due to the roaring clock frequency, increasing current consumption, and even the clock-gating feature, Ldi/dt noise is quickly emerging as another power fluctuation concern [6]. Power delivery noise causing the power voltage to deviate from the ideal value can severely degrade the performance and even Manuscript received October 22, 2002; revised March 19, 2003, March 11, 2004, and August 19, 2004. This work was supported in part by the National Science Foundation, in part by the National Science Council, in part by the Taiwan Semiconductor Manufacturing Company, Ltd., in part by the United Microelectronics Corporation, and in part by the Intel Corporation. This paper was recommended by Associate Editor S. S. Sapatnekar. - Y.-M. Lee is with the Department of Communication Engineering, National Chiao-Tung University, Hsinchu 300, Taiwan, R.O.C. (e-mail: yumin@cm.nctu.edu.tw). - Y. Cao is with the Cadence Design Systems, San Jose, CA 95134 USA (e-mail: ycao@cadence.com). - T.-H. Chen is with Synopsys Inc., Taipei 110, Taiwan, R.O.C. (e-mail: tsung-hao.chen@synopsys.com). - J. M. Wang is with the Electrical Computer Engineering Department, University of Arizona, Tucson, AZ 85721 USA (e-mail: wml@ece.arizona.edu). - C. C.-P. Chen is with the Institute of Electronics Engineering and Department of Electrical Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C. (e-mail: cchen@cc.ee.ntu.edu.tw). Digital Object Identifier 10.1109/TCAD.2005.847938 make the gate function erroneously. Therefore, the extensive analysis of RLKC power delivery systems is required to ensure them to meet the targeted performance and reliability goals. Generally speaking, one of the major difficulties for the power delivery analysis is size explosion. Tens of millions of devices and parasitics are required to be modeled and simulated over a long time period. However, it is computationally expensive to simultaneously simulate all transistors with the power delivery structure. To enhance the simulation speed, it has been proposed to decouple the power delivery structure simulation and transistors' simulation [6]. First, the current profiles of transistors can be estimated by several current extraction methods [8], [9]. After that, the power delivery network is modeled by a suitable resistance-inductance-capacitance (RLC) circuit attached with many current sources. In this way, the simulation can be effectively done since there are fewer elements in the circuit, and the RLC circuit can be simulated with one LU decomposition. However, due to the large size and grid nature of linear circuits, the traditional circuit simulation engines such as SPICE [10] cannot fulfill the demanding task in a time efficient manner. For this reason, the hierarchical simulation technique has been applied by [6] to speed up the power delivery network simulation. The model order reduction technique is another efficient way which can be utilized to speed up the circuit analysis [11], [12], and has been widely studied and improved over the last decade [5], [13]–[16]. Starting from asymptotic waveform evaluation (AWE) [13] to the passive reduced-order interconnect macromodeling algorithm (PRIMA) [16], model order reduction techniques have been successfully extended to consider the inductance effects with reasonable accuracy. Later, an extended Krylov subspace method, extended Krylov subspace (EKS) [5], has been developed to simulate large-scale power delivery circuits with many piece wise linear (PWL) current sources. To resolve the source waveform modeling issues, EKS needs to perform the moment shifting procedure to recover the proper moments. In this paper, we utilize these two techniques, hierarchical analysis and model order reduction, to develop a novel hierarchical power delivery analysis engine. The contributions of our method are listed as follows. First, we establish a novel hierarchical power delivery macromodeling methodology, which integrates the multiple port Norton equivalent theorem [17] with the model order reduction algorithm to generate compact and accurate model, and achieve significant runtime improvements. Then, we enhance the EKS method such that Fig. 1. Modeling of the power delivery network. it no longer needs to perform moment shifting for source waveform modeling. Therefore, the highly accurate simulation results are observed. Finally, to further reduce the runtime, we develop a multiple level passive model reduction algorithm and prove its passivity. The remainder of this paper is organized as follows. Section II introduces the basic power delivery network modeling, circuit formulations, and the concepts of model order reduction. Section III presents our hierarchical and passive order-reduced macromodeling methodology. Section IV shows several experimental results. Finally, Section V concludes the work. #### II. PRELIMINARY The RLKC elements are applied to model the power delivery system as shown in Fig. 1. To reduce the simulation runtime, we decouple the linear simulation from the nonlinear simulation [6]. Once the nonlinear simulation is done, the current sources and capacitors are used to model the gate current consumption, and diffusion and gate capacitance, respectively. Therefore, the task of power grid analysis is simplified to simulate a linear RLKC network with linear time-varying current sources, and measure the voltage drop at each grid. A linear RLKC circuit can be represented as a set of circuit equations by using the modified nodal analysis (MNA) method as follows: $$\mathbf{G}x + \mathbf{C}\frac{d}{dt}x = \mathbf{B}u\tag{1}$$ where x is the variable vector consisting of nodal voltages, and the currents flowing through the inductor and voltage sources, u denotes the vector of the port voltage sources and internal current sources, $\mathbf{G}$ is the conductance matrix, and $\mathbf{C}$ is the susceptance matrix. They can be rewritten as $$\mathbf{G} = \begin{bmatrix} \mathbf{N} & \mathbf{E} \\ -\mathbf{E}^T & \mathbf{0} \end{bmatrix} \tag{2}$$ $$C = \begin{bmatrix} Q & 0 \\ 0 & H \end{bmatrix}. \tag{3}$$ N, Q, and H contain the stamping of resistors, capacitors, and inductors. Note that H contains both self and mutual inductance (K elements). E corresponds to the MNA current variables' contribution to the Kirchhoff's Current Law (KCL) equations. Circuit equations as shown in (1) can be transformed to the s-domain by the Laplace transformation $$Gx + sCx = Bu. (4)$$ The model order reduction generates an analytical model which is a compact description of the original circuit by matching its moments or poles. To illustrate the idea of moment matching, we expand both side of the (4) into a Taylor series around zero frequency $$(\mathbf{G} + s\mathbf{C})(m_0 + m_1 s + m_2 s^2 + \cdots)$$ = $\mathbf{B}(u_0 + u_1 s + u_2 s^2 + \cdots)$ (5) where $m_i$ and $u_i$ , the coefficients of the $i_{\rm th}$ term in the Taylor series, are the $i_{\rm th}$ moment of ${\bf x}$ and ${\bf u}$ , respectively. The basic idea of moment matching is to represent the finite, unknown moments of the left-hand side of the above equation in terms of the known moments of the right hand side. During the moment matching process, PRIMA uses impulse sources to preserve the input-output transfer characteristics. The impulse sources are constants in the frequency domain. Therefore, the Taylor expansion becomes $$(\mathbf{G} + s\mathbf{C})(m_0 + m_1 s + m_2 s^2 + \cdots) = \mathbf{B}u_0.$$ (6) The above equation produces an iterative relationship between the moments of x and u: $Gm_0 = Bu_0$ , $Gm_i + Cm_{i-1} =$ 0. This explicit moments-matching method is seldom used because it has a numerical stability problem, especially in the higher order iterations. To avoid the numerical errors, a set of orthogonal bases is built to span a subspace, which is the same one spanned by the finite moments of x. The set of the above orthogonal bases can be represented as a matrix V, which is equivalent to the Krylov subspace of $(\mathbf{A} = -\mathbf{G}^{-1}\mathbf{C}, \mathbf{R} = \mathbf{G}^{-1}\mathbf{B})$ , and is defined as $K_q(\mathbf{A}, \mathbf{R}) = colsp(\mathbf{R}, \mathbf{A}\mathbf{R}, \mathbf{A}^2\mathbf{R} \dots \mathbf{A}^{q-1}\mathbf{R})$ . The dimension of the original circuit (G, C, B) is reduced because the rank of V is much smaller than that of the original matrix A. The order-reduced model can be obtained by projecting the original model (G, C, B) onto the Krylov subspace, $K_q(A, R)$ , by using the congruent transformation. The system matrices of the reduced system are denoted as $\mathbf{G} = \mathbf{V}^T \mathbf{G} \mathbf{V}$ , $\mathbf{C} = \mathbf{V}^T \mathbf{C} \mathbf{V}$ , and $\widetilde{\mathbf{B}} = \mathbf{V}^T \mathbf{B}$ . This compact model can be represented by the following MNA equation in the time domain: $$\widetilde{\mathbf{G}}\widetilde{x} + \widetilde{\mathbf{C}}\frac{d}{dt}\widetilde{x} = \widetilde{\mathbf{B}}u. \tag{7}$$ # III. HIERARCHICAL AND PASSIVE ORDER-REDUCED MACROMODELING Our hierarchical and passive model order reduction engine consists of three steps. First, the power delivery networks are partitioned into multiple blocks. Each block may contain RLKC interconnect networks and many internal switching currents. Second, the Norton equivalent order-reduced model for each block is constructed by three phases. Phase 1 is to find the passive order-reduced model for the RLKC interconnect networks of each block. Phase 2 is to calculate the Norton equivalent currents of the internal current sources at each block. Phase **Algorithm:** HiPRIME (Hierarchical and Passivity Preserved Interconnect Macromodeling Engine) - A1. Partition the given circuit into multiple blocks - **A2.** For each block, its multi-port Norton equivalent order reduced circuit is generated by the following procedure: - A2.1 Set all the active sources to zeros and perform passive model order reduction for the linear circuit using any passivity guaranteed model reduction algorithm such as PRIMA. - A2.2 Activate all sources and short all the ports nodes to ground and nd out the Norton equivalent source at each port by IEKS or SPICE simulation. - A2.3 Form the Norton equivalent circuit by attaching the Norton equivalent source at each port to the reduced circuit generated by Step A2.1. - A3. Form the integrated circuit by combining all reduced modules. Perform the higher level model order reduction by using IEKS or PRIMA when necessary. Fig. 2. HiPRIME algorithm. 3 attaches those Norton equivalent currents at the ports of the order-reduced RLKC model. Finally, an integration algorithm is developed to integrate those macromodels, and the higher level model order reduction can be performed when necessary. The outline and flowchart of the proposed algorithm are shown in Figs. 2 and 3, respectively. Steps A2.1, A2.2, and A3 shown in Fig. 2 are discussed in the following sections. # A. Passive Reduced-Order Macromodeling of RLKC Networks After the power delivery network is partitioned into multiple blocks, each block may contain passive RLKC interconnects and internal switching current sources. In order to obtain a passive order-reduced model, all of the internal current sources are disconnected to make this block a passive RLKC network. The effect of those current sources on grid voltages will be considered later. We may apply a conventional passive model order reduction algorithm, such as PRIMA, to each block. Let the MNA equation for the RLKC interconnect network of the $i_{\rm th}$ block be $$\mathbf{G}_{i}x_{i} + \mathbf{C}_{i}\frac{d}{dt}x_{i} = \mathbf{B}_{i}u_{i} \tag{8}$$ where $u_i$ is the port voltage vector of the $i_{\rm th}$ block. PRIMA constructs a transfer matrix $\mathbf{V}_i$ , and transfers the $\mathbf{G}_i$ , $\mathbf{C}_i$ , and $\mathbf{B}_i$ into $\widetilde{\mathbf{G}}_i$ , $\widetilde{\mathbf{C}}_i$ , and $\widetilde{\mathbf{B}}_i$ , whose dimensions are reduced. The compact MNA equation of the reduced block is $$\widetilde{\mathbf{G}}_{i}\widetilde{x}_{i} + \widetilde{\mathbf{C}}_{i}\frac{d}{dt}\widetilde{x}_{i} = \widetilde{\mathbf{B}}_{i}u_{i}.$$ (9) One advantage of model order reduction after partition is that the size of the circuit handled by the model order reduction algorithm is much smaller. Therefore, the limit of memory might be eased. It also makes parallel order reduction for different blocks possible, and the speed of analysis can be improved. Furthermore, each reduced block is a macromodel, which means that it can be reused to save the runtime. For example, if one of the blocks has been modified, HiPRIME only needs to regenerate the reduced model of this block. However, the flat method need to regenerate the reduced model from scratch. # B. Efficient Way of Finding the Norton Equivalent Current In this section, we consider the effects of the internal current sources ignored in the previous procedure. The Norton equivalent theory [17] is utilized to find out the equivalent current source at each port, and used to replace all the internal current sources so that the port responses of each block are preserved. To distinguish the port voltage sources from the internal current sources, the (8) can be modified as $$\mathbf{G}_{i}x_{i} + \mathbf{C}_{i}\frac{d}{dt}x_{i} = \begin{bmatrix} \mathbf{B}_{i} & \mathbf{B}'_{i} \end{bmatrix} \begin{bmatrix} v_{i} \\ i_{gi} \end{bmatrix}$$ (10) where $v_i$ and $i_{\rm gi}$ denote the independent voltage sources and the internal switching current sources in the $i_{\rm th}$ block respectively. The ${\bf B}_i$ , and ${\bf B}_i'$ denote the positions of the voltage sources and the current sources relative to the whole network. The procedure of calculating the equivalent current sources at the ports is illustrated in Fig. 4. The port currents, with the port voltages set to zeros, are the Norton equivalent current sources, and the port currents can be obtained by $i_{Ni} = {\bf B}_i^T x_i$ . Several methods can be applied to solve (10) with the voltage sources $v_i$ set to zeros. In our approach, we develop the IEKS method, an improved version of EKS such that no moment shifting needed to solve (10). The description of IEKS is presented in the next two sessions. 1) Improved EKS: Developed by Janet et al. in [5], EKS directly calculates the orthogonalized moments of the response when multiple sources are turned on at the same time. Therefore, unlike PRIMA whose runtime is heavily dependent on the number of ports, the runtime of EKS is independent of that. The EKS models an independent PWL source as a sum of delayed ramps in the Laplace domain $$u(s) = \frac{1}{s^2} \sum_{i=0}^{K} r_i \exp(-\beta_i s).$$ (11) This expression contains 1/s, and $1/s^2$ terms. Unfortunately, the traditional Krylov subspace methods start the moment matching from the $0_{th}$ moment. Therefore, EKS needs to extend the Krylov subspace by shifting the moments toward right in the frequency spectrum. This moment shifting in EKS is tedious and error-prone. We develop an improved moment calculation method which ensures that the $-1_{st}$ and $-2_{nd}$ order moments are all zeros for any arbitrary finite time PWL waveform, and hence, the moment shifting process can be removed. Since for simulation purpose we are only interested in a specific time period, the finite-time assumption is quite general. We believe this procedure is numerically more sound than the original EKS method. IEKS Moments Calculating Algorithm: Given a finite-time PWL source u(t) as shown in Fig. 5, u(t) can be written as $$u(t) = \sum_{i=0}^{K} \{ [a_i + \gamma_i(t - \tau_i)] \mathbf{E}_{(t - \tau_i)} - [a_{i+1} + \gamma_i(t - \tau_{i+1})] \mathbf{E}_{(t - \tau_{i+1})} \}$$ (12) where $\gamma_i = (a_{i+1} - a_i)/(\tau_{i+1} - \tau_i)$ is the slope of line segment between time $\tau_i$ and $\tau_{i+1}$ , and $E_{(t-\tau_i)}$ is a unit-step func- Fig. 3. Flowchart for the hierarchical and passivity preserved interconnect macromodeling engine. Fig. 4. Finding the equivalent current of internal sources. Fig. 5. Waveform of the source. tion with delay $\tau_i$ . By taking the Laplace transform of (12) and expanding the transform to its Taylor expansion, we have (13), which can be seen at the bottom of the page. Let $\tilde{u}_i$ denote the coefficient of the $s^i$ term. The Taylor expansion of $\mathcal{L}(u(t))$ can be represented as $$\mathcal{L}\{u(t)\} = \{\tilde{u}_{-2}s^{-2} + \tilde{u}_{-1}s^{-1} + \tilde{u}_0 + \tilde{u}_1s + \tilde{u}_2s^2 + \dots + \tilde{u}_ms^m + \dots\}.$$ (14) After calculating the first two coefficients, we conclude $$\tilde{u}_{-2} = \sum_{i=0}^{K} (\gamma_i - \gamma_i) = 0$$ $$\tilde{u}_{-1} = \sum_{i=0}^{K} (a_i - \gamma_i \tau_i - a_{i+1} + \gamma_i \tau_{i+1})$$ $$= \sum_{i=0}^{K} (a_i - a_{i+1} - \gamma_i (\tau_i - \tau_{i+1})) = 0.$$ (15) Finally, we derive the explicit formulas for the rest of the coefficients, $\tilde{u}_0, \tilde{u}_1, \ldots$ , etc. This procedure is summarized in Fig. 6. The first two terms are eliminated and (14) can be rewritten as a moment representation starting from the $0_{\rm th}$ moment $$\mathcal{L}\{u(t)\} = \{\tilde{u}_0 + \tilde{u}_1 s + \tilde{u}_2 s^2 + \dots + \tilde{u}_m s^m + \dots\}.$$ (17) *Lemma 1:* Given a finite-time PWL source, IEKS constructs its moment representation which the $-1_{\rm st}$ and $-2_{\rm nd}$ order moments are zeros. 2) System Solution by IEKS: IEKS generates a system transform matrix $V_i$ , by which the $i_{\rm th}$ block of the original system is transformed into a compact description $$\widetilde{\mathbf{G}}_{i}\widetilde{x}_{i} + \widetilde{\mathbf{C}}_{i}\frac{d}{dt}\widetilde{x}_{i} = \begin{bmatrix} \widetilde{\mathbf{B}}_{i} & \widetilde{\mathbf{B}}'_{i} \end{bmatrix} \begin{bmatrix} \mathbf{0} \\ i_{\mathrm{gi}} \end{bmatrix}.$$ (18) This compact form can be solved quickly in the time domain by standard integration algorithms. The solution of the $i_{\rm th}$ block is recovered by $x_i \approx \mathbf{V}_i \widetilde{x}_i$ , and the desired port currents can be directly obtained by $i_{Ni} = \mathbf{B}^T x_i \approx \widetilde{\mathbf{B}}^T \widetilde{x}_i$ . $$\mathcal{L}\{u(t)\} = \frac{1}{s^2} \sum_{i=0}^{K} \left\{ a_i s \sum_{l=0}^{\infty} (-1)^l \frac{\tau_i^l}{l!} s^l + \gamma_i \sum_{l=0}^{\infty} (-1)^l \frac{\tau_i^l}{l!} s^l - a_{i+1} s \sum_{l=0}^{\infty} (-1)^l \frac{\tau_{i+1}^l}{l!} s^l - \gamma_i \sum_{l=0}^{\infty} (-1)^l \frac{\tau_{i+1}^l}{l!} s^l \right\}$$ (13) Algorithm: IEKS Moments Calculating Algorithm **Input:** 1) A PWL source u(t) with $\{(a_0, \tau_0), (a_1, \tau_1)\}$ $\cdots, (a_{K+1}, \tau_{K+1})$ 2) M, number of moments to calculate **Output:** $\mathbf{u}_m = \{ u_1, u_2, \cdots, u_M \}$ , the first M moments of the PWL source. Begin **for** i = 0 : Kend**for** m = 1 : M $\quad \mathbf{for}\ i=0:K+1$ $\beta_i^{(m+1)} = \frac{-\tau_i}{m+1} \beta_i^{(m)}$ $u_{m-1} = -(a_{K+1} - \gamma_K \frac{\tau_{K+1}}{m+1})\beta_{K+1}^{(m)} + (a_0 - \gamma_0 \frac{\tau_0}{m+1})\beta_0^{(m)} - \sum_{i=0}^{K-1} (\gamma_i - \gamma_{i+1})\beta_{i+1}^{(m+1)}$ $\mathbf{End}$ Fig. 6. IEKS moments calculating algorithm. Fig. 7. Equivalent circuit for each block. # C. Macromodel Integration and Top Level Reduced-Model Simulation After Step A2 of HiPRIME as illustrated in Fig. 2, a block consisting of RLKC segments with many internal PWL currents is transformed into a passive order reduced block with current sources attached only at the ports. The new macromodel of each block is illustrated in Fig. 7, and each port response of the original block is preserved. Each macromodel is generated for each specific block, and the entire network is integrated by combining those macromodels together. Each block is viewed as a node of the integrated network, and is stamped into the MNA equation of the entire network. The combination of $i_{th}$ block and $j_{th}$ block can be represented as in $$\begin{bmatrix} \widetilde{\mathbf{G}}_{i} & \mathbf{0} & -\widetilde{\mathbf{B}}_{ij} \\ \mathbf{0} & \widetilde{\mathbf{G}}_{j} & -\widetilde{\mathbf{B}}_{ji} \\ \widetilde{\mathbf{B}}_{ij}^{T} & \widetilde{\mathbf{B}}_{ji}^{T} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \widetilde{x}_{i} \\ \widetilde{x}_{j} \\ u_{ij} \end{bmatrix} + \begin{bmatrix} \widetilde{\mathbf{C}}_{i} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \widetilde{\mathbf{C}}_{j} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} \end{bmatrix} \frac{d}{dt} \begin{bmatrix} \widetilde{x}_{i} \\ \widetilde{x}_{j} \\ u_{ij} \end{bmatrix}$$ $$= \begin{bmatrix} \widetilde{\mathbf{B}}_{ii} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \widetilde{\mathbf{B}}_{jj} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{E}_{i}^{T} & \mathbf{E}_{j}^{T} \end{bmatrix} \begin{bmatrix} u_{i} \\ u_{j} \\ i_{N_{i}} \\ i_{N_{i}} \end{bmatrix}$$ (19) where $u_{ij}$ nodal voltages at the common ports of $i_{th}$ and $j_{th}$ nodal voltages of the $i_{\rm th}$ block's ports which are not $u_i$ connected to the $j_{\rm th}$ block; nodal voltages of the $j_{\rm th}$ block's ports which are not $u_j$ connected to the $i_{\rm th}$ block; connection between the $i_{ m th}$ block's internal nodes and ports which are exclusive of $j_{th}$ block; connection between the $j_{ m th}$ block's internal nodes and ports which are exclusive of $i_{\rm th}$ block; $\widetilde{\mathbf{B}}_{ij}$ connection of the $i_{\rm th}$ block's internal nodes to the $j_{\rm th}$ $\tilde{\mathbf{B}}_{ii}$ connection of the $j_{\rm th}$ block's internal nodes to the $i_{\rm th}$ equivalent ports' currents of the $i_{\rm th}$ block which are extracted from the $i_{\rm th}$ block; equivalent ports' currents of the $j_{\rm th}$ block which are $i_{N_i}$ extracted from the $j_{\rm th}$ block; $\mathbf{E}_i$ connection of the internal nodes to the equivalent port currents for the $i_{\rm th}$ block; $\mathbf{E}_{i}$ connection of the internal nodes to the equivalent port currents for the $j_{\rm th}$ block. Given this glued macromodel in (19), the model order reduction and simulation techniques such as PRIMA or IEKS can be further applied to the top level to save runtime. Since there may be more than two hierarchical levels, the higher level model order reduction is introduced as follows. First, some block system matrices for the $i_{\rm th}$ and $j_{\rm th}$ block are defined as $$\mathbf{G}' = \begin{bmatrix} \widetilde{\mathbf{G}}_{i} & \mathbf{0} & -\widetilde{\mathbf{B}}_{ij} \\ \mathbf{0} & \widetilde{\mathbf{G}}_{j} & -\widetilde{\mathbf{B}}_{ji} \\ \widetilde{\mathbf{B}}_{ij}^{T} & \widetilde{\mathbf{B}}_{ji}^{T} & \mathbf{0} \end{bmatrix}$$ $$\mathbf{C}' = \begin{bmatrix} \widetilde{\mathbf{C}}_{i} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \widetilde{\mathbf{C}}_{j} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} \end{bmatrix}$$ $$(20)$$ $$\mathbf{C}' = \begin{bmatrix} \widetilde{\mathbf{C}}_i & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \widetilde{\mathbf{C}}_j & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} \end{bmatrix}$$ (21) $$\mathbf{B}' = \begin{bmatrix} \mathbf{\tilde{B}}_{ii} & \mathbf{0} \\ \mathbf{0} & \widetilde{\mathbf{B}}_{jj} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}. \tag{22}$$ Then, with the internal current sources disconnected, the system becomes $$\mathbf{G}' \begin{bmatrix} \widetilde{x}_i \\ \widetilde{x}_j \\ u_{ij} \end{bmatrix} + \mathbf{C}' \frac{d}{dt} \begin{bmatrix} \widetilde{x}_i \\ \widetilde{x}_i \\ u_{ij} \end{bmatrix} = \mathbf{B}' \begin{bmatrix} u_i \\ u_j \end{bmatrix}. \tag{23}$$ Let $\widetilde{\mathbf{V}} = [\widetilde{\mathbf{V}}_1^T \ \widetilde{\mathbf{V}}_2^T \ \widetilde{\mathbf{V}}_3^T]^T$ be the orthogonal bases of the subspace spanned by the moments of $[\widetilde{x}_i^T \ \widetilde{x}_j^T \ u_{ij}^T]^T$ , and denote $\widetilde{\mathbf{G}}' = \widetilde{\mathbf{V}}^T \mathbf{G}' \widetilde{\mathbf{V}}, \ \widetilde{\mathbf{C}}' = \widetilde{\mathbf{V}}^T \mathbf{C}' \widetilde{\mathbf{V}}, \ \widetilde{\mathbf{B}}' = \widetilde{\mathbf{V}}^T \mathbf{B}', \ [\widetilde{x}_i^T \ \widetilde{x}_j^T \ u_{ij}^T]^T \approx \widetilde{\mathbf{V}} \widetilde{z}$ . The MNA equation generated by the higher level order-reduction is $$\widetilde{\mathbf{G}}'\widetilde{z} + \widetilde{\mathbf{C}}'\frac{d}{dt}\widetilde{z} = \widetilde{\mathbf{B}}' \begin{bmatrix} u_i \\ u_j \end{bmatrix}. \tag{24}$$ Fig. 8. Comparison of EKS, IEKS and EXACT (a) frequency domain simulations result of IEKS, EKS and EXACT with regard to both magnitude and phase (b)timing domain simulation results of IEKS, EKS and SPICE. ### D. Preservation of Passivity In order to apply PRIMA or IEKS to the reduced model (as described in the previous section), the higher level order-reduced model must be also passive. The warrant is given by the following theorem. Theorem 1: During the hierarchical model order reduction, the passivity of the higher level order-reduced macromodel is preserved. That is to say, the transfer function $\mathbf{Y}(s)$ of the higher level order-reduced system satisfies - 1) $\mathbf{Y}(s^*) = \mathbf{Y}^*(s)$ for all complex s. - 2) $\mathbf{Y}(s)$ is a positive matrix, that means, $Z^{\star T}(\mathbf{Y}(s) + \mathbf{Y}^{T}(s^{\star}))Z \succeq 0$ for any complex s satisfying $Re(s) \succ 0$ and for any complex vector Z. *Proof:* With the impulse voltages stimulating at ports, and applying Laplace transform to (24), the transfer function can be obtained as $$\mathbf{Y}(s) = \widetilde{\mathbf{B}'}^{T} (\widetilde{\mathbf{G}}' + s \cdot \widetilde{\mathbf{C}}')^{-1} \widetilde{\mathbf{B}}'. \tag{25}$$ Since the system matrices are all real, the first condition is met naturally. To prove that the second condition is also met, we start from $$Z^{\star T} (\mathbf{Y}(s) + \mathbf{Y}^{T}(s^{\star})) Z$$ $$= Z^{\star T} \left[ \widetilde{\mathbf{B}'}^{T} (\widetilde{\mathbf{G}}' + s \cdot \widetilde{\mathbf{C}}')^{-1} \widetilde{\mathbf{B}}' + \widetilde{\mathbf{B}'}^{T} (\widetilde{\mathbf{G}}' + s^{\star} \cdot \widetilde{\mathbf{C}}')^{-T} \widetilde{\mathbf{B}}' \right] Z.$$ (26) By plugging $w=(\widetilde{\mathbf{G}}'+s^\star\cdot\widetilde{\mathbf{C}}')^{-T}\widetilde{\mathbf{B}}'Z$ and $s=j\varpi+\sigma$ into (26), it becomes $$Z^{\star T} (\mathbf{Y}(s) + \mathbf{Y}^{T}(s^{\star})) Z$$ $$= w^{\star T} \left[ \widetilde{\mathbf{G}}' + (\sigma + j\varpi) \widetilde{\mathbf{C}}' + \widetilde{\mathbf{G}'}^{T} + (\sigma - j\varpi) \widetilde{\mathbf{C}}'^{T} \right] w$$ $$= w^{\star T} \left( \widetilde{\mathbf{G}}' + \widetilde{\mathbf{G}'}^{T} \right) w + \sigma w^{\star T} \left( \widetilde{\mathbf{C}}' + \widetilde{\mathbf{C}}'^{T} \right) w. \tag{27}$$ Since $C_i$ and $C_j$ are symmetric, $C_i^T + C_i = 2C_i$ , and $C_j^T + C_j = 2C_j$ . Along with the fact that $C_i$ , and $C_j$ are nonnegative definite, it yields $$\sigma w^{\star T} \left( \widetilde{\mathbf{C}}' + \widetilde{\mathbf{C}}'^T \right) w$$ $$= \sigma w^{\star T} \widetilde{\mathbf{V}}_1^T \mathbf{V}_i^T 2 \mathbf{C}_i \mathbf{V}_i \widetilde{\mathbf{V}}_1 w + \sigma w^{\star T} \widetilde{\mathbf{V}}_2^T \mathbf{V}_j^T 2 \mathbf{C}_j \mathbf{V}_j \widetilde{\mathbf{V}}_2 w \succeq 0$$ (28) for any complex vector Z, and positive $\sigma$ . Since $\mathbf{N}_i$ , and $\mathbf{N}_j$ are symmetric, nonnegative definite matrices, we have $$\begin{split} \boldsymbol{w}^{\star T} \left( \widetilde{\mathbf{G}}' + \widetilde{\mathbf{G}'}^T \right) \boldsymbol{w} &= \boldsymbol{w}^{\star T} \widetilde{\mathbf{V}}_1^T \mathbf{V}_i^T (\mathbf{G}_i^T + \mathbf{G}_i) \mathbf{V}_i \widetilde{\mathbf{V}}_1 \boldsymbol{w} \\ &+ \boldsymbol{w}^{\star T} \widetilde{\mathbf{V}}_2^T \mathbf{V}_j^T \left( \mathbf{G}_j^T + \mathbf{G}_j \right) \mathbf{V}_j \widetilde{\mathbf{V}}_2 \boldsymbol{w} \\ &= \boldsymbol{w}^{\star T} \widetilde{\mathbf{V}}_1^T \mathbf{V}_i^T \begin{bmatrix} 2 \mathbf{N}_i & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \mathbf{V}_i \widetilde{\mathbf{V}}_1 \boldsymbol{w} \\ &+ \boldsymbol{w}^{\star T} \widetilde{\mathbf{V}}_2^T \mathbf{V}_j^T \begin{bmatrix} 2 \mathbf{N}_j & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \mathbf{V}_j \widetilde{\mathbf{V}}_2 \boldsymbol{w} \succeq 0 \end{split}$$ for any complex vector Z. Hence, the second condition is also satisfied. Therefore, the passivity of the higher level order-reduced macromodel is preserved. $\diamondsuit$ # IV. EXPERIMENTAL RESULTS This section demonstrates the speed and accuracy of HiPRIME and IEKS, and compares them with other methods. We use mesh networks to model the power delivery networks, which consist of lumped RC/RLKC segments with many current sources attached inside. The first example is a 5000-node *RLC* circuit, with wire resistance as 0.01 $\Omega$ p.u.l. (per unit length), wire capacitance as 1 pF p.u.l., wire inductance as 10 hH p.u.l., and load capacitance as 40 pF. The bode diagram Fig. 9. Accuracy analysis of RC circuit case: (a) waveform result of HiPRIME, IEKS(flat) and Back Euler (b) error spectrum of HiPRIME and IEKS(flat). Fig. 10. Accuracy analysis of RLC circuit case: (a) waveform result of HiPRIME, IEKS(flat) and Back Euler (b) error spectrum of HiPRIME and IEKS(flat). is in Fig. 8(a). Starting from 1 GHz, EKS shows noticeable difference with the exact value (theoretical calculation) and IEKS results. The results of IEKS matche very well with the EXACT results with regard to both magnitude and phase from low frequency up to over 10 GHz. Fig. 8(b) shows the transient simulation results. The EXACT result is generated by SPICE. For the rest of the examples, each lumped RC/RLKC segment uses $R=0.2\,\Omega$ , $L=1.0\,\mathrm{pH}$ , and $C=0.024\,\mathrm{fF}$ , and HiPRIME TABLE I RUNTIME OF IEKS(FLAT), INDUCTWISE AND SPICE | Circuit | IEKS(flat) | InductWise | Speedup | Spice | Speedup | |---------|------------|------------|---------|---------|---------| | Size | (s) | (s) | (X) | (s) | (X) | | 7861 | 1.46 | 14.76 | 10.1 | 697.13 | 477.48 | | 14081 | 3.88 | 29.77 | 7.67 | 2728.18 | 703.14 | | 43541 | 13.49 | 107.05 | 7.93 | _ | _ | | 89201 | 35.33 | 244.95 | 6.93 | - | - | Fig. 11. Runtime analysis of (a) IEKS(flat), InductWise and Spice (b) IEKS(flat) and InductWise. partitions each original circuit into two blocks. The accuracy of HiPRIME for the *RC* and *RLC* circuits is tested and the results are shown in Figs. 9 and 10, respectively. A grid node is randomly picked and its voltage waveforms of HiPRIME are compared with those of IEKS(flat), and Back Euler. In the *RC* circuit as shown in Fig. 9, the voltage waveforms of HiPRIME, IEKS(flat), and Back Euler are indistinguishable, the error percentage of HiPRIME, and IEKS(flat) for 80% time intervals is within 0.001%, and their maximum error is less than 4.5%. In the *RLC* circuit as shown in Fig. 10, the voltage waveforms of HiPRIME, and IEKS(flat) match the result obtained by Back Euler method very well, and the error percentage for 50% time intervals is within 0.001%. We implement IEKS(flat) in C++ language and test it on a Pentium III 933-MHz machine. The results are compared with Spice, and an efficient time domain solver InductWise [18], [19]. Table I summarizes the runtime results, and the runtime comparisons are shown in Fig. 11. The significant speed improvement, 700 times faster than Spice, is observed and the same tendency that the speed up increases with larger circuit size is shown. The IEKS is also around seven times faster than the InductWise. We also implement HiPRIME, IEKS(flat), and Back Euler in Matlab, and test them on Sun Ultrasparc V. Each circuit is partitioned into two subcircuits in HiPRIME. Tables II and III summarize the runtime results, and the runtime comparisons are shown in Figs. 12 and 13 for the *RC* and *RLC* circuits respectively. From the figures, we can see the tendency that the speed up gets more impressive as the circuit size increases. Finally, we compare the runtime between HiPRIME and IEKS(flat). The program is implemented in C++ language and tested on a Pentium IV 3.0-GHz machine with 3.0 GB of memory. The number of partitions is two in HiPRIME. The result is summarized in Table IV. The HiPRIME is about two TABLE II RUNTIME OF RC CIRCUIT CASE | Circuit | HiPRIME | IEKS(flat) | Speedup | Back Euler | Speedup | |---------|---------|------------|---------|------------|---------| | Size | (s) | (s) | (X) | (s) | (X) | | 203 | 2.52 | 0.89 | 0.36 | 17.1 | 6.79 | | 803 | 2.98 | 1.97 | 0.66 | 78.3 | 26.3 | | 2403 | 4.39 | 6.17 | 1.40 | 288.9 | 65.8 | | 5003 | 7.93 | 13.18 | 1.66 | 760.1 | 95.8 | TABLE III RUNTIME OF *RLC* CIRCUIT CASE | Circuit | HiPRIME | IEKS(flat) | Speedup | Back Euler | Speedup | |---------|---------|------------|---------|------------|---------| | Size | (s) | (s) | (X) | (s) | (X) | | 443 | 2.76 | 1.29 | 0.47 | 29.5 | 10.7 | | 1883 | 3.87 | 5.08 | 1.31 | 129.8 | 33.5 | | 3843 | 6.72 | 12.94 | 1.92 | 276.9 | 41.2 | | 5803 | 11.81 | 27.15 | 2.29 | 427.2 | 36.6 | times faster than the IEKS(flat), and the speed up gets more impressive as the circuit size increases. #### V. CONCLUSION A novel hierarchical power delivery analysis methodology is presented. This methodology integrates the multiple-port Norton equivalent theorem with the model order reduction algorithm to generate compact models from the original circuit. Experimental results show that the simulation is accurate and fast. The procedure of generating compact models involves an improved IEKS method which it no longer needs to perform moment shifting for the source waveform modeling. To further reduce runtime, a multiple-level passive model reduction algorithm is developed and its passivity has been proved. It has been known that the runtime of PRIMA is proportional to the number of ports. Although we use PRIMA to generate the Fig. 12. Runtime analysis of RC circuit case: (a) runtime of HiPRIME, IEKS(flat) and Back Euler (b) runtime of HiPRIME and IEKS(flat). Fig. 13. Runtime analysis of RLC circuit case: (a) runtime of HiPRIME, IEKS(flat) and Back Euler (b) runtime of HiPRIME and IEKS(flat). $\label{eq:table_iv} \textbf{TABLE IV} \\ \textbf{RUNTIME COMPARISON BETWEEN HIPRIME AND IKES(FLAT)}$ | Circuit Size | HiPRIME (s) | IEKS(flat) (s) | Speedup (X) | |--------------|-------------|----------------|-------------| | 12300 | 0.359 | 0.735 | 2.047 | | 49600 | 1.406 | 3.015 | 2.144 | | 199200 | 7.157 | 15.890 | 2.220 | | 448800 | 18.719 | 43.094 | 2.569 | reduced system in the simulation, we can also utilize IEKS to generate both the reduced system and the Norton equivalent cur- rent sources. We plan to investigate the realizable model order reduction algorithms whose runtime are port size independent. we would also like to point out that the focus of this paper is not on the partition algorithms. However, a good partition algorithm is important for the performance of hierarchical and passive model order reduction algorithms. #### REFERENCES [1] H. H. Chen and D. D. Ling, "Power supply noise analysis methodology for deep-submicron VLSI chip design," in *Proc. Design Automation Conf.*, CA, Jun. 1997, pp. 638–643. - [2] M. K. Gowan, L. L. Biro, and D. B. Jackson, "Power considerations in the design of the Alpha 21 264 microprocessor," in *Proc. Design Au*tomation Conf., CA, Jun. 1998, pp. 726–731. - [3] A. Dharchoudhury, R. Panda, D. Blaauw, and R. Vaidyanathan, "Design and analysis of power distribution networks in PowerPC microprocessors," in *Proc. Design Automation Conf.*, CA, Jun. 1998, pp. 738–743. - [4] Y.-M. Jiang and K.-T. Cheng, "Analysis of performance impact caused by power supply noise in deep submicron devices," in *Proc. Design Au*tomation Conf., LA, Jun. 1999, pp. 760–765. - [5] J. M. Wang and T. V. Nguyen, "Extended Krylov method for reduced order analysis of linear circuits with multiple sources," in *Proc. Design Automation Conf.*, CA, Jun. 2000, pp. 247–252. - [6] M. Zhao, R. V. Panda, S. S. Sapatnekar, T. Edwards, R. Chaudhry, and D. Blaauw, "Hierarchical analysis of power distribution networks," in *Proc. Design Automation Conf.*, CA, Jun. 2000, pp. 150–155. - [7] S. R. Nassif and J. N. Kozhaya, "Fast power grid simulation," in *Proc. Design Automation Conf.*, CA, Jun. 2000, pp. 156–161. - [8] A. Krstic and K. Cheng, "Vector generation for maximum instantaneous current through supply lines for CMOS circuits," in *Proc. Design Au*tomation Conf., CA, Jun. 1997, pp. 383–388. - [9] S. Bobba and I. N. Hajj, "Estimation of maximum current envelope for power bus analysis and design," in *Proc. Int. Symp. Phys. Design*, CA, Apr. 1998, pp. 141–146. - Apr. 1998, pp. 141–146. [10] L. W. Nagel, "SPICE2: A Computer Program to Simulate Semiconductor Circuits," Univ. California, Berkeley, ERL Memo ERL-M520, 1975 - [11] L. T. Pillage, R. A. Rohrer, and C. Visweswariah, Electronic Circuit and System Simulation Methods. New York: McGraw-Hill, 1995. - [12] M. Celik, L. T. Pillage, and A. Odabasioglu, IC Interconnect Analysis. Norwell, MA: Kluwer, 2002. - [13] L. Pillage and R. A. Rohrer, "Asymptotic waveform evaluation for timing analysis," *IEEE Trans. Computer-Aided Design Integr. Circuits* Syst., vol. 9, no. 4, pp. 352–366, Apr. 1990. - [14] P. Feldman and R. W. Freund, "Efficient linear circuit analysis by Pade approximation via the Lanczos process," *IEEE Trans. Computer-Aided Design Integr. Circuits Syst.*, vol. 14, no. 5, pp. 639–649, May 1995. - [15] Q. J. Yu and E. S. Kuh, "Exact moment model of transmission lines and application to interconnect delay estimation," *IEEE Trans. VLSI Syst.*, vol. 3, no. 3, pp. 311–322, Jun. 1995. - [16] A. Odabasioglu, M. Celik, and L. T. Pillage, "PRIMA: Passive reducedorder interconnect macromodeling algorithm," in *Proc. Int. Conf. Com*puter-Aided Design, CA, Nov. 1997, pp. 58–65. - [17] S. P. Chan, S. Y. Chan, and S. G. Chan, Analysis of Linear Networks and Systems: A Matrix-Oriented Approach with Computer Applications. Reading, MA: Addison-Wesley, 1970. - [18] T.-H. Chen and C. C.-P. Chen, "Efficient large-scale power grid analysis based on preconditioned Krylov-subspace iterative methods," in *Proc. Design Automation Conf.*, NV, Jun. 2001, pp. 559–562. - [19] T.-H. Chen, C. Luk, H. Kim, and C. C.-P. Chen, "INDUCTWISE: Inductance-wise interconnect simulator and extractor," in *Proc. Int. Conf. Computer-Aided Design*, CA, Nov. 2002, pp. 215–220. **Yu-Min Lee** (M'03) received the B.S and M.S. degrees in communication engineering from the National Chiao-Tung University, Taipei, Taiwan, R.O.C., in 1991 and 1993, respectively, and the Ph.D degree in electrical and computer engineering from the University of Wisconsin, Madison, in 2003. Since 2003, he has been an Assistant Professor in the Department of Communication Engineering, National Chiao-Tung University. His current research interests include electrical design automation, model order reduction, and signal integrity analysis. Yahong Cao received the B.S. degree in electrical engineering from the Beijing University of Posts and Telecommunications, Beijing, China, in 1999 and the M.S. degree in electrical and computer engineering from the University of Wisconsin, Madison, in 2002. She is currently with Cadence Design Systems, San Jose, CA. Her research interests include circuit simulation and power grid analysis. **Tsung-Hao Chen** received the B.S. degree in electrical engineering from the National Taiwan University, Taipei, Taiwan, R.O.C., in 1996 and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Wisconsin, Madison, in 2002 and 2004, respectively. During the summer of 2001, he was a Circuit Researcher with the Circuit Research Lab, Intel Corporation. Since 2004, he has been a Senior Research and Development Engineer with the Implementation Group, Synopsys Inc. His research interests are in the area of computer-aided design, which includes numerical aspects of circuit simulation, signal integrity analysis, parasitic extraction, and circuit modeling. Janet Meiling Wang (M'00) received the B.S. degree in computer science from the Nanjing University of Science and Technology (NUST), Nanjing, China, in 1991, the M.E. degree in computer engineering from the Chinese Academy of Sciences, Beijing, in 1994, and the M.S. and Ph.D. degrees in engineering and computer science from the University of California, Berkeley, in 1997 and 2000, respectively. She was with AT&T Bell Laboratories, Murray Hill, NJ, in the summers of 1997 and 1998. She also spent the summer of 1999 with IBM Austin Research Lab, Austin, TX. From 2000 to 2002, she was with the Intel Corporation, Santa Clara, CA, and the Cadence Corporation, respectively. In 2003, she joined the faculty at the University of Arizona, Tucson, where she is currently an Assistant Professor in the Electrical and Computer Engineering Department. Her current research interests include nanotechnology verification, design for manufacturability, process variation , and design verification Charlie Chung-Ping Chen (M'98) received the B.S. degree in computer science and information engineering from the National Chiao-Tung University, Hsinchu, Taiwan, R.O.C., in 1990 and the M.S. and Ph.D. degrees in computer science from the University of Texas, Austin, in 1996 and 1998, respectively. From 1996 to 1999, he was a Senior CAD Engineer with the Strategic CAD Labs, Intel Corporation. Since 1999, he has been an Assistant Professor in the Electrical and Computer Engineering Department, University of Wisconsin, Madison. Since 2003, he has been an Associate Professor in the Electrical Engineering Department, National Taiwan University, Taipei, Taiwan. His research interests are in the areas of computer-aided design and microprocessor circuit design with an emphasis on interconnect and circuit optimization, circuit simulation, and signal/power/thermal integrity analysis and optimization. Prof. Chen received the D2000 Award from the Intel Corporation in 1999 and the National Sciences Foundation Faculty Early Career Development Award (CAREER) in 2001. He also received the 2002 Sigda/ACM Outstanding Young Faculty Award, the 2002 Peter Schneider Faculty Development Award, and the 2003 International Physical Design Symposium (ISPD) Best Paper Award. He served as the Taiwan representative for the 2003–2004 International Technology Roadmap for Semiconductors Meeting. He served Technical Program Committee and Organizer positions for DAC, ICCAD, DATE, ISPD, ASPDAC, ISQED, VLSI/CAD Symposium, and SASIMI conferences.