# National Chiao-Tung University

Department of Electronics Engineering and Institute of Electronics

Doctoral Dissertation

Low Power and High Speed SRAM with Current-Mode Techniques

Student: Shang-Ming Wang

Advisor: Dr. Ching-Yuan Wu

June 2004

# Low Power and High Speed SRAM with Current-Mode Techniques

Student: Shang-Ming Wang

Advisor: Dr. Ching-Yuan Wu



#### A Thesis

Submitted to Institute of Electronics

College of Electrical Engineering and Computer Science

National Chiao-Tung University

in Partial Fulfillment of the Requirements

for the Degree of

Doctor of Philosophy

in

Electronic Engineering

June 2004
Hsinchu, Taiwan, Republic of China


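# Low Power and High Speed SRAM with Current-Mode Techniques

Student: Shang-Ming Wang Advisor: Ching-Yuan Wu

#### Institute of Electronics, National Chiao-Tung University

#### Abstract (Chinese)

This thesis addresses the design and analysis of low-power static random access memories (SRAMs). The SRAM access path can be divided into three parts: the write path, from the address input to the column bit lines; the read path, from the column bit lines to the data output; and the memory cell.

The Lambda bipolar transistor (LBT) is a voltage-controlled negative-differential-resistance transistor formed by merging a MOSFET with its parasitic bipolar transistor, and it can be used as a memory element. This thesis proposes a new LBT structure and examines its operating principle using a simple circuit model and device physics. A new single-sided read/write memory cell is designed using the proposed LBT.

The design of a low-power, high-performance SRAM usually focuses on reducing the power consumed during operation and the DC current and leakage current in the standby state. To reduce the power consumed by read and write operations, we propose current-mode read/write mechanisms to replace the conventional voltage mode. This thesis proposes a current-mode sense amplifier that reads successfully with only a small change in the bit-line voltage and that also reduces noise. In addition, a current-mode write driver is proposed that requires only a small change in the bit-line voltage during writing, which both lowers power consumption and speeds up the write operation. With the current-mode read/write techniques, the read speed and the write pulse width are almost independent of the capacitive loads of the bit lines and data lines. Based on this current mode, a memory cell capable of high-speed, low-power operation is proposed. The access transistors and inverter transistors of this cell are almost the same size, and the cell can be driven by a small bit-line voltage difference.

To evaluate the current-mode techniques, a 32Kx8 SRAM was fabricated in a 0.35 µm single-poly, double-metal process. At a 3 V supply, the memory has an access time of 9 ns, and its dynamic current is 28 mA when operating at 100 MHz.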



# Low Power and High Speed SRAM with Current-Mode Techniques

Student: Shang-Ming Wang Advisor: Dr. Ching-Yuan Wu

Institute of Electronics
National Chiao-Tung University

#### **Abstract**

This thesis explores the design and analysis of static random access memories (SRAMs), with a focus on low-power operation. The SRAM access path is split into three portions: from address input to word-line rise (the write path), from word-line rise to data output (the read path), and the memory cell. Techniques to optimize each of these portions are investigated.

The Lambda bipolar transistor (LBT), a voltage-controlled negative-differential-resistance device formed by merging an n-channel MOSFET with a parasitic NPN bipolar transistor, is known for its memory applications. In this thesis, a new LBT structure is developed and its characteristics are derived from a simple circuit model and device physics. A novel single-sided memory cell based on the proposed LBTs is presented.

High-performance, low-power SRAM design focuses on reducing dynamic power dissipation in the operating state and on decreasing DC current and leakage current in the standby state. To reduce operating power without decreasing read/write speed, we propose current-mode read/write mechanisms in place of conventional voltage-mode circuits. A new current-mode sense amplifier is proposed that senses the bit-line signal even when the bit-line voltage swing is small, and its non-floating design reduces the noise produced during sensing in the standby mode. The current-mode write driver reduces the bit-line swing when data are written, which both decreases power consumption and shortens the write access time. With the new current-mode techniques for read and write operations, the sensing speed and the write pulse width are insensitive to the bit-line and data-line capacitances, and a separated positive-feedback technique enables the circuit to operate at high speed and low power. These techniques keep the voltage swings of the bit-lines and data-lines small. Based on current-mode operation, a memory cell that operates in a low-power current mode is developed. The memory cell has almost equally sized access and inverter transistors and can be toggled using a small differential bit-line voltage.

The presented techniques were evaluated on an experimental 32Kx8 SRAM chip fabricated in a 0.35 µm 1P2M CMOS process. The chip achieves a 9 ns access time at a supply voltage of 3 V, and its active current is 28 mA at 100 MHz and 25°C.

#### Acknowledgements

I thank my advisor Prof. Ching-Yuan Wu for his invaluable guidance throughout the course of this thesis. His insights and wisdom have been a great source of inspiration for this work. He not only encouraged me to pursue this degree but also provided me with the best research environment and generous support. Special thanks are given to Silicon-Based Technology Corp. for financial support, especially for chip implementation.

Thanks to my many friends on campus and outside, I have had many memorable moments outside my work. I would especially like to thank Prof. Bill Tai and the members of Silicon-Based Technology Corp.

Furthermore, I would like to thank the members of the Advanced Semiconductor Device Lab. for their valuable suggestions, interesting discussions, and their friendship.

I am grateful to my sisters and brother for being such a wonderful family. I thank my wife, Hsiao-Mei, for her love and patience during the final stages of my dissertation. Last but not least, I am eternally indebted to my parents for their love, encouragement and support. This dissertation is dedicated to them.

## **Contents**

- Abstract (Chinese)
- Abstract (English)
- Acknowledgments
- Contents
- Figure Captions
- Table Captions
- Chapter 1 Introduction
- Chapter 2 Overview of CMOS SRAM
  - 2.1 SRAM Partitioning
  - 2.2 Circuit Techniques in SRAMs
- Chapter 3 Lambda Bipolar Transistor Memory Cell
  - 3.1 New Lambda Bipolar Transistor
  - 3.2 Description of New Memory Cell
  - 3.3 Performance of New Memory Cell
    - 3.3.1 Write Operation
    - 3.3.2 Read Operation
    - 3.3.3 Comparisons
- Chapter 4 New Current-Mode Sense Amplifier
  - 4.1 Introduction
  - 4.2 Voltage Sensing and Current Sensing
    - 4.2.1 Theoretical Model
    - 4.2.2 Voltage-Mode and Current-Mode Signal Delay
  - 4.3 Voltage-Mode Sense Amplifier
  - 4.4 Clamped Bit-Line Sense Amplifier
  - 4.5 New Current-Mode Sense Amplifier
    - 4.5.1 Circuit Description and Operation
    - 4.5.2 Simulation Results
- Chapter 5 New Current-Mode Write Driver
  - 5.1 Conventional Voltage Writing Mechanism
  - 5.2 Current Writing with Equalization Transistor
  - 5.3 New Current Writing Mechanism
    - 5.3.1 Current-Mode Write Driver
    - 5.3.2 New Memory Cell for Current-Mode Operation
    - 5.3.3 Simulation Results and Comparisons
- Chapter 6 Low Power and High Speed SRAM
  - 6.1 Low Power SRAM Architecture
  - 6.2 Cell Design and Layout
  - 6.3 Process Variation Effects on Current-Mode Circuit
  - 6.4 Experimental Results
- Chapter 7 Conclusions
- References



## **List of Figures**

- Fig. 1 Elementary SRAM structure
- Fig. 2 Divided Word Line (DWL) Architecture
- Fig. 3 Schematic of a two-level 8 to 256 decoder
- Fig. 4 a) Conventional static NAND gate b) Nakamura's NAND gate [35]
- Fig. 5 Skewed NAND gate
- Fig. 6 Bit-line mux hierarchies in a 512 row block
- Fig. 7 Two common types of sense amplifiers
- Fig. 8 A vertical Lambda bipolar transistor structure
- Fig. 9 An equivalent circuit of a vertical Lambda bipolar transistor
- Fig. 10 The I-V characteristics of a vertical Lambda bipolar transistor
- Fig. 11 General configuration of a new memory cell
- Fig. 12 The I-V characteristics of a new memory cell with current and resistive load
- Fig. 13 A new SRAM memory cell circuit
- Fig. 14 The static transfer characteristics of the memory cells
- Fig. 15 Write "0" operation
- Fig. 16 Write "1" operation
- Fig. 17 Read "0" operation
- Fig. 18 Read "1" operation
- Fig. 19 Sensing delay versus bit-line capacitance
- Fig. 20 Typical use of a sense amplifier
- Fig. 21 Theoretical voltage-mode signal model
- Fig. 22 Theoretical current-mode signal model
- Fig. 23 CMOS representation for a voltage-mode signal model
- Fig. 24 CMOS representation for a current-mode signal model
- Fig. 25 A long interconnect model
- Fig. 26 Comparison of voltage sensing and current sensing
- Fig. 27 Comparison of voltage sensing and current sensing with different values of load resistance
- Fig. 28 Comparison of voltage-sensing and current-sensing with approximations
- Fig. 29 Simple differential couple schematic
- Fig. 30 Full complementary positive feedback amplifier schematic
- Fig. 31 Clamped bit-line sense amplifier
- Fig. 32 A current-mode sense amplifier and a simplified data path circuit
- Fig. 33 Simulated current waveforms of the new current-sensing data path circuit
- Fig. 34 Simulated waveforms of the new current-sensing data path circuit
- Fig. 35 Sensing delay and average power dissipation versus bit-lines capacitance
- Fig. 36 Sensing delay and average power dissipation versus data-lines capacitance
- Fig. 37 Bit-line model during write access cycle
- Fig. 38 7T-memory cell
- Fig. 39 A current-mode write driver and a simplified data path circuit
- Fig. 40 Schematic of the memory cell
- Fig. 41 The static transfer characteristics of the memory cell
- Fig. 42 Simulated waveforms of the new current-writing data path circuit
- Fig. 43 Write pulse width and average power dissipation versus data-lines capacitance
- Fig. 44 Architecture of low power memory chip
- Fig. 45 The layout of memory cell
- Fig. 46 Layout placement of same-size transistor
- Fig. 47 Sensing delay and average power dissipation with process variations versus bit-lines capacitance
- Fig. 48 Write pulse width and average power dissipation with process variations versus data-lines capacitance
- Fig. 49 A photomicrograph of 32Kx8 SRAM
- Fig. 50 Typical address and output waveforms
- Fig. 51 Shmoo plot of address time versus power supply voltage



## **List of Tables**

Table 1. Comparison to conventional SRAM cell

Table 2. Process and SRAM characteristics



## **Chapter 1**

#### Introduction

High-speed and low-power SRAMs have become a critical component of many VLSI chips. This is especially true for microprocessors, in which on-chip cache sizes grow with each generation to bridge the increasing divergence between the speeds of the processor and the main memory [1-2]. At the same time, power dissipation has become an important consideration, both because of increased integration and operating speeds and because of the explosive growth of battery-operated appliances [3]. This thesis explores the design of SRAMs and focuses on reducing the operating power. While process scaling [4-5] remains the biggest driver of low-power design, this thesis investigates circuit techniques that can be used in conjunction with scaling to achieve low-power operation.

Conceptually, an SRAM has the structure shown in Fig.1. It consists of a matrix of 2<sup>m</sup> rows by 2<sup>n</sup> columns of memory cells. Each memory cell contains a pair of cross-coupled inverters, which form a bi-stable element. These inverters are connected to a pair of bit-lines through NMOS pass transistors, which provide differential read and write access. An SRAM also contains column and row circuitry to access these cells. The m+n bits of address input, which identify the cell to be accessed, are split into m row address bits and n column address bits. The row decoder activates one of the 2<sup>m</sup> word lines, which connects the memory cells of that row to their respective bit-lines. The column decoder sets a pair of column switches, which connects one of the 2<sup>n</sup> bit-line pairs to the peripheral circuits.
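The address split described above can be sketched in a few lines of Python. This is a minimal illustration, not a circuit from the thesis: the function names `split_address` and `one_hot` are hypothetical, and placing the row bits in the upper part of the address is an assumption made only for this example.

```python
from typing import List, Tuple


def split_address(address: int, m: int, n: int) -> Tuple[int, int]:
    """Split an (m+n)-bit address into (row, column) indices.

    Assumption for illustration: the m row bits are the most significant
    bits and the n column bits are the least significant bits.
    """
    if not 0 <= address < (1 << (m + n)):
        raise ValueError("address does not fit in m+n bits")
    row = address >> n                   # selects one of 2**m word lines
    column = address & ((1 << n) - 1)    # selects one of 2**n bit-line pairs
    return row, column


def one_hot(index: int, width: int) -> List[int]:
    """Return a list with a single 1 at `index`, like a decoder output."""
    return [1 if i == index else 0 for i in range(width)]


row, column = split_address(0b101101, m=4, n=2)
word_lines = one_hot(row, 1 << 4)         # exactly one word line is active
column_selects = one_hot(column, 1 << 2)  # exactly one column switch pair
```

With m = 4 and n = 2, the 6-bit address `0b101101` selects word line 11 and column 1.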



Fig.1 Elementary SRAM structure

In a read operation, the bit-lines start precharged to a reference voltage, usually close to the positive supply. When the word line goes high, the access NMOS connected to the cell node storing a '0' starts discharging its bit-line, while the complementary bit-line remains in its precharged state, so that a differential voltage develops across the bit-line pair. Each SRAM cell is optimized for minimum area, and hence its cell current is very small, resulting in a slow bit-line discharge rate. To speed up the RAM access, sense amplifiers are used to amplify the small bit-line signal and eventually drive the signal to the external world.

During a write operation, the write data is transferred to the desired columns by driving the data onto the bit-line pairs, grounding either the bit-line or its complement. If the cell data differs from the write data, the '1' node of the cell is discharged when the access NMOS connects it to the discharged bit-line, causing the cell to be written with the bit-line value.

The next chapter introduces the various techniques used in practical SRAMs. For the purpose of design and optimization, the access path can be divided into two portions: the read path, from the memory cell ports to the SRAM I/O ports, and the write path, from the I/O ports to the memory cell.

Most SRAM cell designs use the basic flip-flop circuit structure. However, a full CMOS cell usually occupies twice the area of high-resistance poly load and poly-PMOS TFT load cells. On the other hand, a high-resistance poly load cell consumes relatively high standby power. Therefore, several earlier works [6-11] on single-sided memory cells were conducted to reduce both power and area. In this thesis, we propose new single-sided memory cells based on a new Lambda bipolar transistor (LBT). Chapter 3 reports the new LBT, which is developed from the original LBT structure with a modification for low power. The operating principle of the device is derived from a simple circuit model and device physics. Chapter 3 also presents the new single-sided memory cells based on our LBT and compares the reported memory cells with the single-sided CMOS cell.

For many years, the design of SRAM circuits has focused on improving the operating speed. The capacity of SRAM quadruples every three years, and various voltage-mode sense amplifiers have been used in many generations of SRAM. As the bit-line and data-line capacitances grow with each generation, the memory access time with a voltage-mode sense amplifier becomes quite long. Meanwhile, the power supply voltage must be reduced in future VLSI designs for the sake of power reduction and device reliability. To overcome these problems, several papers [12-17] have proposed current-mode sense amplifiers. However, these designs do not eliminate the DC current through the sense amplifier. To address this DC power consumption, a new current-mode sense amplifier is proposed. The new structure not only reduces the DC power consumption of the sense amplifier but also senses the bit-line differential signal in a very short time. The new n-type separated flip-flop current-mode sense amplifier is described in Chapter 4.

Chapter 5 discusses the new current-mode write driver circuit. Writing data into the memory accounts for a large percentage of the whole-chip power consumption during the write access cycle. In the past, voltage-mode write circuits were used. With this mechanism, the bit-line voltage swing is almost the full supply voltage, and the dynamic power consumed at the bit-line grows with this swing. The large voltage swing not only consumes considerable power when writing data but also increases the memory cycle time. The cycle time is long because the bit-line must be pulled back up to the supply voltage after a write operation in preparation for the next read or write operation. The operating speed of an SRAM is therefore determined not only by the access time but also by the cycle time. Some design concepts [18] using current-mode techniques in the write operation have been proposed to reduce the large bit-line voltage swing. However, these methods increase the number of transistors in the memory cell, making the memory larger. Moreover, the control-signal decoder and the timing control become more complicated. The new current writing mechanism is proposed to reduce the large bit-line voltage swing without increasing the number of transistors in the memory cell. The decoder architecture and timing control signals are as simple as in the conventional technique.

In Chapter 6, a 32Kx8 SRAM chip is implemented. At the architecture level, the key goals are to localize the operating signals so as to reduce the active capacitance and switching, to reduce signal swings, and to eliminate any DC power consumption in the system. Finally, Chapter 7 summarizes the main conclusions of this thesis.

### **Chapter 2**

#### **Overview of CMOS SRAM**

The delay and power of practical SRAMs have been reduced over the years via innovations in the array organization and circuit design. This chapter discusses both these topics and highlights the issues addressed by this thesis. We will first explore the various partitioning strategies in Section 2.1 and then point out the main circuit techniques which have been presented in the literature to improve speed and power in Section 2.2.

### 2.1 SRAM Partitioning

For large SRAMs, significant improvements in delay and power can be achieved by partitioning the cell array into smaller subarrays, rather than having a single monolithic array as shown in Fig.1. Typically, a large array is partitioned into a number of identically sized subarrays (commonly referred to as macros), each of which stores a part of the accessed word, called the subword, and all of which are activated simultaneously to access the complete word [19-21]. The macros can be thought of as independent RAMs, except that they might share parts of the decoder.

Each macro conceptually looks like the basic structure shown in Fig.1. During an access to a certain row, the word line activates all the cells in that row and the desired subword is accessed via the column multiplexers. This arrangement has two drawbacks for macros that have a very large number of columns: the word line RC delay grows as the square of the number of cells in the row, and the bit-line power grows linearly with the number of columns. Both drawbacks can be overcome by further subdividing the macros into smaller blocks of cells using the Divided Word Line (DWL) technique, first proposed in [22]. In the DWL technique, the long word line of a conventional array is broken up into k sections, each of which is activated independently, thus reducing the word line length by a factor of k and hence its RC delay by a factor of $k^2$. Fig.2 shows the DWL architecture, where a macro of 256 columns is partitioned into 4 blocks of only 64 columns each. The row selection is now done in two stages: first a global word line is activated, which is then transmitted into the desired block by a block select signal to activate the desired local word line. Since the local word line is shorter (only 64 columns wide), it has a lower RC delay. Though the global word line is nearly as long as the width of the macro, it has a lower RC delay than a full-length word line since its capacitive loading is smaller. It sees only the input loading of the four word-line drivers instead of the loading of all 256 cells. In addition, its resistance can be lowered by using wider wires on a higher-level metal layer. The word line RC delay is reduced by another factor of four by placing the word-line drivers in the center of the word line segments, thus halving the length of each segment. Since only 64 cells within the block are activated, as opposed to all 256 cells in the undivided array, the column current is also reduced by a factor of 4. The concept of dividing the word line can be applied recursively to the global word line (and the block select line) for larger RAMs, which is called the Hierarchical Word Decoding (HWD) technique [23]. Partitioning can also be done to reduce the bit-line height.

Fig.2 Divided Word Line (DWL) Architecture
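The scaling arguments for the DWL technique can be checked numerically. The sketch below assumes only that the word-line RC delay scales with the square of the driven wire length, as stated in the text; the function names are hypothetical and the model ignores driver and gate delays.

```python
def relative_wordline_delay(k: int, centered_driver: bool = False) -> float:
    """Word-line RC delay relative to an undivided, end-driven word line.

    Assumption: delay is proportional to the square of the driven length.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    driven_length = 1.0 / k          # each section is 1/k of the full line
    if centered_driver:
        driven_length /= 2.0         # a centered driver sees half a section
    return driven_length ** 2


def active_columns(total_columns: int, k: int) -> int:
    """Number of columns whose cells are activated in one block."""
    return total_columns // k


# The 256-column macro of Fig.2 split into k = 4 blocks:
delay_divided = relative_wordline_delay(4)          # 1/16 of the original
delay_centered = relative_wordline_delay(4, True)   # a further factor of 4
columns_on = active_columns(256, 4)                 # 64 columns draw current
```

For k = 4 the local word-line delay falls to 1/16 of the undivided value, and to 1/64 with centered drivers, while only 64 of the 256 columns are activated.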

Partitioning of the RAM incurs area overhead at the boundaries of the partitions. For example, a partition which dissects the word lines requires the use of word-line drivers at the boundary. Since the RAM area determines the lengths of the global wires in the decoder and the data path, it directly influences their delay and energy.

## 2.2 Circuit Techniques in SRAMs

The SRAM access path can be broken down into two components: the decoder and the data path. The decoder encompasses the circuits from the address input to the word line. The data path encompasses the circuits from the cells to the I/O ports.

The logical function of the decoder is equivalent to 2<sup>n</sup> n-input AND gates, where the large fan-in AND operation is implemented in a hierarchical structure. The schematic of a two-level 8 to 256 decoder is shown in Fig.3. The first level is the predecoder, where two groups of four address inputs and their complements ($A_0, \overline{A_0}, A_1, \overline{A_1}, \ldots$) are decoded to activate one of the 16 predecoder output wires in each group, forming the partially decoded products ($A_0A_1A_2A_3, \overline{A_0}A_1A_2A_3, \ldots$). The predecoder outputs are combined at the next level to activate the word line. The decoder delay consists of the gate delay in the critical path and the interconnect delay of the predecoder and word line wires. As the wire RC delay grows as the square of the wire length, the wire delay within the decoder structure, especially that of the word line, becomes significant in large SRAMs. Sizing the gates in the decoder allows a trade-off between delay and power. Transistor sizing has been studied by a number of researchers for both high speed [24-26] and low power [27-28]. The decoder sizing problem is complicated slightly by the presence of intermediate interconnect from the predecoder wires.

Fig.3 Schematic of a two-level 8 to 256 decoder
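The logical behavior of the two-level 8 to 256 decoder can be modeled as follows. This is a behavioral sketch only: the function names are hypothetical, the grouping of the lower and upper four address bits is an assumption, and nothing here reflects gate sizing or timing.

```python
from typing import List


def predecode(group_bits: int) -> List[int]:
    """4-to-16 predecoder: assert exactly one of 16 output wires."""
    if not 0 <= group_bits < 16:
        raise ValueError("a predecoder group takes a 4-bit value")
    return [1 if i == group_bits else 0 for i in range(16)]


def decode_8_to_256(address: int) -> List[int]:
    """Two-level decoder: predecode two 4-bit groups, then AND the wires.

    Assumption: bits 0-3 form one predecoder group and bits 4-7 the other.
    """
    if not 0 <= address < 256:
        raise ValueError("address must fit in 8 bits")
    low = predecode(address & 0xF)
    high = predecode(address >> 4)
    # Second level: each word line is the AND of one wire from each group.
    return [high[w >> 4] & low[w & 0xF] for w in range(256)]


word_lines = decode_8_to_256(0xA7)
```

Each of the 256 second-level gates needs only two inputs, which is the point of the hierarchy: the 8-input AND function is never built as a single large fan-in gate.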



Fig.4 a) Conventional static NAND gate b) Nakamura's NAND gate [35]

The decoder delay can be greatly improved by optimizing the circuit style used to construct the decoder gates. Older designs implemented the decoder logic function in a simple combinational style using static CMOS circuits (Fig.4a) [29-31]. In such a design, one of the 2<sup>m</sup> word lines is active at any time. If the new row address in an access differs from the previous one, the old word line is deasserted and the new word line is asserted. Thus, the decoder gate delay in such a design is the maximum of the delay to deassert the old word line and the delay to assert the new word line, and it is minimized when each gate in the decode path is designed to have equal rising and falling delays. The decoder gate delay can be significantly reduced by using pulsed circuit techniques [32-34], where the word line is not a combinational signal but a pulse that stays active for a certain minimum duration and then shuts off. Thus, before any access all the word lines are off, and the decoder only needs to activate the word line for the new row address. Since only one kind of transition needs to propagate through the decoder logic chain, the transistor sizes in the gates can be skewed to speed up this transition and minimize the decoder delay. Fig.4b shows an instance of this technique [35], where the PMOS transistors in the NAND gates are sized to be half of those in a regular NAND structure. In the pulsed design, the PMOS sizes can be reduced by a factor of two and still give the same rising delay, since it is guaranteed that both inputs will deassert; this reduces the loading of the previous stage and hence the overall decoder delay. This concept is extended further in [32], where the deassertion of the gate is completely decoupled from its assertion. Fig.5 shows an example of such a gate, where the transistor sizes in the logic chain are skewed heavily to speed up the output assertion once the inputs are activated. The gate is then reset by some additional devices and made ready for the next access. By decoupling the assert and deassert paths, the former can be optimized to reduce the decoder delay.



Fig.5 Skewed NAND gate

The SRAM data path logically implements a multiplexer for reads (and a demultiplexer for writes). In the simplest implementation, the multiplexer has only two levels: at the lowest level, the memory cells in a column are all connected together to a bit line, and at the next level, a small number of these bit lines are multiplexed together through column pass transistors (Fig.1). When the bit-line height is very large, it can be further partitioned to form multi-level bit-line hierarchies by using additional layers of metal [36]. In general, the multiplexer hierarchy can be constructed in a large number of ways (2<sup>r-1</sup>\*2<sup>c</sup> mux designs are possible for a 2<sup>r</sup> \* 2<sup>c+k</sup> block with 2<sup>r</sup> rows, 2<sup>c</sup> columns and an access width of 2<sup>k</sup> bits). Fig.6 shows two possible designs for a block with 512 rows. The schematic shows only the NMOS pass gates for a single-ended bit line to reduce clutter in the figure, while the real multiplexer would use CMOS pass gates for differential bit-lines to allow for reads and writes. Fig.6a shows the single-level mux design, where two adjacent columns with 512 cells are multiplexed into a single sense amplifier. Fig.6b shows a two-level structure in which the first level multiplexes two 256-cell-high columns, the outputs of which are multiplexed in the second level to form the global bit lines feeding the sense amplifiers. Similarly, hierarchical muxing can also be done in the I/O lines, which connect the outputs of all the sense amplifiers to the I/O ports [37].

Due to its small size, a memory cell is very weak and limits the bit-line slew rate during reads. Hence, sense amplifiers are used to amplify the bit-line signal so that signals as small as 100mV can be detected. In a conventional design, even after the sense amplifier senses the bit lines, they continue to slew and eventually develop a large voltage differential. This wastes significant power since the bit lines have a large capacitance. By limiting the word-line pulse width, we can control the amount of charge pulled off the bit lines and hence limit power dissipation [38-41]. In this thesis, we use a scheme that controls the word-line pulse width to be just wide enough, over a wide range of operating conditions, for the sense amplifiers to sense reliably, and that prevents the bit lines from slewing further.
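A rough estimate shows why limiting the bit-line swing matters. The sketch below uses the standard relations that restoring a capacitance C by a swing ΔV from a supply Vdd draws energy C·Vdd·ΔV, and that a constant cell current I develops the swing in time C·ΔV/I. The 3 V supply and 100 mV signal come from the text; the 1 pF bit-line capacitance and 100 µA cell current are purely illustrative assumptions, and the function names are hypothetical.

```python
def bitline_restore_energy(c_bitline: float, vdd: float, swing: float) -> float:
    """Energy (J) drawn from the supply to restore a discharged bit line."""
    return c_bitline * vdd * swing


def bitline_discharge_time(c_bitline: float, swing: float, i_cell: float) -> float:
    """Time (s) for a constant cell current to develop the given swing."""
    return c_bitline * swing / i_cell


C_BITLINE = 1e-12   # 1 pF, assumed for illustration only
VDD = 3.0           # supply voltage used for the experimental chip
I_CELL = 100e-6     # 100 uA cell current, assumed for illustration only

full_swing_energy = bitline_restore_energy(C_BITLINE, VDD, VDD)
small_swing_energy = bitline_restore_energy(C_BITLINE, VDD, 0.1)
saving_factor = full_swing_energy / small_swing_energy
time_to_100mv = bitline_discharge_time(C_BITLINE, 0.1, I_CELL)
```

Under these assumptions, stopping the bit line at a 100 mV swing instead of letting it slew over the full supply cuts the restore energy per access by the ratio of the swings, a factor of 30 here.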



Fig.6 Bit-line mux hierarchies in a 512 row block

A number of different sense amplifier circuits have been proposed in the past, and they essentially fall into two categories: the linear amplifier type [42-43] and the latch type [19-21]. Fig.7 illustrates a simple prototype of each type. In the linear amplifier type (Fig.7a), the amplifier needs a DC bias current to set it up in the high-gain region prior to the arrival of the bit-line signal. To convert the small-swing bit-line signal into a full-swing CMOS signal, a number of stages of amplification are required. These kinds of amplifiers are typically used in very high performance designs. Because they consume biasing power and operate over a limited supply voltage range, they are not preferred for low-power and low-voltage designs. For such designs, latch-type amplifiers are used (Fig.7b). They consist of a pair of cross-coupled gain stages, which are turned on with the aid of a sense clock when an adequate input differential has been set up. The positive feedback in the latch amplifies the input signal to a full digital level. While this type consumes the least power because it needs no biasing power, it can be slower since a timing margin is needed in the generation of the sense clock. If the sense clock arrives before enough input differential is set up, the output value may be wrong. Typically, the sense clock timing needs to be adjusted for the worst-case operating and process conditions, which in turn slows it down under typical conditions because of the excess timing margin. In this thesis, we will look at some timing circuits which track the bit-line delay and which are used to generate a sense clock with a reduced timing overhead.

In large SRAMs, another level is added to the data path hierarchy by connecting the outputs of the sense amplifiers onto the I/O lines (Fig.2). The I/O lines transport the signal between the RAM I/O ports and the memory blocks. In SRAMs with a large access width, the power dissipation of these lines can also be significant, and hence signaling on them also uses small swings [44]. In Chapter 4, we will apply the low-swing bit-line technique to the I/O lines as well, to reduce the I/O line power.



- a) current mirror amplifier
- b) latch-type amplifier

Fig.7 Two common types of sense amplifiers



### **Chapter 3**

## Lambda Bipolar Transistor Memory Cell

Negative differential resistance semiconductor devices are known for their memory applications. The negative differential resistance, or folded I-V characteristic, of such devices makes it possible to obtain multiple stable states with good margins in a simple circuit consisting of just a few devices. This fact has been recognized by several researchers, and several compact multiple-valued storage functions have been described in the literature. For example, Thomas et al. [45] described a voltage-controlled negative differential resistance device (NEGIT) made of a bipolar transistor and an extended field plate over the emitter-base junction. The operation of the NEGIT depends on some uncontrollable parameters such as surface recombination velocity and surface states. Wu et al. [46-47] presented another voltage-controlled negative differential resistance device, called the Lambda bipolar transistor (LBT), which merges an NMOS transistor with a bipolar transistor. LBTs, in particular, show clear voltage-controlled negative differential resistance characteristics, which are advantageous for functional circuit applications. Moreover, planar-structure LBTs have also been realized to meet the demand for high-level integration. In recent years, quantum devices based on resonant tunneling (RT) carrier transport have been developed, and several attempts have been made to obtain multiple negative differential resistance characteristics from RT structures [48]. Based on the multiple negative differential resistance concept, many resonant tunneling devices have been developed and fabricated to implement memory cells [49-51]. However, these structures need an external bias source to separate the peaks and are difficult to incorporate into a bipolar transistor to exploit the additional advantages of high gain and good input-output isolation. They are also difficult to fabricate in circuits of more than a million transistors in III-V technology, and one of their shortcomings is the absence of a reasonably dense, low-power on-chip memory.

Several earlier works on single-sided memory cells were conducted for both power and area reduction. Among these, Takagi et al. [6] proposed a Dual Depletion CMOS memory cell; Schrader et al. [7] proposed a static memory cell based on a Schmitt trigger circuit; Elmasry et al. [8] proposed double-Lambda diode (DOL) memory cells, and they also proposed an SDW MOSFET memory cell [9] using single-device-well MOSFET's. To our knowledge, none of them has been implemented so far for practical applications. Besides, with the advance of submicrometer device fabrication technology, the area of memory cells is continuously scaled down, leading to fine metal bit-line problems. The fine metal bit-lines in a high-resistance poly load or poly-PMOS load cell induce large signal delays, or their high current density could cause reliability degradation. For these reasons, several new works on single-bit-line SRAM's were proposed. Sasaki et al. [10] proposed a high-density 16-Mb SRAM in which a CMOS flip-flop circuit acts as the storage element with only one access transistor. However, the noise margin of this structure is small. Ukita et al. [11] proposed an ultra-low-power SRAM. They used the same CMOS flip-flop as the storage element but with two series-connected access transistors on one side: one for X address selection and the other for Y address selection. This structure can achieve a very low power requirement. However, as the driver-to-load ratio is small, the delay time becomes significant.



Fig.8 A vertical Lambda bipolar transistor structure

The proposed new memory cell is based on Wu's Lambda bipolar transistor (LBT), developed in the 1980's. The LBT is a highly integrated device characterized by its voltage-controlled negative differential resistance, and it has been used successfully in many applications [52-53]. If the LBT is used in a static random access memory cell, however, the standby current in one of its storage states is relatively high. For this reason, a new LBT is proposed for low power applications.

#### 3.1 New Lambda Bipolar Transistor

The basic structure of the new Lambda bipolar transistor and its electrical equivalent circuit are shown in Fig.8 and Fig.9, respectively. As shown in Fig.8, the n-channel enhancement-mode MOSFET's are fabricated on the base region of a vertical NPN bipolar transistor, and the resulting device is called the vertical Lambda bipolar transistor (VLBT). The source of one of the MOSFET's (labeled as E') is utilized as the emitter of the vertical NPN bipolar transistor, while the p-type diffusion well and the n-type epi-layer act as the base and the collector, respectively. The interconnections can be clearly seen in the equivalent circuit of Fig.9. It should be noted that, in ordinary circuit applications, E' is biased at a voltage level lower than B' and C'. Therefore, the junction at E' is the only PN junction that can be turned on, i.e., the other three sources/drains have no chance to act as the emitter of the vertical bipolar transistor.



Fig.9 An equivalent circuit of a vertical Lambda bipolar transistor

The vertical Lambda bipolar transistor is operated in the same way as the conventional bipolar transistor with a fixed external base current. From the terminal characteristics of the separate devices, the general equations for the proposed VLBT, according to the circuit model of Fig.9, can be written as

$$I_{B'} = \begin{cases} K_1 \left[ \left( V_{C'E'} - V_{T1} \right) V_{B'E'} - \frac{V_{B'E'}^2}{2} \right] & \text{if } \left( V_{C'E'} - V_{T1} \right) > V_{B'E'} \\ \frac{K_1}{2} \left( V_{C'E'} - V_{T1} \right)^2 & \text{if } \left( V_{C'E'} - V_{T1} \right) < V_{B'E'}, \end{cases} \tag{1}$$

where  $K_1 = C_{ox} \mu_n W_1 / L_1$  and  $V_{T1}$  is the threshold voltage of  $M_1$ .

$$I_{C'} = I_C + I_B \tag{2}$$

$$I_{B} = \begin{cases} K_{2} \left[ (V_{B'E'} - V_{T'})(V_{C'E'} - V_{BE'}) - \frac{(V_{C'E'} - V_{BE'})^{2}}{2} \right] & \text{if } (V_{B'E'} - V_{T2}) > V_{C'E'} \\ \frac{K_{2}}{2} (V_{B'E'} - V_{T'})^{2} & \text{if } (V_{B'E'} - V_{T2}) < V_{C'E'}, \end{cases} \tag{3}$$

where  $K_2 = C_{ox} \mu_n W_2 / L_2$ ,  $V_{T2}$  is the threshold voltage of  $M_2$ , and  $V_{T'} = V_{T2} + V_{BE'}$ .

$$I_C = \beta I_B + I_{CO} (1 + \beta) \tag{4}$$

where  $\beta$  is the dc common-emitter current gain of the NPN bipolar transistor, and  $I_{CEO}=I_{CO}(1+\beta)$  is the common-emitter collector reverse saturation current.

A certain current source load  $M_3$  operated in saturation region is chosen for derivation. The current equation can be written as

$$I_{B'} = \frac{K_3}{2} (V_{GG} - V_{B'E'} - V_{T3})^2 \qquad \text{for} \quad V_{GG} > (V_{B'E'} + V_{T3}) \tag{5}$$

where  $K_3 = C_{ox} \mu_n W_3 / L_3$ ,  $V_{T3}$  is the threshold voltage of  $M_3$ , and  $V_{GG}$  is the power supply connected to the drain of  $M_3$ .

In order to obtain analytical expressions, the body effects are assumed to be negligible. To examine quantitatively the operating principle of the VLBT, whose I-V characteristic is shown in Fig. 10, a six-region analysis is given as follows:

Region I:

If  $V_{C'E'} < V_{BE'(on)} < V_{T1}$ ,  $M_1$  is off,  $M_2$  is operated in linear region, and  $Q_1$  is off. In this region,  $I_B=0$ , thus  $I_{C'}=I_{CO}(1+\beta)$ .

Region II:

If  $V_{BE'(on)} < V_{C'E'} < V_{T1}$  and  $V_{B'E'} = V_{GG} - V_{T3} - K \left[ \sqrt{2\phi_{fp} + V_{B'E'}} - \sqrt{2\phi_{fp}} \right] > V_{T'}$  (where  $K$  is the modifying substrate factor),  $M_1$  is off,  $M_2$  is kept in linear region, and  $Q_1$  is operated in forward-active region. By solving for  $V_{B'E'}$ , i.e.,

$$V_{B'E'} = \left(V_{GG} - V_T + K\sqrt{2\phi_{fp}} + \frac{K^2}{2}\right) + K\left(V_{GG} - V_T + K\sqrt{2\phi_{fp}} + \frac{K^2}{4} + 2\phi_{fp}\right)^{1/2} \tag{6}$$

we get the output current in this region:

$$I_{C'} = \left(1 + \beta\right) \left\{ K_2 \left[ \left( V_{GG} - V_T + K \sqrt{2\phi_{fp}} + \frac{K^2}{2} + K \sqrt{V_{GG} - V_T + K \sqrt{2\phi_{fp}} + \frac{K^2}{4} + 2\phi_{fp}} - V_{T'} \right) \left( V_{C'E'} - V_{BE'} \right) - \frac{\left( V_{C'E'} - V_{BE'} \right)^2}{2} \right] + I_{CO} \right\} \tag{7}$$

Region III:

If  $V_{T1} < V_{C'E'} < V_{B'E'} - V_{T2}$  and assuming that  $V_{B'E'} > V_{T'}$ ,  $M_1$  is operated in saturation region,  $M_2$  is kept in linear region, and  $Q_1$  is still in forward-active region. Solving  $V_{B'E'}$  by equating (1) and (5), we obtain

$$V_{B'E'} = V_{GG} - \sqrt{\frac{K_1}{K_3}} (V_{C'E'} - V_{T1}) - V_{T3} \tag{8}$$

and the output current in this region is

$$I_{C'} = (1 + \beta) \left\{ K_2 \left[ \left( V_{GG} - \sqrt{\frac{K_1}{K_3}} (V_{C'E'} - V_{T1}) - V_{T3} - V_{T'} \right) (V_{C'E'} - V_{BE'}) - \frac{(V_{C'E'} - V_{BE'})^2}{2} \right] + I_{CO} \right\} \tag{9}$$

The peak current is  $I_P = I_{C'}\big|_{V_{C'E'}=V_P}$ , where the peak voltage  $V_P$  can be derived by letting  $\frac{\partial I_{C'}}{\partial V_{C'E'}} = 0$  in (9), i.e.

$$V_{P} = \frac{K_{3}(V_{GG} - V_{T2} - V_{T3}) + \sqrt{K_{1}K_{3}}(V_{BE'} + V_{T1})}{K_{3} + 2\sqrt{K_{1}K_{3}}} \tag{10}$$

Region IV:

If  $V_{B'E'} - V_{T2} < V_{C'E'} < V_{B'E'} + V_{T1}$  and  $V_{B'E'} > V_{T'}$ ,  $M_1$  and  $M_2$  are both operated in saturation region, and  $Q_1$  is operated in forward-active region. Using equations (1), (2), (3), (4), and (8), we obtain

$$I_{C'} = \left(1 + \beta\right) \left[ \frac{K_2}{2} \left( V_{GG} - \sqrt{\frac{K_1}{K_3}} \left( V_{C'E'} - V_{T1} \right) - V_{T3} - V_{T'} \right)^2 + I_{CO} \right] \tag{11}$$

By differentiating (11) with respect to  $V_{C'E'}$ , the output resistance in this region can be written as

$$R_{O} = \frac{-1}{(1+\beta)K_{2}\sqrt{\frac{K_{1}}{K_{3}}} \left[V_{GG} - \sqrt{\frac{K_{1}}{K_{3}}} (V_{C'E'} - V_{T1}) - V_{T3} - V_{T'}\right]} \tag{12}$$

Region V:

When  $V_{C'E'} > V_{B'E'} + V_{T1}$ ,  $M_1$  is operated in linear region and  $M_2$  is operated in saturation region. Assuming  $V_{B'E'} > V_{T'}$ ,  $Q_1$  is operated in forward-active region. The output current is

$$I_{C'} = (1 + \beta) \left[ \frac{K_2}{2} (V_{B'E'} - V_{T'})^2 + I_{CO} \right].$$

Solving (1) and (5) gives

$$V_{B'E'} = \frac{K_1 (V_{C'E'} - V_{T1}) + K_3 (V_{GG} - V_{T3}) - \left[\left[K_{1}\left(V_{C'E'}-V_{T1}\right)+K_{3}\left(V_{GG}-V_{T3}\right)\right]^{2}-K_{3}\left(K_{1}+K_{3}\right)\left(V_{GG}-V_{T3}\right)^{2}\right]^{\frac{1}{2}}}{K_1 + K_3} \tag{13}$$

By the chain rule, the output resistance is

$$R_{O} = \left[ \frac{\partial I_{C'}}{\partial V_{B'E'}} \cdot \frac{\partial V_{B'E'}}{\partial V_{C'E'}} \right]^{-1} = \frac{K_{1}+K_{3}}{(1+\beta)K_{2}(V_{B'E'}-V_{T'})} \left[K_{1} - \frac{K_{1}\left[K_{1}(V_{C'E'}-V_{T1})+K_{3}(V_{GG}-V_{T3})\right]}{\left[\left[K_{1}(V_{C'E'}-V_{T1})+K_{3}(V_{GG}-V_{T3})\right]^{2}-K_{3}(K_{1}+K_{3})(V_{GG}-V_{T3})^{2}\right]^{\frac{1}{2}}}\right]^{-1} \tag{14}$$

The valley voltage  $V_V$  can be obtained by solving  $V_{B'E'}(V_{C'E'}) = V_{T'}$ , i.e.

$$V_{V} = \frac{K_{3}(V_{GG} - V_{T'} - V_{T3})^{2} + K_{1}(V_{T'}^{2} + 2V_{T'}V_{T1})}{2K_{1}V_{T'}} \tag{15}$$

Region VI:

When  $V_{C'E'} > V_V$ ,  $M_1$  is still operated in linear region,  $M_2$  and  $Q_1$  are both off. Thus,  $I_B = 0$  and  $I_{C'} = I_{CO}(1 + \beta)$ .

The output dc characteristic of the new VLBT is shown in Fig.10.
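The six-region analysis can also be cross-checked numerically. The following is a minimal sketch, not the simulation setup used in this thesis: it neglects the body effect, finds  $V_{B'E'}$  by equating the  $M_1$  current of (1) with the  $M_3$  current of (5), and then evaluates (2)-(4). All numeric parameter values (`K1`, `VT1`, `VBE_ON`, `VGG`, `BETA`, `ICO`, and so on) are hypothetical and chosen only to make the Lambda shape visible.

```python
# Sketch of the VLBT dc model of Eqs. (1)-(5), body effect neglected.
# Every numeric value below is a hypothetical illustration value.

K1 = K2 = K3 = 100e-6      # C_ox * mu_n * W / L of M1, M2, M3 in A/V^2 (assumed)
VT1 = VT2 = VT3 = 0.8      # MOSFET threshold voltages in V (assumed)
VBE_ON = 0.7               # base-emitter voltage V_BE' of Q1 in V (assumed)
VGG = 5.0                  # supply at the drain of M3 in V (assumed)
BETA = 50.0                # common-emitter current gain (assumed)
ICO = 1e-12                # collector reverse saturation current in A (assumed)
VT_PRIME = VT2 + VBE_ON    # V_T' = V_T2 + V_BE'


def i_m1(vce, vb):
    """Eq. (1): M1 current for given V_C'E' (vce) and V_B'E' (vb)."""
    vov = vce - VT1
    if vov <= 0.0:
        return 0.0                                 # M1 off
    if vov > vb:
        return K1 * (vov * vb - vb * vb / 2.0)     # linear region
    return 0.5 * K1 * vov * vov                    # saturation region


def i_m3(vb):
    """Eq. (5): current of the saturated load M3."""
    vov = VGG - vb - VT3
    return 0.5 * K3 * vov * vov if vov > 0.0 else 0.0


def solve_vb(vce, tol=1e-12):
    """Bisection for V_B'E' such that the M3 current equals the M1 current."""
    lo, hi = 0.0, VGG - VT3
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if i_m3(mid) > i_m1(vce, mid):
            lo = mid       # M3 sources more than M1 sinks, so the node is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


def i_c_prime(vce):
    """Eqs. (2)-(4): I_C' = (1 + beta) * (I_B + I_CO), with I_B from Eq. (3)."""
    vb = solve_vb(vce)
    vds2 = vce - VBE_ON            # drain-source voltage of M2
    vov2 = vb - VT_PRIME           # gate overdrive of M2
    if vds2 <= 0.0 or vov2 <= 0.0:
        i_b = 0.0                  # Q1 or M2 off (Regions I and VI)
    elif vov2 > vds2:
        i_b = K2 * (vov2 * vds2 - vds2 * vds2 / 2.0)   # M2 in linear region
    else:
        i_b = 0.5 * K2 * vov2 * vov2                   # M2 in saturation
    return (1.0 + BETA) * (i_b + ICO)


sweep = [k * 0.01 for k in range(501)]          # V_C'E' from 0 V to 5 V
v_peak = max(sweep, key=i_c_prime)
print(f"peak at V_C'E' = {v_peak:.2f} V, I_P = {i_c_prime(v_peak) * 1e3:.3f} mA")
print(f"I_C' at 4.5 V = {i_c_prime(4.5):.3e} A")
```

With these assumed values the swept current rises, peaks, falls through a negative-resistance range, and returns to the leakage level  $I_{CO}(1+\beta)$  at high  $V_{C'E'}$, which is the qualitative behavior described for Regions I to VI.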

#### 3.2 Description of the New Memory Cell

The performance of an SRAM strongly depends on the design of its memory cell. Generally, a full CMOS cell is suitable for low power design with acceptable speed. However, it has a significant area penalty over a high-resistance poly load or poly-PMOS load cell. On the other hand, the fine metal bit-lines in a high-resistance poly load or poly-PMOS load cell induce large signal delay or high current density, causing reliability degradation. In this thesis, a new single-sided memory cell is proposed to solve these problems.

Fig. 10 The I-V characteristics of a vertical Lambda bipolar transistor

The general configuration of the proposed static random access memory cell is shown in Fig.11. It consists of a VLBT, a load element, a current source device, and an access transistor. Owing to the negative differential resistance of the VLBT, the storage node SN has two dc static points (see Fig.12). Two kinds of load elements, current source-like and resistance-like, can be selected for different applications. For a current source-like load, the current flow at the static points  $S_{C1}$  and  $S_{C2}$  can both be small if the circuit is well configured. In contrast, a resistance-like load memory cell generally suffers dc current flow at the lower static state  $S_{R2}$ . However, it occupies a relatively smaller area than a cell with a current source-like load.



Fig.11 General configuration of a new memory cell



Fig.12 The I-V characteristics of a new memory cell with current and resistive load

The new memory cell based on the proposed VLBT is presented in Fig.13. In this configuration,  $M_1$ ,  $M_2$  and  $Q_1$  operate as a VLBT storage element,  $M_3$  acts as a current source,  $M_4$  acts as the load element, and  $M_5$  is the access transistor. When Vx is in the low stable state  $S_{C1}$ , any positive noise causes Vx to increase slightly. At this moment,  $I_{C'}$  is larger than  $I_{DS4}$ , so that Vx discharges back to  $S_{C1}$ . If any negative noise causes Vx to decrease slightly,  $I_{DS4}$  is larger than  $I_{C'}$ , which charges Vx back to  $S_{C1}$ . This is why the state is stable. The same argument applies to the state  $S_{C2}$ , which is therefore also stable.



Fig. 13 A new SRAM memory cell circuit

If any positive noise is introduced while Vx is at the switching state  $S_W$ , the positive differential current  $I_{DS4}$ - $I_{C'}$  charges the node X to the high stable state  $S_{C2}$ , and the memory cell no longer stays in the state  $S_W$ . On the other hand, if negative noise is introduced while the memory cell is in the state  $S_W$ , the negative differential current  $I_{DS4}$ - $I_{C'}$  discharges the node X to  $S_{C1}$ . Both types of noise (positive and negative) cause a transition from the state  $S_W$  to either the stable state  $S_{C1}$  or  $S_{C2}$ . The stored voltage levels are CMOS-like, i.e., a full swing between ground and supply voltage is obtained.
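The stability argument above can be illustrated with a small numerical sketch. It locates the crossings of a load curve and a Lambda-shaped device curve and labels a crossing stable when the net charging current  $I_{DS4}-I_{C'}$  changes from positive to negative as Vx rises. Both curves here are hypothetical piecewise-linear stand-ins (the names `i_device`, `i_load` and all numeric values are assumptions), not the simulated characteristics of Fig.12.

```python
# Toy classification of the storage-node operating points.
# The two I-V curves are hypothetical piecewise-linear stand-ins.

VDD = 5.0
I_LOAD = 50e-6                  # plateau of the current-source-like load (assumed)
I_PEAK, V_PEAK = 200e-6, 1.5    # peak point of the Lambda-shaped curve (assumed)
V_VALLEY = 4.0                  # valley voltage of the device curve (assumed)
I_LEAK = 1e-9                   # device current beyond the valley (assumed)


def i_device(v):
    """Lambda-shaped stand-in for I_C' versus the node voltage Vx."""
    if v <= V_PEAK:
        return I_LEAK + I_PEAK * v / V_PEAK
    if v < V_VALLEY:
        return I_LEAK + I_PEAK * (V_VALLEY - v) / (V_VALLEY - V_PEAK)
    return I_LEAK


def i_load(v):
    """Stand-in for I_DS4: constant, collapsing to zero within 0.5 V of VDD."""
    return I_LOAD * max(0.0, min(1.0, (VDD - v) / 0.5))


def net_current(v):
    """Current that charges node X: positive raises Vx, negative lowers it."""
    return i_load(v) - i_device(v)


def equilibria(n=997):
    """Return (voltage, is_stable) for every sign change of the net current."""
    grid = [VDD * k / n for k in range(n + 1)]
    points = []
    for a, b in zip(grid, grid[1:]):
        fa, fb = net_current(a), net_current(b)
        if fa * fb >= 0.0:
            continue                      # no crossing in this interval
        stable = fa > 0.0                 # + to - crossing restores Vx
        lo, hi = a, b
        for _ in range(60):               # refine the crossing by bisection
            mid = 0.5 * (lo + hi)
            if net_current(mid) * fa > 0.0:
                lo = mid
            else:
                hi = mid
        points.append((0.5 * (lo + hi), stable))
    return points


for voltage, stable in equilibria():
    print(f"Vx = {voltage:.4f} V  ->  {'stable' if stable else 'unstable'}")
```

In this toy setup the three crossings play the roles of  $S_{C1}$  (low, stable),  $S_W$  (switching, unstable) and  $S_{C2}$  (high, stable); the high state sits close to the supply because the assumed load current only collapses near VDD.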

Fig.14 compares the static noise margin (SNM) of the new memory cell with that of the cell proposed in [48], which is referred to as the LBT configuration. In the LBT configuration, the voltage of the storage node at any instant is a base-emitter voltage and hence is always less than 1V. The SNM of the new memory cell (VLBT) and of the LBT configuration are about 1.2V and 0.4V, respectively, so the new memory cell has the larger SNM. Fig.14 also shows that the LBT configuration requires an adequate circuit to sense the state of the cell, because its switching point is below 1V.



Fig. 14 The static transfer characteristics of the memory cells

#### 3.3 Performance of the New Memory Cell

Extensive circuit simulations have been carried out to verify the circuit operation and its performance characteristics. The performance of the proposed circuit is evaluated based on a 5V, 0.5µm BiCMOS technology. The simulations assume 1ns rise and fall times.

#### 3.3.1 Write Operation

In the static memory cell, the write operation is performed by forcing a high or low voltage onto the bit-line. The operation cycle starts at 3ns, when the access transistor  $M_5$  is turned on by a word-line pulse with 1ns rise time.



Fig.15 Write "0" operation

When changing the binary state of the memory circuit from 1 to 0, the voltage level of node X rapidly decreases. Because the transistor  $M_1$  is turned off and the transistors  $M_2$  and  $Q_1$  are turned on, the internal capacitor of node Y is charged very fast via the transistor  $M_3$ . The simulation result is shown in Fig.15. Changing the binary state from 1 to 0 takes only about 0.5ns.

When changing the binary state from 0 to 1, the voltage level of node X initially increases very fast because the current through the transistor  $M_4$  is increased. But as the node voltage Vx increases, the access transistor as well as the transistor  $M_3$  is turned off. From then on, the internal capacitor of node X is charged more slowly via the load transistor  $M_4$ . The simulation result is shown in Fig.16. Changing the binary state from 0 to 1 takes about 1.5ns.



Fig.16 Write "1" operation

#### 3.3.2 Read Operation

The stored data of a memory cell selected by the word-line and the column decoder has to be read nondestructively. For the read operation, the bit-line capacitor  $C_{BL}$  is precharged to the reference voltage level  $V_{ref}$  and is then left floating. The bit-line voltage versus time during the reading cycle is calculated by assuming that the memory cell has to charge a bit-line capacitor  $C_{BL}$  of 1pF.

When reading a binary 0, the bit-line capacitor has to be discharged via the transistor  $Q_1$ . The current flowing from the bit-line into the circuit should be low enough that the voltage level of node X does not cross the switching point  $S_W$ . This current increases with an increasing precharge voltage level on the bit-line, which means that the precharge voltage has an upper limit. For a precharge voltage level higher than this upper limit, the circuit becomes unstable and switches into the opposite binary state. The information in the memory cell will then be destroyed during readout. The reading "0" operation is shown in Fig.17.



Fig.17 Read "0" operation

When reading a binary 1, the bit-line capacitor is charged via the transistor  $M_4$ , and the load current drawn from the cell causes the node voltage Vx to fall. To prevent the voltage level of node X from crossing the switching point  $S_W$ , the load current level should be higher. Therefore, when reading a binary 1, the precharge voltage level has a lower limit. The reading "1" operation is shown in Fig.18.



Fig.18 Read "1" operation

#### 3.3.3 Comparisons

Fig.19 compares the transient analysis of the read "0" operation for different load capacitances on the bit-line. Since the bipolar transistor has to go from cut-in to the forward-active region, the delay time of the proposed memory cell does not differ much from that of the conventional single-sided CMOS memory cell for a small bit-line capacitance. However, for a large bit-line capacitance, the proposed memory cell is superior to the conventional one because it has a large cell current. Note that for a heavily loaded bit-line, the conventional memory cell is read destructively, i.e., its storage state is changed from "0" to "1" after the read operation. In contrast, the proposed memory cell maintains its delay-time trend as the bit-line capacitance increases, because the charge required to change the state of the proposed cell from "0" to "1" is relatively large compared with the conventional one, which is important for nondestructive read operation.



Fig. 19 Sensing delay versus bit-line capacitance

## **Chapter 4**

## **New Current-Mode Sense Amplifier**

During the read access cycle, the sense amplifier is one of the most critical elements of the memory circuit. The conventional sense amplifier is based on the voltage-mode technique, but its sensing time increases as the bit-line capacitance increases and its AC operating power consumption is very large. Several design techniques have been proposed to reduce the power dissipation of static RAM [54]. In addition, several current-mode sensing circuits [55-57] have been proposed to overcome the speed degradation caused by larger bit-line or data-line capacitances.

#### 4.1 Introduction

Due to their great importance in memory performance, sense amplifiers have become a very large class of circuits. Their main function is to sense or detect stored data from a read selected memory cell. Fig.20 shows a typical use of a sense amplifier.



Fig.20 Typical use of a sense amplifier

The memory cell being read produces a current " $I_{DATA}$ " that removes some of the charge (dQ) stored on the pre-charged bit-lines. Since the bit-lines are very long and are shared by other similar cells, the parasitic resistance " $R_{BL}$ " and capacitance " $C_{BL}$ " are large. Thus, the resulting bit-line voltage swing (d $V_{BL}$ ) caused by the removal of "dQ" from the bit line is very small, i.e.,  $dV_{BL}$ = $dQ/C_{BL}$ . Sense amplifiers are used to translate this small voltage signal to a full logic signal that can be further used by digital logic.

The need for increased memory capacity, higher speed, and lower power consumption has defined a new operating environment for future sense amplifiers. Below are some of the effects of increased memory capacity and decreased supply voltage:

- 1) Increasing the number of memory cells per bit-line increases  $C_{\rm BL}$ , while an increase in length of the bit-line increases  $R_{\rm BL}$ .
- 2) Decreasing memory cell area to integrate more memory cells in a single chip reduces the current  $I_{DATA}$  that is driving the heavily loaded bit-line. This coupled with increased  $C_{BL}$  causes an even smaller voltage swing on the bit-line.
- 3) Decreasing supply voltage results in smaller noise margins which in turn affect sense amplifier reliability.
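The scaling effects listed above can be put in numbers with the relation  $dV_{BL}=dQ/C_{BL}$  from the preceding paragraph. The sketch below assumes a constant cell current over the sensing interval, so that  $dQ = I_{DATA}\,t$ ; the function name `bitline_swing`, the sensing time, and all numeric values are hypothetical.

```python
def bitline_swing(i_data, t_sense, c_bl):
    """Return dV_BL = dQ / C_BL, taking dQ = I_DATA * t_sense (constant cell current)."""
    d_q = i_data * t_sense          # charge removed from the bit-line
    return d_q / c_bl


# Hypothetical operating points: shrinking the cell lowers I_DATA,
# adding cells per bit-line raises C_BL.
T_SENSE = 1e-9                      # 1 ns sensing interval (assumed)
for i_data, c_bl in ((100e-6, 1e-12), (100e-6, 2e-12), (50e-6, 2e-12)):
    swing = bitline_swing(i_data, T_SENSE, c_bl)
    print(f"I_DATA = {i_data * 1e6:.0f} uA, C_BL = {c_bl * 1e12:.0f} pF"
          f" -> dV_BL = {swing * 1e3:.0f} mV")
```

Under these assumptions, halving the cell current and doubling the bit-line capacitance each halve the available swing, which is the trend described in items 1) and 2).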

In this chapter, a new current-mode sense amplifier is presented and its ability to deal with these newly imposed operating conditions is examined.

#### 4.2 Voltage Sensing and Current Sensing

Current-sensing, or current-mode sensing, is, as the name suggests, the technique that determines the logic value present on a wire from the current through the wire. The difference between current sensing and voltage sensing is subtle in conventional CMOS: MOS transistors have a voltage threshold but no current threshold, and hence conventional circuits determine the signal state by sensing a voltage.

#### 4.2.1 Theoretical Model

Theoretically, voltage-mode signaling can be modeled as shown in Fig.21. In the voltage mode, the driver drives the interconnect, which is terminated with an open circuit ( $R_{L} \approx \infty$ ). This allows the voltage at the destination to change according to the input voltage. The sensing circuit at the destination then has to determine the signal state from this voltage value.



Fig.21 Theoretical voltage-mode signal model

In the case of current sensing, however, the signal is transmitted by a current pulse. The theoretical representation of current sensing is shown in Fig.22. In current sensing, the driver drives a line which is terminated by a short ( $R_L \approx 0$ ). Hence, there is a path for the current to flow, and the sensing circuit at the end of the line has to detect this current to determine the signal value.



Fig.22 Theoretical current-mode signal model

As shown in Fig.23, the conventional way of signaling is voltage-mode. An inverter acting as a driver drives the interconnect, which builds up a voltage at the end of the line. Since the line ends at the gates of transistors,  $R_L \approx \infty$ . The voltage sensing circuit is another inverter, and since MOS transistors have voltage thresholds that turn them on or off, the output of the inverter depends on the voltage at its gate. The biggest challenge in current-mode signaling is to design efficient sensing circuitry that detects the change in current. A normal driver can be used to drive the interconnect; to drive current instead of voltage, the end of the line should provide a path to ground. Thus, a current-mode sensing setup would look like the one in Fig.24.

Fig.23 CMOS representation for a voltage-mode signal model



Fig.24 CMOS representation for a current-mode signal model

The main difference between current-mode and voltage-mode signaling is the termination of the interconnect. In the case of current-mode, the termination resistance is very small, while in the case of voltage-mode, it is very large. Since current is used as the signaling quantity in current-mode and there must be a path to ground from the driver, static power dissipation is expected in current-mode signaling. Also, the receiving (sensing) circuit is more complex in current-mode, as MOS transistors do not have a current threshold.

Also, since there is a low impedance path to ground at the end of the line, the capacitance of the interconnect is not charged to Vdd but to an intermediate value. Since sensing a current with MOS devices is not trivial, most current-mode sensing is done differentially. This may require a synchronizing (precharging or pre-equalizing) signal.

#### 4.2.2 Voltage-Mode and Current-Mode Signal Delay

The use of current sensing amplifiers has a number of benefits over voltage sensing amplifiers. The most important ones are significant reductions in bit-line voltage swing and major reductions in sensing delays [58]. These benefits translate to lower dynamic power consumption and increased sensing speed. The key to these improvements lies in the low input resistance of the current sensing amplifier. This becomes evident when examining the equivalent sensing circuit in Fig.25.



Fig.25 A long interconnect model

In this model, we assumed that the output current is a linear-ramp signal as shown in Eq. (4-1), i.e.,

$$i_o = p_o(t - \delta t) \tag{4-1}$$

where  $i_o$  is the output current,  $p_o$  is the constant slope, and  $\delta t$  is the delay.

The analysis shows that the delay for a line is given by the following equation:

$$\delta t = \frac{\left(R_T \cdot C_T\right)}{2} \cdot \left(\frac{R_B + \frac{R_T}{3} + R_L}{R_B + R_T + R_L}\right) + R_B C_T \cdot \left(\frac{R_L}{R_B + R_T + R_L}\right) \tag{4-2}$$

where  $R_T$  and  $C_T$  are the total bit-line resistance and capacitance.

For a voltage-mode signal path, the RC line modeled in the above circuit is terminated by an open circuit, which means that the resistor  $R_L$  is extremely large. When  $R_L \gg R_B$ , it can be assumed to be infinite in the above equation. Therefore, the time constant is given by:

$$\delta t = \frac{(R_T \cdot C_T)}{2} \cdot \left(1 + \frac{2R_B}{R_T}\right) \tag{4-3}$$

For a current-mode signal path, the output loading of the long interconnect line is always a low resistance (ideally zero). Therefore, the  $R_L$  in Eq. (4-2) can be ignored, so the time constant is given by:

$$\delta t = \frac{(R_T \cdot C_T)}{2} \cdot \left( \frac{R_B + \frac{R_T}{3}}{R_B + R_T} \right) \tag{4-4}$$
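As a short check of the two limiting cases, using nothing beyond Eq. (4-2): for  $R_L \to \infty$  both resistance ratios in Eq. (4-2) tend to one, and for  $R_L = 0$  the second term vanishes, so that

$$\lim_{R_L \to \infty} \delta t = \frac{R_T C_T}{2} + R_B C_T = \frac{R_T C_T}{2}\left(1 + \frac{2R_B}{R_T}\right), \qquad \delta t \Big|_{R_L = 0} = \frac{R_T C_T}{2} \cdot \frac{R_B + \frac{R_T}{3}}{R_B + R_T},$$

which are Eq. (4-3) and Eq. (4-4), respectively.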

Fig.26 shows a comparison of voltage-sensing and current-sensing, Eq. (4-3) and Eq. (4-4). The figure shows that current-sensing has less delay than voltage sensing. In practice, the load resistance for current-sensing is not zero, so the effect of a nonzero load resistance should be studied. Fig.27 shows a comparison of current-sensing with different load resistances.



Fig.26 Comparison of voltage sensing and current sensing



Fig.27 Comparison of voltage sensing and current sensing with different values of load resistance

As expected, an increase in load resistance increases the delay in the current-sensing technique, but the increase is not very significant for low resistance of interconnect and/or low resistance of the driver. The plots show that the delay in both current-sensing and voltage-sensing technique increases quadratically with respect to the length of the line (represented by the resistance of the line in the plots).

As a numerical example of the long-interconnect delay, assume a source resistance of  $1k\Omega$ , a total distributed line capacitance of 1pF, and a total line resistance of  $100\Omega$ . The time constant of the voltage-mode signal path is then 1.05ns. Under the same assumptions, the time constant of the current-mode signal path is estimated to be 0.047ns.
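The figures quoted above can be reproduced by evaluating Eq. (4-2) directly. The following is a minimal sketch; the function name `line_delay` is ours, and the intermediate load resistances in the loop are hypothetical values added only to show the trend between the two limits.

```python
def line_delay(r_b, r_t, c_t, r_l):
    """Delay of Eq. (4-2) for source resistance r_b, line resistance r_t,
    line capacitance c_t and load resistance r_l (float('inf') = open circuit)."""
    if r_l == float("inf"):
        # Open-circuit termination: Eq. (4-3)
        return 0.5 * r_t * c_t + r_b * c_t
    total = r_b + r_t + r_l
    return (0.5 * r_t * c_t * (r_b + r_t / 3.0 + r_l) / total
            + r_b * c_t * r_l / total)


# Values quoted in the text: R_B = 1 kOhm, R_T = 100 Ohm, C_T = 1 pF
R_B, R_T, C_T = 1e3, 100.0, 1e-12

t_voltage = line_delay(R_B, R_T, C_T, float("inf"))   # voltage mode, Eq. (4-3)
t_current = line_delay(R_B, R_T, C_T, 0.0)            # current mode, Eq. (4-4)
print(f"voltage mode: {t_voltage * 1e9:.3f} ns")
print(f"current mode: {t_current * 1e9:.3f} ns")

# Hypothetical nonzero load resistances, to show how the delay grows with R_L
for r_l in (10.0, 100.0, 1e3):
    print(f"R_L = {r_l:6.0f} Ohm: {line_delay(R_B, R_T, C_T, r_l) * 1e9:.3f} ns")
```

For these values the current-mode delay is more than twenty times shorter than the voltage-mode delay, and it degrades gradually as the assumed load resistance approaches the source resistance.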

A further approximation can be made: since  $R_B \gg R_T$ , the delay for voltage-mode can be approximated as  $R_B C_T$ , and the delay for current-mode can be approximated as  $R_T C_T / 2$ . Since  $R_B \gg R_T$ , current-mode is faster than voltage-mode.

A plot for comparing the voltage-mode and current-mode delay is shown in Fig.28.

Based on the above analysis, if the capacitive loading is unchanged, the time constant of a long interconnect line can be reduced by reducing the load resistance  $R_L$ . When the next stage is a voltage-mode circuit, it always acts as a capacitive load, so the load resistance is much larger than the line resistance, which results in a long signal delay. To shorten the time constant of a long interconnect line, the next stage can be made a current-mode circuit. In this way, the load resistance  $R_L$  can be reduced below the line resistance. Even if the two are of the same order, the delay of the long line can be shortened by an order of magnitude or more. Hence, a low-resistance input node at the next stage speeds up signals that pass through long interconnect lines.

Fig.28 Comparison of voltage-sensing and current-sensing with approximations

#### 4.3 Voltage-Mode Sense Amplifier

Voltage-mode sense amplifiers have been known for a long time; the simplest voltage sensing amplifier is the differential couple [2]. Fig.29 shows a schematic diagram of a simple differential couple with its inputs and outputs labeled. During a read, the input nodes ( $V_{IN+}$  and  $V_{IN-}$ ) are pre-charged to  $V_{PRE}$ , causing the output nodes ( $V_{OUT+}$  and  $V_{OUT-}$ ) to stay at the same level. The read-selected cell is then asserted and a small voltage swing appears on the bit-lines. This small voltage swing is amplified by the differential couple and later used to drive digital logic.



Fig.29 Simple differential couple schematic

Another version of a voltage sense amplifier which has enjoyed a wide usage is the full complementary positive feedback differential sense amplifier. This voltage sense amplifier has a very large differential gain and the added ability to automatically rewrite destructive read data [59]. Fig.30 shows the schematic diagram of the full complementary positive feedback amplifier.

The positive feedback amplifier has two data nodes,  $V_{IN/OUT1}$  and  $V_{IN/OUT2}$ , and three control nodes,  $SAN_{EN}$ ,  $SAP_{EN}$  and PRE. Nodes  $V_{IN/OUT1}$  and  $V_{IN/OUT2}$  act as both input and output of the sense amplifier. Its operation is as follows: 1) the data nodes are equalized using PRE; 2) the memory cell being read is asserted and a small voltage difference forms on nodes  $V_{IN/OUT1}$  and  $V_{IN/OUT2}$ ; 3) while MN1 and MN2 are biased to operate in the saturation region, MN6 is turned on by  $SAN_{EN}$ ; 4) as both  $V_{IN/OUT1}$  and  $V_{IN/OUT2}$  are decreased in voltage, so is the difference between them; 5) one of them decreases much faster than the other and causes MN1 or MN2 to enter cutoff while the other starts operating in triode; 6) at this point MP5 is turned on by  $SAP_{EN}$ , which pulls the signals rapidly apart; 7) since  $V_{IN/OUT1}$  and  $V_{IN/OUT2}$  are directly connected to the bit-lines, the data is automatically written back to the destructively read memory cell. Due to its positive feedback, this voltage sensing amplifier achieves a very high differential gain. This high gain minimizes sensing time by making it possible to sense small voltage swings on the bit-line. However, since the bit-line capacitance grows along with memory capacity, the bit-line voltage swing is becoming smaller and more expensive in power to produce. There is also a practical limit to this decreasing voltage swing: when the bit-line voltage swing reaches the same magnitude as the bit-line noise, the voltage sense amplifier becomes unusable.

Fig.30 Full complementary positive feedback amplifier schematic

Therefore, to achieve the preset objectives of large memory capacity, high speed, and low power, a new type of sense amplifier is needed.

#### 4.4 Clamped Bit-Line Sense Amplifier

A commonly used current mode sensing amplifier is the clamped bit-line sense amplifier [58] shown in Fig.31. By clamping the voltage on the bit-line to a stable voltage ( $V_{REF}$ ), the signal current produced by the cell can be transferred to an internal sense amp node without charging/discharging the large bit-line capacitance. As a result, both sensing delay and dynamic power consumption are significantly decreased.



Fig.31 Clamped bit-line sense amplifier

This sense amplifier uses three pre-charge and equalization transistors (M7, M8 and M9), two current sensing transistors (M5 and M6), and four transistors in a back-to-back inverter configuration for the voltage output stage (M1, M2, M3 and M4). Its operation has two stages: pre-charge/equalization and sensing. The timing sequence is as follows:

1) transistors M7, M8, and M9 are turned on to pre-charge and equalize the sensing nodes; 2) transistors M7 and M8 are turned off and the memory cell is accessed; 3) the current from the cell starts being sourced by one of the transistors M1 and M2, and a voltage difference starts forming on one of the output nodes; 4) this voltage is further amplified by the positive feedback amplifier until it reaches the latched state.

It has been shown that the time response of a latch formed by cross-coupled inverters is directly related to the AC small-signal gain-bandwidth (GBW) product of the inverters. Maximizing the GBW product maximizes the speed of the sense amplifier. By examining the small-signal models of the positive-feedback cross-coupled voltage sense amplifier and the clamped bit-line current sensing amplifier, we can derive the following GBWs for (a) voltage sensing and (b) current sensing:

$$\text{(a)}\quad GBW_{VS} = \frac{g_m}{C_{BL}} \qquad \text{(b)}\quad GBW_{CS} = \frac{g_m}{C_d}$$

Since $C_d \ll C_{BL}$, it can easily be seen that the current-mode sense amplifier is much faster. Another observation is that this amplifier is insensitive to the bit-line capacitance and maintains a constant speed as the bit-line capacitance increases.
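As a rough numerical illustration of the two GBW expressions, the sketch below evaluates them for one set of component values. The values of `G_M`, `C_BL` and `C_D` are assumptions chosen only to show the scaling; they are not taken from the thesis.

```python
def gbw(g_m: float, c_load: float) -> float:
    """Gain-bandwidth product g_m / C, in rad/s."""
    return g_m / c_load

# Illustrative values only (assumed, not from the thesis).
G_M = 1e-3     # transconductance, 1 mS
C_BL = 1e-12   # bit-line capacitance, 1 pF
C_D = 20e-15   # internal sense-node capacitance, 20 fF

gbw_vs = gbw(G_M, C_BL)    # voltage sensing: latch loaded by the bit-line
gbw_cs = gbw(G_M, C_D)     # current sensing: latch loaded by the sense node
speedup = gbw_cs / gbw_vs  # reduces to C_BL / C_D

print(f"GBW_VS = {gbw_vs:.3g} rad/s")
print(f"GBW_CS = {gbw_cs:.3g} rad/s")
print(f"ratio  = {speedup:.1f}")
```

With the same transconductance in both cases, the ratio depends only on the two capacitances, which is why the current-mode amplifier keeps its speed when the bit-line capacitance grows.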

To recognize the power savings associated with the switch to current sensing amplifiers, we need to examine the dynamic power dissipation of the voltage sensing amplifier. In voltage sensing, the bit-lines are discharged and charged by $dV_{BL}$ (close to 400mV) for every read operation. When this $dV_{BL}$ is combined with an increasingly large bit-line capacitance $C_{BL}$ and read frequency $f_{read}$, the power given by the following equation becomes large [60]:

$$P = f_{read} \cdot C_{BL} \cdot V_{BL}^2$$

The current sensing amplifier, on the other hand, has a negligible voltage swing, thus nearly eliminating dynamic power dissipation. Furthermore, this bit-line voltage inactivity significantly decreases cross talk between bit-lines and the supply voltage drop associated with bit-line charge-up.
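To get a feel for the scale of this saving, the power expression can be evaluated for two bit-line swings. The sketch below assumes that the squared term is the bit-line voltage swing, and it combines figures quoted in this chapter (about 400mV for voltage sensing, about 50mV in the simulated current-mode path, 100 MHz, 1pF). The pairing of these numbers is illustrative and is not a result reported in the thesis.

```python
def bitline_dynamic_power(f_read: float, c_bl: float, v_swing: float) -> float:
    """Dynamic bit-line power P = f * C * V^2, in watts."""
    return f_read * c_bl * v_swing ** 2

F_READ = 100e6   # read frequency, 100 MHz
C_BL = 1e-12     # bit-line capacitance, 1 pF

p_voltage_mode = bitline_dynamic_power(F_READ, C_BL, 0.40)  # ~400 mV swing
p_current_mode = bitline_dynamic_power(F_READ, C_BL, 0.05)  # ~50 mV swing
ratio = p_voltage_mode / p_current_mode

print(f"voltage mode: {p_voltage_mode * 1e6:.2f} uW per bit-line")
print(f"current mode: {p_current_mode * 1e6:.2f} uW per bit-line")
print(f"ratio: {ratio:.0f}x")
```

Because the swing enters as a square, an eightfold reduction in swing reduces this term by a factor of 64 under the stated model.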

## 4.5 New Current-Mode Sense Amplifier

The current-mode sense amplifier senses faster than the conventional voltage sense amplifier, and its speed is independent of the bit-line and data-line capacitances. In a conventional sense amplifier, the input nodes connect to the bit line or data line, so the read access speed always depends on the bit-line and data-line capacitances. This problem is difficult to solve because more and more cells are connected in parallel to each bit line, which results in a large bit-line load. As memory capacity grows, the increase in the number of cells cannot be avoided. If the cell data must be read in a short time, the number of switches that select the current column cannot be increased; the number of columns must therefore be reduced, and the number of cells connected in parallel to each bit line increases. A large bit-line capacitance greatly increases the RC time constant, and the speed of the sense amplifier falls as the capacitance increases. Because the current-mode sense amplifier has a low input impedance, the signal from the memory cell can be injected into the sense amplifier with only minimal charging or discharging of the bit-line capacitance. As a consequence, the voltage change on the bit line during the sense portion of a cell read access is extremely low, which eliminates the source of most voltage noise coupling problems and minimizes power supply bounce during sensing.
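The dependence of the voltage-mode delay on the number of cells per bit line can be sketched with a first-order RC model. The per-cell capacitance `C_PER_CELL` and the path resistance `R_PATH` below are assumed values for illustration only.

```python
def bitline_time_constant(n_cells: int, c_per_cell: float, r_path: float) -> float:
    """First-order RC time constant of a bit-line loaded by n_cells cells."""
    return r_path * (n_cells * c_per_cell)

C_PER_CELL = 2e-15  # assumed junction plus wire capacitance per cell, 2 fF
R_PATH = 5e3        # assumed resistance of cell and column switch, 5 kOhm

for n in (128, 256, 512):
    tau = bitline_time_constant(n, C_PER_CELL, R_PATH)
    c_bl_pf = n * C_PER_CELL * 1e12
    print(f"{n:4d} cells: C_BL = {c_bl_pf:.2f} pF, tau = {tau * 1e9:.2f} ns")
```

In this model the time constant grows linearly with the number of cells on the bit line, which is the scaling that current-mode sensing is meant to avoid.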

### 4.5.1 Circuit Description and Operation

In this section, a new current-mode sense amplifier is proposed. Its operating power during the read access cycle is lower than that of the conventional current-mode sense amplifier, and its speed is extremely high. Fig.32 presents the read data path of an n-type separated flip-flop current-mode sense amplifier. N5-N6 and P1-P2 are arranged in a manner similar to positive-feedback latches. N1 and N2 connect to the input nodes and pull the data-lines down close to the ground level. Transistors N7 and N8 are the separating transistors, and transistors N3 and N4 are the equalization transistors. The bit-line and data-line capacitances are represented by C<sub>BL</sub> and C<sub>DL</sub>, respectively, and WL and CL are the word-line and column-line selector signals, respectively. The inputs to the current-mode cross-coupled latch are at the sources of transistors N5 and N6. The low impedance at the input nodes causes the current signals on the data-lines to be injected into the cross-coupled latch without charging or discharging the data-line capacitances. Hence, the sensing speed is insensitive to both the bit-line and the data-line capacitances.



Fig.32 A current-mode sense amplifier and a simplified data path circuit


Before the sensing operation begins, the bit-lines need to be equalized to the same voltage level, as in the conventional sense amplifier. In this design, the bit-line voltage is pulled down to the ground level, which differs from conventional SRAMs.

When the sense amplifier is in the standby state, the signal "SENB" is at high level and the signal "SEN" is at low level. Under this condition, N3 and N4 are on, so they pull the drains of N5 and N6 down to low level. Hence, N5 and N6 are in the cut-off state, and P1 and P2 operate in the linear region, since their gate voltages are at low level. Because "SEN" is at low level, N7 and N8 are in the cut-off state, and no current flows through them. At this time, the voltages at the output nodes of the sense amplifier (node A and node B) are equal to the power supply voltage, the input nodes are at zero volts, and the latch nodes (the drains of N5 and N6) are discharged to low level. Hence, in the standby state, no DC current flows in the sense amplifier.

During the read operation, both the WL and CL lines are activated. "SENB" is at low level, which turns off N3 and N4, and "SEN" is at high level, which turns on the separated flip-flop. When a particular memory cell is accessed, a differential current signal appears at DL and DLB of the common data-lines. N5 has a lower V<sub>GS</sub> than N6, so the voltage at node A exceeds the voltage at node B. Moreover, the cross-coupled configuration of the amplifier implies that the source-to-gate voltage of P2 is less than that of P1. The current that flows into node A is therefore much higher than the current that flows into node B. The voltage at node A then increases further and the voltage at node B decreases. The separated flip-flop is a positive feedback loop, which regenerates the voltage to full swing and latches it. The response time of the flip-flop is very short, since the capacitance of the output node is very small.
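The standby and read conditions described above can be summarized as a small logic-level model of the control signals. This is only a bookkeeping sketch: the function names are hypothetical, and the model captures which switches conduct, not the analog behavior of the latch.

```python
def switch_states(sen: bool, senb: bool) -> dict:
    """Which NMOS switches conduct for given control levels (True = high)."""
    return {
        "N3": senb, "N4": senb,  # equalization transistors, gated by SENB
        "N7": sen, "N8": sen,    # separating transistors, gated by SEN
    }

def pullup_path_connected(states: dict) -> bool:
    """True if P1/P2 are connected to the latch nodes through N7 or N8."""
    return states["N7"] or states["N8"]

standby = switch_states(sen=False, senb=True)
read = switch_states(sen=True, senb=False)

print("standby:", standby, "pull-up path:", pullup_path_connected(standby))
print("read:   ", read, "pull-up path:", pullup_path_connected(read))
```

In standby the latch nodes are grounded through N3 and N4 while N7 and N8 isolate the pull-up devices, which matches the statement that no DC current flows in that state.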

The proposed circuit differs from conventional current sense amplifiers in the equalization transistors N3 and N4 and in the positive-feedback latch structure with the separating transistors N7 and N8. Conventional designs, for example the CBLSA or the hybrid-mode sense amplifier, use only one NMOS transistor, connected between the two output nodes, as the equalization transistor. When the equalizing signal goes high, this NMOS turns on and equalizes the charge between the two output nodes. Assume that the initial voltage levels of the output nodes are the supply voltage and the ground level. After equalization, both nodes are at half of the supply voltage. In this condition, the transistors that form the positive-feedback latch are all turned on, because their gate-source voltages are larger than their threshold voltages. Hence, a static current flows from the power supply node to ground. This static current makes the power consumption in the operating state increase with the equalization time. Moreover, when only one NMOS is used as the equalization transistor, the output nodes redistribute charge like two capacitors of the same size, so the NMOS must be large to shorten the equalization time. This scheme therefore not only slows down equalization but also increases the loading on the equalizing signal.
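The half-supply result of single-transistor equalization, and the reason it leaves the latch transistors conducting, can be checked with a simple charge-sharing calculation. The 3V supply matches the technology used in this chapter; the threshold voltages and the node capacitance are assumed values.

```python
def charge_share(c1: float, v1: float, c2: float, v2: float) -> float:
    """Final voltage after two capacitors are shorted together."""
    return (c1 * v1 + c2 * v2) / (c1 + c2)

VDD = 3.0       # supply voltage, V
VTH_N = 0.7     # assumed NMOS threshold, V
VTH_P = 0.7     # assumed PMOS threshold magnitude, V
C_OUT = 50e-15  # assumed capacitance of each output node, F

# One output node starts at VDD, the other at ground.
v_eq = charge_share(C_OUT, VDD, C_OUT, 0.0)

nmos_on = v_eq > VTH_N              # gate-source voltage of the latch NMOS
pmos_on = (VDD - v_eq) > VTH_P      # source-gate voltage of the latch PMOS
static_path = nmos_on and pmos_on   # both conduct: supply-to-ground path

print(f"equalized voltage = {v_eq:.2f} V, static path: {static_path}")
```

With equal node capacitances the result is VDD/2 regardless of the capacitance value, so the static path appears whenever both thresholds are below half the supply.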

To avoid the static current and shorten the equalization time, N7 and N8 separate the latch nodes from the output nodes, and N3 and N4 replace the single equalization transistor. During equalization, N7 and N8 turn off to cut off the current from P1 and P2, and N3 and N4 turn on to pull the latch nodes down to ground. Because the capacitance of ground is larger than the capacitance of the latch nodes, the equalization time is reduced, and N3 and N4 can be much smaller than the conventional equalization transistor. The latch nodes are at zero volts, so transistors N5 and N6 are both in the cut-off region. Hence, the current through P1 and P2 is stopped. Because N7 and N8 separate the output nodes from the latch nodes, the output capacitance does not affect the latch nodes. Even when the output capacitance is large, the capacitance at the latch nodes remains very small because only one gate load is connected to each of them. The latch operation is therefore insensitive to the output capacitance, and the small latch capacitance makes the latch time short.

When the sense amplifier is equalized, the latch nodes are connected to ground, the output nodes are connected to the supply voltage, and the input nodes are connected to ground. In this circuit structure, every node is tied to a constant voltage, so the sense amplifier has no floating node in the standby state. The absence of floating nodes reduces noise injection.

As discussed above, the new current-mode sense amplifier reduces the operating power consumption by using the separating transistors and by equalizing the latch nodes to ground. In addition, positive feedback makes the voltage difference between the output nodes grow exponentially, which achieves a high-speed read access cycle.
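The exponential growth mentioned above can be illustrated with the textbook first-order model of a regenerative latch, in which the differential voltage grows as $\Delta V(t) = \Delta V_0\, e^{t/\tau}$ with $\tau = C/g_m$. This model and all component values below are assumptions for illustration; the thesis does not give this derivation.

```python
import math

def regeneration_time(dv0: float, v_final: float, c_node: float, g_m: float) -> float:
    """Time for dv0 to grow to v_final under dV(t) = dv0 * exp(t / tau)."""
    tau = c_node / g_m
    return tau * math.log(v_final / dv0)

DV0 = 0.05       # assumed initial differential voltage, 50 mV
V_FINAL = 3.0    # full swing for a 3 V supply
G_M = 1e-3       # assumed latch transconductance, 1 mS

t_small = regeneration_time(DV0, V_FINAL, 20e-15, G_M)   # 20 fF latch node
t_large = regeneration_time(DV0, V_FINAL, 200e-15, G_M)  # 200 fF latch node

print(f"20 fF node:  {t_small * 1e12:.1f} ps")
print(f"200 fF node: {t_large * 1e12:.1f} ps")
```

In this model the regeneration time scales linearly with the node capacitance, which is consistent with the emphasis on keeping the latch-node capacitance small.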

### 4.5.2 Simulation Results

Extensive circuit simulations using HSPICE were performed to confirm the operation of the circuit and to characterize its performance. The simulation results are based on rise and fall times of 1ns. During the read cycle, differential current signals appear at the common bit-lines, as shown in Fig.33. Since no differential capacitance discharging is required to sense the cell data, these currents are transported almost instantaneously to the data-lines. The data-line voltages are kept low and equal by N1 and N2, which eliminates the need for data-line equalization during the read access. Owing to the small capacitances at nodes A and B, the response of the new current-mode sense amplifier is very fast.

The key voltages during the read operation are shown in Fig.34. When the sense amplifier is equalized, the two outputs are pulled up to the supply voltage because the latch nodes are at ground level. The voltage differences at the bit-lines and data-lines are indeed very small (about 50mV) and close to the ground level, thus reducing the power dissipation. The positive feedback of the proposed current-mode sense amplifier very rapidly amplifies the differential voltage between nodes A and B to the CMOS logic level.



Fig.33 Simulated current waveforms of the new current-sensing data path circuit



Fig.34 Simulated waveforms of the new current-sensing data path circuit

The simulation results show that the new current-mode sense amplifier reduces the operating power in two ways. The first is to use current-mode sensing instead of voltage-mode sensing, and the second is to use the separating transistors to reduce the current flow during the equalization state of the sense amplifier. Owing to the positive feedback of the cross-coupled pair in the sense amplifier, sensing can be very fast. Hence, a sense amplifier with low operating power and high speed can be implemented.

The performance of the proposed circuit is evaluated and compared with that of the hybrid current-mode sense amplifier [61] and the cross-coupled current-mirror sense amplifier [62], based on a 3V, 0.35 µm technology. The proposed current sense amplifier was simulated with its transistors sized comparably to those of the previous circuits. Fig.35 shows the effect of the bit-line capacitance on both sensing delay and average power consumption at a frequency of 100 MHz and a data-line capacitance of 1pF. Here, the sensing delay is the interval between the time when a word line becomes high and the time when the memory cell data is read and amplified to the CMOS level. All the circuits are insensitive to the bit-line capacitance, but the proposed circuit senses more quickly. The power consumption is determined from the current drawn by the read data path circuit. The average power consumption of the proposed circuit during a read operation is also less than that of the circuits in [61] and [62]. Before the read operation, the sense amplifier is in the standby state, and the transistors of the flip-flop in [61] and [62] are all turned on, which increases power dissipation. In contrast, N7 and N8 of the proposed sense amplifier isolate the flip-flop, so no DC current path exists.



Fig.35 Sensing delay and average power dissipation versus bit-lines capacitance

Fig.36 shows sensing delay and power consumption against the data-line capacitance at a bit-line capacitance of 1pF. Unlike that of the circuit described in [62], the sensing delay of the proposed circuit hardly changes as the data-line capacitance increases. The proposed circuit also provides faster sensing and lower power dissipation. The improvements in speed and power are even greater at higher data-line capacitances. In the case of C<sub>L</sub>=0.1pF with C<sub>BL</sub>=1pF and C<sub>DL</sub>=5pF, the average power consumption of the proposed circuit is 319% and 127% lower than that of the circuits described in [61] and [62], respectively, and the sensing speed of the proposed circuit is 13% and 107% faster than that of the circuits reported in [61] and [62], respectively. Hence, the proposed circuit is very suitable for high-speed, low-power and high-density SRAMs.



Fig.36 Sensing delay and average power dissipation versus data-lines capacitance


# **Chapter 5**

# **New Current-Mode Write Driver**

The bit-line power to perform a write operation can significantly exceed the read power due to a larger signal swing required to write the memory cell. Typically, the bit-line is discharged almost all the way to ground before the write is achieved. Write power can be reduced by partitioning the bit-line into small segments with very small capacitance [36] or by using a smaller voltage swing in the bit-lines to do the writes.

The key problem in achieving a write with a smaller swing is to overpower the memory cell flip-flop with a smaller bit-line differential voltage while ensuring that the cell does not erroneously flip its state during a read. Mai et al. [64] proposed a technique that biases the bit-lines near the ground voltage. When a small bit-line differential voltage is set up for the write and the word line is activated, the internal nodes of the memory cell are quickly discharged to the values of the bit-lines through the access NMOS. When the word line is turned off, the small differential voltage between the cell internal nodes is amplified by the cross-coupled PMOS in the cell. To prevent reads from overwriting the cell, the authors propose reducing the word-line voltage for the read operation, thus weakening the access NMOS and preventing a spurious discharge of the cell internal nodes. The main weakness of this approach is that read access becomes slower since the word-line voltage is smaller. Mori et al. [65] use a similar concept, except that they use a bit-line reference of Vdd/2, which is easy to generate and incurs a significantly lower read access penalty. A write is achieved by discharging one of the bit-lines to ground. Since the write bit-line swing is halved, a factor of four savings in the bit-line write power is achieved. To improve the robustness of reads, the cell voltage is boosted above Vdd.
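The factor of four follows from the $fCV^2$ model of bit-line power used in this thesis, assuming the same write frequency $f$ and bit-line capacitance $C_{BL}$ in both cases (an idealization):

$$\frac{P_{half}}{P_{full}} = \frac{f \, C_{BL} \, (V_{dd}/2)^2}{f \, C_{BL} \, V_{dd}^2} = \frac{1}{4}$$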

In this chapter, we will explore the possibility of using small voltage swings in the bit-lines while keeping the bit-line reference close to Vdd. These small swings are amplified by operating the memory cell as a latch sense amplifier to accomplish a current-mode write driver.

## 5.1 Conventional Voltage Writing Mechanism

Conventionally, the write circuit of a memory cell consists of two driving buffers. When the word line is turned on and the column switch is turned on, the driving buffers change the voltage levels of the two bit lines, and the data in the cross-coupled memory cell can then be rewritten.

Fig.37 shows how the write access cycle differs from the read access cycle: the load is not simply a resistive and capacitive load. The input driver of the write circuit changes not only the voltage level on the bit-line but also the state of the memory cell. The power consumption during the write access cycle therefore depends not only on the dynamic power in the bit-line but also on the static power in the buffer and the memory cell. With this method, the data input buffer must have more driving ability than the cross-coupled inverters in the memory cell. For a high-speed write cycle, the usual solution is to increase the size of the input buffer. However, the power dissipation, especially the static current in the input buffer, increases in proportion to the transistor size of the buffer. Moreover, the large bit-line voltage swing needed to change the cell state causes power loss in bit-line equalization. Even though the write access cycle becomes faster and the static current in the cell can be reduced, the other power consumption and the circuit area increase. Hence, a new write access mechanism is needed to achieve a high-speed write circuit with low operating power.



Fig.37 Bit-line model during write access cycle

## 5.2 Current Writing with Equalization Transistor

A conventional write operation needs a nearly full supply-voltage swing on the bit line to override the original cell data during the write access cycle. However, the bit-line voltage swing dominates the dynamic power dissipation according to fCV<sup>2</sup>. Hence, reducing the bit-line voltage swing in the write access cycle reduces the dynamic power dissipation. The writing mechanism in [18] changes the memory cell structure so that the cell can be written even when the bit-line voltage swing is extremely small.

In this method, the memory cell consists of seven transistors, including the two access transistors, as shown in Fig.38.



Fig.38 7T-memory cell

The new 7T cell is similar to the conventional 6T cell. The only difference is the equalization transistor connected between the two latch nodes of the cell. The equalization transistor brings the two latch nodes of the cell to the same voltage at the start of a write. As in the current sense amplifier, which operates by positive feedback, a small voltage difference is then enough for the cell to pull its data to the full supply-voltage swing by itself.

During the read access cycle, the equalization signal is pulled down to turn off the equalization transistor, and the 7T cell behaves the same as the conventional 6T cell. As discussed in Chapter 4, the data in the cell can be sensed by a current-mode sense amplifier with lower operating power dissipation.

During the write access cycle, the data to be written into the memory cell is controlled by the internal equalization pulse signal. Before the word line turns on, the equalization signal goes high to turn on the equalization transistor in the memory cell. The two latch nodes in the memory cell then charge or discharge to the middle of the supply voltage by charge redistribution. At this time, the two latch nodes are at the same voltage level, and the cell resembles the simplest current-mode sense amplifier. When the word line is pulled up to turn on the access transistors, the differential current in the two bit lines flows into the two latch nodes. Although the voltage difference is very small at the beginning, the cell pulls the nodes apart by itself owing to its positive feedback. Because of this equalized starting state, the access time of the write cycle is shorter than that of the conventional design. With a current conveyor circuit or any negative feedback circuit, the bit line can be held at an almost constant voltage level. The bit-line voltage swing is therefore reduced, and the dynamic power dissipation is lower than that of the conventional writing mechanism.

However, this method has several disadvantages for memory design. The first problem is that the cell has more transistors than the conventional memory cell. The 7T cell is larger than the conventional cell, not only because the transistor area increases but also because two contacts are added to the cell. To scale down the cell size, the number of contacts in the cell must be kept as small as possible because of their large area. Therefore, this cell structure makes the memory extremely large. The second problem is the power dissipation of the cell in the equalizing state. The cell must be equalized before the word line turns on, and since the latch nodes are both at the middle of the supply voltage, all the transistors in the cell are turned on at that moment. A static current then flows from the two PMOS transistors of the memory cell, so the operating power dissipation increases with the equalizing time. The third problem is that it is hard to write only one cell at a time. Because the 7T cell must be equalized at the beginning of a write, and the cells that are not written must remain stable, the equalization signal must equalize only one cell. In addition, the number of signal lines increases because each cell needs its own equalization signal, and this signal must be decoded from all the address bits so that only one cell is accessed. If the equalization signal is decoded only from the row address, one write operation destroys the data of the other cells in the same row; the same trouble arises if it is decoded only from the column address. Hence, the only way to avoid erroneous writes is to decode the equalization signal from all the address bits. The decoding time of the signal therefore becomes very long when the memory is large. Moreover, since the equalization signal must occur before the word line turns on, the access time of the whole operation increases.

Because of the three problems described above, even though the bit-line swing can be reduced to lower the dynamic power in the bit line, the overall memory access performance does not appear to be better than that of the conventional design.

## **5.3 New Current Writing Mechanism**

### 5.3.1 Current-Mode Write Driver

Conventional write operations need a nearly full supply-voltage swing on the bit-line to overwrite the original cell data during the write access cycle. However, the bit-line voltage swing dominates the power dissipation during dynamic operation. Hence, reducing the bit-line voltage swing in the write access cycle reduces the dynamic power dissipation.

Fig.39 shows the write data path of a p-type separated flip-flop current-mode write driver. The characteristics of the proposed write driver are quite similar to those of the n-type separated flip-flop current-mode sense amplifier, so no DC current flows in the standby state. During the write operation, as the full-swing voltage is applied to the input nodes of the write driver, the separating transistors transport the current to the data-lines. In the new writing mechanism, the bit-lines and data-lines are precharged to the ground level. Assume that node V1 is at high level and node V2 is at low level. When the word line is turned on, the equalizing transistor remains on and acts as an extra transistor that equalizes V1 and V2. This charge redistribution brings V1 and V2 close to each other, since the bit-line capacitance is always much larger than the capacitances of nodes V1 and V2. At that time, the equalizing transistor is turned off and the column switch is turned on, and the data is driven onto the data-line by the write driver. Since V1 and V2 are close to each other, the small differential current in the data-line can change the state of the memory cell even though the bit-line swing is very small. The new write driver circuit keeps the bit-line and data-line swing below 500mV, so the power dissipation is much less than that of the conventional voltage-mode write driver.



Fig.39 A current-mode write driver and a simplified data path circuit

### **5.3.2** New Memory Cell for Current-Mode Operation

In a conventional SRAM, the bit-line voltage swing during writing must be large, normally at the full CMOS level, to toggle the cell. The reason is that a sufficient noise margin during read is obtained by making the access transistors much weaker than the inverter transistors. In contrast, this work presents a cell with almost equally sized access and inverter transistors, which can be toggled with a small differential bit-line voltage.



Fig.40 Schematic of the memory cell

Fig.40 shows the proposed memory cell, which consists of MA1-MA2, MN1-MN2 and MP1-MP2. Its configuration is the same as that of the conventional memory cell. Since the bit-line voltage is pulled down to ground, the PMOS transistors in the cell act as driver transistors and the NMOS transistors act as load transistors. In contrast to the conventional method of operating SRAM cells, in which the bit-lines are precharged to high level, cell stability here depends not on the $\beta_N/\beta_A$ ratio but on the $\beta_P/\beta_A$ ratio, which is defined as the cell ratio. The static noise margin (SNM) of an SRAM cell is defined as the minimum dc noise voltage required to change the state of the cell. The SNM of the new memory cell is around 1.0V, as shown in Fig.41.

During read operation, the node V1 is assumed to be at a low voltage and the node V2 is at a high voltage. The low bit-line voltage does not affect V1.



Fig.41 The static transfer characteristics of the memory cell

The voltage of V2 falls by an amount that depends on the cell ratio. As long as this drop is less than the threshold voltage of the PMOS, the cell is clearly stable, because MP2 will not turn on. During the write operation, the word line is raised to high level. With the current-mode write driver, the bit lines are driven to 0V and 0.5V. Despite the low cell ratio, writing can fail if $\beta_N$ is much larger than $\beta_A$. In the new memory cell, MN is designed with the minimum device size and is almost the same size as MA. This not only makes the write operation safe but also reduces the layout area below that of the conventional memory cell. Both read and write operations are performed in the current mode, so a small differential current from the cell can be detected by the current-mode sense amplifier, while a sufficiently small differential data current can overwrite the content of the cell.

### **5.3.3** Simulation Results and Comparisons

Fig.42 shows the simulated waveform during the write operation. When WL and EQ are at high-level, V1 and V2 are close to each other. When the EQ goes low, V1 and V2 change the state rapidly. The proposed current-mode write driver makes the bit-line and data-line swing lower than 500mV and can enable the equalization current of the bit-line and the equalization time to be reduced. Hence, not only the power dissipation is reduced, but also the speed of the writing access cycle is improved.



Fig.42 Simulated waveforms of the new current-writing data path circuit

All three circuits (the conventional input buffer circuit, the circuit described in [18] and the proposed circuit) were simulated together with a simplified write-cycle-only memory system. They are compared in terms of write pulse width and average power dissipation for various data-line capacitances at a bit-line capacitance of 1pF, as shown in Fig.43. Here, the write pulse width is the interval between the time when a word line becomes high and the time when the data is written into the memory cell. The figure indicates that the proposed circuit and the circuit in [18] are independent of the data-line capacitance, whereas the conventional voltage-mode input buffer is sensitive to it. The proposed circuit has the smallest write pulse width. The power dissipation is measured from the current drawn by the write data path circuit. The average power dissipation of the proposed circuit is also less than that of the conventional circuit and the circuit in [18]. For example, at a load (C<sub>L</sub>) of 0.1pF, a frequency of 100MHz, and with $C_{BL} = C_{DL} = 1$pF, the average power dissipation of the proposed circuit is 73% and 295% lower than that of the conventional circuit and the circuit described in [18], respectively, and the write pulse width of the proposed circuit is 97% and 32% shorter than that of the conventional circuit and the circuit in [18], respectively. The data-line loading increases with the memory size, so the conventional design always uses a larger input buffer to drive data into the cell. However, the power dissipation of the driver increases with the size of the transistor. For a given buffer, the conventional design can write data into the cell only when the data-line loading is small; as the data-line loading increases, its driving ability becomes insufficient. The proposed current-mode write driver can write data into the cell even when the data-line loading is high.



Fig.43 Write pulse width and average power dissipation versus data-lines



# **Chapter 6**

# Low Power and High Speed SRAM

In memory design, the operating power dissipation is dominated by the read access circuit and the write access circuit. Chapters 4 and 5 introduced new read and write circuits that reduce power consumption and shorten the access time. In this chapter, a low-power, high-speed SRAM circuit is implemented with the new current-mode read/write circuits.

## **6.1 Low Power SRAM Architecture**

For a low-power, high-performance SRAM circuit design, the architecture is as important as the design of the read/write circuits.

| bias                | ATD<br>BLK | bias                |
|---------------------|------------|---------------------|
| RAM core<br>512x256 | WL decoder | RAM core<br>512x256 |
| r/w ckt             | dec        | r/w ckt             |

Fig.44 Architecture of low power memory chip

Fig.44 shows a simple architecture of the whole 32K x 8 static random access memory. Sense amplifiers are used to shorten the read access time of the memory. In dynamic memory design, every column has its own sense amplifier because the voltage difference is very small. However, a sense amplifier usually dissipates considerable power in both the operating and standby states, so the power dissipation in the access state increases with the number of sense amplifiers. Static memories usually use fewer sense amplifiers to reduce power dissipation, because the cell can pull the bit lines to a sufficiently large voltage difference. Hence, a hierarchical column switch is used to connect the accessed column to the sense amplifier. If the sense amplifier is a voltage-mode amplifier, the switch between the bit line and the sense amplifier increases the delay of the signal path, since the switch acts as a resistance and a capacitance. In Chapter 4, the current sensing technique was introduced to reduce the signal path delay. Because the current sense amplifier has a shorter delay, the hierarchical column switch can be used to reduce the number of sense amplifiers. The current-mode sense amplifier therefore not only shortens the access time but also reduces the operating power during sensing.

As the memory size increases, more cells are attached to each word line, which presents a large load to the word-line driver. To shorten the rise and fall times of the word line, the word-line driver must be made larger. This is difficult in practice, because every word line needs its own driver and the driver must therefore be as small as possible to limit the chip area. To reduce the word-line loading, the memory array is separated into two blocks, and each access cycle activates only one of them. Because the number of cells on a word line is half that of a single-block architecture, the loading is also halved. The dynamic power in the word line is reduced, and the rise and fall times are shortened with the same driver. In some low-power, high-performance memory designs, the array is separated into more blocks to reduce the word-line loading further. However, the chip area and the operating power in the decoding state then increase because more decoders are needed. As a trade-off between power dissipation and access time, the memory array is separated into two blocks.
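
The effect of splitting the array on the word-line load can be illustrated with a simple first-order model. The sketch below is only an illustration: the per-cell word-line capacitance `C_GATE_PER_CELL_FF` is a hypothetical placeholder, the function name `word_line_load` is introduced here, and decoder overhead is ignored. Only the column count and the 3V supply come from the chip description.

```python
# Illustrative first-order model; the per-cell capacitance is a hypothetical
# placeholder, not a value taken from the thesis.
TOTAL_COLUMNS = 512          # 2 blocks x 256 columns on one physical row
C_GATE_PER_CELL_FF = 2.0     # ASSUMED word-line load contributed by one cell (fF)
VDD = 3.0                    # supply voltage (V)


def word_line_load(num_blocks: int) -> dict:
    """Return the load and switching energy of one word line."""
    cells_per_word_line = TOTAL_COLUMNS // num_blocks
    c_wl_ff = cells_per_word_line * C_GATE_PER_CELL_FF
    # Energy drawn from the supply to charge the word line once: C * VDD^2.
    energy_fj = c_wl_ff * VDD ** 2
    return {
        "blocks": num_blocks,
        "cells_per_word_line": cells_per_word_line,
        "c_wl_fF": c_wl_ff,
        "energy_fJ": energy_fj,
    }


single = word_line_load(1)
split = word_line_load(2)
print("load ratio  :", split["c_wl_fF"] / single["c_wl_fF"])
print("energy ratio:", split["energy_fJ"] / single["energy_fJ"])
```

In this model both ratios are 0.5 regardless of the assumed per-cell capacitance, which matches the halving of the word-line loading described above. A fuller model would add the area and decoding power of the extra decoders that limit further splitting.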

## 6.2 Cell Design and Layout

As discussed in Chapter 5, the memory cell design depends on the write circuit. Since the bit lines are equalized to the ground level, the PMOS size in the memory cell is very important. Conventionally, the PMOS transistors work as resistive loads that recharge the storage nodes against leakage current. However, because the bit lines are equalized to the ground level, the PMOS transistors in the cell act as driver transistors and the NMOS transistors in the cell act as load transistors.

Two factors determine the transistor sizes. First, the PMOS transistors cannot be very small. Because the PMOS transistors in the cell act as driver transistors and the NMOS transistors act as load transistors, the drive current of a PMOS transistor must be larger than the discharge current of the access NMOS transistor. Therefore, the PMOS transistor must be larger than the minimum size, and the access NMOS transistor should be of minimum size. Second, to obtain a larger noise margin during a read access cycle, the bias voltage of the storage nodes is larger than that of the conventional cell, because the bit lines are equalized to the ground level. The NMOS transistors of the positive-feedback pair should therefore be as small as possible.



Fig.45 The layout of memory cell

Fig.45 shows the layout of the proposed memory cell, whose size is 3.9µm × 5.85µm. The layout contains the four transistors of the cross-coupled pair, with the two access transistors and the word line at the bottom. The memory cell is implemented in a 0.35µm CMOS process. Table 1 compares this cell with a typical conventional SRAM cell. Both cells are designed in the same standard 0.35µm CMOS technology. The conventional cell was operated with bit lines precharged high, a conventional sense amplifier, and a CMOS-level bit-line swing during the write operation, which is needed to overcome the high cell ratio. The new memory cell was operated as described above with a bit-line swing of no greater than 0.5V. In both cases, the bit-line load is equivalent to 512 cells. The read and write power dissipation given in Table 1 is the total power consumed by the memory cell, the bit-line precharge, the sense amplifier and the write driver during an operation cycle.

Table 1 Comparison to conventional SRAM cell

|                           | proposed<br>SRAM cell     | conventional<br>SRAM cell |
|---------------------------|---------------------------|---------------------------|
| $\beta_P$                 | $93\,\mu\text{A/V}^2$     | $46\,\mu\text{A/V}^2$     |
| $\beta_N$                 | $115\,\mu\text{A/V}^2$    | $327\,\mu\text{A/V}^2$    |
| $\beta_A$                 | $131\,\mu\text{A/V}^2$    | $131\,\mu\text{A/V}^2$    |
| power dissipation (read)  | 1.2 mW                    | 4.47 mW                   |
| power dissipation (write) | 1.5 mW                    | 2.88 mW                   |
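
The entries of Table 1 can be turned into transistor ratios and relative power savings with a few lines of arithmetic. The sketch below only restates the tabulated values. Reading the third row as the gain factor of the access transistor (here `beta_a`) is an assumption, and the helper names `ratios` and `saving` are introduced here for illustration.

```python
# Values copied from Table 1 (beta in uA/V^2, power in mW).
# ASSUMPTION: the third beta row is the access-transistor gain factor.
cells = {
    "proposed":     {"beta_p": 93.0, "beta_n": 115.0, "beta_a": 131.0,
                     "p_read": 1.2, "p_write": 1.5},
    "conventional": {"beta_p": 46.0, "beta_n": 327.0, "beta_a": 131.0,
                     "p_read": 4.47, "p_write": 2.88},
}


def ratios(cell: dict) -> dict:
    """Gain-factor ratios relative to the access transistor."""
    return {
        "n_over_a": cell["beta_n"] / cell["beta_a"],
        "p_over_a": cell["beta_p"] / cell["beta_a"],
    }


def saving(new: float, old: float) -> float:
    """Fractional reduction of `new` relative to `old`."""
    return 1.0 - new / old


for name, cell in cells.items():
    r = ratios(cell)
    print(f"{name:>12}: beta_N/beta_A = {r['n_over_a']:.2f}, "
          f"beta_P/beta_A = {r['p_over_a']:.2f}")

read_saving = saving(cells["proposed"]["p_read"], cells["conventional"]["p_read"])
write_saving = saving(cells["proposed"]["p_write"], cells["conventional"]["p_write"])
print(f"read power saving : {read_saving:.1%}")
print(f"write power saving: {write_saving:.1%}")
```

With the tabulated numbers, the conventional cell has $\beta_N/\beta_A \approx 2.5$, whereas in the proposed cell the same ratio is about 0.88, so the inverter NMOS and the access transistor are nearly equal in strength. The read and write power are lower by roughly 73% and 48%, respectively.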

## 6.3 Variation Effects on Current-Mode Circuit

Layout mismatch can easily cause current-mode circuits to malfunction. Because of geometrical mismatch introduced during fabrication, two transistors drawn with the same size are difficult to match exactly. For a very sensitive sense amplifier, the resulting offset can easily exceed the bit-line signal.

The layout of current-mode circuits therefore requires special care. The shorter the gate length, the larger the variation between two same-size transistors, so the transistors in the current-mode circuits use a gate length longer than the minimum feature size. To lay out two same-size transistors, a better method is to divide each transistor into two or more parts and place the parts of the two transistors in a crossed arrangement, as shown in Fig.46. This method reduces the effect of process variation, makes the mismatch between the two transistors smaller, and gives the current-mode circuits a larger noise margin.
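
Why a crossed placement helps can be seen with a toy one-dimensional model. The sketch below assumes a purely linear gradient of some device parameter across four equally spaced slots. The gradient, the nominal value and the placement strings `"AABB"` and `"ABBA"` are hypothetical and are not taken from Fig.46; `"ABBA"` is just one common way of crossing two split transistors.

```python
# Toy model: a device parameter (for example the threshold voltage) drifts
# linearly with position. All numbers are HYPOTHETICAL illustration values.
GRADIENT_MV_PER_SLOT = 2.0   # assumed drift per layout slot (mV)
NOMINAL_MV = 600.0           # assumed value at slot 0 (mV)


def parameter_at(slot: int) -> float:
    """Parameter value seen by a transistor part placed in `slot`."""
    return NOMINAL_MV + GRADIENT_MV_PER_SLOT * slot


def mean_parameter(slots: list) -> float:
    """Average parameter over all parts of one split transistor."""
    return sum(parameter_at(s) for s in slots) / len(slots)


def mismatch(placement: str) -> float:
    """Mean parameter of transistor A minus that of transistor B."""
    slots_a = [i for i, dev in enumerate(placement) if dev == "A"]
    slots_b = [i for i, dev in enumerate(placement) if dev == "B"]
    return mean_parameter(slots_a) - mean_parameter(slots_b)


print("side by side (AABB):", mismatch("AABB"))
print("crossed      (ABBA):", mismatch("ABBA"))
```

In this model the side-by-side placement leaves a systematic difference between the two transistors, while the crossed placement cancels the linear gradient because both transistors share the same centroid. Random, non-gradient mismatch is not captured by this sketch.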



Fig.46 Layout placement of same-size transistor

The parameters of a transistor may vary across the same die, depending on the location of the transistor. Variations such as those in impurity (doping) concentration and oxide thickness arise from non-uniformities during the manufacturing process. Fastest and slowest transistor models can be constructed by taking all the process variations into account. In the current-mode sense amplifier (Fig.32), we assume that transistors N5 and N8 follow the fastest model and N6 and N7 follow the slowest model, which makes the sense amplifier operate under the worst-case condition. Fig.47 shows the effect of the bit-line capacitance on both the sensing delay and the average power consumption under typical and worst-case conditions. With $C_{DL}$ = 1pF and $C_{BL}$ = 5pF, the average power consumption of the proposed circuit with process variations is 12.9% higher than that under the typical condition, and the sensing speed is 2.2% faster.



Fig.47 Sensing delay and average power dissipation with process variations versus bit-lines capacitance

For the current-mode write driver (Fig.39), we likewise assume that transistors P5 and P8 use the fastest transistor model and P6 and P7 use the slowest model, which makes the write driver operate under the worst-case condition. Fig.48 shows the effect of the data-line capacitance on both the write pulse width and the average power consumption under typical and worst-case conditions. With $C_{BL}$ = 1pF and $C_{DL}$ = 5pF, the average power consumption of the proposed circuit with process variations is 0.96% higher than that under the typical condition, and the write pulse width is 3.59% larger.



Fig.48 Write pulse width and average power dissipation with process variation versus data-lines capacitance

## 6.4 Experimental Results

A 32Kx8 SRAM chip was designed and fabricated in the TSMC 0.35um 1P2M CMOS logic process to evaluate the new current-mode techniques. Fig.49 displays a photomicrograph of the fabricated 256Kb SRAM. The SRAM is externally organized as 32Kx8 and internally as two banks, each containing 512 rows and 256 columns. Because the read/write circuits operate in current mode, the capacitive loading of the data lines affects performance only slightly; eight read/write circuits are therefore shared by the two banks and located in the bottom region.
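
The stated organization can be checked with simple arithmetic. The sketch below confirms that two banks of 512 rows by 256 columns hold 32K words of 8 bits, and derives the number of address bits and the column multiplexing ratio per read/write circuit. The address bit assignment in `decode` (column group in the low bits, bank in the top bit) is a hypothetical choice made only for illustration; the thesis does not specify it here.

```python
# Organization figures taken from the text.
WORDS = 32 * 1024        # external organization: 32K words
BITS_PER_WORD = 8        # eight data I/Os, one read/write circuit each
BANKS = 2
ROWS_PER_BANK = 512
COLUMNS_PER_BANK = 256

total_bits = WORDS * BITS_PER_WORD
assert total_bits == BANKS * ROWS_PER_BANK * COLUMNS_PER_BANK  # 256 Kb

address_bits = WORDS.bit_length() - 1               # log2(32K)
words_per_row = COLUMNS_PER_BANK // BITS_PER_WORD   # columns per I/O in a bank


def decode(address: int) -> tuple:
    """Split a word address into (bank, row, column group).

    HYPOTHETICAL bit assignment, used only to show that the
    address space maps one-to-one onto the physical array.
    """
    column_group = address % words_per_row
    row = (address // words_per_row) % ROWS_PER_BANK
    bank = address // (words_per_row * ROWS_PER_BANK)
    return bank, row, column_group


print("total bits       :", total_bits)
print("address bits     :", address_bits)
print("column mux ratio :", words_per_row, "per bank and I/O")
print("last address     :", decode(WORDS - 1))
```

Under these assumptions each read/write circuit is multiplexed onto 32 columns of each bank, which is the kind of shared, heavily loaded data line that the current-mode circuits are meant to tolerate.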



Fig.49 A photomicrograph of 32Kx8 SRAM

Fig.50 shows the measured waveforms of the address input and the data output at 25°C with a 3V supply voltage. The typical access time is 9ns at an output load capacitance of 30pF and the active current is 28mA at 100MHz. A shmoo plot of the address access time versus power supply voltage is shown in Fig.51. The operating voltage ranges from 2.5V to 3.6V. The maximum access time of 9ns was achieved at 3V, the target operating voltage. Table 2 lists the features of the process and the typical characteristics of the SRAM.



Fig.50 Typical address and output waveforms



Fig.51 Shmoo plot of address access time versus power supply voltage

Table 2 Process and SRAM characteristics

| Technology          | 0.35um 1P2M CMOS Logic Process |
|---------------------|--------------------------------|
| Gate length         | 0.35um                         |
| Gate oxide          | 7.5 nm                         |
| Configuration       | 32Kbx8                         |
| Cell size           | 3.9x5.85 um <sup>2</sup>       |
| Chip size           | 2.38x3.76 mm <sup>2</sup>      |
| Supply voltage      | 3V                             |
| Address access time | 9ns (30pF, 3V)                 |
| Active current      | 28mA (100MHz, 25°C)            |
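
As a consistency check on the figures in Table 2, the active power can be estimated from the supply voltage and the active current. This is a rough estimate that assumes the 28mA active current is an average drawn entirely from the 3V supply. The symbols $P_{\text{active}}$, $I_{\text{active}}$ and $E_{\text{cycle}}$ are introduced here for illustration only.

$$
P_{\text{active}} = V_{DD}\, I_{\text{active}} = 3\,\text{V} \times 28\,\text{mA} = 84\,\text{mW}
$$

$$
E_{\text{cycle}} = \frac{P_{\text{active}}}{f} = \frac{84\,\text{mW}}{100\,\text{MHz}} = 0.84\,\text{nJ}
$$

Under these assumptions, each 10ns cycle at 100MHz draws about 0.84nJ from the supply.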

# Chapter 7

### **Conclusions**

This thesis focuses on low-power and high-speed circuit techniques for SRAM design. With SRAM circuits widely used in portable electronic products, the power consumption of SRAMs has become an important design concern, and low-power design techniques for SRAMs are therefore strongly pursued. In this work, power-saving mechanisms for the memory cell array, the current-mode sensing circuit and the current-mode write driver are explored. In addition, a low-power and high-speed 32Kx8 SRAM chip is implemented.

In the first part of this thesis, a novel single-sided static random access memory cell based on the new Lambda bipolar transistor (LBT) is proposed. Owing to the characteristics of the LBT, it offers a better noise margin and a larger driving capability than the conventional single-sided CMOS memory cell. The memory cell is suitable for high-speed SRAM design.

The key to low-power operation in the SRAM data path is to reduce the signal swings on the high-capacitance nodes, namely the bit lines and the data lines. A current-mode sense amplifier is essential for obtaining low sensing power. A new high-speed, low-power current-mode sense amplifier is presented. It is based on the positive-feedback technique, and its access time is almost unaffected by the bit-line and data-line capacitances. Owing to the separated cross-coupled latch, there is no DC current path in the standby state, the voltage swings on the bit line and the data line are very small, and the average power consumption in the read operation is low. Comparative evaluations show that the new circuit gives higher sensing speed and lower power consumption than the previously reported circuits.

For the write operation, we also propose a current-mode write driver. The new write circuit produces only a small bit-line voltage swing while keeping the bit-line reference close to the ground level. These small swings are amplified by operating the memory cell as a latch-type sense amplifier, which is accomplished by a cell with almost equally sized access and inverter transistors. Based on the current-mode operation, the proposed circuit exhibits smaller average power dissipation and a shorter write pulse width than the conventional circuit.

Finally, the presented techniques were demonstrated on an experimental 32Kx8 SRAM chip fabricated in a 0.35um process technology. A new SRAM cell operation mode with significantly reduced power consumption and layout area, but with well-maintained speed, is demonstrated. The SRAM has a low power dissipation of 84mW at 100MHz under typical conditions. The typical access time is 9ns at a supply voltage of 3V and an output load capacitance of 30pF. The new current-mode techniques are suitable for realizing high-speed and low-power SRAMs.

## References

- [1] P. Barnes, "A 500MHz 64b RISC CPU with 1.5Mb on-chip cache", IEEE International Solid State Circuits Conference, Digest of Technical Papers, pp. 86-87, 1999.
- [2] S. Hesley et al., "A 7th-Generation x86 Microprocessor", *IEEE International Solid State Circuits Conference, Digest of Technical Papers*, pp.92-93, 1999.
- [3] Special issue on low power electronics, *Proceedings of the IEEE*, Vol. 83, No.4, April 1995.
- [4] S. Subbanna et al., "A High-Density 6.9 sq. um embedded SRAM cell in a High-Performance 0.25um-generation CMOS Logic Technology", *IEDM Technical Digest*, pp. 275-278, 1996.
- [5] G.G. Shahidi et al., "Partially-depleted SOI technology for digital logic", *ISSCC Digest of Technical Papers*, pp. 426-427, Feb. 1999.
- [6] H. Takagi and G Kano, "Dual depletion CMOS (D<sup>2</sup>CMOS) static memory cell", *IEEE Journal of Solid-State Circuits*, Vol.SC-12, No.4, pp.424-426, Aug. 1977.
- [7] L. Schrader and G. Meusburger, "A new circuit configuration for a static memory cell with an area of 880 um<sup>2</sup>", *IEEE Journal of Solid-State Circuits*, Vol.SC-13, No.3, pp.345-351, Jun. 1978.
- [8] M. I. Elmasry and L. R. Peterson, "A DOL CMOS static memory cell", *IEEE Journal of Solid-State Circuits*, Vol.SC-16, No.5, pp.466-471, Oct. 1981.
- [9] M. I. Elmasry and L. R. Peterson, "SDW MOSFET static memory cell", *IEEE Journal of Solid-State Circuits*, Vol.SC-16, No.2, pp.80-85, Apr. 1981.

- [10]K. Sasaki et al., "A 16-Mb SRAM with a 2.3 um<sup>2</sup> single-bit-line memory cell", *IEEE Journal of Solid-State Circuits*, Vol.SC-28, No.11, pp.1125-1130, Nov. 1993.
- [11]M. Ukita et al., "A single-bit line cross-point cell activation (SCPA) architecture for ultra-low power SRAM's", *IEEE Journal of Solid-State Circuits*, Vol.SC-28, No.11, pp.1114-1118, Nov. 1993.
- [12]Y. K. Seng and S. S. Rofail, "1.5V high speed low power CMOS current sense amplifier", *Electronics Letters*, Vol.31, pp.1991-1993, Nov. 1995.
- [13] J. Alowersson and P. Andersson, "622MHz current-mode sense amplifer", *Electronics Letters*, Vol.32, pp.154-157, Feb. 1996.
- [14]E. Scevinck, P. J. van Beers, and H. Ontrop, "Current-mode techniques for high-speed VLSI circuits with application to current sense amplifier for CMOS SRAM's", *IEEE Journal of Solid-State Circuits*, Vol.SC-26, No.4, pp.525-536, Apr. 1991.
- [15]T. N. Blalock and R. C. Jaeger, "A high-speed sensing scheme for 1T dynamic RAM's utilizing the clamped bit-line sense amplifier", *IEEE Journal of Solid-State Circuits*, Vol.SC-27, No.4, pp.618-625, Apr. 1992,.
- [16] K. Ishibashi, K. Takasugi, and K. Komiyaji, "A 6-ns 4-Mb CMOS SRAM with offset-voltage-insensitive current sense amplifiers", *IEEE Journal of Solid-State Circuits*, Vol.SC-30, No.4, pp.480-486, Apr. 1995.
- [17]P. Y. Chee, P. C. Liu, and L. Siek, "High-speed hybrid current-mode sense amplifier for CMOS SRAMs", *Electronics Letters*, Vol.38, pp.871-873, Apr. 1992.
- [18] J. S. Wang, W. Tseng, and H. Y. Li, "Low-power embedded SRAM with the current-mode write technique", *IEEE Journal of Solid-State Circuits*, Vol.SC-35, No.1, pp.119-124, Jan. 2000.

- [19]T. Chappell et al., "A 2-ns cycle, 3.8-ns access 512-Kb CMOS ECL SRAM with a fully pipelined architecture", *IEEE Journal of Solid-State Circuits*, Vol.SC-26, No.11, pp. 1577-1585, Nov. 1991.
- [20]H. Nambu et al., "A 1.8ns access, 550MHz 4.5Mb CMOS SRAM", IEEE International Solid-State Circuits Conference, Digest of Technical Papers, pp. 360-361, 1998.
- [21]G. Braceras et al., "A 350MHz 3.3V 4Mb SRAM fabricated in a 0.3um CMOS process", *IEEE International Solid-State Circuits Conference, Digest of Technical Papers*, pp. 404-405, 1997.
- [22]M. Yoshimoto et al., "A 64kb CMOS RAM with divided word line structure", *IEEE International Solid-State Circuits Conference*, *Digest of Technical Papers*, pp. 58-59, 1983.
- [23]T. Hirose et al., "A 20ns 4Mb CMOS SRAM with hierarchical word decoding architecture", *IEEE International Solid-State Circuits Conference, Digest of Technical Papers*, pp.132-133, 1990.
- [24] R. C. Jaeger, "Comments on 'An optimized output stage for MOS integrated circuits'," *IEEE Journal of Soli- State Circuits*, Vol.SC-10, No. 3, pp. 185-186, June 1975.
- [25]C. Mead and L. Conway, Introduction to VLSI systems, Reading, MA, Addison-Wesley, 1980.
- [26] N. C. Li et al., "CMOS tapered buffer", *IEEE Journal of Solid-State Circuits*, Vol.SC-25, No. 4, pp. 1005-1008, Aug. 1990.
- [27] J. Choi et al., "Design of CMOS tapered buffer for minimum power-delay product", *IEEE Journal of Solid-State Circuits*, Vol.SC-29, No. 9, pp. 1142-1145, Sep. 1994.
- [28]B. S. Cherkauer and E. G. Friedman, "A unified design methodology

- for CMOS tapered buffers", *IEEE Journal of Solid-State Circuits*, Vol.SC-3, No. 1, pp. 99-110, Mar. 1995.
- [29]M. Yoshimoto et al., "A 64kb CMOS RAM with divided word line structure", *IEEE International Solid-State Circuits Conference*, *Digest of Technical Papers*, pp. 58-59, 1983.
- [30]O. Minato et al., "2Kx8 bit Hi-CMOS static RAMs", *IEEE Journal of Solid-State Circuits*, Vol.SC-15, No. 4, pp. 656-660, Aug. 1980.
- [31]Y. Kohno et al., "A 14-ns 1-Mbit CMOS SRAM with variable bit organization", *IEEE Journal of Solid-State Circuits*, Vol.SC-29, No. 9, pp. 1060-1065, Oct. 1988.
- [32]T. Chappell et al., "A 2-ns cycle, 3.8-ns access 512-Kb CMOS ECL SRAM with a fully pipelined architecture", *IEEE Journal of Solid-State Circuits*, Vol. SC-26, No. 11, pp. 1577-1585, Nov. 1991.
- [33]H. Nambu et al., "A 1.8ns access, 550MHz 4.5Mb CMOS SRAM", IEEE International Solid-State Circuits Conference, Digest of Technical Papers, pp. 360-361, 1998.
- [34]G. Braceras et al., "A 350MHz 3.3V 4Mb SRAM fabricated in a 0.3um CMOS process", *IEEE International Solid-State Circuits Conference, Digest of Technical Papers*, pp. 404-405, 1997.
- [35]K. Nakamura et al., "A 500MHz 4Mb CMOS pipeline-burst cache SRAM with point-to-point noise reduction coding I/O", *IEEE International Solid-State Circuits Conference, Digest of Technical Papers*, pp. 406-407, 1997.
- [36] K. Osada et al., "A 2ns access, 285MHz, two-port cache macro using double global bit-line pairs", *ISSCC Digest of Technical Papers*, pp. 402-403, Feb. 1997.

- [37]K. Seno et al., "A 9-ns 16-Mb CMOS SRAM with offset-compensated current sense amplifier", *IEEE Journal of Solid-State Circuits*, Vol.SC-28, No. 11, pp.1119-1124, Nov. 1993.
- [38]O. Minato et al., "A 20ns 64K CMOS SRAM", *IEEE International Solid-State Circuits Conference, Digest of Technical Papers*, pp. 222-223, 1984.
- [39]S. Yamamoto et al., "256k CMOS SRAM with variable impedance data-line loads", *IEEE Journal of Solid-State Circuits*, Vol.SC-20, pp. 924-928, Oct. 1985.
- [40] K. J. Schultz et al., "Low-supply-noise low-power embedded modular SRAM for mixed analog-digital ICs", *Proceedings of IEEE Custom Integrated Circuits Conference*, pp.1-4, 1992.
- [41]P. Reed et al., "A 250MHz 5W RISC microprocessor with on-chip L2 cache controller", *IEEE International Solid-State Circuits Conference, Digest of Technical Papers*, pp. 412-413, 1997.
- [42]K. Sasaki et al., "A 15-ns 1-Mbit CMOS SRAM", *IEEE Journal of Solid-State Circuits*, Vol.SC-23, No. 5, pp. 1067-1071, Oct. 1988.
- [43]K. Sasaki et al., "A 7-ns 140-mW 1-Mb CMOS SRAM with current sense amplifier", *IEEE Journal of Solid-State Circuits*, Vol.SC-27, No. 11, pp. 1511-1517, Nov. 1992.
- [44]M. Matsumiya et al., "A 15-ns 16-Mb CMOS SRAM with interdigitated bit-line architecture", *IEEE Journal of Solid-State Circuits*, Vol.SC-27, No. 11, pp. 1497-1502, Nov. 1992.
- [45] R. E. Thomas, R. Haythornthwaite, and W. A. Chin, "The NEGIT: a surface-controlled negative impedance transistor", *IEEE Trans. Electron Devices*, Vol.ED-24, pp.1070-1076, 1997.

- [46] C. Y. Wu and C. Y. Wu, "An analysis and the fabrication technology of the Lambda bipolar transistor", *IEEE Trans. Electron Devices*, Vol.ED-27, pp.414-419, Feb. 1980.
- [47] C. Y. Wu and C. Y. Wu, "Characterizations and design considerations of Lambda bipolar transistor (LBT)", *IEE Proceeding*, Vol.128, pp.73-80, May. 1981.
- [48] M. M. Sarkar, M. Satyam, and A. Prabhakar, "A study of static RAM cell using the Lambda Bipolar Transistor (LBT)", *Microelectronics Journal*, Vol.28, pp.65-72, 1997.
- [49] A. Sellai et al., "Double-barrier resonant tunneling diode three-state logic", *Electron Letter*, Vol.26, pp.61-62, Jan. 1990.
- [50]F. Capasso, S. Sen, A. Y. Cho, and D. Sivco, "Resonant tunneling devices with multiple negative differential resistance and demonstration of a three-state memory cell for multiple-valued logic applications", *IEEE Electron Device Letter*, Vol.EDL-8, pp.297-299, July 1987.
- [51]S. J. Wei and H. C. Lin, "Multivalued SRAM cell using resonant tunneling diodes", *IEEE Journal of Solid-State Circuits*, Vol.SC-27, pp.212-216, Feb. 1992.
- [52] C. Y. Wu, "A new internal overvoltage protection structure for the bipolar power transistor", *IEEE Journal of Solid-State Circuits*, Vol.SC-18, pp.773-777, Dec. 1983.
- [53] C. Y. Wu and Y. F. Liu, "A high density MOS static RAM using the lambda bipolar transistor", *IEEE Journal of Solid-State Circuits*, Vol.SC-18, pp.222-224, 1983.
- [54] K. Itoh, K. Sasaki, and Y. Nakagome, "Trends in low-power RAM circuit technologies", *Proceedings of the IEEE*, Vol.83, pp.524-543, 1995.

- [55]P. Y. Chee, P. C. Liu, and L. Siek, "High-speed hybrid current-mode sense amplifier for CMOS SRAMs", *Electronics Letters*, Vol.38, pp.871-873, 1992.
- [56] K. S. Yeo, "New current conveyor for high-speed low-power current sensing", *IEE Proceedings-Circuits, Devices and Systems*, Vol.145, pp.871-873, 1998.
- [57] K. S. Yeo, W. L. Goh, Z. H. Kong, Q. X. Zhang, and W. G. Yeo, "High-performance low-power current sense amplifier using a cross-coupled current-mirror configuration", *IEE Proceedings-Circuits, Devices and Systems*, Vol.149, Oct. pp.308-314, 2002.
- [58]T. N. Blalock, "A high speed clamped bit-line current mode sense amplifier", *IEEE Journal of Solid-State Circuits*, Vol.SC-26, pp 542-548, Apr. 1991.
- [59]T. P. Haraszti, "CMOS memory circuits", *Kluwer Academic Publishers*, pp 165-275, 2000.
- [60] K. Itoh, K. Sasaki, and Y. Nakagome, "Trends in low power RAM circuit technologies", *Proceedings of the IEEE*, Vol.83, No 4, Apr. 1995.
- [61] P. Y. Chee, P. C. Liu, and L. Siek, "High-speed hybrid current-mode sense amplifier for CMOS SRAMs," *Electronics Letters*, Vol.38, No.9, pp.871-873, 1992.
- [62] K. S. Yeo, W. L. Goh, Z. H. Kong, Q. X. Zhang, and W. G. Yeo, "High-performance low-power current sense amplifier using a cross-coupled current-mirror configuration," *IEE Proceedings-Circuits*,

- Devices and Systems, Vol.149, No.516, pp.308-314, 2002.
- [63] J. Alowersson and P. Andersson, "SRAM cell for low-power write in buffer memories", *IEEE Symposium on Low Power Electronics*, pp.60-61, 1995.
- [64] K.W. Mai et al., "Low-power SRAM design using half-swing pulse-mode techniques", *IEEE Journal of Solid-State Circuits*, Vol.SC- 33, No. 11, pp. 1659-1671, Nov. 1998.
- [65]T. Mori et al., "A 1V 0.9 mW at 100 MHz 2 k\*16 b SRAM utilizing a half-swing pulsed-decoder and write-bus architecture in 0.25um dual-Vt CMOS", *IEEE International Solid-State Circuits Conference*, *Digest of Technical Papers*, pp. 354-355, 1998.



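#### Vita

Name: Shang-Ming Wang (王上銘)

Gender: Male

Date of birth: September 28, 1968 (ROC year 57)

Native place: Pingtung County, Taiwan

Address: No. 858, Sec. 1, Zhonghua Rd., Taitung City

Education: Feng Chia University, Department of Electronic Engineering

National Chiao-Tung University, Institute of Electronics, M.S. program

National Chiao-Tung University, Institute of Electronics, Ph.D. program

#### Doctoral dissertation title:

Low Power and High Speed SRAM with Current-Mode Techniques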