# A MICRO-WATT MULTI-PORT REGISTER FILE WITH WIDE OPERATING VOLTAGE RANGE

Shyh-Chyi Yang, Hao-I Yang, and Wei Hwang

Department of Electronics Engineering & Institute of Electronics, and Microelectronics and Information System Research Center (MIRC) National Chiao-Tung University, Hsin-Chu 300, Taiwan a90914@hotmail.com, haoi.ee94g@nctu.edu.tw, hwang@mail.nctu.edu.tw

Abstract - This paper presents a micro-watt multi-port register file with a wide operating voltage range for micro-power applications. A multibank architecture for simultaneous access with a collision-detecting technique is proposed. The architecture has been analyzed over a wide operating voltage range, from 1V down to 0.25V, across various process corners. A negative voltage write scheme ensures successful writes in the deep sub-threshold region. In addition, an improved read buffer footer and a controllable pre-charge scheme are designed for the read path. A 4W/4R 16KB register file is implemented in UMC 90nm CMOS technology. Simulation results show that the maximum active power of the multi-port register file is 22.3-22.9uW at 485 KHz under 0.25V.

## I. INTRODUCTION

Sub-threshold operation can reduce power consumption by orders of magnitude compared with conventional super-threshold operation. It suits applications such as medical devices, portable devices, sensor networks and wireless body area networks (WBAN), where performance is not the primary constraint.

The register file is a key component of many processors and SoC applications. Not only does its access time dominate the application speed, but its area and power also account for a large part of the chip in high-performance processor designs. To achieve sufficient bandwidth, conventional register file designs increase the number of ports on the bit-cell [1]. However, this approach gives the bit-cell a larger area, a worse noise margin, a longer access time and a limited operating voltage range. To address these issues, many techniques have been investigated to reduce the port number [2]. In this paper, a low power multibank architecture for simultaneous access with a collision-detecting technique is proposed, which also reduces the port number. The architecture has been analyzed over a wide operating voltage range, from 1V down to 0.25V. The proposed register file can be applied to superscalar architectures or VLIW (Very Long Instruction Word) DSPs.

The rest of this paper is organized as follows: Section II describes the architecture of the low power multi-port register file. Section III presents circuit techniques that improve register file performance over a wide operating voltage range. Section IV shows simulation results, and conclusions are given in Section V.

## II. ARCHITECTURE OF LOW POWER MULTI-PORT REGISTER FILE

The proposed register file has 4 read ports and 4 write ports and is organized as 4 banks of 4KB each with a bit-interleaving design, as shown in Fig. 1. The instruction issue stage can use simple control signals and addresses to determine which storage bank each execution unit accesses, so that the units can access the banks simultaneously. The main function of the switch circuit is to decide which execution unit may access each bank. If an address collision happens, the switch circuit still correctly selects the access with the higher priority and issues a collision signal back to the instruction issue stage. In this architecture, each execution unit has a higher priority on its corresponding bank; for example, execution unit 0 has a higher priority on bank 0. Consequently, each bank can simultaneously perform write and read operations for the same or different execution units, depending on the application. In other words, the register file can serve four different applications at once, or one program with simultaneous multi-access, as in a VLIW DSP.

This research is supported by National Science Council, R.O.C., under project NSC 96-2220-E-009-027. This work is also supported by Ministry of Education, R.O.C., under the project 5YT6B. The authors would like to thank ITRI, TSMC, and Ministry of Economic Affairs for their support.

The switch circuit resolves collisions and guarantees that the higher-priority access is served correctly in this multibank structure. As a result, it is not necessary to add more ports to the bit-cell, which would increase the area and power consumption of the register file. In addition, the local decoder of a bank is turned off when the switch circuit detects no access to that bank.
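The arbitration described in this section can be illustrated with a minimal behavioral sketch. It is not the authors' circuit: the names `Request` and `arbitrate` are hypothetical, the model covers one port type (read or write) for one cycle, and the tie-break among units that do not own the contested bank is an assumption, since the paper only states that each unit has priority on its own bank.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

NUM_BANKS = 4  # the paper describes four banks and four execution units


@dataclass(frozen=True)
class Request:
    unit: int  # execution unit issuing the access (0..3)
    bank: int  # bank it wants to reach (0..3)


def arbitrate(requests: List[Request]) -> Tuple[Dict[int, int], List[int], List[bool]]:
    """Resolve one cycle of accesses for one port type (read or write).

    Returns (grants, collisions, decoder_on):
      grants     maps each accessed bank to the unit that wins it,
      collisions lists the units that lost and are reported to the issue stage,
      decoder_on holds one flag per bank, False when the bank sees no access.
    """
    grants: Dict[int, int] = {}
    collisions: List[int] = []
    for bank in range(NUM_BANKS):
        contenders = [r.unit for r in requests if r.bank == bank]
        if not contenders:
            continue  # idle bank: its local decoder stays off
        # The bank's own unit has priority (from the paper). Picking the
        # lowest-numbered unit otherwise is an ASSUMPTION of this sketch.
        winner = bank if bank in contenders else min(contenders)
        grants[bank] = winner
        collisions.extend(u for u in contenders if u != winner)
    decoder_on = [bank in grants for bank in range(NUM_BANKS)]
    return grants, sorted(collisions), decoder_on
```

For example, if units 0 and 1 both target bank 0 while units 2 and 3 both target bank 3, the sketch grants bank 0 to unit 0 and bank 3 to unit 3, flags units 1 and 2 as collided, and leaves the decoders of banks 1 and 2 off.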

## III. DESIGN CONCEPT

#### A. DUAL VT 8T BIT-CELL WITH HIGH-VT PASSGATE

Fig. 2 shows the circuit diagram of the dual Vt 8T bit-cell adopted in the proposed register file. In a bit-interleaving architecture, the conventional 8T bit-cell has a worse noise margin in write half-select mode [3]. The dual Vt 8T bit-cell with high-Vt passgates increases the noise margin, which is very important for sub-threshold operation, and decreases the power in write half-select mode. If the system only operates at low voltage, such as 0.4V or 0.5V, a bit-cell designed with the same size for all gates also provides a suitable noise margin, as shown in Fig. 3. However, this sub-threshold bit-cell is larger than a conventional 8T bit-cell that only works in the high voltage region. There are two reasons: the current-driving capability of NMOS and PMOS devices in the sub-threshold region differs from that in the high voltage region, and the Vt variation caused by random dopant fluctuations (RDF) is inversely proportional to the square root of the gate area [7]. As a result, the dual Vt 8T bit-cell is almost 2X larger than the standard 8T bit-cell in [8] in order to work successfully in the sub-threshold region across process corners. Similarly, the sub-threshold 6T bit-cell of [7] is ~2X larger than a standard 6T bit-cell.

Fig. 3: Properties of dual Vt 8T bit-cell. (a) Power consumption in write half-select mode; (b) Write half-select noise margin of dual Vt 8T bit-cell compared with conventional 8T bit-cell; (c) Write margin.
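The inverse square-root dependence of RDF-induced Vt variation on gate area, cited from [7] in the sizing discussion, is commonly written in the Pelgrom form. The expression below is an illustration only: the matching coefficient $A_{V_t}$ and the gate dimensions $W$ and $L$ are symbols introduced here and do not appear in the paper.

$$
\sigma_{V_t} = \frac{A_{V_t}}{\sqrt{W L}}
\quad\Longrightarrow\quad
\frac{\sigma_{V_t}(2\,WL)}{\sigma_{V_t}(WL)} = \frac{1}{\sqrt{2}} \approx 0.71
$$

Under this model, doubling a transistor's gate area reduces its Vt spread by about 29%. The ~2X figure quoted in the paper refers to the whole cell area, so this ratio indicates the trend rather than the exact improvement of the proposed cell.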

#### B. NEGATIVE VOLTAGE WRITE SCHEME

The dual Vt 8T bit-cell improves the noise margin but decreases the write margin, as shown in Fig. 3 (c). Therefore, a negative voltage write strategy replaces the conventional write scheme in this paper. Fig. 4 (a) shows the write scheme and the proposed negative voltage generator. The negative voltage generator uses an on-chip MOS capacitance to reduce the influence of process variation and the cost. However, the capacitance of the MOS capacitor degrades in the sub-threshold region, especially in the SS corner, as shown in Fig. 5 (a). This means that more MOS capacitor area is required for successful operation as the supply voltage scales down. To decrease this area overhead, a negative voltage generator with local BL sensing logic is proposed. Fig. 4 (b) shows the improved write margin and the appropriate disturbance margin, which ensures the stability of unselected cells in the same column.

Fig. 4: Negative voltage generator with local BL sensing logic. (a) Circuit diagram; (b) Bit-line write voltage vs. VDD.

When a write operation begins, the negative voltage generator first pulls the bit-line voltage down. It generates the negative bit-line voltage only after the sensing device detects that the bit-line has reached zero volts. Similar work is reported in [4]. However, the timing used in [4] to generate the negative voltage is not suitable for sub-threshold operation: the time required to discharge the bit-line fluctuates considerably in the sub-threshold region because of the strong influence of PVT variation and of data-dependent bit-line leakage. By using the local BL sensing logic, the negative voltage generator area required for successful operation in the sub-threshold region can be reduced significantly, as shown in Fig. 5 (b).

Fig. 6: Read architecture and IREAD tracing replica circuit
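To see why triggering the negative pulse only after the bit-line reaches zero volts can save capacitor area, a first-order charge-sharing estimate is sketched below. This is an assumed ideal-capacitor model, not the circuit of Fig. 4: the function names, the capacitance values and the residual bit-line voltage are all hypothetical, and the voltage dependence of the MOS capacitance shown in Fig. 5 (a) is ignored.

```python
def boosted_bitline_voltage(vdd, c_boost, c_bl, v_bl_at_trigger=0.0):
    """Ideal estimate of the floating bit-line voltage after the driven
    plate of the boost capacitor steps from vdd down to 0 V."""
    coupling = c_boost / (c_boost + c_bl)  # capacitive divider ratio
    return v_bl_at_trigger - vdd * coupling


def required_boost_cap(vdd, c_bl, v_target, v_bl_at_trigger=0.0):
    """Boost capacitance (same unit as c_bl) needed to pull the bit-line
    from v_bl_at_trigger down to v_target under the same ideal model."""
    swing = v_bl_at_trigger - v_target  # how far the bit-line must move
    if not 0.0 <= swing < vdd:
        raise ValueError("target is unreachable with a single VDD step")
    return c_bl * swing / (vdd - swing)


# Hypothetical numbers: VDD = 0.25 V, 20 fF bit-line, -50 mV write level.
cap_after_sensing = required_boost_cap(0.25, 20.0, -0.05)
# Same target, but the pulse fires while 50 mV is still left on the bit-line.
cap_fired_early = required_boost_cap(0.25, 20.0, -0.05, v_bl_at_trigger=0.05)
```

With these illustrative values, the boost capacitor must be roughly 2.7 times larger when the pulse fires while 50 mV remains on the bit-line. A fixed-delay trigger would have to budget for that case, whereas a trigger gated by bit-line sensing would not.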

#### C. IMPROVED READ BUFFER FOOTER, CONTROLLABLE PRE-CHARGE SCHEME AND READ REPLICA CIRCUIT

In contrast with the super-threshold region, leakage is an important issue in the sub-threshold region, where the ratio of ION to IOFF declines from 10e+5 to less than 100 [5]. As the process scales down, large gate and junction leakage further degrade circuit operation. This effect also makes the 8T bit-cell fail during read operations at ultra-low voltage, because the charge on the read bit-line is drained through the read ports of the other, unselected 8T cells. Many works propose new 10T or 11T bit-cells to address the leakage issue; however, these bit-cells increase the area, power consumption and cost of the chip. One method to address the read failure of the conventional 8T bit-cell can be found in [6]: adding a read buffer footer that eliminates the leakage path allows the 8T bit-cell to read successfully. In the original design, the read buffer footer is driven by a boosted voltage rather than by a larger transistor, both for cost reasons and because transistor sizing does little to improve current drive in the sub-threshold region. Moreover, the current-driving capability of the read buffer footer determines the read access time.
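The read failure described here can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a worst case in which every unselected cell on the read bit-line leaks a full IOFF; the function name is hypothetical, and the inputs are the ION/IOFF bound quoted from [5] and the 16-256 cells per bit-line range of Fig. 8.

```python
def leakage_to_read_current(cells_per_bitline: int, ion_over_ioff: float) -> float:
    """Worst-case ratio of aggregate bit-line leakage to the selected
    cell's read current, assuming each unselected cell leaks one IOFF."""
    if cells_per_bitline < 1 or ion_over_ioff <= 0:
        raise ValueError("need at least one cell and a positive ION/IOFF")
    unselected = cells_per_bitline - 1
    return unselected / ion_over_ioff


# Sweep the column heights of Fig. 8 (b) at an ION/IOFF ratio of 100.
for cells in (16, 32, 64, 128, 256):
    ratio = leakage_to_read_current(cells, 100.0)
    print(f"{cells:3d} cells/BL: leakage = {ratio:.2f} x read current")
```

At an ION/IOFF ratio of 100, a 256-cell column gives a leakage total of about 2.55 times the read current in this estimate, so a stored value can no longer be distinguished without cutting the leakage path.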

However, this scheme limits the performance, especially in the high voltage region. Although the proposed wide operating voltage range register file mainly targets low power products, which usually operate from low voltage down to the sub-threshold region, it is still necessary to improve performance at high voltage, since the system may raise the supply voltage for a short time when higher throughput is required. Fig. 6 shows the read architecture of the proposed register file. Using only NMOS devices in the read buffer footer is sufficient because the stack effect almost eliminates the leakage. A read buffer footer without PMOS provides a shorter read access time for some data patterns because it eliminates the leakage toward IREAD, as shown in Fig. 7. A longer read access time in turn increases the leakage power consumption. The simulations in Fig. 8 compare the two footers across supply voltages and numbers of bit-cells per bit-line. The improved read buffer footer provides a large improvement except at 0.3V-0.25V, because the capacitance of the MOS capacitor degrades in the deep sub-threshold region, whereas in the high voltage region the voltage level is not the key influence on current driving.



Fig. 7: The PMOS of the read buffer footer in [6] generates leakage that increases access time in the worst pattern.



Fig. 8: Read buffer footer with PMOS increases array read access time for the pattern of Fig. 7. (a) 0.18V - 1V; (b) 16-256 cells/BL

The read replica circuit tracks the selected row's IREAD and leakage to generate the most appropriate RWL pulse width, since IREAD determines the read access time. The magnitude of IREAD and of the leakage depends on the stored data, on variation and on the operation in the corresponding row. Generating an appropriate RWL pulse width for every access is important for sub-threshold operation: in simulation, the RWL pulse width fluctuates by more than 30% when reading different columns of the same row.

In addition, a controllable pre-charge circuit is important to this read architecture. In a bit-interleaving architecture, not all columns need to perform a read operation. The controllable scheme therefore not only saves considerable array power but also decreases IREAD, which reduces the current-driving burden on the read buffer footer, especially at high voltage. With a 4-bit bit-interleaving architecture, IREAD can be reduced to 1/4 in the worst pattern. In the proposed scheme, the array access time is reduced to 42-63% and the array power to 82%, as shown in Fig. 9.
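The 1/4 figure follows from pre-charging only the columns of the word being read. The sketch below illustrates this under an assumed column layout in which physical column `c` stores bit `c // 4` of interleaved word `c % 4`; the paper does not spell out the layout, and the function name is hypothetical.

```python
from typing import List

INTERLEAVE = 4  # 4-bit bit-interleaving, as stated in the paper
WORD_BITS = 32  # word width, from the 4x128x32-bit configuration


def precharge_mask(selected_word: int) -> List[bool]:
    """Per-column pre-charge enables for one row access.

    ASSUMED layout: physical column c holds bit c // INTERLEAVE of
    interleaved word c % INTERLEAVE, so only every fourth column is read.
    """
    if not 0 <= selected_word < INTERLEAVE:
        raise ValueError("selected_word must be in 0..3")
    columns = WORD_BITS * INTERLEAVE
    return [col % INTERLEAVE == selected_word for col in range(columns)]


mask = precharge_mask(2)
active_fraction = sum(mask) / len(mask)  # share of read bit-lines pre-charged
```

Only 32 of the 128 read bit-lines are pre-charged per access in this model, so at most a quarter of the columns can draw read current through the footer in the worst pattern.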

The improved read buffer footer and the controllable pre-charge scheme decrease the read access time and power. In addition, the read replica circuit generates the shortest appropriate RWL pulse width for each access by tracking IREAD, which ensures a successful read operation every time.

## IV. SIMULATION RESULTS

The proposed 4W/4R 16KB low power multi-port register file with wide operating voltage range is implemented in UMC 90nm CMOS technology. It operates at 433MHz at 1V, 48MHz at 0.5V and 485 KHz at 0.25V. At 433MHz with 4 simultaneous accesses, it consumes 4.97mW and 2.53mW during write and read operations, respectively. At 485 KHz with 4 simultaneous accesses, it consumes 22.9uW and 22.3uW during write and read operations, respectively, as listed in Fig. 11. Most of the time, a low power application operates at 0.5V or below, and high voltage operation for performance is needed only for short periods. The proposed register file meets this requirement.
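The reported operating points can be put on a common footing by dividing power by frequency. The short calculation below uses the maximum write power and frequency values listed in Fig. 11; treating power over frequency as energy per cycle is a simplification made here, not a metric reported by the authors, and the names are hypothetical.

```python
# (supply voltage in V, max frequency in Hz, max write power in W), from Fig. 11
OPERATING_POINTS = [
    (1.0, 433e6, 4.97e-3),
    (0.5, 48e6, 823e-6),
    (0.25, 485e3, 22.9e-6),
]


def energy_per_cycle_pj(power_w: float, freq_hz: float) -> float:
    """Average energy drawn per clock cycle, in picojoules."""
    return power_w / freq_hz * 1e12


for vdd, freq, power in OPERATING_POINTS:
    print(f"{vdd:.2f} V: {energy_per_cycle_pj(power, freq):.1f} pJ per cycle")
```

This gives about 11.5 pJ, 17.1 pJ and 47.2 pJ per cycle at 1V, 0.5V and 0.25V. Power falls by roughly 217X between 1V and 0.25V while frequency falls by roughly 893X, so under this estimate the benefit of deep sub-threshold operation is the micro-watt power level rather than lower energy per access.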

Fig. 10 shows the layout micrograph of the main array of the proposed register file (4x32x32 bits, due to the available UMC shuttle, 2009). Furthermore, by increasing the MOS capacitance area of the boosting circuit and the negative voltage generator, the operating voltage of the proposed register file can scale down to 0.18V. At 0.18V, the proposed register file works successfully across process and temperature corners. However, it fails at 0.17V or below because of the large gate leakage and sub-threshold leakage in the worst corner. Fig. 12 shows that, in the deep sub-threshold region at 0.18V, the access time differs by a factor of 254 between the FF 75°C corner and the SS -15°C corner.

## V. CONCLUSIONS

A low power multibank architecture for simultaneous access with a collision-detecting technique is presented. When performance is non-critical, the supply voltage can be lowered into the sub-threshold region (<0.5V). A new dual Vt 8T bit-cell, a negative voltage write scheme with local BL sensing logic, and a read scheme with an improved read buffer footer, a controllable pre-charge scheme and a read replica circuit are proposed. A 4W/4R 16KB register file with a wide operating voltage range from 1V down to 0.25V has been designed and implemented in UMC 90nm CMOS technology. The results show that the register file operates properly at ultra-low voltage. The power consumption and operating frequency are 823uW at 48MHz under 0.5V and 22.9uW at 485 KHz under 0.25V. The proposed register file will be useful for future micro-power applications.

## VI. REFERENCES

- [1] H. Jessica, et al., "A Speculative Control Scheme for an Energy-Efficient Banked Register File," IEEE Trans., Vol. 54, No. 6, pp. 741, 2005.
- [2] S. Eric, et al., "The Parity Protected, Multithreaded Register Files on the 90-nm Itanium Microprocessor," IEEE JSSC, Vol. 41, No. 1, pp. 246, 2006.
- [3] S. Ishikura, et al., "A 45 nm 2-port 8T-SRAM Using Hierarchical Replica Bitline Technique With Immunity From Simultaneous R/W Access Issues," IEEE JSSC, Vol. 43, pp. 938, 2008.
- [4] N. Shibata, et al., "A 0.5-V 25-MHz 1-mW 256-kb MTCMOS/SOI SRAM for Solar-Power-Operated Portable Personal Digital Equipment --- Sure Write Operation by Using Step-Down Negatively Overdriven Bitline Scheme," IEEE JSSC, Vol. 41, pp. 728, 2006.
- [5] A. Wang, et al., "A 180-mV Subthreshold FFT Processor Using a Minimum Energy Design Methodology," IEEE JSSC, Vol. 40, pp. 310, 2005.
- [6] N. Verma, et al., "A 256 kb 65 nm 8T Subthreshold SRAM Employing Sense Amplifier Redundancy," IEEE JSSC, Vol. 43, pp. 141, 2008.
- [7] B. Zhai, et al., "A Variation-Tolerant Sub-200 mV 6-T Subthreshold SRAM," IEEE JSSC, Vol. 43, No. 10, pp. 2338-2348, Oct. 2008.
- [8] Y. Morita, et al., "Area Optimization in 6T and 8T SRAM Cells Considering *V*th Variation in Future Processes," IEICE Transactions on Electronics, pp. 1949-1956, Oct. 2007.



Fig. 9: Controllable pre-charge scheme reduces both power and access time in this read architecture.



Fig. 10: Layout micrograph (4x32x32 bits), with the negative voltage generator labeled.

| Configuration     | 4W/4R 4x128x32bits |       |        |
|-------------------|--------------------|-------|--------|
| Technology        | UMC 90nm CMOS      |       |        |
| Operating Voltage | 1V                 | 0.5V  | 0.25V  |
| Max. Frequency    | 433MHz             | 48MHz | 485KHz |
| Max. Read Power   | 2.53mW             | 443uW | 22.3uW |
| Max. Write Power  | 4.97mW             | 823uW | 22.9uW |

Fig. 11: The simulation results (worst corner).

Fig. 12: Access time comparison in deep sub-threshold (0.18V). With a larger MOS capacitance area in the negative voltage generator and the boost voltage circuit, the operating voltage of the proposed register file can scale down to 0.18V. In simulation, the access time varies significantly across process and temperature corners.