# Low-Power Multiport SRAM With Cross-Point Write Word-Lines, Shared Write Bit-Lines, and Shared Write Row-Access Transistors

Dao-Ping Wang, Hon-Jarn Lin, Ching-Te Chuang, Fellow, IEEE, and Wei Hwang, Fellow, IEEE

Abstract—This brief proposes one-write–one-read (1W1R) and two-write–two-read (2W2R) multiport (MP) SRAMs for register file applications in nanoscale CMOS technology. The cell features a cross-point Write word-line structure to mitigate Write Half-Select disturb and improve the static noise margin (SNM). The Write bit-lines (WBLs) and Write row-access transistors are shared with adjacent bit-cells to reduce the cell transistor count and area. The scheme halves the number of WBL, thus reducing WBL leakage and power consumption. In addition, column-based virtual  $V_{\rm SS}$  control is employed for the Read stack to reduce the Read power consumption. Post-sim results show that the proposed scheme reduces both Write/Read current consumption by over 30% compared with the previous MP structure. The proposed scheme is demonstrated and validated by an 8-Kb 2W2R SRAM test chip fabricated in TSMC 40-nm CMOS technology.

*Index Terms*—Half-Select, multiport SRAM, read path, two-port (TP).

## I. INTRODUCTION

IN SOC applications, multicore designs are frequently employed for advanced processors to achieve high performance. Thus, high-efficiency memory access among multiple cores is vital for overall system throughput and performance. A multiport (MP) register file with multiple access ports and multiple entries offers highly parallel operation to meet the high-bandwidth requirement. However, the memory cell size typically increases drastically as the number of access ports increases. Therefore, limiting the number of memory access ports to improve the memory area, power, and access time is critical. The most widely used element is the two-port (TP) SRAM because it enables parallel operation for high-speed communication, video, and register file applications in high-performance processors. Thus, an SOC that employs an MP SRAM design for parallel operations can improve efficiency [1]-[4]. Fig. 1(a) shows a four-read-four-write (4R4W) multibank register file architecture with a TP SRAM in each bank.

Digital Object Identifier 10.1109/TCSII.2013.2296137



Fig. 1. (a) 4R4W multibank register file with TP SRAM in each bank. (b) Conventional 8T TP cell. (c) Cross-point 10T TP cell.

Fig. 1(b) shows the conventional 1W1R TP 8T bit-cell. The dedicated Read port decouples the Read current path from the cell storage node to eliminate Read disturb. However, the Write operation is performed the same way as the conventional 6T SRAM; hence, the Write Half-Select disturb persists. The Write Half-Select disturb is typically mitigated using various writeback schemes [5]–[7] at the expense of extra devices/circuits, power, cycle time, and throughput.

Fig. 1(c) shows the schematic of a cross-point 10T (CP10T) cell with a cross-point Write Word-line structure. The addition of the column-based Write access transistors (NYL and NYR) eliminates the Write Half-Select disturb to improve the static noise margin (SNM) [8]. One study [9] presented a TP 6T cell with shared read/write assist transistors for each word. This design mitigates the read/write-disturb to improve the SNM. However, the bit cells of a word cannot be interleaved, thus degrading the soft error immunity. Another new TP 8T SRAM architecture with near-one-read/one-write (1R1W) capability of TP operation has been proposed [10]. This architecture provides a local write-back within one subarray, and the remaining subarrays can still perform read operations. However, the near-1R1W TP operation has one limitation; certain ranges of row addresses are unavailable for simultaneous read and write operations.

This brief proposes a 9T TP cell (see Fig. 2) featuring a cross-point Write word-line structure to eliminate Write Half-Select disturb and improve the SNM. The Write bit-lines (WBLs) and Write row-access transistors are shared with adjacent bit-cells to reduce the cell transistor count and area. The scheme halves the number of WBLs, thus reducing WBL leakage and power consumption. The design also utilizes a column-based virtual $V_{\rm SS}$ control for the Read stack to reduce the Read power consumption. An extension from the TP bit-cell to the MP bit-cell is also presented.

Manuscript received June 9, 2013; revised October 12, 2013; accepted November 29, 2013. Date of publication January 22, 2014; date of current version March 14, 2014. This brief was recommended by Associate Editor H.-J. Yoo.

The authors are with the Department of Electronics Engineering and Institute of Electronics, National Chiao-Tung University, Hsinchu City 300, Taiwan.

Color versions of one or more of the figures in this brief are available online at http://ieeexplore.ieee.org.

Fig. 2. Proposed cross-point 9T TP cell with shared WBL and shared Write row-access transistors. Four columns are shown as an example.

The rest of this brief is organized as follows: Section II discusses the proposed cross-point 9T TP cell and its operation. The layout of two adjacent TP cells with shared WBL and Write row-access transistors is also presented. Section III presents the extension to two-write/two-read (2W2R) MP cell. The measured silicon results of an 8-Kb 2W2R SRAM test chip fabricated in TSMC 40-nm CMOS technology are also shown. Section IV concludes this brief.

## II. PROPOSED CROSS-POINT 9T TP CELL

Fig. 2 shows the proposed cross-point 9T TP cell. Four columns are shown to illustrate the shared WBL and shared Write row-access transistor. The row-based Write word-line (WX) and the column-based Write word-line (WY0 to WY3) form a cross-point Write structure, thus eliminating Write Half-Select disturb to improve the SNM. The WBLs (e.g., WBL2 and WBL2B) and the corresponding Write row-access transistors (controlled by WX) are shared among adjacent cells.
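The cross-point selection described above can be sketched as a small behavioral model. This is an illustrative sketch only, not the authors' circuit: the function names `classify_cells` and `storage_node_connected` are hypothetical, and the model assumes that a cell's storage node is connected to a WBL only when both the row-based WX and the column-based WY of that cell are active (the two access devices are in series).

```python
from typing import Dict, Tuple


def storage_node_connected(wx: bool, wy: bool) -> bool:
    """Series WX and WY access devices: the path conducts only if both are on.

    Assumption for illustration: ideal switches, no leakage paths.
    """
    return wx and wy


def classify_cells(n_rows: int, n_cols: int,
                   sel_row: int, sel_col: int) -> Dict[Tuple[int, int], str]:
    """Label every cell for one Write access in a cross-point array."""
    states = {}
    for r in range(n_rows):
        for c in range(n_cols):
            wx = (r == sel_row)  # row-based Write word-line
            wy = (c == sel_col)  # column-based Write word-line
            if wx and wy:
                states[(r, c)] = "selected"
            elif wx:
                states[(r, c)] = "row-half-selected"
            elif wy:
                states[(r, c)] = "column-half-selected"
            else:
                states[(r, c)] = "unselected"
    return states


if __name__ == "__main__":
    states = classify_cells(n_rows=2, n_cols=4, sel_row=0, sel_col=1)
    for cell, state in sorted(states.items()):
        print(cell, state)
```

In this model, a row-half-selected cell (WX on, WY off) has no conducting path to its storage node, which is the property the text attributes to the cross-point structure.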

Fig. 3. (a) Butterfly curves of the conventional 8T TP cell. Blue line: Hold SNM; Red line: Write Half-Select SNM. (b) Monte Carlo simulation results of the SNM of the proposed 9T cell in Fig. 2 with 10 000 samples. Green line: Hold SNM; Orange line: Write Half-Select SNM. ($V_{\rm DD} = 0.6$ V, FNSP corner).

The adjacent cells in Column 0 and Column 1 are connected by Y-direction (column) access nMOSs controlled by WY0 and WY1. They share a common X-direction (row) access nMOS controlled by WX. The shared nMOS controlled by WX connects to the common WBL, WBLB0, which is shared between Column 0 and Column 1. Similarly, the adjacent cells in Column 1 and Column 2 are connected by Y-direction (column) access nMOSs controlled by WY1 and WY2. They share a common X-direction (row) access nMOS controlled by WX.

The shared nMOS controlled by WX connects to the common WBL2, which is shared by Column 1 and Column 2. As shown in Fig. 2, the WBL pair for Column 1 is WBLB0 and WBL2. The WBL pair for Column 2 is WBL2 and WBLB2. The even-numbered WBL pairs are retained, whereas the odd-numbered WBL pairs are omitted. Thus, the number of WBLs is halved compared with that of the conventional TP 8T cell array. The proposed scheme reduces the cell array area, WBL leakage, and power consumption, compared with the cross-point 10T TP cell shown in Fig. 1(c).
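The WBL sharing pattern above can be illustrated with a short sketch that maps a column index to its WBL pair and counts the physical WBLs. The helper names are hypothetical, and the mapping for columns other than Column 1 and Column 2 is an assumption extrapolated from the even-retained/odd-omitted rule; in particular, the extra line at the array edge (e.g., WBL4 for four columns) is a guess about boundary handling that the text does not specify.

```python
from typing import Tuple


def wbl_pair(col: int) -> Tuple[str, str]:
    """Return the WBL pair used by a column under the shared-WBL scheme.

    Even columns keep their own pair; odd columns borrow the neighbouring
    even-numbered lines (assumed generalisation of the Column 1 example).
    """
    if col % 2 == 0:
        return (f"WBL{col}", f"WBLB{col}")
    return (f"WBLB{col - 1}", f"WBL{col + 1}")


def shared_wbl_count(n_cols: int) -> int:
    """Number of distinct WBLs needed by n_cols columns with sharing."""
    lines = set()
    for col in range(n_cols):
        lines.update(wbl_pair(col))
    return len(lines)


def conventional_wbl_count(n_cols: int) -> int:
    """A conventional TP 8T array uses one WBL pair per column."""
    return 2 * n_cols


if __name__ == "__main__":
    for n in (4, 64):
        print(n, conventional_wbl_count(n), shared_wbl_count(n))
```

Under these assumptions, 64 columns need 65 shared lines instead of 128, which is consistent with the roughly halved WBL count stated in the text.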

### A. SNM of Proposed 9T Cell

Fig. 3(a) shows the butterfly curves for Hold SNM and Write Half-Select SNM of the conventional 8T TP cell at  $V_{\rm DD} = 0.6$  V, FNSP corner (where the SNM is the worst).

The degradation in the SNM due to Write Half-Select disturb can be clearly seen. Fig. 3(b) shows the Monte Carlo simulation results of the distribution of the SNM of the proposed 9T cell with 10 000 samples at  $V_{\rm DD} = 0.6$  V, FNSP corner. The green curve represents the case with both WX and WY deactivated, corresponding to Hold SNM. The orange curve represents the case with WX activated while WY remains deactivated, corresponding to Write Half-Select SNM. The Write Half-Select SNM (Mean = 132.6 mV and  $\sigma = 18.9$  mV) can be seen to almost completely coincide with the Hold SNM (Mean = 135 mV and  $\sigma = 18.9$  mV), thus validating the disturb-free nature of the proposed 9T cell.
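As a quick check of how closely the two distributions coincide, the reported means and standard deviation can be compared directly. The symbol $\Delta_{\rm mean}$ is introduced here only for illustration:

$$
\Delta_{\rm mean} = 135\ \text{mV} - 132.6\ \text{mV} = 2.4\ \text{mV}, \qquad \frac{\Delta_{\rm mean}}{\sigma} = \frac{2.4\ \text{mV}}{18.9\ \text{mV}} \approx 0.13
$$

The shift in the mean is therefore only about 13% of one standard deviation.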



Fig. 4. Layout of two adjacent 9T TP cells in Column 0 and Column 1.



Fig. 5. (a) Column-based virtual  $V_{\rm SS}$  control for the Read stack. (b) Write/read current reduction ratio versus  $V_{\rm DD}$  for a 4-Kb TP SRAM.

### B. Layout Implementation

Fig. 4 shows the layout of two adjacent 9T TP cells in Column 0 and Column 1. The WBLB0 is shared between Column 0 and Column 1, as shown in Fig. 2. Therefore, the number of WBLs is halved. Furthermore, WX controls only one Write row-access nMOS for each cell. Therefore, the row-based Write WL gate capacitance of the 9T cell array is also halved.

### C. Column-Based Virtual V<sub>SS</sub> Control for Read Stack

Fig. 6. Illustration of the Write-disturb-free nature of the proposed 2W2R MP cell. Two adjacent 2W2R MP cells in the same row are shown.

To reduce Read current and Read power consumption, column-based virtual $V_{\rm SS}$ control for the Read stack is employed, as shown in Fig. 5(a). When the column control signal RY[n] is activated, RYnB goes down to 0 V to enable the corresponding Read stack. Thus, only the selected Read BL (RBL) can discharge from $V_{\rm DD}$ to ground when the cell storage node is "1," and the other (unselected) Read BLs cannot discharge. The proposed bit-interleaving design selects only one out of eight RBLs. The column-based virtual $V_{\rm SS}$ control reduces Read current by over 30% for a 4-Kb TP SRAM, as shown in Fig. 5(b), which was simulated at 25 °C, 25 MHz to accommodate 0.6-V low-voltage operation.
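The effect of the column-based virtual $V_{\rm SS}$ control on RBL activity can be sketched behaviorally. This is a simplified model, not the authors' circuit: the function name `discharging_rbls` is hypothetical, and the ungated case assumes that every Read stack on the accessed row is enabled, as in a conventional 8T array with a row-wide Read word-line.

```python
from typing import List


def discharging_rbls(stored_bits: List[int], selected_col: int,
                     column_gated: bool) -> List[int]:
    """Return the column indices whose RBL discharges in one Read cycle.

    stored_bits  : storage-node values of the cells on the accessed row
    selected_col : column whose RY[n] is activated (RYnB pulled to 0 V)
    column_gated : True models the column-based virtual VSS control
    """
    discharged = []
    for col, bit in enumerate(stored_bits):
        # With gating, only the selected column's Read stack has a ground path.
        stack_enabled = (col == selected_col) if column_gated else True
        # Per the text, an RBL discharges when the storage node is "1".
        if stack_enabled and bit == 1:
            discharged.append(col)
    return discharged


if __name__ == "__main__":
    row = [1, 0, 1, 1, 0, 1, 1, 1]  # eight interleaved columns (example data)
    print(discharging_rbls(row, selected_col=2, column_gated=False))
    print(discharging_rbls(row, selected_col=2, column_gated=True))
```

With gating, at most one of the eight RBLs discharges per access, which is the mechanism behind the Read current reduction; the model does not attempt to reproduce the reported percentage.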

## III. EXTENSION TO 2W2R MP SRAM

The proposed scheme can be extended to 2W2R MP SRAM design in a 4R4W register file. Fig. 6 shows two proposed 2W2R MP cells in one row [11]. When Cell\_1 writes "0" from A-port, the WBL for B-port WBBL0 remains precharged. In this case, WAY0 and WBY0 are deactivated because Cell\_0 is not selected for Write operation, and the path to the storage node of Cell\_0 is cut. Therefore, Cell\_0 maintains its hold state and avoids the Write-disturb problem. XCut and YCut are used to cut off the feedback loop of the selected bit-cell to enhance the Write-ability. When either A-port WAX0 or B-port WBX0 is selected, X0Cut becomes "High" to turn off the pMOS. YCut is a NOR function of WAY and WBY. When either WAY or WBY is selected, YCut is deactivated to turn off the nMOS. The series resistance of the Y-access MOS and the Write WL-access transistor raises the access time delay by 3%. The shared WBL scheme halves the number of WBLs, thus reducing the Write current consumption by 30%.
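The control signals described above can be written out as simple Boolean functions. This is a minimal logic-level sketch with hypothetical helper names; it models only the signal polarities stated in the text (XCut as an OR of the row selects, YCut as a NOR of the column selects) and assumes that a port's write path to a cell conducts only when both its X and Y selects are active.

```python
from typing import Tuple


def cut_signals(wax: bool, wbx: bool, way: bool, wby: bool) -> Tuple[bool, bool]:
    """Return (XCut, YCut) for one cell.

    XCut goes High when either port selects the row (turns off the pMOS).
    YCut is NOR(WAY, WBY): it goes Low when either port selects the column
    (turns off the nMOS).
    """
    x_cut = wax or wbx
    y_cut = not (way or wby)
    return x_cut, y_cut


def write_path_open(wx: bool, wy: bool) -> bool:
    """Assumed series X/Y access: the port reaches the cell only if both are on."""
    return wx and wy


if __name__ == "__main__":
    # Cell_1 is written from the A-port; Cell_0 sits in the same row.
    print("Cell_0:", cut_signals(wax=True, wbx=False, way=False, wby=False))
    print("Cell_1:", cut_signals(wax=True, wbx=False, way=True, wby=False))
```

For the example in the text, Cell\_0 sees XCut High but YCut still High, and neither port has an open write path to it, whereas Cell\_1 sees both cut conditions asserted.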

Fig. 7(a) shows the 8-Kb MP architecture and the conflict resolving scheme in the 2W2R MP SRAM. To eliminate the conflict problem, a conflict-detecting circuit is included. The Read priority is permanently set higher than the Write priority. When Read and Write addresses are the same, only the Read operation is active, and the Write operation is skipped. At the completion of conflict detection, True\_WEN signal is generated to trigger the next stage.
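The Read-over-Write priority rule can be sketched as a small function. This is an illustrative sketch only: the function name `true_wen` echoes the True\_WEN signal in the text, but the argument layout and the assumption that each Write address is compared against every enabled Read port are hypothetical, and Write-versus-Write conflicts are not modeled because the text does not describe them.

```python
from typing import List, Tuple


def true_wen(write_enable: bool, write_addr: int,
             read_ops: List[Tuple[bool, int]]) -> bool:
    """Return the effective write enable after conflict detection.

    read_ops is a list of (read_enable, read_addr), one entry per Read port.
    A Write is skipped when any enabled Read targets the same address.
    """
    conflict = any(ren and raddr == write_addr for ren, raddr in read_ops)
    return write_enable and not conflict


if __name__ == "__main__":
    reads = [(True, 0x12), (False, 0x30)]
    print(true_wen(True, 0x12, reads))  # same address as an active Read
    print(true_wen(True, 0x30, reads))  # matching Read port is disabled
```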

Fig. 7(b) shows the Read data path with a replica column. Fig. 7(c) shows the chip photo of an 8-Kb 2W2R MP SRAM in TSMC 40-nm CMOS technology. The 8-Kb SRAM chip consists of two banks. Each bank has a 4-Kb capacity, with 64 rows and 64 columns. The MP SRAM contains two Write ports, namely, the Write-A port and the Write-B port, on the upper section. It also contains two Read ports, namely, the Read-A port and the Read-B port, on the lower section. The test chip area is 0.9701 mm<sup>2</sup>.
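The stated organization can be checked with a few lines of arithmetic. This is only a consistency check on the numbers in the text; the function name `capacity_bits` is a hypothetical helper, and 1 Kb is taken as 1024 bits.

```python
def capacity_bits(banks: int, rows: int, cols: int) -> int:
    """Total number of bit-cells in a banked array."""
    return banks * rows * cols


if __name__ == "__main__":
    per_bank = capacity_bits(banks=1, rows=64, cols=64)
    total = capacity_bits(banks=2, rows=64, cols=64)
    print(per_bank // 1024, "Kb per bank")
    print(total // 1024, "Kb total")
```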

A simulation at $V_{\rm DD} = 0.9$ V exercised MP A/B-port write/read at the same row, with the A-port activating Column 0 and the B-port activating Column 7. The corresponding waveform is shown in Fig. 8(a). The Shmoo plot of the 2W2R 8-Kb MP SRAM, showing the working ranges of array $V_{\rm DD}$ (avdd) and peripheral $V_{\rm DD}$ (vdd), is shown in Fig. 8(b). The working array $V_{\rm DD}$ ranges from 1.4 V down to 0.5 V, illustrating and validating the low-voltage low-power capability of the proposed scheme. Fig. 8(c) shows measured and simulated current versus $V_{\rm DD}$ with TP Write and TP Read operation of the 8-Kb MP SRAM. The measured current is close to the simulated current for array $V_{\rm DD}$ ranging from 0.6 to 1.3 V.

Fig. 7. (a) Conflict-resolving scheme in the 2W2R MP SRAM. (b) Read data path. (c) Chip photo of the 40-nm 2W2R 8-Kb MP SRAM.

Fig. 8. (a) Simulation waveform of MP A/B port write/read test pattern at the same row. (b) Shmoo plot of 2W2R 8-Kb MP SRAM showing working ranges of array $V_{\rm DD}$ (avdd) and peripheral $V_{\rm DD}$ (vdd). (c) Measured current and simulated current versus $V_{\rm DD}$ with TP Write and TP Read operation of the 8-Kb MP SRAM.

TABLE I THE RELATED MULTI-PORT SRAM COMPARISON

| Item | JSSC [6] (2008) | ISQED [9] (2010) | ISLPED [10] (2011) | This work |
|------|-----------------|------------------|--------------------|-----------|
| Port | 1W1R | 1W1R | 1W1R | 2W2R |
| Write-disturb free | No | Yes | No | Yes |
| Half-select solution | Write-back | Extra sub-WL driver per 32 bits | Local sub-array write-back | Shared BL structure |
| Read-port feature | Single-ended BL | Read assist transistor | Single-ended BL | Selective read path |
| SRAM macro area | 1x | 1.2x | 1.06x | 1.027x |
| Operational frequency | 1x | 0.9x | 1x | 1x |
| Power consumption | 1x | 0.72x | 1x | 0.68x |
| Bit-cell leakage at 0.6 V $V_{\rm DD}$ | 1x | 0.65x | 1x | 1.29x |
| Technology | 65nm SOI CMOS | 65nm Bulk CMOS | 45nm Bulk CMOS | 40nm Bulk CMOS |
| Applications | High performance caches | Low power embedded systems | DVFS enabled processors | Low power processor |

Table I compares the characteristics of the proposed MP SRAM with those from the relevant literature, including Write-disturb, SRAM macro area (including the cell array and the periphery circuit), and power consumption. Only the proposed scheme and that in ISQED [9] are free of Write-disturb, but the cell structure in [9] does not support bit-interleaving and is thus prone to soft errors. Reference [10] provides write-back in a subarray to overcome write-disturb, but some ranges of row addresses are not available for simultaneous read and write operation. In the SRAM macro area comparison, this work occupies 1.027x the baseline area because of the additional write-enhancement control transistors in the bit-cell. The macro area in [9] is 1.2x because of the extra local sub-word-line driver per 32 bits. In the operation frequency comparison, [9] has a lower operation frequency because of the modified read-port and the stacking structure in the read-port, which slows down the read performance. The proposed scheme has the lowest power consumption at 0.68x compared with the conventional 2W2R SRAM. Table I illustrates that the proposed scheme offers the least area overhead and the most power savings. For the power estimates, the prior works are assumed to operate at the same clock frequency as this work, and their equivalent power consumption is calculated at 400 MHz, 25 °C.
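The normalized figures in Table I can be turned into percentage savings and overheads with a short script. The dictionary below simply transcribes the area, power, and Write-disturb rows of Table I; the helper names are hypothetical, and the percentages are plain arithmetic on the normalized values rather than additional measurements.

```python
TABLE_I = {
    "JSSC[6]":    {"area": 1.0,   "power": 1.0,  "disturb_free": False},
    "ISQED[9]":   {"area": 1.2,   "power": 0.72, "disturb_free": True},
    "ISLPED[10]": {"area": 1.06,  "power": 1.0,  "disturb_free": False},
    "This work":  {"area": 1.027, "power": 0.68, "disturb_free": True},
}


def power_saving_percent(normalized_power: float) -> float:
    """Saving relative to the 1x baseline, in percent."""
    return round((1.0 - normalized_power) * 100.0, 1)


def area_overhead_percent(normalized_area: float) -> float:
    """Overhead relative to the 1x baseline, in percent."""
    return round((normalized_area - 1.0) * 100.0, 1)


def lowest_power(table: dict) -> str:
    """Name of the entry with the smallest normalized power."""
    return min(table, key=lambda name: table[name]["power"])


if __name__ == "__main__":
    for name, row in TABLE_I.items():
        print(name,
              power_saving_percent(row["power"]),
              area_overhead_percent(row["area"]))
    print("Lowest power:", lowest_power(TABLE_I))
```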

## IV. CONCLUSION

This brief has presented 1W1R and 2W2R MP SRAM designs with improved low-voltage/power capability for register file applications. The proposed cell features a cross-point Write word-line structure to mitigate Write Half-Select disturb. The WBLs and Write row-access transistors are shared with adjacent bit-cells to reduce the cell transistor count and area. The scheme halves the number of WBLs, thus reducing WBL leakage and power consumption. Column-based virtual $V_{\rm SS}$ control is employed for the Read stack to reduce the Read power consumption. The proposed scheme offers superior area overhead and power saving compared with previous MP structures. An 8-Kb 2W2R SRAM test chip is demonstrated in 40-nm CMOS technology.

## REFERENCES

- [1] G. S. Ditlow, R. K. Montoye, S. N. Storino, S. M. Dance, S. Ehrenreich, B. M. Fleischer, T. W. Fox, K. M. Holmes, J. Mihara, Y. Nakamura, S. Onishi, R. Shearer, D. Wendel, and L. Chang, "A 4R2W register file for a 2.3 GHz wire-speed POWER processor with double-pumped write operation," in *Proc. IEEE Int. Solid-State Circuits Conf.*, 2011, pp. 256–258.
- [2] K. Nii, Y. Tsukamoto, M. Yabuuchi, Y. Masuda, S. Imaoka, K. Usui, S. Ohbayashi, H. Makino, and H. Shinohara, "Synchronous ultra-high-density 2RW dual-port 8T-SRAM with circumvention of simultaneous common-row-access," *IEEE J. Solid-State Circuits*, vol. 44, no. 3, pp. 977–986, Mar. 2009.
- [3] Y. Ishii, H. Fujiwara, K. Nii, H. Chigasaki, O. Kuromiya, T. Saiki, A. Miyanishi, and Y. Kihara, "A 28-nm dual-port SRAM macro with active bitline equalizing circuitry against write disturb issue," in *Proc. IEEE Symp. VLSI Circuits*, 2010, pp. 99–100.
- [4] Y. Ishii, H. Fujiwara, S. Tanaka, Y. Tsukamoto, K. Nii, Y. Kihara, and K. Yanagisawa, "A 28 nm dual-port SRAM macro with screening circuitry against write-read disturb failure issues," *IEEE J. Solid-State Circuits*, vol. 46, no. 11, pp. 2535–2544, Nov. 2011.
- [5] T. Suzuki, H. Yamauchi, Y. Yamagami, K. Satomi, and H. Akamatsu, "A stable 2-port SRAM cell design against simultaneously read/write-disturbed accesses," *IEEE J. Solid-State Circuits*, vol. 43, no. 9, pp. 2109–2119, Sep. 2008.
- [6] L. Chang, R. K. Montoye, Y. Nakamura, K. A. Batson, R. J. Eickemeyer, R. H. Dennard, W. Haensch, and D. Jamsek, "An 8T-SRAM for variability tolerance and low-voltage operation in high-performance caches," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 956–963, Apr. 2008.
- [7] J. J. Wu, Y. H. Chen, M. F. Chang, P. W. Chou, C. Y. Chen, H. J. Liao, M. B. Chen, Y. H. Chu, W. C. Wu, and H. Yamauchi, "A large σV<sub>TH</sub>/VDD tolerant zigzag 8T SRAM with area-efficient decoupled differential sensing and fast write-back scheme," *IEEE J. Solid-State Circuits*, vol. 46, no. 4, pp. 815–827, Apr. 2011.
- [8] I. J. Chang, J. J. Kim, S. P. Park, and K. Roy, "A 32 kb 10T sub-threshold SRAM array with bit-interleaving and differential read scheme in 90 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 2, pp. 650–658, Feb. 2009.
- [9] J. Singh, D. S. Aswar, S. P. Mohanty, and D. K. Pradhan, "A 2-Port 6T SRAM bitcell design with multi-port capabilities at reduced area overhead," in *Proc. 11th Int. Symp. Quality Electron. Design*, 2010, pp. 131–138.
- [10] S. P. Park, S. Y. Kim, D. Lee, J. J. Kim, W. P. Griffin, and K. Roy, "Column-selection-enabled 8T SRAM array with ~1R/1W multi-port operation for DVFS-enabled processors," in *Proc. Int. Symp. Low Power Electron. Design*, 2011, pp. 303–308.
- [11] D. P. Wang and W. Hwang, "A two-write and two-read multi-port SRAM with shared write bit-line scheme and selective read path for low power operation," *J. Low Power Electron.*, vol. 9, no. 1, pp. 9–22, Apr. 2013.