# A Single-Ended Disturb-Free 9T Subthreshold SRAM With Cross-Point Data-Aware Write Word-Line Structure, Negative Bit-Line, and Adaptive Read Operation Timing Tracing

Ming-Hsien Tu, Jihi-Yu Lin, Ming-Chien Tsai, Chien-Yu Lu, Yuh-Jiun Lin, Meng-Hsueh Wang, Huan-Shun Huang, Kuen-Di Lee, Wei-Chiang (Willis) Shih, Shyh-Jye Jou, *Senior Member, IEEE*, and Ching-Te Chuang, *Fellow, IEEE*

*Abstract*—This paper presents a novel single-ended disturb-free 9T subthreshold SRAM cell with cross-point data-aware Write word-line structure. The disturb-free feature facilitates bit-interleaving architecture, which can reduce multiple-bit upsets in a single word and enhance soft error immunity by employing Error Checking and Correction (ECC) technique. The proposed 9T SRAM cell is demonstrated by a 72 Kb SRAM macro with a Negative Bit-Line (NBL) Write-assist and an adaptive Read operation timing tracing circuit implemented in 65 nm low-leakage CMOS technology. Measured full Read and Write functionality is error free with $V_{\rm DD}$ down to 0.35 V ($\sim$0.15 V lower than the threshold voltage) with 229 kHz frequency and 4.05 μW power. Data is held down to 0.275 V with 2.29 μW Standby power. The minimum energy per operation is 4.5 pJ at 0.5 V. The 72 Kb SRAM macro has a wide operation range from 1.2 V down to 0.35 V, with an operating frequency of around 200 MHz for $V_{\rm DD}$ around/above 1.0 V.

*Index Terms*—Low power, low voltage, negative bit-line (BL), subthreshold SRAM cell, timing tracing.

Manuscript received September 23, 2011; revised January 23, 2012; accepted January 25, 2012. Date of publication April 13, 2012; date of current version May 22, 2012. This paper was approved by Associate Editor Peter Gillingham. This work was supported in part by the Ministry of Economic Affairs (99-EC-17-A-01-S1-124), and the Ministry of Education under the ATU program. The authors are with the Electronics Engineering Department and Institute of Electronics, National Chiao Tung University, Hsinchu, 300 Taiwan (e-mail: minghsien.ee95g@gmail.com). Y.-J. Lin, M.-H. Wang, H.-S. Huang, K.-D. Lee, and W. Shih are also with Faraday Technology Corporation, Hsinchu, Taiwan. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JSSC.2012.2187474

## I. INTRODUCTION

Recently, the demand for ultra-low power dissipation battery-operated devices is increasing. If the performance at low supply voltage ($V_{\rm DD}$) can still meet the system requirements, the system power dissipation can be reduced significantly by scaling down the supply voltage. Fig. 1 shows the measured oscillation frequency, power dissipation and energy per oscillation of a 399-stage NAND-type ring oscillator using 65 nm low leakage CMOS process with threshold voltage ($V_{\rm TH}$) around 0.5 V. The total power and leakage power decrease drastically with $V_{\rm DD}$ scaling, and leakage power dominates the total power in deep subthreshold region even in low leakage process. Total energy per oscillation decreases first with $V_{\rm DD}$ scaling. However, as the leakage energy starts to dominate with $V_{\rm DD}$ near/below the threshold voltage, a minimum energy point is formed near the threshold voltage.

Fig. 1. Measured (a) oscillation frequency, power, and (b) energy per oscillation of 399-stage NAND-type ring oscillator versus supply voltage.

A circuit can achieve ultra-low power dissipation by operating in the subthreshold region, but the circuit must face the challenges of significantly degraded Ion/Ioff ratio and the large Process, Voltage, Temperature (PVT) variations in the subthreshold region.
For example, typical Ion variation in super-threshold region is less than a factor of 2, while that in subthreshold region is several orders of magnitude.

SRAM is a critical component in memory-rich SoCs today. The conventional 6T SRAM cell achieves large storage capability with simple structure, yet suffers from Read disturb, Half-Select disturb, and the conflicting Read/Write requirements [1]. As such, the stability of 6T SRAM degrades significantly with $V_{\rm DD}$ scaling, and its $V_{\rm MIN}$ dictates the overall system power supply and hence power consumption.

Various SRAM cells [2]–[35] have been proposed to enhance stability of SRAM cell for robust low voltage/power operation. In [2]–[6], asymmetric SRAM cells enhance the Read stability by weakening one side of the NMOS pull-down transistors with single Read port to mitigate Read disturb. In [7], [8], Schmitt-trigger-based SRAM cells are formed by applying half Schmitt trigger in the pull-down path of an SRAM cell. The feedback mechanism of the half Schmitt trigger raises the trip voltage of the cross-coupled inverters unidirectionally, thereby reducing Read disturb to mitigate stability degradation. However, the SRAM cells in [2]–[8] still suffer from Read disturb. In [9], feedback-cutoff NMOS transistors are used to isolate cell storage node from Read BL. However, feedback-cutoff NMOS transistors also cause floating storage node, which is easily affected by leakage current and coupling noise [44], especially in subthreshold region. In [10]–[33], various Read buffers are used to decouple storage nodes of cells from BLs to eliminate Read disturb, thus achieving Read SNMs equal to Hold SNMs.

Compared with super-threshold operation, in subthreshold region, alpha-particles or energetic cosmic rays can potentially induce soft errors more easily as $Q_{\rm crit}$ is reduced, and Multiple Cell Upsets (MCU) may occur more frequently [37].
MCU can be reduced effectively by combining bit-interleaving architecture with Error Checking and Correction (ECC) technique [40]. The SRAM designs in [45], [46] use ECC technique to reduce soft errors and meet the required yield for low-voltage operation. However, since the cells in [10]–[29] use the same Write mechanism as the conventional 6T SRAM cell, the half-selected cells on the selected Word-Line (WL) perform dummy Read operation, thus degrading Write-Half-Select (WHS) stability and making these cells unsuitable for bit-interleaving architecture. WHS disturb can be eliminated by using cross-point Write structure, where both the row-based WL and column-based Write WL (WWL) of a selected cell must be enabled for Write operation [32]–[36]. However, since the cells in [34]–[36] eliminate only Write-Half-Select disturb, they still suffer stability degradation due to Read disturb.

For robust subthreshold operation, this paper presents a novel single-ended disturb-free 9T subthreshold SRAM cell with the following features: (i) cross-point Write structure with data-aware column-based Write WL to eliminate WHS disturb, (ii) Read buffer for Read stability enhancement, and (iii) single BL for Read/Write to improve density and BL power dissipation. The details and operation of the 9T SRAM cell are described in Section II. Section III discusses the two employed Write/Read-assist circuits: (i) a variation-tolerant and area-efficient Negative BL (NBL) scheme for Write-ability enhancement and (ii) an Adaptive Read Operation Timing Tracing (AROTT) circuit for robust subthreshold operation. Section IV explains the architecture and design considerations of a 72 Kb SRAM macro implemented in 65 nm low-leakage CMOS technology. Section V presents measurement results to verify the 9T SRAM performance. Section VI concludes the paper.

## II. THE PROPOSED 9T SRAM CELL

Fig. 2. (a) Schematic, (b) cell layout, and (c) timing diagram of the proposed 9T SRAM cell.

Fig.
2(a) and (b) show the schematic and cell layout of the 9T SRAM cell, respectively. The 9T SRAM cell consists of a core, M1-M4, and a Read/Write port, M5-M9. The Word-Line (WL) and Virtual VSS (VVSS) are row-based, and Write Word-Line A (WWLA), Write Word-Line B (WWLB), and Bit-line (BL) are column-based. The cell layout shows the layers from diffusion to metal-2. The metal-3 lines of the row-routed WL and VVSS are not shown in the cell layout. The timing diagram of the 9T cell is shown in Fig. 2(c). In Hold mode, WL, WWLA, and WWLB are disabled and VVSS is held at $V_{\rm DD}$. Data is held by the cross-coupled inverters, M1-M4, and is decoupled from BL.

### A. Read Operation With Read Buffer

Fig. 3. Simulated Read SNM comparison at 0.4 V $V_{\rm DD}$ (subthreshold region).

Fig. 4. Simulated BL leakage and Read current comparison at 0.4 V $V_{\rm DD}$.

In Read mode, the selected WL is enabled and the corresponding VVSS is forced to ground, while WWLA and WWLB remain disabled. M7-M9 buffer the stored data to conditionally discharge the BL. A full-swing large-signal Sense Amplifier (SA) is used to capture the BL voltage for robust Read operation. Since the disabled WWLA and WWLB isolate 'Q' and 'QB' from the BL during the Read mode, the Read Static Noise Margin (RSNM) of the 9T SRAM cell is almost equal to its Hold SNM and is much larger than that of the 6T SRAM cell. The 9T SRAM cell has a Read SNM of 151 mV at 0.4 V while that of a 6T SRAM cell is 57 mV as shown in Fig. 3. Even though the 6T SRAM cell can be sized up by increasing the width of the pull-down NMOS transistors to mitigate Read disturb, the RSNM of the upsized 6T SRAM cell only improves to 62 mV. Although the 9T SRAM cell (i.e., 420 F<sup>2</sup>) has 97% cell area overhead compared with the 6T SRAM cell (i.e., 214 F<sup>2</sup>) with the same 65 nm logic rule, the 9T SRAM cell gains $2.65\times$ RSNM improvement compared with the 6T SRAM cell.
The RSNM improvement is $2.44\times$ even when the 6T SRAM cell is sized up to the same area as the 9T SRAM cell. In addition, since devices are stacked in the BL leakage (Read buffer) path and VVSSs of unselected cells sharing the same BL are held at $V_{\rm DD}$, the BL leakage of the 9T SRAM cell is $5.38\times$ less than that of the 6T SRAM cell as shown in Fig. 4. The Read current of the 9T SRAM cell is only $1.45\times$ less than that of the 6T SRAM cell. Thus, a BL can afford more cells during Read.

### B. Write Operation With Cross-Point Data-Aware WLs

Fig. 5. (a) Write operations with data-aware Write WLs, (b) Write Half-selected cells in the active row and an active column, and (c) Monte-Carlo simulated Write half-selected SNM comparison at 0.4 V $V_{\rm DD}$.

Fig. 5(a) illustrates the Write operation of the 9T SRAM cell with data-aware column-based WWLs. In Write '1' mode, WL and WWLA are enabled and VVSS and BL are forced to ground while WWLB remains disabled. Then, node 'QB' is discharged by BL through M6-M7 and by VVSS through M8-M9 to write '1' into the selected cell. On the other hand, in Write '0' mode, WL and WWLB are enabled and VVSS and BL are forced to ground while WWLA remains disabled. Then, node 'Q' is discharged by BL through M5 and M7 and by VVSS through M8-M9 to write '0' into the selected cell. Notice that the BL always goes down during Write regardless of Write '1' or Write '0'. Since both WL and WWLA/WWLB must be enabled to write a cell and each column is selected individually via the values of WWLA and WWLB (i.e., Data-in), the cell provides a cross-point Write structure and writing a cell does not affect the stability of half-selected cells. To mitigate the degradation of Write-ability caused by the series-connected NMOS transistors M5/M6 and M7, a variation-tolerant and area-efficient Negative BL (NBL) Write-assist scheme is employed and illustrated later in detail in Section III-D.

Fig.
5(b) shows half-selected cells at the active row and an active column in Write operation. When the WWLA or WWLB of a selected column is raised to write a cell, WWLAs and WWLBs of unselected columns stay at 0 V to keep M5-M6 turned off as shown in Fig. 5(b), thus isolating the 'Q's and 'QB's of the half-selected cells sharing the active WL from BL and VVSS. Hence, the asserted WL does not affect stability of half-selected cells sharing the WL (9T-R). On the other hand, when the WL of a selected row is raised to write a cell, the WLs of unselected rows stay at 0 V to turn off M7-M8 as shown in Fig. 5(b), thus isolating the 'Q's and 'QB's of the half-selected cells sharing the active WWLAs/WWLBs from BL and VVSS. Consequently, the asserted WWLAs or WWLBs do not affect stability of half-selected cells sharing the active WWLAs/WWLBs (9T-C). The disturb-free feature facilitates bit-interleaving architecture to reduce area overhead of peripheral circuits and reduce the multiple-bit soft errors with ECC circuit.

Fig. 5(c) shows the Monte-Carlo simulation results of SNM of Write half-selected cells. In Fig. 5(c), 6T-R denotes the Write half-selected cells (in the selected row) whose Write mechanism is the same as the conventional 6T SRAM cell, such as conventional 6T, 8T [10]–[19], and other cells [10]–[29]. The SNMs of both 9T-R and 9T-C are significantly larger than that of 6T-R at 0.4 V. The SNMs of both 9T-R and 9T-C with process variations are larger than $0.1 \times V_{\rm DD}$ (0.04 V), while about 1% of the 6T-R cells have a negative Write half-selected SNM, which means that the stored values of about 1% of cells are destroyed. Even with the 6T SRAM cell upsized (upsized 6T-R), the Write-Half-Select stability of the 9T SRAM cell is still significantly better due to its disturb-free nature.

### C. Single BL for BL Power Saving

The proposed 9T SRAM cell has only a single BL for both Read and Write operations.
The conventional 6T SRAM cell and the cells in [7], [8], [32] use differential BLs for both Read and Write operations. The cells with single Read buffer in [9]–[23], [26]–[31] use differential Write BLs (WBL) for Write operation and single Read BL for Read operation. Fig. 6 compares BL charging and discharging probability ($P_{\rm BL}$) among differential-BL (DBL) cell, differential-WBL + single-RBL (DWBL+SRBL) cell, and single-BL (SBL) cell with precharged BL (to $V_{\rm DD}$). Assume that the probabilities of Read and Write are 50% and 50%, and the probabilities of '1' and '0' of Write data or storage data are 50% and 50%. Due to large process variation in subthreshold region, most cells must discharge BL voltage to ground level in order to ensure that the tail (weak) cells develop enough BL voltage difference for differential sense amplifier [17]. One can thus assume that the average BL voltage difference for differential sensing scheme is 80% of $V_{\rm DD}$. In the selected cell, the $P_{\rm BL}$ of both the DWBL+SRBL cell and the SBL cell is 0.75, smaller than that of the DBL cell, which is 0.9. In unselected cells at the active row, the $P_{\rm BL}$ of SBL is 0.5 and is lower than those (0.8 and 0.75) of DBL cell and DWBL+SRBL cell. Furthermore, in bit-interleaving structure (also called column-MUX structure), the number of unselected cells is usually larger than that of selected cells in an active row. Therefore, the SBL cell consumes less BL power than the DBL cell and the DWBL+SRBL cell. The BL power dissipation is a portion of the total power dissipation, which includes the power dissipation of WLs, WWLAs, WWLBs, VVSSs, and peripheral circuits and depends on process and operation voltage.

Fig. 6. Probability comparison of BL charging and discharging among differential-BL cell, differential-WBL + single-RBL cell, and single-BL cell.

Table I shows the feature comparison of the cells.
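The $P_{\rm BL}$ values quoted above for Fig. 6 can be reproduced with a simple weighted sum. The sketch below is only an illustration. The 50%/50% Read/Write and data probabilities and the 80% differential swing come from the text; the per-operation activity factors (for example, that a Write fully discharges one BL, or that half-selected DWBL+SRBL cells fully discharge a WBL during a Write) and the `row_average` helper are assumptions chosen to be consistent with the quoted totals, not values read from the paper's figure.

```python
"""Illustrative BL switching-probability model (assumed decomposition)."""

P_READ = 0.5       # probability that an access is a Read (from the text)
P_WRITE = 0.5      # probability that an access is a Write (from the text)
P_DATA = 0.5       # probability that the stored bit discharges a single-ended BL
DIFF_SWING = 0.8   # average differential BL swing as a fraction of VDD (from the text)
FULL_SWING = 1.0   # one BL driven all the way to ground

# (read_activity, write_activity) per cell type. These factors are ASSUMED;
# they were picked so that the weighted sums match the totals in the text.
SELECTED = {
    "DBL": (DIFF_SWING, FULL_SWING),
    "DWBL+SRBL": (P_DATA, FULL_SWING),
    "SBL": (P_DATA, FULL_SWING),   # the single BL always goes low on a Write
}
UNSELECTED = {
    "DBL": (DIFF_SWING, DIFF_SWING),   # half-selected cells perform a dummy Read
    "DWBL+SRBL": (P_DATA, FULL_SWING),
    "SBL": (P_DATA, P_DATA),           # Read buffer discharges BL only for one data value
}


def p_bl(read_activity, write_activity):
    """Weighted BL charging/discharging probability over Read and Write accesses."""
    return P_READ * read_activity + P_WRITE * write_activity


def row_average(cell, mux):
    """Average P_BL per column in an active row with a mux:1 column multiplexer.

    Hypothetical helper: one selected column and (mux - 1) unselected columns.
    """
    if mux < 1:
        raise ValueError("mux must be at least 1")
    selected = p_bl(*SELECTED[cell])
    unselected = p_bl(*UNSELECTED[cell])
    return (selected + (mux - 1) * unselected) / mux


if __name__ == "__main__":
    for cell in SELECTED:
        print(cell,
              round(p_bl(*SELECTED[cell]), 4),
              round(p_bl(*UNSELECTED[cell]), 4),
              round(row_average(cell, 4), 4))
```

Under these assumptions and a 4:1 column MUX, the per-column average works out to about 0.83 for DBL, 0.75 for DWBL+SRBL, and 0.56 for SBL, which illustrates why the single-BL advantage grows as more columns in the active row are unselected.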
The proposed 9T SRAM cell with data-aware Write WLs eliminates not only Read disturb but also Write Half-Select disturb. Furthermore, its area is smaller than the other disturb-free cell [32].

### D. Threshold Voltage and Sizing Considerations

For robust subthreshold operation, an SRAM requires 1) high Hold stability, 2) high Read stability, and 3) high Write-ability. SNM is a common metric for stability evaluation [41]. However, SNM only illustrates voltage noise margin, not current noise margin or the total energy needed to flip the cell. Therefore, SNM alone is not enough to represent stability of a cell. Another stability metric is the N-curve [42], which provides not only Static Voltage Noise Margin (SVNM) but also Static Current Noise Margin (SINM). As such, noise immunity of a cell is better evaluated by both SVNM and SINM.

The decision on technology choice plays a key role in designing a proper SRAM macro for the intended application. At a given supply voltage, although the SNMs (and SVNMs) of a cell with low-threshold-voltage devices and a cell with high-threshold-voltage devices may be comparable, the SINM of the cell with high-threshold-voltage devices is lower than that of the cell with low-threshold-voltage devices. Thus, the cell with high-threshold-voltage devices has weaker noise immunity than the cell with low-threshold-voltage devices.
In this work, since the 9T SRAM design aims for applications like wireless body-sensing network and low-power hearing aid system with less than 0.5 V $V_{\rm DD}$, 65 nm Low-Leakage (LL) technology is selected to reduce the total power around 0.5 V $V_{\rm DD}$. The threshold voltage of the 65 nm LL technology is about 0.5 V.

TABLE I. SRAM CELL COMPARISON

| | 6T | Asy. Cell [4] | ST Cell [7] | 8T | 9T [9] | 8T [18] | 10T [20] | 10T [21] | 8T [27] | 8T [35] | 10T [32] | 9T (This work) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Read Disturb Free | X | X | X | O | O | O | O | O | O | X | O | O |
| Write HS Disturb Free | X | X | X | X | X | X | X | X | X | O | O | O |
| Bitline Num.<sup>1</sup> | 2-BL | 2-BL | 2-BL | 2-WBL, 1-RBL | 2-WBL, 2-RBL | 2-WBL, 1-RBL | 2-WBL, 1-RBL | 2-WBL, 1-RBL | 2-WBL, 2-RBL | 2-BL | 2-BL | 1-BL |
| Area | 0.77X | 1.1X | 1.62X | 1X | 1.4X | 1.2X | 1.6X | 1.6X | 1X | 1.2X | 1.6X | 1.52X |

To enhance Read performance and Write-ability, Reverse Short Channel Effect (RSCE) has been utilized to increase the current driving capability of NMOS transistors by lengthening channel lengths in [9], [12], [19], [21], and [30]. However, according to simulation results, RSCE in subthreshold region does not effectively increase the transistor current strength across all process corners in the 65 nm low-leakage technology. Furthermore, in deeply-scaled technologies, increasing channel width causes $V_{\rm TH}$ to increase due to Reverse Narrow Width Effect (RNWE). Thus, increasing channel width is also not effective in increasing the current driving capability.
On the positive side, according to simulation results, increasing either channel length or width improves the Ion/Ioff ratio in all process corners in the 65 nm low-leakage technology, and increasing channel length provides more improvement in Ion/Ioff ratio than increasing channel width. Therefore, in our logic rule based cell design, all transistors of the 9T cell have 120 nm minimum channel width. The channel lengths of all cell transistors are increased from the minimum channel length (60 nm) to 70 nm to optimize the ratio between the active discharge current and Standby leakage current of the Read port.

## III. READ- AND WRITE-ASSIST CIRCUITS

### A. Read Speed and Correctness Enhancement Techniques

Since the 9T SRAM cell does not have Read disturb problem, Read-assist techniques can be used to enhance Read speed and Read correctness without stability degradation. Correct Read operation is determined by four major factors: 1) discharge current of the selected cell, 2) total leakage current of unselected cells on the selected BL, 3) sensing margin of Sense Amplifier (SA), and 4) timing and duration of WL assertion. As mentioned in Section I, since Ion/Ioff ratio of transistors is degraded by scaling $V_{\rm DD}$ down, the ratio of discharge current of a selected cell to total leakage current of unselected cells on the same BL (hereafter referred to as Read current ratio) in subthreshold region is much smaller than that in super-threshold region, thus degrading Read correctness.

Fig. 7. Normalized Read '0' BL current versus Boosted WL voltage (64 cells per BL) at 0.4 V $V_{\rm DD}$, 25°C.

Several techniques can be used to enhance the discharge current of an active cell: 1) raising cell supply voltage, 2) negative cell ground voltage [33], and 3) boosting active Read WL voltage [20], [32]. To mitigate total leakage current of unselected cells, negative unselected WL voltages [9] can be used.
However, the technique of negative unselected WL voltage incurs more circuit/area overhead than boosting a single active WL. Fig. 7 shows the simulated normalized Read '0' BL current versus boosted WL voltage at 0.4 V $V_{\rm DD}$, 25°C, and three process corners. Compared with increasing channel length (utilizing RSCE) or width, boosting WL voltage increases Read '0' BL current across all three corners, thus enhancing Read current ratio effectively. Hence, both Read speed and Read correctness are enhanced at the same time. However, since Read is not the most critical operation for either the $V_{\rm MIN}$ or the speed of the proposed cell, the technique of boosting WL voltage is not employed in this work in order to save power and area.

<sup>1</sup> In Table I: BL (bit-line), WBL (Write bit-line), RBL (Read bit-line).

Fig. 8. Normalized Read '1' BL leakage current with column-based and row-based VVSS (64 cells per BL) at 0.4 V $V_{\rm DD}$, TT, 25°C.

For Read '1', the number of cells per BL directly affects BL leakage current, and a short BL can achieve higher Read current ratio for robust subthreshold operation. In the proposed 9T SRAM cell, the VVSS control can be either column-based or row-based. Although the cell area with column-based VVSS structure is 9.1% less than that with row-based VVSS structure, column-based VVSS structure suffers from Read '1' BL leakage which degrades the Read '1' margin for the selected columns. In column-based VVSS structure, the VVSSs of unselected cells on the selected column are connected to ground, thus increasing the BL leakage. On the other hand, in row-based VVSS structure, the VVSSs of unselected cells on the selected column are raised to $V_{\rm DD}$, thus reducing the BL leakage. As shown in Fig.
8, with 64 cells per BL, row-based VVSS structure with unselected VVSSs raised to $V_{\rm DD}$ achieves $1.54\times$ lower Read '1' BL leakage current than column-based VVSS structure, thus increasing Read correctness directly. Hence, this work employs row-based VVSS structure with unselected VVSSs raised to $V_{\rm DD}$.

To read correct data from BL at a given Read current ratio, sensing margin of SA and timing/duration of WL assertion play key roles. The sensing margin of SA and timing/duration of WL assertion are affected by PVT variations. Adjustable SAs have been applied to enhance Read correctness in [18], [19], [21]. In [18], [19], multiple dummy SAs are implemented and the best SA that achieves highest Read correctness is selected/used based on post-silicon characterization. Differential SA can be used in single-ended fashion with an adjustable reference voltage (Vref) at the other input. The adjustable Vref is generated by external voltage source or built-in Vref generator to achieve higher Read correctness during test procedure. In [21], a BL replica is implemented to trace the total BL leakage current of unselected cells with the worst-case BL data pattern. Then, the emulated total BL leakage current is used to generate the ground voltage of a large-signal single-ended sense amplifier. Hence, the trip voltage of the large-signal sense amplifier is adjusted automatically to account for PVT variation to achieve higher Read correctness. The timing and pulse width of the WL pulse are other adjustable factors affecting Read correctness [12], [22]. In this work, an adaptive Read operation timing tracing circuit is employed to adjust optimum operation time automatically by tracking PVT variation.

### B. An Adaptive Read Operation Timing Tracing (AROTT) Circuit

In subthreshold region, the device current changes significantly with process, voltage, and temperature variation as shown in Fig.
9(a), leading to over two orders of magnitude variation of the Read access times at 0.4 V (Fig. 9(b)). In Read mode, BLs are discharged to ground or remain at $V_{\rm DD}$ depending on the stored values of selected cells. When the WL pulse width is too short, BLs cannot be discharged below the trip voltage of the large-signal SA, thus leading to Read '0' failures. On the other hand, when the WL pulse width is too long, BL leakage may cause BL voltage to drop below the trip voltage of the large-signal SA, thus leading to Read '1' failures. Due to the deterioration of Ion/Ioff ratio in low voltage subthreshold operation, the impact of BL leakage is significantly larger than that in super-threshold operation, and a fixed WL pulse width cannot cover the more than two orders of magnitude of Read access time variation. Additionally, an SRAM compiler needs to provide various memory capacities and configurations, thus requiring different WL pulse widths. Therefore, an Adaptive Read Operation Timing Tracing (AROTT) circuit is employed to track Read '0' access times resulting from PVT variations and various memory capacities/configurations.

The structure of the AROTT circuit is shown in Fig. 9(c). The AROTT circuit consists of a Read '0' column replica, a row replica and a Finite State Machine (FSM). The Read '0' column replica is composed of loading dummy cells and sinking cells. In order to generate operation times with proper timing margin, the discharging ability of the sinking cells is designed to be weaker than the actual cells. Furthermore, to mitigate the impact of random process variation, users can, based on test results, control which sinking cells are turned on to obtain the most appropriate operation time. In Standby, node 'WLE' is '0', node 'PM' is '1', Dummy WL (DWL) is '0', and Dummy BL (DBL) is precharged to $V_{\rm DD}$. When CLK rises, node 'PM' is discharged to ground through M1-M2, causing node 'WLE' to rise to '1'.
When node 'WLE' becomes '1', DWL with the loading of the row replica is charged to $V_{\rm DD}$. When DWL is asserted, the pre-decided replica cells in the Read '0' column replica are enabled to discharge DBL with the loading of the column replica. The discharged DBL causes node 'WLE' to return to '0'. The pulse width of node 'WLE' is the traced operation time. The loadings of the row replica and column replica are adapted to the various memory capacities and configurations. The discharging capability of the replica cells is also affected by PVT variation. Therefore, the AROTT circuit can trace appropriate operation times for PVT variation and various memory capacities and configurations. The simulated traced operation times are about $1.25\times$ the Read '0' access times across various process and temperature corners (Fig. 9(d)) at 0.4 V, thus ensuring proper Read operations against PVT variations.

### C. Write-Assist Techniques

In an SRAM cell, the current strength of NMOS access transistors is usually designed to overwhelm the current strength of PMOS load transistors, thereby facilitating Write operation to pull down the '1' storage node of a cell. In subthreshold region, due to PVT variation, the current strength of NMOS access transistors could become weaker than that of PMOS load transistors, thus causing Write failure. Write-ability can be improved by four techniques: 1) collapsing cell $V_{\rm DD}$ [18]–[20], [22], 2) raising cell VSS [16], [30], 3) boosting Write WL voltage [32], and 4) negative Write BL voltage [24], [34]. Collapsing cell $V_{\rm DD}$ and raising cell VSS with column-based or row-based structure result in degradation of the stability of half-selected cells. Both boosting Write WL voltage and negative BL voltage strengthen the NMOS access transistors to improve Write-ability.
However, boosting Write WL voltage can only be employed on a cell free of Write Half-Select disturb, since Write Half-Select disturb is aggravated by boosted Write WL voltage.

Fig. 9. Simulated (a) On current (Ion) of N-Type FET, and (b) access times for process and temperature variation at 0.4 V $V_{\rm DD}$, (c) Read '0' timing tracing scheme for PVT variation tolerant operation time, and (d) simulated traced operation times for process and temperature variation at 0.4 V $V_{\rm DD}$.

Fig. 10. Simulated Write SNM comparison at 0.4 V $V_{\rm DD}$, SNFP, 125°C (subthreshold worst Write corner).

While a cross-point cell prevents WHS disturb, it degrades the Write-ability due to writing through series pass-gates. WL boosting circuits in [32] boost both WL and Write WL (WWL) voltages to increase current of the series pass-gates for Write-ability enhancement. Another method to increase current of the series pass-gates is to employ Negative BL (NBL) voltage [24], [34]. Fig. 10 compares the Write SNM (WSNM) of cells without Write-assist circuit, with both WL and WWL voltages boosted by 70 mV, and with a Negative BL voltage of -70 mV at SNFP, 125°C (subthreshold worst Write corner) and 0.4 V. The cell without Write-assist circuit cannot complete Write operation since its WSNM curve does not open. Both cells with boosting WL/WWL and with Negative BL voltage improve WSNM. Moreover, since Negative BL voltage increases both $V_{GS}$ and $V_{DS}$, and reduces the threshold voltage (due to forward biased body-to-source voltage) of the series pass-gates, the WSNM of the cell with Negative BL voltage is larger than that of the cell with boosting WL/WWL. Hence, NBL offers better Write-ability enhancement than boosting both WL and WWL. Moreover, NBL scheme with single BL incurs smaller area penalty than dual-WL boosting scheme since only one boosting circuit is needed.
Therefore, in this work, NBL scheme is employed to enhance the Write-ability of the 9T SRAM cell.

### D. Negative BL Scheme

For Write-ability enhancement, the proposed NBL scheme with BL condition detection is shown in Fig. 11(a). The NBL circuit comprises enable logic (X1), initial discharge transistor (Mnd), coupled PMOS capacitor (Mpc), coupled capacitance driver (X4-X5), BL condition detector (X2-X3), and global BL (GBL) pass-gate NMOS transistors (Mn1\_U and Mn1\_D). Each NBL circuit is shared by 16 Local BLs (LBLs) to reduce the area overhead. Moreover, the NBL scheme utilizes NMOS pass-gate transistors (Mn1\_U and Mn1\_D) to divide the Global BL (GBL) into up (\_U) and down (\_D) planes to reduce the load capacitance of the NBL circuit, thus decreasing the required capacitance of the coupled PMOS capacitor for an intended negative voltage.

Fig. 11. (a) Schematic and (b) waveform comparison of proposed single-ended negative BL scheme for Write-ability enhancement at 0.4 V, SNFP, and 125°C.

The operation of the NBL scheme is described as follows. In the initial condition, the 'CT' and 'CB' nodes of the coupled capacitor (Mpc) are set at $V_{\rm DD}$ and ground voltage, respectively. The voltages of GBLs and BLs are precharged to $V_{\rm DD}$. During Write operation, either WEN\_U or WEN\_D signal is asserted to turn on Mn1\_U or Mn1\_D. At the same time, one of 16 NMOS pass-gate transistors of the column multiplexer (Col. MUX) is also turned on to connect the selected BL to the selected GBL. Then the selected GBL and BL are discharged through Mnd. The negative BL voltage coupling process is enabled only when the GBL voltage is below the trip voltage of NAND (X2), thus ensuring proper timing and efficiency of Negative BL for PVT variation tolerance. When the GBL voltage falls below the trip voltage of NAND (X2), Mnd is turned off first, thus floating node 'CB' at near ground voltage. Then node 'CT' is discharged to ground and couples node 'CB' to negative voltage.
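To first order, the negative level produced by this coupling step can be estimated with a capacitive-divider model. The sketch below is a simplified illustration, not the authors' design procedure. It assumes an ideal linear coupling capacitor, a 'CB' node that floats at exactly 0 V before the step, and a single lumped `c_load` for the GBL, BL, and parasitic capacitance; it ignores leakage and junction clamping. The capacitance values are hypothetical; only the 0.4 V supply and the 70 mV negative level echo numbers used in the text.

```python
"""First-order capacitive-coupling estimate for a negative bit-line (assumed model)."""


def nbl_level(vdd, c_couple, c_load):
    """Voltage on the floating 'CB'/BL node after 'CT' steps from vdd down to 0 V.

    Charge conservation on the floating node gives a capacitive divider:
    delta_V = -vdd * c_couple / (c_couple + c_load).
    """
    if c_couple < 0 or c_load <= 0:
        raise ValueError("capacitances must be positive")
    return -vdd * c_couple / (c_couple + c_load)


def required_coupling_cap(vdd, v_neg, c_load):
    """Coupling capacitance needed to pull the floating node down to -v_neg."""
    if not 0.0 < v_neg < vdd:
        raise ValueError("v_neg must lie between 0 and vdd")
    return c_load * v_neg / (vdd - v_neg)


if __name__ == "__main__":
    VDD = 0.4          # supply voltage used in the text's simulations
    V_NEG = 0.07       # 70 mV negative level, as in the WSNM comparison
    C_FULL = 100e-15   # hypothetical load of an undivided GBL plus BL, in farads

    for label, c_load in (("undivided GBL", C_FULL), ("one GBL plane", C_FULL / 2)):
        c_c = required_coupling_cap(VDD, V_NEG, c_load)
        check_mv = nbl_level(VDD, c_c, c_load) * 1e3
        print(f"{label}: Cc = {c_c * 1e15:.1f} fF, resulting level = {check_mv:.1f} mV")
```

In this model the required coupling capacitance scales linearly with the load, so halving the load seen by the NBL circuit halves the capacitor needed for the same negative level, which is consistent with the motivation given for splitting the GBL into two planes.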
When the negative voltage is transferred to BL, the current driving capability of the series pass-gates in the selected cell overwhelms that of the PMOS load transistor to discharge storage node 'Q' or 'QB' of the selected cell, and the data-in value is written into the selected cell. The simulation waveforms (Fig. 11(b)) show that the NBL scheme is effective in writing a cell at 0.4 V, SNFP corner, and 125°C, which represents the worst case for Write operation, whereas the Write operation without NBL circuit fails. Since the NBL action is initiated by the low-going GBL itself, it is tolerant to PVT variation. The NBL circuit design also ensures that all selected cells are written while all Write Half-Selected cells on active columns are stable across all corners at 0.4 V.

## IV. TEST CHIP IMPLEMENTATION AND SIMULATION RESULTS

A 72 Kb SRAM macro is implemented in 65 nm Low-Leakage (LL) CMOS technology. The floorplan of the 72 Kb SRAM macro is shown in Fig. 12. The 72 Kb SRAM macro consists of 16 blocks with 72 columns $\times$ 64 rows per block. The I/O is 36 bits wide. Due to the disturb-free nature of the 9T cell, the test chip employs distance-4 bit-interleaving architecture, which can reduce soft error by 75% using Single-bit Error Correction (SEC) code compared with non-bit-interleaving architecture [38]. The decoders and other peripheral circuits use static CMOS logic for robust subthreshold operation. The entire array functions at one $V_{\rm DD}$, so the 72 Kb SRAM macro can be integrated into a system more easily compared with SRAM macro using multiple supplies.

As shown in Fig. 13(a), to facilitate testing, test-assist circuits are employed and synthesized with standard cell library operating at the default supply voltage (1.2 V), and signals are transferred into or out of the 72 Kb SRAM macro through level shifters.
Because of the large voltage difference between the default supply voltage (1.2 V) and the intended $V_{\rm DD}$ for the subthreshold SRAM macro, three intermediate voltages are applied in the level-down and level-up shifters to ensure signal integrity. A dummy path is used to measure the delay of the level-down and level-up shifters. The measured delay time is then deducted from the measured access time. The schematic of the level-up shifter is shown in Fig. 13(b), where M5-M6 are applied to weaken the pull-up to facilitate pull-down of the high output voltage (at $V_{DDH}$) with the low input voltage (at $V_{DDL}$).

The critical Read path of the 72 Kb SRAM macro is shown in Fig. 14. In the initial condition, the clock and WL enable signals are at '0'. First, the address signals are transferred into the SRAM macro through the level-down shifter and decoded by the X predecoder. Then, the clock signal rises to generate the rising edge of the WL enable signal, and the pulse width of the WL enable signal is determined by the AROTT circuit to track PVT variation. When the WL enable signal is asserted, the address signals are latched and the decoded address signals are passed from the X predecoder to the X decoder. Then, the X decoder generates the global WL signal, and the WL driver utilizes the global WL signal to pull the WL and VVSS signals to $V_{\rm DD}$ and ground, respectively. Then, all Read buffers of the cells in the active row are enabled. Based on the stored values of the cells in the active row, the Read buffers determine whether to discharge the corresponding BLs, which are precharged to $V_{\rm DD}$ and then floated. In the SRAM macro design, the Read '0' operation forms the most critical timing. When BLs are discharged by Read buffers, the GBLs, initially precharged to $V_{\rm DD}$, are also discharged through the selected "On" NMOS column multiplexer pass-gate transistors. The voltages of the GBLs are sensed by large-signal sense amplifiers (inverters), and bus drivers in DIDOs transfer the sensed values to DO buses.
Finally, the data of DO buses are transferred out of the SRAM macro by level-up shifters.

The components of the simulated Read '0' access time at 0.3 V, 25°C, and TT corner are shown in Fig. 15. The most time-consuming component is the discharging of BL and GBL by the cell (WL $\nearrow \rightarrow$ BL $\searrow$ and BL $\searrow \rightarrow$ GBL $\searrow$ ). It constitutes 41% of the total Read '0' access time due to the weak discharging capability of the cell and the large capacitance of BL and GBL. In this work, since the Read access time of the SRAM macro has met the design specifications, the timing performance is not enhanced further in order to minimize power dissipation. If higher timing performance is required, short BLs combined with local sense amplifiers and a WL boosting technique can be employed at the expense of area and power.

Fig. 12. Architecture of the 72 Kb SRAM macro.

Fig. 13. (a) Signal transfer architecture, and (b) level-up shifter schematic.

Fig. 16 shows the layout and die photo of the 72 Kb subthreshold SRAM test chip. The $4 \times 4 \text{ mm}^2$ chip contains four 72 Kb SRAM macros with macro size of $560 \,\mu\text{m} \times 400 \,\mu\text{m}$. The area of the NBL Write-assist circuit is about 4% of the 72 Kb SRAM macro area.

## V. MEASUREMENT RESULTS

Twenty dies are measured, and error-free full Read and Write functionality is achieved with $V_{\rm DD}$ down to 0.35 V ($\sim 0.15~{\rm V}$ lower than the threshold voltage), as shown in Fig. 17(a). The measured Write failure rate from 20 dies of the 72 Kb SRAM is shown in Fig. 17(b). The NBL circuit reduces the Write failure rate by 13.42× at 0.3 V. Fig. 18 shows the measured maximum frequency and power dissipation at maximum frequency, which are averaged over 20 dies. At 0.35 V, the SRAM operates at 229 KHz and consumes 4.05 $\mu$W. The $V_{\rm MIN}$ of the 9T SRAM is 350 mV lower than that of the conventional 6T SRAM [41]. As shown in Figs.
17 and 18, data is held down to 0.275 V where the leakage power is 2.29 $\mu$W. At 0.275 V, fewer than 0.5% Read/Write errors are observed. Fig. 19 shows the measured energy per operation at the maximum frequency. The minimum energy per operation is 4.5 pJ at 0.5 V. As can be seen in Fig. 18, the 72 Kb SRAM macro has a wide operation range from 1.2 V down to 0.35 V. For $V_{\rm DD}$ around/above 1.0 V, the 72 Kb SRAM macro is capable of operating around 200 MHz. The 9T SRAM in 65 nm low-leakage technology achieves better power and energy saving for a 0.5 V hearing aid system. The test chip features are summarized in Table II. Table III lists key features of several subthreshold SRAM designs for comparison.

Fig. 14. Critical Read Path.

Fig. 15. Components of Read '0' access time at 0.3 V, TT, and 25°C.

Fig. 16. 72 Kb SRAM macro and test chip die photo.

## VI. CONCLUSIONS

A single-ended disturb-free 9T subthreshold SRAM cell with cross-point data-aware Write word-line structure has been demonstrated in this paper. The 9T cell eliminates Read disturb and Write Half-Select disturb for robust subthreshold operation. An adaptive Read operation timing tracing circuit and a negative bit-line circuit are employed in the design for PVT variation-tolerant Read operation and Write-ability enhancement, respectively. A test chip with 72 Kb SRAM macros has been implemented in 65 nm low-leakage CMOS technology. The measured results demonstrate error-free full functionality from 1.2 V down to 0.35 V ($\sim 0.15$ V lower than the threshold voltage). In the subthreshold region, the 72 Kb SRAM operates at 229 KHz with 4.05 $\mu$W power consumption. Data is held down to 0.275 V with 2.29 $\mu$W Standby power. The minimum energy per operation is 4.5 pJ at 0.5 V.

Fig. 17. Measured (a) bit failure rate and (b) Write failure rate.

TABLE II. FEATURES OF THE 72 Kb TEST CHIP.

| Feature | Value |
|---|---|
| Technology | 65 nm 10-metal Low-Leakage CMOS |
| Chip Size | 4 × 4 mm<sup>2</sup> |
| 72 Kb SRAM Macro Area | 560 × 400 µm<sup>2</sup> |
| $V_{\rm DD}$ min | 0.35 V (70% $V_{\rm TH}$) (limited by Write operation); 0.275 V for data retention |
| Read Access Cycle | 229 KHz @ 0.35 V, 27°C |
| Total Power | 4.05 µW @ 0.35 V, 27°C |
| Leakage Power (72 Kb SRAM) | 3.6 µW @ 0.35 V, 27°C; 2.29 µW @ 0.275 V, 27°C |

TABLE III. FEATURE LIST OF WORKS.

| | [7] | [12] | [18] | [19] | This Work |
|---|---|---|---|---|---|
| Cell | Schmitt 9T | 8T | Modified 8T | Modified 8T | 9T |
| Process | 130 nm | 130 nm | 65 nm | 65 nm | 65 nm |
| Capacity | 4 Kb | 64 Kb | 256 Kb | 64 Kb | 72 Kb |
| Bit-Interleaving | No | 8 | No | No | 4 |
| # Cells/BL | 256 | 512 | 256 | 64 | 64 |
| VCCmin | 160 mV | 230 mV | 350 mV | 250 mV | 350 mV |
| Frequency | 620 KHz @ 400 mV | 100 KHz @ 230 mV | 25 KHz @ 350 mV | 20 KHz @ 250 mV | 229 KHz @ 350 mV |
| Power | 0.146 µW @ 400 mV | 4.3 µW @ 230 mV | 2.7 µW @ 350 mV | 0.4 µW (LP1) @ 250 mV | 4.05 µW @ 350 mV |
| Min. Energy / access | N.A. | N.A. | N.A. | 11 pJ/acc. @ 400 mV | 4.5 pJ/acc. @ 500 mV |

Fig. 18. Measured (a) maximum frequency and (b) power dissipation at maximum frequency versus $V_{\rm DD}$.

Fig. 19. Measured energy per operation at maximum frequency versus $V_{\rm DD}$. The minimum energy per operation is 4.5 pJ at 0.5 V.

## REFERENCES

- [1] C. T. Chuang, S. Mukhopadhyay, J. J. Kim, K. Kim, and R. Rao, "High-performance SRAM in nanoscale CMOS: Design challenges and techniques," in *IEEE Int. Workshop on Memory Technology, Design and Testing*, Dec. 3–5, 2007, pp. 4–12.
- [2] T. Azam, B. Cheng, S. Roy, and D. R. S.
Cumming, "Robust asymmetric 6T-SRAM cell for low-power operation in nano-CMOS technologies," *Electron. Lett.*, vol. 46, no. 4, pp. 273–274, Feb. 2010.
- [3] B. Wang and J. B. Kuo, "A novel two-port 6T CMOS SRAM cell structure for low-voltage VLSI SRAM with single-bit-line simultaneous read-and-write access (SBLSRWA) capability," in *Proc. IEEE ISCAS*, 2000, vol. 5, pp. 733–736.
- [4] K. Takeda, Y. Hagihara, Y. Aimoto, M. Nomura, Y. Nakazawa, T. Ishii, and H. Kobatake, "A read-static-noise-margin-free SRAM cell for low-VDD and high-speed applications," *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 113–121, Jan. 2006.
- [5] R. E. Aly and M. A. Bayoumi, "Low-power cache design using 7T SRAM cell," *IEEE Trans. Circuits Syst. II: Expr. Briefs*, vol. 54, no. 4, pp. 318–322, Apr. 2007.
- [6] S. Nalam and B. H. Calhoun, "5T SRAM with asymmetric sizing for improved read stability," *IEEE J. Solid-State Circuits*, vol. 46, no. 10, pp. 2431–2442, Oct. 2011.
- [7] J. P. Kulkarni, K. Kim, and K. Roy, "A 160 mV robust Schmitt trigger based subthreshold SRAM," *IEEE J. Solid-State Circuits*, vol. 42, no. 10, pp. 2303–2313, Oct. 2007.
- [8] J. P. Kulkarni and K. Roy, "Ultralow-voltage process-variation-tolerant Schmitt-trigger-based SRAM design," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 20, no. 2, pp. 319–332, Feb. 2012.
- [9] M. F. Chang, S. W. Chang, P. W. Chou, and W. C. Wu, "A 130 mV SRAM with expanded write and read margins for subthreshold applications," *IEEE J. Solid-State Circuits*, vol. 46, no. 2, pp. 520–529, Feb. 2011.
- [10] L. Chang, D. M. Fried, J. Hergenrother, J. W. Sleight, R. H. Dennard, R. K. Montoye, L. Sekaric, S. J. McNab, A. W. Topol, C. D. Adams, K. W. Guarini, and W. Haensch, "Stable SRAM cell design for the 32 nm node and beyond," in *Symp. VLSI Technology Dig.*, Jun. 14–16, 2005, pp. 128–129.
- [11] R. Joshi, R. Houle, K. Batson, D. Rodko, P. Patel, W. Huott, R. Franch, Y. Chan, D. Plass, S. Wilson, and P. Wang, "6.6+ GHz low Vmin, read and half select disturb-free 1.2 Mb SRAM," in *IEEE Symp. VLSI Circuits Dig.*, Jun. 14–16, 2007, pp. 250–251.
- [12] T. H. Kim, J. Liu, and C. H. Kim, "A voltage scalable 0.26 V, 64 kb 8T SRAM with Vmin lowering techniques and deep sleep mode," *IEEE J. Solid-State Circuits*, vol. 44, no. 6, pp. 1785–1795, Jun. 2009.
- [13] L. Chang, R. K. Montoye, Y. Nakamura, K. A. Batson, R. J. Eickemeyer, R. H. Dennard, W. Haensch, and D. Jamsek, "An 8T-SRAM for variability tolerance and low-voltage operation in high-performance caches," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 956–963, Apr. 2008.
- [14] M. Qazi, K. Stawiasz, L. Chang, and A. P. Chandrakasan, "A 512 kb 8T SRAM macro operating down to 0.57 V with an AC-coupled sense amplifier and embedded data-retention-voltage sensor in 45 nm SOI CMOS," *IEEE J. Solid-State Circuits*, vol. 46, no. 1, pp. 85–96, Jan. 2011.
- [15] Y. Morita, H. Fujiwara, H. Noguchi, Y. Iguchi, K. Nii, H. Kawaguchi, and M. Yoshimoto, "An area-conscious low-voltage-oriented 8T-SRAM design under DVS environment," in *IEEE Symp. VLSI Circuits Dig.*, Jun. 14–16, 2007, pp. 256–257.
- [16] T. Suzuki, H. Yamauchi, Y. Yamagami, K. Satomi, and H. Akamatsu, "A stable 2-port SRAM cell design against simultaneously read/write-disturbed accesses," *IEEE J. Solid-State Circuits*, vol. 43, no. 9, pp. 2109–2119, Sep. 2008.
- [17] H. Noguchi, S. Okumura, Y. Iguchi, H. Fujiwara, Y. Morita, K. Nii, H. Kawaguchi, and M. Yoshimoto, "Which is the best dual-port SRAM in 45-nm process technology? 8T, 10T single end, and 10T differential," in *IEEE Int. Conf. Integrated Circuit Design and Technology and Tutorial (ICICDT)*, Jun. 2–4, 2008, pp. 55–58.
- [18] N. Verma and A. P. Chandrakasan, "A 256 kb 65 nm 8T subthreshold SRAM employing sense-amplifier redundancy," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 141–149, Jan. 2008.
- [19] M. E. Sinangil, N. Verma, and A. P. Chandrakasan, "A reconfigurable 8T ultra-dynamic voltage scalable (U-DVS) SRAM in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 11, pp. 3163–3173, Nov. 2009.
- [20] B. H. Calhoun and A. P. Chandrakasan, "A 256-kb 65-nm sub-threshold SRAM design for ultra-low-voltage operation," *IEEE J. Solid-State Circuits*, vol. 42, no. 3, pp. 680–688, Mar. 2007.
- [21] T. H. Kim, J. Liu, J. Keane, and C. H. Kim, "A 0.2 V, 480 kb subthreshold SRAM with 1 k cells per bitline for ultra-low-voltage computing," *IEEE J. Solid-State Circuits*, vol. 43, no. 2, pp. 518–529, Feb. 2008.
- [22] S. A. Verkila, S. K. Bondada, and B. S. Amrutur, "A 100 MHz to 1 GHz, 0.35 V to 1.5 V supply 256 × 64 SRAM block using symmetrized 9T SRAM cell with controlled read," in *Int. Conf. VLSI Design (VLSID)*, Jan. 4–8, 2008, pp. 560–565.
- [23] S. Lin, Y. B. Kim, and F. Lombardi, "A highly-stable nanometer memory for low-power design," in *IEEE Int. Workshop on Design and Test of Nano Devices, Circuits and Systems*, Sep. 29–30, 2008, pp. 17–20.
- [24] N. Shibata, H. Kiya, S. Kurita, H. Okamoto, M. Tan'no, and T. Douseki, "A 0.5-V 25-MHz 1-mW 256-kb MTCMOS/SOI SRAM for solar-power-operated portable personal digital equipment sure write operation by using step-down negatively overdriven bitline scheme," *IEEE J. Solid-State Circuits*, vol. 41, no. 3, pp. 728–742, Mar. 2006.
- [25] Z. Liu and V. Kursun, "Characterization of a novel nine-transistor SRAM cell," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 16, no. 4, pp. 488–492, Apr. 2008.
- [26] A. Tajalli and Y. Leblebici, "Subthreshold SCL for ultra-low-power SRAM and low-activity-rate digital systems," in *Proc. ESSCIRC*, Sep. 14–18, 2009, pp. 164–167.
- [27] J. J. Wu, Y. H. Chen, M. F. Chang, P. W. Chou, C. Y. Chen, H. J. Liao, M. B. Chen, Y. H. Chu, W. C. Wu, and H. Yamauchi, "A large $\sigma \rm V_{TH}/VDD$ tolerant zigzag 8T SRAM with area-efficient decoupled differential sensing and fast write-back scheme," *IEEE J. Solid-State Circuits*, vol. 46, no. 4, pp. 815–827, Apr. 2011.
- [28] T. Suzuki, S. Moriwaki, A. Kawasumi, S. Miyano, and H. Shinohara, "0.5-V, 150-MHz, bulk-CMOS SRAM with suspended bit-line read scheme," in *Proc. ESSCIRC*, Sep. 14–16, 2010, pp. 354–357.
- [29] J. P. Kulkarni, A. Goel, P. Ndai, and K. Roy, "A read-disturb-free, differential sensing 1R/1W port, 8T bitcell array," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 9, pp. 1727–1730, Sep. 2011.
- [30] M. H. Tu, J. Y. Lin, M. C. Tsai, S. J. Jou, and C. T. Chuang, "Single-ended subthreshold SRAM with asymmetrical write/read-assist," *IEEE Trans. Circuits Syst. I: Reg. Papers*, vol. 57, no. 12, pp. 3039–3047, Dec. 2010.
- [31] S. A. Tawfik and V. Kursun, "Low power and robust 7T dual-Vt SRAM circuit," in *Proc. IEEE ISCAS*, May 18–21, 2008, pp. 1452–1455.
- [32] I. J. Chang, J. J. Kim, S. P. Park, and K. Roy, "A 32 kb 10T sub-threshold SRAM array with bit-interleaving and differential read scheme in 90 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 2, pp. 650–658, Feb. 2009.
- [33] M. H. Chang, Y. T. Chiu, S. L. Lai, and W. Hwang, "A 1 kb 9T subthreshold SRAM with bit-interleaving scheme in 65 nm CMOS," in *Proc. Int. Symp. Low Power Electronics and Design (ISLPED)*, Aug. 1–3, 2011, pp. 291–296.
- [34] M. Yabuuchi, K. Nii, Y. Tsukamoto, S. Ohbayashi, Y. Nakase, and H. Shinohara, "A 45 nm 0.6 V cross-point 8T SRAM with negative biased read/write assist," in *Symp. VLSI Circuits Dig.*, Jun. 16–18, 2009, pp. 158–159.
- [35] R. V. Joshi, R. Kanj, and V. Ramadurai, "A novel column-decoupled 8T cell for low-power differential and domino-based SRAM design," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 5, pp. 869–882, May 2011.
- [36] A. T. Do, J. Y. S. Low, J. Y. L. Low, Z. H. Kong, X. Tan, and K. S. Yeo, "An 8T differential SRAM with improved noise margin for bit-interleaving in 65 nm CMOS," *IEEE Trans. Circuits Syst. I: Reg. Papers*, vol. 58, no. 6, pp. 1252–1263, Jun. 2011.
- [37] H. Fuketa, M. Hashimoto, Y. Mitsuyama, and T. Onoye, "Alpha-particle-induced soft errors and multiple cell upsets in 65-nm 10T subthreshold SRAM," in *Proc. IEEE Int. Reliability Physics Symp.*, May 2–6, 2010, pp. 213–217.
- [38] S. Baeg, S. J. Wen, and R. Wong, "SRAM interleaving distance selection with a soft error failure model," *IEEE Trans. Nucl. Sci.*, vol. 56, no. 4, pp. 2111–2118, Aug. 2009.
- [39] S. Mukhopadhyay, R. M. Rao, J. J. Kim, and C. T. Chuang, "SRAM write-ability improvement with transient negative bit-line voltage," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 1, pp. 24–32, Jan. 2011.
- [40] K. Kushida, A. Suzuki, G. Fukano, A. Kawasumi, O. Hirabayashi, Y. Takeyama, T. Sasaki, A. Katayama, Y. Fujimura, and T. Yabe, "A 0.7 V single-supply SRAM with 0.495 μm² cell in 65 nm technology utilizing self-write-back sense amplifier and cascaded bit line scheme," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1192–1198, Feb. 2009.
- [41] E. Seevinck, F. J. List, and J. Lohstroh, "Static-noise margin analysis of MOS SRAM cells," *IEEE J. Solid-State Circuits*, vol. 22, no. 5, pp. 748–754, Oct. 1987.
- [42] E. Grossar, M. Stucchi, K. Maex, and W. Dehaene, "Read stability and write-ability analysis of SRAM cells for nanometer technologies," *IEEE J. Solid-State Circuits*, vol. 41, no. 11, pp. 2577–2588, Nov. 2006.
- [43] W. S. Lau, K. S. See, C. W. Eng, W. K. Awl, K. H. Jo, K. C. Tee, J. Y. M. Lee, E. K. B. Quek, H. S. Kim, S. T. H. Chan, and L. Chan, "Anomalous narrow width effect in NMOS and PMOS surface channel transistors using shallow trench isolation," in *Proc. IEEE Conf. Electron Devices and Solid-State Circuits*, Dec. 19–21, 2005, pp. 773–776.
- [44] N. H. E. Weste and D. Harris, *CMOS VLSI Design: A Circuit and Systems Perspective*, 3rd ed. New York: Addison-Wesley, 2004, pp. 208–210.
- [45] A. Kumar, H. Qin, P. Ishwar, J. Rabaey, and K. Ramchandran, "Fundamental data retention limits in SRAM standby experimental results," in *Proc. 9th Int. Symp. Quality Electronic Design*, Mar. 17–19, 2008, pp. 92–97.
- [46] S. M. Jahinuzzaman, J. S. Shah, D. J. Rennie, and M. Sachdev, "Design and analysis of a 5.3-pJ 64-kb gated ground SRAM with multiword ECC," *IEEE J. Solid-State Circuits*, vol. 44, no. 9, pp. 2543–2553, Sep. 2009.

**Ming-Hsien Tu** received the B.S. and M.S. degrees in electrical engineering from National Central University, Taiwan, in 2004 and 2006, respectively. He received the Ph.D. degree in electronics from National Chiao Tung University, Taiwan, in 2011. His research interests include noise suppression design technologies, embedded measurement circuit design, and ultra-low-power SRAM design.

**Huan-Shun Huang** received the B.S. degree in information engineering from I-Shou University, Taiwan, in 1997. His research interests include nonvolatile memories and SRAM testing.

**Jihi-Yu Lin** was born in Taichung, Taiwan. He received the B.S. degree in electrical engineering from National Central University, Chung-Li, Taiwan, in 2007 and the M.S. degree in electronics from National Chiao Tung University, Hsinchu, Taiwan, in 2009. His research interests are in the areas of low-power and low-voltage embedded memory circuit design.

**Kuen-Di Lee** received the M.S. degree in electronics engineering from National Chiao Tung University, Taiwan, in 1999. Since 1999, he has been working on advanced memory compiler circuit design and flow development in Faraday Technology Corporation, Taiwan.

**Ming-Chien Tsai** received the B.S. degree in electrical engineering from National Central University, Taoyuan, Taiwan, in 2008, and the M.S. degree in electronics from National Chiao Tung University, Hsinchu, Taiwan, in 2010. He is currently working at TSMC for advanced memory development.
His research interests include low-power digital circuit design, SRAM design, and monitoring structures for NBTI/PBTI degradation of nanoscale CMOS SRAM.

**Wei-Chiang (Willis) Shih** (M'00) received the B.S. and M.S. degrees in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1994 and 1996, respectively. From 1996 to 1999, he was with Faraday Technology, Hsinchu, Taiwan, working on the datapath, SRAM and ROM compilers. From 1999 to 2000, he was with Chingis Technology, Hsinchu, Taiwan, working on EEPROM and flash designs. From 2000 to 2002, he was with Virtual Silicon Technology, Sunnyvale, CA, working on the PLL, SRAM, ROM and nonvolatile memory compilers. From 2002 to 2011, he was with Faraday Technology, Sunnyvale, CA, working on the memory compiler architecture and design flows. In July 2011, he joined the memory group at M31 Technology, Zhubei, Taiwan. He is currently working on ultra-high-speed memory, low-power memory and next-generation memory compilers.

**Chien-Yu Lu** received the B.S. degree in electrical engineering from National Chung Cheng University. Since 2008, he has been pursuing the M.S. and Ph.D. degrees in the Electronics Department of National Chiao Tung University, Taiwan, where he is currently a Ph.D. student. His research interests focus on digital memory system design and low-power VLSI circuit design with particular emphasis on sub-threshold technology application.

**Yuh-Jiun Lin** received the B.S. and M.S. degrees in electrophysics engineering from National Chiao Tung University, Taiwan, in 1997 and 1999, respectively. From 2001 to 2003, he was working on the logic process integration in the Research Department with UMC, Taiwan. From 2003 to 2005, he was working on the SRAM and ROM design in EES. From 2005 to 2011, he was working on the memory compiler architecture and design flow at Faraday Technology Corporation, Taiwan. Currently he is working on ultra-high-speed, low-power and next-generation memory compilers in M31 Technology, Taiwan.

**Shyh-Jye Jou** received the B.S. degree in electrical engineering from National Cheng Kung University in 1982, and the M.S. and Ph.D. degrees in electronics from National Chiao Tung University in 1984 and 1988, respectively. He was with the Electrical Engineering Department of National Central University, Chung-Li, Taiwan, from 1990 to 2004 and became a Professor in 1997. Since 2004, he has been a Professor in the Electronics Engineering Department of National Chiao Tung University, where he served as Chairman from 2006 to 2009. Since August 2011, he has been the Dean of the Office of International Affairs, National Chiao Tung University. He was a visiting research Professor in the Coordinated Science Laboratory at University of Illinois, Urbana-Champaign during the 1993–1994 and 2010 academic years. In the summer of 2001, he was a visiting research consultant in the Communication Circuits and Systems Research Laboratory of Agere Systems, USA. His research interests include design and analysis of high-speed, low-power mixed-signal integrated circuits, and communication and bio-electronics integrated circuits and systems.

Dr. Jou was the Guest Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS, Nov. 2008. He served as the Conference Chair of IEEE International Symp. on VLSI Design, Automation and Test (VLSI-DAT) and International Workshop on Memory Technology, Design, and Testing. He also served as Technical Program Chair or Co-Chair in IEEE VLSI-DAT, International IEEE Asian Solid-State Circuit Conference, IEEE Biomedical Circuits and Systems, and other international conferences. He received the Outstanding Engineering Professor Award from the Chinese Institute of Engineers in 2011. He has published more than 100 IEEE journal and conference papers.

**Meng-Hsueh Wang** received the B.S. and M.S. degrees in electrical engineering from National Chung Cheng University, Taiwan, in 2001 and 2003, respectively. His research interests include power optimization algorithmic platform by using multiple supply voltages, cell based digital IP design and 32-bit microprocessor design.

**Ching-Te Chuang** (S'78–M'82–SM'91–F'94) received the B.S.E.E. degree from National Taiwan University, Taipei, Taiwan, in 1975 and the Ph.D. degree in electrical engineering from University of California, Berkeley, in 1982. From 1977 to 1982, he was a research assistant in the Electronics Research Laboratory, University of California, Berkeley, working on bulk and surface acoustic wave devices. He joined the IBM T. J. Watson Research Center, Yorktown Heights, NY in 1982. From 1982 to 1986, he worked on scaled bipolar devices, technology, and circuits. He studied the scaling properties of epitaxial Schottky barrier diodes, did pioneering work on the perimeter effects of advanced double-poly self-aligned bipolar transistors, and designed the first sub-nanosecond 5-Kb bipolar ECL SRAM. From 1986 to 1988, he was Manager of the Bipolar VLSI Design Group, working on low-power bipolar circuits, high-speed high-density bipolar SRAMs, multi-Gb/s fiber-optic data-link circuits, and scaling issues for bipolar/BiCMOS devices and circuits. Since 1988, he has managed the High Performance Circuit Group, investigating high-performance logic and memory circuits. Since 1993, his group has been primarily responsible for the circuit design of IBM's high-performance CMOS microprocessors for enterprise servers, PowerPC workstations, and game/media processors. Since 1996, he has been leading the efforts in evaluating and exploring scaled/emerging technologies, such as PD/SOI, UTB/SOI, strained-Si devices, hybrid orientation technology, and multi-gate/FinFET devices, for high-performance logic and SRAM applications.
Since 1998, he has been responsible for the Research VLSI Technology Circuit Co-design strategy and execution. His group has also been very active and visible in leakage/variation/degradation tolerant circuit and SRAM design techniques. Dr. Chuang has received one Outstanding Technical Achievement Award, one Research Division Outstanding Contribution Award, five Research Division Awards, and 12 Invention Achievement Awards from IBM. He took early retirement from IBM to join National Chiao-Tung University, Hsinchu, Taiwan, as a Chair Professor in the Department of Electronics Engineering in February 2008. He is the founding Director of ASE/NCTU 3D IC Joint Research Center at National Chiao-Tung University. He has received the Outstanding Scholar Award from Taiwan's Foundation for the Advancement of Outstanding Scholarship for 2008 to 2013. Dr. Chuang served on the Device Technology Program Committee for IEDM in 1986 and 1987, and the Program Committee for Symposium on VLSI Circuits from 1992 to 2006. He was the Publication/Publicity Chairman for Symposium on VLSI Technology and Symposium on VLSI Circuits in 1993 and 1994, and the Best Student Paper Award Sub-Committee Chairman for Symposium on VLSI Circuits from 2004 to 2006. He was elected an IEEE Fellow in 1994 "for contributions to high-performance bipolar devices, circuits, and technology". He has authored many invited papers in international journals such as International Journal of High Speed Electronics, Proceedings of IEEE, IEEE Circuits and Devices Magazine, and Microelectronics Journal. He has presented numerous plenary, invited or tutorial papers/talks at international conferences such as International SOI Conf., DAC, VLSI-TSA, ISSCC Microprocessor Design Workshop, VLSI Circuit Symposium Short Course, ISQED, ICCAD, APMC, VLSI-DAT, ISCAS, MTDT, WSEAS, VLSI Design/CAD Symposium, and International Variability Characterization Workshop. 
He was the co-recipient of the Best Paper Award at the 2000 IEEE International SOI Conference. He holds 39 U.S. patents with another 15 pending. He has authored or coauthored over 310 papers.