# Separate Clock Network Voltage for Correcting Random Errors in ULV Clocked Storage Cells

Shien-Chun Luo, Kuo-Chiang Chang, Ming-Pin Chen, Ching-Ji Huang, Yi-Fang Chiu, Po-Hsun Chen, Liang-Chia Cheng, Chih-Wei Liu, and Yuan-Hua Chu

Abstract—This brief presents an implementation of ultralow-power microcontrollers that use a separate clock network voltage (SCNV) to correct unexpected errors produced by on-chip variations (OCVs). Separating the clock network voltage requires amendments to the standard cell library and the physical design. The experiments used a 65-nm technology that exhibited considerable OCVs, which caused write and retention errors in clocked storage cells and limited the voltage scaling of microcontrollers. The SCNV makes it possible to correct errors in low-voltage clocked storage cells, and the area overhead of the proposed implementation is negligible. With the SCNV applied, the measurement results indicate that the microcontrollers can be operated below 0.3 V, an extension of more than 0.15 V in voltage scaling, and achieve the optimal energy consumption at 0.34 V. The measurement results also show that separating the clock network voltage involves tradeoffs in system timing and energy consumption, and this brief discusses proper applications.

*Index Terms*—Digital clocking, dynamic voltage scaling (DVS), flip-flop, process variation, subthreshold circuit.

#### I. INTRODUCTION

ULTRALOW-POWER and ultralow-voltage (ULV) circuits and systems enable miniature devices to embed multifarious applications and accelerate the development of the Internet of Things [1]. Although the advantages of ULV integrated circuits (ICs) have been widely promoted, designing ULV ICs is still challenging in the presence of technology variability [2], [3]. Compensating for on-chip variations (OCVs) is difficult because OCVs are random and hard to control using conventional corner-based methods. Commercial 65-nm technologies are mature and might be cost-effective for future low-cost and distributed ICs. However, the OCVs of the 65-nm technology severely restricted the energy scalability of prior ULV ICs.

The aforementioned OCVs were observed in previous ULV SRAMs and microcontrollers fabricated using the 65-nm technology. After a series of analyses, the writable voltage of SRAMs was reported to be a critical cause that clamped the dynamic voltage scaling (DVS) range. Fig. 1 displays the SRAM layout and one of the measurement results, where the minimal writable voltage of every bit was scanned. A small number of bit cells in the SRAM required a high write voltage, where these bit cells were randomly distributed. Although most of the SRAM cells could be written at less than 0.4 V, the stable write voltage was up to 0.55 V, which substantially restricted the energy scalability.

Fig. 1. Distribution of the minimal writable voltage of the SRAM cells fabricated by the 65-nm technology employed in this work.

Manuscript received July 1, 2014; revised August 25, 2014; accepted September 3, 2014. Date of publication September 10, 2014; date of current version December 1, 2014. This brief was recommended by Associate Editor J. P. de Gyvez.

S.-C. Luo, M.-P. Chen, C.-J. Huang, Y.-F. Chiu, P.-H. Chen, L.-C. Cheng, and Y.-H. Chu are with Industrial Technology Research Institute, Hsinchu 31040, Taiwan (e-mail: scluo@itri.org.tw).

K.-C. Chang and C.-W. Liu are with the Department of Electronic Engineering, National Chiao Tung University, Hsinchu 300, Taiwan.

Color versions of one or more of the figures in this brief are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSII.2014.2356913

ULV clocked storage elements (CSEs) such as data flip-flops (DFFs) and latches are vulnerable circuits when OCVs are severe [4]. The intrinsic setup and hold stabilities of ULV CSEs are analogous to the write and retention strength of ULV SRAM cells, respectively. The intrinsic setup stability indicates the writing robustness of the CSEs, whereas the intrinsic hold stability indicates the retention strength of the CSEs. Although CSEs can be upsized to mitigate the effect of OCVs, the DVS limit still appeared at a voltage approximately 0.1 V lower than the minimal operating voltage of the aforementioned SRAM. Therefore, the shmoo plots of the previous microcontrollers were sharply cut off at approximately 0.45 V, although a medium frequency was available above this voltage. To compensate for random OCVs, the methods used in ULV SRAMs, such as altering the write and retention voltages, might serve as references for designing ULV CSEs [5]. However, CSEs are distributed throughout digital circuits, and the area overhead must be considered when implementing multiple supply voltages.

This work adopted a separate clock network voltage (SCNV) method, which is known for reducing the power consumption of high-speed processors by lowering the voltage of clock networks [6]. Alternatively, the clock voltage can be higher than the domain voltage to improve the robustness of ULV CSEs. A high-voltage clock distribution was recently presented for correcting hold time violations, extending the chip operating voltage by 10–20 mV [7]. Although the SCNV idea is quite similar, this brief presents different measurement results from those in [7]: here, the operating voltage of the SCNV microcontroller can be reduced by more than 150 mV. Statistical simulations verified that the SCNV method effectively corrects intrinsic DFF errors due to OCVs. Additionally, performance tradeoffs were identified from the measurement results, and proper applications of the SCNV method are discussed.

Fig. 2. (a) Schematic of an SCNV latch. (b) Layouts of the conventional and SCNV clock buffers and CSEs. The VCLK pin used the second-layer metal.

#### II. IMPLEMENTATION OF THE SCNV

## A. SCNV Standard Cell Library

The impact of OCVs is explained here using a basic latch. The drive strength and loop gain of a ULV latch provide a tradeoff between intrinsic setup and hold stabilities, as shown in Fig. 2(a). ULV transistors exhibit low on–off current ratios, indicating that ULV transistors are leaky switches. Therefore, high drive strength indicates a high leakage that tends to interfere with the write and retention functions. Although the loop gain and writability can be simulated and verified at various process, voltage, and temperature (PVT) corners, OCVs determine the measurement outcomes, where errors may occur at qualified PVT corners.

The SCNV method is thus adopted to improve the robustness of latches and DFFs. An area-efficient SCNV implementation is proposed here. The logic gates that construct the clock networks, such as the clock inverters, buffers, and internal clock inverters of latches and DFFs, are adapted to have a separate source voltage, as shown in Fig. 2(a). The pMOS source voltage (VCLK) controls the gate nodes of the clocked transistors. A high VCLK provides a high on–off current ratio and improves the switch characteristic of the clocked transistors. The nonclocked transistors in the SCNV CSEs still use the original supply voltage. The pMOS transistors are slightly forward body-biased when a high VCLK is applied.
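To give a rough feel for why a modest VCLK boost improves the switch characteristic, the following sketch uses the textbook first-order subthreshold model, in which drain current depends exponentially on gate voltage. The slope factor `n = 1.5` and the temperature of 300 K are illustrative assumptions rather than values from this work, the function names are hypothetical, and the model ignores drain-induced barrier lowering and the body effect. It only applies while the clocked transistor remains in the subthreshold region.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin: float = 300.0) -> float:
    """Return the thermal voltage kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def on_off_ratio_gain(vclk_boost: float,
                      slope_factor: float = 1.5,
                      temp_kelvin: float = 300.0) -> float:
    """Factor by which the on-off current ratio of a clocked transistor grows
    when its gate swing is raised by vclk_boost volts.

    First-order subthreshold model: I ~ exp(Vgs / (n * kT/q)).
    The slope factor n is an assumed, technology-dependent value.
    """
    return math.exp(vclk_boost / (slope_factor * thermal_voltage(temp_kelvin)))


if __name__ == "__main__":
    for boost in (0.05, 0.10, 0.15):
        gain = on_off_ratio_gain(boost)
        print(f"VCLK boost = {boost:.2f} V -> on-off ratio gain ~ {gain:.1f}x")
```

Under these assumptions, a 0.1-V boost raises the on-off ratio by roughly an order of magnitude, which is consistent in spirit with the qualitative argument above, although the real gain depends on the device parameters of the technology.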

Fig. 3. Simulation setup for DFFs in the presence of OCVs.

TABLE I
INTRINSIC FAILURE RATES OF DFFS (VDD = VCLK)

| Supply Voltage   | 0.35 V | 0.4 V | 0.45 V | 0.5 V |
|------------------|--------|-------|--------|-------|
| Setup Error Rate | 29.2%  | 12.0% | 5.7%   | 2.5%  |
| Hold Error Rate  | 7.1%   | 6.0%  | 4.2%   | 3.7%  |

Fig. 4. Error rates of SCNV DFFs.

This SCNV implementation exhibits a low area overhead because the body is shared and the VCLK is connected by power pins. Consider a conventional layout style of dual-VDD cells equipped with a VDD rail and a VCLK rail: the cell height would require three extra routing tracks (at least two for linewidth and one for spacing) to attach the second VDD line. Cell height is a global parameter that affects all standard cells. An additional power line thus brings a 25% increase in core area, given that the applied 65-nm standard cells use 12 routing tracks.
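As a quick check of the 25% figure, and assuming that core area scales in proportion to cell height at a fixed cell width (a simplification that ignores placement and routing effects), the three extra tracks on a 12-track cell give

$$\frac{\Delta A}{A} \approx \frac{H_{\text{dual}} - H_{\text{single}}}{H_{\text{single}}} = \frac{15 - 12}{12} = 25\%$$

where $H_{\text{single}}$ and $H_{\text{dual}}$ are illustrative symbols (not from the original text) for the heights, in routing tracks, of the single-rail and dual-rail cells.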

The proposed implementation uses power pins instead of rails. The cell height remains 12 tracks, and the increase in cell width is less than three tracks. The area of nonclocked cells is unaffected, and the clocked cells have a small area overhead. Fig. 2(b) exhibits examples of SCNV clocked cells. Compared with the conventional cells, all SCNV clock buffers and inverters occupy the same area, and the SCNV latches and DFFs occupy less than $1.06\times$ the conventional area.

## B. Simulation Results of Conventional and SCNV DFFs

The characteristics of conventional and SCNV DFFs were simulated and compared. The DFFs adopted in this work used conventional master-slave latches; both latches were the same, as shown in Fig. 2(a), except for the opposite clock phases. Fig. 3 shows the simulation setup. The DFF under test was at the middle of the test structure, and the input and output circuits comprised two identical DFFs without variations. Buffers were inserted to prevent hold time violations, and a slow frequency was adopted to prevent setup time violations. Therefore, the intrinsic setup/hold failures dominated the error rate. This test structure used $3\sigma$ random OCVs and 1000 Monte Carlo simulation runs. Table I shows that the error rates of the DFF were high (e.g., 6.2% in total at 0.5 V). However, these error rates are likely overestimated because previous measurements showed a high yield in DFFs at 0.5 V. Although pessimistic, these data can show the trend of the SCNV effect without numerous simulation runs.
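The total error rate quoted above and the statistical resolution of a 1000-run experiment can be checked with a short sketch. The rates are taken from Table I; treating each Monte Carlo run as an independent pass/fail trial, and the helper names themselves, are assumptions made here for illustration and are not part of the original simulation flow.

```python
import math

# Error rates from Table I (VDD = VCLK), as fractions, keyed by supply voltage in volts.
SETUP_ERROR = {0.35: 0.292, 0.40: 0.120, 0.45: 0.057, 0.50: 0.025}
HOLD_ERROR = {0.35: 0.071, 0.40: 0.060, 0.45: 0.042, 0.50: 0.037}
RUNS = 1000  # Monte Carlo runs stated in the text


def total_error_rate(vdd: float) -> float:
    """Sum of the setup and hold error rates at one supply voltage."""
    return SETUP_ERROR[vdd] + HOLD_ERROR[vdd]


def binomial_std_error(rate: float, runs: int = RUNS) -> float:
    """Standard error of a failure rate estimated from `runs` independent trials
    (assumption: each run is an independent Bernoulli trial)."""
    return math.sqrt(rate * (1.0 - rate) / runs)


if __name__ == "__main__":
    for vdd in sorted(SETUP_ERROR):
        total = total_error_rate(vdd)
        print(f"{vdd:.2f} V: total = {total:.1%}, "
              f"setup s.e. = {binomial_std_error(SETUP_ERROR[vdd]):.2%}, "
              f"hold s.e. = {binomial_std_error(HOLD_ERROR[vdd]):.2%}")
```

With 1000 runs, rates of a few percent carry a sampling uncertainty of well under one percentage point under this assumption, which is adequate for observing trends even if the absolute values are pessimistic.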

Fig. 5. Error rates of SCNV DFFs at various temperatures.

Fig. 6. FOM plots of SCNV DFFs.

TABLE II
SCNV DFF CHARACTERISTICS AT 0.45 V

| VCLK boost       | 0 V  | 0.05 V | 0.1 V | 0.15 V | 0.2 V |
|------------------|------|--------|-------|--------|-------|
| Error Rate Ratio | 1.00 | 0.55   | 0.42  | 0.37   | 0.36  |
| Delay Ratio      | 1.00 | 0.69   | 0.53  | 0.46   | 0.41  |
| Power Ratio      | 1.00 | 1.10   | 1.20  | 1.35   | 1.47  |
| FOM              | 1.00 | 2.39   | 3.68  | 4.42   | 4.69  |

Under the same simulation setup, Fig. 4 displays the error-rate curves of the SCNV DFF with respect to the increase in the VCLK. A 0.1-V VCLK boost reduced the error rate by approximately half, whereas the error rate converged when the VCLK boost exceeded 0.1 V. Fig. 5 displays the error-rate curves at various temperatures. Although OCVs increase the error rate at a low temperature, the error-reduction ability of the SCNV method is similar among temperatures. Increasing the VCLK voltage reduced the DFF delay but increased the power consumption. Obtaining a complete figure of merit (FOM) of the SCNV method is complicated considering the various topologies of clock networks; however, a simplified FOM with respect to the SCNV DFF is written as the following equation:

$$FOM = \left(\frac{ER_{conv}}{ER_{SCNV}}\right) \times \left(\frac{D_{conv}}{D_{SCNV}}\right) \times \left(\frac{PW_{conv}}{PW_{SCNV}}\right) \quad (1)$$

where $ER$ is the error rate, $D$ is the delay, and $PW$ is the power consumption; the subscripts conv and SCNV denote the conventional and SCNV DFFs, respectively.

The delays in (1) are the DFF CLK-Q delays, and the power consumption is limited to the DFFs. Fig. 6 shows the FOM curves of the SCNV DFF. Increasing the clock voltage exerted a positive effect on the FOM, although the advantage saturated after a 0.1-V VCLK boost. The FOM curves exhibited a nonlinear relation with the logic VDD, and the highest FOM appeared at 0.45 V. Table II presents the corresponding delay, power, and error-rate ratios at 0.45 V. The power consumption increased with respect to the VCLK boost, and the error and delay reduction rates saturated at a high VCLK. The threshold of the VCLK boost (0.1 V) may explain the implementation presented in the previous work [7], and here, the tradeoffs of the SCNV method are clarified. However, a constant VCLK boost is not the final recommendation for applying the SCNV method, because limits were observed in the measurements when various operating voltages were given.
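As a sanity check on (1), the FOM row of Table II can be recomputed from the three ratio rows. The sketch below assumes that each tabulated ratio is the SCNV value divided by the conventional value, so that (1) reduces to the reciprocal of their product; the variable and function names are illustrative. Because the tabulated ratios are rounded to two decimals, the recomputed values agree with the table only to within a few percent.

```python
# Ratios from Table II (SCNV DFF at 0.45 V), each normalized to the conventional DFF.
VCLK_BOOST = [0.00, 0.05, 0.10, 0.15, 0.20]   # volts
ERROR_RATIO = [1.00, 0.55, 0.42, 0.37, 0.36]
DELAY_RATIO = [1.00, 0.69, 0.53, 0.46, 0.41]
POWER_RATIO = [1.00, 1.10, 1.20, 1.35, 1.47]
TABLE_FOM = [1.00, 2.39, 3.68, 4.42, 4.69]    # FOM row as printed in Table II


def fom(error_ratio: float, delay_ratio: float, power_ratio: float) -> float:
    """Eq. (1) expressed with SCNV/conventional ratios:
    FOM = (1 / ER ratio) * (1 / delay ratio) * (1 / power ratio)."""
    return 1.0 / (error_ratio * delay_ratio * power_ratio)


def recomputed_fom() -> list:
    """Recompute the FOM for every VCLK boost listed in Table II."""
    return [fom(e, d, p) for e, d, p in zip(ERROR_RATIO, DELAY_RATIO, POWER_RATIO)]


if __name__ == "__main__":
    for boost, calc, table in zip(VCLK_BOOST, recomputed_fom(), TABLE_FOM):
        print(f"boost {boost:.2f} V: recomputed FOM = {calc:.2f}, table = {table:.2f}")
```

The recomputed values show the same saturation as the table: most of the gain is obtained by a 0.1-V boost, after which the rising power ratio offsets further error and delay reductions.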



Fig. 7. Die photo of the test chip and the layout of the SCNV RISC core.

## C. Test Chip Design

Test chips were fabricated to verify the effect of using SCNV CSEs. Fig. 7 shows the die photo of the test chip, which comprised 16-bit reduced instruction set computing (RISC) microcontrollers, including a conventional and an SCNV core. The system also comprised a ROM (test instruction memory), an SRAM (data memory), and a register file (instruction memory). The ROM stored built-in self-test (BIST) codes, which included the test programs of the microcontrollers and SRAM. A system controller unit (SCU) was equipped with several multiplexing functions, allowing flexible tests among functional blocks and including fail-proof testing modes. The SCU and system buses used the standard supply voltage (1 V) of this technology. Fig. 7 also shows the layout of the SCNV core, where the highlighted wires are the branches of the VCLK power lines that connect to the power mesh. The test chip was assigned an independent voltage domain for each memory and RISC core. Four identical RISC cores were synthesized using different CSEs: the first core used conventional CSEs, and the third core used SCNV CSEs. The proposed SCNV implementation had little area overhead, and the silicon area of the first and third cores was actually the same.

To ensure that specific RISC cores use the correct clocked cells, each RISC core was synthesized by excluding the clocked cells of the other families. After their respective syntheses, these gate-level RISC cores were locked in the synthesis of the entire test chip. During placement and routing, the clock root of the SCNV domain was set as an individual clock source to synthesize the SCNV clock tree, and the power pins of the SCNV clocked cells were connected to the VCLK power mesh that had been planned.

In this brief, the SCNV was applied as a postsilicon compensation method because the VCLK boost was uncertain in the design stage. The timing library of the SCNV CSEs was characterized using a single supply voltage, and the SCNV effect was invisible in the static timing analysis. The clock voltage was not separated until functional errors appeared in the measurements. Therefore, the conventional timing analysis and sign-off were unaffected. The SCNV function may be masked when the chip under test meets the specifications.



Fig. 8. (a) Cumulative shmoo plot of the conventional clocking operation. (b)–(d) Differential shmoo plots, displaying the additional pass numbers after applying the SCNV method. Operating conditions: (b) VCLK = VDD + 50 mV, (c) VCLK = VDD + 100 mV, and (d) VCLK = VDD + 150 mV.

#### III. MEASUREMENT RESULTS

A stand-alone test mode was designed to execute the BIST program for validating the main functions of the system, including the RISC core, memory devices, and SCU. This BIST program loaded the instructions from the ROM, executed arithmetic operations, wrote the SRAM, read the SRAM, and compared the read data. This test program covered all CSEs (405 DFFs in a RISC core). The following measurement data are based on these BIST results.

Fig. 8(a) shows the cumulative shmoo plot of the SCNV RISC cores operating under the conventional clocking condition (VCLK = VDD). Here, the other DVS domains were assigned a fixed 0.7 V to avoid errors due to OCVs, and the SCU and buses were operated at 1 V. The color of the bricks represents the total pass number of 12 chips. The darkest brick indicates 12 passes, and a white brick indicates 12 fails. Under the conventional clocking condition, the RISC core achieved 17 MHz at 0.5 V, and the minimal supply voltage was near 0.45 V, where the shmoo plot was sharply cut off. For reference, the threshold voltage of this technology is approximately 0.4 V, and the corner-based simulations passed the test at 0.4 V. These results reproduced those observed in past experiments, in which OCVs clamped the range of voltage scaling. The conventional and SCNV RISC cores exhibited similar measurement results under the conventional clocking condition. For simplicity, only the SCNV RISC core before and after activating the SCNV is presented here.

Fig. 9. Test environment and measured results at the minimal voltage.

Fig. 8(b)–(d) displays the differential shmoo plots of the SCNV operations using 50-, 100-, and 150-mV VCLK boosts, respectively. The differential shmoo plots present the number of additional passes obtained by using the SCNV method. A positive number indicates an improvement, and a negative number indicates a regression. The SCNV method effectively enhanced the functional yield below 0.45 V and improved the operating speed at 0.45–0.5 V, whereas it caused speed regressions at 0.6–0.7 V.
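The construction of a differential shmoo plot can be sketched as a per-point subtraction of pass counts. The data structure, the function name, and the pass counts below are hypothetical placeholders chosen for illustration and are not measured data; operating points missing from one of the two maps are treated as zero passes, which is also an assumption.

```python
from typing import Dict, Tuple

# (VDD in volts, frequency in MHz) -> number of passing chips out of 12
Shmoo = Dict[Tuple[float, float], int]


def differential_shmoo(baseline: Shmoo, boosted: Shmoo) -> Shmoo:
    """Additional passes at each operating point after applying a VCLK boost.

    Positive values indicate an improvement and negative values a regression.
    Points absent from a map are assumed to have zero passes.
    """
    points = sorted(set(baseline) | set(boosted))
    return {p: boosted.get(p, 0) - baseline.get(p, 0) for p in points}


if __name__ == "__main__":
    # Hypothetical pass counts, for illustration only.
    conventional = {(0.40, 1.0): 0, (0.50, 17.0): 12, (0.70, 60.0): 12}
    scnv_boosted = {(0.40, 1.0): 9, (0.50, 17.0): 12, (0.70, 60.0): 10}
    for point, delta in differential_shmoo(conventional, scnv_boosted).items():
        print(point, f"{delta:+d}")
```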

Fig. 8 shows that the minimal VDD under the conventional clocking condition was 0.44–0.45 V, which was extended to less than 0.3 V under a 100-mV VCLK boost. Based on the simulation results in Section II-B, this yield improvement is likely strongly correlated with the correction of intrinsic CSE errors. The speed improvement and regression (in the region above 0.45 V) were related to setup time violations. Although a high VCLK slightly reduced the setup time of DFFs, setup time violations occurred at high voltages because of the change in clock skew. A higher VCLK resulted in greater regression. Therefore, the SCNV method in this technology should be limited to near- and subthreshold operations to ensure the efficacy and avoid the drawback.

The aforementioned statistical measurement data were collected from an IC tester, which was suitable for drawing shmoo plots but experienced resolution limits in the delay and current measurements. Therefore, several test chips were mounted on printed circuit boards to capture accurate performance, and the test chip with the lowest operating voltage was selected for presentation. Fig. 9 displays the chip-on-board (COB) test environment and the operating conditions at the minimal VDD, which were as follows: the VDD was 0.22 V, the VCLK was 0.29 V, the maximal frequency was 200 kHz, and the total power consumption was 425 nW.
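The operating point at the minimal VDD can be converted into an energy per clock cycle using $E = P/f$. The sketch below assumes that the reported 425 nW is the average power while running at 200 kHz; the function name is illustrative.

```python
def energy_per_cycle(power_watts: float, frequency_hz: float) -> float:
    """Average energy per clock cycle in joules, E = P / f."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return power_watts / frequency_hz


if __name__ == "__main__":
    # Operating point at the minimal VDD (0.22 V VDD, 0.29 V VCLK).
    energy = energy_per_cycle(425e-9, 200e3)
    print(f"energy per cycle ~ {energy * 1e12:.3f} pJ")
```

Under this assumption the minimal-voltage point costs about 2.1 pJ per cycle, which illustrates that the lowest operating voltage is not necessarily the most energy-efficient one, since leakage accumulates over the long cycle time.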

Fig. 10. Measurement results and comparisons. (a) Power consumption versus operating frequency. (b) Power consumption percentage of CLK network versus operating frequency. (c) Core voltage versus operating frequency. (d) Core voltage versus energy consumption. (a), (b), and (c) use the same horizontal axis for cross reference; (c) and (d) use the same vertical axis for cross reference.

The performance, power, energy, and voltage scaling of the COB measurement results are compared in Fig. 10. The label *DVFS* (dynamic voltage and frequency scaling) in Fig. 10 indicates that the power/energy consumption was obtained at the maximal available frequency of the corresponding voltage; the label *DFS* indicates that the power/energy consumption was measured using frequency scaling at 0.4 V because the conventional DVFS curve of this selected chip stopped at 0.4 V and 7 MHz. A constant-VCLK SCNV scenario was adopted in this experiment. Therefore, the SCNV DVFS curves of constant 0.4- and 0.35-V VCLKs were plotted, and the minimal operating point was marked.

Fig. 10(a) shows the measured power with respect to the operating frequency, where the SCNV DVFS reduced the power consumption more than the conventional DFS did. Fig. 10(b) plots the power percentage of the clock network, which increased from 45% to 80% as the SCNV was applied. Fig. 10(c) displays the relationship between the frequency and the RISC VDD to offer a reference for Fig. 10(a) and (b). Fig. 10(d) exhibits the corresponding energy consumption, where applying the SCNV DVFS continued the energy scaling to reach the minimal energy consumption of the RISC core (1.3 pJ, 0.34-V VDD, and 0.4-V VCLK).

The experiments developed for plotting Figs. 8 and 10 employed different SCNV strategies. A constant VCLK – VDD difference (see Fig. 8) resulted in a performance regression at a high VDD, and thus, an upper bound should be set for activating the SCNV method. The constant-VCLK scenario (see Fig. 10) suggested activating the SCNV method at the minimal VDD of the conventional DVFS operation. Although the minimal VDD may vary from die to die, a voltage based on statistical results, 0.45 V for example, can be set as the activating voltage of the SCNV method. A power management policy may refer to alternative SCNV implementations to fit the design purpose.
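One way to read the constant-VCLK strategy as a power management rule is sketched below. This is a hypothetical policy and not the controller used in the test chip: the default activation voltage of 0.45 V follows the statistical example given above, the constant VCLK of 0.4 V follows the Fig. 10 experiment, and the clamp that keeps VCLK from dropping below VDD is an added assumption.

```python
def select_vclk(vdd: float,
                activation_vdd: float = 0.45,
                constant_vclk: float = 0.40) -> float:
    """Return the clock-network voltage for a given core voltage (in volts).

    At or above activation_vdd, the clock network shares the core voltage
    (conventional clocking), which avoids the high-VDD speed regression.
    Below it, VCLK is held at constant_vclk; the max() clamp is an
    assumption so that VCLK never falls below VDD.
    """
    if vdd >= activation_vdd:
        return vdd
    return max(vdd, constant_vclk)


if __name__ == "__main__":
    for vdd in (0.60, 0.45, 0.42, 0.34, 0.30):
        print(f"VDD = {vdd:.2f} V -> VCLK = {select_vclk(vdd):.2f} V")
```

A constant-boost variant would instead return `vdd + boost` below an upper VDD bound; which variant suits a design depends on the available regulators and the timing margins at higher voltages.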

#### IV. CONCLUSION

This brief introduced an SCNV method to compensate for OCVs. The SCNV method was implemented by separating the source and body voltages of the pMOS transistors of clock inverters and buffers. The SCNV CSEs exhibit a negligible area overhead compared with the conventional implementation. Additional efforts were required in cell layout, logic synthesis, and power routing. Nevertheless, the static timing analysis remained unchanged when using the SCNV as a postsilicon compensation method. The experiments used 65-nm microcontrollers, of which the voltage scalability was severely affected by OCVs. The experimental results revealed a considerable enhancement of the functional yield in the near- and subthreshold regions, enabling the energy optimization that was restricted by OCVs.

#### REFERENCES

- [1] L. Atzori, A. Iera, and G. Morabito, "The Internet of Things: A survey," *Comput. Netw.*, vol. 54, no. 15, pp. 2787–2805, Oct. 2010.
- [2] B. Nikolic et al., "Technology variability from a design perspective," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 58, no. 9, pp. 1996–2009, Sep. 2011.
- [3] M. Alioto, "Ultra-low power VLSI circuit design demystified and explained: A tutorial," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 59, no. 1, pp. 3–29, Jan. 2012.
- [4] P. Meinerzhagen, S. M. Y. Sherazi, A. Burg, and J. N. Rodrigues, "Benchmarking of standard-cell based memories in the sub-VT domain in 65-nm CMOS technology," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 1, no. 2, pp. 173–182, Jun. 2011.
- [5] B. Zimmer et al., "SRAM assist techniques for operation in a wide voltage range in 28-nm CMOS," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 59, no. 12, pp. 853–857, Dec. 2012.
- [6] J. Pangjun and S. S. Sapatnekar, "Low-power clock distribution using multiple voltages and reduced swings," *IEEE Trans. VLSI Syst.*, vol. 10, no. 3, pp. 309–318, Jun. 2002.
- [7] M. Nomura et al., "0.5 V image processor with 563 GOPS/W SIMD and 32 bit CPU using high voltage clock distribution (HVCD) and adaptive frequency scaling (AFS) with 40 nm CMOS," in *Proc. Symp. VLSIC*, Jun. 2013, pp. C36–C37.