# **A Static-Switching Pulse Domino Technique for Statistical Power Reduction of Wide Fan-in Dynamic Gates**

Rahul Singh<sup>1</sup>, Jae-Cheol Son<sup>2</sup>, Ukrae Cho<sup>2</sup>, Gunok Jung<sup>2</sup>, Min-Su Kim<sup>2</sup>, Hyoungwook Lee<sup>2</sup>, and Suhwan Kim<sup>1</sup>

> <sup>1</sup>Electrical Engineering, Seoul National University, Seoul 151-744, Korea; <sup>2</sup>SOC Platform Team, Samsung Electronics, Gyunggi-Do, Korea. {rahul,suhwan}@snu.ac.kr

# **ABSTRACT**

In wide fan-in dynamic domino gates, the two-phase evaluate-precharge operation leads to high switching activity at the dynamic and output nodes, which introduces a significant power penalty. In this paper, we propose a pulse domino technique that reduces the overall power consumption of a wide fan-in dynamic gate by providing static-like switching behavior at the dynamic node, the gate inputs, and the output terminals. Dynamic multiplexers designed and simulated in 90-nm CMOS are used to demonstrate the energy effectiveness of the proposed design style.

## **Categories and Subject Descriptors**

B.6.1 [**Logic Design**]: Design Styles – *Combinational logic, Logic arrays.* 

#### **General Terms**

Design, Performance, Reliability

#### **Keywords**

Domino logic, dynamic circuits, low-power design, switching activity, high speed, noise immunity.

#### **1. INTRODUCTION**

Domino circuits employ a dual-phase dynamic logic style with each clock cycle divided into a precharge and an evaluate phase. This mechanism permits high-speed operation and enables the implementation of complex functions with a single NMOS evaluation network. As an example, a simple implementation of a dynamic multiplexer employed in the read port of a register file is shown in Fig. 1. On the other hand, a static CMOS implementation of the same multiplexer would require a long stack of PMOS transistors or a multi-level structure both of which are performance-limiting. Hence, high-performance compact dynamic domino circuits are frequently employed to implement wide-OR gates like decoders and comparators in high performance microprocessors, digital signal processors, and other VLSI circuits.

Copyright 2011 ACM 978-1-4503-0667-6/11/05…\$10.00.



**Figure 1: Local bit-line (LBL) organization of the read port of a register file (RF) using a conventional n-bit footless dynamic multiplexer with its input and output switching waveforms. RS<sub>i</sub> and D<sub>i</sub> are respectively the row-select and data inputs.**

Although fast and compact, wide fan-in dynamic circuits suffer from several limitations. Cumulative leakage from the parallel evaluation paths renders the gate susceptible to several charge-loss mechanisms, severely compromising the gate's tolerance to input noise [1]. Clock signal distribution incurs sequencing and routing overheads and adds to the power dissipation due to the unity switching factor of the clock network [2]. Power consumption also increases due to the high switching rate of the dynamic and the output nodes. In Fig. 1, we see that the output node is reset during every precharge phase even when the inputs and the logical output value across two consecutive cycles are unchanged. In addition, the inputs of a footless domino logic style (in which the clocked footer transistor is absent) are driven by dynamic buffers which pull down all inputs to logical zero at the start of precharge in every clock cycle. Assuming similar loading, these dynamic buffers are therefore more energy expensive than static buffers. Thus, while static gates consume power only when a toggling event occurs at the output node (output switching-dependent), dynamic gates are output state-dependent, consuming power in every clock cycle where the output is logically high during the evaluation phase [3]. This power penalty due to redundant switching becomes especially significant for wide fan-in gates, where the dynamic node sees a high capacitance due to the large parasitic contribution from the evaluation network. Charging and discharging this node and the output node every clock cycle significantly increases the power overhead.
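The distinction between output switching-dependent and output state-dependent power can be illustrated with a minimal cycle-counting sketch. The function name and the example sequence are hypothetical, and each counted event is only a proxy for one charge/discharge of the output node, not a power figure.

```python
def count_power_events(outputs):
    """Return (static_events, dynamic_events) for a per-cycle output sequence.

    static_events:  cycles in which the output differs from the previous
                    cycle (a static gate dissipates only on such toggles).
    dynamic_events: cycles in which the output evaluates high (a conventional
                    domino gate resets the output in the following precharge).
    """
    static_events = sum(1 for prev, cur in zip(outputs, outputs[1:]) if prev != cur)
    dynamic_events = sum(1 for value in outputs if value == 1)
    return static_events, dynamic_events


# Hypothetical output sequence: mostly high, with a single low cycle.
sequence = [1, 1, 1, 0, 1, 1, 1, 1]
static_events, dynamic_events = count_power_events(sequence)
print("static-like events:", static_events)
print("dynamic (state-dependent) events:", dynamic_events)
```

For an output that is mostly high, the state-dependent count grows with the number of high cycles while the toggle count stays small, which is the redundancy the paragraph above describes.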

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

*GLSVLSI'11*, May 2-4, 2011, Lausanne, Switzerland.



**Figure 2: Wide domino gate using (a) Limited Switch Dynamic Logic (LSDL) [5] (b) Single-Phase SP-Domino [3].** 

To reduce the switching power dissipation while still maintaining high performance, we propose a wide fan-in domino design technique that minimizes redundant switching at the internal dynamic and output nodes, thereby achieving static-like behavior. In Section 2, we review previous switching-aware design techniques and discuss their limitations. In Section 3, we introduce and analyze the proposed static-switching pulse domino (SSPD). In Section 4, we provide the simulation results, and we conclude the paper in Section 5.

#### **2. PREVIOUS WORKS**

In [4], a new logic family called limited switch dynamic logic (LSDL) was proposed to exploit the performance and area savings of dynamic circuits while avoiding the excessive dynamic power penalty. A basic wide fan-in implementation of an LSDL gate [5] is shown in Figure 2(a). The gate has a pull-down network similar to a footed conventional domino gate, but the output inverter and the keeper transistor of domino logic, which together form a half-latch, are now replaced by a static latch structure (M4, M5). An additional gain stage (M1, M2) is added to prevent back-propagation of the latched signal to the dynamic node, which precharges every clock cycle. Two gain stages provide adequate buffering of the evaluation logic from the output load and therefore the nMOS logic tree can be minimum sized, further reducing the area overhead. The reduced capacitance of the dynamic node due to the minimum-sized evaluation tree requires only a small precharge transistor, which reduces the clock loading and clock power but exacerbates the instability of the floating dynamic node. The insertion of the static latch eliminates redundant switching at the output, but the internal dynamic node still has an enhanced switching rate. Therefore, the LSDL gate lacks a truly static switching behavior.

To achieve static input/output characteristics, a domino technique called the single-phase SP-Domino was proposed in [3] (Figure 2(b)). Similar to clock-delayed domino [6], it uses a delayed clock, requiring the latest arriving input to arrive with or before the rising edge of the delayed clock. The gate has a single-phase operation as both the pull-up and pull-down of the dynamic node occur during the evaluation phase. A pMOS transistor, MP1, functions both as the keeper and the pull-up device. A pulse-generator block turns on MP1 unconditionally at the start of every evaluation cycle. If the pull-down network is on, it overpowers MP1 and the dynamic node is either maintained at or transitions to the low logic state. Alternatively, if the pull-down network is off



**Figure 3: Simulated waveforms of a 16-bit dynamic multiplexer in 1.2V 90-nm industrial CMOS process using the SP-Domino technique.** 



**Figure 4: Rise and Fall delays of 8- and 16-bit SP-Domino dynamic multiplexers with output load of 1 fanout-of-4 inverter (FO4).** 

at the start of evaluation, MP1 evaluates the dynamic node to the high logic state. The duration of the pulse at the gate of MP1 is equal to the delay of three inverters and the NAND gate of the pulse-generator block. If, at the end of this pulse window, the dynamic node is in logic state '0', MN remains turned off and P is pulled high (see Figure 3) by the action of MP2, which turns off MP1. If, however, the dynamic node has been charged up enough to turn on MN, the charging operation can continue even after the pulse window has elapsed.

However, the design of SP-domino suffers from several limitations. Consider the lack of flexibility in the sizing of the transistor MP1. Increasing the size of MP1 increases the keeper ratio (K, defined as the ratio of the average current drivability of the keeper transistor to that of a single evaluation path of the wide pull-down network) and this in turn increases the low-to-high delay ($T_{rise}$) due to increased contention while decreasing the high-to-low delay ($T_{fall}$). To have symmetric rise and fall delays, SP-domino requires a fixed K value (see Figure 4) and the gate cannot be tuned for a specific noise performance. In addition, MP1 should be sized large enough to ensure MN turns on before



**Figure 5: Dynamic multiplexer implemented with the SSPD technique. G1 and G2 are the two gates of the Pulse generator.** 

the end of the pulse window. This further restricts K to moderate-to-high values (0.46 for the 8-bit and 0.58 for the 16-bit multiplexer), thereby reducing performance.

The power consumption of an SP-domino gate is given by

$$
\begin{aligned}
P_{\text{Total}} &= P_{\text{CAPACITIVE},GATE} + P_{\text{CAPACITIVE},PG} + P_{\text{SC}} + P_{\text{CLK}} \\
&= \tfrac{1}{2}\,\alpha\,(C_{\text{DYN}} + C_{\text{SWITCHING}})\,V_{\text{DD}}^2 f_{\text{clk}} \\
&\quad + C_{\text{SWITCHING},PG}\,V_{\text{DD}}^2 f_{\text{clk}} + P_{\text{OUT}}(1)\,I_{\text{SC}}\,V_{\text{DD}} f_{\text{clk}} + P_{\text{CLK}}
\end{aligned}
\tag{1}
$$

where $\alpha$ is the switching probability of the gate output, and $P_{CAPACITIVE,GATE}$ and $P_{CAPACITIVE,PG}$ are respectively the capacitive switching power consumed by the gate and the pulse generator. $P_{SC}$ is due to the short-circuit current $I_{SC}$ flowing in every cycle in which the output is in logic state '1' (represented by the probability $P_{out}(1)$) and $P_{CLK}$ is the clocking power overhead. $C_{DYN}$ represents the dynamic node capacitance and $C_{SWITCHING}$ is the switching capacitance contribution from the internal and the output nodes. Notice that the gate consumes switching power only when the output switches, and even though the pulse generator has a unity switching factor, limited switching at the dynamic and output nodes leads to large power savings [3]. Eq. (1) further shows that the gate draws some power even when the output is stable at state '1', as MP1 is unconditionally turned on at the rising edge of every clock. In order to prevent this short-circuit power from overriding the benefits of a reduced switching factor, the pulse window must be appropriately small, which further constrains the sizing of MP1.
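As a rough numerical reading of Eq. (1), the sketch below evaluates each term separately. All parameter values are hypothetical placeholders rather than values from the paper. The short-circuit term is transcribed as written in Eq. (1); for its units to work out, `i_sc` is treated here as the short-circuit charge drawn per cycle, which is an assumption of this sketch.

```python
def sp_domino_power(alpha, p_out1, c_dyn, c_sw, c_sw_pg, i_sc, vdd, f_clk, p_clk=0.0):
    """Evaluate the four terms of Eq. (1) for an SP-Domino gate.

    alpha   : switching probability of the gate output
    p_out1  : probability that the output is in logic state '1'
    c_dyn   : dynamic node capacitance (F)
    c_sw    : internal plus output switching capacitance (F)
    c_sw_pg : pulse-generator switching capacitance (F)
    i_sc    : short-circuit term, assumed here to be charge per cycle (C)
    """
    gate = 0.5 * alpha * (c_dyn + c_sw) * vdd ** 2 * f_clk
    pulse_gen = c_sw_pg * vdd ** 2 * f_clk        # unity switching factor
    short_circuit = p_out1 * i_sc * vdd * f_clk   # only while the output is '1'
    return gate + pulse_gen + short_circuit + p_clk


# Hypothetical values, chosen only to exercise the formula.
stable_high = sp_domino_power(alpha=0.0, p_out1=1.0, c_dyn=20e-15, c_sw=10e-15,
                              c_sw_pg=5e-15, i_sc=2e-15, vdd=1.2, f_clk=1e9)
stable_low = sp_domino_power(alpha=0.0, p_out1=0.0, c_dyn=20e-15, c_sw=10e-15,
                             c_sw_pg=5e-15, i_sc=2e-15, vdd=1.2, f_clk=1e9)
print(stable_high, stable_low)
```

With `alpha = 0`, the gate term vanishes but the pulse-generator term remains, and an output held at '1' additionally pays the short-circuit term, matching the observation above.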

We therefore see that the SP-domino is heavily disadvantaged by the use of the same transistor to perform both the pull-up as well as the keeper action. While a static-like switching behavior renders it advantageous in terms of power, it is inflexible and has significant design overheads.

#### **3. PROPOSED DOMINO TECHNIQUE**

In this section, we introduce the SSPD technique, which achieves a static switching factor like SP-domino but avoids its inflexibility by offering tunable delay and noise performance. The schematic and simulation waveforms of the proposed static-switching pulse domino (SSPD) are shown in Figure 5 and Figure 6 respectively. Similar to an SP-Domino gate, it is a clock-delayed footless domino gate with static input/output characteristics. However, to avoid the design constraints introduced by combining the keeper and pull-up action, we separate the pull-up transistor (MP1) from the keeper transistor (MP2). This enables the use of a conditional pulse generator which turns on MP1 during evaluation only if the dynamic node has been discharged during a previous cycle. If the dynamic node has not been discharged, MP1 is not turned on and the value is maintained by the keeper transistor MP2, which forms a half-latch with the output inverter. Consequently, the switching factor of the internal nodes of the pulse generator becomes output-state dependent (consuming power only when the output is in logic state '1'), helping to reduce the power overhead of the pulse generator block. In conventional domino design, the keeper ratio (K) is the most important design parameter in determining the gate's delay performance and noise robustness. However, since SSPD has an additional transistor MP1 specifically to function as the pull-up device, an additional design parameter, the width of the pull-up transistor MP1, requires simultaneous consideration along with the keeper ratio to characterize the gate's performance.

We also employ a clocked isolation transistor MN1 to separate the drain terminal of the pull-down network with large capacitive loading (DYN2) from the main dynamic node (DYN1) which is inversely coupled to the output. The purpose of the isolation transistor in the SSPD gate is to shield the large parasitic capacitance at DYN2 (due to the wide pull-down network) from MP1 during a pull-up operation. Consider a situation where both DYN1 and DYN2 have been discharged to logical ground in the previous evaluation cycle (see Figure 6). At the start of the next clock cycle, if the pull-down network is off, the pull-up transistor MP1 will evaluate DYN1 to the logical high state. Contrary to the case in an SP-Domino gate where the pull-up device has to be adequately sized to charge the large capacitance on the dynamic node, most of MP1's initial current drive will be utilized to quickly charge up the much smaller capacitance on DYN1 as the current drained by the isolation transistor MN1 would be limited by its near-zero drain-to-source voltage. Thus, the sizing



**Figure 6: Simulated waveforms of a SSPD 16-bit dynamic multiplexer.** 

constraint on the pull-up device to equalize the high-to-low delay of the gate with its low-to-high delay is now much relaxed. In addition, the voltage swing on DYN2 is reduced by $V_{TN}$ (the nMOS threshold voltage), leading to additional power savings. Also, note that MN2 is only a minimum-sized nMOS keeper for the node DYN2.

Further, the pulse generator is made conditional by generating two additional clock phases, CLKd and CLKi, as shown in Figure 6. CLKd behaves as the delayed version of the clock and CLKi as the inverse of the clock only if the main dynamic node (DYN1) has been discharged during an evaluate cycle, which makes a pull-up operation in the ensuing cycle probable. If, however, the dynamic node is maintained high, CLKd and CLKi are pulled down to the low logic state (using feedback from DYN1) half a clock period apart (CLKd is pulled down only at the next negative clock edge). Thus, no pulse is generated at the output of the pulse generator during the next cycle. The pulse generator is therefore off and no unnecessary switching activity is seen on its internal nodes. If the pull-down network turns on during the next cycle, it faces contention only from the keeper transistor MP2 and not from the turned-off MP1. The situation is depicted in Figure 7(d). Thus, the keeper ratio, as in a conventional domino gate, affects only the low-to-high delay of the gate and the noise robustness.
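The conditional behavior described above can be sketched as a cycle-level behavioral model. This is a simplification under stated assumptions, not the circuit itself: DYN1 is taken to be low at the end of a cycle exactly when the pull-down network conducted in that cycle, DYN1 is assumed to start high, and the function name is hypothetical.

```python
def pulse_counts(pulldown_on):
    """Count pull-up pulses over a sequence of evaluation cycles.

    pulldown_on[n] is True when the pull-down network conducts in cycle n.
    Returns (sp_domino_pulses, sspd_pulses).
    """
    # SP-Domino: the pulse generator fires unconditionally every cycle.
    sp_domino_pulses = len(pulldown_on)

    # SSPD: a pulse is generated only if DYN1 was discharged in the
    # previous cycle (assumed initial state: DYN1 high, so no pulse).
    sspd_pulses = 0
    dyn1_low_prev = False
    for on in pulldown_on:
        if dyn1_low_prev:
            sspd_pulses += 1
        dyn1_low_prev = on  # DYN1 ends the cycle low iff the network conducted
    return sp_domino_pulses, sspd_pulses


# Hypothetical input pattern over five cycles.
print(pulse_counts([False, True, True, False, False]))
```

In this model the SSPD pulse count tracks the number of cycles that follow a discharged DYN1, which is why the pulse-generator activity becomes output-state dependent rather than unconditional.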

Consider the case when DYN1 is evaluated to the low state during a clock cycle and then pulled high by MP1 during the next cycle. The situation is depicted in Figure 7(b). Since MP2 and the nMOS evaluation network are off, the speed of pull-up is determined only by the size of MP1 (assuming MN1, like the evaluation transistors, is fixed-sized). Thus, the gate's fall delay can be independently tuned by modifying only the width of MP1. The action of CLKd and CLKi also extends the pulse width to nearly the on-period of the clock during a pull-up operation. This is made possible by turning on Path 2 and turning off Path 1 in the gate G1 of the pulse generator. The extended pulse width further relaxes the design constraint on MP1.



**Figure 7: Charging and contention currents at the two dynamic nodes - DYN1 and DYN2 of an SSPD gate under four different cases. 1 and 0 represent the logic high and low respectively.**



**Figure 8: Delay and UNG-tunable performance of SSPD Domino. Simulation results are of a 16-bit dynamic multiplexer gate with 1FO4 load.** 

The design of the SSPD can thus be accomplished in two simple steps. In the first step, to meet a particular noise target and delay performance, MP2 is sized to achieve a particular keeper ratio. In the second step, MP1 is sized to equalize the gate's high-to-low delay with the low-to-high delay (determined by K). Note that the two steps are independent and afford the designer the flexibility of designing for a wide set of specifications. Figure 8 shows the delay and Unity Noise Gain (UNG) variation of a 16-bit SSPD multiplexer for keeper ratios between 0.1 and 1, designed in the manner discussed above. For UNG measurement, determined as



**Figure 9: Variation of average power with $P_{out}(1)$ of 16-bit dynamic multiplexers for (a) 2FO4 inverter load (b) 3FO4 inverter load (c) Different K (keeper ratio) values in the SSPD gate. $P_{out}(1)$ represents the probability of the output being in the logic high state.**

in [7] at the worst-case leakage corner (fast NMOS, slow PMOS at 110°C), the noise pulse width is fixed at the approximate rise/fall time of a fanout-of-4 inverter (FO4) in 90-nm technology (≈50 ps). A minor design consideration is to ensure that MP1 is not sized too large, as a large MP1 increases the contention power consumed by the gate (Figure 7(a)) and increases the disturbance at the dynamic nodes during the period of contention (Figure 6). However, even for a K value of 0.1 (when the size of MP1 is the largest) and a pulse width of around 15% of the total clock period, the maximum disturbance on the dynamic nodes DYN1 and DYN2 was only about 0.2 V, well below the switching point of the output inverter. The only major design constraint of the SSPD scheme, then, is the overhead involved in maintaining a clock-delayed operation under PVT variations.

The power equation of an SSPD gate is given by

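$$
\begin{aligned}
P_{\text{Total}} &= P_{\text{CAPACITIVE},GATE} + P_{\text{CAPACITIVE},PG} + P_{\text{SC}} + P_{\text{CLK}} \\
&= \tfrac{1}{2}\,\alpha\,(C_{\text{DYN1}} + C_{\text{SWITCHING}})\,V_{\text{DD}}^2 f_{\text{clk}} \\
&\quad + \tfrac{1}{2}\,\alpha\,C_{\text{DYN2}}\,(V_{\text{DD}} - V_{\text{TN}})^2 f_{\text{clk}} \\
&\quad + P_{\text{OUT}}(1)\,C_{\text{SWITCHING},PG}\,V_{\text{DD}}^2 f_{\text{clk}} + P_{\text{OUT}}(1)\,I_{\text{SC}}\,V_{\text{DD}} f_{\text{clk}} + P_{\text{CLK}}
\end{aligned}
\tag{2}
$$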

Assuming $V_{DD} - V_{TN} \approx 0.75\,V_{DD}$, so that $(V_{DD} - V_{TN})^2 \approx 0.56\,V_{DD}^2 \approx V_{DD}^2/2$, (2) reduces to

$$
\begin{aligned}
P_{\text{Total}} &\approx \tfrac{1}{2}\,\alpha\left(C_{\text{DYN1}} + \frac{C_{\text{DYN2}}}{2} + C_{\text{SWITCHING}}\right)V_{\text{DD}}^2 f_{\text{clk}} \\
&\quad + P_{\text{OUT}}(1)\,C_{\text{SWITCHING},PG}\,V_{\text{DD}}^2 f_{\text{clk}} + P_{\text{OUT}}(1)\,I_{\text{SC}}\,V_{\text{DD}} f_{\text{clk}} + P_{\text{CLK}}
\end{aligned}
\tag{3}
$$

Comparing (3) with (1), we see that SSPD's capacitive power differs from that of SP-Domino in that the capacitive contribution from the drain terminal of the evaluation network is reduced to roughly half due to the smaller voltage swing (between $V_{DD} - V_{TN}$ and 0). Additionally, the contribution from the pulse generator to the overall power consumption is now output-state dependent. We expect these two factors to help offset the increase in power consumption of an SSPD gate due to the overhead of generating two additional local clock phases, CLKd and CLKi.
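The step from Eq. (2) to Eq. (3) can be checked numerically with a small sketch. The parameter values below are hypothetical, the function names are not from the paper, and, as an assumption of this sketch, `i_sc` is treated as short-circuit charge per cycle so that the transcribed term has units of power.

```python
def dyn2_swing_factor(vdd, vtn):
    """Ratio of the DYN2 switching energy to a full-swing node: ((VDD - VTN) / VDD)^2."""
    return ((vdd - vtn) / vdd) ** 2


def sspd_power(alpha, p_out1, c_dyn1, c_dyn2, c_sw, c_sw_pg, i_sc,
               vdd, vtn, f_clk, p_clk=0.0):
    """Evaluate Eq. (2) for an SSPD gate, term by term."""
    gate = 0.5 * alpha * (c_dyn1 + c_sw) * vdd ** 2 * f_clk
    isolated = 0.5 * alpha * c_dyn2 * (vdd - vtn) ** 2 * f_clk  # reduced swing on DYN2
    pulse_gen = p_out1 * c_sw_pg * vdd ** 2 * f_clk             # output-state dependent
    short_circuit = p_out1 * i_sc * vdd * f_clk
    return gate + isolated + pulse_gen + short_circuit + p_clk


# With VDD - VTN = 0.75 * VDD, the DYN2 term is weighted by about 0.56,
# which Eq. (3) rounds to one half.
print(dyn2_swing_factor(vdd=1.2, vtn=0.3))

# Hypothetical capacitances and activity, chosen only to exercise the formula.
print(sspd_power(alpha=0.4, p_out1=0.8, c_dyn1=5e-15, c_dyn2=20e-15, c_sw=10e-15,
                 c_sw_pg=6e-15, i_sc=2e-15, vdd=1.2, vtn=0.3, f_clk=1e9))
```

The exact factor of about 0.56 is slightly larger than the one half used in Eq. (3), so the approximation mildly understates the DYN2 contribution.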

#### **4. SIMULATION RESULTS**

16-input footless dynamic multiplexers are simulated in a 1.2 V 90-nm industrial CMOS process. The average power consumption for different output state probabilities of an SP-Domino gate, optimized for equal rise and fall delays, is compared with that of a conventional footless domino gate (under equal-UNG conditions) and the proposed SSPD gate (under equal-delay and equal-UNG conditions). To account for the overhead of clocked transistors, the power consumption of the local clock buffer is included in the power measurements while that of the input buffer is excluded. The evaluation transistors of the pull-down network are sized equally for all three designs. Simulations are done by varying the output state probability ($P_{out}(1)$) between 0.1 and 1. For each value of $P_{out}(1)$, the maximum possible value of the input switching factor (which is equal to the output switching factor $\alpha$ for SP-Domino and SSPD) is chosen so as to have the maximum power dissipation. As an example, for $P_{out}(1)$ equal to 0.5, when $\alpha$ can assume a value of either 0.2 or 1, the input is varied to have an $\alpha$ value of 1. Similarly, $\alpha$ is 0.2 for $P_{out}(1)$ equal to 0.1 and 0.9, 0.4 for $P_{out}(1)$ equal to 0.2 and 0.8, and so on.
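The $\alpha$ values listed above (0.2 at 0.1 and 0.9, 0.4 at 0.2 and 0.8, 1 at 0.5) are consistent with the closed form $\alpha_{max} = 2\min(P_{out}(1),\, 1 - P_{out}(1))$, since every pair of toggles needs one low cycle and one high cycle. This closed form is inferred here from the quoted values and is not stated in the text; the function name is hypothetical.

```python
def max_switching_factor(p_out1):
    """Largest output switching factor achievable for a given P_out(1).

    Assumed closed form (inferred from the values quoted in the text):
    alpha_max = 2 * min(P_out(1), 1 - P_out(1)).
    """
    if not 0.0 <= p_out1 <= 1.0:
        raise ValueError("p_out1 must be a probability in [0, 1]")
    return 2.0 * min(p_out1, 1.0 - p_out1)


# Sweep used in the simulations: P_out(1) from 0.1 to 1.0 in steps of 0.1.
for step in range(1, 11):
    p = step / 10
    print(f"P_out(1) = {p:.1f}  alpha_max = {max_switching_factor(p):.1f}")
```

Under this form, $\alpha_{max}$ exceeds $P_{out}(1)$ for $P_{out}(1) < 0.5$, peaks at 1, and falls to 0 as $P_{out}(1)$ approaches 1.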

Power measurements with 2FO4 and 3FO4 output loads are shown in Figure 9(a) and (b). When $P_{out}(1)$ is less than 0.5, the SSPD gate has a power consumption similar to that of the same-UNG conventional gate. However, the power advantage due to the static-switching behavior becomes apparent for output state probabilities greater than 0.5. For equal noise robustness and $P_{out}(1)$ greater than 0.5, the SSPD gate offers an 18-35% power reduction for a 2FO4 load and around 20-44% power reduction for a 3FO4 load when compared to the conventional domino gate. This is because $\alpha$ is greater than $P_{out}(1)$ when $P_{out}(1)$ is less than 0.5 but starts decreasing for larger values of $P_{out}(1)$. Since the capacitive power consumption of an SSPD gate is dependent on $\alpha$ (see (3)), this leads to a power reduction as well. Also notice that although the $\alpha$ values are the same for $P_{out}(1)$ equal to 0.2 and 0.8, the power demand in the latter case is higher due to the larger power consumption of the pulse generator and the contention currents, which are output-state dependent.

Due to the reduced activity at the output, the power advantage also increases with larger output loads. In Figure 9(c), the variation of average power with different keeper ratios for the SSPD gate is shown. With increasing K, the size of and the

| Threshold variation (σ/μ) | Metric | Conventional (Iso-UNG) | SP-Domino | SSPD (Iso-delay) | SSPD (Iso-UNG) |
|---------------------------|--------|------------------------|-----------|------------------|----------------|
| 1% | $T_{delay}$ (ps), μ (σ/μ) | 142 (0.8%) | 140.6 (0.5%) | 140.6 (0.6%) | 171 (0.7%) |
| | Power (µW) | 46.4 | 49.6 | 43.7 | 51.7 |
| 5% | $T_{delay}$ (ps), μ (σ/μ) | 144 (3.2%) | 141.1 (1.8%) | 140.8 (1.6%) | 171.5 (2.2%) |
| | Power (µW) | 46.6 | 49.6 | 43.6 | 51.8 |
| 10% | $T_{delay}$ (ps), μ (σ/μ) | 145 (6.2%) | 141.8 (3.8%) | 141.2 (3.4%) | 172.7 (4.4%) |
| | Power (µW) | 46.9 | 49.7 | 43.7 | 52 |

**Table 1: Comparison of delay distribution and power for conventional, SP-Domino and SSPD at**  $P_{out}(1) = 0.5$ **. T<sub>delay</sub> is the average of the rise and the fall delays.** 

contention current due to MP2 (Figure 7(d)) increases while the size of and contention due to MP1 (Figure 7(a)) decreases. For lower output state probabilities, contention due to MP2 is dominant and hence the average power increases with increasing K. However, for higher output state probabilities, contention due to MP1 becomes more frequent and therefore the power follows a decreasing trend with increasing K values.

The power performance of SP-Domino is marginally better (~5-12%) than that of the SSPD gate. This can be explained by the use of a simpler pulse generator, which contributes around 21% of the total gate power at the highest value of $P_{out}(1)$. The same value for the SSPD gate is closer to 25%. Therefore, the simulation results show that while both SP-Domino and SSPD techniques offer significant power reductions for biased output states ($P_{out}(1) > 0.5$, which is usually the case for high fan-in gates), the SSPD gate has the important advantage of being easily modified for a particular delay or noise performance.

The three designs are also analyzed for process variations by performing 500-point Monte Carlo simulations with the standard deviation of threshold variations set to 1%, 5% and 10% of the mean value. The average delay and its variation, and the average power values, are shown in Table 1. The variation in power is found to be negligible and is omitted. Since both the SSPD and SP-Domino gates were designed to have a sufficiently wide pulse width to account for variations, the delay spread of both techniques is similar to that of the conventional scheme and the pulse generator is found not to increase the performance variability.

## **5. CONCLUSION**

We propose a static-switching pulse domino technique that utilizes a conditional pulse generator and an isolation transistor to remove the inflexibility of an SP-Domino gate while retaining the power advantages from having a static-switching behavior. The proposed circuit can easily be designed to meet a wide range of delay and noise specifications. Further, for biased output states, we observe as much as 44% reduction in power at equal noise robustness of 16-bit dynamic multiplexers in 90-nm CMOS technology.

## **6. ACKNOWLEDGEMENT**

This work was supported in part by Samsung Electronics. The authors gratefully acknowledge the support of the IC Design Education Center (IDEC), by the provision of CAD tools, and the support of the Inter-university Semiconductor Research Center (ISRC).

## **7. REFERENCES**

- [1] K. L. Shepard and V. Narayanan, "Noise in deep submicron digital design," in *Proc. IEEE Int. ASIC/SOC Conf.*, 1996, pp. 524-531.
- [2] K. Bernstein, K. M. Carrig, C. M. Durham, P. R. Hansen, D. Hogenmiller, E. J. Nowak, and N. J. Rohrer, *High Speed CMOS Design Styles*, Kluwer Academic Publishers, 1999.
- [3] C. J. Akl and M. A. Bayoumi, "Single-Phase SP-domino: a limited-switching dynamic circuit technique for low-power wide fan-in logic gates," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 55, no. 2, pp. 141-145, Feb. 2008.
- [4] R. Montoye, *et al.*, "A double precision floating point multiply," in *Proc. IEEE Int. Solid-State Circuits Conf.*, 2003, pp. 336-337.
- [5] J. Sivagnaname, H. C. Ngo, K. J. Nowka, R. K. Montoye, and R. B. Brown, "Wide limited switch dynamic logic implementations," in *Proc. IEEE Int. Conf. on VLSI Design*, 2006.
- [6] G. Yee and C. Sechen, "Clock-delayed domino for dynamic circuit design," *IEEE Trans. Very Large-Scale Integr. (VLSI) Syst.*, vol. 8, no. 4, pp. 425-430, Aug. 2000.
- [7] H. Mahmoodi-Meimand and K. Roy, "Diode-footed domino: a leakage-tolerant high fan-in dynamic circuit design," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 51, no. 3, pp. 495-503, Mar. 2004.