# A True Single-Phase Energy-Recovery Multiplier

Suhwan Kim, *Member, IEEE*, Conrad H. Ziesler, *Student Member, IEEE*, and Marios C. Papaefthymiou, *Senior Member, IEEE* 

Abstract—In this paper, we present the design and experimental evaluation of an 8-bit energy-recovery multiplier with built-in self-test logic and an internal single-phase sinusoidal power-clock generator. Both the multiplier and the built-in self-test have been designed in SCAL-D, a true single-phase adiabatic logic family. Fabricated in a 0.5-$\mu$m standard n-well CMOS process, the chip has an active area of 0.47 mm<sup>2</sup>. Correct chip operation has been verified for clock rates up to 140 MHz. Moreover, chip dissipation measurements correlate well with HSPICE simulation results. For a selection of biasing conditions that yield correct operation at 140 MHz, total measured average dissipation for the multiplier and the power-clock generator is 250 pJ per operation.

*Index Terms*—Adiabatic design, arithmetic circuits, charge recovery, dynamic circuits, LC tank circuits, low-energy circuits, low-power circuits, very large scale integration (VLSI) design.

#### I. INTRODUCTION

Energy-recovering (a.k.a. adiabatic) logic presents a promising alternative to conventional CMOS for the realization of low-energy electronics. In adiabatic circuits, energy dissipation is kept low by maintaining low voltage drops across conducting devices. Moreover, undissipated energy related to charges stored in parasitic capacitors is recycled through an inductor or network of switched capacitors [1]–[3]. Thus, adiabatic circuitry can potentially achieve sub-$CV^2$ energy dissipation per cycle, where $C$ is the total switched capacitance and $V$ is the peak supply voltage.

A plethora of adiabatic circuit topologies has been proposed over the past decade. Most of these circuits have relied on multiple-phase power-clocks to steer currents and recycle charges [4]–[9]. Thus, they are not attractive for high-speed design, due to their relatively complex control requirements, further exacerbated by the data-dependent fluctuations of the power-clock load [10]. In contrast to multiphase circuits, true single-phase adiabatic circuits rely on just one phase of a power-clock waveform for power and synchronization [11]–[13]. Thanks to their simple clocking requirements, single-phase circuits enjoy minimal control overheads and are thus capable of operating at high speeds, while achieving high energy efficiency.

Manuscript received November 13, 2001; revised August 3, 2002. This work was supported in part by the Army Research Office under Grant DAAD19-99-1-0304 and by an AASERT Grant DAAG55-97-1-0250.

S. Kim was with the Advanced Computer Architecture Laboratory, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109 USA. He is now with Low Power Circuits and Technology, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA (e-mail: suhwan@us.ibm.com).

C. H. Ziesler and M. C. Papaefthymiou are with the Advanced Computer Architecture Laboratory, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109 USA (e-mail: cziesler@eecs.umich.edu; marios@eecs.umich.edu).

Digital Object Identifier 10.1109/TVLSI.2003.810795

We recently embarked on the design of an 8-bit adiabatic multiplier using SCAL-D, a source-coupled adiabatic logic family that operates with a single-phase power-clock [14], [15]. In view of previous adiabatic multiplier designs that used multiple power-clocks and achieved rather modest operating speeds [16]–[18], the main objective of our research project was to demonstrate the feasibility of designing adiabatic circuitry that functions correctly and efficiently at relatively high power-clock frequencies (100 MHz and above). To enable at-speed testing and the measurement of power dissipation without the complications and interference caused by chip I/O, our multiplier included built-in self-test (BIST) circuitry that was also designed in SCAL-D. Moreover, to decouple the testing of the multiplier core from the power-clock generation circuitry, our chip was designed so that the multiplier could be driven using either an external or an internal power-clock generator.

Our multiplier chip was fabricated in a 0.5-$\mu$m standard n-well CMOS process through MOSIS. The correct operation of our chip was validated in the lab with a 3 V power supply for clock rates up to 140 MHz. The power supply level was not limited by SCAL-D. It was primarily dictated by the peripheral circuits, including adiabatic-to-digital converters and output pad drivers, and our experimental setup, including the test board and oscilloscope probes. In general, the measured trend of dissipation correlated well with HSPICE simulations for identical biasing conditions. For example, under a set of conditions that ensured correct operation at 140 MHz, the total measured dissipation of our multiplier, including its internal power-clock generator, was 250 pJ per operation, approximately 20% lower than HSPICE-based results.

This paper describes the design and experimental evaluation of our adiabatic multiplier. We discuss in detail the circuit, layout, and power-clock distribution issues of our adiabatic chip and internal power-clock generator. We also discuss the design and verification methodologies developed to facilitate the use of SCAL-D in conjunction with the layout tool MAGIC, the circuit simulator HSPICE, and structural Verilog-HDL descriptions.

In addition to power measurements, our paper includes a simulation-based comparison of our multiplier with several static CMOS multipliers synthesized for low power. In HSPICE simulations with post-layout extracted parasitics, our multiplier functions correctly at clock rates exceeding 200 MHz, while dissipating up to four times less energy than its CMOS counterparts. Both our experimental measurements and our simulation results suggest that SCAL-D is a practical circuit topology for high-performance, low-energy adiabatic systems.

The remainder of this paper has seven sections. In Section II, we describe the structure and operation of SCAL-D. In Section III, we present simple analytic loss models for SCAL-D. The design and implementation of the 8-bit SCAL-D multiplier along with its BIST logic and internal power-clock generator are described in Section IV. Section V provides an overview of the design methodologies we developed for SCAL-D. Section VI presents a comparison of our adiabatic multiplier with comparable static CMOS designs. Section VII provides results from the testing and experimental evaluation of our fabricated chip. Our contributions are summarized in Section VIII.

Fig. 1. (a) A pMOS and (b) an nMOS buffer in SCAL-D.

#### II. SOURCE-COUPLED ADIABATIC LOGIC

This section describes SCAL-D, our source-coupled adiabatic logic with diode-connected transistors. The structure and operation of SCAL-D are similar to those of SCAL [12]. SCAL-D retains all the positive features of SCAL [13], including correct operation with a single-phase power-clock. Furthermore, it achieves energy-efficient operation across a broad range of operating frequencies by using a pair of cross-coupled transistors and an individually tunable current source at each gate.

#### A. SCAL-D Gates

The basic structure of a SCAL-D pMOS gate is shown in Fig. 1(a). This pMOS buffer comprises a pair of cross-coupled transistors (MP1 and MP2), a pair of diode-connected transistors (DN1 and DN2), a pair of current control switches (MP3 and MP4), two evaluation trees (MP5 and MP6), and a current source (MP7 and MP8) biased by a constant voltage  $V_{\rm BP}$  and a sinusoidal power-clock  $V_{\rm PC}$ . A constant supply voltage  $V_{\rm DD}$  is required to impart an initial charge to one of the output nodes (of and ot) through the evaluation tree. The quantity of charge transferred from  $V_{\rm dd}$  is controlled by the W/L ratio of the cascoded transistors of the current source (MP7 and MP8) and the bias voltage  $V_{\rm BP}$ .

The basic structure of a SCAL-D nMOS gate is shown in Fig. 1(b). It has the same structure as the pMOS gate, but its current source is tied to  $V_{SS}$ . The biasing voltage  $V_{BN}$  may differ from  $V_{BP}$ .

## B. SCAL-D Operation

In this section, we highlight the operation of SCAL-D pMOS gates and point out how it differs from that of SCAL. The operation of SCAL-D nMOS gates is similar to that of their pMOS counterparts.

Each SCAL-D pMOS gate goes through an *evaluate* and a *discharge* phase. During *discharge*, the energy stored in the node of or ot is recovered through the pair of cross-coupled transistors (MP1 or MP2) and the pair of diode-connected transistors (DN1 and DN2). In this phase,  $V_{PC}$  starts from *high* and ramps down toward *low*, pulling both of and ot down toward the predischarge voltage. This state change tracks the power-clock and, thus, recovers charge from the output nodes. It is not fully adiabatic, however, since some of the charge is recovered through a device with a constant voltage drop.

A new output is computed during *evaluate*. In the beginning of this phase, a dissipative current is directed by the evaluation tree, creating a small voltage imbalance between the cross-coupled transistors. Even though all the gates of the same type have the same biasing voltage, the amount of this current is individually controlled by the W/L ratios of the current source transistors. Thus, the dissipation of each gate can be minimized under any output load or operating frequency. After the nonadiabatic evaluation, the current source is turned off, and the load is driven adiabatically by the sense amplifier with the rising of  $V_{PC}$ .

The operation of SCAL-D is similar to that of SCAL. The main difference between the operation of the two gates is in the method that output loads are discharged and in the flexibility provided for tuning the current source. In SCAL-D, the discharge time is shortened by adding the diode-connected transistors to provide additional current. Moreover, the magnitude of the current in the beginning of *evaluate* can be controlled by adjusting the W/L ratios of MP7 and MP8 in each gate and by tuning the dc biasing voltage  $V_{\rm BP}$  that is shared by all pMOS gates. For a given operating frequency, the duration of the nonadiabatic stage during *evaluate* can be controlled by adjusting the biasing voltage  $V_{\rm BP}$ .

The operation of the SCAL-D nMOS gate is similar to that of the pMOS gate. Instead of *discharge*, the first phase in the operation of the nMOS gate is *charge*. The second phase is still *evaluate*.

Fig. 2. Four-stage PNPN cascaded pipeline in SCAL-D.

## C. SCAL-D Cascades

To build SCAL-D cascades, pMOS and nMOS gates are chained alternately. The only signal required to control a SCAL-D cascade is a single phase of the power-clock $V_{PC}$. The speed and energy efficiency of a SCAL-D cascade can be tuned by sizing the current source in each gate and choosing the two dc biasing voltages. Since individual gates can be tuned independently, efficient operation can be achieved for a broad range of operating frequencies.

A four-stage PNPN cascade structure of SCAL-D buffers is shown in Fig. 2. This figure also shows the power-clock $V_{PC}$ and the two dc biasing voltages $V_{\rm BP}$ and $V_{\rm BN}$ that are required to control the SCAL-D cascade. At any time during the operation of the circuit, either all pMOS gates evaluate and all nMOS gates charge, or all pMOS gates discharge and all nMOS gates evaluate. The brief time interval between evaluate/discharge or evaluate/charge during which the outputs of a gate are stable is called the *hold* phase in that gate's operation. While the current switches in the odd stages are off, their outputs are stable. At the same time, the function blocks of the even stages are connected to $V_{DD}$ (or $V_{SS}$) through the current sources and can safely evaluate their outputs. After half a cycle, while the current switches of the even stages are off, the inputs of the odd stages are stable, and their function blocks are connected to $V_{SS}$ (or $V_{DD}$) through their current sources.
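The alternation of phases along the cascade can be illustrated with a small Python sketch. It is only a bookkeeping model, not a circuit simulation: the function name `cascade_phases`, the half-cycle index, and the choice of which state corresponds to an even index are assumptions made for illustration. The phase names follow the text.

```python
def cascade_phases(stage_types, half_cycle):
    """Return the phase of every stage during one half-cycle of the power-clock.

    stage_types: string such as "PNPN" (P = pMOS gate, N = nMOS gate).
    half_cycle:  integer index; even and odd values select the two alternating
                 states of the cascade (which one comes first is an assumption).
    """
    if half_cycle % 2 == 0:
        # All pMOS gates evaluate while all nMOS gates charge.
        phase = {"P": "evaluate", "N": "charge"}
    else:
        # All pMOS gates discharge while all nMOS gates evaluate.
        phase = {"P": "discharge", "N": "evaluate"}
    return [phase[t] for t in stage_types]


for k in range(4):
    print(k, cascade_phases("PNPN", k))
```

In this model, adjacent stages never evaluate in the same half-cycle, which is the property that lets a single power-clock phase control the whole cascade.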

#### III. POWER-LOSS MODELS

In this section, we present analytic power-loss models for the power-clock distribution wire, the diode-connected transistors, the cross-coupled transistors, and the nonadiabatic function evaluation tree in SCAL-D. Each SCAL-D nMOS or pMOS gate precharges or discharges its output nodes, evaluates its function evaluation tree, and drives its outputs within $2\pi$ radians, with the power-clock source given by $v_{\rm PC}(\theta) = V_A \cdot \sin(\theta)$, where $V_A = V_{dd}/2$ and $\theta = \omega \cdot t$. For the power-loss models of a SCAL-D nMOS gate, we assume that the precharge of output nodes through diode-connected transistors occurs between $-\pi/2$ and $\pi/2$, the nonadiabatic charge transfer through the current source occurs between $\pi/3$ and $2\pi/3$, and the cross-coupled transistors boost the voltage difference between the two output nodes between $\pi/2$ and $3\pi/2$. The SCAL-D pMOS gates operate $\pi$ radians out of phase with the nMOS gates.

#### A. Energy Dissipation of Power-Clock Distribution Wire

When modeling a power-clock distribution wire, as shown in Fig. 3(a), the total resistance is lumped into a single resistor  $R_{\text{wire}}$ , and the global capacitance is combined into a single capacitor  $C_{\text{load}}$ . The current through a power-clock distribution wire  $i_{\text{wire}}(\theta)$  is defined as  $i_{\text{wire}}(\theta) = \omega \cdot C_{\text{load}} \cdot V_{\text{A}} \cdot \cos(\theta)$ , when  $-\pi/2 \leq \theta < 3\pi/2$ . Then, the energy consumption per cycle due to this wire  $E_{\text{wire}}$  is given by

$$E_{\text{wire}} = \pi \cdot R_{\text{wire}} \cdot \omega \cdot C_{\text{load}}^2 \cdot V_{\text{A}}^2. \tag{1}$$
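As a sanity check on (1), the following Python sketch integrates $i_{\text{wire}}^2 \cdot R_{\text{wire}}$ numerically over one cycle, using $dt = d\theta/\omega$, and compares the result with the closed form. The component values are hypothetical placeholders chosen only to exercise the formula; they are not measurements from the chip.

```python
import math


def wire_energy_closed_form(r_wire, c_load, v_a, freq_hz):
    """Per-cycle wire loss according to Eq. (1)."""
    omega = 2.0 * math.pi * freq_hz
    return math.pi * r_wire * omega * c_load**2 * v_a**2


def wire_energy_numeric(r_wire, c_load, v_a, freq_hz, steps=10_000):
    """Midpoint-rule integral of i_wire(theta)^2 * R_wire over -pi/2 <= theta < 3*pi/2."""
    omega = 2.0 * math.pi * freq_hz
    d_theta = 2.0 * math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = -math.pi / 2.0 + (k + 0.5) * d_theta
        i_wire = omega * c_load * v_a * math.cos(theta)
        total += i_wire**2 * r_wire * d_theta / omega  # dt = d_theta / omega
    return total


# Hypothetical values (assumptions, not taken from the paper).
R_WIRE = 2.0       # ohm
C_LOAD = 50e-12    # farad
V_A = 1.5          # volt
FREQ = 140e6       # hertz

print(wire_energy_closed_form(R_WIRE, C_LOAD, V_A, FREQ))
print(wire_energy_numeric(R_WIRE, C_LOAD, V_A, FREQ))
```

The two values agree because the integral of $\cos^2(\theta)$ over a full period is $\pi$.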

#### B. Energy Dissipation of Diode-Connected Transistors

The diode-connected transistors can be replaced by ideal diodes, considered to be switches in series with a diode forward voltage  $V_{\text{diode}}$  as shown in Fig. 3(b). The output node capacitances  $C_{xt}$  and  $C_{xf}$  are precharged through the forward biased diode and represent the sum of all diffusion, junction, wiring, and fanout capacitances connected to the output nodes xt and xf, respectively.

If we assume that  $V_{xt} = V_A - V_{diode}$  and  $V_{xf} = -V_A$ , then the diode currents  $i_{xt}(\theta)$  and  $i_{xf}(\theta)$  are defined as  $i_{xt}(\theta) = 0$  and  $i_{xf}(\theta) = \omega \cdot C_{xf} \cdot V_A \cdot \cos(\theta)$ , when  $-\pi/2 \leq \theta < \sin^{-1}((V_A - V_{diode})/V_A)$ . Then, the energy consumption per cycle of the diode-connected transistors,  $E_{diode}$ , is given by the expression

$$E_{\text{diode}} = (2 \cdot V_{\text{A}} - V_{\text{diode}}) \cdot V_{\text{diode}} \cdot C_{\text{xt}}. \tag{2}$$
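The form of (2) can be recovered with a short worked integration. The sketch below writes $C_{\text{x}}$ for the capacitance of the output node that is precharged through the diode (a notational assumption made here) and assumes that the full diode current flows across the constant drop $V_{\text{diode}}$ over the stated interval:

$$E_{\text{diode}} = \int V_{\text{diode}} \cdot i(\theta) \, \frac{d\theta}{\omega} = V_{\text{diode}} \cdot C_{\text{x}} \cdot V_{\text{A}} \int_{-\pi/2}^{\theta_1} \cos(\theta) \, d\theta, \qquad \theta_1 = \sin^{-1}\!\left(\frac{V_{\text{A}} - V_{\text{diode}}}{V_{\text{A}}}\right)$$

$$\int_{-\pi/2}^{\theta_1} \cos(\theta) \, d\theta = \sin(\theta_1) + 1 = \frac{2 \cdot V_{\text{A}} - V_{\text{diode}}}{V_{\text{A}}} \quad\Longrightarrow\quad E_{\text{diode}} = (2 \cdot V_{\text{A}} - V_{\text{diode}}) \cdot V_{\text{diode}} \cdot C_{\text{x}}$$

Equivalently, the node swings by $2 \cdot V_{\text{A}} - V_{\text{diode}}$, so a charge of $C_{\text{x}} \cdot (2 \cdot V_{\text{A}} - V_{\text{diode}})$ passes through the drop $V_{\text{diode}}$. With $C_{\text{x}}$ identified with the capacitance appearing in (2), this reproduces that expression.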

#### C. Energy Dissipation of Non-Adiabatic Function Evaluation Tree

Fig. 3. Loss models of (a) power-clock distribution wire, (b) diode-connected transistors, (c) sense amplifier structure, and (d) functional evaluation tree.

Nonadiabatic operation occurs in the beginning of each gate's function evaluation phase. During this phase, charge stored in the output nodes xt and xf is discharged through the *true* and *false* branches of the gate's function evaluation tree, as shown in Fig. 3(c). The result of this operation is to create a voltage difference $\Delta V = Q_{xt}/C_{xt} - Q_{xf}/C_{xf}$ between the two output nodes xt and xf, where $Q_{xt}$ and $Q_{xf}$ denote the charge transferred from the output nodes xt and xf. If we assume that $Q_{xt}$ and $Q_{xf}$ are transferred from the output nodes at their precharge voltage $(2 \cdot V_A - V_{diode})$, then the approximate loss per cycle during the nonadiabatic function evaluation phase is given by

$$E_{\text{eval}} \simeq (Q_{\text{xt}} + Q_{\text{xf}})(2 \cdot V_{\text{A}} - V_{\text{diode}}). \tag{3}$$

#### D. Energy Dissipation of Sense Amplifier Structure

The sense amplifier structure, composed of a pair of cross-coupled transistors, is activated by the rising (pMOS) or falling (nMOS) edge of the power-clock and is sensitized by the voltage difference left from the nonadiabatic evaluation phase. In the beginning of the sense amplifier's activation, a small amount of current passes through both sides of the amplifier. As activation proceeds, the current through one side increases, whereas current through the other side drops to near zero. Assuming that the transistors used in the sense amplifier structure are in the linear region for most of the activation, they can be modeled as ideal switches in series with a resistor $R_{\text{sense}}$, as shown in Fig. 3(d). These equivalent resistors can be chosen so that the total energy dissipation of the model matches that of the actual circuit. Their values can be approximated by the average $1/G_m$ of the device over the region $-V_{A} < V_{gs} < V_{A} - V_{diode}$.



Fig. 4. Block diagram of 8-bit SCAL-D multiplier with BIST logic and internal power-clock generator.

The current through the sense amplifier structure $i_{\text{sense}}(\theta)$ is approximated by $i_{\text{sense}}(\theta) = \omega \cdot C_{\text{xt}} \cdot V_{\text{A}} \cdot \cos(\theta)$, when $\pi - \sin^{-1}((V_{\text{A}} - V_{\text{diode}})/V_{\text{A}}) \le \theta < 3\pi/2$. Then, the per-cycle energy consumption of the sense amplifier structure $E_{sense}$ is approximated by

$$E_{\text{sense}} \simeq \pi/2 \cdot R_{\text{sense}} \cdot \omega \cdot V_{\text{A}}^2 \cdot C_{\text{xt}}^2. \tag{4}$$
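The factor $\pi/2$ in (4) can be traced with a brief worked integration of the resistive loss, using $dt = d\theta/\omega$. The limiting step below assumes $V_{\text{diode}} \ll V_{\text{A}}$, which is an assumption introduced here to explain the approximation:

$$E_{\text{sense}} = \int R_{\text{sense}} \cdot i_{\text{sense}}^2(\theta) \, \frac{d\theta}{\omega} = R_{\text{sense}} \cdot \omega \cdot C_{\text{xt}}^2 \cdot V_{\text{A}}^2 \int_{\theta_0}^{3\pi/2} \cos^2(\theta) \, d\theta, \qquad \theta_0 = \pi - \sin^{-1}\!\left(\frac{V_{\text{A}} - V_{\text{diode}}}{V_{\text{A}}}\right)$$

$$V_{\text{diode}} \ll V_{\text{A}} \;\Longrightarrow\; \theta_0 \to \frac{\pi}{2}, \qquad \int_{\pi/2}^{3\pi/2} \cos^2(\theta) \, d\theta = \frac{\pi}{2}$$

For a nonzero $V_{\text{diode}}$ the lower limit moves slightly above $\pi/2$, so the exact integral is somewhat smaller than $\pi/2$.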

#### E. Total Energy Dissipation

The total energy consumption per cycle is given by the sum of (1) through (4) for $2\pi$ radians of the power-clock. The losses of the power-clock distribution wire and the sense amplifier structure are proportional to the operating frequency. The loss of the diode-connected transistors is proportional to $V_{diode}$.

Low energy consumption is achieved by careful transistor sizing with the aim to balance the various dissipation components while providing for correct functionality at an efficient operating condition (frequency, supply voltage, bias voltage). The primary tradeoff is between the energy dissipated through the nonadiabatic function evaluation tree and the energy dissipated in the sense amplifier structure. This tradeoff is affected both by transistor sizing and choice of bias voltages.
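The four loss components can be collected into a small Python model to see how each one scales with frequency. This is a minimal sketch under stated assumptions: every numeric value below is a hypothetical placeholder, and a single gate's terms are summed with a single wire term purely for illustration, whereas a real estimate would sum the gate terms over all gates.

```python
import math


def e_wire(r_wire, c_load, v_a, omega):
    """Eq. (1): per-cycle loss in the power-clock distribution wire."""
    return math.pi * r_wire * omega * c_load**2 * v_a**2


def e_diode(c_x, v_a, v_diode):
    """Eq. (2): per-cycle loss in the diode-connected transistors."""
    return (2.0 * v_a - v_diode) * v_diode * c_x


def e_eval(q_xt, q_xf, v_a, v_diode):
    """Eq. (3): approximate per-cycle loss of the nonadiabatic evaluation."""
    return (q_xt + q_xf) * (2.0 * v_a - v_diode)


def e_sense(r_sense, c_x, v_a, omega):
    """Eq. (4): approximate per-cycle loss in the sense amplifier structure."""
    return math.pi / 2.0 * r_sense * omega * v_a**2 * c_x**2


def loss_breakdown(freq_hz, params):
    """Return the four loss terms and their sum at one operating frequency."""
    omega = 2.0 * math.pi * freq_hz
    terms = {
        "wire": e_wire(params["r_wire"], params["c_load"], params["v_a"], omega),
        "diode": e_diode(params["c_x"], params["v_a"], params["v_diode"]),
        "eval": e_eval(params["q_xt"], params["q_xf"], params["v_a"], params["v_diode"]),
        "sense": e_sense(params["r_sense"], params["c_x"], params["v_a"], omega),
    }
    terms["total"] = sum(terms.values())
    return terms


# Hypothetical parameter set (assumptions, not values from the paper).
PARAMS = {
    "r_wire": 2.0, "c_load": 50e-12,   # ohm, farad
    "r_sense": 5e3, "c_x": 20e-15,     # ohm, farad
    "q_xt": 2e-15, "q_xf": 2e-15,      # coulomb
    "v_a": 1.5, "v_diode": 0.6,        # volt
}

for f in (50e6, 100e6, 200e6):
    print(f, loss_breakdown(f, PARAMS))
```

In this model the diode and evaluation terms do not depend on frequency, while the wire and sense amplifier terms grow linearly with it, which is the scaling behavior stated in the text.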

## IV. MULTIPLIER, BIST LOGIC, AND POWER-CLOCK GENERATOR

To demonstrate the practicality of SCAL-D, we used it to design and fabricate an 8-bit multiplier with BIST logic in a 0.5-$\mu$m standard n-well CMOS process. The design was fabricated by TSMC through MOSIS. In this section, we present the circuit, logic, and structure of our 8-bit SCAL-D multiplier, BIST logic, and internal single-phase power-clock generator.

Typically, multiplication involves the evaluation and accumulation of shifted partial products. These operations can be performed in several different ways. In general, the choice of a particular multiplication scheme is based on factors such as speed, throughput, numerical accuracy, area, and power [19]–[24]. Since each SCAL-D gate combines both logic and state holding functions, we chose to implement a fully pipelined carry-save multiplier architecture. As shown in Fig. 4, our system consists of an 8-bit unsigned multiplier, BIST logic, and an internal single-phase power-clock generator. The multiplier and BIST logic are implemented entirely in SCAL-D. The voltages $V_{\rm dd}$, $V_{\rm ss}$, and $V_{\rm PC}$ are supplied to each SCAL-D gate.

Fig. 6. Layout of a parallel multiplier cell containing a 1-bit full-adder and a buffer cell in SCAL-D.

## A. Multiplier and BIST Logic

The multiplier is constructed from a set of basic cells, the most critical and complicated of which is the 1-bit full adder cell. The schematic and physical layout of this 1-bit full adder cell in SCAL-D, shown in Figs. 5 and 6, contain the equivalent of 3 bits of state elements along with the full adder logic. Our SCAL-D implementation uses 85 transistors. An equivalent static CMOS implementation would require 100 transistors, including 28 transistors for the 1-bit full adder logic and 72 transistors for 3 flip-flops [25].

Our BIST logic is based on Koenemann's built-in logic block observer (BILBO) [26], which uses a linear feedback shift register (LFSR) to both generate a pseudorandom pattern for the circuit under test and perform signature analysis [25], [27]. We implemented our BILBO entirely in SCAL-D, thus enabling full-speed testing without any I/O synchronization issues. Fig. 7 shows the block diagram of our SCAL-D BILBO. To generate the maximum-length sequences, a primitive polynomial $1 + x + x^4 + x^7 + x^{10} + x^{16}$ was used to build an LFSR. In normal operation mode, the BILBO acts as a set of latches. In the self-test mode, however, the BILBO is converted into LFSRs. Mode selection is controlled by the signals $S_1$ and $S_2$. Each circuit under test has pseudorandom patterns applied to its inputs, and its outputs are compressed to form a signature. Specifically, BILBO-1 and BILBO-2 in Fig. 4 are configured as a pseudorandom pattern generator and a multiple input signature register, respectively. The output sequences of BILBO-1 and BILBO-2, ix and ox, must be observed to infer the correct operation of the entire multiplier.

Fig. 7. TYPE-2 implementation of 16-stage BILBO in SCAL-D.

Fig. 8. Full-custom layout of 8-bit multiplier with associated BIST logic in SCAL-D.
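The two BILBO roles, pattern generation and signature compression, can be illustrated with a toy software model. The sketch below is not the 16-stage SCAL-D circuit: it uses a 4-bit register with the small primitive polynomial $x^4 + x^3 + 1$ so that the maximum-length property is easy to check, and the function names and tap convention are assumptions made for illustration.

```python
def lfsr_step(state, taps, nbits):
    """Advance a Fibonacci-style LFSR by one clock.

    taps lists the bit indices whose XOR forms the bit shifted into position 0.
    """
    feedback = 0
    for t in taps:
        feedback ^= (state >> t) & 1
    return ((state << 1) | feedback) & ((1 << nbits) - 1)


def lfsr_period(seed, taps, nbits):
    """Count the clocks needed for the register to return to its seed."""
    state = lfsr_step(seed, taps, nbits)
    count = 1
    while state != seed:
        state = lfsr_step(state, taps, nbits)
        count += 1
    return count


def misr_signature(words, taps, nbits, seed=0):
    """Compress a stream of response words into a signature (multiple-input register)."""
    mask = (1 << nbits) - 1
    state = seed
    for word in words:
        state = lfsr_step(state, taps, nbits) ^ (word & mask)
    return state


# Toy configuration (assumption): 4 bits, feedback from bits 3 and 2.
NBITS = 4
TAPS = (3, 2)

# Pattern-generator role: a nonzero seed cycles through every nonzero state.
print(lfsr_period(0b0001, TAPS, NBITS))

# Signature-analysis role: a corrupted response stream yields a different signature.
print(misr_signature([3, 7, 1, 12], TAPS, NBITS))
print(misr_signature([3, 7, 1, 13], TAPS, NBITS))
```

With a primitive feedback polynomial the 4-bit register visits all 15 nonzero states before repeating, which is the maximum-length behavior the 16-stage register is meant to provide on a larger scale.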

The full-custom layout of the 8-bit multiplier and its associated BIST logic, shown in Fig. 8, was designed and implemented in SCAL-D. It contains 11 854 transistors, of which 2806 transistors (about 24%) are used in the BIST logic, and the remaining 9048 transistors (about 76%) in the multiplier. The total area of the design is approximately 0.710 mm<sup>2</sup> (=0.829 mm × 0.857 mm), including the multiplier core of 0.470 mm<sup>2</sup> (=0.781 mm × 0.607 mm), in a 0.5-$\mu$m standard n-well CMOS process. The capacitance of our power-clock distribution wires contributes 57% of the total capacitance loading of the power-clock node. The remaining capacitance of the power-clock node is distributed between the signal interconnect and the transistor loads, both of which were drawn to be balanced with respect to the dual rails. These factors contribute to an effective power-clock capacitance that is nearly time-invariant.

Fig. 9. Internal single-phase power-clock generator and off-chip connections.

Fig. 10. Balanced pulse-alternating asynchronous state machine used in power-clock generator.

For the evaluation tree of each gate, minimum-size transistors were used with W/L ratio equal to 3/2. To enhance the energy efficiency of SCAL-D, the W/L ratio of each current source and cross-coupled transistor pair was selected according to the target operating frequency, the capacitive load at the output of the gate, and the power-clock supply voltage level.  $V_{PC}$ ,  $V_{DD}$ , and  $V_{SS}$  were routed using wide wires primarily on metal layers 2 and 3.  $V_{BP}$  and  $V_{BN}$  were routed primarily on polysilicon under local  $V_{DD}$  and  $V_{SS}$  interconnect. This approach minimizes differential mode noise between the bias and supply nodes while consuming very few routing resources.

## B. Power-Clock Generator

An internal single-phase sinusoidal power-clock generator was designed using the topology shown in Fig. 9. It consists of an LC resonant system, two power switches (pMOS and nMOS), an asynchronous state machine, and a three-element differential logic ring oscillator. The ring oscillator is tuned over a small range of frequency and duty cycles by two bias voltages. The output of the ring oscillator is a train of pulses which feeds the asynchronous state machine. The state machine, as shown in the schematic of Fig. 10, simply alternates the pulses to each of its two outputs while preserving the pulse width. The two outputs are then buffered and drive the gates of the pMOS and nMOS switches. Fig. 11 shows the final layout for the power-clock generator. Not shown is a large bypass capacitor which is four times the size of the power-clock generator.

Fig. 11. Full-custom layout for the power-clock generator, including guard rings and driver transistors.

The LC system is composed of an off-chip/bondwire inductor coupled with the on-chip adiabatic load capacitance. This simple harmonic resonant system is pumped using a zero-voltage switching scheme. The pMOS switch is turned on at the peak of the sinusoid, where the voltage difference between $V_{\rm dd}$ and $V_{\rm PC}$ is nearly zero. The nMOS switch is turned on at the negative peak of the sinusoid, where the voltage difference between $V_{\rm PC}$ and $V_{\rm ss}$ is nearly zero. This scheme can minimize both conduction and switching losses [28]. Specifically, conduction losses are minimized, because each device conducts a current of at most $I_{\rm switch} < (2E_{\rm loss, half-cycle}/L_{\rm resonant})^{1/2}$. Switching losses are minimized at the same time, since the energy stored in the parasitic source-to-drain capacitance is nearly zero when the voltage difference between $V_{\rm dd}$ and $V_{\rm PC}$ or between $V_{\rm PC}$ and $V_{\rm ss}$ is zero.
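One plausible reading of the bound on $I_{\rm switch}$, offered here as an interpretation rather than a derivation given in the text, is an energy balance: if the switch only has to replace the energy lost in one half-cycle, and that energy is delivered as magnetic energy in the resonant inductor, then

$$\frac{1}{2} \cdot L_{\rm resonant} \cdot I_{\rm switch}^2 < E_{\rm loss, half-cycle} \quad\Longrightarrow\quad I_{\rm switch} < \left(\frac{2 \cdot E_{\rm loss, half-cycle}}{L_{\rm resonant}}\right)^{1/2}$$

Under this reading, the switch current scales with the square root of the per-half-cycle loss, so a lower-loss load needs proportionally less pump current.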

The overall losses of the power-clock generator can be classified into two components: $E_{fixed}$, which is independent of load size, and $E_{variable}$, which scales with the load. $E_{fixed}$ includes the losses of the asynchronous state machine and ring oscillator, and amounts to roughly 1–2 mW at 3 V, depending on ring oscillator tuning. $E_{variable}$ includes all losses attributable to transistors whose sizes need to be scaled with the load, including the power FETs and their drivers. Because our multiplier is a relatively small design, these transistor sizes are not very large, and so we observe $E_{\text{variable}}$ to be nearly the same as $E_{\text{fixed}}$ (i.e., in the range of a few mW, depending on frequency and load dissipation). Thus, as we consider larger designs, we expect the relative overhead of the power-clock generator to decrease, as the overall dissipation tracks $E_{variable}$, which itself scales linearly with load dissipation, and the constant overhead of $E_{fixed}$ is amortized over a larger total dissipation.
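The amortization argument can be made concrete with a few lines of Python. This is a minimal sketch assuming a fixed generator loss plus a variable loss that is a constant fraction of the load dissipation; the function name and the numeric values are hypothetical and chosen only to show the trend.

```python
def generator_overhead_fraction(load_mw, fixed_mw, variable_per_load):
    """Fraction of total dissipation spent in the power-clock generator.

    load_mw:           dissipation of the adiabatic load (mW)
    fixed_mw:          load-independent generator loss (mW)
    variable_per_load: generator loss per unit of load dissipation (assumed linear)
    """
    variable_mw = variable_per_load * load_mw
    total_mw = load_mw + fixed_mw + variable_mw
    return (fixed_mw + variable_mw) / total_mw


# Hypothetical numbers (assumptions): 1.5 mW fixed loss, 5% variable loss.
for load in (30.0, 300.0, 3000.0):
    print(load, generator_overhead_fraction(load, 1.5, 0.05))
```

As the load grows, the fraction falls toward the limit set by the variable term alone, since the fixed term is spread over a larger total.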

In our design, we chose to slightly oversize switches S1 and S2 so as to make the clock generator capable of driving the adiabatic multiplier even if the dissipation of the multiplier was higher than expected. As the optimum condition for sizing S1, S2, and their associated driver buffers is rather broad, it is relatively easy to choose appropriate sizes. The sizes are, of course, very dependent on the total load dissipation, as this determines the peak current constraints placed on S1 and S2.

Another important aspect of our power-clock generator is the effect of design size scaling on the losses attributed to bondwire and package parasitics. From the familiar relation $1/\omega^2 = L \cdot C$, note that the frequency $\omega$ is inversely proportional to $\sqrt{L \cdot C}$, so scaling $C$ up and $L$ down by the same factor leaves the frequency unchanged. Thus, for a design of size $10 \cdot S$ that uses ten inductor wires in parallel (thus reducing parasitic resistance), we expect the same relative fraction of parasitic losses as we would for a design of size $S$ that uses one inductor.
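The scaling argument can be checked numerically. The sketch below assumes that $n$ identical, uncoupled inductor wires in parallel divide both inductance and series resistance by $n$, that the load capacitance grows by $n$, and that the parasitic loss has the same form as (1). The component values and the normalization by $C \cdot V_{\text{A}}^2$ are assumptions made for illustration.

```python
import math


def resonant_omega(l_henry, c_farad):
    """Angular resonant frequency from 1/omega^2 = L*C."""
    return 1.0 / math.sqrt(l_henry * c_farad)


def scale_design(l_henry, r_ohm, c_farad, n):
    """Design n times larger, driven through n parallel inductor wires (idealized)."""
    return l_henry / n, r_ohm / n, c_farad * n


def relative_parasitic_loss(l_henry, r_ohm, c_farad, v_a):
    """Per-cycle series-resistance loss divided by the energy scale C*V_A^2."""
    omega = resonant_omega(l_henry, c_farad)
    loss = math.pi * r_ohm * omega * c_farad**2 * v_a**2  # same form as Eq. (1)
    return loss / (c_farad * v_a**2)


# Hypothetical baseline (assumptions): 10 nH, 0.5 ohm, 100 pF, V_A = 1.5 V.
L0, R0, C0, V_A = 10e-9, 0.5, 100e-12, 1.5
L10, R10, C10 = scale_design(L0, R0, C0, 10)

print(resonant_omega(L0, C0), resonant_omega(L10, C10))
print(relative_parasitic_loss(L0, R0, C0, V_A), relative_parasitic_loss(L10, R10, C10, V_A))
```

Both the resonant frequency and the relative parasitic loss come out the same for the two designs, in line with the expectation stated above.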

#### V. DESIGN METHODOLOGIES

This section describes the design methodologies developed for our adiabatic circuits. Our design methodologies differ from conventional design practices for several reasons. First, each gate is inherently a combinational circuit plus a state element. Therefore, each pipeline stage includes only one level of logic. Second, since every signal is in phase with the power-clock signal, timing analysis is replaced with analog signal integrity analysis. Phase delays are thus interpreted as a reduction in signal amplitude. Third, as each gate often only needs minimum sized evaluation transistors, fanout load is a function of the logic being implemented and wire length. Consequently, transistor sizing for each gate is independent of other gates to a certain extent.

In static CMOS, we can define delays based on rising and falling edges of signals and sum up the worst-case delay through a network of gates. In SCAL-D logic, however, every signal in the system shares the same fundamental sinusoid as the power-clock. A "slow" or "weak" gate results in lower-amplitude signals than a "fast" gate. Furthermore, a "long" wire results in both a lower amplitude and a slightly out-of-phase output signal. Since our adiabatic gates have a discrete period in which they sample their inputs, we can convert any phase delay into an equivalent amplitude reduction, as seen by the inputs of the next stage. Thus, the timing analysis problem is transformed in the adiabatic domain into a signal integrity checking problem. Our functional verification tool also analyzes all the internal adiabatic signals to determine if they meet our specification for signal integrity. The maximum operating clock frequency of our design is defined as the maximum frequency at which every signal meets our signal integrity check.
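The conversion of a phase delay into an amplitude reduction can be illustrated with a simple model. The sketch assumes, purely for illustration, that the next stage samples its input at a single instant at the peak of an undelayed sinusoid, so that a signal $A \cdot \sin(\theta - \phi)$ is seen as $A \cdot \cos(\phi)$. The function names and the threshold are hypothetical and do not describe the actual verification tool.

```python
import math


def effective_amplitude(amplitude, phase_lag_rad):
    """Value of A*sin(theta - phi) at the assumed sampling instant theta = pi/2."""
    return amplitude * math.sin(math.pi / 2.0 - phase_lag_rad)


def passes_integrity_check(amplitude, phase_lag_rad, min_amplitude):
    """True when the amplitude seen by the next stage meets the required minimum."""
    return effective_amplitude(amplitude, phase_lag_rad) >= min_amplitude


# Hypothetical signals (assumptions): amplitude in volts, lag in radians.
print(effective_amplitude(1.5, 0.0))
print(effective_amplitude(1.5, math.pi / 6.0))
print(passes_integrity_check(1.5, math.pi / 3.0, 1.0))
```

In this model a weak gate and a long wire are treated alike: both lower the value that the next stage samples, so a single amplitude threshold covers both effects.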

As the design progressed into layout, HSPICE simulations were run on the adiabatic subcircuits. The trace data from these simulations were post-processed by a custom tool which verified that each gate's input and output voltage waveforms corresponded to correct logical evaluation. The output of this tool was a list of failing gates along with layout coordinates, which enabled us to rapidly diagnose and correct failures resulting from subtle analog considerations. This tool also enabled us to apply several heuristic transistor sizing rules, based on extracted parasitic information from the layout.
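A post-processing check of this kind might look like the following Python sketch. It is not the authors' tool: the record layout (`GateSample`), the sampled-voltage representation, and the swing threshold are all hypothetical and serve only to show how failing gates could be reported together with their layout coordinates.

```python
from dataclasses import dataclass


@dataclass
class GateSample:
    """One gate's dual-rail output, sampled from simulation trace data (hypothetical format)."""
    name: str
    x_um: float       # layout coordinate of the gate
    y_um: float
    expected: int     # logic value the gate should produce (0 or 1)
    v_true: float     # sampled voltage on the true rail
    v_false: float    # sampled voltage on the false rail


def failing_gates(samples, min_swing):
    """Return (name, x, y) for gates whose output is wrong or too weak."""
    failures = []
    for s in samples:
        diff = s.v_true - s.v_false
        observed = 1 if diff > 0 else 0
        if observed != s.expected or abs(diff) < min_swing:
            failures.append((s.name, s.x_um, s.y_um))
    return failures


# Hypothetical trace samples (assumptions).
SAMPLES = [
    GateSample("fa_3_2", 120.0, 45.5, expected=1, v_true=2.6, v_false=0.3),
    GateSample("buf_0_7", 310.0, 12.0, expected=0, v_true=1.4, v_false=1.6),
    GateSample("fa_5_1", 88.0, 200.0, expected=1, v_true=0.2, v_false=2.5),
]

for name, x, y in failing_gates(SAMPLES, min_swing=1.0):
    print(name, x, y)
```

Here the second gate has the correct polarity but an insufficient swing, and the third has the wrong polarity, so both are reported with their coordinates.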

#### VI. SIMULATION RESULTS

Although other researchers have designed multipliers to test various novel CMOS circuit families (e.g., the serial-parallel locally-clocked dynamic-logic multiplier in [19] and the low-power multiplier with pulse-triggered flip-flops in [20]), any performance comparison of our design with published ones would necessarily involve extrapolating the published data to compensate for differences in process technology, bit-widths, multiplier architectures, target throughput, and experimental methodology. Thus, to perform a well-defined comparison, we implemented fully-static CMOS versions of our adiabatic multiplier, synthesized for low power using best-practice design automation tools. In this section, we present a simulation-based comparative evaluation of our adiabatic multiplier and pipelined counterparts of our design in static CMOS.

Fig. 12. Relative energy consumption per cycle versus operating frequencies for 8-bit multipliers with associated BIST logic.

Three pipelined CMOS multipliers with latency 2, 4, and 8, respectively, were synthesized using a library of standard cells for the same 0.5-$\mu$m process in which our multiplier was fabricated. Flip-flops were used as the state elements. Layouts were generated using the Epoch design automation tool with standard cells optimized for power dissipation. The static CMOS designs used 5,146, 6,518, and 9,926 transistors for the two-, four-, and eight-stage pipelines, respectively. Their area was approximately 0.20 mm<sup>2</sup>, 0.25 mm<sup>2</sup>, and 0.36 mm<sup>2</sup>, respectively.

Fig. 12 gives the relative energy consumption per cycle of the multipliers and associated BIST logic when operating at 50 MHz, 100 MHz, and 200 MHz. For each operating frequency, the energy dissipation of each multiplier was obtained using the lowest supply voltage that ensures its correct operation at that frequency. Minimum supply voltage values are given next to the corresponding data points. The simulations accounted for the dissipation of the gates and internal clock lines and assumed a lossless external clock generator.

The energy consumption of our adiabatic multiplier scales almost linearly with supply voltage and remains nearly constant with respect to frequency. Based on our energy loss models, diode and evaluation tree dissipation dominate sense amplifier dissipation in the operating regime of our simulations. Even though our adiabatic multiplier was designed conservatively at the expense of some energy consumption or performance optimizations, our results show that it is more energy efficient than its pipelined static CMOS counterparts across the entire frequency range of our simulations. At 50 MHz, it is 1.4 to 1.8 times more efficient than the pipelined static CMOS designs. At 100 MHz, it is 2.1 to 3.2 times more efficient than the three static CMOS designs. At 200 MHz, our SCAL-D design is about four times more efficient than the eight-stage pipelined static CMOS. The two- and four-stage static CMOS multipliers do not function correctly at that frequency. In the pipelined static CMOS multipliers, flip-flops are used to reduce critical paths and decrease the required supply voltage. However, the resulting increase in the circuit's effective capacitance and clock distribution network limits energy savings. Thus, SCAL-D presents a promising approach to further reducing the dissipation of static CMOS designs that have reached their voltage scaling limits.

#### VII. EXPERIMENTAL RESULTS

In this section, we present the results of our experimental measurements from the fabricated chip. To increase the testability of our design, two identical 8-bit SCAL-D multipliers were placed on each chip. One was powered by an external signal generator, while the other was wired directly to the internal power-clock generator. We validated correct operation of each of the components of the chip independently. The multiplier was validated up to a frequency of 130 MHz using an external power-clock generator, limited by the slew rate of the pads and coupling between the pins. The power-clock generator was also validated to function correctly at 140 MHz.

The energy dissipation of the multiplier while driven by an external source of power-clock was measured for frequencies between 40 and 130 MHz. These measurements were within 20% of HSPICE simulations for the 40–100-MHz range. The combined multiplier and clock generator dissipation was measured at a frequency of 140 MHz, and was within 20% of the HSPICE simulation.



Fig. 13. Microphotograph of the chip.



Fig. 14. Simplified floorplan of the chip shown in Fig. 13.

A die photo and simplified floorplan of our 8-bit SCAL-D multipliers and internal power-clock generator are shown in Figs. 13 and 14, respectively. The chip was fabricated in a 0.5- $\mu$ m single-poly triple-metal n-well CMOS process through MOSIS. The chip includes the two 8-bit SCAL-D multipliers with BILBO, an internal power-clock generator, adiabatic-to-digital converters (denoted by A/D), and I/O pads. The entire chip area is  $4.83 \text{ mm}^2$  (=2.60 mm × 1.84 mm).

Fig. 15. Experimental setup for testing and measuring the power dissipation of the 8-bit SCAL-D multiplier chip.

The A/D converters consist of cross-coupled inverters with two pull-down transistors connected to the adiabatic inputs. The cross-coupled inverters restore the signal to full swing. The A/D conversion circuitry was designed to enable testing using conventional digital signaling conventions. It is powered from the I/O power supply to enable full-swing drive of the I/O pads, regardless of any supply voltage differences between I/O and the multiplier under test. Thus, our power dissipation measurements do not include the A/D converter dissipation.

Fig. 15 shows our setup for test and measurement of the chip. On the right we have two dc power supplies and a Tektronix TDS754D 4-channel  $4 \times 10^9$  samples/s digitizing oscilloscope, configured with two 1-GHz high-impedance active probes, a 1-GHz differential active probe, and one passive probe. On the left we have an HP 8647A synthesized signal generator and a digital multimeter. Not shown is a 2-RF amplifier and power supply used for boosting the amplitude from the signal generator. Our device under test is inserted into a socket on the test board seen in the center of the picture. We verified the functional correctness of our SCAL-D multiplier and measured its power consumption, including its BIST logic, for up to 50  $\mu$ s, the limit of the digital oscilloscope memory. All of the measurements were performed with the supply voltage fixed at 3 V, as dictated by the peripheral circuits, including adiabatic-to-digital converters and pad drivers.

The correct operation of the adiabatic multiplier and its BIST logic was verified up to 130 MHz, using an external power-clock generator. Figs. 16 and 17 show the measured waveforms of the 8-bit SCAL-D multiplier and its BIST logic in self-test mode at 50 MHz and 130 MHz, respectively. Channels Ch1, Ch2, Ch3, and Ch4 show the power-clock  $V_{\rm PC}$ , a BILBO control signal  $S_2$ , the output sequence ix of BILBO-1, and the output sequence ox of BILBO-2, respectively. The output sequences are fed through the adiabatic-to-digital converters before being buffered and output to the pads. The logic was supplied power from an external sinusoidal  $V_{\rm PC}$  and a constant supply  $V_{\rm dd}$  of amplitude 3 V. When the operating frequency of the power-clock was 50 MHz, the pMOS and nMOS biasing voltages were  $V_{\rm BP} = 1.90$  V and  $V_{\rm BN} = 1.45$  V, respectively. When the operating frequency of the power-clock was 130 MHz, the pMOS and nMOS biasing voltages were  $V_{\text{BP}} = 1.12$  V and  $V_{\text{BN}} = 2.70$  V, respectively. The measured waveforms matched the behavior observed in HSPICE simulations, shown in Fig. 18.

Fig. 16. Measured waveforms of 8-bit SCAL-D multiplier with associated BIST logic in self-test mode at 50 MHz.

The internal power-clock generator was verified at a frequency of 140 MHz. Fig. 19 shows the waveforms of the power-clock generator operating correctly at 140 MHz using a surface mount inductor. The large sinusoid is the power-clock signal with an amplitude of 3 V. The two other waveforms are buffered signals used to observe the internal operation of the power-clock generator.

Fig. 20 shows a schematic diagram of the experimental setup we used for measuring the power dissipation of our chip at frequencies below the operating region of the internal power-clock generator. These measurements were made using an external source of power-clock. Measurements were taken on a TDS754D four-channel Tektronix digitizing scope, using 1-GHz Tektronix active probes for voltage and a 1-GHz differential probe for current measurements. The sampling rate was 1 GHz. Currents were inferred by measuring the voltage drop across a small resistor. We sampled and stored the following signals: dc currents  $I_{VDD}$ ,  $I_{PC}$ ,  $I_{BN}$ , and  $I_{BP}$ , the AC current  $i_{PC}$ , the dc voltages  $V_{BN}$  and  $V_{BP}$ , and the AC voltage without dc bias  $v_{PC}$ . We then calculated the energy consumption per cycle,  $E_{cycle}$ , after transferring the sampled data to Matlab, using the equation

$$E_{\text{cycle}} = \frac{1}{N} \int_{0}^{N \cdot T} \left[ I_{\text{VDD}} \cdot \frac{V_{\text{dd}}}{2} + (I_{\text{VDD}} + I_{\text{PC}}) \cdot \frac{V_{\text{dd}}}{2} + v_{\text{PC}} \cdot i_{\text{PC}} + V_{\text{BP}} \cdot I_{\text{BP}} + V_{\text{BN}} \cdot I_{\text{BN}} \right] dt$$

where N is the number of measured cycles, and T is the cycle time of the power-clock.
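As a rough illustration of this post-processing step, the following Python sketch evaluates the integral with the trapezoidal rule over uniformly sampled records. The function names, argument names, and the synthetic waveforms are assumptions made for illustration only. The original analysis was carried out in Matlab, and its code is not reproduced here. The dc quantities are passed as scalars (for example, the mean of each sampled record), which is also an assumption.

```python
import math

def current_from_sense(v_drop, r_sense):
    """Infer a current (A) from the voltage drop (V) across a sense resistor (ohm)."""
    return v_drop / r_sense

def energy_per_cycle(dt, n_cycles, v_dd, i_vdd, i_pc_dc, v_pc, i_pc,
                     v_bp, i_bp, v_bn, i_bn):
    """Trapezoidal estimate of E_cycle (J) over n_cycles power-clock cycles.

    dt        : sampling interval in seconds
    v_pc/i_pc : equal-length lists of sampled AC power-clock voltage and current
    all other electrical arguments are dc values (hypothetical scalar inputs)
    """
    # dc terms of the integrand: supply, dc power-clock offset, and bias sources.
    dc_power = (i_vdd * v_dd / 2 + (i_vdd + i_pc_dc) * v_dd / 2
                + v_bp * i_bp + v_bn * i_bn)
    # Instantaneous power at every sample: dc terms plus the AC product v_PC * i_PC.
    power = [dc_power + v * i for v, i in zip(v_pc, i_pc)]
    # Trapezoidal integration over the whole record, then division by N.
    energy = sum(0.5 * (p0 + p1) * dt for p0, p1 in zip(power, power[1:]))
    return energy / n_cycles

# Synthetic record (not measured data): 4 cycles at 100 MHz, 100 samples per cycle,
# with the AC current in phase with the AC voltage.
period = 1e-8
n_cycles = 4
samples_per_cycle = 100
dt = period / samples_per_cycle
times = [k * dt for k in range(n_cycles * samples_per_cycle + 1)]
v_pc = [math.sin(2 * math.pi * t / period) for t in times]
i_pc = [current_from_sense(0.5 * v, 1.0) for v in v_pc]
e_demo = energy_per_cycle(dt, n_cycles, 3.0, 0.0, 0.0, v_pc, i_pc,
                          0.0, 0.0, 0.0, 0.0)
```

In this sketch only the AC product term depends on the relative timing of the voltage and current samples, since the dc terms enter as constants.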

Fig. 17. Measured waveforms of 8-bit SCAL-D multiplier with associated BIST logic in self-test mode at 130 MHz.

Fig. 18. Reference pattern for self-testing of multiplier from HSPICE simulation.

We obtained an energy measurement for the multiplier and the internal power-clock generator at 140 MHz, by observing the dc supply current while the system was oscillating in a steady state. Fig. 21 shows a schematic diagram of the experimental setup we used for measuring the power dissipation of our multiplier and integrated clock generator. As capacitors C1 and C2 shunt the AC power-clock currents, the energy measurement is particularly simple, namely

$$E_{\text{cycle}} = T \cdot (I_{\text{R1}} \cdot V_{\text{dd}} + I_{\text{R2}} \cdot V_{\text{dd}}/2 + V_{\text{BP}} \cdot I_{\text{BP}} + V_{\text{BN}} \cdot I_{\text{BN}})$$

where T is the cycle time of the power-clock.
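The dc expression above reduces to a few multiplications. A minimal Python sketch follows, assuming that $I_{\text{R1}}$ and $I_{\text{R2}}$ are the dc currents read through resistors R1 and R2 of Fig. 21. The function name and the numeric values in the usage line are hypothetical and are not measured data.

```python
def energy_per_cycle_dc(freq_hz, v_dd, i_r1, i_r2, v_bp, i_bp, v_bn, i_bn):
    """E_cycle (J) from dc readings only, following the expression above."""
    period = 1.0 / freq_hz  # T, the cycle time of the power-clock
    return period * (i_r1 * v_dd + i_r2 * v_dd / 2 + v_bp * i_bp + v_bn * i_bn)

# Hypothetical readings at 140 MHz with a 3-V supply; bias currents set to zero.
e_dc_demo = energy_per_cycle_dc(140e6, 3.0, 0.01, 0.0, 0.0, 0.0, 0.0, 0.0)
```

Because no sampled AC product is integrated here, the result depends only on the accuracy of the dc readings and of the frequency.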

Fig. 22 gives the energy consumption of the 8-bit SCAL-D multiplier and associated BIST logic for various pMOS and nMOS biasing voltages. Measurements in the 40–130 MHz operating frequency range were obtained using an external source of power-clock, with both the amplitude of the sinusoidal power-clock  $V_{\rm PC}$  and the constant supply voltage  $V_{\rm dd}$  set to 3 V. At 140 MHz, the measurement was obtained for the combined clock generator and multiplier system also at 3 V.

Fig. 19. Measured waveforms of internal power-clock generator at 140 MHz.

Fig. 20. Experimental setup for the dissipation measurements of multiplier only.

Fig. 21. Experimental setup for the dissipation measurements of multiplier and internal power-clock generator.

Fig. 22. Measured energy consumption per cycle for various pMOS/nMOS biasing voltages.

Fig. 23 shows the maximum relative difference between our measurements and simulation results under the same operating frequencies and voltages. The externally driven power-clock measurements above 100 MHz suffered from relative errors exceeding 20%, since the quantization and phase errors inherent in integrating the product of sampled AC voltage and currents become dominant. In the direct dc measurements with the integrated clock generator at 140 MHz, however, none of these errors is present.

#### VIII. CONCLUSION

We presented the design and experimental evaluation of an 8-bit adiabatic multiplier with an internal single-phase sinusoidal power-clock generator. To provide design-for-test capability, our design included built-in self-test circuitry based on built-in logic block observation. Both the multiplier and the self-test circuitry were designed in SCAL-D, an adiabatic logic family that operates with a single-phase sinusoidal power-clock.

Fig. 23. Maximum relative difference between measurements and simulations at equivalent operating conditions.

The adiabatic circuitry we used for our design does not rely on any particular process features. While our chip was not designed in the latest process technology, our charge recovering circuitry is expected to become increasingly energy efficient in the nanometer process regime, as relative wire load increases and the increased energy stored in the interconnect capacitance is recovered and recycled. Furthermore, as current advanced process technologies allow multiple threshold implants and triple-well structures, the performance of the sense amplifiers used in SCAL-D will increase and the loss presented by the diode-connected FETs will decrease.

Our chip was fabricated in a 0.5- $\mu$ m standard n-well CMOS process. The correct operation of the multiplier core and that of the internal power-clock generator were independently validated in the lab. Energy measurements were collected at several operating points. Our testing was restricted by the limitations of the peripheral circuits such as adiabatic-to-digital converters and pad drivers. These difficulties effectively limited our experimental characterization to 3 V.

In a simulations-based comparison with a family of pipelined static CMOS multipliers of the same overall architecture, our adiabatic multiplier was up to four times more energy efficient at 200 MHz. Despite the limitations of our simple experimental setting, we were able to validate our HSPICE simulation results for a range of operating points. Our results suggest that for throughput-intensive applications, the adiabatic family SCAL-D presents a viable and attractive alternative option for low-energy, high-speed VLSI design.

#### REFERENCES

- [1] W. C. Athas, L. J. Svensson, J. G. Koller, N. Tzartzanis, and Y. Chou, "Low-power digital systems based on adiabatic-switching principles," *IEEE Trans. VLSI Syst.*, vol. 2, pp. 398–406, Dec. 1994.
- [2] J. S. Denker, "A review of adiabatic computing," in *Proc. 1994 Symp. Low-Power Electronics/Dig. of Tech. Papers*, Napa, CA, Apr. 1994, pp. 94–97.
- [3] S. G. Younis and T. F. Knight, "Practical implementation of charge recovering asymptotically zero power CMOS," in *Proc. 1993 Symp. Integrated Syst.*, Seattle, WA, Mar. 1993, pp. 234–250.

- [4] A. Kramer, J. S. Denker, B. Flower, and J. Moroney, "2nd order adiabatic computation with 2N-2P and 2N-2N2P logic circuits," in *Proc. 1995 Int. Symp. on Low-Power Design*, Dana Point, CA., Aug. 1995, pp. 191–196.
- [5] Y. Moon and D. Jeong, "An efficient charge recovery logic circuit," *IEEE J. Solid-State Circuits*, vol. 31, pp. 514–522, Apr. 1996.
- [6] V. G. Oklobdzija and D. Maksimovic, "Pass-transistor adiabatic logic using single power-clock supply," *IEEE Trans. Circuits Syst. II*, vol. 44, pp. 842–846, Oct. 1997.
- [7] W. C. Athas, N. Tzartzanis, L. Svensson, L. Peterson, H. Li, X. Jiang, P. Wang, and W.-C. Liu, "AC-1: A clock-powered microprocessor," in *Proc. Int. Symp. Low-Power Electronics and Design*, Monterey, CA, Aug. 1997, pp. 328–333.
- [8] D. Maksimovic, V. G. Oklobdzija, B. Nikolic, and K. W. Current, "Clocked CMOS adiabatic logic with integrated single-phase power-clock supply," *IEEE Trans. VLSI Syst.*, vol. 8, pp. 460–463, Aug. 2000.
- [9] Y. Yibin and K. Roy, "QSERL: Quasi-static energy recovery logic," *IEEE J. Solid-State Circuits*, vol. 36, pp. 239–248, Feb. 2001.
- [10] P. Wayner, "Silicon in reverse," BYTE, vol. 19, pp. 67-71, Aug. 1994.
- [11] S. Kim and M. C. Papaefthymiou, "True single-phase energy-recovering logic for low-power, high-speed VLSI," in *Proc. Int. Symp. Low-Power Electron. Design*, Monterey, CA, Aug. 1998, pp. 167–172.
- [12] —, "Single-phase source-coupled adiabatic logic," in *Proc. Int. Symp. Low-Power Electron. Design*, San Diego, CA, Aug. 1999, pp. 97–99.
- [13] —, "True single-phase adiabatic circuitry," *IEEE Trans. VLSI Syst.*, vol. 9, pp. 52–63, Feb. 2001.
- [14] S. Kim, C. H. Ziesler, and M. C. Papaefthymiou, "Design, verification, and test of a true single-phase 8-bit adiabatic multiplier," in *Proc.* 19th Conf. Advanced Research VLSI, Salt Lake City, UT, Mar. 2001, pp. 42–58.
- [15] —, "A true single-phase adiabatic multiplier," in *Proc. 38th Design Automation Conf.*, Las Vegas, NV, June 2001, pp. 758–763.
- [16] S. G. Younis, "Asymptotically zero energy computing using split-level charge recovery logic," Ph.D. dissertation, MIT, Cambridge, MA, 1994.
- [17] D. Mateo and A. Rubio, "Design and implementation of a 5\*5 trits multiplier in a quasiadiabatic ternary CMOS logic," *IEEE J. Solid-State Circuits*, vol. 33, pp. 1111–1116, July 1998.
- [18] D. Suvakovic and C. Salama, "Two phase nonoverlapping clock adiabatic differential cascode voltage switch logic (adcvsl)," in *Proc.* 2000 Int. Solid-State Circuits Conf., San Francisco, CA, Feb. 2000, pp. 364–365.
- [19] G. N. Hoyer and C. Sechen, "Locally-clocked dynamic logic serial/parallel multiplier," in *Proc. Custom Integrated Circuits Conf.*, Orlando, FL, May 2000, pp. 481–484.
- [20] J. Wang, P. Yang, and D. Sheng, "Design of a 3-V 300-MHz low-power 8-b multiplied by 8-b pipelined multiplier using pulse-triggered TSPC flip-flops," *IEEE J. Solid-State Circuits*, vol. 35, pp. 583–592, Apr. 2000.

- [21] G. Ma and F. J. Taylor, "Multiplier policies for digital signal processing," IEEE ASAP Mag., vol. 7, pp. 6-19, Jan. 1990.
- [22] C. F. Law, S. S. Rofail, and K. S. Yeo, "A low-power 16 × 16-b parallel multiplier utilizing pass-transistor logic," IEEE J. Solid-State Circuits, vol. 34, pp. 1395-1399, Oct. 1999.
- [23] I. S. Abu-Khater, A. Bellaouar, and M. I. Elmasry, "Circuit techniques for CMOS low-power high-performance multipliers," IEEE J. Solid-State Circuits, vol. 31, pp. 1535-1546, Oct. 1996.
- [24] G. N. Hoyer, G. Yee, and C. Sechen, "Locally-clocked dynamic logic," in Proc. IEEE Midwest Symp. Circuits and Systems, Notre Dame, IN, Aug. 1998, pp. 10-12.
- [25] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective. Englewood Cliffs, NJ: Prentice-Hall, 1996.
- [26] B. Koenemann, J. Mucha, and G. Zwiehoff, "Built-in logic block observation techniques," in Proc. 1979 Test Conf., Washington, DC, Sept. 1979, pp. 37-41.
- [27] M. Abramovici, M. A. Breuer, and A. D. Friedman, *Digital System Testing and Testable Design*. Piscataway, NJ: IEEE Press, 1990.
- [28] N. O. Sokal and A. D. Sokal, "Class E—A new class of high-efficiency tuned single-ended switching power amplifiers," IEEE J. Solid-State Circuits, vol. SC-10, pp. 168-176, June 1975.
- [29] C. H. Ziesler, S. Kim, and M. C. Papaefthymiou, "A power-clock generator for true single-phase adiabatic logic," in Proc. Int. Symp. Low-Power Electron. Design, Huntington Beach, CA, Aug. 2001, pp. 159-164.



Suhwan Kim (S'97-M'01) received the B.S. and M.S. degrees in electrical engineering and computer science from Korea University, Seoul, Korea, in 1990 and 1992, respectively, and the Ph.D. degree in electrical engineering and computer science from the University of Michigan, Ann Arbor, in 2001.

From 1993 to 1997, he was with LG Electronics, Seoul, Korea, where he designed several multimedia systems-on-a-chip (SOC), including an MPEG2, CODEC for audio, video systems. In 2001, he joined IBM T. J. Watson Research Center, Yorktown Heights, NY, where he is currently a Research Staff Member. His research interests include high-performance and low-power circuits and technology and low-power design methodologies for high-performance VLSI signal processing.

Dr. Kim received the 1994 Best Student Paper Award of the IEEE Korea Section, and the First Prize in the VLSI Design Contest of the 2001 ACM/IEEE Design Automation Conference. He has participated on the organizing committee and the technical program committee of the IEEE International ASIC/SOC Conference and on the technical committee of the International Symposium on Low-Power Electronics and Design.



Conrad H. Ziesler (S'99) received the B.S. degree in electrical engineering from the California Institute of Technology, Pasadena, in 1999 and the M.S. degree in electrical engineering and computer science from the University of Michigan, Ann Arbor, in 2002. He is currently pursuing the Ph.D. degree at the University of Michigan, Ann Arbor.

His research interests include energy recovering circuits for low-energy and high-performance VLSI systems, as well as parallel architectures and algorithms for scientific computing.

Mr. Ziesler received the First Prize in the VLSI Design Contest of the 38th ACM/IEEE Design Automation Conference and an NSF Graduate Research Honorable Mention.



Marios C. Papaefthymiou (M'93-SM'02) received the B.S. degree in electrical engineering from the California Institute of Technology in 1988 and the S.M. and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology, Cambridge, in 1990 and 1993, respectively.

After a three-year term as an Assistant Professor at Yale University, he joined the University of Michigan, where he is currently an Associate Professor of electrical engineering and computer science and Director of the Advanced Computer Architecture Laboratory. His research interests include algorithmic, architectural, and circuit issues in the design of very large scale integration systems with a primary focus on energy and timing optimization. He is also active in the field of parallel and distributed computing.

Dr. Papaefthymiou received an ARO Young Investigator Award, an NSF CAREER Award, and several IBM Partnership Awards. With his students, he has received a Best Paper Award in the 32nd ACM/IEEE Design Automation Conference and the First Prize (Operational Category) in the VLSI Design Contest of the 38th ACM/IEEE Design Automation Conference. He is an associate editor for the IEEE TRANSACTIONS ON THE COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS, IEEE TRANSACTIONS ON COMPUTERS, and IEEE TRANSACTIONS ON VLSI SYSTEMS. He has served as the General Chair and as the Technical Program Chair for the ACM/IEEE International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems. He has also participated several times in the Technical Program Committee of the IEEE/ACM International Conference on Computer-Aided Design.