# A 1.0–4.0-Gb/s All-Digital CDR With 1.0-ps Period Resolution DCO and Adaptive Proportional Gain Control

Heesoo Song, Student Member, IEEE, Deok-Soo Kim, Student Member, IEEE, Do-Hwan Oh, Suhwan Kim, Senior Member, IEEE, and Deog-Kyoon Jeong, Senior Member, IEEE

Abstract-This paper describes the design and implementation of an all-digital clock and data recovery circuit (ADCDR) for multigigabit/s operation. The proposed digitally-controlled oscillator (DCO) incorporating a supply-controlled ring oscillator with a digitally-controlled resistor (DCR) generates wide-frequency-range multiphase clocks with fine resolution. With an adaptive proportional gain controller (APGC) which continuously adjusts a proportional gain, the proposed ADCDR recovers data with a low-jitter clock and tracks large input jitter rapidly, resulting in enhanced jitter performance. A digital frequency-acquisition loop with a proportional control greatly reduces acquisition time. Fabricated in a 0.13-µm CMOS process with a 1.2-V supply, the ADCDR occupies 0.074 mm<sup>2</sup> and operates from 1.0 Gb/s to 4.0 Gb/s with a bit error rate of less than  $10^{-14}$ . At a 3.0-Gb/s  $2^{31} - 1$  PRBS, the measured jitter in the recovered clock is 3.59  $ps_{rms}$  and 29.4  $ps_{pp}$ , and the power consumption is 11.4 mW.

*Index Terms*—All-digital clock and data recovery, clock and data recovery, digitally-controlled oscillator, digitally-controlled resistor, high-speed interface.

# I. INTRODUCTION

As CMOS technology continues to advance, the computing capabilities of integrated circuits are expanding dramatically. This brings demand for high interchip bandwidths, leading to the development of high-speed interface circuits [1]-[6]. Nowadays, it is becoming more common for serial links to be compatible with deep-submicron CMOS technologies because multiple serial links are integrated into large digital systems as an IO subblock. A clock and data recovery circuit (CDR) is a critical building block in a serial link. The conventional CDR is based on a charge-pump phase-locked loop (CPPLL), as shown in Fig. 1, since this design has proven good jitter performance and low power consumption. However, as CMOS processes scale down into deep-submicron regimes, the design of analog CPPLLs encounters many challenges [7], [8]. First, the performance of most analog building blocks is degraded because of low voltage headroom, low output impedances of transistors, and large process variations. Second, a CPPLL occupies a larger area in spite of exploiting scaled technologies, since bulky metal capacitors must be used for the loop filter instead of MOS capacitors, which are area-efficient but leaky. Third, it has difficulty in achieving good jitter performance with a noise-susceptible analog control, which suffers from the increased gain of the voltage-controlled oscillator (VCO) induced by reduced supply voltages and from the large noise generated by adjacent digital circuits.

The authors are with the Department of Electrical Engineering and Computer Science, Seoul National University, Gwanak-Gu, Seoul 151-742, Korea (e-mail: dkjeong@snu.ac.kr).

Digital Object Identifier 10.1109/JSSC.2010.2082272

Recently, several all-digital phase-locked loops (ADPLLs) have been reported [8]–[14] which overcome the problems described above. They are able to operate at a low supply voltage and make the most of a scaled technology since all their building blocks are made up of synthesizable digital circuits, except perhaps a digitally-controlled oscillator (DCO). A digital loop filter (DLF) occupies a much smaller area than its analog counterpart, while guaranteeing constant transfer characteristics which are immune to process, voltage, and temperature (PVT) variations. In addition, since phase and frequency information is processed in the digital domain, it is noise tolerant and can be processed with versatile digital processing. Finally, ADPLLs have better portability and programmability since all their building blocks, excluding a DCO, are digital circuits which can be easily synthesized from hardware-description languages.

In this paper, we present a multigigabit/s all-digital clock and data recovery circuit (ADCDR) which has excellent jitter characteristics. Its building block is a DCO which mitigates the limits on resolution imposed by the quantization effect. This DCO is separately controlled by proportional and integral paths, and provides wide-range frequencies with fine resolution. In the integral path, the DCO varies the period of its output clocks in 1.0-ps steps, as determined by an 11-bit control word, and a delta-sigma modulator (DSM) further improves the resolution by dithering. In the proportional path, the period is quickly changed by a small fraction. An architectural issue with the use of bang-bang control in phase detection is that the small proportional step needed to achieve small bang-bang jitter reduces the tracking bandwidth [15], [16]. We decouple this trade-off between bang-bang jitter and tracking bandwidth by the use of an adaptive proportional gain controller (APGC) that continuously adjusts the proportional gain to cope with a slewing situation. As a consequence, this ADCDR is able to recover data with a low-jitter clock while rapidly tracking large input jitter.

The rest of the paper is organized as follows. In Section II, we introduce our ADCDR architecture and present general considerations relating to good jitter characteristics. In Section III, we describe the circuit that implements each building block. Experimental results are presented in Section IV, followed by conclusions in Section V.

Manuscript received April 08, 2010; revised August 24, 2010; accepted September 20, 2010. Date of publication October 28, 2010; date of current version January 28, 2011. This paper was approved by Associate Editor Jafar Savoj.

Fig. 1. Conventional PLL-based CDR.

Fig. 2. Block diagram of the proposed ADCDR.

# II. ARCHITECTURE

## A. ADCDR Architecture

Fig. 2 illustrates the overall block diagram of the proposed ADCDR. It adopts a half-rate architecture as it relaxes the required clock frequency for a given bit rate. A half-rate bang-bang phase detector (BB PD) detects the polarity of the phase difference between the recovered clock and the incoming serial data stream. The resulting error signals, up and dnb, are delivered to a DCO along proportional and integral paths. In the proportional path, the error signals are directly forwarded to the DCO so that the phase error is corrected promptly. The error signals are also deserialized and then delivered to a DLF with an APGC which operates with the recovered and divided-by-10 clock. The DLF accumulates the phase errors to track the frequency, and the resulting integral word is delivered to the DCO and determines the frequency of the recovered clock. The APGC sets the proportional gain to match the distribution of the phase errors. The DCO generates half-rate 4-phase clocks which are fed back to the half-rate BB PD for phase comparison.

The narrow pull-in range associated with employing only a PD always requires a separate means for frequency acquisition. A finite-state machine (FSM) accomplishes this task when requested by an external signal. A counter-based logic compares the frequencies of the DCO output and the reference clock, and then adjusts the integral word going into the DCO to take the frequency of the DCO output close to the half bit rate of the incoming data stream.

Most of the digital building blocks, including the DLF with the APGC, the frequency acquisition unit, and the decoders in the DCO are synthesized. Although the half-rate BB PD and the 2-to-10 deserializers are also digital circuits, they are not synthesized but custom designed, so that high-frequency multiphase clocks are routed in a way that minimizes skew and noise.

## B. Jitter Characteristics

The two important performance criteria for a CDR are jitter generation and jitter tolerance. Jitter generation is the jitter created by a CDR even if its input is jitter-free, and jitter tolerance specifies how much input jitter a CDR can tolerate without an increase in the bit error rate. Although the linear characteristics of linear PDs are desirable for low jitter generation and high jitter tolerance, the conversion of a time difference to a fine-resolution digital representation requires complicated circuits with a large area and high power consumption. Therefore, the proposed ADCDR uses a binary PD which only detects the polarity of the phase error. This form of PD has many advantages, such as simple implementation, inherently optimal sampling phase alignment, and adaptability to multiphase sampling structures [17]. However, the bang-bang characteristics of a binary PD are highly non-linear and present a serious design challenge in achieving an acceptable trade-off between jitter generation and tracking bandwidth [15]-[20].

The amount of jitter generated by a CDR with a BB PD is proportional to the proportional step and loop latency. As described in [18], when a CDR with a BB PD responds to a jitter-free data stream with a transition density of 100%, the approximate resulting jitter normalized to one UI can be expressed as follows:

$$\text{jitter}_{\text{peak-to-peak}} \approx 2\left(\frac{T_{\text{LATENCY}}}{T_0} + 1\right) \times \frac{\Delta T_{\text{PROP}}}{T_0} \quad (\text{in UIs}) \tag{1}$$

where $T_0$, $T_{\text{LATENCY}}$, and $\Delta T_{\text{PROP}}$ represent the bit period, the latency of the proportional path (i.e., the time it takes for the proportional step change to take effect at the output of the DCO upon the detection of the phase error), and the proportional step, which denotes the bit-period change per unit PD error, respectively. Equation (1) assumes that the integral step, which denotes the bit-period change per one LSB change in the integral word, is much smaller than the proportional step. In the locked state, this type of CDR exhibits deterministic jitter which is caused by the alternating patterns of UP and DN pulses. The jitter is inevitably larger than one proportional step, since the phase error is applied after a finite latency, typically of several hundreds of picoseconds, made up of the clock-to-Q delays of samplers, delays in combinational logic and propagating buffers, and delay elements.
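As a rough numerical illustration of (1), the following Python sketch evaluates the peak-to-peak bang-bang jitter at 3.0 Gb/s. The 600-ps latency and the function name are assumptions chosen only for illustration (the text states "several hundreds of picoseconds"); the two proportional steps are the range ends quoted in Section III for the DCO.

```python
def bang_bang_jitter_ui(t_latency_ps, t0_ps, dt_prop_ps):
    """Approximate peak-to-peak jitter in UI from (1), 100% transition density."""
    return 2.0 * (t_latency_ps / t0_ps + 1.0) * (dt_prop_ps / t0_ps)


T0_PS = 1e3 / 3.0     # bit period at 3.0 Gb/s, about 333.3 ps
LATENCY_PS = 600.0    # hypothetical proportional-path latency, not a measured value

for frac in (0.004, 0.011):  # proportional step as a fraction of one UI
    jitter = bang_bang_jitter_ui(LATENCY_PS, T0_PS, frac * T0_PS)
    print(f"step = {frac:.1%} UI -> jitter ~ {jitter:.4f} UI ({jitter * T0_PS:.1f} ps)")
```

Under these assumptions the jitter is a few times the proportional step, which is consistent with the observation that it is inevitably larger than one proportional step.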

Fig. 3. Jitter tolerance simulation of the ADCDR with a BB PD, at 3.0 Gb/s.

The scope for reducing the proportional step is limited. First, it must be much larger than the integral step to ensure stable operation. The quantizing effect of the DCO causes the phase error to drift each cycle, although the extent of this drift is less than half of the integral step. The proportional step must therefore be sufficiently large to reduce the large phase error which is accumulated while the input has no transition; this is because the PD is able to detect the phase error only when the input has a transition. The second difficulty with a small proportional step is that it reduces the tracking bandwidth, which degrades the jitter tolerance. We explore this effect by simulating a behavioral model of the proposed ADCDR architecture. A sinusoidal-jitter-injected 3.0-Gb/s $2^{10} - 1$ PRBS pattern is used as an input. The resolution of the DCO with the integral path, i.e., the integral step, is set to 1.0 ps; in other words, the period varies by 1.0 ps with one LSB change in the integral word. The integral coefficient is set to $2^7$; in other words, the integral word is changed after $2^7$ phase errors have been accumulated in the same direction. The simulation results shown in Fig. 3 demonstrate that the jitter tolerance corner frequency, or tracking bandwidth $f_1$, is proportional to the proportional step, as expressed by the following equation [19]:

$$f_1 = \alpha \frac{K_{\text{VCO}} I_P R_P}{2} = \alpha \frac{\Delta f_P}{2} = \alpha \frac{\Delta T_P f_0^2}{2} = \alpha \left(\frac{\Delta T_P}{T_0}\right) \frac{f_0}{2} \tag{2}$$

The variables are the design parameters of the ADCDR: $f_0$, $T_0$, $\alpha$, $\Delta T_P$, $\Delta T_I$, and $C_I$ are, respectively, the nominal bit rate, nominal period, transition density of the input, proportional step, integral step, and integral coefficient. The tracking bandwidth of a CDR with a proportional step of 1% of one UI is close to 7.5 MHz at 3.0 Gb/s, although it may vary slightly due to random jitter and inter-symbol interference (ISI). The occurrence of nonlinear slewing makes the maximum tolerable amplitude rise at a rate of 40 dB/dec below a frequency $f_2$, which can be expressed by the following equation [19]:

$$f_2 = \alpha \frac{0.315}{R_P C_P} = 0.315 \alpha \frac{\Delta T_I}{C_I \Delta T_P} f_0 \tag{3}$$

The simulation results show that $f_2$ is 1.1 MHz for a CDR with a proportional step of 1% of one UI. It can also be observed that too small a proportional step, such as 0.25% of one UI, violates the initial assumption that the corner frequency of nonlinear slewing lies below the tracking bandwidth, i.e., $f_1 > f_2$, and causes peaking at midrange jitter frequencies.
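The two corner frequencies can be checked numerically from (2) and (3). The Python sketch below uses the simulation settings given above (3.0 Gb/s, 1.0-ps integral step, integral coefficient $2^7$); the transition density of 0.5 is an assumption appropriate for a PRBS input, and the function names are illustrative only.

```python
def tracking_bandwidth_hz(alpha, dt_p, t0):
    """f1 from (2): alpha * (dT_P / T0) * f0 / 2, with f0 = 1 / T0."""
    return alpha * (dt_p / t0) / t0 / 2.0


def slewing_corner_hz(alpha, dt_i, c_i, dt_p, t0):
    """f2 from (3): 0.315 * alpha * dT_I / (C_I * dT_P) * f0."""
    return 0.315 * alpha * dt_i / (c_i * dt_p) / t0


T0 = 1.0 / 3.0e9   # bit period at 3.0 Gb/s
ALPHA = 0.5        # assumed transition density of a PRBS input
DT_I = 1.0e-12     # integral step, 1.0 ps
C_I = 2 ** 7       # integral coefficient

for frac in (0.01, 0.0025):  # proportional step as a fraction of one UI
    f1 = tracking_bandwidth_hz(ALPHA, frac * T0, T0)
    f2 = slewing_corner_hz(ALPHA, DT_I, C_I, frac * T0, T0)
    print(f"step {frac:.2%} UI: f1 = {f1 / 1e6:.2f} MHz, "
          f"f2 = {f2 / 1e6:.2f} MHz, f1 > f2: {f1 > f2}")
```

With these inputs the 1% step reproduces the 7.5-MHz and 1.1-MHz values quoted in the text, while the 0.25% step gives $f_1 < f_2$, the condition associated with peaking.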

All this highlights the importance of selecting an appropriate proportional step. Although a BB PD with a high oversampling ratio shows better jitter characteristics, since it provides the coarsely quantized magnitude of the phase error as well as its polarity and can therefore react to the input jitter adequately [20], the requirement for multiphase clocks increases the power consumption in proportion to the oversampling ratio. Thus, our ADCDR, with its simple binary PD, controls the proportional step adaptively so as to prevent slewing with minimal overhead in circuitry. When the input phase error is so large that slewing occurs, the proportional step is raised so that the phase error is tracked rapidly. When the input phase error is small and alternates frequently, the proportional step is reduced to counteract the bang-bang jitter.

Fig. 4. Bang-bang phase detector and deserializers.

# III. BUILDING BLOCKS

## A. Phase Detector and Deserializers

Fig. 4 shows the half-rate BB PD and the deserializers. The half-rate bang-bang phase errors are generated by XOR operations on the data sampled by the multiphase clocks. They are serialized, and the resulting full-rate error signals are forwarded directly to the DCO. Compared with the scheme of transferring half-rate error signals, this design reduces the number of varactor arrays in the DCO by half, which gives a higher bit rate due to the reduction in intrinsic delay. The phase errors are also demultiplexed and synchronized with the divide-by-5 clock of the recovered half-rate clock, and finally delivered to the DLF.
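The XOR-based decision can be illustrated with a generic Alexander-type rule on three consecutive samples. This Python sketch is only an assumption about the logic: the exact sample wiring of Fig. 4 and the mapping of the two flags onto the up and dnb signals are not given in the text, so the flags are named early and late here.

```python
def bang_bang_decision(d_prev, edge, d_next):
    """Return (early, late) flags from three consecutive binary samples.

    d_prev and d_next are data samples taken at adjacent bit centers and
    edge is the sample taken at the boundary between them (hypothetical
    naming, not taken from Fig. 4).
    """
    early = edge ^ d_next   # edge sample still equals the previous bit
    late = d_prev ^ edge    # edge sample already equals the next bit
    return early, late


# A 0 -> 1 transition sampled by a clock that is early, then late, then no transition
print(bang_bang_decision(0, 0, 1))
print(bang_bang_decision(0, 1, 1))
print(bang_bang_decision(1, 1, 1))
```

When the data has no transition both flags are zero, which corresponds to the neutral phase errors mentioned later for the APGC.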

## B. Digitally Controlled Oscillator

Fig. 5 shows the proposed DCO, which consists of a digitally-controlled resistor (DCR), delay elements with varactor loads, and level converters. A pseudo-differential inverter-based ring oscillator is used as the core oscillator. It has a wider tuning range and lower cost than *LC* oscillators [21]. Although a 2-stage ring oscillator is sufficient to generate 4-phase clocks, we used a 4-stage oscillator to leave room for future bit-rate expandability. The ring oscillator generates multiphase clocks whose frequency is controlled by the integral word, which alters the resistance of the DCR so as to vary the supply voltage of the oscillator. This supply-control scheme with the linear DCR achieves a wide tuning range and good linearity without the need for any precise analog circuits. The phase errors coming directly from the PD and the proportional gain from the APGC are fed to the varactors to change the load capacitance. These varactors consist of nMOS transistors with sources and drains tied together. Level converters restore the ground-referenced small-swing clocks of the supply-controlled ring oscillator to full-swing signals and forward them to other units. A bypass capacitor C<sub>BYPASS</sub> filters out high-frequency noise, which is incurred by the current spike of other digital blocks, the varying current drawn by the ring oscillator, and dithering by a DSM. A large capacitor lowers the cut-off frequency to reject the noise further, but occupies a large area. A 10-pF capacitor is implemented with an array of MOS transistors under this trade-off. Since the cut-off frequency tracks the operating frequency due to the variable resistance of the DCR, the cut-off frequency is maintained at lower than one-tenth of the operating frequency over all operating conditions.

Fig. 6(a) shows the linear DCR based on [13]. It is composed of a row decoder, a column decoder, 32 row cells, and a series variable resistor. The resistances of these row cells are changed by the control codes generated by the row and column decoders using glitch-reduction decoding. Fig. 6(b) shows the resistor network that is equivalent to the DCR when the 11-bit integral word is expressed as follows:

$$N \cdot 2^5 + M + F \cdot 2^{-1} \quad (M, N = 0 \sim 31,\ F = 0 \sim 1) \tag{4}$$

Here, N, M, and F are the inputs of the row decoder, column decoder, and series variable resistor, respectively. The DCR operates so that all the shunting pMOS transistors in the first N rows and M shunting pMOS transistors in the (N + 1)-th row are turned on.

Fig. 5. Digitally-controlled oscillator.

The effective resistance contributed by the first N rows and the gate-grounded pMOS transistor $M_S$ in the (N + 1)-th row is modeled as $R_{\text{TOP}}$. The effective resistance formed by a sequence of M parallel-shunted pMOS transistors in the next row cell is modeled as $R_{\text{VARI}}$. All the remaining row cells can be modeled by the series resistor $R_S$, which is formed by the gate-grounded pMOS transistor $M_S$ only, since all the shunting transistors in each of these row cells are turned off. Thus, the total equivalent resistance $R_{\text{EQUIV}}$ can be expressed as follows:

$$R_{\text{EQUIV}} = R_{\text{TOP}} \,//\, R_{\text{VARI}} + (31 - N)R_S + R_F \tag{5}$$

where  $R_{\text{EQUIV}}$ ,  $R_{\text{TOP}}$ ,  $R_{\text{VARI}}$ ,  $R_S$ , and  $R_F$  are indicated in Fig. 6. It is noteworthy that the value of  $R_{\text{TOP}}$  is equal to that of  $R_{S0}$ , which is the series resistor of the first row cell, regardless of N provided that the value of  $R_{S0}$  meets the following condition:

$$R_{S0} \,//\, R_{P0} \,//\, R_{P1} \cdots //\, R_{P31} + R_S = R_{S0} \tag{6}$$

Equation (5) expresses the monotonic characteristics of the DCR, and its overall linearity depends on the linearity of the resistance of $R_{\text{TOP}}$ and $R_{\text{VARI}}$ in parallel. The pMOS transistors $M_{\text{P0}}, M_{\text{P1}}, \dots, M_{\text{P31}}$ that perform the shunting are individually sized to achieve this linearity.

In the previous DCO design [13], which only has row cells, it is necessary to use more and wider pMOS transistors in order to achieve the same tuning range with finer resolution. Instead, a variable resistor $R_F$ implemented with pMOS transistors $M_{SE}$ and $M_F$ is now inserted in series. This arrangement has allowed us to double the resolution with minimal overhead.
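The field structure of (4) can be made concrete with a short Python sketch that splits an 11-bit integral word into the row, column, and series-resistor inputs. The bit ordering (N in the five MSBs, M in the next five bits, F in the LSB) is an assumption inferred from (4), and the function name is hypothetical.

```python
def split_integral_word(word):
    """Split an 11-bit integral word into (N, M, F) fields per (4).

    Assumes N occupies the 5 MSBs, M the next 5 bits, and F the LSB, so that
    word / 2 == N * 2**5 + M + F * 2**-1.
    """
    if not 0 <= word < 2 ** 11:
        raise ValueError("integral word must fit in 11 bits")
    n = word >> 6            # row decoder input, 0..31
    m = (word >> 1) & 0x1F   # column decoder input, 0..31
    f = word & 0x1           # series variable resistor input, 0..1
    return n, m, f


n, m, f = split_integral_word(0b10101_00011_1)
print(n, m, f)
```

In this reading, the single extra bit F accounts for the half-LSB term in (4), which is how the series resistor doubles the resolution of the row and column cells.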

Simulation results show that the DCR-based DCO has a tuning range of 0.5 GHz to 2.2 GHz thanks to the DCR ranging from 300  $\Omega$  to 3.4 k $\Omega$  in normal operations. The proportional step normalized to one UI ranges from 0.4% to 1.1% depending on the 4-step proportional gain at 3.0 Gb/s.
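The claim that the supply-filter cut-off frequency stays below one-tenth of the operating frequency can be sanity-checked with a first-order RC estimate. The Python sketch below assumes that the filter is formed only by the DCR resistance and the 10-pF bypass capacitor (the loading of the ring oscillator itself is ignored) and that the lowest resistance corresponds to the highest simulated frequency; both are simplifying assumptions, not statements from the design.

```python
import math

C_BYPASS = 10e-12  # bypass capacitance in farads


def rc_cutoff_hz(r_ohm, c_farad=C_BYPASS):
    """Cut-off frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


# (DCR resistance, simulated oscillation frequency) at the assumed range ends
corners = [(300.0, 2.2e9), (3.4e3, 0.5e9)]
for r, f_osc in corners:
    fc = rc_cutoff_hz(r)
    print(f"R = {r:.0f} ohm: f_c = {fc / 1e6:.1f} MHz, "
          f"f_osc / 10 = {f_osc / 1e7:.0f} MHz")
```

Because the resistance rises as the oscillation frequency falls, the estimated cut-off moves in the same direction as the operating frequency, which is the tracking behavior described for the bypass capacitor.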

## C. Digital Loop Filter With Adaptive Proportional Gain Controller

The DCO control of our design is separated into two paths: the proportional path, which demands low latency, and the integral path, which needs accurate processing. The DLF takes care of the integral path. An 18-bit accumulator operating with a recovered and divided-by-10 clock accumulates the deserialized phase errors, des\_up $\langle 0: 9 \rangle$ and des\_dnb $\langle 0: 9 \rangle$, received from the PD. The output of the accumulator is dithered by a first-order DSM, and the 11 MSBs of the result then become the integral word delivered to the DCO. Dithering effectively reduces the integral step. Since this block is entirely implemented with digital circuits, it has no leakage and easily achieves a low integral gain with a small area, in contrast to the analog *RC* filters used in CPPLLs.
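A behavioral sketch of the integral path is given below in Python. It models the 18-bit accumulator and a first-order error-feedback DSM whose 11 MSBs form the integral word. Several details are assumptions made for illustration: the down errors are passed as active-high flags, an UP error increments the accumulator, the accumulator saturates rather than wraps, and the class and method names are hypothetical.

```python
class DigitalLoopFilter:
    ACC_BITS = 18
    WORD_BITS = 11
    FRAC_BITS = ACC_BITS - WORD_BITS  # 7 bits below the integral word

    def __init__(self, initial_word=1024):
        self.acc = initial_word << self.FRAC_BITS
        self.residue = 0  # quantization error fed back by the DSM

    def update(self, des_up, des_dn):
        """Consume 10 deserialized errors and return the dithered 11-bit word."""
        net = sum(des_up) - sum(des_dn)
        acc_max = (1 << self.ACC_BITS) - 1
        self.acc = min(max(self.acc + net, 0), acc_max)  # saturation is assumed
        total = self.acc + self.residue                  # first-order error feedback
        self.residue = total & ((1 << self.FRAC_BITS) - 1)
        return min(total >> self.FRAC_BITS, (1 << self.WORD_BITS) - 1)


dlf = DigitalLoopFilter(initial_word=1000)
dlf.acc += 32  # place the accumulator a quarter LSB above word 1000
words = [dlf.update([0] * 10, [0] * 10) for _ in range(128)]
print(sum(words) / len(words))
```

In this model the output word toggles between two adjacent codes so that its average matches the fractional accumulator value, which is the sense in which dithering reduces the effective integral step. The seven bits below the integral word also agree with the integral coefficient of $2^7$ used in the behavioral simulation.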

The APGC adjusts the proportional gain based on the distribution of UP's and DN's within the 10 deserialized phase errors. The proportional gain is set by the two MSBs of a 9-bit index, which is adjusted by the following rules. When the deserialized 10 phase errors are consistent, that is, either all UP's or all DN's excluding the neutral ones, the index is increased by twice the number of UP's or DN's. For example, if the 10 phase errors have 6 UP's and 0 DN's, the index is increased by 12. On the other hand, when the 10 phase errors have a mixture of UP's and DN's, the index is decreased by 1. In this way, when the phase error is so large that it generates consecutive UP's or DN's, the algorithm increases the proportional gain so that the CDR can track the jitter faster. Conversely, when the CDR comes close to the lock condition and the phase error alternates between UP and DN, the algorithm reduces the proportional gain so that the bang-bang jitter is minimized.
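The index update rules above can be sketched in a few lines of Python. The saturation of the index at 0 and 511 and the function names are assumptions; the text does not state how overflow is handled.

```python
INDEX_BITS = 9
INDEX_MAX = (1 << INDEX_BITS) - 1


def update_index(index, n_up, n_dn):
    """Return the next 9-bit APGC index for one group of 10 phase errors."""
    if n_up > 0 and n_dn > 0:
        index -= 1                        # mixed UP's and DN's: near lock
    else:
        index += 2 * (n_up + n_dn)        # consistent polarity: slewing
    return min(max(index, 0), INDEX_MAX)  # saturation is an assumption


def proportional_gain(index):
    """Map the two MSBs of the index to one of four gain settings (0..3)."""
    return index >> (INDEX_BITS - 2)


index = 100
index = update_index(index, n_up=6, n_dn=0)   # the example from the text: +12
print(index, proportional_gain(index))
```

The asymmetry between the fast increase and the slow decrease means that the gain rises quickly when slewing is detected and decays gradually once UP's and DN's start to alternate.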



Fig. 6. Digitally-controlled resistor: (a) schematic diagram and (b) equivalent resistor model.

The described APGC updates the proportional gain at an average rate of one-hundredth of the operating frequency, taking account of the fact that the 9-bit index is updated for every 10 bits and the proportional gain corresponds to its two MSBs. This rather slow update rate minimizes the possible interference with random jitter, which was seen in prior APGC methods [15], [16]. Note that the proposed APGC is not intended to track high-frequency jitter, but rather low-frequency jitter, in order to meet the low-frequency jitter-tolerance specification.

## D. Frequency Acquisition

Fig. 7. Frequency acquisition block.

Fig. 8. Layout of the ADCDR.

Frequency acquisition is performed by an FSM with a digital frequency comparator operating with a reference clock, as shown in Fig. 7. When an external signal indicates that the frequency is not locked, a counter-based logic compares the frequencies of the DCO output and the reference clock. The comparison is made by counting the number of rising reference clock edges during N DCO output clocks and subtracting this value from N. The result is approximately equal to the product of the frequency error and the reference number N, as follows:

$$N\left(\frac{f_{\text{REF}}}{f_{\text{DCO}}} - 1\right) + a \quad (-2 \le a \le 2) \tag{7}$$

where a expresses the uncertainty caused by the asynchronous nature of the clock domains. N is set to 512 in our design.

The integral word is changed by an amount proportional to the measured frequency error and then delivered to the DCO to change the frequency. Then, the FSM measures the frequency error after N DCO clock periods, accounting for the response delay of the oscillator supply voltage to the resistance change. Once the measured frequency error has declined below a predetermined value, the integral word is then controlled by the DLF of the main phase-locking loop. This proportional frequency-locking procedure guarantees that locking is achieved within 30 comparisons.
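The proportional search can be sketched behaviorally in Python. Everything except N = 512, the form of (7), and the 30-comparison bound is an assumption made for illustration: the DCO is modeled with a period that falls by exactly 1 ps per LSB from 2500 ps, the reference clock is taken to run at the target DCO frequency, the uncertainty term a is set to zero, and the gain and threshold values are hypothetical.

```python
N_REF = 512  # number of DCO clock periods per comparison


def ideal_error_count(f_ref, f_dco, n=N_REF):
    """Ideal value of (7) with the uncertainty term a set to zero."""
    return n * (f_ref / f_dco - 1.0)


def dco_frequency_hz(word):
    """Hypothetical DCO model: the period falls by 1 ps per LSB from 2500 ps."""
    return 1.0e12 / (2500.0 - word)


def acquire(f_ref, word=1024, gain=1.0, threshold=3.0, max_steps=30):
    """Proportional frequency search; returns (word, number of comparisons)."""
    for step in range(1, max_steps + 1):
        err = ideal_error_count(f_ref, dco_frequency_hz(word))
        if abs(err) < threshold:
            return word, step
        # Step the word in proportion to the measured error, within 11 bits
        word = min(max(word + round(gain * err), 0), 2047)
    return word, max_steps


word, steps = acquire(f_ref=1.5e9)
print(word, steps, dco_frequency_hz(word) / 1e9)
```

Since the uncertainty in (7) is bounded by 2 counts out of N = 512, a single comparison cannot resolve frequency errors much below about 0.4%, which is why the sketch stops at a small nonzero threshold and leaves the remaining error to the phase-locking loop.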

# IV. MEASUREMENT RESULTS

A prototype chip was fabricated in a $0.13$-$\mu$m 1P8M CMOS process with a 1.2-V supply and mounted in a 128-pin thin-quad-flat package (TQFP). The chip contains four ADCDRs which convert four incoming high-speed differential data streams into 40 low-speed data outputs, a first-in-first-out (FIFO) buffer which synchronizes the data outputs using one of four recovered and divided-by-10 clocks, and an I<sup>2</sup>C interface which configures the operating mode and several programmable parameters. The ADCDR occupies an area of 210 $\mu$m $\times$ 350 $\mu$m, as shown in Fig. 8.

Fig. 9 shows the period characteristics of the DCO. These were measured in a test mode in which the feedback loop is broken and the 10 MSBs of the 11-bit integral word are directly configured and swept by an external control through the I<sup>2</sup>C interface. The DCO has a tuning range of 0.4 GHz to 2.1 GHz with a 1.0-ps resolution for a 1.2-V supply. Since the series resistance $R_{S0}$ of the first row cell in the DCR is not larger than the series resistance $R_S$ of the other row cells in the prototype chip, the differential nonlinearity (DNL) is worse for lower values of the integral word than for higher values. However, $R_{\text{TOP}}$ converges to $R_{S0}$ and the DNL approaches zero as the integral word reaches several tens. The abrupt steps in the DNL curve are repeated with a period of 32, due to the nonlinearity of $R_{\text{VARI}}$. The main source of jitter is supply noise at the oscillator, because the other building blocks are composed of digital circuits. As shown in Fig. 10, the supply sensitivity of the DCO, defined as the ratio of the percentage change in the oscillation frequency to the percentage change in the supply voltage, ranges from 1.1 to 1.6 with static (DC) supply variation. Since the ring-oscillator-based DCO has poor immunity from supply noise, the prototype chip is assigned a separate pad to provide a clean supply for the DCO.
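For readers who wish to reproduce curves like those in Fig. 9(b) and (c) from a list of measured periods, one common way to compute DNL and INL is sketched below in Python. The method (average step as one LSB, INL as the running sum of DNL) is a generic convention assumed here; the text does not state which definition was used for the measurement, and the sample data are invented.

```python
def dnl_inl(periods_ps, lsb_ps=None):
    """Return (dnl, inl) in LSB for a monotonic list of measured periods.

    If lsb_ps is None, the average step (endpoint fit) is used as one LSB.
    """
    steps = [abs(b - a) for a, b in zip(periods_ps, periods_ps[1:])]
    if lsb_ps is None:
        lsb_ps = sum(steps) / len(steps)
    dnl = [s / lsb_ps - 1.0 for s in steps]
    inl, running = [], 0.0
    for d in dnl:
        running += d          # INL is the accumulated DNL
        inl.append(running)
    return dnl, inl


# Invented example data: periods in ps for four consecutive codes
dnl, inl = dnl_inl([10.0, 9.0, 7.0, 6.0], lsb_ps=1.0)
print(dnl, inl)
```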

The ADCDR operates between 1.0 Gb/s and 4.0 Gb/s with a bit error rate of less than $10^{-14}$. We observed only one situation in which locking fails, across all the proportional steps and operation settings of the DSM that we investigated. With the minimum proportional step and a disabled DSM, the ADCDR fails to lock with patterns with a run length of more than 10, which can be found in, for example, a $2^{11} - 1$ PRBS. In this situation, the ratio of the proportional to the integral step is relatively small but the run length of the input is large. Fig. 11 shows the measured jitter for various proportional steps as a function of the operating frequency when the input was a $2^{31} - 1$ PRBS. It shows that the jitter scales linearly with the proportional step, as expected, and the ADCDR with an active APGC and the minimum proportional step generates minimal jitter. Fig. 12 is a histogram of the measured jitter in the recovered and divided-by-10 clock in response to a 3.0-Gb/s $2^{31} - 1$ PRBS. The measured root-mean-square jitter is 3.59 $\mathrm{ps}_{\mathrm{rms}}$ and the peak-to-peak jitter is 29.4 $\mathrm{ps}_{\mathrm{pp}}$. This measurement includes the sampling and triggering jitter in the oscilloscope and signal generator, which are up to 1.2 $\mathrm{ps}_{\mathrm{rms}}$ and 9.0 $\mathrm{ps}_{\mathrm{pp}}$, respectively.
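The remark about run lengths can be checked with a small Python sketch that generates PRBS patterns from a linear-feedback shift register and measures the longest run of identical bits. The feedback taps used here (stages 11 and 9, and stages 10 and 7) are the commonly used polynomials for these pattern lengths and are an assumption, since the text does not specify which polynomials the test equipment used.

```python
def prbs_bits(order, tap, count):
    """Generate bits from a Fibonacci LFSR with feedback from stages order and tap."""
    mask = (1 << order) - 1
    state = mask                      # any nonzero seed works
    out = []
    for _ in range(count):
        new = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
        out.append(new)
        state = ((state << 1) | new) & mask
    return out


def longest_run(bits):
    """Length of the longest run of identical consecutive bits."""
    best = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


# Two periods, so that a run straddling the wrap-around is not cut short
print(longest_run(prbs_bits(11, 9, 2 * (2 ** 11 - 1))))
print(longest_run(prbs_bits(10, 7, 2 * (2 ** 10 - 1))))
```

The longer pattern contains a run that exceeds 10 bits while the shorter one does not, which matches the distinction drawn between the $2^{11} - 1$ PRBS and the $2^{10} - 1$ PRBS used elsewhere in the measurements.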

Jitter tolerance tests were carried out with an Agilent J-BERT N4903A serial bit-error-ratio tester which contains a pattern generation module and an error detection module. The ADCDR in the prototype chip recovered the clock and data from a sinusoidal-jitter-injected $2^{10} - 1$ PRBS pattern, which is provided by the pattern generation module. The retimed half-rate data in the PD are combined by a 2-to-1 multiplexer, and then the resulting full-rate data stream is driven to the error detection module. Note that this path is only created for the jitter tolerance test and is not shown in Fig. 2. The error detection module recovered the clock and data from the serialized data stream only and determined the error. When the pattern generation module was connected to the prototype chip with short cables, the jitter tolerance results did not vary with changes in the proportional step setting. There was no error even when the sinusoidal jitter was given at the maximum amplitude as shown by the dashed line in Fig. 13. The high-frequency jitter tolerance was slightly more than 0.5 UI in this condition.

Fig. 9. Measured results of the DCO: (a) period, (b) differential non-linearity, and (c) integral non-linearity ((b) and (c) with a 1.2-V supply).

TABLE I CHIP SUMMARY AND COMPARISON

|                 | [10]                                                                            | [12]                                                                             | This work                                                                        |
|-----------------|---------------------------------------------------------------------------------|----------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| Technology      | 0.13-µm CMOS                                                                    | 0.13-µm CMOS                                                                     | 0.13-µm CMOS                                                                     |
| Operating range | 0.8 to 1.8 Gb/s                                                                 | 0.6 to 2.87 Gb/s                                                                 | 1.0 to 4.0 Gb/s                                                                  |
| Area            | 0.10 mm<sup>2</sup>                                                             | 0.13 mm<sup>2</sup>                                                              | 0.074 mm<sup>2</sup>                                                             |
| Power @ 1.2 V   | 12.0 mW @ 1.65 Gb/s                                                             | 13.2 mW @ 2.5 Gb/s                                                               | 11.4 mW @ 3.0 Gb/s<br>PD & DES: 5.4 mW<br>DCO: 2.9 mW<br>Synthesized logic: 3.1 mW |
| Jitter          | 8.9 ps<sub>rms</sub>, 68.8 ps<sub>pp</sub><br>@ 1.6 Gb/s 2<sup>7</sup>-1 PRBS   | 7.2 ps<sub>rms</sub>, 47.2 ps<sub>pp</sub><br>@ 2.5 Gb/s 2<sup>31</sup>-1 PRBS   | 3.6 ps<sub>rms</sub>, 29.4 ps<sub>pp</sub><br>@ 3.0 Gb/s 2<sup>31</sup>-1 PRBS   |

Fig. 10. Measured supply sensitivity of the DCO @ 1.2 V.

In order to test the effectiveness of the APGC, a 20-inch Nelco 4006-2 trace was inserted between the pattern generation module and the prototype chip, which induced significant ISI and thereby reduced the high-frequency jitter tolerance. Fig. 13 shows measurements taken under that condition. With the large ISI, the high-frequency jitter tolerance was reduced to about 0.23 UI. It was observed that a higher proportional step increases the tracking bandwidth, and the highest jitter tolerance was achieved with the maximum proportional step, at 10 MHz. When the APGC was activated, the jitter tolerance was the same as it was with the maximum proportional step.

Fig. 11. (a) RMS and (b) peak-to-peak jitter in response to $2^{31} - 1$ PRBS, for different operating frequencies and proportional steps.

Fig. 12. Recovered clock jitter histogram for 3.0-Gb/s $2^{31} - 1$ PRBS.

Fig. 14 shows the measured power consumption. With a nominal 1.2-V supply, the prototype chip operates at a maximum bit rate of 4.0 Gb/s, at which it consumes 14.9 mW. When the supply voltage is lowered to 0.9 V, the maximum bit rate is reduced to 2.5 Gb/s and the power consumption to 4.9 mW. Table I summarizes the performance of this work and compares it with other published ADCDRs which contain DCOs.

# V. CONCLUSION

We have presented a fully-integrated all-digital clock and data recovery circuit, which was specifically designed to address the jitter problems seen in previous ADCDRs. To overcome these problems, our design incorporates a DCR-based DCO, which combines a wide tuning range with high resolution, and an APGC which counteracts jitter resulting from bang-bang control. The prototype chip was fabricated in a 0.13-$\mu$m CMOS technology and gives good jitter performance across its operating range of 1.0 Gb/s to 4.0 Gb/s. Since all the functions of this CDR, including phase comparison, phase-error accumulation, adaptive proportional gain control, and frequency acquisition, are performed in the digital domain and implemented using standard transistors alone without any passive devices, our CDR occupies a small area and is able to benefit from aggressive device scaling.

Fig. 13. Measured jitter tolerance results with a 20-inch trace at 3.0-Gb/s $2^{10} - 1$ PRBS.

Fig. 14. Measured power consumption.

#### ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for providing constructive comments to improve the quality of this paper. Also, the authors wish to acknowledge the helpful discussions with Prof. Jaeha Kim of Seoul National University. The support of IDEC (IC Design Education Center) by the provision of CAD tools, and the support of ISRC (Inter-University Semiconductor Research Center) by the provision of measurement equipment are gratefully acknowledged.

## References

- [1] M.-J. E. Lee, W. J. Dally, and P. Chiang, "Low-power area-efficient high-speed I/O circuit techniques," *IEEE J. Solid-State Circuits*, vol. 35, no. 11, pp. 1591–1599, Nov. 2000.
- [2] J. Lee and B. Razavi, "A 40-Gb/s clock and data recovery circuit in 0.18-μm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2181–2190, Dec. 2003.
- [3] K.-L. J. Wong, H. Hatamkhani, M. Mansuri, and C.-K. K. Yang, "A 27-mW 3.6 Gb/s I/O transceiver," *IEEE J. Solid-State Circuits*, vol. 39, no. 4, pp. 602–612, Apr. 2004.
- [4] J. Poulton, R. Palmer, A. M. Fuller, T. Greer, J. Eyles, W. J. Dally, and M. Horowitz, "A 14-mW 6.25-Gb/s transceiver in 90-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 42, no. 12, pp. 2745–2757, Dec. 2007.
- [5] C.-F. Liao and S.-I. Liu, "A 40 Gb/s CMOS serial-link receiver with adaptive equalization and clock/data recovery," *IEEE J. Solid-State Circuits*, vol. 43, no. 11, pp. 2492–2502, Nov. 2008.
- [6] J.-H. Bae, S.-H. Park, J.-Y. Sim, and H.-J. Park, "A low-voltage high-speed CMOS inverter-based digital differential transmitter with impedance matching control and mismatch calibration," *IEEK J. Semiconductor Technology and Science*, vol. 9, no. 1, pp. 14–21, Mar. 2009.
- [7] Y. Frans, N. Nguyen, B. Daly, Y. Wang, D. Kim, T. Bystrom, D. Olarte, and K. Donnelly, "A 1-4 Gbps quad transceiver cell using PLL with gate-current leakage compensator in 90 nm CMOS," in *Symp. VLSI Circuits Dig.*, Jun. 2004, pp. 134–137.
- [8] P. K. Hanumolu, G.-Y. Wei, U.-K. Moon, and K. Mayaram, "Digitally-enhanced phase-locking circuits," in *Proc. IEEE CICC*, Sep. 2007, pp. 361–368.
- [9] J. Lin, B. Haroun, T. Foo, J.-S. Wang, B. Helmick, S. Randall, T. Mayhugh, C. Barr, and J. Kirkpatrick, "A PVT tolerant 0.18 MHz to 600 MHz self-calibrated digital PLL in 90 nm CMOS process," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2004, pp. 488–489.
- [10] R. B. Staszewski *et al.*, "All-digital TX frequency synthesizer and discrete-time receiver for Bluetooth radio in 130-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 39, no. 12, pp. 2278–2291, Dec. 2004.
- [11] P. K. Hanumolu, M. G. Kim, G.-Y. Wei, and U.-K. Moon, "A 1.6 Gbps digital clock and data recovery circuit," in *Proc. IEEE CICC*, Sep. 2006, pp. 603–606.
- [12] J. L. Sonntag and J. Stonick, "A digital clock and data recovery architecture for multi-gigabit/s binary links," *IEEE J. Solid-State Circuits*, vol. 41, no. 8, pp. 1867–1875, Aug. 2006.
- [13] D.-H. Oh, D.-S. Kim, S. Kim, D.-K. Jeong, and W. Kim, "A 2.8 Gb/s all-digital CDR with a 10 b monotonic DCO," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2007, pp. 222–223.
- [14] J. A. Tierno, A. V. Rylyakov, and D. J. Friedman, "A wide power supply range, wide tuning range, all static CMOS all digital PLL in 65 nm SOI," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 42–51, Jan. 2008.
- [15] H. Lee, A. Bansal, Y. Frans, J. Zerbe, S. Sidiropoulos, and M. Horowitz, "Improving CDR performance via estimation," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2006, pp. 332–333.
- [16] V. Kratyuk, P. K. Hanumolu, K. Mayaram, and U.-K. Moon, "A 0.6 GHz to 2 GHz digital PLL with wide tracking range," in *Proc. IEEE CICC*, Sep. 2007, pp. 305–308.
- [17] R. C. Walker, "Designing bang-bang PLLs for clock and data recovery in serial data transmission systems," in *Phase-Locking in High Performance Systems*, B. Razavi, Ed. New York: IEEE Press, 2003.

- [18] N. D. Dalt, "A design-oriented study of the nonlinear dynamics of digital bang-bang PLL," *IEEE Trans. Circuits Syst. I*, vol. 52, no. 1, pp. 21–31, Jan. 2005.
- [19] J. Lee, K. Kundert, and B. Razavi, "Analysis and modeling of bang-bang clock and data recovery circuits," *IEEE J. Solid-State Circuits*, vol. 39, no. 9, pp. 1571–1580, Sep. 2004.
- [20] M. Brownlee, P. K. Hanumolu, and U.-K. Moon, "A 3.2 Gb/s oversampling CDR with improved jitter tolerance," in *Proc. IEEE CICC*, Sep. 2007, pp. 353–356.
- [21] B. Razavi, "Devices and circuits for phase-locked systems," in *Phase-Locking in High Performance Systems*, B. Razavi, Ed. New York: IEEE Press, 2003.



**Heesoo Song** received the B.S. and M.S. degrees in electrical engineering and computer science from Seoul National University, Seoul, Korea, in 2002 and 2004, respectively, where he is currently working toward the Ph.D. degree.

His research interests include high-speed I/O circuits, PLL/DLL, and low power CMOS circuit design.



**Deok-Soo Kim** received the B.S. and M.S. degrees in electrical engineering and computer science from Seoul National University, Seoul, Korea, in 2003 and 2005, respectively, where he is currently working toward the Ph.D. degree.

His research interests include high-speed I/O circuits, clock multiplication and frequency-synthesis techniques, and low-power mixed-signal circuit design.



**Do-Hwan Oh** received the B.S., M.S., and Ph.D. degrees in electrical engineering and computer science from Seoul National University, Seoul, Korea, in 2001, 2003, and 2009, respectively. His doctoral dissertation focused on designing clock-and-data recovery circuits and frequency synthesizers by exploiting all-digital phase-locked loops.

Since 2001, he has been with MELFAS, Seoul, Korea, developing capacitive sensors for touch screens. His current research interests include noise immunity improvement of capacitive sensors and low-power system-on-chip designs.



**Suhwan Kim** received the B.S. and M.S. degrees in electrical engineering and computer science from Korea University, Seoul, Korea, in 1990 and 1992, respectively, and the Ph.D. degree in electrical engineering and computer science from the University of Michigan, Ann Arbor, MI, in 2001.

From 1993 to 1999, he was with LG Electronics, Seoul, Korea. From 2001 to 2004, he was a Research Staff Member with IBM T. J. Watson Research Center, Yorktown Heights, NY. In 2004, he joined Seoul National University, Seoul, Korea, where he is currently an Associate Professor of electrical engineering. His research interests encompass high-performance and low-power analog and mixed-signal integrated circuits and technology, digitally compensated analog circuits, and high-speed I/O circuits.



**Deog-Kyoon Jeong** received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1981 and 1984, respectively, and the Ph.D. degree in electrical engineering and computer sciences from the University of California, Berkeley, in 1989.

From 1989 to 1991, he was with Texas Instruments, Dallas, TX, as a Member of the Technical Staff and worked on the modeling and design of BiCMOS gates and the single-chip implementation of the SPARC architecture. He joined the faculty of the Department of Electronics Engineering and Inter-University Semiconductor Research Center, Seoul National University, where he is currently a Professor. He is one of the co-founders of Silicon Image, which specializes in digital interface circuits for video displays such as DVI and HDMI. His main research interests include the design of high-speed I/O circuits, phase-locked loops, and network switch architectures. He has published more than 80 technical papers and holds 52 U.S. patents.

Dr. Jeong was one of the recipients of the ISSCC Takuo Sugano Award in 2005 for Outstanding Far-East Paper.