# A 5-Gb/s Digitally Controlled 3-Tap DFE Receiver for Serial Communications

Jae-Duk Han<sup>1</sup>, Woo-Yeol Shin<sup>2</sup>, Woo-Seok Choi<sup>2</sup>, Jung-Hoon Chun<sup>3</sup>, Suhwan Kim<sup>2</sup>, and Deog-Kyoon Jeong<sup>2</sup>

<sup>1</sup>TLI Inc. Seongnam, Korea Email: jdhan@tli.co.kr <sup>2</sup>School of Electrical Engineering and Computer Science and Inter-University Semiconductor Research Center Seoul National University Seoul, Korea

<sup>3</sup>Department of Semiconductor Systems Engineering Sungkyunkwan University Suwon, Korea

*Abstract*— Decision feedback equalizers (DFEs) play a critical role in high-speed communications through band-limited channels. We implemented a 3-tap DFE receiver for a 5-Gb/s data rate. To realize multi-tap DFE operation, a digital-control scheme is proposed that does not use analog circuits for biasing, such as DACs. In addition to conventional loop unrolling, several techniques, including combined feedback, are used to reduce the latency of the feedback path. Fabricated in a 0.13-µm CMOS process, the prototype of the proposed DFE core has an area of 0.009 mm<sup>2</sup> and consumes 8.4 mW from a 1.2-V supply, achieving a BER of less than 10<sup>-11</sup> over a pair of 28-inch Nelco 4000-6 board traces.

## I. INTRODUCTION

As the clock speeds of silicon processing cores continue to increase, the data rates of modern chip-to-chip communication systems exceed the available channel bandwidth. In such cases, the transceivers of these systems become vulnerable to distortions arising from undesirable channel characteristics such as loss, crosstalk, and reflection. This kind of distortion is called inter-symbol interference (ISI) because a symbol is shaped by its adjacent symbols. ISI has been one of the most speed-limiting factors in high-speed transceivers [1-9].

Many equalization techniques, such as continuous-time equalization [2][3][8], feed-forward equalization [1][5], and decision feedback equalization [1][4-7], have been proposed to overcome the bandwidth limitation. Among them, decision feedback equalization (DFE), thanks to its nonlinear characteristics and noise immunity, is known as the most powerful way to eliminate ISI.

High-speed decision feedback equalizers should resolve two issues: feedback path delay and power consumption. First, the maximum speed of a DFE is constrained by the loop delay of its feedback path. Loop unrolling has been used in previous works to mitigate the problem [1][4][6-7]. Second, because a DFE usually involves many summers, latches, and muxes combined with DACs to build an equalization function, it occupies a large area and exhibits high power consumption [1]. As a solution to these problems, we propose a digitally controlled DFE (DCDFE) that uses digitally controlled samplers (DCSs) on top of loop unrolling. The overall architecture and data paths of the DCDFE are optimized to enhance the speed of the DFE while lowering power consumption. The DCS eliminates DACs and current-mode logic circuits, saving power and area. In addition, the DCS saves area and timing resources through combined feedback. We implemented a 5-Gb/s DCDFE in a 130-nm CMOS process and demonstrate that the proposed DCDFE receiver exhibits low power consumption and occupies a small area.

Fig. 1. Block diagram of the proposed DCDFE receiver

## II. DESIGN CONSIDERATIONS

Fig. 1 shows the block diagram of the proposed receiver, which contains a DCDFE core, an on-die termination (ODT) to control termination impedance, a loop-back path and clock buffers for testing. A clock recovery operation can be realized by placing additional samplers and CDR units.

Generally, there are two ways of implementing a high-speed DFE. The first approach uses an analog current summer with multiple inputs [1]; Fig. 2(a) shows an example. One input pair of the summer is connected to the input, and the other pairs are connected to the outputs of the cascaded delay elements. ISI in the received data is eliminated by pulling currents through the coupled input pairs, and the magnitude of each current source is set by its own DAC. Because the structure is simple, design complexity is low and multi-tap implementation incurs little overhead: another filter tap can be added simply by attaching another differential pair to the output. The main drawback of this current-summing approach is its high power consumption, because the analog summer must dissipate current continuously during the entire period of operation. Moreover, RC loading at the output nodes heavily restricts the attainable data rate of the DFE. In [4], a current-integrating scheme is proposed to overcome the speed barrier. However, designing a high-speed integrator is difficult because the integrated value is susceptible to various factors such as PVT variations, data rate, and device mismatches.
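As a rough behavioral illustration of the feedback subtraction described above (not the circuit of Fig. 2(a)), the following Python sketch slices each received sample after removing the ISI estimated from past decisions. The function names, the tap values, and the toy channel are assumptions introduced only for this example.

```python
from typing import List, Sequence


def toy_channel(bits: Sequence[int], cursor: float, post: Sequence[float]) -> List[float]:
    """Hypothetical channel: main cursor plus post-cursor ISI, no noise."""
    out = []
    for n, b in enumerate(bits):
        y = cursor * b
        for k, h in enumerate(post, start=1):
            if n - k >= 0:
                y += h * bits[n - k]
        out.append(y)
    return out


def direct_feedback_dfe(samples: Sequence[float], taps: Sequence[float]) -> List[int]:
    """Slice each sample after subtracting weighted past decisions (+1/-1)."""
    history = [0] * len(taps)  # 0 means "no earlier decision yet" (assumption)
    decisions = []
    for x in samples:
        # ISI estimate: tap k weights the decision made k symbols ago.
        isi = sum(w * d for w, d in zip(taps, history))
        d = 1 if x - isi >= 0 else -1
        decisions.append(d)
        history = [d] + history[:-1]  # shift the new decision into the delay line
    return decisions
```

With taps that match the post-cursor values of the toy channel, the sliced output reproduces the transmitted bits even when slicing the raw samples at zero does not.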

Another favored method for high-speed DFE implementation uses samplers with a variable threshold, whose conceptual diagram is illustrated in Fig. 2(b) [6-7]. In [6], a strong-arm latch with an offset calibration function forms a threshold-adjustable sampler by intentionally adding an offset to the received signal. The interference can be removed if the amount of the given offset matches that of the ISI. DACs are usually used to generate the offset voltage. The offset-adding function requires only slicers in the front end, not current summers, so it can be realized with a small power budget. However, it still requires analog biasing circuits and DACs that dissipate static current. Furthermore, multi-tap implementation is possible only by partially adopting the former approach, using current summers for all filter taps except the first. This is because the decision thresholds of the samplers cannot be changed instantly, which limits the coverage of a DFE built from offset-added samplers to the first tap.
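The speculative first-tap decision used in loop unrolling [6] can be sketched behaviorally as follows. Two slicers compare the same sample against thresholds of +w1 and -w1, and the previous decision selects which result is kept. The names `loop_unrolled_tap1` and `w1` and the assumed initial decision are illustrative only.

```python
from typing import List, Sequence


def loop_unrolled_tap1(samples: Sequence[float], w1: float, initial: int = 1) -> List[int]:
    """First-tap DFE with two fixed-threshold slicers and a selecting mux."""
    prev = initial  # assumed value of the decision before the first sample
    out = []
    for x in samples:
        d_if_prev_hi = 1 if x >= +w1 else -1  # valid if the previous bit was +1
        d_if_prev_lo = 1 if x >= -w1 else -1  # valid if the previous bit was -1
        prev = d_if_prev_hi if prev == 1 else d_if_prev_lo  # mux select
        out.append(prev)
    return out


# Samples of bits [1, -1, -1, 1, 1, -1, 1] with cursor 0.3 and first
# post-cursor 0.4, assuming the bit before the sequence was +1.
samples = [0.7, 0.1, -0.7, -0.1, 0.7, 0.1, -0.1]
decisions = loop_unrolled_tap1(samples, w1=0.4)
```

Because both comparisons are made before the previous decision is known, only the mux remains in the feedback loop.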

We realized a high-speed DFE simply by adding switches to the output of the samplers. Fig. 2(c) depicts the circuit diagram of the sampler, which we call a digitally controlled sampler (DCS). Its main difference from traditional offset-adjusting DFEs is that the decision thresholds ($V_{TH}$) of the DCSs are changed by turning switches on and off, not by moving analog voltage or current levels. This approach removes the DACs and makes it possible to implement a multi-tap DFE with simpler circuits, because switching can be carried out faster than analog operations, as illustrated in Fig. 3. The switches may increase the capacitance of the sampler output nodes, but the increment can be minimized by sharing a switch between taps. If the offset code vs. threshold (OFC-V<sub>TH</sub>) curve is linear and the sum of the tap weights is smaller than the adjustable range of the thresholds, taps can share a switch. The proposed DFE has a 3-tap feedback filter, which we determined to be sufficient for the channel we consider. We decided to share one switch array between the second and third taps (T2 and T3) because the measured OFC-V<sub>TH</sub> curve exhibits almost linear behavior, as shown in Fig. 4, and the second and third tap coefficients are usually much smaller than the first tap coefficient.

Fig. 2. Examples of DFE implementation and the proposed DCDFE (a) by using current summers (b) by using samplers with adjustable offset (c) DCDFE
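The two sharing conditions stated above (a linear OFC-V<sub>TH</sub> curve and a tap sum within the adjustable range) can be expressed as a small check. This is a sketch under assumptions: the curve is given as a list of threshold shifts indexed by a non-negative offset code, and the curve values, tolerance, and function name are hypothetical rather than taken from Fig. 4.

```python
from typing import Sequence


def can_share_switch_array(curve_mv: Sequence[float], c2: int, c3: int, tol_mv: float) -> bool:
    """Check whether taps with code magnitudes c2 and c3 can share one array.

    curve_mv[k] is the threshold shift in mV at offset code k (hypothetical data).
    """
    if c2 + c3 >= len(curve_mv):
        return False  # sum of tap weights exceeds the adjustable range
    # Linearity check: the shift of the summed code must equal the sum of
    # the individual shifts, within the given tolerance.
    err = abs(curve_mv[c2 + c3] - (curve_mv[c2] + curve_mv[c3]))
    return err <= tol_mv


linear_curve = [4.0 * k for k in range(16)]        # assumed 4 mV per code step
curved = [0.5 * k * k for k in range(16)]          # assumed nonlinear curve
```

On the assumed linear curve, codes 5 and 2 can share an array, while codes 10 and 8 cannot because their sum falls outside the 16-entry range.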







Fig. 4. Measured OFC-V<sub>TH</sub> curve of DCS

## III. IMPLEMENTATION

As mentioned before, the major concern in designing a high-speed DFE is the speed limitation posed by the feedback path. Loop unrolling, introduced in [6], is applied to the first-tap (T1) feedback stage of this work, and the feedback path delays of the second and third taps are also addressed. These paths need care because they form another critical path in a loop-unrolling DFE. To reduce the input capacitance of the switches in the samplers, each switch transistor is split into three series parts, and the middle part is always turned on as a cascode (Fig. 2(c)). To further reduce the second/third-tap feedback path delay, the feedback parameters are calculated in advance (Fig. 5). We call this combined feedback. Combined feedback minimizes the feedback delay by reducing the logic depth of the critical path.
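One way to picture combined feedback is as a lookup: the signed offset code for every combination of the second and third previous decisions is computed ahead of time, so the critical path only selects an entry instead of adding two tap codes. The sketch below is a behavioral assumption, not a reproduction of the logic in Fig. 5, and the names `build_combined_table` and `combined_code` are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple


def build_combined_table(c2: int, c3: int) -> Dict[Tuple[int, int], int]:
    """Precompute the signed T2+T3 offset code for every (d2, d3) pair."""
    return {(d2, d3): d2 * c2 + d3 * c3 for d2, d3 in product((1, -1), repeat=2)}


def combined_code(table: Dict[Tuple[int, int], int], d2: int, d3: int) -> int:
    """Critical-path operation: a single selection, no arithmetic."""
    return table[(d2, d3)]


table = build_combined_table(c2=5, c3=2)  # tap codes are illustrative values
```

The table only needs to be rebuilt when the tap codes change, which happens far less often than once per symbol.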

Many DFEs use tap adaptation algorithms to track the response of a given channel [1][6]. However, auxiliary samplers for the adaptation and on-chip adaptation logic consume extra power. We implemented the DCDFE in two versions: one uses external codes as its tap weights (T1~T3), and the other adapts its tap weights during operation with a sign-sign LMS (SS-LMS) algorithm. The former consumes less than half the power of the latter, whereas the adaptive version can automatically set its tap codes under channel variations using the SS-LMS algorithm.
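For readers unfamiliar with SS-LMS, a single update step in its textbook form is sketched below: each tap code moves by one step in the direction given by the product of the error sign and the sign of the corresponding past decision. The integer code range, the step size, and the function names are assumptions; the paper does not give the details of its adaptation block.

```python
from typing import List, Sequence


def sign(x: float) -> int:
    """Return +1 for non-negative values and -1 otherwise."""
    return 1 if x >= 0 else -1


def ss_lms_step(codes: Sequence[int], error: float, past_decisions: Sequence[int],
                step: int = 1, max_code: int = 15) -> List[int]:
    """One sign-sign LMS update of integer tap codes.

    error: equalized sample minus the target level of the decided symbol.
    past_decisions[k]: decision (+1/-1) associated with tap k+1.
    max_code: assumed code limit of the switch array (hypothetical).
    """
    e = sign(error)
    updated = []
    for c, d in zip(codes, past_decisions):
        c_new = c + step * e * d          # sign(error) * sign(decision)
        updated.append(max(-max_code, min(max_code, c_new)))  # clamp to range
    return updated
```

Only the signs of the error and the decisions are used, which is why a single extra sampler for the level error is enough to drive the update.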

## IV. MEASUREMENT RESULTS

The prototype DFE was implemented in a 0.13-µm CMOS process. Fig. 6 shows a die photo of the DFE receiver, which contains a DFE core, a clock receiver, terminations, and a loop-back driver. Termination resistances of the IO ports are calibrated by adjusting the setting codes of the ODT. The DCDFE core occupies 0.009 mm<sup>2</sup> (90 µm × 100 µm).

The measurement setup is shown in Fig. 7. A PRBS7 signal was generated by a bit-error-ratio tester (BERT) without any boosting at the Tx side and transmitted through a backplane channel. A differential pair of 28-inch Nelco 4000-6 traces, embedded in the BERT as an auxiliary module, was selected as the channel. A 5-Gb/s eye diagram and the single-bit response measured at the end of the channel are shown in Fig. 8. BER was measured with the same instrument, and the resulting bathtub curves are depicted in Fig. 9. The proposed DFE successfully recovered the received data from the contaminated stream, widening the opening of the bathtub curve. As shown in Fig. 9, the measured BER was less than $10^{-11}$ at the center of the eye after equalization. Eye diagrams captured at the output of the loop-back driver are shown in Fig. 10.

Fig. 8. (a) Eye diagram of the 5-Gb/s PRBS7 input data stream after passing through a pair of 28-inch differential channels and its single-bit response (b) 6-Gb/s eye diagram and its single-bit response

The DCDFE core operates at 5 Gb/s from a 1.2-V supply, and the data rate can be increased to 6 Gb/s with a 1.4-V supply. The core dissipates 8.4 mW at 5 Gb/s and 12 mW at 6 Gb/s. In the adaptive DFE, the power consumption increases to 20.4 mW at 5 Gb/s. The additional power is consumed by an extra DCS for sampling the level error and by a synthesized on-chip tap adaptation block with a clock divider and tap retimers.
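The reported power figures can also be read as energy per bit, since 1 mW divided by 1 Gb/s equals 1 pJ/bit. The helper below is a convenience sketch; the function name is hypothetical, and the per-bit values are derived here from the reported numbers rather than quoted from the measurements.

```python
def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """Energy per bit in pJ: power in mW divided by data rate in Gb/s."""
    return power_mw / rate_gbps


# Operating points reported in the text: (power in mW, data rate in Gb/s).
operating_points = {
    "non-adaptive, 5 Gb/s, 1.2 V": (8.4, 5.0),
    "non-adaptive, 6 Gb/s, 1.4 V": (12.0, 6.0),
    "adaptive, 5 Gb/s": (20.4, 5.0),
}
for name, (p_mw, r_gbps) in operating_points.items():
    print(f"{name}: {energy_per_bit_pj(p_mw, r_gbps):.2f} pJ/bit")
```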



(b) 6-Gb/s equalized output (under 1.4 V supply)

## V. CONCLUSION

In this paper, a digitally controlled 3-tap DFE is demonstrated, which offers adequate capability to remove ISI induced by lossy channels. The digitally controlled DFE realizes multi-tap equalization without using analog circuits, thereby reducing chip area and power consumption. The circuit is optimized to speed up the equalization process by reducing the feedback latency. In addition, a two-tap combining scheme, combined feedback, is applied to further increase the speed of the proposed DFE. The performance of the fabricated design is summarized in Table I and compared with previously published works. The figure of merit (FOM) in Table I is defined as follows, with the data rate in Gb/s, the area in mm<sup>2</sup>, and the power in mW:

$$\mathrm{FOM} = \frac{\text{Data rate} \times \text{Num. of taps}}{\text{DFE core area} \times \text{Power consumption}}$$

TABLE I. PERFORMANCE COMPARISON OF DFEs IN 90-nm TO 130-nm CMOS PROCESSES

|                   | [4]                     | [5]                       | This work                         |
|-------------------|-------------------------|---------------------------|-----------------------------------|
| Technology (CMOS) | 90 nm                   | 90 nm                     | 130 nm                            |
| DFE core area     | 0.019 mm<sup>2</sup> \*1 | 0.063 mm<sup>2</sup> \*2 | 0.009 mm<sup>2</sup>              |
| Supply voltage    | 1.0 V                   | 1.4 V                     | 1.2 V                             |
| Data rate         | 7 Gb/s                  | 16 Gb/s                   | ~5 Gb/s \*3                       |
| Architecture      | Current-integrating DFE | Current-summing FFE+DFE   | DCDFE (no DACs)                   |
| Num. of taps      | 2                       | 4                         | 3                                 |
| Power consumption | 9.3 mW                  | 69 mW \*2                 | 8.4 mW @ 5 Gb/s<br>12 mW @ 6 Gb/s |
| FOM               | 79.23                   | 14.72                     | 198.41                            |

\*1: Includes biasing and DACs. \*2: Includes DMX and BIST. \*3: ~6 Gb/s @ 1.4-V supply.
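The FOM row of Table I can be reproduced directly from the other rows. The short sketch below applies the FOM definition with the data rate in Gb/s, the area in mm<sup>2</sup>, and the power in mW; the function and dictionary names are hypothetical, and the 5-Gb/s operating point is used for this work.

```python
def fom(rate_gbps: float, taps: int, area_mm2: float, power_mw: float) -> float:
    """FOM = (data rate x number of taps) / (core area x power)."""
    return rate_gbps * taps / (area_mm2 * power_mw)


# (data rate in Gb/s, taps, area in mm^2, power in mW) taken from Table I.
table_rows = {
    "[4]": (7.0, 2, 0.019, 9.3),
    "[5]": (16.0, 4, 0.063, 69.0),
    "This work": (5.0, 3, 0.009, 8.4),
}
for name, row in table_rows.items():
    print(f"{name}: FOM = {fom(*row):.2f}")
```

Note that the area and power entries for [4] and [5] include extra circuitry, as stated in the table footnotes, so the comparison is approximate.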

## ACKNOWLEDGMENT

The authors would like to extend special thanks to S. W. Hong, B. T. Jang, J. S. Yoon, S. H. Ahn, and B. H. Lee of TLI Inc., Seongnam, Korea, for technical support.

## REFERENCES

- [1] J. F. Bulzacchelli, M. Meghelli, S. V. Rylov, et al., "A 10-Gb/s 5-Tap DFE/4-Tap FFE Transceiver in 90-nm CMOS Technology," IEEE J. Solid-State Circuits, vol. 41, no. 12, pp. 2885-2900, Dec. 2006.
- [2] J. S. Choi, M. S. Hwang, D. K. Jeong, "A 0.18-µm CMOS 3.5-Gb/s Continuous-time Adaptive Cable Equalizer Using Enhanced Low-frequency Gain Control Method," IEEE J. Solid-State Circuits, vol. 39, no. 3, pp. 419-425, March 2004.
- [3] M. C. Chou, Q. T. Chen, P. Chen, "A 250Mb/s-to-3Gb/s 5x Oversampling Receiver with an All-digital Adapting Equalizer," A-SSCC 2009, pp. 181-184, Nov. 2009.
- [4] M. Park, J. Bulzacchelli, M. Beakes, et al., "A 7Gb/s 9.3mW 2-Tap Current-Integrating DFE Receiver," ISSCC Dig. Tech. Papers, pp. 230-231, Feb. 2007.
- [5] H. Sugita, K. Sunaga, K. Yamaguchi, et al., "A 16Gb/s 1<sup>st</sup>-Tap FFE and 3-Tap DFE in 90nm CMOS," ISSCC Dig. Tech. Papers, pp. 162-163, Feb. 2010.
- [6] V. Stojanovic, A. Ho, B. W. Garlepp, et al., "Autonomous Dual-mode (PAM2/4) Serial Link Transceiver with Adaptive Equalization and Data Recovery," IEEE J. Solid-State Circuits, vol. 40, no. 4, pp. 1012-1026, April 2005.
- [7] K. Fukuda, H. Yamashita, F. Yuki, et al., "An 8Gb/s Transceiver with 3x-Oversampling 2-Threshold Eye-Tracking CDR Circuit for -36.8dB-loss Backplane," ISSCC Dig. Tech. Papers, pp. 98-99, Feb. 2008.
- [8] J. Lee, "A 20-Gb/s adaptive equalizer in 0.13-µm CMOS technology," IEEE J. Solid-State Circuits, vol. 41, no. 9, pp. 2058-2066, Sep. 2006.
- [9] J. H. Lee, S. Kim and D. K. Jeong, "A Combined Clock and Data Recovery Circuit with Adaptive Cancellation of Data-Dependent Jitter," J. Semiconductor Technology and Science, vol. 8, no. 3, Sep. 2008.