## 28.4 A 4.8Gb/s Impedance-Matched Bidirectional Multi-Drop Transceiver for High-Capacity Memory Interface

Woo-Yeol Shin<sup>1</sup>, Gi-Moon Hong<sup>1</sup>, Hyongmin Lee<sup>1</sup>, Jae-Duk Han<sup>1</sup>, Sunkwon Kim<sup>1</sup>, Kyu-Sang Park<sup>1</sup>, Dong-Hyuk Lim<sup>1</sup>, Jung-Hoon Chun<sup>2</sup>, Deog-Kyoon Jeong<sup>1</sup>, Suhwan Kim<sup>1</sup>

<sup>1</sup>Seoul National University, Seoul, Korea, <sup>2</sup>Sungkyunkwan University, Suwon, Korea

With the scaling of CMOS transistors and advances in I/O circuitry, the data rate of memory interfaces has recently reached 16Gb/s per channel [1]; at such data rates, a point-to-point channel is required rather than a multi-drop channel. While point-to-point channels are advantageous in achieving higher data rates because they avoid the undesired reflections that occur at each stub of a multi-drop channel, they are not suitable for high-capacity, high-throughput memory systems such as transaction servers or cloud computing nodes because of the prohibitively large PCB routing area needed to connect the memory chips. FBDIMM [2] and the cascading memory architecture [3] aim to reduce the routing area by using daisy-chained configurations, but they suffer from increased latency. This is why the recent DDR2/3 memory interface still uses the multi-drop bus architecture called stub series terminated logic (SSTL), and a number of proposals have been made to mitigate the problem of stub reflections in SSTL. For instance, a decision feedback equalizer has been used [4] to cancel the inter-symbol interference (ISI) due to stub reflections, but this requires a large number of filter taps, which limits the speed to under 3Gb/s. Another approach to eliminating the impedance discontinuity is to use a $2Z_0$ transmission line [5], but this scheme is only applicable to 2-slot configurations.

In this paper, we first describe an impedance-matched bidirectional multi-drop (IMBM) DQ bus that can handle up to 4 slots and 8 drops at a data rate of up to 4.8Gb/s. In the SSTL DQ bus, as shown in Fig. 28.4.1, the series resistor of $Z_0/2$ can suppress ringing and attenuate reflections within the channel. The SSTL DQ bus is still not entirely free from reflections among the slots, however, because its reflection coefficient at the stub junctions is -1/4. In contrast, the IMBM DQ bus that we propose makes the reflection coefficient 0, as can be seen from the equations presented in Fig. 28.4.1, and therefore generates no reflection at any stub. Moreover, the IMBM DQ bus can send write signals of the same current level to every module according to the current-division relation between $I_k$ and $I_{k+1}$. The transceiver also receives the read signal from every module at an identical level, according to the reciprocity theorem [4]. This characteristic produces identical transfer responses for all the modules regardless of their positions.
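The -1/4 figure for SSTL follows from the standard reflection-coefficient formula, and can be checked numerically. The sketch below does that, and then shows one possible set of branch impedances that gives a zero reflection coefficient with an equal current share per module. The value `Z0 = 50.0` and the helper `matched_split` are assumptions made for illustration; the actual IMBM design equations are those in Fig. 28.4.1 and are not reproduced here.

```python
def parallel(*impedances):
    """Equivalent impedance of branches connected in parallel."""
    return 1.0 / sum(1.0 / z for z in impedances)


def reflection_coefficient(z_load, z_line):
    """Reflection coefficient seen by a wave on z_line arriving at z_load."""
    return (z_load - z_line) / (z_load + z_line)


Z0 = 50.0  # assumed characteristic impedance in ohms (not stated in the text)

# SSTL junction: the incoming wave sees the continuing line (Z0) in parallel
# with the stub branch (Z0/2 series resistor followed by a Z0 stub line).
z_sstl = parallel(Z0, Z0 / 2 + Z0)
gamma_sstl = reflection_coefficient(z_sstl, Z0)


def matched_split(n_remaining, z0=Z0):
    """Hypothetical branch input impedances for a junction with
    n_remaining (>= 2) modules still to be fed. Chosen so that the two
    branches in parallel equal z0 and the stub takes 1/n_remaining of the
    incoming current. Illustrative only, not taken from the paper."""
    z_stub = n_remaining * z0
    z_next = z0 * n_remaining / (n_remaining - 1)
    return z_stub, z_next


def stub_current_fraction(z_stub, z_next):
    """Fraction of the incoming current that flows into the stub branch."""
    return z_next / (z_stub + z_next)


if __name__ == "__main__":
    print(f"SSTL junction reflection coefficient: {gamma_sstl:+.3f}")
    for n in range(8, 1, -1):
        z_stub, z_next = matched_split(n)
        gamma = reflection_coefficient(parallel(z_stub, z_next), Z0)
        share = stub_current_fraction(z_stub, z_next)
        print(f"n={n}: gamma={gamma:+.3f}, stub current share=1/{1 / share:.0f}")
```

With this split, each junction passes (n-1)/n of its incoming current onward, so every module ends up with the same 1/N share of the driver current, consistent with the equal-level write signals described above.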

Figure 28.4.2 shows the 4-slot, 8-drop IMBM DQ bus that we have implemented. Instead of matching the impedance in every direction, the IMBM DQ bus matches impedance only in the left-to-right direction at the upper transmission lines (TL 1, 2, 3, 4 in Fig. 28.4.2). Since there is no reflection from the right end of the upper TLs on the motherboard during a write operation, the write data signal from the memory controller to the memory modules at positions 0 through 7 can be transmitted without any reflection. During a read operation, however, reflection signals may be generated at each stub; these reflections propagate from the stubs to the right end of the upper TLs, where they are absorbed by the ODT resistors. Thus they do not reach the memory controller and do not degrade the integrity of the desired signal. Therefore, the IMBM DQ bus is able to transmit and receive both read and write signals without reflective ISI.

Second, we design a memory transceiver architecture suited to the IMBM DQ bus, which attenuates the transmitted signals by a factor equal to the reciprocal of the number of modules. Direct data sampling using the received strobe is not recommended, since the limiting amplifier chains necessary to recover the original signal swings would draw too much power in this architecture. Instead, we use the memory transceiver and RX clocking architecture shown in Fig. 28.4.3. This transceiver consists of 4 DQ channels, a DQS channel, a PLL and clock trees. A current-mode driver with a 4-tap FIR filter in the TX of each DQ enables de-emphasis for equalization. The PLL and a clock tree provide a TX clock for the serializer and multiphase clocks for the strobe recovery unit (SRU) of the DQS block. The received strobe signal is used for timing recovery in RX mode. A dual-loop architecture is used in the SRU for timing recovery. To generate the sampling clock for every DQ with the proper phase, the SRU generates the RX sampling clock with a phase interpolator, a half-rate bang-bang phase detector and control logic. A duty cycle corrector (DCC) compensates for the duty cycle distortions that may arise in the clock tree and the phase interpolator. To ensure that all DQs receive the recovered sampling clock with equal phase, their clocks are shorted together to reduce possible on-chip clock skews. To reduce the skew between each DQ and the SRU, the sampling clock for the PD is also delivered along the same clock tree.
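The behaviour of a bang-bang phase detector driving a phase interpolator can be illustrated with a small behavioural model. The sketch below uses the textbook Alexander early/late decision and an idealized, noiseless edge sample; it is not the SRU circuit itself. The half-rate sampling and the dual-loop structure are not modelled, and the function names, the sign convention (a positive decision adds delay) and the 1/64 UI step size are all assumptions.

```python
def bang_bang_decision(d_prev, edge, d_next):
    """Alexander-style early/late decision.

    Returns +1 if the clock is early (add delay), -1 if it is late
    (remove delay), and 0 when there is no data transition to observe.
    """
    if d_prev == d_next:
        return 0
    # An edge sample equal to the previous bit was taken before the
    # transition, so the sampling clock is early.
    return +1 if edge == d_prev else -1


def run_loop(bits, phase_ui, step_ui=1 / 64):
    """Track the edge-clock phase error (in UI) over a bit sequence.

    phase_ui < 0 means the edge clock fires before the data transition.
    step_ui is an assumed phase-interpolator resolution.
    """
    history = []
    for d_prev, d_next in zip(bits, bits[1:]):
        # Idealized edge sample: previous bit if early, next bit otherwise.
        edge = d_prev if phase_ui < 0 else d_next
        phase_ui += step_ui * bang_bang_decision(d_prev, edge, d_next)
        history.append(phase_ui)
    return history


if __name__ == "__main__":
    strobe_like_pattern = [0, 1] * 20  # toggling pattern, one transition per bit
    trace = run_loop(strobe_like_pattern, phase_ui=-0.25)
    print(f"final phase error: {trace[-1]:+.4f} UI")
```

Starting a quarter UI early, the loop walks the phase toward the transition one step at a time and then dithers within one interpolator step of lock, which is the expected steady state of a bang-bang loop.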

The IMBM memory transceiver is fabricated in a 0.13$\mu$m CMOS process and occupies 1400×1200$\mu$m<sup>2</sup>. The die microphotograph and the performance summary are shown in Fig. 28.4.7. The chip board and channel board were implemented separately to enable measurements with different channel configurations. Both boards are fabricated in NELCO material instead of FR4 to reduce the PCB insertion loss. MicroTCA [6] connectors are used on the channel board. The transceiver consumes 14.24mW/Gb/s/DQ at 4.8Gb/s in TX mode and 13.69mW/Gb/s/DQ at 4.8Gb/s in RX mode.
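The efficiency figures can be converted into absolute power per DQ with a one-line calculation. The short sketch below does this using only the numbers quoted in the text; the constant and function names are illustrative. Note that 1 mW/Gb/s is the same quantity as 1 pJ/bit.

```python
DATA_RATE_GBPS = 4.8     # per-DQ data rate from the text
TX_MW_PER_GBPS = 14.24   # TX efficiency from the text (equivalently pJ/bit)
RX_MW_PER_GBPS = 13.69   # RX efficiency from the text (equivalently pJ/bit)


def power_per_dq_mw(efficiency_mw_per_gbps, data_rate_gbps=DATA_RATE_GBPS):
    """Absolute power per DQ in mW for a given energy efficiency."""
    return efficiency_mw_per_gbps * data_rate_gbps


if __name__ == "__main__":
    print(f"TX: {power_per_dq_mw(TX_MW_PER_GBPS):.2f} mW per DQ")
    print(f"RX: {power_per_dq_mw(RX_MW_PER_GBPS):.2f} mW per DQ")
```

This works out to roughly 68.4mW per DQ in TX mode and 65.7mW per DQ in RX mode.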

Figure 28.4.4 illustrates the 4.8Gb/s single-bit responses (SBRs) and the eye diagrams of a conventional SSTL DQ bus and the new IMBM DQ bus. Due to the reflections between connectors, the SBRs of the SSTL DQ bus vary widely depending on the module position. In contrast, all the SBRs of the IMBM DQ bus are almost identical regardless of the module position. Even when compared to the scaled SBR of the transmitted signal measured with the chip board alone, the SBR of the IMBM DQ bus shows little reflection. The measured eye diagrams in Fig. 28.4.4 also substantiate the difference between the SSTL DQ bus and the IMBM DQ bus. Figure 28.4.5 shows the measured TX BER results of both the SSTL and IMBM DQ buses. Without TX equalization, SSTL modules #5 and #7 have no timing margin, whereas modules #1 and #3 have large margins. By contrast, the BER of the IMBM DQ bus is nearly 10<sup>-7</sup> at the optimum sampling point. When TX de-emphasis is enabled (by setting the precursor tap to 0, the main cursor tap to 1, the first post-cursor tap to -0.33, and the second post-cursor tap to -0.08), the IMBM DQ bus achieves a BER of 10<sup>-9</sup> with a timing margin of 0.39UI, as shown in Fig. 28.4.5. In comparison, some of the SSTL DQ modules fail to reach a BER of 10<sup>-9</sup> at any sampling position. In the BER measurement of the RX without equalization, the recovered data at SSTL modules #5 and #7 contains too many errors for reliable operation, as was the case in the TX BER measurement, as shown in Fig. 28.4.6. The unequalized IMBM DQ bus, however, has a 0.61UI timing margin at a BER threshold of 10<sup>-9</sup>. The timing margin of the RX BER test is larger than that of the TX BER test because of the clean data transmitted from the PBERT. When the memory transceiver turns on the linear equalizer to boost its high-frequency gain, the IMBM DQ bus has a 0.73UI timing margin, which is 0.21UI higher than that of the SSTL DQ bus.
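To make the effect of the quoted de-emphasis setting concrete, the sketch below applies a 4-tap FIR filter with those tap weights (0, 1, -0.33, -0.08) to a short bit pattern. It is a behavioural illustration only: the ±1 symbol mapping, the boundary handling (the first and last symbols are repeated) and the function name `de_emphasize` are assumptions, and the output is not normalized to any driver swing.

```python
# Tap weights quoted in the text: precursor, main cursor, post-cursor 1, post-cursor 2.
TAPS = (0.0, 1.0, -0.33, -0.08)


def de_emphasize(bits, taps=TAPS):
    """Return the un-normalized FIR output level for each transmitted bit."""
    pre, main, post1, post2 = taps
    symbols = [1.0 if b else -1.0 for b in bits]  # assumed NRZ +/-1 mapping
    last = len(symbols) - 1
    levels = []
    for i, current in enumerate(symbols):
        nxt = symbols[min(i + 1, last)]   # next symbol (precursor tap)
        prev1 = symbols[max(i - 1, 0)]    # one symbol earlier
        prev2 = symbols[max(i - 2, 0)]    # two symbols earlier
        levels.append(pre * nxt + main * current + post1 * prev1 + post2 * prev2)
    return levels


if __name__ == "__main__":
    pattern = [0, 0, 0, 1, 1, 1]
    for bit, level in zip(pattern, de_emphasize(pattern)):
        print(f"bit={bit} level={level:+.2f}")
```

The first bit after a transition is driven at the full level (1.41 in these un-normalized units), while repeated bits settle to 0.59, so the low-frequency content is attenuated relative to transitions.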

## References:

[1] H. Lee, et al., "A 16 Gb/s/Link, 64 GB/s Bidirectional Asymmetric Memory Interface," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1235-1247, April 2009.

[2] E. Prete, D. Scheideler, and A. Sanders, "A 100mW 9.6Gb/s Transceiver in 90nm CMOS for Next-Generation Memory Interfaces," *ISSCC Dig. Tech. Papers*, pp. 253–262, Feb. 2006.

[3] Z. Gu, et al., "Cascading Techniques for a High-Speed Memory Interface," *ISSCC Dig. Tech. Papers*, pp. 234–235, Feb. 2007.

[4] H. Fredriksson and C. Svensson, "Improvement Potential and Equalization Example for Multidrop DRAM Memory Buses," *IEEE Trans. Adv. Packag.*, vol. 32, no. 3, pp. 675-682, Aug. 2009.

[5] S.-J. Bae, H.-J. Chi, H.-R. Kim, and H.-J. Park, "A 3Gb/s 8b Single-Ended Transceiver for 4-Drop DRAM Interface with Digital Calibration of Equalization Skew and Offset Coefficients," *ISSCC Dig. Tech. Papers*, pp. 520–521, Feb. 2005.

[6] MicroTCA Standard, http://www.picmg.org/v2internal/microtca.htm



