## An Energy and Area-Efficient PAM-4 Data Coding Scheme with Embedded Supply Noise Stabilization for Single-Ended Memory Interface

Giyeong Heo<sup>\*1</sup>, Younghwan Chang<sup>\*2</sup>, Yong-Un Jeong<sup>3</sup>, Jaekwang Yun<sup>1,4</sup>, Jusung Lee<sup>1,2</sup>, Shin-Hyun Jeong<sup>1</sup>, Sanghyuk Seo<sup>1</sup>, and Suhwan Kim<sup>1</sup>

<sup>1</sup>Seoul National University, Seoul, Korea, <sup>2</sup>Samsung Electronics, Hwasung, Korea

<sup>3</sup>Sejong University, Seoul, Korea, <sup>4</sup>SK Hynix, Icheon, Korea

\*Equally Credited Authors (ECAs)

To meet the demand for high-bandwidth memory, PAM-4 signaling has recently been introduced as an alternative to conventional NRZ signaling [1-5]. Although PAM-4 achieves twice the per-pin data rate of NRZ at the same clock frequency, its signals are more vulnerable to noise sources such as inter-symbol interference (ISI), crosstalk (XT), and power noise because of the reduced vertical and horizontal eye margins [2], [3]. To mitigate these noise sources with minimal area, PAM-4 data coding schemes have been proposed [1-5]. Most of these works use lookup table (LUT)-based 7b/8b maximum transition avoidance (MTA) coding to improve signal integrity (SI) by eliminating maximum transitions, which significantly degrade both ISI and XT [1-4]. In 7b/8b MTA coding, the extra pin formerly used for data bus inversion (DBI) transfers 1 bit of unencoded data per lane so that the data bandwidth remains the same as before encoding. However, because the DBI function originally used to improve energy efficiency and reduce supply voltage fluctuation [6] is absent, power noise becomes more critical in 7b/8b MTA. In addition, applying DBI in conjunction with MTA through an extra pin is not a viable solution, since it can regenerate maximum transitions [3]. Therefore, additional compensation circuits such as low-dropout (LDO) regulators are used to minimize power noise, but at the cost of additional area and power overhead [1].

This paper proposes a transmitter architecture employing embedded transition-injecting MTA (eTI-MTA) coding, which minimizes power noise without additional compensation circuitry. The proposed X(N)OR-based eTI-MTA coding algorithm reduces pattern-dependent noise, thus providing supply noise stabilization and further ISI reduction compared to conventional MTA coding. Because these functions are all embedded in the data coding algorithm itself, signal integrity is improved without additional compensation circuits. Furthermore, the eTI-MTA coding is implemented with a low gate count and low logic depth, enabling energy- and area-efficient operation of the proposed transmitter.

Fig. 1 illustrates the impact of supply noise and ISI on the transmitted signal for LUT-based 7b/8b MTA and the proposed eTI-MTA. To address the noise sensitivity of the PAM-4 signal, 7b/8b MTA eliminates maximum transitions to improve the eye margin in terms of ISI and XT. However, power supply induced jitter (PSIJ) still exists because of pattern-dependent power noise caused by data patterns with sparse transitions, and it is worsened by the absence of DBI [7]. To alleviate this issue, the proposed eTI-MTA coding injects additional inter-symbol transitions to generate data patterns with high transition density, which reduces PSIJ. ISI is also reduced and a larger middle eye is achieved compared to 7b/8b MTA, owing to the consecutive identical digits (CIDs) rejection effect of the proposed coding.

Fig. 2 (top) describes the algorithm of the two processes in eTI-MTA coding. For MTA encoding, if an even-numbered symbol has equal MSB and LSB values (S<sub>E1</sub>, S<sub>E2</sub>=11 or 00), the LSB of that symbol is flipped. Through this process, all even-numbered symbols are encoded to have symbol values of either 10 or 01. As a result, all maximum transitions are eliminated, since 11 and 00 cannot appear consecutively. For transition injection, if the odd-numbered symbols have equal symbol values of either 10 or 01 (S<sub>O1</sub>, S<sub>O2</sub>=10 or 01), the MSB of S<sub>E1</sub> is flipped so that transitions on both sides of S<sub>E1</sub> are guaranteed to occur. Decoding also consists of two processes: (1) removing injected transitions using the MSB-flipped signal (S<sub>E1</sub>') and (2) decoding even-numbered symbols using the flag signal (XOP[1:0]). As shown in Fig. 2 (bottom), all these functions can be implemented with simple X(N)OR-based gate logic, and the critical path of this logic consists of only two X(N)OR gates and a NAND gate. Thus, the encoder and decoder achieve low latency compared to LUT-based MTA coding, which requires additional inversion logic [3].
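The two encoding processes and their inverses can be sketched in a few lines of Python. This is a minimal behavioral model, not the gate-level logic of Fig. 2. It assumes that a group is ordered S_O1, S_E1, S_O2, S_E2, that each flag bit is the XNOR of the MSB and LSB of the corresponding even symbol, that the symbol value maps directly to the PAM-4 level (00 lowest, 11 highest), and that injection requires the two odd symbols to be equal. The function names `encode_group` and `decode_group` are hypothetical.

```python
from itertools import product

MIDDLE = (0b01, 0b10)  # symbols whose MSB and LSB differ


def encode_group(s_o1, s_e1, s_o2, s_e2):
    """Encode four 2-bit symbols (ints 0..3); return (symbols, flags)."""
    flags, evens = [], []
    for s in (s_e1, s_e2):
        msb, lsb = s >> 1, s & 1
        flag = 1 - (msb ^ lsb)       # XNOR: 1 when the symbol is 00 or 11
        evens.append(s ^ flag)       # MTA step: flip the LSB when flagged
        flags.append(flag)
    e1, e2 = evens
    # Transition injection (assumed condition): odd symbols equal and 01/10
    if s_o1 == s_o2 and s_o1 in MIDDLE:
        e1 ^= 0b10                   # flip the MSB of S_E1
    return (s_o1, e1, s_o2, e2), tuple(flags)


def decode_group(symbols, flags):
    """Invert encode_group using the received symbols and the flag bits."""
    s_o1, e1, s_o2, e2 = symbols
    # After MTA encoding S_E1 is 01 or 10, so 00/11 marks an injection
    if e1 not in MIDDLE:
        e1 ^= 0b10                   # undo the MSB flip
    return (s_o1, e1 ^ flags[0], s_o2, e2 ^ flags[1])


# Exhaustive check over all 256 possible groups
for group in product(range(4), repeat=4):
    coded, flags = encode_group(*group)
    assert decode_group(coded, flags) == group
    # No 00 <-> 11 (maximum) transition inside the coded group
    assert all(abs(a - b) != 3 for a, b in zip(coded, coded[1:]))
```

Under these assumptions the decoder needs no side information about the injection itself: an S<sub>E1</sub> value of 00 or 11 can only result from an MSB flip, so only the two flag bits have to be transferred.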

Fig. 3 presents the top architecture and sub-blocks of the transmitter implemented to verify the performance of the proposed eTI-MTA coding scheme. The data generator of the transmitter is implemented through digital synthesis to provide three types of data patterns: pseudo-random data, 7b/8b MTA coded data, and eTI-MTA coded data. A clock DCDL, consisting of 5-bit fine delay control and 1-bit coarse delay control, is used to align the data arrival timing of each data lane and to compensate for quadrature error at each transmitter. An unstacked tailless current-mode logic (CML) driver, consisting of an unstacked PMOS current source and pull-down passive resistors, is employed as the output driver to minimize the dynamic switching power caused by the high data rate and dense transition pattern [8]. Since the output impedance of the tailless CML driver is set by passive resistors, the pre-driver can be simplified compared to that of a voltage-mode driver. The resulting reduction in the capacitive load of the pre-driver lowers dynamic power consumption. Furthermore, since the unstacked current source does not require a bias generator, unlike its double-stacked counterpart, this structure is more suitable for high-speed, low-power memory interfaces in terms of area and energy efficiency.

Because MTA coding eliminates maximum transitions, the middle eye limited by  $2\Delta V$  transitions becomes the bottleneck of an MTA coded signal. The proposed eTI-MTA coding can achieve a larger middle eye opening without additional compensation circuits, owing to the CIDs rejection effect described in Fig. 4 (top). In eTI-MTA, all even-numbered symbols have a value of either 10 or 01 after MTA encoding, which effectively eliminates CIDs of 11 and 00, the patterns that induce the worst-case ISI on the middle eye. Since transition injection occurs only when adjacent symbol values are both 10 or 01, the CIDs rejection effect is preserved even after this process. Single-lane measurement results, shown in Fig. 4 (bottom), demonstrate the ISI reduction effect of the proposed eTI-MTA coded data: the middle eye opening of the eTI-MTA coded signal is increased by 10~12mV in the vertical direction and 0.07UI in the horizontal direction at 30Gb/s. Additionally, the top and bottom eye openings are increased, since eliminating CIDs of 11 and 00 also reduces multi-cursor ISI.

As shown in Fig. 5 (top-right), the average number of transitions is increased by 13.3% after transition injection compared to LUT-based 7b/8b MTA coding [3]. In addition, the eTI-MTA coding achieves 10.2% higher transition density than 7b/8b MTA coding that uses a lookup table modified for maximum symbol transition density. The increase in the average number of inter-symbol transitions shifts current components in the mid-frequency range to higher frequencies, resulting in an average current reduction of 6.8dB in the mid-frequency range as shown in Fig. 5 (top-left). The eTI-MTA coded signal reduces PSIJ by eliminating data patterns with sparse transitions, which induce supply noise when combined with power distribution network (PDN) impedance peaking. Measurement results in Fig. 5 (bottom) show that, with 7b/8b MTA coding, all eyes of the PAM-4 signal are closed by supply noise. With the proposed eTI-MTA coding, on the other hand, all eyes remain open by at least 20.2mV in the vertical direction and 0.22UI in the horizontal direction at 24Gb/s. The multi-channel measurement is performed with 9 and 10 channels enabled, respectively, to account for the pin overhead of each MTA coding scheme.
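The effect of transition injection on transition count can be illustrated with a small behavioral sketch. It compares the X(N)OR-based MTA step alone against the same step followed by injection, so it does not reproduce the 13.3% or 10.2% figures, which are measured against LUT-based coding. The group ordering (S_O1, S_E1, S_O2, S_E2), the injection condition, and the names `encode` and `transitions` are assumptions, and only transitions inside a four-symbol group are counted.

```python
from itertools import product


def encode(group, inject=True):
    """Behavioral model of the coding steps for one four-symbol group."""
    s_o1, s_e1, s_o2, s_e2 = group
    # MTA step: an even symbol equal to 00 or 11 gets its LSB flipped
    e1 = s_e1 ^ 1 if s_e1 in (0b00, 0b11) else s_e1
    e2 = s_e2 ^ 1 if s_e2 in (0b00, 0b11) else s_e2
    # Injection step (assumed condition): equal odd symbols of value 01 or 10
    if inject and s_o1 == s_o2 and s_o1 in (0b01, 0b10):
        e1 ^= 0b10
    return (s_o1, e1, s_o2, e2)


def transitions(symbols):
    """Count adjacent symbol pairs that differ."""
    return sum(a != b for a, b in zip(symbols, symbols[1:]))


groups = list(product(range(4), repeat=4))  # all 256 uniformly weighted groups
mta_only = sum(transitions(encode(g, inject=False)) for g in groups)
with_ti = sum(transitions(encode(g, inject=True)) for g in groups)
print(f"intra-group transitions: MTA only {mta_only}, with injection {with_ti}")
```

In this model injection never removes a transition: it only changes groups in which S<sub>E1</sub> would otherwise repeat the value of both neighboring odd symbols.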

The comparison table in Fig. 5 summarizes the features of the proposed eTI-MTA in comparison to other MTA coding methods. The eTI-MTA coding has advantages in hardware complexity and latency overhead over LUT-based 7b/8b MTA coding, which requires a lookup table and additional logic. These advantages can be further improved if the eTI-MTA encoder is implemented in the high-speed data path with a custom layout. The decoder can be implemented with simple logic in a similar manner. As a result, the read/write latency overhead of the memory interface can be minimized. In addition, although eTI-MTA requires an additional pin for decoding, the data coding scheme improves jitter performance in terms of PSIJ and multi-cursor ISI without the need for an additional equalizer or LDO regulators.

The prototype transmitter used to verify the proposed eTI-MTA is fabricated in a 28nm CMOS process. VDDQ and VDD are both 1.1V, and the output voltage swing is 0.3V to keep the driver's PMOS current source in saturation. The transmitter achieves 30Gb/s with a single channel enabled and 24Gb/s with multiple channels enabled. The channel losses for the single- and multi-channel measurements are 4.2dB at 7.5GHz and 3.5dB at 6GHz, respectively. Fig. 6 summarizes the performance of the transmitter in this work in comparison with state-of-the-art transmitters that use MTA coding. Owing to the ISI and PSIJ reduction effect of eTI-MTA coding, our transmitter achieves an energy efficiency of 1.11pJ/bit, which is the lowest among all previous works.
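As a quick consistency check, the two frequencies quoted for channel loss match the Nyquist frequency of each data rate, assuming (the text does not state this explicitly) that loss is reported at Nyquist. PAM-4 carries 2 bits per symbol, so for a bit rate $R_b$:

$$
f_{\mathrm{Nyq}} = \frac{1}{2}\cdot\frac{R_b}{\log_2 4} = \frac{R_b}{4}, \qquad \frac{30\,\mathrm{Gb/s}}{4} = 7.5\,\mathrm{GHz}, \qquad \frac{24\,\mathrm{Gb/s}}{4} = 6\,\mathrm{GHz}.
$$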

## References

[1] J. Kim *et al.*, "A 60-Gb/s/pin single-ended PAM-4 transmitter with timing skew training and low power data encoding in mimicked 10nm class DRAM process," CICC, 2022. https://doi.org/10.1109/CICC53496.2022.9772814

[2] E. Song *et al.*, "A 35-Gb/s PAM-4 Transmitter with 7B4Q Full-Transition Avoidance and Area-Efficient Gm-Boosting Techniques," TCAS-II, 2023. https://doi.org/10.1109/TCSII.2023.3302023

[3] M. O'Connor *et al.*, "Saving PAM4 Bus Energy with SMOREs: Sparse Multi-level Opportunistic Restricted Encodings," HPCA, 2022. https://doi.org/10.1109/HPCA53966.2022.00077

[4] H. N. Rie *et al.*, "A 40-Gb/s/pin Low-Voltage POD Single-Ended PAM-4 Transceiver with Timing Calibrated Reset-less Slicer and Bidirectional T-Coil for GDDR7 Application," VLSI, 2022. https://doi.org/10.1109/VLSITechnologyandCir46769.2022.9830507

[5] Y. Su *et al.*, "Energy-efficient bus encoding techniques for next-generation PAM-4 DRAM interfaces," ICCD, 2022. https://doi.org/10.1109/ICCD56317.2022.00106

[6] T. M. Hollis *et al.*, "Data Bus Inversion in High-Speed Memory Applications," TCAS-II, 2009. https://doi.org/10.1109/TCSII.2009.2015395

[7] Y. Komatsu *et al.*, "A 0.25–27-Gb/s PAM4/NRZ Transceiver With Adaptive Power CDR and Jitter Analysis," JSSC, 2019. https://doi.org/10.1109/JSSC.2019.2920082

[8] Y.-U. Jeong *et al.*, "A Single-Ended PAM-4 Transmitter using Unstacked Tailless CML Driver and Coefficient-Corrected FFE for Memory Interfaces," TCAS-I, 2024. https://doi.org/10.1109/TCSI.2024.3450875

## Acknowledgements

This work was supported in part by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No.2022-0-01013).