# A High-Accuracy and Fast-Correction Quadrature Signal Corrector Using an Adaptive Delay Gain Controller for Memory Interfaces

Hyunkyu Park\*, Jihwan Park†, Jae Whan Lee\*, Yong-Un Jeong\*, Shin-Hyun Jeong\*, Suhwan Kim\*, and Joo-Hyung Chae‡

\*Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea

<sup>†</sup>SK Hynix, Icheon, South Korea

<sup>‡</sup>Department of Electronics and Communications Engineering, Kwangwoon University, Seoul, South Korea

E-mails: suhwan@snu.ac.kr, jhchae@kw.ac.kr

Abstract: In this paper, a quadrature signal corrector (QSC) with high accuracy and fast correction for memory interfaces is presented. An adaptive delay gain controller in the QSC adjusts the delay gain of each of four digitally controlled delay lines (DCDLs) separately depending on the skew between the quadrature clocks, resulting in a short correction time together with low residual skew. To validate the effectiveness of our QSC in memory interfaces, a quarter-rate single-ended 1-tap decision feedback equalizer (DFE) with the QSC was fabricated in a 65 nm CMOS process. Using the adaptive delay gain controller, the QSC reduced the skew between the 3 GHz quadrature clocks from a maximum of 21.2 ps to 0.8 ps, while the correction time was reduced by a factor of 3.9 compared with operation without the adaptive delay gain controller. At 12 Gb/s, the DFE using our QSC achieved a BER of 10<sup>-12</sup> with an eye width of 140 mUI when the input clock skew was 13.2 ps.

Keywords: quadrature signal corrector, adaptive delay gain controller, quarter data rate (QDR), memory interface

# I. INTRODUCTION

As the amount of data handled in memory increases, the memory interface is evolving to allow high-speed operation with high energy efficiency [1,2]. With this trend, higher data rates and clock frequencies are required, but they are limited by the low-performance DRAM process [3]. Quarter-rate designs have been adopted in recent memory interfaces to increase the data rate while achieving a more relaxed timing margin and lower power consumption than half-rate designs [4,5].

For quarter-rate operation in the memory interface, quadrature clocks are generated by dividing high-speed differential clocks [4-6]. Skew between the quadrature clocks at the destination is introduced due to noise, mismatch and process, voltage, and temperature (PVT) variation of the clock distribution. Therefore, a skew correction circuit must be utilized in memory interfaces to improve signal integrity [5].

Although high-resolution skew correction allows for an enhanced timing window for data sampling, a trade-off exists between resolution and correction time. High resolution of delay units is needed for low residual skew [7], but it can increase correction time. Short correction time is required since DRAM remains on standby until the clocks are aligned during the initialization and power down exit states [2,7]. Therefore, a quadrature signal corrector (QSC) must be designed while considering both high-resolution skew correction and short correction time for optimal system performance.

Multiphase delay locked loops (MDLLs) were previously used to correct skew in quadrature clock signals [8,9]. Although their correction time is short due to the parallel operation of multiple delay units, circuit mismatch can reduce the correction accuracy [5]. To solve this mismatch issue, a QSC implementing a single shared loop was proposed [5]. Although its correction accuracy is greatly enhanced by high-resolution digitally controlled delay lines (DCDLs), the correction time is greatly increased. The designs in [10] and [11] introduced loops using successive approximation registers (SARs) to shorten the correction time, but [10] requires additional duty-cycle correction and [11] still has a long correction time.

In this paper, we propose a QSC using an adaptive delay gain controller to achieve high-resolution skew correction and short correction time simultaneously. A delay gain is the amount by which a delay code is changed in one update, and it is controlled by the adaptive delay gain controller. After the initial setting of the delay gains, the QSC starts the skew correction process. During this operation, when the clock skew becomes smaller than the product of the delay gain and the resolution of the DCDL, the adaptive delay gain controller reduces the delay gain. That is, large delay gains are initially used for fast correction, and smaller delay gains are subsequently adopted for high-resolution skew correction, achieving both short correction time and low residual skew. To verify the effectiveness of our QSC for memory interfaces, a prototype chip containing a quarter-rate single-ended 1-tap decision feedback equalizer (DFE) with the QSC was fabricated, and the performance improvement was verified.

# II. ARCHITECTURE

Fig. 1 shows the block diagram of the QSC. The QSC has four DCDLs (DCDL<sub>I</sub>, DCDL<sub>Q</sub>, DCDL<sub>IB</sub>, DCDL<sub>QB</sub>) to modify the timing of the quadrature clocks, a 4:2 MUX to select two adjacent clocks, two additional DCDLs (DCDL<sub>1</sub> and DCDL<sub>2</sub>), a bang-bang phase detector (BBPD), and a loop filter. The BBPD checks whether the time difference between the two selected clocks is equal to the delay difference between DCDL<sub>1</sub> and DCDL<sub>2</sub>. The loop filter consists of a loop controller, which selects two clocks for comparison and calculates the codes for DCDL<sub>Q</sub>, DCDL<sub>IB</sub>, DCDL<sub>QB</sub>, and DCDL<sub>1</sub> (CODE<sub>Q</sub>, CODE<sub>IB</sub>, CODE<sub>QB</sub>, and CODE<sub>1</sub>, respectively), and an adaptive delay gain controller, which determines the delay gains of the DCDLs used in calculating the codes. The delay gain is the amount by which a delay code changes in a single update, and it is adjusted by the adaptive delay gain controller. The system can set the initial delay gain of each DCDL according to the initial skew. The code of DCDL<sub>I</sub> (CODE<sub>I</sub>) can be set to control the overall delay of the quadrature clocks.

Fig. 1. Block diagram of proposed quadrature signal corrector.

Fig. 2. Timing diagrams of quadrature clocks (a) with skew and (b) without skew.

Fig. 3. Circuit and timing diagram to obtain $\Delta t_D$ from $\Delta t_{QB,I}$.

Fig. 2 shows timing diagrams of the quadrature clocks with and without skew. $\Delta t_{I,Q}$, $\Delta t_{Q,IB}$, $\Delta t_{IB,QB}$, and $\Delta t_{QB,I}$ are the intervals between I<sub>OUT</sub> and Q<sub>OUT</sub>, Q<sub>OUT</sub> and IB<sub>OUT</sub>, IB<sub>OUT</sub> and QB<sub>OUT</sub>, and QB<sub>OUT</sub> and I<sub>OUT</sub>, respectively. If the intervals are not all equal, as shown in Fig. 2(a), at least one interval differs from $\Delta t_{QB,I}$. If the intervals are all equal, as shown in Fig. 2(b), there is no skew between the quadrature clocks. The QSC can therefore determine whether skew exists between the clocks by comparing $\Delta t_{QB,I}$ with the other intervals.

To compare $\Delta t_{QB,I}$ with the other intervals, the QSC first needs to obtain $\Delta t_{QB,I}$, as illustrated in Fig. 3. $t_{D1}$ and $t_{D2}$ are the delays of DCDL<sub>1</sub> and DCDL<sub>2</sub>, respectively. CLK<sub>QB</sub> and CLK<sub>I</sub> pass through DCDL<sub>1</sub> and DCDL<sub>2</sub>, respectively. If the timings of the two DCDL outputs ($t_1$ and $t_2$) are equal, the difference between $t_{D1}$ and $t_{D2}$ (i.e., $\Delta t_D$) is equal to $\Delta t_{QB,I}$. According to the output of the BBPD, the loop controller updates CODE<sub>1</sub> in steps of the delay gain of DCDL<sub>1</sub> to bring $\Delta t_D$ closer to $\Delta t_{QB,I}$.
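The lock condition of this measurement loop can be written out as a short derivation. Here $t_{QB}$ and $t_I$ denote the arrival times of the CLK<sub>QB</sub> edge and the following CLK<sub>I</sub> edge at the DCDL inputs; these two symbols are introduced only for illustration and do not appear in the figures. The derivation also assumes that CLK<sub>QB</sub> drives DCDL<sub>1</sub> and CLK<sub>I</sub> drives DCDL<sub>2</sub>, with $t_1$ and $t_2$ being the corresponding output edge times.

$$
\begin{aligned}
t_1 &= t_{QB} + t_{D1}, \qquad t_2 = t_I + t_{D2},\\
t_1 = t_2 \;&\Rightarrow\; \Delta t_D = t_{D1} - t_{D2} = t_I - t_{QB} = \Delta t_{QB,I}.
\end{aligned}
$$

Under these assumptions, the BBPD only has to decide which of the two output edges arrives first, and that decision gives the direction in which CODE<sub>1</sub> should move.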

When the QSC begins operation, the loop controller selects QB<sub>OUT</sub> and I<sub>OUT</sub> to adjust $\Delta t_D$. Next, the loop controller selects I<sub>OUT</sub> and Q<sub>OUT</sub> to compare $\Delta t_{I,Q}$ with $\Delta t_D$. Based on the result, the loop controller updates CODE<sub>Q</sub> in steps of the delay gain of DCDL<sub>Q</sub> so that $\Delta t_{I,Q}$ approaches $\Delta t_D$. Afterward, the same operations are performed for CODE<sub>IB</sub> and CODE<sub>QB</sub> with the corresponding delay gains. Because updating CODE<sub>QB</sub> also changes $\Delta t_{QB,I}$, the QSC repeats the process, starting with an update of $\Delta t_D$. As the operations are repeated, $\Delta t_D$ follows $\Delta t_{QB,I}$; once $\Delta t_{I,Q}$, $\Delta t_{Q,IB}$, and $\Delta t_{IB,QB}$ are all equal to $\Delta t_D$, the skew has been corrected.

Fig. 4. (a) Block diagram of loop filter with adaptive delay gain controller and (b) example operation of loop filter.
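The round-robin sequence described above can be illustrated with a small behavioral model. The following Python sketch is not the fabricated circuit: it assumes ideal DCDLs with the 0.3 ps resolution listed in Table I, a fixed delay gain of one code step, codes expressed as signed offsets from a mid-scale setting, and hypothetical input skews and starting value for $\Delta t_D$. The names `run_qsc`, `bbpd`, `intervals`, and `worst_error` are introduced only for this example.

```python
T_PS = 1000.0 / 3.0   # period of a 3 GHz clock in ps (T/4 = 83.3 ps)
RES_PS = 0.3          # DCDL resolution in ps, taken from Table I


def bbpd(a_ps, b_ps):
    """Bang-bang phase detector: +1 if a is larger than b, otherwise -1."""
    return 1 if a_ps > b_ps else -1


def intervals(t_q, t_ib, t_qb):
    """Intervals (I,Q), (Q,IB), (IB,QB), (QB,I) with the I_OUT edge at t = 0."""
    return (t_q, t_ib - t_q, t_qb - t_ib, T_PS - t_qb)


def worst_error(iv):
    """Largest deviation of any interval from T/4, in ps."""
    return max(abs(v - T_PS / 4) for v in iv)


def run_qsc(skew_ps, rounds=1000, gain=1, dt_d_start_ps=75.0):
    """Round-robin correction with a fixed delay gain (no gain adaptation)."""
    # Uncorrected Q, IB, QB edge positions; CODE_I is held fixed.
    base = [T_PS / 4 + skew_ps[0], T_PS / 2 + skew_ps[1], 3 * T_PS / 4 + skew_ps[2]]
    code = [0, 0, 0]                         # CODE_Q, CODE_IB, CODE_QB (signed offsets)
    code_1 = round(dt_d_start_ps / RES_PS)   # CODE_1 sets dt_D (assumed start value)

    def edge(i):
        return base[i] + code[i] * RES_PS

    for _ in range(rounds):
        # 1) QB_OUT and I_OUT selected: move dt_D toward dt_QB,I.
        code_1 += gain * bbpd(T_PS - edge(2), code_1 * RES_PS)
        dt_d = code_1 * RES_PS
        # 2) I_OUT and Q_OUT selected: move dt_I,Q toward dt_D.
        code[0] += gain * bbpd(dt_d, edge(0))
        # 3) Q_OUT and IB_OUT selected: move dt_Q,IB toward dt_D.
        code[1] += gain * bbpd(dt_d, edge(1) - edge(0))
        # 4) IB_OUT and QB_OUT selected: move dt_IB,QB toward dt_D.
        code[2] += gain * bbpd(dt_d, edge(2) - edge(1))

    return intervals(edge(0), edge(1), edge(2)), code_1 * RES_PS


skews = (10.0, -6.0, 8.0)   # hypothetical edge errors in ps for Q, IB, QB
before = worst_error(intervals(T_PS / 4 + skews[0], T_PS / 2 + skews[1],
                               3 * T_PS / 4 + skews[2]))
after_iv, dt_d = run_qsc(skews)
print(f"worst interval error: {before:.1f} ps -> {worst_error(after_iv):.2f} ps")
print(f"dt_D settles near {dt_d:.1f} ps (T/4 = {T_PS / 4:.1f} ps)")
```

Because every comparison is referred to the same $\Delta t_D$, the model needs only one phase detector, which reflects the single shared loop of the architecture. With a fixed unit gain the number of rounds grows with the initial skew, which is the cost that the adaptive delay gain controller is meant to reduce.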

# III. LOOP FILTER WITH ADAPTIVE DELAY GAIN CONTROLLER

Fig. 4(a) shows the loop filter of the QSC, which includes a loop controller and an adaptive delay gain controller. The loop controller consists of an adder, register set<sub>1</sub> and register set<sub>2</sub>, which store the code of each DCDL and the previous outputs of the phase detector, and a timing controller (T-CON), which selects pairs of clock signals for comparison. The adaptive delay gain controller has four shift registers for adjusting the delay gains of DCDL<sub>Q</sub>, DCDL<sub>IB</sub>, DCDL<sub>QB</sub>, and DCDL<sub>1</sub> (G<sub>Q</sub>, G<sub>IB</sub>, G<sub>QB</sub>, and G<sub>1</sub>, respectively), logic gates for determining whether to change the delay gains, and a lock detector.

Fig. 4(b) illustrates the operation of the loop filter, using an example in which $\Delta t_{I,Q}$ is modified. The loop filter initiates the modification of CODE<sub>Q</sub> and G<sub>Q</sub> by asserting the SEL signal. The initial delay gains, including G<sub>Q</sub>, are set at the first iteration. The current output of the phase detector (PD<sub>OUT,I,Q</sub>), which is the result of comparing the current $\Delta t_{I,Q}$ with $\Delta t_D$, is compared with its previous value (PD<sub>PREV,I,Q</sub>), which was stored in the loop controller, using the XOR gate. If PD<sub>OUT,I,Q</sub> is not equal to PD<sub>PREV,I,Q</sub>, which means that the difference between $\Delta t_{I,Q}$ and $\Delta t_D$ is less than the product of G<sub>Q</sub> and the resolution of DCDL<sub>Q</sub>, the shift register decrements G<sub>Q</sub>; otherwise, G<sub>Q</sub> is retained. The loop controller then updates CODE<sub>Q</sub> to bring $\Delta t_{I,Q}$ closer to $\Delta t_D$ by adding G<sub>Q</sub> to, or subtracting it from, the previous CODE<sub>Q</sub>. Since the amounts of skew between the clocks are not equal, the adaptive delay gain controller and the loop controller perform the same processes for G<sub>IB</sub>, G<sub>QB</sub>, and G<sub>1</sub> and for CODE<sub>IB</sub>, CODE<sub>QB</sub>, and CODE<sub>1</sub> individually to cut down the correction time for each clock. By repeating these operations, the QSC quickly brings the intervals between the quadrature clocks close to $\Delta t_D$ at the beginning of the operation, while also bringing $\Delta t_D$ closer to T/4, and then reduces the residual skew by gradually decreasing the delay gains. When all the delay gains reach the minimum value (G<sub>min</sub>), which means that the amounts of skew are acceptably low, the lock detector asserts a lock signal (Lock), and the adaptive delay gain controller stops changing the delay gains. Afterward, the QSC continues to detect and correct skew with the minimum gain to cope with small skew changes caused by noise from the clock distribution.

Fig. 5. Changes of $\Delta t_{I,Q}$ and $\Delta t_D$ (a) with and (b) without adaptive delay gain controller from simulation.

Fig. 6. Block diagram of prototype chip.
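The gain adaptation for a single delay code can be sketched as follows. This is a simplified behavioral model rather than the authors' logic: it assumes that the shift register halves the gain on each polarity change (a right shift), that the initial gain is 32 code steps, and that lock is declared when the gain first reaches its minimum. The function name `correct` and all numeric settings except the 0.3 ps resolution are hypothetical.

```python
RES_PS = 0.3   # DCDL resolution in ps (Table I)


def correct(target_ps, g_init=32, g_min=1, adaptive=True, max_iter=100_000):
    """Drive code * RES_PS toward target_ps; return (updates, residual_ps)."""
    code = 0
    gain = g_init if adaptive else g_min
    pd_prev = None                       # PD_PREV: previous phase-detector output
    for n in range(1, max_iter + 1):
        # PD_OUT: +1 means the delay is still too short, -1 means too long.
        pd_out = 1 if code * RES_PS < target_ps else -1
        # XOR of PD_OUT and PD_PREV: a change of polarity.
        flipped = pd_prev is not None and pd_out != pd_prev
        if flipped:
            # The error is now below gain * RES_PS, so shift the gain down
            # (assumed halving), but never below g_min.
            gain = max(gain >> 1, g_min)
        code += pd_out * gain            # add or subtract the gain
        pd_prev = pd_out
        if flipped and gain == g_min:    # lock detector: gain has reached G_min
            return n, abs(target_ps - code * RES_PS)
    raise RuntimeError("did not lock")


n_adapt, err_adapt = correct(21.2, adaptive=True)
n_fixed, err_fixed = correct(21.2, adaptive=False)
print(f"adaptive gain:     {n_adapt} updates, residual {err_adapt:.2f} ps")
print(f"minimum gain only: {n_fixed} updates, residual {err_fixed:.2f} ps")
```

In this model both runs end within one resolution step of the target, but the adaptive run needs far fewer updates because the large initial gain covers most of the distance. The ratio obtained here depends on the assumed initial gain and target and should not be read as the measured factor of 3.9.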

In previous studies that applied SAR algorithms to improve correction time [10,11], each bit of the delay-line code is fixed once it is determined. If an error occurs while an upper bit is being determined, owing to factors such as noise, it cannot be corrected in subsequent operations. In contrast, the method proposed in this paper can correct an incorrect decision through subsequent operations while still improving correction time.
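The difference in error recovery can be illustrated with a toy comparison. The sketch below uses a generic textbook SAR search, not the circuits of [10] or [11], and a simplified gain-halving bang-bang loop; the 7-bit code width, the initial gain of 32, the 21.2 ps target, and the `bad_steps` mechanism for inverting a single comparator decision are all assumptions made for this example.

```python
RES_PS = 0.3      # DCDL resolution in ps (Table I)
N_BITS = 7        # hypothetical code width, giving codes 0..127


def compare(code, target_ps, wrong):
    """+1 if the delay is too short, -1 otherwise; inverted when 'wrong' is set."""
    out = 1 if code * RES_PS < target_ps else -1
    return -out if wrong else out


def sar_search(target_ps, bad_steps=()):
    """Textbook SAR: test bits from MSB to LSB; a decided bit is never revisited."""
    code = 0
    for step, bit in enumerate(reversed(range(N_BITS))):
        trial = code | (1 << bit)
        if compare(trial, target_ps, step in bad_steps) > 0:
            code = trial                 # the bit is kept for good
    return code


def adaptive_search(target_ps, bad_steps=(), g_init=32, g_min=1, max_iter=1000):
    """Bang-bang updates whose gain is halved on each polarity change."""
    code, gain, prev = 0, g_init, None
    for step in range(max_iter):
        out = compare(code, target_ps, step in bad_steps)
        flipped = prev is not None and out != prev
        if flipped:
            gain = max(gain >> 1, g_min)
        # Apply the update and clamp to the valid code range.
        code = min(max(code + out * gain, 0), (1 << N_BITS) - 1)
        prev = out
        if flipped and gain == g_min:    # gain reached its minimum: lock
            return code
    raise RuntimeError("did not lock")


target = 21.2   # ps
for name, search in (("SAR", sar_search), ("adaptive", adaptive_search)):
    clean = abs(target - search(target) * RES_PS)
    noisy = abs(target - search(target, bad_steps=(0,)) * RES_PS)
    print(f"{name}: residual {clean:.2f} ps clean, "
          f"{noisy:.2f} ps with the first decision inverted")
```

In this toy model, a wrong first decision leaves the SAR search several resolution steps away from the target, because the most significant bit cannot be revisited. The bang-bang loop loses some time but keeps stepping toward the target, so its residual stays within one resolution step.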

Fig. 5 shows the changes of $\Delta t_{I,Q}$ and $\Delta t_D$ with and without the adaptive delay gain controller from a simulation of the QSC correcting 3 GHz quadrature clocks with a bandwidth of 0.3 GHz. In Fig. 5(a), the adaptive delay gain controller initially outputs the largest delay gains (the initial delay gains). Subsequently, it outputs smaller delay gains, determined by sensing changes in the polarity of the difference between $\Delta t_{I,Q}$ and $\Delta t_D$. The loop controller reduces the difference between $\Delta t_{I,Q}$ and $\Delta t_D$, while $\Delta t_D$ approaches T/4 (83.3 ps). As a comparison of Fig. 5(a) and (b) shows, the adaptive delay gain controller enables fast correction by the QSC.

# IV. MEASUREMENT RESULTS

Fig. 6 shows the block diagram of the prototype chip, which consists of an IQ divider (IQ DIV), the QSC, a counter, a quarter-rate single-ended 1-tap DFE, and a reference voltage generator ($V_{ref}$ gen.). The IQ divider generates quadrature clock signals by dividing the external clock by two. The outputs of the QSC are applied to the DFE to substantiate the effectiveness of the QSC in memory interfaces. To measure the correction time, the counter counts oscillations of the clock signals until Lock is asserted.

Fig. 7. Measurement setup, die photograph and layout of prototype chip.

Fig. 8. Measured waveforms of quadrature clocks (a) before and (b) after QSC operation.

Fig. 7 shows the measurement setup, the die photograph, and the layout of the prototype chip. A bit-error-rate (BER) tester (Anritsu MP1800A) supplies 12 Gb/s PRBS7 data and differential 6 GHz clock signals to the prototype chip and measures the BER before and after the operation of the DFE and the QSC. The channel for data transmission consists of an SMA cable, an SMA connector, and an FR4 PCB trace. The quadrature clocks before and after QSC operation are observed using an oscilloscope (Tektronix MSO 73304DX). The prototype chip was fabricated in a 65 nm CMOS process, and the active area of the QSC is 0.0274 mm<sup>2</sup>.

Fig. 8 shows the waveforms of the 3 GHz quadrature clocks before and after QSC operation. The QSC corrected the skew to 0.8 ps. Fig. 9(a) and (b) show 12 cases of the skew of the 3 GHz quadrature clocks before and after QSC operation. Input skews ranging from -21 ps to 14.8 ps are all corrected to within 0.8 ps. Fig. 9(c) shows the corresponding correction times with and without the adaptive delay gain controller, which reduced the correction time by an average factor of 3.9. The QSC consumes a total of 6.45 mW at 3 GHz.

Fig. 10 shows the measured BER curves of the DFE with and without QSC operation for an input skew of 13.2 ps. With a 12 Gb/s PRBS7 input, the DFE without QSC operation achieves a BER of only $10^{-5}$, whereas the DFE with QSC operation achieves a BER of $10^{-12}$ with an eye width of 140 mUI under the same conditions. These results show the effectiveness of the QSC in memory interfaces.

Fig. 9. Measured skew (a) before and (b) after QSC operation, and (c) distribution of correction times with and without adaptive delay gain controller.

Fig. 10. Measured BER curves of DFE with and without QSC.

Table I compares the performance of previous skew correctors with that of our QSC. The FoM is the product of the resolution of the delay unit and the correction time normalized to the clock period; it indicates how far the trade-off between accuracy and correction time has been relaxed [7].

# V. CONCLUSION

We have presented a QSC with an adaptive delay gain controller for memory interfaces. The adaptive delay gain controller compares the amount of skew between the clocks with the product of the delay gain and the resolution of the DCDL and adjusts the delay gain accordingly. For fast correction, the delay gain of each DCDL is adjusted individually. Our QSC achieves low residual skew and fast correction using the adaptive delay gain controller, which reduces the correction time by a factor of 3.9 compared with operation without it. The QSC achieves a residual skew of 0.8 ps and consumes 6.45 mW at 3 GHz. Using the QSC, the quarter-rate single-ended DFE achieves a BER of $10^{-12}$ at 12 Gb/s, verifying the effectiveness of the QSC in memory interfaces.

TABLE I. PERFORMANCE COMPARISON WITH OTHER SKEW CORRECTORS FOR MEMORY INTERFACES

|                           | [9]                  | [10]                                    | [11]                                                    | This work                                       |
|---------------------------|----------------------|-----------------------------------------|---------------------------------------------------------|-------------------------------------------------|
| Process (nm)              | 130                  | 65                                      | 40                                                      | 65                                              |
| Frequency (GHz)           | 0.4-0.8              | 0.9-1.1                                 | 0.8-2.3                                                 | 3                                               |
| Architecture              | MDLL                 | QSC                                     | QSC                                                     | QSC                                             |
| Loop filter               | Four 90° controllers | Duty cycle corrector with SAR algorithm | Single loop with minimum delay tracking & SAR algorithm | Single loop with adaptive delay gain controller |
| Correction range (ps)     | 34.6 @ 0.8 GHz       | 40 @ 1 GHz                              | 101.6 @ 2.3 GHz                                         | 21.2 @ 3 GHz                                    |
| Resolution (ps)           | 5.08                 | 5                                       | 1.23                                                    | 0.3                                             |
| Residual skew (ps)        | 6.25                 | 5                                       | 2.1                                                     | 0.8                                             |
| Correction time (ns)      | 93.75                | 56                                      | 500                                                     | 71.5                                            |
| Power efficiency (mW/GHz) | 4.13                 | 2.6                                     | 3.87                                                    | 2.15                                            |
| FoM<sup>a</sup> (ns)      | 0.38                 | 0.28                                    | 0.76                                                    | 0.07                                            |

a. Figure of Merit (FoM) = resolution · (correction time / clock period)
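As a worked illustration of footnote a, the FoM entries for [9] and [10] can be reproduced from the resolution and correction time in Table I. This assumes clock periods of 1.25 ns (0.8 GHz) for [9] and 1 ns (1 GHz) for [10], which are the frequencies at which their correction ranges are quoted; the choice of these frequencies is an assumption made here. The power efficiency of this work follows from the measured 6.45 mW at 3 GHz.

$$
\begin{aligned}
\mathrm{FoM}_{[9]} &= 5.08\ \mathrm{ps} \times \frac{93.75\ \mathrm{ns}}{1.25\ \mathrm{ns}} = 381\ \mathrm{ps} \approx 0.38\ \mathrm{ns},\\
\mathrm{FoM}_{[10]} &= 5\ \mathrm{ps} \times \frac{56\ \mathrm{ns}}{1\ \mathrm{ns}} = 280\ \mathrm{ps} = 0.28\ \mathrm{ns},\\
\text{Power efficiency (this work)} &= \frac{6.45\ \mathrm{mW}}{3\ \mathrm{GHz}} = 2.15\ \mathrm{mW/GHz}.
\end{aligned}
$$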

# ACKNOWLEDGMENT

This work was supported by the Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea Government (MSIP) (No. 2020-0-01300, Development of AI-specific Parallel High-speed Memory Interface).

# REFERENCES

- [1] K. Ha *et al.*, "A 7.5Gb/s/pin LPDDR5 SDRAM with WCK clocking and non-target ODT for high speed and with DVFS, internal data copy, and deep-sleep mode for low power," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Digest Tech. Papers*, pp. 378-380, February 2019.
- [2] J. Kim and S. Han, "A low-power fast-lock DCC with a digital duty-cycle adjuster for LPDDR3 and LPDDR4 DRAMs," *Institute of Electronics, Information and Communication Engineers*, vol. 15, no. 7, pp. 1-9, March 2018.
- [3] M. Kossel *et al.*, "A 10 Gb/s 8-tap 6b 2-PAM/4-PAM Tomlinson-Harashima precoding transmitter for future memory-link applications in 22-nm SOI CMOS," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3268-3284, December 2013.
- [4] J.-H. Chae, H. Ko, J. Park and S. Kim, "A quadrature clock corrector for DRAM interfaces, with a duty-cycle and quadrature phase detector based on a relaxation oscillator," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 27, no. 4, pp. 978-982, April 2019.
- [5] Y. J. Kim, K. Song and S. H. Cho, "A 2.3-mW 0.01-mm<sup>2</sup> 1.25-GHz quadrature signal corrector with 1.1-ps error for mobile DRAM interface in 65-nm CMOS," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 27, no. 4, pp. 397-401, April 2017.
- [6] D. Kim *et al.*, "A 1.1V 1ynm 6.4Gb/s/pin 16Gb DDR5 SDRAM with a phase-rotator-based DLL, high-speed SerDes and RX/TX equalization scheme," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Digest Tech. Papers*, pp. 380-382, February 2019.
- [7] S. Kim, X. Jin, J. Chun and K. Kwon, "A digital DLL with 4-cycle lock time and 1/4 NAND-delay accuracy," in *IEEE Asian Solid-State Circuits Conf. (ASSCC)*, pp. 1-4, November 2015.
- [8] H. Kang *et al.*, "Process variation tolerant all-digital 90° phase shift DLL for DDR3 interface," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 59, no. 10, pp. 2186-2196, October 2012.
- [9] K. Ryu, D. Jung and S. Jung, "Process-variation-calibrated multiphase delay locked loop with a loop-embedded duty cycle corrector," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 61, no. 1, pp. 1-5, January 2014.
- [10] J. Cho and Y.-J. Min, "An all-digital duty-cycle and phase-skew correction circuit for QDR DRAMs," *Institute of Electronics, Information and Communication Engineers*, vol. 15, no. 9, pp. 1-6, May 2018.
- [11] S. Shin, H. Ko, S. Jang, D. Kim and D. Jeong, "A 0.8-to-2.3GHz quadrature error corrector with correctable error range of 101.6ps using minimum total delay tracking and asynchronous calibration on-off scheme for DRAM interface," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Digest Tech. Papers*, pp. 340-342, February 2020.