## 266–2133 MHz phase shifter using all-digital delay-locked loop and triangular-modulated phase interpolator for LPDDR4X interface

J.-H. Chae, M. Kim, H. Ko, Y. Jeong, J. Park, G.-M. Hong, D.-K. Jeong and S. Kim $^{\boxtimes}$ 

A 266–2133 MHz phase shifter is proposed for the LPDDR4X interface, utilising an all-digital delay-locked loop (DLL) and a triangular-modulated phase interpolator (PI) to reduce jitter and improve linearity. The DLL comprises a global DLL, which assists fast locking, and a local DLL, which uses an adaptive-window phase detector and a folded delay line to reduce jitter and improve linearity. The PI uses a triangular-modulated clock waveform to achieve good linearity over a wide frequency range. The prototype chip is implemented in a 65 nm CMOS process. The measured differential non-linearity of the phase shifter is <0.91 LSB at 2133 MHz, and the power efficiency is about 2.7 mW/GHz.

*Introduction:* As the use of mobile devices grows rapidly, the demand for high-bandwidth mobile memory such as LPDDR4X is increasing. Recently, various techniques such as $C_{I/O}$ minimisation [1] and a dual-loop phase-locked loop (PLL) [2] have been introduced to increase the data rate of the memory interface. To improve the timing margin over a wide frequency range and increase the maximum achievable data rate, however, the jitter and linearity of the phase shifter should also be enhanced.

A PLL-based phase shifter can reduce jitter, but requires a large number of clock (CLK) distribution lines in a multi-channel architecture, which consumes a lot of power and occupies a large area [3]. A phase shifter using a delay line (DL) achieves good power efficiency and linearity, but it makes 1UI-based phase shifting difficult, and its power consumption increases at low frequencies [4]. The injection-locked oscillator (ILO)-based phase shifter is small in area and has good linearity, but it is not suitable for applications with a wide frequency range because of its inherently narrow locking range [5].

In this Letter, we propose a 266–2133 MHz phase shifter using an all-digital delay-locked loop (DLL) and a phase interpolator (PI) that supports training operations for the LPDDR4X interface. To reduce jitter and improve linearity over a wide frequency range, the DLL uses an adaptive-window phase detector (PD) and a folded DL, and the PI uses triangular modulation.

*LPDDR4X interface architecture including phase shifter:* In the LPDDR4X interface, most of the circuits performing complicated operations, including training, are accommodated in the controller. Fig. 1 shows a block diagram of the LPDDR4X interface for the controller, including the phase shifter. A 266–2133 MHz CLK signal is produced by a CLK generator and is distributed to a global DLL and a local DLL. The global DLL is used to assist fast locking of the local DLL. The phase shifter consists of the 180° phase-shift local DLL and the PI. The local DLL generates multiphase CLK signals, and the PI controls the DQ/DQS phase in steps of 1/64UI. A digitally-controlled delay line compensates for the difference in delay between the DQ and DQS paths. A DQS generating unit, a 2:1 serialiser, a driver (DRV), and a continuous-time linear equaliser (CTLE) transmit and receive the differential DQS; a DQ generating unit, a 16:1 serialiser, the DRV, and the CTLE transmit and receive the single-ended DQ.
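To give a feel for the 1/64UI resolution, the following minimal Python sketch converts the PI step into picoseconds at both ends of the frequency range. It assumes that one UI equals half a CLK period (double-data-rate signalling, consistent with the 180° local DLL); this mapping and the function name `pi_step_ps` are assumptions for illustration and are not stated in the Letter.

```python
def pi_step_ps(f_clk_mhz, steps_per_ui=64, ui_per_clk_period=0.5):
    """Return the PI phase step in picoseconds.

    ui_per_clk_period=0.5 is an ASSUMPTION: one UI is taken to be half a
    CLK period (double data rate). steps_per_ui=64 follows the 1/64UI step.
    """
    clk_period_ps = 1e6 / f_clk_mhz          # 1 MHz -> 1e6 ps period
    return clk_period_ps * ui_per_clk_period / steps_per_ui


if __name__ == "__main__":
    for f_mhz in (266, 2133):
        print(f"{f_mhz} MHz: {pi_step_ps(f_mhz):.2f} ps per PI step")
```

Under this assumption the step is roughly 29.4 ps at 266 MHz and roughly 3.7 ps at 2133 MHz, which illustrates why linearity at the top of the range is the harder requirement.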



Fig. 1 Block diagram of LPDDR4X interface for controller including phase shifter

*Implementation and operation of the proposed DLL and PI:* A block diagram of the global DLL and the local DLL is shown in Fig. 2*a*. The global DLL has an open-loop coarse time-to-digital converter (TDC) architecture, and a 180° phase-shift lock is completed within a few CLK cycles through an asynchronous one-time lock. When the global DLL has locked, a lock code is transmitted to the local DLL, and all the circuits in the global DLL are powered down.
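The idea of a one-time, open-loop coarse lock can be sketched behaviourally as follows. This is only a simplified model under stated assumptions, not the circuit of Fig. 2*a*: the uniform cell delay of 60 ps, the 64-cell line length, and the function name `coarse_lock_code` are all hypothetical values chosen for illustration.

```python
def coarse_lock_code(f_clk_mhz, cell_delay_ps=60.0, n_cells=64):
    """Return the first coarse code whose delay reaches half a CLK period.

    Behavioural stand-in for an open-loop TDC measurement: the code is the
    number of (hypothetical, uniform) coarse cells needed to span 180 degrees.
    cell_delay_ps and n_cells are ASSUMED values, not taken from the Letter.
    """
    target_ps = 0.5 * 1e6 / f_clk_mhz        # 180 degrees = half a period
    for code in range(1, n_cells + 1):
        if code * cell_delay_ps >= target_ps:
            return code
    raise ValueError("delay line too short for this CLK frequency")


def residual_error_ps(f_clk_mhz, cell_delay_ps=60.0, n_cells=64):
    """Delay overshoot left after the coarse lock, for a fine loop to remove."""
    code = coarse_lock_code(f_clk_mhz, cell_delay_ps, n_cells)
    return code * cell_delay_ps - 0.5 * 1e6 / f_clk_mhz
```

In this model the residual error is always smaller than one coarse cell, which is the part that a closed-loop tracking stage would then have to remove.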



Fig. 2 Global DLL and local DLL

a Block diagram

b Operation of folded DL



Fig. 3 Locking and tracking operation of the local DLL and the adaptive-window PD under noisy environment

On receiving the delay code from the global DLL, the local DLL immediately enters a coarse lock state. It then tracks the input CLK phase continuously to compensate for voltage and temperature variations, as well as mismatches between the global and local DLLs. The local DLL generates the multiphase CLK signals (CLK<sub>0</sub>, CLK<sub>90</sub>, and CLK<sub>180</sub>) and transmits them to the PI.

The folded DL has coarse and fine DLs. A conventional DL consisting of a coarse DL and a fine DL can cause a glitch or signal distortion at the moment the coarse delay code (CDC) changes, because the entire fine delay code changes at the same time. This increases jitter and reduces linearity, thus reducing the timing margin and possibly causing WRITE and READ errors. Fig. 2b shows the operation of our folded DL, which has two delay codes for the same delay time. If the local DLL is initially locked near the change point of the CDC in the folded DL, the DLF in the DLL changes the delay code to the other code, further away from the change point of the CDC, suppressing both the glitch and the signal distortion.

Fig. 3 shows the locking and tracking operations of the local DLL and the adaptive-window PD in a noisy environment. Three RFDLs form the phase sampling window, and the 3-bit window controller adjusts the window width in response to the lock state of the local DLL. Two DFFs and the CTRL logic issue WUP and WDN signals, and the counter (CNT) and majority voter decide one of three states: UP, DN, and HOLD. If some combination of supply/ground noise, input CLK jitter, and supply droop due to large dynamic current consumption causes the local DLL to lose lock, the CTRL logic determines that the lock is broken, and the window controller widens the phase sampling window. Once the locked state has been re-entered, the phase sampling window is narrowed. Through this repetitive locking process, the adaptive-window PD adjusts its phase sampling window depending on the input CLK jitter, UP/DN dithering, and the supply/ground noise, and this improves the jitter performance of the DLL. To compare the jitter performance of this DLL with that of a DLL using a bang-bang PD (BBPD), a BBPD mode is also supported, and the CTRL logic issues a BBUPDN signal to control this mode of operation.
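The decision flow of the adaptive-window PD can be illustrated with a small behavioural sketch. It is a simplified model under assumptions, not the implemented logic: the sign convention (a late edge maps to DN), the one-step widen/narrow rule, and the names `sample_pd`, `majority_vote`, and `update_window` are hypothetical; only the three states UP/DN/HOLD, the majority vote, and the 3-bit window range come from the description above.

```python
from collections import Counter


def sample_pd(phase_err_ps, half_window_ps):
    """Classify one phase sample against the sampling window.

    ASSUMED convention: positive error means the output edge is late,
    so the delay should be reduced (DN).
    """
    if phase_err_ps > half_window_ps:
        return "DN"
    if phase_err_ps < -half_window_ps:
        return "UP"
    return "HOLD"                      # edge falls inside the window


def majority_vote(samples):
    """Return UP or DN only if it holds a strict majority, else HOLD."""
    counts = Counter(samples)
    for state in ("UP", "DN"):
        if counts[state] > len(samples) / 2:
            return state
    return "HOLD"


def update_window(code, decision, max_code=7):
    """Adapt a 3-bit window code: narrow while locked, widen when lock breaks.

    Treating any UP/DN decision as 'lock broken' is an ASSUMPTION made to
    keep the sketch short.
    """
    if decision == "HOLD":
        return max(code - 1, 0)
    return min(code + 1, max_code)
```

For example, `majority_vote(["UP", "UP", "DN", "HOLD", "UP"])` returns `"UP"`, whereas a split such as `["UP", "DN", "HOLD", "HOLD"]` returns `"HOLD"`, so isolated noisy samples do not move the delay code.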

To improve the linearity of the PI over a wide frequency range, a high slew rate is required between adjacent interpolating CLK phases, but this is difficult to achieve with a conventional PI. This difficulty is overcome by introducing a triangular waveform generator (TWG) into the PI, as shown in Fig. 4. The TWG uses a programmable current source and a capacitor array to convert the square waveform of the CLK signal into a triangular waveform with a high slew rate over a wide frequency range. Moreover, since the TWG generates a high slew-rate CLK signal, phase interpolation can be performed smoothly even if only three CLK signals at 90° intervals are transmitted from the local DLL, which reduces the complexity, power consumption, and area.
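Why a triangular waveform helps linearity can be seen in an idealised numerical sketch. It interpolates two waveforms 90° apart with weights (1 − w) and w, finds the zero crossing of the weighted sum, and computes the worst-case DNL over 16 steps. The ideal triangle, the sinusoid used as a non-triangular reference, the 16-step resolution, and all function names are assumptions for illustration; the sketch does not model the current source, capacitor array, or any circuit non-ideality.

```python
import math


def triangle(theta):
    """Unit-amplitude triangle wave, period 2*pi, rising through 0 at 0."""
    x = (theta / (2 * math.pi) + 0.25) % 1.0
    return 4 * x - 1 if x < 0.5 else 3 - 4 * x


def crossing(wave, w):
    """Zero crossing in [0, pi/2] of (1-w)*wave(t) + w*wave(t - pi/2).

    Found by bisection; the sum is negative at t=0 and positive at t=pi/2
    for both waveforms used here.
    """
    lo, hi = 0.0, math.pi / 2
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if (1 - w) * wave(mid) + w * wave(mid - math.pi / 2) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def max_abs_dnl(wave, steps=16):
    """Worst-case |DNL| in LSB when w sweeps 0..1 in equal steps."""
    phases = [crossing(wave, k / steps) for k in range(steps + 1)]
    lsb = (math.pi / 2) / steps
    return max(abs((b - a) / lsb - 1) for a, b in zip(phases, phases[1:]))


if __name__ == "__main__":
    print("triangle:", max_abs_dnl(triangle))
    print("sinusoid:", max_abs_dnl(math.sin))
```

With ideal ramps that overlap across the whole 90° interval, the crossing moves in exact proportion to w, so the DNL is zero up to rounding; the sinusoidal reference shows a clearly non-zero DNL because its crossing follows an arctangent of the weight ratio.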



Fig. 4 Block diagram of triangular-modulated PI

*Measurement results:* The prototype chip is fabricated in a 65 nm CMOS process. Fig. 5 shows the measurement setup and die photograph. The input CLK signal is provided by a CLK source, and the output CLK signal is captured by an oscilloscope. To check the fast-lock and jitter characteristics, the DLL was measured alone. Fig. 6a demonstrates that the global DLL assists fast locking: the local DLL achieves a 180° phase-shift lock within two cycles at 266 MHz and within seven cycles at 2133 MHz. Fig. 6b compares the two PDs at 2133 MHz: with the BBPD, the jitter is $14.78 \text{ ps}_{rms}/63.59 \text{ ps}_{pp}$, whereas with the adaptive-window PD it is $3.08 \text{ ps}_{rms}/19.93 \text{ ps}_{pp}$. Fig. 7a compares results at the CDC change point for the DLL with a conventional DL, which causes signal distortion, and with a folded DL. The measured differential non-linearity (DNL) curve of the PI is shown in Fig. 7b. The DNL is <0.90 LSB at 266 MHz and <0.91 LSB at 2133 MHz. The power efficiency of the proposed phase shifter is 2.7 mW/GHz. Table 1 shows a performance comparison with other phase shifters.
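The reported figures can be put on a common footing with a few lines of arithmetic. The sketch below converts the lock-time bounds from cycles to nanoseconds and computes the jitter reduction ratios from the numbers quoted above; the helper names are hypothetical, and the results are derived quantities rather than additional measurements.

```python
def lock_time_ns(cycles, f_clk_mhz):
    """Upper bound on lock time in ns for a lock achieved within `cycles`."""
    return cycles * 1e3 / f_clk_mhz


def reduction_ratio(before, after):
    """How many times smaller `after` is than `before`."""
    return before / after


if __name__ == "__main__":
    print(f"266 MHz:  <= {lock_time_ns(2, 266):.2f} ns")
    print(f"2133 MHz: <= {lock_time_ns(7, 2133):.2f} ns")
    print(f"rms jitter reduction: {reduction_ratio(14.78, 3.08):.2f}x")
    print(f"p-p jitter reduction: {reduction_ratio(63.59, 19.93):.2f}x")
```

These work out to lock-time bounds of about 7.5 ns and 3.3 ns, and to jitter reductions of about 4.8 times (rms) and 3.2 times (peak-to-peak) for the adaptive-window PD relative to the BBPD.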



Fig. 5 Measurement setup and die photograph with magnified layout of phase shifter (local DLL and PI)



Fig. 6 Measured DLL performance

a Measured lock time using global DLL at 266 and 2133 MHz

b Measured jitter performance at 2133 MHz of local DLL using BBPD and adaptive-window PD



Fig. 7 Measured DL in the DLL and phase shifter performance

a Measured conventional and folded DL at CDC change point

b Measured DNL curve of phase shifter

Table 1: Performance comparison with other designs

|                 | [3]           | [4]      | [5]                  | This work            |
|-----------------|---------------|----------|----------------------|----------------------|
| Process         | 40 nm         | 45 nm    | 0.18 µm              | 65 nm                |
| Type            | PLL + PI      | DL + PI  | ILO                  | DLL + PI             |
| Frequency range | 1.35–2.15 GHz | 5 GHz    | 1–2 GHz              | 0.266–2.133 GHz      |
| DNL             | <1.06 LSB     | <1.3 LSB | —                    | <0.91 LSB            |
| Area            | —             | —        | 0.002 mm<sup>2</sup> | 0.007 mm<sup>2</sup> |

*Conclusion:* A 266–2133 MHz phase shifter using an all-digital DLL and a PI for the LPDDR4X interface is proposed. It uses a DLL that adopts an adaptive-window PD and a folded DL, together with a triangular-modulated PI, to reduce jitter and improve linearity over a wide frequency range. According to the measurement results, the proposed phase shifter achieves enhanced jitter and linearity performance.

© The Institution of Engineering and Technology 2017 Submitted: *10 April 2017* E-first: *15 May 2017* doi: 10.1049/el.2017.1291

One or more of the Figures in this Letter are available in colour online.

J.-H. Chae, M. Kim, H. Ko, Y. Jeong, J. Park, D.-K. Jeong and S. Kim (Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea)

□ E-mail: suhwan@snu.ac.kr

G.-M. Hong (SK Hynix, Icheon, Gyeonggi-do, Republic of Korea)

## References

- 1 Amirkhany, A., Wei, J., Mishra, N.K., *et al.*: 'A 12.8-Gb/s/link tri-modal single-ended memory interface', *J. Solid-State Circuits*, 2012, **47**, (4), pp. 911–925, doi: 10.1109/JSSC.2012.2185369
- 2 Wu, T., Shi, X., Kaviani, K., *et al.*: 'Clocking Circuits for a 16 Gb/s memory interface'. IEEE CICC, San Jose, USA, September 2008, pp. 435–438
- 3 Leibowitz, L., Palmer, R., Poulton, J., *et al.*: 'A 4.3 GB/s mobile memory interface with power-efficient bandwidth scaling', *J. Solid-State Circuits*, 2010, **45**, (4), pp. 889–898, doi: 10.1109/JSSC.2010.2040230
- 4 Dickson, T.O., Liu, Y., Rylov, S.V., *et al.*: 'An 8x 10-Gb/s source-synchronous I/O system based on high-density silicon carrier interconnects', *J. Solid-State Circuits*, 2012, **47**, (4), pp. 884–896, doi: 10.1109/JSSC.2012.2185184
- 5 Cai, L.J., Xiao, P.Y., and Wen, Q.S.: '1-2 GHz 2 mW injection-locked ring oscillator based phase shifter in 0.18 μm CMOS technology', *Electron. Lett.*, 2016, **52**, (22), pp. 1858–1860, doi: 10.1049/el.2016.2501