# Low-Jitter Multi-phase Clock Generation: A Comparison between DLLs and Shift Registers

Xiang Gao, Eric A.M. Klumperink, Bram Nauta

CTIT Research Institute, IC Design Group, University of Twente, 7500AE, Enschede, The Netherlands

E-mail: X.Gao@utwente.nl

*Abstract*: This paper shows that, for a given power budget, a shift register based multi-phase clock generator (MPCG) generates less jitter than a delay-locked loop (DLL) equivalent when both are realized with current mode logic (CML) circuits and white noise is assumed. This is due to the fact that the shift register MPCG has no jitter accumulation from one clock phase to the other, as the DLL based MPCG has. For N-phase clock generation, the shift register MPCG needs a reference clock with N times higher frequency and thus requires a VCO with higher frequency than the DLL counterpart. However, we show that this does not lead to additional power consumption.

## I. INTRODUCTION

Multi-phase clocks are useful in many applications. In high-speed serial link applications [1], multi-phase clocks are used to process data streams at a bit rate higher than the internal clock frequencies. In wideband wireless communication systems like a software defined radio, multi-phase clocks can be used for cancelling unwanted harmonics and sidebands without using filters [2].

For multi-phase clock generation, delay-locked loops (DLLs) are often used [3]. A shift register can also be used to generate multi-phase clocks [2]. Compared with a DLL multi-phase clock generator (MPCG), a shift register based MPCG uses N times higher frequency for N-phase clock generation and at first glance seems to consume more power. However, a shift register MPCG does not have jitter accumulation from one clock phase to the next as a DLL equivalent does, which should be taken into account for a fair comparison.
This work compares MPCGs using a DLL and a shift register, primarily on the basis of their power and jitter performance. The rest of the paper is arranged as follows. Section II describes the architecture of a DLL MPCG and analyses its stochastic jitter. Section III examines the stochastic jitter of a shift register based MPCG. These two structures are then compared in Section IV. In Section V, the simulation results are presented, and Section VI draws the conclusions.

Figure 1. (a) DLL based MPCG architecture (b) CML delay unit schematic

## II. DLL BASED MPCG JITTER

### A. DLL Based MPCG Architecture

The architecture of a DLL based MPCG is shown in Fig.1(a). It consists of a voltage controlled delay line (VCDL), which has N identical delay units (DUs), and a control loop formed by a phase detector (PD), a charge pump (CP) and a loop filter (LF). In the DLL, a reference clock generated by a VCO, $CLK_{ref}$, with the wanted frequency f is propagated through the VCDL. The loop compares the phase of the last output of the VCDL, $CLK_N$, with $CLK_{ref}$ and controls the VCDL so that the total delay time is one reference clock period. Therefore, the outputs of the N DUs, $CLK_1 \sim CLK_N$, are multi-phase clocks with ideally $2\pi/N$ phase shift in between.

Because they offer better supply noise and substrate bounce rejection, current mode logic (CML) delay units are often used in DLL designs. To compare the output jitter of a DLL based MPCG and a shift register based MPCG, we assume CML circuits are used in both structures. The simplified schematic of a CML delay unit is shown in Fig.1(b). It is based on an NMOS source coupled pair driving the resistive load $R_L$ and biased by a current source $I_B$.
Due to the full switching of the tail current, the differential output swing $V_{SW}$ is determined by $R_L$ and $I_B$ as:

$$V_{SW} = I_B \cdot R_L \tag{1}$$

The amount of delay $t_d$ is primarily determined by the load resistance $R_L$ and the load capacitance $C_L$ at the output node. If $t_d$ is measured from the input clock crossing point to the output clock crossing point, it can be approximated as [4]:

$$t_d = \ln 2 \cdot R_L C_L = \ln 2 \cdot (V_{SW} / I_B) \cdot C_L \tag{2}$$

### B. DLL Based MPCG Output Jitter

In this work, we assume all noise sources are white and analyze absolute jitter performance, for simplicity. In a DLL based MPCG, there are three jitter sources: the reference clock, the VCDL and the PD/CP/LF control loop. The control loop jitter is usually relatively small [5] and thus ignored hereafter. It has been shown in [5][6] that for an optimal DLL design, its output jitter is defined by the reference clock and the jitter of a free-running VCDL. The DLL renders no improvement on the reference and VCDL jitter.

For a free-running VCDL, the jitter will accumulate from one delay unit to the next. If we define the jitter variance of one delay unit as $\sigma^2_{\Delta td,DU}$, the jitter variance on the output of the n<sup>th</sup> delay unit ($CLK_n$), $\sigma^2_{\Delta td,DLL,n}$, will be [5]:

$$\sigma_{\Delta td,DLL,n}^2 = n\sigma_{\Delta td,DU}^2 \tag{3}$$

For multi-phase clock applications like the poly-phase multi-path technique for harmonic rejection [2], the jitter of every clock phase is equally relevant. To measure the jitter performance of a group of clocks, the average jitter variance $\sigma^2_{\Delta td,DLL,avgN}$ is used. Averaging (3) over $n = 1 \ldots N$, $\sigma^2_{\Delta td,DLL,avgN}$ can be calculated as:

$$\sigma_{\Delta td,DLL,avgN}^2 = \frac{N+1}{2}\sigma_{\Delta td,DU}^2 \tag{4}$$

In (4), only the VCDL jitter is taken into account; the reference clock jitter will be discussed in Section IV.

## III. SHIFT REGISTER BASED MPCG JITTER

### A. Shift Register Based MPCG Architecture

The architecture of a shift register based MPCG is shown in Fig.2(a). It consists of a divide-by-N followed by a D-FlipFlop (DFF) chain with N DFFs<sup>1</sup>. In the shift register MPCG, a reference clock generated by a VCO, $CLK_{ref}$, with a frequency of $N \cdot f$, is fed into the divide-by-N and the DFF chain. The divide-by-N generates a clock at the frequency of interest f, which is then fed into the DFF chain at its D input. Since a DFF is sensitive to rising edges, the Q output of each DFF will be delayed from the previous DFF Q output by one period of $CLK_{ref}$, which is equivalent to a $2\pi/N$ phase delay at the wanted frequency f. Therefore, the output clocks $CLK_1 \sim CLK_N$ of the DFF chain are N-phase clocks with ideally $2\pi/N$ phase shift in between.

Figure 2. (a) Shift register based MPCG architecture (b) DFF block schematic

### B. Shift Register Based MPCG Output Jitter

For proper operation of a DFF, its D input signal must be stable before the input clock starts to switch. When the input clock switches, the logic value on the D node is transferred to the Q output. In other words, the timing of the DFF output is determined by the DFF input clock. The D input signal only acts as an "enabler" of a transition and will not affect the jitter of the DFF output.

In Fig.2(a), the divide-by-N is an "enabler" for the first DFF and will not affect the jitter of the DFFs. Therefore, the power consumption of the divide-by-N can be made small. In addition, it is not an indispensable block. Its function can be incorporated into the DFF chain, e.g., with a simple AND gate feedback as in [2]. Therefore, the jitter and power of the divide-by-N are not taken into account in the following calculations.

There are now two jitter sources in the shift register based MPCG: the reference clock and the DFF chain. Since the shift register MPCG is an open loop system, it offers no improvement on the reference and DFF chain jitter.
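
The phase relationship described in Section III.A can be checked with a small logic-level model. The sketch below is only an illustration under stated assumptions: the divide-by-N output is idealized as a 50% duty cycle signal sampled at each rising edge of $CLK_{ref}$, all DFFs are ideal (no delay, no jitter), and the function name `shift_register_phases` is hypothetical. It records at which reference edge each DFF output rises.

```python
def shift_register_phases(n_phases, n_ref_cycles):
    """Idealized logic-level model of a divide-by-N followed by N DFFs.

    Returns, for each DFF output, the list of reference-clock edge indices
    at which that output rises. Assumes an ideal 50% duty cycle divider
    output and zero-delay flip-flops (illustrative assumptions).
    """
    q = [0] * n_phases                       # current DFF outputs
    rising = [[] for _ in range(n_phases)]   # rising-edge log per output
    for k in range(n_ref_cycles):
        # Divider output as seen at the D input on reference edge k.
        d = 1 if (k % n_phases) < n_phases // 2 else 0
        # On a rising reference edge every DFF copies its D input.
        new_q = [d] + q[:-1]
        for i in range(n_phases):
            if new_q[i] == 1 and q[i] == 0:
                rising[i].append(k)
        q = new_q
    return rising


edges = shift_register_phases(n_phases=4, n_ref_cycles=12)
for i, e in enumerate(edges, start=1):
    print(f"CLK_{i} rises at reference edges {e}")
```

In this model each output rises exactly one reference period after the previous one and repeats every N reference periods, which corresponds to the ideal $2\pi/N$ phase spacing at frequency f.
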
In Fig.2(a), the output jitter of a DFF is not affected by the jitter of the previous DFF, since the previous DFF only acts as an "enabler". Therefore, there is no jitter accumulation from one clock phase to the next as in the DLL MPCG. If we define the jitter variance of one DFF as $\sigma^2_{\Delta td,DFF}$, the average jitter variance for the set of N-phase clocks generated by the DFF chain, $\sigma^2_{\Delta td,SR,avgN}$, can be calculated as:

$$\sigma_{\Delta td,SR,avgN}^2 = \sigma_{\Delta td,DFF}^2 \tag{5}$$

In (5), only the DFF chain jitter is taken into account; the reference clock jitter will be discussed in Section IV.

## IV. COMPARISON BETWEEN DLL AND SHIFT REGISTER

It is clear from Sections II and III that a DLL MPCG has two major jitter sources, the reference clock and the VCDL, while a shift register MPCG also has two jitter sources, the reference clock and the DFF chain. The comparison starts with the reference clock jitter. Then the VCDL jitter and DFF chain jitter are compared.

### A. Comparing the Reference Clock Jitter

From the above analysis, we see that neither the DLL nor the shift register based MPCG improves the reference jitter. However, the shift register MPCG requires a reference clock with N times higher frequency than that of the DLL equivalent. The VCO in the shift register MPCG should then work at N times higher frequency. This may lead to a concern that the shift register puts more requirements on the VCO. To compare VCOs running at different frequencies, the so-called Normalized Phase Noise Figure of Merit $FOM_{Nor-PN}$ can be used [7]:

$$FOM_{Nor-PN} = 10\log\left(L(f_m)\right) + 10\log\left(\frac{f_m^2}{f_{OSC}^2} \cdot \frac{P_{DC}}{1\,\mathrm{mW}}\right) \tag{6}$$

in which $L(f_m)$ is the single side band phase noise to carrier ratio of the VCO at an offset frequency $f_m$, $f_{OSC}$ is the VCO working frequency and $P_{DC}$ is the VCO power dissipation. For the shift register MPCG, $f_{OSC}$ is N times higher.
However, with the same phase noise requirement at the frequency of interest, $L(f_m)$ can be $N^2$ times higher, since the shift register MPCG also functions as a divide-by-N, which theoretically improves $L(f_m)$ by a factor of $N^2$. These two effects cancel each other if substituted into (6): the $N^2$ increase in $L(f_m)$ is offset by the $N^2$ increase in $f_{OSC}^2$, so the same $FOM_{Nor-PN}$ corresponds to the same $P_{DC}$. In other words, although the VCO in the shift register MPCG runs at N times higher frequency, it consumes the same power as the VCO in the DLL MPCG for the same quality of design. If LC VCOs are used, then a higher working frequency may even be preferred, since Q tends to be higher and the required area for inductors is smaller.

<sup>1</sup> The use of a DFF chain here assumes that only the clock rising edges are used. If both rising and falling edges can be used, i.e., with a 50% duty cycle reference clock, DFFs can be replaced with simpler latches [2], reducing the shift register power consumption by a factor of two.

Figure 3. (a) Schematic of CML latch at the switching instant. (b) Simplified schematic for jitter analysis.

### B. Comparing the VCDL and DFF Chain Jitter

A DFF can be designed with two master/slave connected latches as shown in Fig.2(b). The output jitter of a DFF is not affected by the jitter of the first latch, since the first latch is only an "enabler". The CML implementation of a latch is shown in Fig.3(a). At the zero crossings of the input clock CLK, the differential stage M1 and M2 is balanced. For proper operation of the DFF, the D signals of the latch are already stable before CLK starts to switch. Therefore, one of the transistors M3 and M4 is off and the other one is in the triode region. The same happens to the transistors M5 and M6. Neglecting the transistors' on resistance, the schematic of the latch can be simplified as shown in Fig.3(b) [8], with $R_L$ and $C_L$ the effective load resistance and capacitance. Fig.3(b) is exactly the same as the schematic of the CML delay unit shown in Fig.1(b).
Therefore, we can apply the same jitter analysis for a delay unit and a DFF. The jitter variance of the circuit shown in Fig.3(b) can be predicted using the analysis presented in [8] as:

$$\sigma_{\Delta td}^2 = \left(1 + \gamma + \gamma_T \cdot \frac{g_{mT} R_L}{2}\right) \cdot \frac{2kTC_L}{I_B^2} \tag{7}$$

where $\gamma$ and $\gamma_T$ are respectively the noise factor of the transistors of the differential pair and the tail bias transistor, and $g_{mT}$ is the transconductance of the tail bias transistor. For correct circuit operation, the tail bias transistor should be ensured to work in the saturation region. The overdrive voltage $V_{OVT}$ should be smaller than the allowable voltage headroom $V_{S,T}$ for the tail transistor. Therefore, $g_{mT}$ can be written as:

$$g_{mT} = \alpha \frac{I_B}{V_{OVT}} \ge \alpha \frac{I_B}{V_{S,T}} \tag{8}$$

with $\alpha$ the transistor model parameter, which is equal to two for the square-law model. In order to achieve low jitter, $g_{mT}$ should be set at its minimum value. Using (1) and (8), the jitter will then be:

$$\sigma_{\Delta td}^2 = \left(1 + \gamma + \gamma_T \cdot \frac{\alpha I_B R_L}{2V_{S,T}}\right) \cdot \frac{2kTC_L}{I_B^2} = \left(1 + \gamma + \gamma_T \cdot \frac{\alpha V_{SW}}{2V_{S,T}}\right) \cdot \frac{2kTC_L}{I_B^2} \tag{9}$$

With (2), (9) can be re-written as:

$$\sigma_{\Delta td}^2 = \left\{ \left(1 + \gamma + \gamma_T \cdot \alpha \frac{V_{SW}}{2V_{S,T}}\right) \cdot \frac{2kT}{\ln 2 \cdot V_{SW}} \right\} \cdot \frac{t_d}{I_B} \tag{10}$$

Figure 4. Timing diagram for (a) a DU, (b) a latch

When $V_{SW}$ is chosen, $V_{S,T}$ is a constant for a speed optimized CML circuit design and can be derived as [9]:

$$V_{S,T} = V_{DD} - V_T - V_{SW} \left\{ \frac{1}{2} + \left(\frac{1}{2}\right)^{\frac{1}{\alpha}} \right\} \tag{11}$$

Therefore, the bracketed part of (10) is a constant once $V_{SW}$ is chosen and can be represented by a constant c:

$$\sigma_{\Delta td}^2 = c \cdot t_d / I_B \tag{12}$$

In most clock generator designs, jitter and power are the two most important parameters. Via admittance level scaling [10], noise power and hence jitter variance can always be reduced at the cost of increasing power consumption. In order to take this tradeoff into account, a (1 mW) power normalized jitter variance $\sigma^2_{\Delta td,NorP}$ is defined to make a fair comparison:

$$\sigma_{\Delta td,NorP}^2 = \sigma_{\Delta td}^2 \cdot (P / 1\,\mathrm{mW}) \tag{13}$$

where $\sigma_{\Delta td}$ is the amount of jitter and P is the power consumption. For the same circuit, applying admittance level scaling does not change the value of its $\sigma^2_{\Delta td,NorP}$. A design with a smaller $\sigma^2_{\Delta td,NorP}$ generates less jitter, given the same amount of power. For a CML circuit, the total power consumption is dominated by the static power. Therefore, $\sigma^2_{\Delta td,NorP}$ can be derived with (12) and (13) as:

$$\sigma_{\Delta td,NorP}^2 = (c \cdot t_d / I_B) \times (I_B \cdot V_{DD} / 1\,\mathrm{mW}) = (c \cdot V_{DD} / 1\,\mathrm{mW}) \times t_d \tag{14}$$

which indicates that $\sigma^2_{\Delta td,NorP}$ is proportional to the amount of delay for a CML delay unit when $V_{SW}$ is chosen.
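
The chain from (10) to (14) can be evaluated numerically as a sanity check. The following is a minimal sketch, not the authors' tooling: the operating point ($V_{DD}$ = 1.2 V, $V_{SW}$ = 0.6 V, $V_{S,T}$ = 0.2 V, $\gamma = \gamma_T = 2/3$, $\alpha = 2$, T = 300 K, the delays and bias currents) is assumed purely for illustration, and $V_{S,T}$ is entered directly rather than derived from (11) because $V_T$ is not given in the text. Function names are hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def jitter_constant(v_sw, v_st, gamma=2 / 3, gamma_t=2 / 3, alpha=2.0, temp=300.0):
    """Bracketed constant c of Eq. (10), in J/V."""
    noise_factor = 1.0 + gamma + gamma_t * alpha * v_sw / (2.0 * v_st)
    return noise_factor * 2.0 * K_BOLTZMANN * temp / (math.log(2) * v_sw)


def jitter_variance(c, t_d, i_b):
    """Eq. (12): jitter variance in s^2 for delay t_d (s) and bias current i_b (A)."""
    return c * t_d / i_b


def normalized_jitter_variance(c, t_d, i_b, v_dd):
    """Eq. (13)/(14): jitter variance scaled by the static power in mW."""
    power_mw = i_b * v_dd / 1e-3
    return jitter_variance(c, t_d, i_b) * power_mw


# Assumed, illustrative operating point (not taken from the paper).
V_DD, V_SW, V_ST = 1.2, 0.6, 0.2
c = jitter_constant(V_SW, V_ST)

base = normalized_jitter_variance(c, t_d=50e-12, i_b=1e-3, v_dd=V_DD)
scaled = normalized_jitter_variance(c, t_d=50e-12, i_b=4e-3, v_dd=V_DD)
slower = normalized_jitter_variance(c, t_d=100e-12, i_b=1e-3, v_dd=V_DD)

print(f"normalized jitter variance at 50 ps: {base:.3e} s^2")
print("unchanged when I_B is scaled by 4:", math.isclose(base, scaled))
print("doubles when t_d is doubled:", math.isclose(slower, 2 * base))
```

Scaling $I_B$ here mimics admittance level scaling: the raw jitter variance drops, but the power normalized value stays the same, while doubling $t_d$ doubles it, as (14) predicts.
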
If we focus on the jitter generated by the DLL and the shift register, the $\sigma^2_{\Delta td,NorP}$ of the two structures can be compared with (4), (5), (12) and (13) as:

$$\frac{\sigma_{\Delta td,NorP,SR,avgN}^{2}}{\sigma_{\Delta td,NorP,DLL,avgN}^{2}} = \frac{c \cdot \dfrac{t_{d,Latch}}{I_{B,Latch}} \times \dfrac{N \cdot 2I_{B,Latch} \cdot V_{DD}}{1\,\mathrm{mW}}}{\dfrac{N+1}{2} \cdot c \cdot \dfrac{t_{d,DU}}{I_{B,DU}} \times \dfrac{N \cdot I_{B,DU} \cdot V_{DD}}{1\,\mathrm{mW}}} = \frac{4}{N+1} \cdot \frac{t_{d,Latch}}{t_{d,DU}} \tag{15}$$

where the parameters with subscripts *DU* and *Latch* are related to a DU in the DLL and a latch in the shift register, respectively. The factor $2I_{B,Latch}$ reflects the two latches in each DFF (Fig.2(b)).

Although the simplified jitter analysis schematic for the DU and the latch is the same, there are some differences between them. Fig.4 shows a timing diagram for the DU and the latch. For the DU, its output is a clock phase and its input is the previous clock phase. Therefore, the delay of the DU is functionally required to be:

$$t_{d,DU} = \frac{T}{N} = \frac{1}{N \cdot f} \tag{16}$$

where T and f are the period and frequency of the wanted N-phase clocks. For the latch, its output is one clock phase and its input is the reference clock. There is no delay requirement such as (16). For the DFF to work properly, it should satisfy:

$$t_{d,Latch} + t_{su} = (1+a)t_{d,Latch} < \frac{T}{N} = \frac{1}{N \cdot f} \tag{17}$$

where $t_{su}$ is the setup time required by the DFF and $a$ is the ratio of $t_{su}$ to $t_{d,Latch}$, which is design dependent and larger than zero.

Figure 5. Simulation results for (a) a CML delay unit (b) DLL and shift register FOM comparison.

Equation (17) shows that technology limitations place an upper bound on the working frequency of a shift register MPCG. Defining the maximum working frequency of the shift register MPCG as $f_{max,SR}$, the latch delay will have its minimum value $t_{d,Latch,min}$ at $f_{max,SR}$.
With (17), $t_{d,Latch,min}$ can be derived as:

$$t_{d,Latch,\min} = \frac{1}{1+a} \cdot \frac{1}{N \cdot f_{max,SR}} \tag{18}$$

Since a small delay is preferred for a small $\sigma^2_{\Delta td,NorP}$, the latch delay can be set to its minimum value as in (18). For a DU, the delay is fixed by (16). Taking this into account, (15) can be re-written as:

$$\frac{\sigma_{\Delta td,NorP,SR,avgN}^2}{\sigma_{\Delta td,NorP,DLL,avgN}^2} = \frac{1}{1+a} \cdot \frac{f}{f_{max,SR}} \cdot \frac{4}{N+1} \tag{19}$$

The first factor of (19) is smaller than one since the DFF needs a finite setup time. The second factor is smaller than or equal to one as long as the shift register can work at the wanted frequency. The third factor is also smaller than or equal to one if the wanted number of clock phases is larger than 2, which is normally the case. Therefore, (19) is smaller than one, which means that the shift register MPCG has a smaller $\sigma^2_{\Delta td,NorP}$ than the DLL MPCG as long as it can work at the wanted frequency. Equation (19) also indicates that the advantage of the shift register based MPCG will be larger in applications where clocks with a larger number of phases at lower frequencies are needed.

Besides a smaller $\sigma^2_{\Delta td,NorP}$, a shift register based MPCG also has other advantages, like shorter settling time, wider tuning range and ease of design. However, its maximum achievable working frequency is lower than that of a DLL MPCG, since the shift register MPCG works at N times the frequency of interest and needs a VCO with higher frequency. This may pose design issues.

## V. SIMULATION RESULTS

In order to verify the conclusion in (14) that the $\sigma^2_{\Delta td,NorP}$ of a CML delay unit is proportional to the amount of delay, simulations were done in Spectre with time domain Pnoise analysis. The technology used is 0.13 μm CMOS. The delay is tuned by tuning the load capacitance. $V_{SW}$ is chosen to be half of $V_{DD}$.
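
As a side note to the analysis in Section IV, the closed-form ratio (19) is straightforward to evaluate. The sketch below is only an illustration: the function name `jitter_ratio`, the setup ratio a = 0.5, and the 1 GHz value for $f_{max,SR}$ are assumed example values, not results from the paper.

```python
def jitter_ratio(n_phases, a, f, f_max_sr):
    """Eq. (19): power normalized jitter variance of the shift register MPCG
    divided by that of the DLL MPCG."""
    if f > f_max_sr:
        raise ValueError("shift register cannot operate above f_max_sr")
    return (1.0 / (1.0 + a)) * (f / f_max_sr) * (4.0 / (n_phases + 1))


F_MAX_SR = 1e9  # assumed maximum output frequency of the shift register, in Hz

for n in (4, 8, 16):
    for frac in (1.0, 0.5, 0.1):
        r = jitter_ratio(n, a=0.5, f=frac * F_MAX_SR, f_max_sr=F_MAX_SR)
        print(f"N={n:2d}  f/f_max,SR={frac:.1f}  ratio={r:.3f}")
```

For every combination in this sweep the ratio stays below one, and it shrinks as N grows or as f drops below $f_{max,SR}$, in line with the discussion of (19).
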
The simulation results are shown in Fig.5(a). The simulated values fit the calculated curve well, and clearly show that $\sigma^2_{\Delta td,NorP}$ increases with increasing delay time.

To verify the comparison result (19) for the DLL based MPCG and the shift register based MPCG, simulations were done for a free-running VCDL and a DFF chain with a clean reference clock and *N* equal to 8. For the CML DFF used in the simulation, *a* is about 0.5. The simulation results are shown in Fig.5(b). The simulated values fit the calculated curve well.

## VI. CONCLUSION

Analysis shows that, for a given power budget, a shift register based MPCG generates less jitter than a DLL equivalent when both are realized with current mode logic circuits. The reason is that the shift register MPCG has no jitter accumulation from one clock phase to the next as in the DLL MPCG. Although the shift register MPCG requires a reference clock with higher frequency, this does not lead to additional power consumption in the VCO. In addition, the MPCG using a shift register has the freedom to reduce its delay time, which matters because the jitter generation of a CML circuit is proportional to its delay time for a given power budget. The advantage of the shift register based MPCG will be larger in applications where clocks with a larger number of phases at lower frequencies are needed.

## REFERENCES

- [1] C. K. K. Yang, M. A. Horowitz, "A 0.8-μm CMOS 2.5 Gb/s oversampling receiver and transmitter for serial links", IEEE Journal of Solid-State Circuits, vol. 31, pp. 2015-2023, December 1996.
- [2] R. Shrestha, E. Mensink, E. A. M. Klumperink, G. J. M. Wienk, B. Nauta, "A Multipath Technique Cancelling Harmonics and Sidebands in a Wideband Power Upconverter", IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, February 6-8, 2006. ISSCC Digest, pp. 452-453.
- [3] C. C. Chung and C. Y. Lee, "A New DLL-Based Approach for All-Digital Multiphase Clock Generation", IEEE Journal of Solid-State Circuits, vol. 39, no. 3, March 2004.
- [4] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, "Digital Integrated Circuits (A design perspective)", Prentice Hall, 2003 (Second Edition).
- [5] R. C. H. van de Beek, E. A. M. Klumperink, C. S. Vaucher, and B. Nauta, "Low-Jitter Clock Multiplication: A Comparison Between PLLs and DLLs", IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 49, no. 8, Aug. 2002.
- [6] B. Kim, T. C. Weigandt and P. R. Gray, "PLL/DLL System Noise Analysis for Low Jitter Clock Synthesizer Design," in Proc. Int. Symp. on Circuits and Systems, June 1994.
- [7] A. Wagemans, P. Baltus, R. Dekker, A. Hoogstraate, H. Maas, A. Tombeur, J. van Sinderen, "A 3.5mW 2.5GHz diversity receiver and a 1.2mW 3.6GHz VCO in silicon-on-anything", IEEE International Solid-State Circuits Conference, vol. XLI, pp. 250-251, February 1998.
- [8] S. Levantino, L. Romano, S. Pellerano, C. Samori, A. L. Lacaita, "Phase noise in digital frequency dividers", IEEE Journal of Solid-State Circuits, vol. 39, no. 5, pp. 775-784, May 2004.
- [9] R. C. H. van de Beek, "High-Speed Low-Jitter Clock Multiplication in CMOS", PhD thesis, University of Twente, 2004, ISBN 90-365-1989-6. (http://doc.utwente.nl/41485)
- [10] E. A. M. Klumperink, B. Nauta, "Systematic Comparison of HF CMOS Transconductors", IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 50, no. 10, pp. 728-741, Oct. 2003.