
A 5.12-GHz LC-based phase-locked loop for silicon pixel readouts of high-energy physics

2022-09-02

Nuclear Science and Techniques, 2022, Issue 7

Xiao-Ting Li · Wei Wei · Ying Zhang · Xiong-Bo Yan · Xiao-Shan Jiang · Ping Yang

Abstract There is an urgent need for high-quality, high-frequency clock generators in high-energy physics experiments. In future readout electronics for silicon pixel detectors, the data rate of a single channel exceeds 10 Gbps. Other detectors, such as those for time measurement, require a high time resolution based on a time-to-digital readout architecture. A phase-locked loop (PLL) is an essential and widely used circuit in these applications. This study presents an application-specific integrated circuit of a low-jitter, low-power LC-tank PLL fabricated using 55-nm CMOS technology. It includes a 3rd-order frequency synthesis loop with a programmable bandwidth, a divide-by-2 pre-scaler, standard low-voltage differential signaling interfaces, and a current mode logic (CML) driver for clock transmission. All the D-flip-flop dividers and phase-frequency detectors are protected from single-event upsets using the triple modular redundancy technique. The proposed VCO uses low-pass filters to suppress the noise from the bias circuits. The tested LC-PLL covers a frequency locking range from 4.74 GHz to 5.92 GHz with two sub-bands. For the frequency-halved clock (2.56 GHz), the measured random and deterministic jitters are less than 460 fs and 0.8 ps, respectively, and the total jitter is 7.5 ps peak-to-peak at a bit error rate of 10^-12. The random and total jitter values for frequencies of 426 MHz and 20 MHz are less than 1.8 ps and 65 ps, respectively. The LC-PLL consumed 27 mW for the core and 73.8 mW in total. The measured results nearly coincided with the simulations and validated the analyses and tests.

Keywords LC phase-locked loop · Analog electronic

1 Introduction

Phase-locked loops (PLLs) are widely used in radio frequency (RF) transceivers, digital or optical fiber communication systems, and high-energy physics experiments. A future collider for the study of the physics of Higgs bosons and other particles would produce a large amount of data that would need to be read out rapidly and reliably [1]. PLLs are essential for detector readout systems. A ring-oscillator-based PLL (RO-PLL) usually occupies a small area and can achieve a large frequency range; however, the LC-tank-based PLL has advantages in terms of noise and high-frequency performance. In the high-speed serial links for the Large Hadron Collider (LHC) [2], high-speed serializer chips (GBT and LOCx2) were employed in the on-detector transmitters to collect, serialize, and transmit low-speed parallel data from the front end, and LC-PLLs were adopted to provide low-jitter high-speed clocks [3-5]. Because LC-PLLs require RF devices, they cannot be implemented in the technologies used for certain applications. For example, MAPS readout chips, which integrate pixel sensors, front-end amplifiers, and digitized readout electronics in a single chip, are more likely to use RO-PLLs [6, 7].

Along with the luminosity upgrades of the present and next-generation colliders, the total data volumes increase significantly. Considering the limited space and low-mass material budget requirements, the data rate of a single channel is increased to 10 Gbps or more [3, 8]. The present advanced synchrotron radiation sources (CERN-PS, HEPS, and XFEL) [9-14] use hybrid pixel detectors whose sensors and readout electronics are separate. With an increase in the array size, hit frame size, and hit rate, future hybrid pixel readout chips [15, 16] also require a high serializer bandwidth, which makes high-frequency low-jitter PLLs necessary. Moreover, high-resolution time-to-digital converter (TDC) readout detectors require low-jitter PLLs [17].

In this study, we present a low-jitter, low-power LC-PLL using 55-nm CMOS technology for the silicon pixel readout of a future hybrid pixel detector and TDC readout chip. The PLL employs a proposed LC-tank voltage-controlled oscillator (VCO), which uses low-pass filters to reduce phase noise. The low-pass filters help to simplify the circuit design; however, they occupy a large area. The next section presents the circuit design and simulation. Section 3 describes the electrical and preliminary X-ray test setups and results. Certain upgrades and applications of the PLL design are presented in Sect. 4, which is followed by Sect. 5, where the study is concluded.

2 Circuit design and simulations

2.1 The overall architecture

The architecture of the PLL prototype is shown in Fig. 1. It consists of a phase-frequency detector (PFD), a charge pump (CP), an LC-VCO, a 2nd-order low-pass filter (LPF), and several dividers. The PFD adopts an edge-detection structure [18] to detect the phase and frequency errors. The CP is designed as a differential structure with complementary current sources and switches. Cascode current mirrors were used to minimize the channel-length modulation effect, and a unity-gain buffer was used to keep the two branches working at the same DC operating point. The loop bandwidth (LBW) can be programmed from 250 kHz to 1.55 MHz through the programmable resistors of the LPF and the charging current, as a compromise among the locking time, noise contribution, and process variation. A standard low-voltage differential signaling (LVDS) receiver (RX) and driver (TX) were used to receive the 40-MHz reference clock (RefCK) and transmit the low-speed test clock (TestCK), respectively. To maintain a 50% duty cycle for the input clock, a divide-by-2 pre-scaler follows the RX. A standard current mode logic (CML) driver (Driv) with 5-stage pre-amplifiers was adopted to test the frequency-halved clock (2.56 GHz). The reference currents of the TX and CML drivers were mirrored from a common bias generator. Both drivers can be turned off to measure the power consumption of the core circuits.

2.2 The proposed LC-VCO

To achieve a low-jitter PLL, the LC-tank VCO is preferred owing to its high Q-factor. As the VCO gain (Kvco) amplifies noise, a low-jitter LC-VCO typically minimizes the value of Kvco, which limits the tuning range of a single band. Therefore, a switched capacitance array that provides multiple sub-bands is used to expand the total tuning range and cover the process variation [3, 8, 19-23]. As the power supply and bias circuit of a tail current source also contribute large jitters to the LC-VCO, the reference design [21] adopts built-in low-dropout regulators (LDOs) to provide the power supply and bias voltage. The proposed LC-VCO uses off-chip LDOs for power supplies and two low-pass filters (R1-C1 and R2-C2) to suppress the noise from the bias circuits. Figure 2 presents the scheme of the LC-VCO. The noise-filtering effect improves as the cutoff bandwidth, which is equal to 1/(2πRC), decreases. However, a smaller bandwidth occupies a larger area. We chose 106 kΩ and approximately 100 pF (50 pF) as the values of R1 (R2) and C1 (C2), respectively, as a trade-off between the noise performance and area. These capacitors were arranged between modules for isolation. In addition, RC filters were implemented in the CP to filter the bias noise.
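To make the area versus bandwidth trade-off concrete, the corner frequency 1/(2πRC) can be evaluated for the quoted component values. This is a minimal sketch; it assumes that both R1 and R2 are 106 kΩ, which is one reading of the text, and the function name is purely illustrative.

```python
import math

def rc_cutoff_hz(resistance_ohm: float, capacitance_farad: float) -> float:
    """-3 dB corner of a first-order RC low-pass filter: 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_farad)

# Assumption: R1 = R2 = 106 kOhm; C1 is about 100 pF and C2 about 50 pF.
R_BIAS_OHM = 106e3
f_c1 = rc_cutoff_hz(R_BIAS_OHM, 100e-12)
f_c2 = rc_cutoff_hz(R_BIAS_OHM, 50e-12)

print(f"R1-C1 corner: {f_c1 / 1e3:.1f} kHz")
print(f"R2-C2 corner: {f_c2 / 1e3:.1f} kHz")
```

Under these assumptions both corners sit in the tens of kilohertz, well below the minimum loop bandwidth of 250 kHz; halving the capacitance doubles the corner frequency, which is the area cost the text refers to.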

The LC-VCO consists of a 3-terminal inductor, two p-type varactors, a pair of cross-coupled NMOS transistors, 1-bit controlled metal-oxide-metal capacitors (MOM-CAPs), a current source, an enable switch, and two filters for the biasing voltages. The 3-terminal inductor was selected because its Q-factor at 5 GHz is higher than that of the 2-terminal inductor, as indicated in the process document. The central tap to each port is equivalent to a 2-terminal inductor; therefore, the resonant frequency can be expressed as follows:

where L1 is the inductance value of the 3-terminal inductor, the coefficient "a" is one or zero depending on the connection status of the MOM capacitors, and Cpvar, Cmom, and Cpar represent the capacitance of the p-type varactor, the MOM capacitor, and the parasitic capacitance on one branch, respectively. Complementary negative-resistance units have better symmetry and phase noise performance. However, because of the limited process power supply (1.2 V), only an NMOS cross-coupled pair was used to compensate for the energy loss of the LC-tank. A PMOS current mirror (P1) is preferred because it has a smaller flicker noise than an NMOS current mirror. The MOM capacitors were used to provide a lower frequency band. The proposed LC-VCO can achieve a phase noise level of -113 dBc/Hz (at a 1 MHz offset from 5.12 GHz) at a Kvco value of approximately 0.74 GHz/V.
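As a sketch of how the tank quantities defined above combine, consider a half-circuit view in which each half of the center-tapped inductor contributes roughly L1/2 and resonates with the capacitance on one branch. Both the factor of 1/2 and the neglect of mutual coupling between the two halves are assumptions made here for illustration. One form consistent with the definitions is:

```latex
f_{\mathrm{osc}} \approx
\frac{1}{2\pi\sqrt{\dfrac{L_1}{2}\,\bigl(C_{\mathrm{pvar}} + a\,C_{\mathrm{mom}} + C_{\mathrm{par}}\bigr)}},
\qquad a \in \{0, 1\}
```

Whatever the exact inductance factor, setting a = 1 adds Cmom to the tank and lowers the oscillation frequency, which agrees with the statement that the MOM capacitors provide the lower frequency band.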

2.3 The PFD and the divider chain

Latches and D-flip-flops (DFFs) are sensitive to single-event upsets (SEUs). The triple modular redundancy (TMR) technique was employed to protect these circuits, including the pre-scaler, PFD, and DFF divider, as shown in Fig. 3.
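The principle behind TMR can be illustrated with a small behavioral model: a stored bit is kept in three copies and read through a majority voter, so a single upset copy is outvoted. The class and method names are hypothetical, and the sketch does not represent the actual voter placement in the chip.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote."""
    return (a & b) | (b & c) | (a & c)

class TmrFlipFlop:
    """Behavioral model of a triplicated flip-flop (illustrative only)."""

    def __init__(self, value: int = 0) -> None:
        self.copies = [value & 1] * 3

    def clock(self, d: int) -> None:
        # All three copies capture the same input on the clock edge.
        self.copies = [d & 1] * 3

    def upset(self, index: int) -> None:
        # Emulate a single-event upset by flipping one stored copy.
        self.copies[index] ^= 1

    @property
    def q(self) -> int:
        return majority(*self.copies)

ff = TmrFlipFlop()
ff.clock(1)
ff.upset(0)                 # one copy is corrupted
print(ff.copies, ff.q)      # the voted output still reflects the stored 1
```

A second upset in a different copy before the next clock edge would defeat the vote, which is why the triplicated copies are refreshed on every clock.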

The clock feedback chain consists of a CML buffer, three CML dividers, and a divide-by-32 DFF divider. The CML buffer has two-stage open-loop amplifiers to minimize the capacitive load on the VCO and to drive the CML divider. A typical CML divider has an NMOS current source that limits the output swing. Here, the current source is removed to obtain a larger clock swing, which can directly drive the DFF divider after the three stages.
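The frequency plan implied by this chain can be checked with integer arithmetic. The sketch below assumes that each of the three CML dividers is a divide-by-2 stage, which the text does not state explicitly; the 40-MHz reference, the divide-by-2 pre-scaler, and the divide-by-32 DFF divider are taken from the text.

```python
from math import prod

F_REF_HZ = 40_000_000        # reference clock from the text
PRESCALER = 2                # divide-by-2 pre-scaler after the RX
CML_DIVIDERS = (2, 2, 2)     # assumption: three divide-by-2 CML stages
DFF_DIVIDER = 32             # divide-by-32 DFF divider

f_pfd_hz = F_REF_HZ // PRESCALER
feedback_ratio = prod(CML_DIVIDERS) * DFF_DIVIDER
f_vco_hz = f_pfd_hz * feedback_ratio

print(f"PFD comparison frequency: {f_pfd_hz / 1e6:.0f} MHz")
print(f"Feedback division ratio:  {feedback_ratio}")
print(f"Locked VCO frequency:     {f_vco_hz / 1e9:.2f} GHz")
```

With these assumptions the loop compares at 20 MHz and locks the VCO at 256 times that frequency, matching the 5.12 GHz target and the 20 MHz test clock.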

2.4 Implementation

The LC-PLL was fabricated both as an independent die and as a block integrated in a TDC readout chip using standard CMOS technology with one top metal, as shown in Fig. 4. The core area of the LC-PLL with sufficient decoupling capacitors was less than 700 × 1800 μm², whereas the chip area was 900 × 2350 μm². Most of the decoupling capacitors were MOM-CAPs that overlap with MOS-CAPs to satisfy the filling rules and to enlarge the capacitance that filters noise from the power supplies. As shown in Fig. 4b, the TDC readout chip consists of a PLL, a multichannel front-end photomultiplier tube (PMT) discriminator, and an all-digital TDC circuit. It was developed to achieve high-resolution time measurements for high-energy physics detectors. The PLL provides the TDC with a 426-MHz CMOS clock, which is obtained through division by 12 with a 50% duty cycle. The high-speed test clock of the LC-PLL in the TDC chip was 5.12 GHz without any frequency division.

Fig. 2 The proposed LC-VCO scheme

Fig. 3 PFD and divider chain schemes

Fig. 4 (Color online) Photomicrographs of the LC-PLL

A 100-μm clearance between the outer coil and the boundary of the inductor was maintained to guarantee consistency with the inductance model. High-frequency traces were kept short and isolated by power or ground lines to reduce parasitic loads and to avoid coupling to other signals. The control voltage (Vctrl) of the VCO is highly sensitive. Leading Vctrl out for testing introduces an extra non-negligible resistance and capacitance, which can change the LPF parameters. Thus, a small transmission gate was inserted between Vctrl and the IO pad to reduce the parasitic effect on the Vctrl side, which also partially isolates the noise coupling from the IO pad and test board.

2.5 Simulations

Figure 5a shows the simulated phase noise curves of the proposed LC-VCO at 5.12 GHz for different cases. Compared to the no-filter case (blue line with plus markers), the case in which two RC filters were applied (smooth red line) improved by approximately 38 dB at a 1 MHz frequency offset. Because the transfer functions for the VCO noise and for the reference-related blocks (the reference clock, PFD, CP, and dividers) are high pass and low pass, respectively, the out-of-band noise is dominated by the VCO and the in-band noise by the other blocks. Figure 5b presents the simulated phase noise curves of the reference-related part at 5.12 GHz with the 250-kHz LBW.

The integrated root-mean-square (RMS) jitter can be directly printed by using the "Jc" option in the "pnoise" analysis tools or calculated using the equation below [24]:

$$ J_{\mathrm{RMS}} = \frac{1}{2\pi f_{\mathrm{osc}}}\sqrt{2\int_{f_1}^{f_2} 10^{L(f)/10}\,\mathrm{d}f} \qquad (2) $$

where f1 to f2 is the integration interval of the offset frequency, fosc is the operating frequency, and L(f) is the simulated phase noise spectrum in dBc/Hz. For this design, the two results were very close: approximately 0.32 ps using Eq. (2) and 0.36 ps using the Jc value, with an integration interval ranging from 250 kHz to 10 GHz. The in-band integrated noise was approximately 0.16 ps (minimum LBW) and 0.41 ps (maximum LBW) with an ideal reference clock; therefore, the actual in-band noise should be larger.
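The conversion from a phase noise spectrum to RMS jitter can be sketched numerically. The function below uses the standard relation, jitter = sqrt(2 ∫ 10^(L(f)/10) df) / (2π f_osc), with trapezoidal integration over tabulated offset points. The flat -130 dBc/Hz profile is a hypothetical placeholder chosen only to exercise the code; it is not simulated or measured data from this design.

```python
import math

def rms_jitter_s(offsets_hz, phase_noise_dbc_hz, f_osc_hz):
    """Integrate single-sideband phase noise L(f) into RMS jitter in seconds.

    Uses the trapezoidal rule on the linear-scale noise 10**(L/10).
    """
    linear = [10.0 ** (level / 10.0) for level in phase_noise_dbc_hz]
    area = 0.0
    for i in range(1, len(offsets_hz)):
        df = offsets_hz[i] - offsets_hz[i - 1]
        area += 0.5 * (linear[i] + linear[i - 1]) * df
    return math.sqrt(2.0 * area) / (2.0 * math.pi * f_osc_hz)

# Hypothetical flat profile, for illustration only.
offsets = [250e3, 1e6, 10e6, 100e6]
noise = [-130.0] * len(offsets)
jitter = rms_jitter_s(offsets, noise, 5.12e9)
print(f"Integrated RMS jitter: {jitter * 1e12:.3f} ps")
```

For a real spectrum that falls steeply with offset, a linear trapezoid over sparse points overestimates the area, so the offsets should be densely sampled (or integrated on a log grid) before the result is compared with a simulator's built-in jitter value.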

Figure 5c presents the transient response (middle) of the LC-PLL at an LPF capacitance ratio of 50 [25], the VCO clock waveforms (left) at the start phase, and the waveforms of the VCO clock and the test clock (right) after locking. At 5.12 GHz, the locking time of the PLL was less than 10 μs. The Dj value was approximately 0.66 ps, of which 0.38 ps was contributed by the loop and the other 0.28 ps by the CML driver with 1.5 pF and 2.5 nH as the load model. In a typical case (typical process and voltage corner, and room temperature), the core current is approximately 12.2 mA, and the estimated current of the entire PLL chip is approximately 56 mA.

3 Electrical and X-ray tests

3.1 Electrical experiments

The individual LC-PLL die was wire-bonded on a printed circuit board (PCB) and tested in the laboratory using a pulse/pattern generator (Agilent 81130A), a power supply (GPD 4303S), and a 16-GHz wide-band oscilloscope (LeCroy SDA 816Zi-A). Figure 6 presents a block diagram (left) and an image (right) of the experimental setup. The lengths of the bonding wires for the high-frequency clocks were approximately 2.5 mm (nearly equivalent to a 2.5 nH inductance), and the longest wire was approximately 3.5 mm. SMA-SMA (SubMiniature version A) and SMA-BNC (Bayonet Neill-Concelman) coaxial cables were used as interconnecting wires. The following presents the test results of the individual PLL dies in detail; the TDC chips with the built-in LC-PLL were in the process of packaging and will be tested later.

To work around a temporary IO problem, the power supply provided 1.8 V directly to the high-voltage supply of the RX and TX so that the chip functioned normally. The measured currents were approximately 54 mA and 5 mA for the 1.2 V and 1.8 V supplies, respectively, that is, 73.8 mW in total, which nearly agrees with the simulations. The 67 mA current shown in Fig. 6 contains the current of the PLL die, the LDO, and an additional current caused by the 50-Ω pull-down resistors in the oscilloscope, which act as the load of the CML driver. In an actual application, the terminal resistor bridges the two differential ports, and the driving current flows through the resistor in the loop without additional current consumption. The power consumption of the PLL core without the CML driver was approximately 27 mW.
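The quoted total power follows directly from the measured supply currents, as the short check below shows. The core current printed at the end is derived here from the 27 mW figure under the assumption that the core runs entirely from the 1.2 V supply; it is not a value reported in the text.

```python
# (supply voltage in V, measured current in A), taken from the text
supplies = [(1.2, 54e-3), (1.8, 5e-3)]

total_mw = sum(voltage * current for voltage, current in supplies) * 1e3
print(f"Total power: {total_mw:.1f} mW")

# Assumption: the 27 mW core is powered only from the 1.2 V rail.
core_mw = 27.0
core_current_ma = core_mw / 1.2
print(f"Implied core current at 1.2 V: {core_current_ma:.1f} mA")
```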

Fig. 6 Test setup of the LC-PLL

Five boards were tested with 1-m coaxial cables. Figure 7 presents several results, where "#1" to "#5" indicate the numbers of the test boards, and the last digit "0" or "1" represents the sub-band number. The locked frequency ranges of the PLL are plotted in Fig. 7a, where the solid lines with dot markers indicate the test results, the dashed lines with triangular markers indicate the post-layout simulations, and the dash-dotted lines with square markers indicate the schematic simulations. The total measured range was between 4.74 GHz and 5.92 GHz. At the target frequency (5.12 GHz), the control voltages on both sub-bands fell in the range where the CP performs better. The SB0 test curve (MOM-CAP off) was nearly consistent with the post-layout simulation; however, the SB1 test curve (MOM-CAP on) was slightly higher than that of the post-layout simulation. This indicates that the modeled value of the MOM-CAP and the extraction model used for the post-layout simulation may be imprecise, resulting in the inconsistency of the oscillating frequency level when the MOM-CAP is switched on. However, the frequency ranges of the two sub-bands nearly agreed with those of the post-layout simulations. This indicates that the value and extraction model of the p-type varactor were reliable.

The clock jitter performance can be measured by using the analysis tool "SDA II" in the oscilloscope [26]. The tool treats the clock under test as a periodic data sequence; thus, both rising and falling edges are sampled and superimposed for jitter calculations. Figure 7b presents a 2.56-GHz clock eye diagram with jitter measurements and a jitter histogram based on the smallest LBW setting. Figure 7c summarizes the Rj, Dj, and Tj values for different LBWs. As shown in Fig. 7c, the jitter of board #2 was noticeably large near the LBWs of 1 MHz and 1.5 MHz, where the charging current was maximal. In this case, the nonlinearity of the CP may be increased, especially for board #2, owing to device variations. However, the measurements were relatively consistent when the LBW was less than 0.9 MHz. The best jitter performance was obtained at an LBW of approximately 250 kHz: Rj < 460 fs, Dj < 0.8 ps, and Tj < 7.5 ps. Both sub-bands covered 5.12 GHz; however, SB0 had slightly lower noise than SB1 with the same configuration. The differential output eye amplitude of the CML driver at 2.56 GHz was approximately 550 mV. We also compared the test results of the 1-m and 2-m cables, which indicated that the jitter would increase by approximately 15% and the amplitude would decrease by approximately 17% when using the 2-m coaxial cables (RG 174).
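The relation between the three jitter figures can be illustrated with the widely used dual-Dirac model, Tj = Dj + 2·Q(BER)·Rj. Whether the oscilloscope tool uses exactly this model and this BER-to-Q convention is an assumption; the sketch only shows that the quoted bounds are mutually plausible.

```python
from statistics import NormalDist

def q_factor(ber: float) -> float:
    """One-sided Gaussian quantile for a given bit error rate."""
    return -NormalDist().inv_cdf(ber)

def total_jitter_ps(rj_ps: float, dj_ps: float, ber: float = 1e-12) -> float:
    """Dual-Dirac estimate: Tj = Dj + 2 * Q(BER) * Rj."""
    return dj_ps + 2.0 * q_factor(ber) * rj_ps

# Upper bounds quoted in the text for the 2.56 GHz clock.
tj_estimate = total_jitter_ps(rj_ps=0.46, dj_ps=0.8)
print(f"Q at BER 1e-12: {q_factor(1e-12):.3f}")
print(f"Dual-Dirac Tj estimate: {tj_estimate:.2f} ps")
```

With Rj = 0.46 ps and Dj = 0.8 ps, the estimate lands slightly below the measured bound of 7.5 ps, so the random jitter dominates the total at this error rate.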

LVDS clocks of 20 MHz (TCK20) and 426 MHz (TCK426) were also tested. The TDC chip requires an RMS jitter of less than 1 ps for the 426 MHz clock. The Rj value of the 2.56 GHz clock meets this requirement. Theoretically, the divider chain contributes a very small RMS jitter (approximately several femtoseconds based on simulations); thus, TCK20 and TCK426 should have Rj values close to that of the 2.56 GHz clock. However, the test results demonstrated significantly higher jitter values. Table 1 lists the measured results. The phase noise (PN) simulation of the CML driver demonstrates that its PN spectrum (at 2.56 GHz) was below -115 dBc/Hz at a 10 Hz offset with a -10 dB/decade slope. The integrated jitter, calculated using an integration interval from 10 Hz to 10 MHz, was less than 4.5 fs and can be ignored. Using the same integration interval, the TX introduced Rj values of 185 fs and 2.74 ps at 426 MHz and 20 MHz, respectively. Since the square of the total random jitter equals the sum of the squares of the individual contributions, the calculated RMS jitters of TCK426 and TCK20 were 0.481 ps and 2.78 ps, respectively, taking the obtained 0.444 ps as the clock jitter before the drivers. These values do not agree with the test results; however, the simulations show that the TX driver contributes significantly more phase-noise jitter than the CML driver, especially at lower frequencies, and this trend is consistent with the test results. The simulated Dj was approximately 42 ps for TCK426 with the load model of 1.5 pF and 2.5 nH. The output amplitude of the TX was approximately 400 to 450 mV under the 1.8 V supply, coinciding with the simulation results.
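The root-sum-square combination used for the calculated values can be reproduced in a few lines. The pre-driver clock jitter is taken here as 0.444 ps, the value for which the arithmetic reproduces the quoted 0.481 ps and 2.78 ps; the helper name is illustrative.

```python
import math

def combine_rms(*contributions_ps: float) -> float:
    """Root-sum-square of independent random jitter contributions."""
    return math.sqrt(sum(value * value for value in contributions_ps))

CLOCK_RJ_PS = 0.444      # clock jitter before the drivers
TX_RJ_426_PS = 0.185     # simulated TX contribution at 426 MHz
TX_RJ_20_PS = 2.74       # simulated TX contribution at 20 MHz

tck426_rj = combine_rms(CLOCK_RJ_PS, TX_RJ_426_PS)
tck20_rj = combine_rms(CLOCK_RJ_PS, TX_RJ_20_PS)
print(f"TCK426: {tck426_rj:.3f} ps, TCK20: {tck20_rj:.2f} ps")
```

Because the contributions add in quadrature, the larger term dominates: at 20 MHz the result is almost entirely the TX contribution, whereas at 426 MHz the clock itself still accounts for most of the total.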

A 9-kHz to 3-GHz spectrum analyzer (Agilent N9320B) was then used to characterize the phase noise spectrum of TCK426. The analyzer returns one phase noise value at a given frequency offset, as shown in the left plot of Fig. 8. The right plot in Fig. 8 presents the phase noise spectrum based on the average of ten values at each frequency offset point. The integrated RMS jitter was approximately 1.807 ps (10 kHz to 1 MHz) and 2.376 ps (10 kHz to 10 MHz), which was approximately 1.5 times the result measured with the oscilloscope.

Fig. 7 (Color online) Performance measurements of the LC-PLL with 1-m cables

Table 1 Measured performance of the clocks

Fig. 8 Measured phase noise of TCK426 by the spectrum analyzer (N9320B)

In conclusion, TX transmission may introduce random jitters larger than expected, and the test methods and accuracy at different frequencies may also influence the results. To obtain the actual RMS jitter of the 426 MHz clock more precisely, we can optimize the TX noise, use the CML driver instead in the next version, or indirectly evaluate it from the TDC chip.

3.2 X-ray irradiation experiments

Radiation tolerance is an important issue in high-energy physics experiments. For example, the ATLAS Phase II upgrade requires tolerance to a total ionizing dose (TID) of approximately 1000 Mrad [27] over the entire lifetime of the innermost layer, while 700 krad is required for the ALICE experiment [28]. The CEPC TDR proposes a maximum of 3.4 Mrad per year for the Z pole, with a safety factor of ten [1]. In the preliminary design, the LC-PLL was intended to tolerate a TID of at least 1 Mrad (Si). As no bipolar device was used in this design, the standard test procedure with a dose rate of 50-300 rad(Si)/s was considered with reference to the test methods document MIL-STD-883 [29].

Fig. 9 (Color online) Test setup of X-ray irradiation experiment

The LDO was removed from the test board, which was then irradiated using a Faxitron MultiRad 160 [30]. The beam source (160 kV and 25 mA maximum) was placed at a height of 8.4 cm above the chamber top. There was a slot for filters just below the chamber top. The chamber had seven shelves for placing the samples and a passageway for the cables. Figure 9 presents the test setup, which is similar to that of the electrical test (Fig. 6). Longer power lines (>2 m) and signal cables (2 m) were used in the irradiation test. We followed the calibration procedure and obtained a reference raw dose rate of 41 Gy (air)/min at the S7 position, measured by the dosimeter at the shelf center. The dose rate at the die position was approximately 40 Gy (air)/min (corresponding to 66.6 rad/s), which was estimated according to the inverse square law of distance to the source. As the equipment unfortunately failed, the test lasted only 19 min and accumulated a total ionizing dose of approximately 76 krad (air). No apparent differences were observed during or after irradiation.

The absorbed dose depends on the reference material because different materials absorb X-rays differently, so a correction ratio is needed between them. Correction coefficient calculations were provided by the optical-electronics laboratory of the Southern Methodist University (SMU) physics department [31]. The dose can be expressed as follows:

$$ D = \int \Phi(E)\, E\, \frac{\mu_{\mathrm{en}}(E)}{\rho}\,\mathrm{d}E \qquad (3) $$

where μen/ρ is the mass energy-absorption coefficient of the material, and E and Φ(E) are the energy and fluence of the X-ray machine, respectively. The fitting curves (μen(E)/ρ) of the reference materials, Si and air, can be obtained from the reference μen/ρ data table [32]. Based on the energy spectrum and fluence of the X-ray machine, the estimated correction coefficient of Si to air was approximately 7.24 with a raw beam. The value is different in the presence of a filter. In this case, the rough TID of the LC-PLL was approximately 550 krad (Si) with a dose rate of 482.5 rad(Si)/s under typical conditions (1.2/1.8 V power supplies and room temperature). As the irradiation dose falls short of 1 Mrad (Si), more tests will be conducted after equipment maintenance, which will take several months. In addition, we are looking for other methods to perform the irradiation tests.
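The dose bookkeeping in this subsection reduces to unit conversion and scaling by the Si-to-air coefficient. The sketch below recomputes the quoted figures from the 40 Gy (air)/min dose rate, the 19 min exposure, and the coefficient of 7.24; small differences from the quoted 482.5 rad(Si)/s come from rounding of the intermediate dose rate.

```python
RAD_PER_GY = 100.0           # 1 Gy = 100 rad

dose_rate_air_gy_min = 40.0  # estimated at the die position
exposure_min = 19.0          # duration before the equipment failed
si_to_air = 7.24             # correction coefficient for a raw beam

rate_air_rad_s = dose_rate_air_gy_min * RAD_PER_GY / 60.0
dose_air_krad = dose_rate_air_gy_min * RAD_PER_GY * exposure_min / 1e3

rate_si_rad_s = rate_air_rad_s * si_to_air
dose_si_krad = dose_air_krad * si_to_air

print(f"Air: {rate_air_rad_s:.1f} rad/s, {dose_air_krad:.0f} krad")
print(f"Si:  {rate_si_rad_s:.1f} rad/s, {dose_si_krad:.0f} krad")
```

The converted dose rate is above the 50-300 rad(Si)/s window mentioned for the standard procedure, which is worth keeping in mind when planning the follow-up tests.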

3.3 Comparisons

Table 2 presents a comparison of the performance obtained in this study with that of other similar LC-PLLs, including four designs with a small and relatively constant VCO gain (Kvco) [20-23] and three PLLs for high-energy physics experiments [8, 19, 33].

4 Upgrade and applications

As found in tests and practical applications, two power supplies are inconvenient and consume more power and routing resources on the PCB. In this design, a 2.5 V supply is used to meet the LVDS standard. In fact, the RX has a wide receiving range of Vcom and a large amplitude gain. The TX is only used for clock testing, where the output swing and noise matter more than Vcom. Thus, in the LC-PLL upgrade and applications, the RX and TX will be optimized to use only a 1.2 V supply and the corresponding transistors.

Using the LC-PLL as the clock generator, we designed a 10.24-Gbps serializer prototype fabricated using the same technology. In addition to the PLL, the preliminary serializer includes a 32-to-1 CMOS multiplexer, a 32-bit pseudorandom binary sequence generator, a single-ended-to-differential converter, and a CML driver. The serializer occupied an area of 980 × 1520 μm². The 5-stage multiplexer adopts a half-rate binary-tree scheme, which reduces the highest frequency of the driving clock to 5.12 GHz and reduces power consumption. As both clock edges are used to latch data, low clock jitter and, in particular, low duty cycle distortion (DCD) are critical. According to the analysis, the DCDs of the first four stages of the multiplexer influence the data sampling timing, whereas the DCD of the last stage directly affects the output eye diagram, causing the big-small-eye problem. Although the DCD of the LC-PLL is very small (approximately 50 fs), as shown in Fig. 7b, the clock conversion and transmission to the multiplexer introduce a new DCD. Thus, the CML-to-CMOS circuit should be designed carefully, and an additional duty cycle corrector may be required. The data rates of the input (parallel) and output (serial) data are 320 Mbps and 10.24 Gbps, respectively.
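The clocking of a half-rate binary-tree multiplexer can be tabulated from the lane count and lane rate alone. In the sketch below, each 2-to-1 stage is assumed to be clocked at half of its output data rate, with both clock edges selecting data; the per-stage values are derived for illustration and are not taken from the text.

```python
LANES = 32
LANE_RATE_BPS = 320_000_000          # parallel input rate from the text

stages = LANES.bit_length() - 1      # log2(32) = 5 two-to-one stages
serial_rate_bps = LANES * LANE_RATE_BPS

# Half-rate scheme: stage k outputs LANE_RATE * 2**k and is clocked at half of that.
stage_clocks_hz = [LANE_RATE_BPS * 2 ** (k - 1) for k in range(1, stages + 1)]

print(f"Serial rate: {serial_rate_bps / 1e9:.2f} Gbps over {stages} stages")
for k, clock_hz in enumerate(stage_clocks_hz, start=1):
    print(f"stage {k}: clock {clock_hz / 1e9:.2f} GHz")
```

The last stage is the only one that needs the full 5.12 GHz clock, which is why its duty cycle distortion shows up directly in the output eye.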

Another concern is the driving capability of the CML drivers. During the LC-PLL test, only a 2.56 GHz clock (corresponding to 5.12 Gbps) was observed. The approximate load model for the driver was 1.5 pF and 2.5 nH, which is suitable for clock simulations; the simulation and test results were consistent. The eye height of the clock eye diagram (Fig. 7b) corresponds to that of a short one/zero of serial data at 5.12 Gbps, and the amplitude degrades significantly further at 10.24 Gbps. When simulated with the same load model, the eye diagram of the serializer at 10.24 Gbps was poor. However, with loads of 1 pF and 1 nH, the simulated deterministic jitter was approximately 25 ps, and the eye height was 400 mV in a typical case. This indicates that the standard CML driver is at the upper limit of its driving capability and that other techniques, such as pre-emphasis or equalization, are needed [34]. The estimated total jitter of the serializer was less than 32.5 ps based on the LC-PLL test results. The total power consumption was 86 mW, excluding the test drivers.
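To put the serializer jitter figures in context, they can be expressed as fractions of the unit interval (UI) at 10.24 Gbps. The sketch assumes that the 32.5 ps estimate is the simulated 25 ps deterministic jitter plus the 7.5 ps total jitter measured for the PLL clock; the text does not spell out this sum, although the numbers match.

```python
BIT_RATE_BPS = 10.24e9
ui_ps = 1e12 / BIT_RATE_BPS          # one unit interval in picoseconds

serializer_dj_ps = 25.0              # simulated with 1 pF / 1 nH loads
pll_tj_ps = 7.5                      # measured PLL total jitter bound
estimated_tj_ps = serializer_dj_ps + pll_tj_ps   # assumption: simple sum

print(f"UI: {ui_ps:.2f} ps")
print(f"Dj: {serializer_dj_ps / ui_ps:.1%} of UI")
print(f"Tj: {estimated_tj_ps:.1f} ps = {estimated_tj_ps / ui_ps:.1%} of UI")
```

About one third of the UI is consumed by jitter under this estimate, and nearly all of it is deterministic, which supports the case for pre-emphasis or equalization rather than further PLL noise reduction.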

5 Conclusion

A 5-GHz low-jitter, low-power LC-PLL was developed using 55-nm CMOS technology for the silicon pixel readout of a future hybrid pixel detector. Standard LVDS and CML interfaces were used to transmit and test the clocks. A proposed LC-VCO using low-pass filters to reduce phase noise was adopted in the LC-PLL. The PLL has two sub-bands, covering a 1.18-GHz frequency range from 4.74 GHz to 5.92 GHz. The PLL functions correctly, and the random jitter at 5.12 GHz was measured to be less than 460 fs; meanwhile, the deterministic jitter and total jitter (at a bit error rate of 10^-12) were no more than 0.8 ps and 7.5 ps, respectively. The PLL consumes 27 mW in the core circuits and 73.8 mW in total. Most of the test results coincided with the simulations and validated the analytical methods and tests.

Table 2 Comparison table of the LC-PLLs

Acknowledgements The authors would like to thank the optical-electronics laboratory at Southern Methodist University for providing the dose rate correction coefficient calculations.

Author contributions All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Xiao-Ting Li, Wei Wei, Ying Zhang, and Xiong-Bo Yan. The first draft of the manuscript was written by Xiao-Ting Li, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.