Applying the Kalman filter particle method to strange and open charm hadron reconstruction in the STAR experiment

2023-12-05

Nuclear Science and Techniques, 2023, Issue 10

Xin‑Yue Ju · Yue‑Hang Leung · Sooraj Radhakrishnann · Petr Chaloupka · Xin Dong · Yury Fisyak · Pavol Federic · Ivan Kisel · Hong‑Wei Ke · Michal Kocan · Spyridon Margetis · Ai‑Hong Tang · Iouri Vassiliev · Yi‑Fei Zhang · Xiang‑Lei Zhu · Maksym Zyzak

Abstract We applied KF Particle, a Kalman Filter package for secondary vertex finding and fitting, to strange and open charm hadron reconstruction in heavy-ion collisions in the STAR experiment. Compared to the conventional helix swimming method used in STAR, the KF Particle method considerably improved the reconstructed Λ, Ω, and D0 significance. In addition, the Monte Carlo simulation with STAR detector responses could adequately reproduce the topological variable distributions reconstructed in real data using the KF Particle method, thereby retaining substantial control of the reconstruction efficiency uncertainties for strange and open charm hadron measurements in heavy-ion collisions.

Keywords Heavy-ion collisions · Secondary vertex finding · Kalman filter

1 Introduction

In high-energy particle and nuclear physics experiments, strange and heavy flavor hadrons aid in studying the electroweak and strong interactions in the Standard Model [1–3]. These particles are predominantly short-lived, and their ground states, such as K0S, Λ, D0, and Λc+, exhibit proper lifetimes (cτ) ranging from tens of micrometers to several centimeters [4]. The experimental reconstruction of the decay positions and their separation from the collision vertices is imperative for achieving precise measurements [5–7]. This becomes especially critical in high-energy heavy-ion experiments at RHIC and the LHC, where thousands of particles are produced at the collision vertex. Secondary vertex reconstruction can significantly reduce the combinatorial background in these collisions while retaining a finite reconstruction efficiency, especially for low-momentum particles [5–7]. Therefore, the balance between the combinatorial background and the reconstruction efficiency must be considered for the particle of interest to achieve the best experimental measurement precision.

The STAR detector at RHIC is a general-purpose detector dedicated to heavy-ion experiments [8]. The primary tracking subsystem, the time projection chamber (TPC) [9], provides a pointing resolution of ∼1 mm to the collision vertex for charged tracks, which enables topological separation of strange hadron weak decay positions from the primary collision point. A high-resolution silicon detector, the Heavy Flavor Tracker (HFT), was operated from 2014 to 2016 and improved the charged track pointing resolution to better than ∼50 μm for 750 MeV/c charged kaon tracks [10]. This enables the topological reconstruction of various open charm hadron decays in heavy-ion collisions [5, 11–15] and significantly improves the precision of the measurements compared with those made without decay vertex detection [16]. Furthermore, the vertex resolution is sufficient to separate open charm and open beauty hadron decays, which facilitates the measurement of beauty decay electrons to reveal mass-dependent parton energy loss in the hot, dense medium [17–19].

Conventionally, secondary vertex reconstruction in STAR has been conducted by finding the distance of closest approach (DCA) points of two charged track helices, referred to as the helix swimming (HS) method. The decay position is taken as the midpoint of the two DCA points, and this method has demonstrated adequate performance in reconstructing strange and open charm hadrons in heavy-ion collisions [5, 6]. The key topological variables employed in this method are schematically represented in Fig. 1: the DCA of the daughter particles to the primary vertex (DCAv1, DCAv2), the DCA between the two daughter particles (DCA12), the decay length from the decay vertex position to the primary vertex (d), the angle θ between the momentum vector of the particle of interest and the decay length vector, and/or the DCA between the helix of the particle of interest and the primary vertex (b). The calculations are based on the mathematical helix model for the daughter tracks; the experimentally estimated track uncertainties are not used in this reconstruction method.
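The geometric quantities listed above can be illustrated with a minimal Python sketch. It assumes that the two daughter DCA points have already been found by helix swimming and that the mother particle is neutral, so that its trajectory is a straight line; the function names are hypothetical and are not part of the STAR software.

```python
import numpy as np

def midpoint_vertex(dca_point_1, dca_point_2):
    """Decay vertex estimate: midpoint of the two daughter DCA points."""
    p1 = np.asarray(dca_point_1, dtype=float)
    p2 = np.asarray(dca_point_2, dtype=float)
    return 0.5 * (p1 + p2)

def dca_between_daughters(dca_point_1, dca_point_2):
    """DCA12: distance between the two daughter DCA points."""
    p1 = np.asarray(dca_point_1, dtype=float)
    p2 = np.asarray(dca_point_2, dtype=float)
    return float(np.linalg.norm(p1 - p2))

def mother_topology(primary_vertex, decay_vertex, mother_momentum):
    """Return (d, cos_theta, b) for a straight-line (neutral) mother.

    d         : decay length from the primary vertex to the decay vertex
    cos_theta : cosine of the angle between momentum and decay length vector
    b         : DCA of the mother trajectory to the primary vertex
    Assumes d > 0 and a non-zero momentum.
    """
    flight = np.asarray(decay_vertex, dtype=float) - np.asarray(primary_vertex, dtype=float)
    p = np.asarray(mother_momentum, dtype=float)
    p_mag = np.linalg.norm(p)
    d = float(np.linalg.norm(flight))
    cos_theta = float(np.dot(flight, p) / (d * p_mag))
    b = float(np.linalg.norm(np.cross(flight, p)) / p_mag)
    return d, cos_theta, b
```

For a straight-line mother, b equals d·sin θ, so b and θ carry closely related information; none of these quantities uses the track uncertainties.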

Recently, within STAR, an experimentally estimated error matrix of the fitted track helix parameters was made available in the offline analysis software infrastructure. At the same time, the KF Particle package, a Kalman Filter method for secondary vertex finding and fitting that utilizes the estimated track helix error matrices, was deployed for STAR offline analysis. The aim is to improve secondary particle reconstruction using the constraints provided by the additional knowledge of the error matrices of the various topological variables.

Fig.1 Sketch of key topological variables used by the helix swimming method

This paper reports the results of applying the KF Particle method to the reconstruction of strange (Λ, Ω-) and open charm (D0) hadrons in heavy-ion collisions in the STAR experiment. The Toolkit for Multivariate Data Analysis (TMVA) package deployed in ROOT [20] was used to optimize the topological selection cuts for the best signal significance in both the helix swimming and KF Particle methods. The remainder of this paper is organized as follows. Section 2 describes how the KF Particle method performs secondary particle reconstruction and fitting. The application of the KF Particle method to STAR data is discussed in Sect. 3, where the optimized signal performance of the helix swimming and KF Particle methods is compared, and the topological variable distributions obtained with the KF Particle method in real data are compared with those from Monte Carlo (MC) simulations. Finally, the findings are summarized in Sect. 4.

2 KF particle method

The Kalman Filter (KF) [21] is a recursive method for analyzing linear discrete dynamic systems described by a vector of parameters, called the state vector r, according to a series of measurements observed over time. It estimates the unknown parameters of this vector with high accuracy and is widely used in tracking and data prediction tasks.

In particle experiments, the Kalman filter can be employed to solve various tasks, such as track finding, particle reconstruction, and event vertex reconstruction [22]. In particular, the KF Particle package, which utilizes the Kalman filter for the reconstruction of short-lived particles and for vertex finding, has been developed and is currently applied to STAR data analysis.

In the KF Particle framework, each particle is described by a state vector with eight parameters [23], r = (x, y, z, px, py, pz, E, s), where (x, y, z), (px, py, pz), and E are the position, momentum, and energy of the particle, respectively, and s = l∕p, where l denotes the length of the trajectory in the laboratory coordinate system and p is the total momentum of the particle. This natural particle parametrization renders the algorithm independent of the detector system geometry. The reconstructed state vector and its covariance matrix (C) contain all the necessary information regarding the particle, which enables the calculation of physical quantities such as momentum, energy, and lifetime together with their uncertainties, as well as the χ2 values obtained during the reconstruction, which are used to estimate the quality of the reconstruction.
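As an illustration of how physical quantities follow from the state vector, the short Python sketch below computes the invariant mass and the proper decay length from the eight parameters. It assumes natural units (c = 1) and uses hypothetical function names; it does not reproduce the KF Particle interface, and the propagation of the covariance matrix to the uncertainties of these quantities is omitted.

```python
import math

def invariant_mass(state):
    """Mass from a state vector (x, y, z, px, py, pz, E, s), with c = 1."""
    _, _, _, px, py, pz, energy, _ = state
    m2 = energy**2 - (px**2 + py**2 + pz**2)
    return math.sqrt(max(m2, 0.0))  # clamp tiny negative values from rounding

def proper_decay_length(state):
    """c*tau = l * m / p = m * s, because s = l / p."""
    s = state[7]
    return invariant_mass(state) * s
```

With p in GeV/c, E in GeV, and s in cm/(GeV/c), the returned proper decay length is in centimeters.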

To simplify the calculation, the momentum and energy of the mother particle are calculated from the sum over all daughter particles, and only the vertex position is fitted. After transporting the daughter particle to the current estimate of the decay vertex, (rk, Ck), the state vector of the daughter particle can be treated as a measurement (mk, Vk) of the state vector of the mother particle. Using the residual ζk between rk and mk and the Kalman gain matrix Kk evaluated from Ck and Vk, the estimate of the mother particle state vector is updated to (rk+1, Ck+1) according to Eq. (1).
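For orientation, a Kalman update step of this kind can be written in the generic textbook form below. The measurement model matrix H_k, which maps the mother state onto the measured quantities, and the residual covariance S_k are notational assumptions introduced here; the exact form and notation used in Refs. [23, 24] may differ.

```latex
\begin{aligned}
\zeta_k &= m_k - H_k r_k, \\
S_k &= V_k + H_k C_k H_k^{T}, \\
K_k &= C_k H_k^{T} S_k^{-1}, \\
r_{k+1} &= r_k + K_k \zeta_k, \\
C_{k+1} &= C_k - K_k H_k C_k, \\
\chi^2_{k+1} &= \chi^2_k + \zeta_k^{T} S_k^{-1} \zeta_k .
\end{aligned}
```

When the daughter state is compared directly with the mother state, H_k reduces to the identity matrix and the gain becomes K_k = C_k (C_k + V_k)^{-1}.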

The χ2 criterion for this estimate is obtained simultaneously. The basic filtering algorithm consists of applying this process to all daughter tracks. A complete description of the algorithm and its mathematical justification is given in Refs. [23, 24]. Herein, we briefly outline the scheme for short-lived particle reconstruction displayed in Fig. 2.
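A minimal Python sketch of such a filtering loop is given below. It is a simplified illustration rather than the KF Particle implementation: it assumes an identity measurement model (each daughter measures the mother state directly), omits the transport of the daughters, and uses hypothetical function names.

```python
import numpy as np

def kalman_update(r, C, m, V, chi2=0.0):
    """One filtering step: combine the estimate (r, C) with a measurement (m, V)."""
    r = np.asarray(r, dtype=float)
    C = np.asarray(C, dtype=float)
    m = np.asarray(m, dtype=float)
    V = np.asarray(V, dtype=float)
    zeta = m - r                   # residual between measurement and estimate
    S_inv = np.linalg.inv(C + V)   # inverse covariance of the residual
    K = C @ S_inv                  # Kalman gain for an identity measurement model
    r_new = r + K @ zeta
    C_new = C - K @ C
    chi2_new = chi2 + float(zeta @ S_inv @ zeta)
    return r_new, C_new, chi2_new

def filter_daughters(measurements, r0, C0):
    """Apply the update sequentially to all daughter measurements (m, V)."""
    r, C, chi2 = np.asarray(r0, dtype=float), np.asarray(C0, dtype=float), 0.0
    for m, V in measurements:
        r, C, chi2 = kalman_update(r, C, m, V, chi2)
    return r, C, chi2
```

Starting from a very large initial covariance, two measurements with equal covariance yield their average, with half the covariance, and a χ2 equal to that of the weighted mean. This mirrors the role of the large initial matrix C0 in the scheme that follows.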

1. Sort the final state particles into primary and secondary according to their χ2 to the collision vertex.

2. Select an initial secondary decay point, often the DCA point of the first daughter track to the collision vertex. Set the initial mother particle parameters (r0, C0); C0 is often set to a diagonal matrix with effectively infinite elements.

3. Extrapolate the k-th daughter particle to its point of closest approach to the current estimate of the decay point and update its parameters.

4. Correct the decay vertex according to the k-th daughter particle and add the 4-momentum of the daughter particle to the 4-momentum of the mother particle.

5. Repeat steps 3 and 4 for all daughter particles to obtain the optimum estimate of the mother particle.

6. If the production vertex of the mother particle (typically, the primary vertex) is known, transport the mother particle to it. Thereafter, the position of the production vertex is filtered and the χ2 probability that the mother particle originates from the production vertex is calculated.

Fig.2 (Color online) Schematic of short-lived particle reconstruction with the KF Particle package. The main steps are as follows: initialize the parameters of the mother particle; extrapolate a daughter particle to the DCA point with the current estimate of the mother particle; correct the mother particle with the parameters of its daughter particle; after correcting over all daughter particles, the optimum estimate of the mother particle is obtained; the parameters of the current mother particle are fed back into the initial step and the procedure is iterated several times until the results converge

7. Set rn and Cn as the initial parameters of the mother particle and repeat steps 3–6 N times.

8. Finalize the mother particle parameters (rn, Cn) and their precision.

Compared with the traditional helix swimming method, the KF Particle method offers several crucial advantages.

• Usage of the daughter particle track parameters covariance matrices adds information on the detector performance and the track reconstruction quality, improving the mother particle reconstruction accuracy and efficiency.

• Statistical criteria (χ2-based cuts) are calculated and used for background rejection, for instance, using the χ2 between the daughter track parameters and the collision point parameters instead of the DCA to better discriminate primary and secondary particles.

• The natural and simple interface enables the reconstruction of complicated decay chains [24].

• Usage of parallel programming provides high computational speed for the above-mentioned rather complicated calculations.
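The difference between a geometric DCA cut and a χ2-based criterion can be illustrated with the following Python sketch. It is a simplified stand-in, assuming that the track is represented by a three-dimensional point of closest approach with a 3×3 position covariance; the function name is hypothetical and the actual KF Particle calculation uses the full track parametrization.

```python
import numpy as np

def chi2_to_vertex(track_point, track_cov, vertex, vertex_cov):
    """chi2 of the displacement between a track point and a vertex.

    The displacement is weighted by the summed covariance of the track
    point and the vertex (both 3x3 matrices, assumed uncorrelated).
    """
    delta = np.asarray(track_point, dtype=float) - np.asarray(vertex, dtype=float)
    total_cov = np.asarray(track_cov, dtype=float) + np.asarray(vertex_cov, dtype=float)
    return float(delta @ np.linalg.solve(total_cov, delta))
```

Two tracks with the same DCA of 0.1 cm but position resolutions of 0.02 cm and 0.1 cm give χ2 values of about 25 and 1, respectively: the precisely measured track is clearly displaced from the vertex, whereas the poorly measured one is compatible with it, a distinction that a fixed DCA cut cannot make.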

3 Application to data

matrix information of the track parameters, were applied in this analysis.

3.1 Λ reconstruction

Λ particles were reconstructed using the decay channel Λ → p + π-, which has a branching ratio of 63.9% [4]. Λ particles produced in Au+Au collisions decay with a proper decay length of cτ ≈ 79 mm. Protons and pions were identified based on the ionization energy loss in the TPC gas. In practice, charged tracks with |nσX| < 3 for any particle of interest X were selected, where nσX is defined as follows.
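A form of this definition that is commonly used in STAR analyses is sketched below; the symbol names, in particular σ_{dE/dx} for the dE/dx resolution of the track, are assumptions made here for illustration.

```latex
n\sigma_X = \frac{1}{\sigma_{dE/dx}}
\ln\!\left(
\frac{\langle dE/dx \rangle_{\mathrm{measured}}}
     {\langle dE/dx \rangle_{X}^{\mathrm{expected}}}
\right)
```

Here the numerator is the measured mean ionization energy loss of the track and the denominator is the value expected for particle species X at the same momentum.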

Fig.3 pπ- invariant mass distributions of pT =0.4-0.6 GeV/c in Au+Au collisions at √sNN =27 GeV with centrality 0–5% (left) and 30–40% (right).Black data points depict all pπ- pair distributions and are fitted with the Gaussian function in addition to the combinatorial background distributions depicted in the blue lines, which were estimated via side-band fitting

To ensure that the KF Particle method can be reliably used for extracting physical yields, we applied it to a Monte Carlo simulated sample generated using an embedding technique, as follows. Simulated Λ particles with flat pT and rapidity distributions were propagated through a GEANT3 [25] simulation of the STAR TPC. The Λ particles decayed inside the simulated detector, and the electronic signals originating from the decay particles were mixed with those from a given event from the real data. The number of simulated Λ particles was 5% of the measured charged-particle multiplicity of the event in which the simulated particles were embedded, and the simulated Λ particles all originated from the primary vertex of that event. The combined electronic signals were subsequently processed using the STAR tracking software, which is also used for real data processing. Thereafter, the KF Particle package was applied to the resultant tracks for Λ reconstruction.

We compared the performance of KF Particle on real data and MC simulation samples. The topological variables listed below (Table 1) were used to select the Λ candidates during the KF Particle reconstruction.

Statistical criteria were used in place of the corresponding geometric quantities (DCAprim,π → χ2prim,π, DCAprim,p → χ2prim,p, DCAp-π → χ2p-π, b and θ → χ2topo,Λ, dΛ → dΛ∕σdΛ). Comparisons of these variables between data and MC simulation for Λ candidates with pT = 0.4–1.2 GeV/c and centrality 0–10% are shown in Fig. 4. In general, the distributions of these topological variables in data are well described by the MC simulations for all centralities and pT.

To achieve the optimal significance of the Λ signal, the Toolkit for Multivariate Data Analysis (TMVA) was used. TMVA provides a family of supervised learning algorithms that can be used to differentiate between signal and background. For further details, please refer to Ref. [20]. Signal and background samples were prepared as inputs for training. The signal samples were obtained from a GEANT3 simulation as described above. For the background sample, we selected sidebands ( 3σ

Table 1 Topological variables for Λ reconstruction

Fig.4 Key topological variable distributions (a–d χ2-based variables, e dΛ, f dΛ∕σdΛ) used in the KF Particle method for Λ reconstruction. Data (black points) and MC simulations (red curves) are compared

The boosted decision tree (BDT) response value distributions from the signal and background samples for Λ candidates with pT = 0–1 GeV/c and centrality 0–10% are shown in the left panel of Fig. 5. The BDT response values for signal and background differ significantly and thus serve as a good measure for differentiating between them. To select a BDT response cut value that optimizes the significance S∕√(S+B), where S and B denote the signal and background counts, respectively, we used the TMVA package to calculate the signal and background efficiencies as functions of the BDT response cut value, εS(BDT cut) and εB(BDT cut), using the signal and background samples, respectively. The signal and background efficiencies for Λ candidates in the pT range 0–1 GeV/c and centrality 0–10% are shown in the right panel of Fig. 5 as blue and red lines, respectively. The estimated significance as a function of the BDT cut value can then be calculated using Eq. (3):

$$\text{Significance} = \frac{\varepsilon_S S_0}{\sqrt{\varepsilon_S S_0 + \varepsilon_B B_0}} \qquad (3)$$

where S0 and B0 are the numbers of signal and background counts in the dataset before the BDT cut is applied. S0 is obtained from the estimated Λ counts without any cut on the topological variables, and B0 is obtained from the number of sideband pπ- pairs without the BDT cut. The calculated significance as a function of the cut value applied to the BDT response for Λ candidates in the pT range 0–1 GeV/c and 0–10% centrality is shown in the right panel of Fig. 5 as a green line. We find that a cut value of -0.09 maximizes the significance in this particular pT and centrality bin, and we choose this cut value for Λ reconstruction. This procedure is then repeated for each pT and centrality bin. Generally, as the signal-to-background ratio decreases, a stricter BDT selection cut is necessary to optimize the significance.
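The cut optimization described above can be sketched in Python as follows, using the relations S = εS·S0 and B = εB·B0 so that the expected significance is εS·S0∕√(εS·S0 + εB·B0). The function names and the tabulated efficiencies are hypothetical inputs; in the analysis, the efficiencies are provided by TMVA.

```python
import numpy as np

def expected_significance(eff_s, eff_b, s0, b0):
    """Expected S / sqrt(S + B) for given signal and background efficiencies."""
    s = np.asarray(eff_s, dtype=float) * s0
    b = np.asarray(eff_b, dtype=float) * b0
    total = s + b
    # Return 0 where no candidates survive, avoiding a division by zero.
    return np.divide(s, np.sqrt(total), out=np.zeros_like(total), where=total > 0)

def best_cut(cuts, eff_s, eff_b, s0, b0):
    """Return the cut value that maximizes the expected significance."""
    sig = expected_significance(eff_s, eff_b, s0, b0)
    i = int(np.argmax(sig))
    return float(cuts[i]), float(sig[i])
```

When B0 is much larger than S0, the maximum moves towards cuts that reject more background at the cost of signal efficiency, which is consistent with the observation that a lower signal-to-background ratio calls for a stricter cut.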

Fig.6 (Color online) Ratio of significance for Λ particles using the KF Particle method in conjunction with BDT training over those using the helix swimming (HS) method in conjunction with BDT training as a function of pT of the Λ particles for centrality selection 0–5% (black), 30–40% (red) and 60–80% (blue).The shaded bands indicate the statistical uncertainties

3.2 Ω reconstruction

Next, we turn to the Ω baryon. Ω baryons were reconstructed using the decay channel Ω → Λ + K- → p + π- + K-. Ω particles decay with a proper decay length of cτ ≈ 25 mm [4], and the Λ daughters decay again soon thereafter. The final daughter tracks were detected using the STAR TPC. Similarly, for each proton, kaon, or pion track, a minimum of 15 hits was required to ensure good track quality. We first reconstructed the Λ baryons with the KF Particle method and then treated each Λ as a daughter track to reconstruct the Ω decay vertex. As shown in Fig. 7, clear Ω mass peaks are observed in the ΛK- invariant mass distributions obtained with the KF Particle method.

Because the decay topology of Ω baryons is more complicated than that of Λ baryons, more topological variables can be used for training to facilitate the differentiation between signal and background. The topological variables listed in Table 2 were used to select the Ω baryon candidates during the KF Particle reconstruction.

Similar to the Λ baryon study, we generated an MC sample of the reconstructed Ω baryons using a GEANT3 simulation of the STAR TPC.The data-MC comparison of key topological variables is shown in Fig.8.

Fig.7 ΛK- invariant mass distributions of pT =1.2-1.6 GeV/c in Au+Au collisions at √sNN =27 GeV with centrality 0–5% (left), and 30–40% (right).Black data points depict all ΛK- pair distributions and are fitted with the Gaussian function plus the combinatorial background distributions shown in the blue lines, which are estimated via side-band fitting

We find reasonable agreement between data and MC simulations, which suggests proper estimation and usage of the covariance matrix of the Λ daughters and gives us confidence that the KF Particle method can be reliably used for the extraction of Ω baryon yields. We then generated signal and background samples using the same method as in the Λ analysis to supply inputs for TMVA training with the BDT method. The BDT response value distribution for Ω candidates with pT = 1–4 GeV/c is shown in the left panel of Fig. 9. The signal efficiency, background efficiency, and significance are shown in the right panel of Fig. 9. As in the Λ analysis, we selected the BDT response cut value that optimizes the significance.

This process was repeated for each pT and centrality bin, and the significance obtained with the optimized BDT response cuts was extracted for each bin. We then performed signal extraction using the default helix swimming method, with candidate selection cuts chosen to be the same as in previous Ω analyses at the same collision energy [6, 27] and also tuned by the BDT method. The signal and background counts were extracted with the helix swimming method, and the ratios of the significances obtained with the two methods were calculated, as shown in Fig. 10. We observe an ≈50% increase in significance in the pT range of 1–4 GeV/c. This increase is larger than that for Λ, likely owing to the more complex decay topology, with two decay vertices reconstructed by KF Particle, and to the larger background. Further studies using KF Particle are underway to extend the Ω measurement to low pT below 1 GeV/c; however, this is beyond the scope of this study.

Table 2 Topological variables for Ω reconstruction

Fig.8 Key topological variable distributions (panels a–f, including c χ2topo,Ω, d χ2Λ-K, e dΛ∕σdΛ, and f dΩ∕σdΩ) used in the KF Particle method for Ω reconstruction. Data (black points) and MC simulations (red curves) are compared

Fig.10 (Color online) Ratio of significance for Ω particles using the KF Particle method in conjunction with BDT training over those using the helix swimming (HS) method in conjunction with BDT training as a function of pT of the Ω particles for centrality selection 0–5% (black), 30–40% (red) and 60–80% (blue).The shaded bands indicate the statistical uncertainties

Fig.9 (Color online) (left) BDT response value distributions for signal (blue) and background (red) Ω candidates in the pT range 1–4 GeV/c for 0–10% centrality.(right) Efficiency for signal (blue line) and background (red line) Ω candidates in the pT range 1–4 GeV/c for 0–10% centrality as a function of the cut value placed on the BDT response value.The significance (green line) achieves its maximum value when the cut value is 0.09

3.3 D0 reconstruction

D0 (and D̄0) particles were reconstructed via the decay channel D0 → K-π+ and its charge conjugate, with a proper decay length of cτ ≈ 123 μm [4]. Because this decay length is smaller than the spatial resolution of the TPC, information from the HFT microvertex detector was used to distinguish the D0 decay vertex from the primary collision vertex. For each kaon or pion daughter track, a minimum of 15 hits in the TPC and a match to the HFT with at least three hits were required to ensure good track quality. For kaon and pion identification, in addition to requiring |nσπ| < 3 and |nσK| < 2, we also used information from the time-of-flight (TOF) detector [5], when available, to help identify particles at high pT by requiring the measured inverse velocity (1∕β) to be within three standard deviations of the expected value. The topological variables listed in Table 3 were used to select the D0 meson candidates in the KF Particle reconstruction. The pT,π and pT,K cuts were added to reject combinatorial background at low pT.

Similar to the Λ and Ω baryon studies, we generated an MC sample of reconstructed D0 mesons using a GEANT3 simulation of the STAR TPC, HFT, and TOF and processed it through full detector tracking, as is done in real data reconstruction, using the previously mentioned embedding technique. The HFT simulator was tuned to reproduce the single-track efficiency and DCA pointing resolution observed in real data. However, the consistency of the topological variable distributions between data and MC for D0 signals had not previously been demonstrated. Figure 11 shows a comparison of several key topological variables used in the KF Particle method for D0 reconstruction between data (black data points) and MC (red histograms). We found good agreement between data and MC simulations for these variables, which indicates that this MC simulation combining multiple detectors reproduces D0 signals reasonably well. The background distributions are also shown in Fig. 11 (blue circles). They were estimated from real data using the sideband method, in which background candidates are selected by requiring an invariant mass of Kπ pairs within 3σ

Table 3 Topological variables for D0 reconstruction

Fig.11 (Color online) Key topological variable distributions (panels a–e, including d dD0∕σdD0 and e χ2topo,D0) used in the KF Particle method for D0 reconstruction. Data (black points), MC simulations (red curves), and background (blue circles) are compared

Fig.12 (Color online) (left) BDT response value distributions for signal (blue) and background (red) D0 candidates in the pT range 2–3 GeV/c for 10–40% centrality.(right) Efficiency for signal(blue line) and background (red line) D0 candidates in the pT range 2–3 GeV/c for 10–40% centrality as a function of the cut value placed on the BDT response value.The significance (green line) achieves its maximum value when the cut value is 0.05

Fig.13 K±π∓ invariant mass distributions using the KF Particle method in 10–40% Au+Au collisions at √sNN =200 GeV in the region of pT =0-1 GeV/c (left) and pT =1.5-2.0 GeV/c (right).Black data points depict all K±π∓ pair distributions and are fitted with the Gaussian function plus the combinatorial background distributions shown in the blue lines, which are estimated via side-band fitting

Fig.14 (Color online) Ratio of significance for D0 particles using the KF Particle method in conjunction with BDT training over those using the helix swimming(HS) method in conjunction with BDT training as a function of pT of the D0 particles for centrality selection 0–10% (black circles), 10–40% (red squares), and 40–80% (blue triangles).Shaded bands indicate the statistical uncertainties

Thereafter, we applied the optimized BDT selection cuts to the real data analysis. Figure 13 displays the D0 invariant mass distributions obtained using the KF Particle method for 10–40% Au+Au collisions at √sNN = 200 GeV in the regions pT = 0–1 GeV/c (left) and pT = 1.5–2.0 GeV/c (right). Black lines depict fits to the data with a Gaussian function for the D0 signal and a linear background. Compared with the Λ and Ω reconstructions, more combinatorial background remains because the D0 decay points are closer to the collision point and the daughters are harder to separate from particles produced there, especially at pT = 0–1 GeV/c.

Thereafter, the signal significance was calculated from the invariant mass distributions of D0 candidates. The signal counts were obtained from a Gaussian fit of the D0 peak, whereas the background counts were determined from a linear background fit within a mass window of |Minv − MD0| < 3σ, where σ denotes the width of the D0 peak. We compared the D0 significance obtained with the KF Particle method with that of the helix swimming (HS) method used in a previous analysis [5], also in conjunction with BDT training; the ratio of the D0 significance between the two methods is shown in Fig. 14. The shaded bands indicate the statistical uncertainties of this calculation. This comparison demonstrates that the KF Particle method improves the reconstructed D0 signal significance, especially at low pT and in more central collisions. In 0–10% central Au+Au collisions at pT < 1 GeV/c, the improvement can be as large as a factor of ≈ 3, potentially because of the enormous combinatorial background (hundreds of times the signal) in that range, where the cuts based on statistical criteria are effective in discriminating particles that originate from a secondary vertex.
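A minimal Python sketch of this significance calculation is given below. It assumes that the fit returns a total Gaussian signal yield and a linear background density (counts per unit mass), and that both are counted within the same ±3σ window; the function and parameter names are hypothetical, and the exact counting convention of the analysis may differ.

```python
import math

def significance_from_fit(signal_yield, mean, sigma,
                          bkg_intercept, bkg_slope, n_sigma=3.0):
    """Return (S, B, S / sqrt(S + B)) within |M - mean| < n_sigma * sigma.

    signal_yield  : total area of the fitted Gaussian (counts)
    bkg_intercept : background density at M = 0 (counts per unit mass)
    bkg_slope     : slope of the linear background density
    """
    lo = mean - n_sigma * sigma
    hi = mean + n_sigma * sigma
    # Fraction of a Gaussian contained within +-n_sigma standard deviations
    s = signal_yield * math.erf(n_sigma / math.sqrt(2.0))
    # Integral of (intercept + slope * M) over the mass window
    b = bkg_intercept * (hi - lo) + 0.5 * bkg_slope * (hi**2 - lo**2)
    return s, b, s / math.sqrt(s + b)
```

The ratio shown in Fig. 14 then corresponds to dividing the value obtained for the KF Particle selection by that for the helix swimming selection in the same pT and centrality bin.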

4 Summary

Acknowledgements The authors thank the STAR Collaboration, RHIC Operations Group, RCF at BNL, and NERSC Center at LBNL for their support.

Author Contributions All authors contributed to the study conception and design. Material preparation and analysis were performed by Xin-Yue Ju and Yue-Hang Leung. Data collection was performed by the RHIC-STAR collaboration, and the Monte Carlo simulation was performed by Sooraj Radhakrishnann and Xiang-Lei Zhu. The first draft of the manuscript was written by Xin-Yue Ju, Yue-Hang Leung, and Xin Dong, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Data availability The data that support the findings of this study are openly available in Science Data Bank at https://www.doi.org/10.57760/sciencedb.j00186.00250 and https://cstr.cn/31253.11.sciencedb.j00186.00250.

Declarations

Conflict of interest The authors declare that they have no competing interests.
