Improvement of the Bayesian neural network to study the photoneutron yield cross sections

2022-12-20

Nuclear Science and Techniques, 2022, Issue 11

Yong-Yi Li • Fan Zhang • Jun Su

Abstract This work is an attempt to improve the Bayesian neural network (BNN) for studying photoneutron yield cross sections as a function of the charge number Z, mass number A, and incident energy ε. The BNN was improved in three aspects: numerical parameters, input layer, and network structure. First, the numerical parameters, including the number of hidden layers, the number of hidden nodes, and the activation function, were selected by minimizing the deviations between the predictions and data. It was found that the BNN with three hidden layers, 10 hidden nodes, and a sigmoid activation function provided the smallest deviations. Second, based on existing knowledge, such as the isospin dependence and shape effect, the optimal ground-state properties were selected as input neurons. Third, the Lorentzian function was applied to map the hidden nodes to the output cross sections, and the empirical formula of the Lorentzian parameters was applied to link some of the input nodes to the output cross sections. It was found that the last two aspects improved the predictions and avoided overfitting, especially for axially deformed nuclei.

Keywords Bayesian neural network · Photoneutron cross sections · Giant dipole resonance

1 Introduction

Neural networks are powerful tools for making predictions after training using data and have made possible exciting achievements in nuclear physics in the past few years [1-6]. The earliest work on neural networks in nuclear physics dates back to 1993, when the phenomenological approach to many-body systems based on multilayer feedforward neural networks was introduced to learn the systematics of atomic masses and nuclear spins and parities [7]. Thereafter, various types of neural networks have been applied to study nuclear mass systematics [8, 9], β-decay systematics [10], and binding energy [11].

Recently, with the help of physical ideas, neural networks have been improved and their potential capability has been realized. Known physics has been explicitly embedded in the Bayesian neural network (BNN), resulting in a novel method for accurately predicting β-decay half-lives [12]. By preprocessing the input data and including correlations among them, the problem of multiple solutions can be reduced, yielding more stable extrapolated results [13]. The combination of the three-parameter formula and the BNN results in a novel approach for describing the charge radii of nuclei [14]. In addition, physical laws have been revealed by neural networks. For example, a feed-forward neural network model was trained to calculate nuclear charge radii, and the correlation between the symmetry energy and charge radii of Ca isotopes was suggested [15]. The convolutional neural network algorithm was applied to determine the impact parameters in heavy-ion collisions using constrained molecular-dynamics model simulations [16]. To date, neural networks have been widely applied in nuclear physics for signal identification [17-19], data restoration [20, 21], and regression analysis [5].

In studies of nuclear masses and nuclear charge radii, Utama et al. argued that the physics in the initial prediction can be included using physics-motivated models and that the BNN can be used to fine-tune these models by modeling the residuals [22, 23]. Based on this residual approach, the predictions of several physics-motivated models in nuclear physics have been improved using neural networks. For example, the BNN approach was employed to improve the nuclear mass predictions of various physics models [24] and the fission yield prediction using the TALYS model [2]. To study the isotopic cross sections in proton-induced spallation reactions, the BNN predictions obtained by learning the experimental data directly and by learning the residuals of the SPACS parametrization were compared [25]. It was shown that the latter are better, which indicates the significance of a robust physics-motivated model when using a neural network.

A neural network is a type of numerical algorithm. In cases where a physics-motivated model is not available, attempts have been made to provide physics guidance in neural networks. A multitask neural network was applied to learn the experimental data of the giant dipole resonance parameters directly, and the predictions were better than those calculated by the Goldhaber-Teller model [26]. In a study for determining the impact parameters of heavy-ion collisions using convolutional neural networks, no initial predictions were made [16]. In addition to the physics-motivated model, physics guidance in neural networks has also been provided from the input layer [27] or by an empirical formula [14]. In this study, an attempt is made to improve the BNN for studying photoneutron yield cross sections, where both the improvements from the input layer and by the empirical formula are considered.

Photonuclear reactions were first observed more than 60 years ago [28]. Their cross-section data are important for analyzing radiation transport, studying nuclear waste transmutation [29], and calculating nucleosynthesis [30, 31]. The underlying mechanisms in photonuclear reactions are significant in fundamental nuclear physics [32, 33]. More than 27,000 data points for σxn have been collected in the EXFOR database [34] for nuclei from ⁶Li to ²³⁹Pu at incident energies above the neutron separation energy. New facilities have been developed to measure additional data [35, 36]. Such a large dataset makes machine learning both feasible and advisable.

This study focuses on the improvement of the BNN to study photoneutron yield cross sections. The remainder of this paper is organized as follows. In Sect. 2, the model is described. In Sect. 3, both the results and discussions are presented. Finally, a summary is presented in Sect. 4.

2 Model

The fundamentals of the BNN approach were established in the last century [37]. It is now a commonly used method for pattern recognition and numerical regression. The latter is the case in the present study. In the normal BNN algorithm, a neural network with hidden layers is built for mapping from the input layer X to the output layer Y. For a single hidden layer, the standard form consistent with the parameters defined below is

$$Y(X, \theta) = a + \sum_j b_j\, f\Big(c_j + \sum_i d_{ji} X_i\Big),$$

where θ = {a, b_j, c_j, d_ji} are parameters in the neural network, {a, c_j} are biases, and {b_j, d_ji} are weights. The activation function f can be set as a sigmoid, tanh, or softplus type.
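The mapping with parameters θ = {a, b_j, c_j, d_ji} can be illustrated with a short sketch. It assumes the usual single-hidden-layer form Y = a + Σ_j b_j f(c_j + Σ_i d_ji X_i); the function names (`forward`, `ACTIVATIONS`) and the toy numbers are illustrative and not taken from the paper.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function 1 / (1 + exp(-x)).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x):
    # ln(1 + exp(x)), rearranged to avoid overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


ACTIVATIONS = {"sigmoid": sigmoid, "tanh": math.tanh, "softplus": softplus}


def forward(x, a, b, c, d, activation="sigmoid"):
    """Assumed form: Y = a + sum_j b[j] * f(c[j] + sum_i d[j][i] * x[i])."""
    f = ACTIVATIONS[activation]
    hidden = [
        f(c_j + sum(d_ji * x_i for d_ji, x_i in zip(d_j, x)))
        for c_j, d_j in zip(c, d)
    ]
    return a + sum(b_j * h_j for b_j, h_j in zip(b, hidden))


# Toy example: three inputs (e.g. Z, A, energy after scaling), two hidden nodes.
x = [0.0, 0.0, 0.0]
y = forward(x, a=0.5, b=[1.0, 1.0], c=[0.0, 0.0],
            d=[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]])
```

In a BNN, each parameter in θ is a random variable rather than a fixed number, so `forward` would be evaluated once per sampled parameter set.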

The network is trained on a dataset D of Nd input-output pairs, where Nd is the sample size of the available data. Bayes' theorem, which deduces the posterior knowledge from the prior using the likelihood function, is applied to solve this regression problem. Specifically, a prior distribution P(θ) of the parameters θ is assumed, and its posterior distribution P(θ|D) for a given dataset D is expressed as

$$P(\theta|D) = \frac{P(D|\theta)\,P(\theta)}{P(D)},$$

where P(D|θ) is the likelihood of D given θ and P(D) is a normalization constant.

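With the posterior distribution of the parameters θ, the expected value of the output Ŷ* for the known inputs X* is expressed as the integral

$$\hat{Y}^* = \int Y(X^*, \theta)\, P(\theta|D)\, d\theta .$$

In practice, this integral is estimated by Monte Carlo sampling, that is, by averaging over Ns parameter samples θ(k) drawn from the posterior,

$$\hat{Y}^* \approx \frac{1}{N_s} \sum_{k=1}^{N_s} Y(X^*, \theta^{(k)}),$$

and the uncertainty of the prediction is characterized by S, where S is the standard deviation of the samples Y(X*, θ(k)). The difference between the integration and Monte Carlo calculations can be reduced by increasing the number of samples Ns.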
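The Monte Carlo estimate of the prediction and its standard deviation S can be sketched as follows. The model `toy_network` and the Gaussian stand-in for posterior samples are hypothetical placeholders chosen only to make the example runnable; they are not the network or posterior used in the paper.

```python
import random
import statistics


def toy_network(x, theta):
    # Hypothetical stand-in for Y(X*, theta): a straight line.
    slope, offset = theta
    return slope * x + offset


def predictive_mean_std(x_star, theta_samples, model=toy_network):
    """Average the model output over parameter samples theta^(k)."""
    outputs = [model(x_star, theta) for theta in theta_samples]
    mean = statistics.fmean(outputs)   # Monte Carlo estimate of the prediction
    std = statistics.stdev(outputs)    # sample standard deviation S
    return mean, std


rng = random.Random(0)
# Stand-in "posterior" samples: slope ~ N(2, 0.1), offset ~ N(1, 0.1).
samples = [(rng.gauss(2.0, 0.1), rng.gauss(1.0, 0.1)) for _ in range(100)]
mean, std = predictive_mean_std(3.0, samples)
```

Increasing the number of samples reduces the statistical error of `mean`, while `std` converges to the spread implied by the sampled parameters.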

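The analytic computation of the posterior distribution P(θ|D) is intractable owing to the high dimensionality of the parameters. In the BNN approach, variational inference is applied to obtain an approximation of P(θ|D). Variational inference seeks the parameters κ of a trial distribution q(θ|κ) such that q(θ|κ) is as close as possible to P(θ|D), as measured by the Kullback-Leibler (KL) divergence:

$$\mathrm{KL}\big[q(\theta|\kappa)\,\|\,P(\theta|D)\big] = \int q(\theta|\kappa)\, \ln\frac{q(\theta|\kappa)}{P(\theta|D)}\, d\theta .$$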
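For Gaussian distributions the KL divergence has a closed form, which makes the idea of choosing κ easy to illustrate. The sketch below is a toy under strong assumptions: the target is a known one-dimensional Gaussian (a real posterior is not available in closed form), q is also Gaussian with κ = (mean, standard deviation), and the search is a brute-force grid rather than the optimizer used in practice.

```python
import math


def kl_gauss(m_q, s_q, m_p, s_p):
    """KL[N(m_q, s_q^2) || N(m_p, s_p^2)] for one-dimensional Gaussians."""
    return (math.log(s_p / s_q)
            + (s_q ** 2 + (m_q - m_p) ** 2) / (2.0 * s_p ** 2)
            - 0.5)


# Hypothetical target distribution standing in for P(theta | D).
target = (1.0, 0.5)

# Candidate kappa = (mean, std) values on a coarse grid.
candidates = [(m / 10.0, s / 10.0) for m in range(0, 21) for s in range(1, 11)]

# Pick the kappa whose q(theta | kappa) has the smallest KL divergence.
best = min(candidates, key=lambda k: kl_gauss(k[0], k[1], *target))
```

In this toy setting the minimum is reached when q coincides with the target, where the divergence vanishes.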

The traditional method of solving the regression problem in physics is based on an empirical formula with parameters, where an appropriate formula can avoid both misconvergence and overfitting. Similarly, when applying a BNN in physics, it is important to select the appropriate input nodes for a specific output. In this study, the output is the photoneutron yield cross section σxn. More than 27,000 data points for σxn have been collected in the EXFOR database [34] for nuclei from ⁶Li to ²³⁹Pu at incident energies above the neutron separation energy. The minimum input nodes of the BNN to study σxn are the charge number Z of the target, mass number A of the target, and energy ε of the incident γ particle. The BNN in this case is abbreviated as BNN-ZAE and illustrated in Fig. 1a.

The BNN is a numerical algorithm. In physics, the BNN is often used to learn the residuals of a physics-motivated model and then fine-tune the model. Thus, the main physics information is included in the initial prediction by the physics-motivated model. In our previous work [27], we illustrated a new method to provide physical guidance in the BNN from the input layer without an initial prediction by a physics-motivated model. This method is applied in this work to select the optimal ground-state properties as neurons of the input layer in the BNN to predict the photoneutron yield cross sections σxn. Details of the method are provided in Ref. [27]. In brief, various combinations of ground-state properties are considered as the input nodes in the BNN, and the optimal combination is selected according to the smallest deviation between the data and the prediction. The optimal input nodes are as follows:

$$X = \{S_n,\; Q_\beta,\; B,\; A,\; \beta_2,\; \varepsilon\},$$

Fig. 1 (Color online) a Bayesian neural network BNN-ZAE with input nodes of charge number Z, mass number A, and incident energy ε. b Bayesian neural network BNN-OPT with optimal input nodes including mass number A, incident energy ε, and other ground-state properties. c Lorentzian function-based Bayesian neural network LBNN where the Lorentzian shape of the photoneutron cross sections and the empirical formula of the Lorentzian parameters are considered

where Sn is the neutron separation energy, Qβ is the β-decay energy, B is the binding energy per nucleon, A is the mass number, β2 is the quadrupole deformation parameter, and ε is the incident photon energy. In this case, the BNN is abbreviated as BNN-OPT and illustrated in Fig. 1b. It is worth noting that the charge number Z is not an input in the BNN-OPT model. Experimental data of the photoneutron yield cross section are available only for stable nuclei. For stable nuclei, the correlation between the charge number Z and the mass number A is very strong (see Fig. 1 in Ref. [27]). Once the mass number A has been used, adding Z does not help reduce the deviation between the data and the prediction by the BNN-OPT model.

We further show that the BNN can be improved by considering existing knowledge of physics. The photoneutron yield cross sections σxn as a function of incident energy can be characterized by a Lorentzian shape with two components. The Lorentzian parameters are peak energy Ei, width Γi, and strength si.

The subscripts i = 1 and 2 denote the two components. σTRK expresses the cross section in terms of the Thomas-Reiche-Kuhn sum rule. The Lorentzian function is applied before mapping the hidden nodes to the output cross sections, as shown in Fig. 1c. In addition to the Lorentzian shape, the existing knowledge includes the empirical formula of the Lorentzian parameters:

where Hi (i = 1, ..., 6) are empirical parameters. These empirical formulas are also considered in the BNN. The algorithm then becomes a Lorentzian function-based BNN (LBNN). As shown in Fig. 1c, the (black) solid lines show that all input nodes are used to calculate the empirical parameters H1, H2, H3, H4, H5, and H6. Then, the empirical parameters, together with the input nodes A and β2, are used to calculate the Lorentzian parameters E1, E2, Γ1, Γ2, s1, and s2 according to Eq. (11). As can be observed, the Lorentzian parameters depend not only on A and β2 but also on other input nodes. However, the dependence on A and β2 is known and physical, as shown in Eq. (11), whereas the dependence on all the input nodes is unknown and numerical. The latter was fitted during the training process. We use two types of lines, solid and dashed, to distinguish between the physical and numerical dependences.

Energy is one of the input nodes, but it is not used to calculate the hidden layer. This is because the dependence of the output (cross section) on the energy is known to follow the Lorentzian shape. In the BNN-ZAE and BNN-OPT models, the dependence of the output (cross section) on energy is a black box. In the LBNN model, some dependences of the output on the inputs are known. In this sense, part of the black box has been opened.
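The two-component Lorentzian shape with peak energies Ei, widths Γi, and strengths si can be sketched as below. This is an assumption-laden illustration: it uses the standard Lorentzian form commonly adopted for giant dipole resonance fits, with strengths expressed in units of the Thomas-Reiche-Kuhn sum rule σTRK = 60 NZ/A mb MeV; the paper's exact expression may differ. The numerical parameters in the example are arbitrary illustrative values, not fitted or published results.

```python
import math


def sigma_trk(Z, A):
    """Thomas-Reiche-Kuhn sum rule, 60 * N * Z / A, in mb MeV."""
    N = A - Z
    return 60.0 * N * Z / A


def lorentzian_xs(eps, Z, A, components):
    """Assumed standard Lorentzian sum; components = [(E_i, Gamma_i, s_i), ...].

    Energies and widths are in MeV; the returned cross section is in mb.
    """
    total = 0.0
    for E, gamma, s in components:
        total += ((2.0 / math.pi) * s * eps ** 2 * gamma
                  / ((eps ** 2 - E ** 2) ** 2 + eps ** 2 * gamma ** 2))
    return sigma_trk(Z, A) * total


# Illustrative (not fitted) two-component parameters for a deformed nucleus.
components = [(12.0, 3.0, 0.4), (16.0, 5.0, 0.8)]
energies = [8.0 + 0.5 * k for k in range(33)]  # 8 to 24 MeV
curve = [lorentzian_xs(e, 67, 165, components) for e in energies]
```

Because the energy dependence is fixed by this function, a network built this way only has to supply the six Lorentzian parameters, which is why the energy node can bypass the hidden layer.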

3 Results and discussions

There are three types of BNN in this work: BNN-ZAE, BNN-OPT, and LBNN. In the latter two, physics guidance is provided by improving the input nodes and considering the Lorentzian shape. In contrast, BNN-ZAE is a numerical algorithm without any physical improvement. In the following, we evaluate these three models by comparing their predictions. The root-mean-square (RMS) deviations between the predictions and data were calculated on a logarithmic scale as

$$\mathrm{RMS} = \sqrt{\frac{1}{N_d} \sum_{n=1}^{N_d} \big(\log \sigma_p^{(n)} - \log \sigma_d^{(n)}\big)^2},$$

where σp is the predicted cross section and σd is the corresponding data. Log-scaling was used because the values of σd range from 10⁻³ to 10 b, spanning four orders of magnitude.
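A minimal sketch of an RMS deviation computed on a logarithmic scale is given below. It assumes base-10 logarithms, consistent with the later statement that a difference of 1 corresponds to a factor of 10; the function name `rms_log_deviation` is illustrative.

```python
import math


def rms_log_deviation(predicted, measured):
    """RMS of log10(sigma_p) - log10(sigma_d) over paired, positive values."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [
        (math.log10(p) - math.log10(d)) ** 2
        for p, d in zip(predicted, measured)
    ]
    return math.sqrt(sum(squared) / len(squared))


# One prediction off by a factor of 10, one exact.
example = rms_log_deviation([10.0, 1.0], [1.0, 1.0])
```

With this definition, a cross section of 0.001 b that is mispredicted by a factor of 2 counts as much as one of 1 b mispredicted by the same factor.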

The RMS deviations as a function of the iteration step were used to test the convergence of the predictions. The cases for the BNN-ZAE predictions with one, two, and three hidden layers and 10 nodes for each hidden layer are shown in Fig. 2a. From the figure, we can observe that all the RMS deviations converge quickly within 1000 iteration steps. The final RMS deviations were 0.226, 0.219, and 0.214 for the cases of one, two, and three hidden layers, respectively. This indicates that more hidden layers help reproduce the training data; however, the effect is weak. The effect of the number of hidden nodes was also tested by using one hidden layer with 10, 30, and 100 nodes. The RMS deviations are compared in Fig. 2b. It was shown that the rate of convergence was slower for a larger number of hidden nodes. The number of iteration steps required to decrease the RMS deviations to a value less than 0.25 is 1000 for the cases of 10 and 30 hidden nodes, but 2000 steps are needed for 100 hidden nodes. The RMS deviations at 4000 iteration steps are 0.226, 0.229, and 0.235 for 10, 30, and 100 nodes, respectively. The RMS deviations are similar for 10 and 30 hidden nodes, but they are larger for 100 hidden nodes. The role of the activation function was tested by using one hidden layer with 30 hidden nodes and the sigmoid, tanh, or softplus activation function, as shown in Eq. (2). The RMS deviations are compared in Fig. 2c. The sigmoid activation function was shown to be the best. In the following calculations, three hidden layers with 30 hidden nodes for each layer and a sigmoid activation function were applied.

Fig. 2 (Color online) Root-mean-square (RMS) deviations between data and the BNN-ZAE predictions as a function of the iteration step. The insets show the RMS deviations at the iteration step 4000. a Comparison of the RMS deviations for 1, 2, and 3 hidden layers. b Comparison of the RMS deviations for 1 hidden layer with 10, 30, and 100 hidden nodes. c Comparison of the RMS deviations for 1 hidden layer with 30 hidden nodes but different activation functions

Fig. 3 (Color online) Comparison of the prediction errors for three types of Bayesian neural networks (as shown in Fig. 1). a The RMS deviations as a function of the iteration step. The inset shows the RMS deviations at the iteration step 4000. b Distributions of the differences between predictions and training data log σp - log σd

The RMS deviations as a function of the iteration step for the three types of BNNs are shown in Fig. 3a. The rates of convergence are similar for all the cases, and 4000 iteration steps are sufficient for the calculations. The RMS deviations at 4000 iteration steps are 0.198 for the BNN-OPT model and 0.206 for the LBNN model, which are smaller than the value of 0.214 for BNN-ZAE. Figure 2a, b shows that a change in the numerical parameters (number of hidden layers and hidden nodes) does not appreciably improve the neural network. This indicates that the key to improving the neural network for physics is the consideration of existing knowledge of the observables. For instance, several effects of the photoneutron reaction, such as the isospin dependence and shape effect, have been found [3, 38, 39], which indicate that the cross sections of the photoneutron reaction depend on the ground-state properties of the nuclei. Another established property of photoneutron cross sections is their Lorentzian shape. The data of the ground-state properties are applied to the input layer of the BNN-OPT model, whereas the Lorentzian shape is considered in the LBNN model. These two aspects are responsible for the smaller RMS deviations compared to those provided by the BNN-ZAE algorithm.

To compare the errors of the predictions by the three neural networks, the distributions of the differences between the predictions and training data, log σp - log σd, are shown in Fig. 3b. A value of 1 for log σp - log σd means that the predicted cross section is 10 times larger than the experimental data, whereas a value of -1 indicates that it is 10 times smaller. These cases hardly ever occur. The number of samples with |log σp - log σd| > 1 is less than 0.1% of the total sample size. For most samples, the values of log σp - log σd lie between -0.1 and 0.1, which means that the prediction agrees with the data within a factor of 1.26. Specifically, these samples comprise 48.0%, 57.1%, and 61.6% of the BNN-ZAE, BNN-OPT, and LBNN samples, respectively. The percentage for the LBNN was the largest. Furthermore, the distribution of log σp - log σd provided by the LBNN was more symmetrical than that of the others. Considering these aspects, the LBNN model is superior to the BNN-OPT model.
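The factor of 1.26 quoted above is simply 10 raised to the power 0.1. The short sketch below checks this arithmetic and shows how the fraction of samples inside a given logarithmic window could be counted; the function name `fraction_within` and the toy numbers are illustrative, not data from the paper.

```python
import math


def fraction_within(predicted, measured, tol=0.1):
    """Fraction of samples with |log10(sigma_p) - log10(sigma_d)| <= tol."""
    diffs = [math.log10(p) - math.log10(d) for p, d in zip(predicted, measured)]
    return sum(abs(x) <= tol for x in diffs) / len(diffs)


# A window of +-0.1 in log10 corresponds to agreement within this factor.
factor = 10 ** 0.1

# Toy predictions against a reference value of 1 b.
toy_fraction = fraction_within([1.0, 1.2, 2.0, 0.5], [1.0, 1.0, 1.0, 1.0])
```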

We further evaluated the three types of BNNs by comparing their predictions of photoneutron yield cross sections for the spherical nuclei ⁹²Zr, ¹¹²Sn, and ²⁰⁶Pb. The cross sections as a function of the incident energy are shown in Fig. 4. In general, for spherical nuclei, only one main component of the Lorentzian shape is displayed in the excitation function of the photoneutron reaction. The experimental cross sections as a function of the incident energy ε for ⁹²Zr, ¹¹²Sn, and ²⁰⁶Pb display this situation. The position and value of the Lorentzian peak were reproduced well by the LBNN model. In contrast, the peak position for ⁹²Zr and the peak value for ¹¹²Sn were underestimated by both the BNN-ZAE and BNN-OPT models.

The data were abundant for these three nuclei. Some experimental errors, including statistical and systematic errors, are shown as error bars in the figure. The errors of the data were taken into account when training the BNN. More specifically, the data were resampled with a number inversely proportional to the experimental errors. This means that the data with small errors are resampled many times for training, while those with large errors are used only a few times. After 4000 iteration steps, 100 samples are used to calculate the standard deviations and uncertainties of the predictions, which are shown as shadows in the figure. In the energy region with data, the uncertainties of the predictions by the BNN-ZAE and BNN-OPT models are small, but there should be large uncertainties in the extrapolations. Because the Lorentzian function is applied in the neural network, the predictions and their uncertainties are both constrained by the Lorentzian shape. The uncertainties of the predictions by the LBNN model were the same for interpolations and extrapolations in logarithmic coordinates. Furthermore, the uncertainties are small because the data for these three nuclei are abundant. The uncertainties shown in Fig. 4 originate from the Monte Carlo technique. The Lorentzian function applied in the LBNN model is only an approximate expression for the photoneutron yield cross sections. A threshold exists for the photoneutron yield cross sections at ε = Sn, where Sn is the neutron separation energy. The Lorentzian shape is the most important established property of photoneutron yield cross sections, but it does not account for the threshold. The experimental data near the threshold deviate from this form. Thus, predictions by the LBNN model below the threshold are meaningless.
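The resampling described above, in which data with small errors enter the training set more often, can be sketched as follows. This is one possible reading, assuming weighted random draws with weights proportional to 1/error; the paper does not specify whether the resampling was random or a fixed replication count, and the function name `resample_by_error` is hypothetical.

```python
import random


def resample_by_error(data, errors, n_draws, seed=0):
    """Draw n_draws points with probability proportional to 1 / error."""
    if len(data) != len(errors):
        raise ValueError("data and errors must have the same length")
    if any(e <= 0 for e in errors):
        raise ValueError("errors must be positive")
    weights = [1.0 / e for e in errors]
    rng = random.Random(seed)
    return rng.choices(data, weights=weights, k=n_draws)


# Two toy data points: the first has a tenfold smaller error than the second.
points = ["precise", "noisy"]
training = resample_by_error(points, errors=[0.1, 1.0], n_draws=10000)
share_precise = training.count("precise") / len(training)
```

In this toy case the precise point is expected to make up about 10/11 of the resampled set, so it dominates the fit without the noisy point being discarded entirely.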

Fig. 4 (Color online) Photoneutron yield cross sections for the spherical nuclei ⁹²Zr, ¹¹²Sn, and ²⁰⁶Pb as a function of the incident energy. The circles with error bars show the experimental data taken from the EXFOR database. The curves with shadows show the predictions of the three types of Bayesian neural networks (as shown in Fig. 1)

For axially deformed nuclei, the photoneutron yield cross sections as a function of incident energy display two main Lorentzian shapes. The difference between the two Lorentzian peaks has been found to be positively correlated with the deformation parameter β2 using the time-dependent Hartree-Fock model [40]. The data for the deformed nuclei ³¹P, ⁷⁵As, and ¹⁶⁵Ho are shown in Fig. 5. The quadrupole deformation parameters β2 are -0.22, -0.24, and 0.28, respectively. The data clearly show two peaks for ¹⁶⁵Ho, but only faintly for ⁷⁵As. For ³¹P, the data are abundant but have large errors, which makes it difficult to distinguish the two peaks. However, the wide peak does not contradict the two main Lorentzian shapes because they may be too close to be distinguished.

The curves and shadows show the predictions with confidence intervals using neural networks. The BNN-ZAE model reproduces the overall trend of the data for ³¹P and ⁷⁵As and slightly overestimates the cross sections of ³¹P over most of the energy range. However, it misses the two obvious peaks for ¹⁶⁵Ho and predicts cross sections with only one peak. At energy ε = 14 MeV, where the data are approximately 0.25 b, the BNN-ZAE model grossly overestimates the cross section. Deviations of this kind are reflected in Fig. 3. The BNN-OPT model provides a smaller RMS deviation than the BNN-ZAE model. This point is confirmed by comparing the predictions of the BNN-OPT and BNN-ZAE models. However, the confidence intervals provided by the BNN-OPT model were wider than those provided by the BNN-ZAE model. When extrapolating the cross section to the low-energy region, ε < 10 MeV for ³¹P and ⁷⁵As, the BNN-OPT model shows overfitting. The confidence intervals show an increasing cross section with decreasing energy, which was not observed in the experiment.

Fig. 5 (Color online) Same as Fig. 4 but for the deformed nuclei ³¹P, ⁷⁵As, and ¹⁶⁵Ho. The quadrupole deformation parameters β2 of those nuclei are shown

The LBNN model reproduced the data better than the other two models did. Two Lorentzian shapes related to the quadrupole deformation parameter β2 were considered in the LBNN model, as shown in Fig. 1c; hence, the predictions show two peaks for deformed nuclei. Admittedly, the LBNN model underestimates the cross section of ³¹P at ε = 21.5 MeV, where the calculations show a valley between two Lorentzian peaks, but the data display a weak peak. The weak peak may reveal a substructure beyond the main Lorentzian shapes, which can also be observed in the data of ²⁰⁶Pb near ε = 25 MeV. However, its origin has not been explained by a physics-motivated model, and hence, it has not been considered in the neural network. The improvement from BNN-OPT to LBNN supports the idea that the substructure beyond the main Lorentzian shapes can be considered in the neural network after its properties have been revealed.

4 Conclusion

In conclusion, the photoneutron yield cross sections as a function of the charge number Z, mass number A, and incident energy ε were studied using the BNN, and the model is abbreviated as BNN-ZAE. The numerical parameters of the neural network were varied to test the model. The influence of the activation function on the prediction was determined. The sigmoid activation function was the best for realizing nonlinearity between the input and output, and hence, it provided the smallest deviations between predictions and data. However, the predictions of the BNN-ZAE model could not be improved by increasing the number of hidden layers from 1 to 3 and the number of hidden nodes from 10 to 100.

In the method proposed in Ref. [27], physics guidance is provided in BNNs from the input layer. Several effects of the photoneutron reaction, such as the isospin dependence and shape effect, have been observed [3, 38, 39], which indicate that the cross sections of the photoneutron reaction depend on the ground-state properties of the nuclei. Based on this knowledge, the optimal ground-state properties were selected as input neurons, resulting in the BNN-OPT model. It was shown that the deviations between the predictions of the BNN-OPT model and the data were smaller than those of the BNN-ZAE model.

The BNN was further improved by considering the Lorentzian shape of the photoneutron yield cross sections. The Lorentzian function was applied to map the hidden nodes to the output cross sections, and the empirical formula of the Lorentzian parameters was applied to link the input nodes to the output cross sections. This new algorithm is called the Lorentzian function-based BNN (LBNN). We evaluated the BNN-ZAE, BNN-OPT, and LBNN models by comparing their predictions of the photoneutron yield cross sections for the spherical nuclei ⁹²Zr, ¹¹²Sn, and ²⁰⁶Pb, as well as the deformed nuclei ³¹P, ⁷⁵As, and ¹⁶⁵Ho. Generally, for spherical nuclei, only one main component of the Lorentzian shape exists. All three models reproduced the main trend of the data, but the predictions of the LBNN model were the best. For axially deformed nuclei, the photoneutron yield cross sections displayed two main Lorentzian shapes. Only the LBNN model reproduced two peaks of the cross sections in the deformed nuclei ³¹P, ⁷⁵As, and ¹⁶⁵Ho. This is because two Lorentzian shapes related to the quadrupole deformation parameter β2 are considered in the LBNN model.

Author contributions All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Yong-Yi Li, Fan Zhang, and Jun Su. The first draft of the manuscript was written by Yong-Yi Li, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
