
Novel algorithm for detection and identification of radioactive materials in an urban environment

2023-12-05

Nuclear Science and Techniques, 2023, Issue 10

Hao‑Lin Liu · Hai‑Bo Ji · Jiang‑Mei Zhang · Jing Lu · Cao‑Lin Zhang · Xing‑Hua Feng

Abstract This study introduces a novel algorithm to detect and identify radioactive materials in urban settings using time-series detector response data. To address the challenges posed by varying backgrounds and to enhance the quality and reliability of the energy spectrum data, we devised a temporal energy window. This window partitions the time-series detector response data, resulting in energy spectra that emphasize the vital information pertaining to radioactive materials. We then extracted characteristic features of these energy spectra, relying on the formation mechanism and measurement principles of the gamma-ray instrument spectrum. These features encompassed aggregated counts, peak-to-flat ratios, and peak-to-peak ratios. This methodology not only simplified the interpretation of the energy spectra’s physical significance but also eliminated the necessity for peak searching and individual peak analyses. Given the requirements of imbalanced multi-classification, we created a detection and identification model using a weighted k-nearest neighbors (KNN) framework. This model exploits the fact that energy spectra of identical radioactive materials exhibit minimal intra-class variability. Consequently, it considerably boosted the classification accuracy of minority classes, enhancing the classifier’s overall efficacy. We also executed a series of comparative experiments. Established methods for radionuclide identification, such as standard KNN, support vector machine, Bayesian network, and random tree, were used for comparison purposes. Our proposed algorithm achieved an F1 measure of 0.9868 on the time-series detector response data, reflecting a minimum enhancement of 0.3% in comparison with other techniques. The results show that our algorithm outperforms the others when applied to time-series detector response data in urban contexts.

Keywords Gamma-ray spectral analysis · Nuclide identification · Urban environment · Temporal energy window · Peak-ratio spectrum analysis · Weighted KNN

1 Introduction

Nuclear technology and science have enriched the lives of millions globally, with advancements in areas such as clean energy, cancer treatment, food security, and pest control. However, it is imperative that nuclear and radioactive materials employed in these beneficial applications remain secure to prevent potential misuse [1]. Data from the Incident and Trafficking Database (ITDB) of the International Atomic Energy Agency (IAEA) reveal that between 1993 and 2020, there were 3686 reported incidents worldwide. Of these, 290 were confirmed or suspected cases of trafficking or malicious use. Notably, 12 incidents involved highly enriched uranium (HEU), and 2 featured plutonium [2]. The detection and identification of illegal radioactive materials in an urban environment is crucial to ensure the safe and legal use of radioactive materials, prevent their illegal transfer, and protect the safety of the state and its citizens [3, 4].

Numerous researchers have investigated the detection and identification of radioactive materials. Most studies focus on conditions where the detector and nuclear material maintain a static position relative to each other. In these cases, the radioactive source is often scaled proportionally and linearly superimposed onto a measured background. However, real measurement environments rarely exhibit a consistent background. Thus, simulations using a constant background intensity do not adequately represent the complexities encountered in actual measurement contexts [5–7].

Consider a detection scenario within an urban block, arising either during routine monitoring of radioactive events or when responding to a specific incident involving uncontrolled radioactive material. Experimenters traverse this block, seeking subtle indications of radioactive materials to ascertain their presence. In the background of this urban environment, the dominant contribution comes from the naturally occurring radioactive materials (NORM) found in various construction materials such as brick, granite, and concrete [8, 9]. The concentration of NORM varies near different buildings due to the unique composition of each structure and the environmental conditions surrounding it. The background radiation within an urban environment therefore fluctuates with the neighboring structures and prevailing environmental factors [10]. Additionally, radioactive materials can sometimes exhibit low intensity, with their gamma rays being attenuated by any shielding or dense materials surrounding the source. The energy spectra derived from these scenarios may be further complicated by cumulative and peak effects [11]. Given these complexities, traditional methods often struggle to effectively detect illicit radioactive materials concealed within buildings or accurately determine their types. Regrettably, false positives in radioactive material detection can lead to grave repercussions, wasting valuable time and posing potential health risks to researchers and the local populace. Consequently, algorithms designed for detecting and identifying radioactive materials should be resilient against diverse background conditions and shielding setups [12].

The task of detecting and identifying illicit radioactive materials presents significant challenges, and various studies have pursued techniques to address them. From a hardware equipment standpoint, Flanagan et al. [13] recommended the use of mobile, distributed sensors to detect nuclear materials in transit. Their research evaluated the efficacy of a mobile sensor network in detecting radioactive materials by melding radiation transport with geographic information systems.

Tran-Quang et al. [14] introduced an internet of radiation sensor system (IoRSS) designed for the detection of unregulated radioactive materials in scrap metal recycling and production facilities. This system enhances the detection, localization, and identification of radioactive materials by assimilating data from an array of portable radiation detectors. Meanwhile, Li et al. [15] pioneered the nuclide identification and quantitative analysis system (NIQAS) aimed at identifying hazardous substances via MCNP simulations. Central to this system are a D-T neutron generator and an HPGe detector. Various modules within the system were fine-tuned utilizing a signal-to-noise ratio (SNR) assessment method.

Conversely, when faced with hardware constraints, the onus shifts to the development of effective algorithms for energy spectrum analysis. A myriad of machine learning techniques, designed to emulate human cognition, have made significant strides in various domains. These include medical diagnosis [16], signal processing [17–19], and text classification [20–23]. Within the realm of radioisotope identification and radiation detectors, Pfund et al. [7] investigated how to define energy region boundaries and decision metrics for gamma-ray spectra. Their research showed that selecting specific energy regions can increase the probability of detection in scenarios with low-count or obscured sources. Concurrently, Li et al. [24] proposed an approach for radionuclide identification in urban settings that couples a feature enhancer with a one-dimensional neural network. Their methodology preprocesses the input energy spectrum data via the feature enhancer and captures nonlinear information using the neural network.

Wu et al. [25] devised a peak searching technique using a generative adversarial network (GAN) tailored for urban environments characterized by low-count rates and brief measurements of single nuclide spectra. This GAN-based approach outperforms the symmetric zero-area (SZA) method in accurately locating characteristic peaks. By significantly reducing both the likelihood and number of false peaks, it improves the overall efficacy of peak recognition. Nonetheless, the detection and identification of illicit radioactive materials still faces enduring challenges, including diminished detection sensitivity and the influence of environmental factors. As such, ongoing research is needed to refine the precision and dependability of these techniques.

This study introduces a novel algorithm for the detection and identification of radioactive materials within urban environments. Our approach aims to offer a fresh solution to detect and identify radioactivity against the backdrop of complex urban settings, both during routine monitoring and in scenarios involving the uncontrolled dispersal of radioactive substances. Initially, the time-series detector response data, collected from an urban setting, were segmented using a temporal energy window. We then extracted distinct features from the energy spectra, drawing on the formation mechanism and measurement principle inherent to gamma-ray instrument spectra. These key features encompass aggregated counts, peak-to-flat ratios, and peak-to-peak ratios. Given the need for imbalanced multi-classification, we crafted a detection and identification model grounded in the weighted KNN architecture.

2 Method

The proposed method unfolds in three pivotal steps: (a) To contend with the variability of backgrounds and accentuate the primary information from the radiation source, the time-series detector response data were segmented using a temporal energy window. (b) For a comprehensive analysis and to elucidate the physical implications of an energy spectrum, distinct features were drawn from the energy spectra. This extraction leaned on the formation mechanism and measurement principle of the gamma-ray instrument spectrum, incorporating features such as aggregated counts, peak-to-flat ratios, and peak-to-peak ratios. (c) With the aim of enhancing the resilience and precision of the model for detection and identification tasks within urban settings, we fashioned a model rooted in the weighted KNN architecture. The sequence of our proposed algorithm is illustrated in Fig. 1.

2.1 Temporal energy window

In this subsection, the temporal energy window is proposed for sample processing of time-series detector response data. Samples were partitioned into multiple segments with consideration of the sample type.

Urban landscapes teem with roads and structures composed of natural and man-made substances. Naturally occurring radioactive materials (NORMs) are inherent in these substances, with concentrations differing across materials. Predominantly, NORM comprises isotopes such as $^{40}$K, $^{238}$U, and $^{232}$Th, along with the radioactive daughter products of the latter two, commonly denoted as KUT [26]. As detectors move through the search zone, particularly when radioactive substances are unmonitored, the makeup of the neighboring structures and their resulting radioactive signatures shift with each location [27]. Consequently, the cumulative gamma photon count rate and spectra recorded by detectors might demonstrate notable fluctuations [28]. Adding to the complexity, illicit radioactive substances may be concealed, leading to attenuated detection signals that are challenging to identify. The interaction between gamma photons and diverse substances, mediated by various physical processes, further increases the variability of the observed radiation background signal. Hence, the time-series detector response data acquired in urban settings are strongly shaped by ambient conditions, which often overshadow the distinctive peaks that mark the presence of radioactive materials in the energy spectra.

To counteract the effects of variable backgrounds, enhance the integrity and dependability of the energy spectrum data, and streamline subsequent data processing, we segmented the time-series detector response data using a temporal energy window. This strategy primarily emphasizes the features of faint radioactive materials.

In an urban setting, the detection system operates under two potential conditions: with or without the presence of an auxiliary radiation source, which is contextualized against the background radiation. Consequently, the time-series detector response dataset encompasses active and passive samples. An active sample pertains to the detector response data captured in the presence of a radioactive source, while a passive sample relates to data collected in an environment devoid of any radioactive source.

Fig. 1 (Color online) Block diagram of the proposed method. Partitioning the time-series detector response data using temporal energy windows, and converting the resulting corresponding segments into spectral form. Extracting features, such as aggregated counts, peak-to-flat ratios, and peak-to-peak ratios, based on the formation mechanism and measurement principle of the gamma-ray instrument spectrum. Constructing a weighted KNN-based detection and identification model for the imbalanced multi-classification problem in urban environment radiation detection

The time-series detector response dataset is defined as $T$.

The temporal energy window is proposed for sample processing of time-series detector response data. The quantity and length of a temporal energy window were defined as $w_q \in \mathbb{Z}^+$ and $w_l \in \mathbb{Z}^+$. The key to utilizing a temporal energy window for processing time-series detector response data lies in determining the temporal origin of the window, which refers to the initial point from which the temporal energy window conducts the partition task, thereby determining the position of the window within the time series. The temporal origin of an energy window is defined as $t_{j'}$, and $j'$ is calculated by Eq. 2.

In Eq. 2, $b$ denotes a Boolean variable. If the present sample is active, $b$ is true, whereas if the present sample is passive, $b$ is false. Thus, $t_{j'}$ of a passive sample is distributed evenly at several different locations along the time axis, while $t_{j'}$ of an active sample is fixed because the energy fragments must be obtained as close to the source as possible.

The segmented time-series detector response dataset processed by a temporal energy window is denoted as $T_{\mathrm{seg}}$ and represented by Eq. 3. Depending on $w_q$, $T_{\mathrm{seg}}$ may contain more samples than $T$.
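The segmentation described above can be sketched in a few lines. This is only an illustration under stated assumptions, since Eqs. 2 and 3 are not reproduced here: the rule for placing window origins (one window centred on a hypothetical `source_time` for active samples, `w_q` evenly spaced origins for passive samples) and the function names `window_origins` and `segment` are assumptions, not the authors' definitions.

```python
import numpy as np

def window_origins(times, w_l, w_q, active, source_time=None):
    """Return window start times (hypothetical placement rule)."""
    t_min, t_max = float(times.min()), float(times.max())
    last_start = max(t_min, t_max - w_l)  # latest start that still fits
    if active:
        # Assumption: one window centred on the time of closest approach.
        start = source_time - w_l / 2.0
        return [float(np.clip(start, t_min, last_start))]
    # Assumption: w_q origins spread evenly along the time axis.
    return [float(s) for s in np.linspace(t_min, last_start, w_q)]

def segment(times, energies, origins, w_l):
    """Cut an event list into one array of energies per window."""
    return [energies[(times >= s) & (times < s + w_l)] for s in origins]

# Toy event list: one event per second, energy in keV.
times = np.arange(0.0, 100.0)
energies = times * 10.0
passive_segments = segment(times, energies,
                           window_origins(times, 10, 3, active=False), 10)
```

With this rule a passive sample yields `w_q` segments while an active sample yields one, which is one way the segmented dataset could end up larger than the original.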

2.2 Peak-ratio spectrum analysis

In this subsection, we examine the formation mechanism and measurement principle of the gamma-ray instrument spectrum. These are used as the foundation for extracting spectral features. Key features include aggregated counts, peak-to-flat ratios, and peak-to-peak ratios. This approach aids in the analysis and interpretation of the intrinsic significance of energy spectra.

After processing through the temporal energy window, the segmented time-series detector response data are converted into an energy spectrum format, easing the subsequent feature extraction. The energy spectrum provides a distribution curve mapping the count rate against particle energy, a pivotal tool in detecting and identifying radioactive nuclear materials.

In this study, the relative distance between the detector and radiation source changes continuously because the detector is moving. This study focuses on scenarios with static radiation sources. Dynamics such as the continuous movement of the source or its dissolution in water have not been considered. Owing to the finite number of photon counts within the full-energy peak, statistical fluctuations become pronounced. Consequently, the channel with the peak counts might not align with the expected value of a Gaussian distribution [29, 30]. To mitigate the effects of these statistical fluctuations, spectral data are reorganized into multiple bins along the energy axis. Each bin encompasses an energy range, and counts within this range are consolidated to create a new feature vector.
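The rebinning step can be illustrated with a short sketch. The equal-width bin layout and the function name `rebin_counts` are assumptions made for illustration; the number of bins is passed in as a plain parameter.

```python
import numpy as np

def rebin_counts(energies_kev, e_min, e_max, n_bins):
    """Aggregate event energies into n_bins equal-width energy bins."""
    edges = np.linspace(e_min, e_max, n_bins + 1)
    counts, _ = np.histogram(energies_kev, bins=edges)
    return counts, edges

# Toy segment: a handful of deposited energies in keV.
events = np.array([10.0, 20.0, 140.0, 141.0, 1173.0, 1332.0, 3000.0])
counts, edges = rebin_counts(events, 0.0, 3000.0, 3)
```

Summing counts over wider bins trades energy resolution for lower relative statistical fluctuation per feature, which is the motivation given in the paragraph above.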

The transformed spectrum dataset is denoted as $T_{\mathrm{spe}}$.

The value of $n$ is determined based on the expected value of the maximum and minimum energy values across all samples in $T$. The expected values of the maximum and minimum energy values are denoted as $E[\max(e_i)]$ and $E[\min(e_i)]$, $i = 1, 2, \ldots, M$. Here $n \in \mathbb{Z}^+$, and the value of $n$ is calculated by Eq. 8.

Detection and identification of radioactive materials primarily hinge on nuclear radiation detectors, which capture gamma rays emitted during the decay process. The gamma-ray energy is measured by registering the energy deposited within the detector. The main mechanisms driving gamma energy spectrum measurements encompass three interactions between gamma rays and the detector medium: the photoelectric effect, the Compton effect, and pair production.

Low-energy gamma rays (0 – hundreds of keV) predominantly undergo the photoelectric effect, resulting in at least one distinct photoelectric peak. Medium-energy gamma rays (hundreds of keV – 3 MeV) primarily interact through the Compton effect. Conversely, high-energy gamma rays (5–10 MeV and beyond) are primarily subject to pair production. The photoelectric peak, when the energy of the incident gamma radiation is below 1.02 MeV, is often termed the full-energy peak. This peak is traditionally considered the primary hallmark for identifying specific radioactive nuclides. The full-energy peak arises from the sum of the photoelectric peak’s energy combined with energy from Compton electrons and photoelectrons stemming from Compton scattering interactions. In the spectrum of low-to-medium energy gamma rays, pair production is negligible. Instead, the energy spectrum is characterized by a Compton continuum and photoelectric peaks. When gamma rays possess intermediate energy, incident gamma photons experience multiple successive Compton scatterings. The energy from the recoil electrons, produced from these scatterings, is deposited in the detector. Notably, the cumulative energy of these recoil electrons can surpass the energy transfer’s upper limit in a single scattering event, filling regions between the Compton edge and photoelectric peaks [9].
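As a worked illustration of the single-scatter upper limit mentioned above, the Compton edge follows from standard Compton kinematics: for a photon of energy $E_\gamma$, the maximum recoil-electron energy is $2E_\gamma^2 / (m_e c^2 + 2E_\gamma)$. The sketch below evaluates it for the gamma lines quoted in this paper; the function name is chosen here for illustration.

```python
ELECTRON_REST_ENERGY_KEV = 510.999  # m_e c^2 in keV

def compton_edge_kev(e_gamma_kev):
    """Maximum energy transferred to the electron in one Compton scatter
    (photon backscattered through 180 degrees)."""
    return 2.0 * e_gamma_kev ** 2 / (ELECTRON_REST_ENERGY_KEV + 2.0 * e_gamma_kev)

# Gamma lines quoted in the text for 99mTc, 131I and 60Co.
for name, line in [("99mTc", 140.0), ("131I", 364.0),
                   ("60Co", 1173.0), ("60Co", 1332.0)]:
    edge = compton_edge_kev(line)
    print(f"{name} {line:.0f} keV line: Compton edge near {edge:.0f} keV")
```

The gap between each edge and its full-energy peak is the region that multiple successive scatterings can fill.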

From the prior discussion on the formation mechanism and measurement principle of the gamma-ray instrument spectrum, it is clear that gamma energy spectra contain both photoelectric peaks and the Compton continuum. Conventionally, the photoelectric peaks serve as the primary identifiers for radionuclides. Conversely, the Compton continuum, which often exhibits similar shapes across different contexts, is usually overlooked. However, relying solely on characteristic peaks for radioactive nuclide identification may fall short in complex background situations [31–33]. Drawing inspiration from Ref. [7], this subsection introduces peak-to-flat ratios and peak-to-peak ratios as descriptors for the spectral features.

The boundaries of the Compton continuum are denoted by $c_l$ and $c_r$, while the characteristic peaks are bounded by $f_l$ and $f_r$, and the auxiliary peaks are bounded by $a_l$ and $a_r$. These boundaries are selected based on the decay properties of the radionuclide material. The peak-to-flat ratio $r_1$ and peak-to-peak ratio $r_2$ are defined as Eq. 13.

Based on a macroscopic perspective, $r_1$ characterizes the capability to discern low-energy weak peaks amidst a complex background, while $r_2$ measures the likelihood of gamma rays experiencing multiple interactions within the detector, culminating in their contribution to the full-energy peaks.
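A minimal sketch of how such ratios could be computed from a binned spectrum is given below. Eq. 13 is not reproduced in this text, so the exact form is an assumption: here the peak-to-flat ratio is taken as summed counts in the characteristic-peak region divided by summed counts in the Compton-continuum region, and the peak-to-peak ratio as characteristic-peak counts divided by auxiliary-peak counts. The names `region_sum` and `peak_ratios` and the `eps` guard are hypothetical.

```python
import numpy as np

def region_sum(counts, energies, lo, hi):
    """Sum counts whose bin energy lies in [lo, hi]."""
    mask = (energies >= lo) & (energies <= hi)
    return float(counts[mask].sum())

def peak_ratios(counts, energies, c, f, a, eps=1e-12):
    """Assumed forms of r1 (peak-to-flat) and r2 (peak-to-peak).

    c, f, a are (left, right) bounds of the Compton continuum, the
    characteristic peak and the auxiliary peak; eps avoids division by zero.
    """
    peak = region_sum(counts, energies, *f)
    flat = region_sum(counts, energies, *c)
    aux = region_sum(counts, energies, *a)
    return peak / (flat + eps), peak / (aux + eps)

# Toy spectrum with a strong peak at 40 keV and a weaker one at 70 keV.
energies = np.arange(0.0, 100.0, 10.0)
counts = np.array([1, 1, 1, 1, 8, 1, 1, 4, 1, 1])
r1, r2 = peak_ratios(counts, energies, c=(0, 30), f=(40, 40), a=(70, 70))
```

Because both features are ratios of region sums, they need no peak search or peak fitting, which matches the motivation stated for this subsection.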

2.3 Classification

In this subsection, considering the requirements for imbalanced multi-classification, we developed a detection and identification model using the weighted KNN architecture. By capitalizing on the inherent trait that energy spectra from identical radioactive materials exhibit minimal intra-class variability, the model significantly boosts the classification accuracy for underrepresented classes and improves the overall efficacy of the classifier.

Following the peak-ratio spectrum analysis, we derive a set of feature vectors composed of aggregated counts, peak-to-flat ratios, and peak-to-peak ratios, denoted as $T_{\mathrm{app}}$. In the $L_p$-norm used to compare feature vectors, $p = 2$ gives the Euclidean norm. Furthermore, the $L_\infty$-norm (where $p \to \infty$) represents the maximum norm.

To limit the effect of farther sample points and also avoid division by zero, the distance function is implemented as follows. If $\|c_0 - c(i,k)\|_p = 0$, then the distance $d(c_0, c(i,k))$ is set to $\varepsilon$. If $0 < \|c_0 - c(i,k)\|_p \le R$, then the distance $d(c_0, c(i,k))$ is calculated by Eq. 19. If $\|c_0 - c(i,k)\|_p > R$, then $d(c_0, c(i,k)) = 0$, where $\varepsilon$ is a small number and $R$ is the radius of the distance function $d(\cdot)$. Table 1 summarizes the overall flow of the algorithm.
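The three-case rule above can be sketched as a distance-weighted vote. Eq. 19 is not reproduced in this text, so the weighting used here is an assumption: each of the K nearest neighbours within radius R votes with weight 1 / distance^q, a zero distance is replaced by a small epsilon, and neighbours beyond R contribute nothing. The function name and defaults are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def weighted_knn_predict(x0, X, y, k=5, p=2, q=2, radius=np.inf, eps=1e-9):
    """Distance-weighted KNN vote with a radius cutoff (assumed weighting)."""
    X = np.asarray(X, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    dist = np.linalg.norm(X - x0, ord=p, axis=1)  # L_p distance to each sample
    votes = {}
    for i in np.argsort(dist)[:k]:
        if dist[i] > radius:
            continue                           # beyond R: no contribution
        w = 1.0 / max(dist[i], eps) ** q       # eps guards zero distance
        votes[y[i]] = votes.get(y[i], 0.0) + w
    return max(votes, key=votes.get) if votes else None

# One close minority-class neighbour outweighs two distant majority ones.
X = np.array([[0.0, 0.0], [3.0, 0.0], [3.1, 0.0]])
y = np.array([0, 1, 1])
label = weighted_knn_predict([0.2, 0.0], X, y, k=3)
```

In the toy example an unweighted majority vote would pick class 1, whereas the weighted vote favours the nearby class 0 sample, which illustrates how distance weighting can help underrepresented classes.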

3 Experiments and analysis

This section details the processes of data acquisition and preprocessing and presents a series of comparative experiments to validate the efficacy of the proposed algorithm. All experiments conducted in this section used tenfold cross-validation to improve the reliability of the results.

3.1 Introduction of data source

The experimental data utilized in this study originated from a time-series detector response dataset, representing a NaI(Tl) detector’s movement within a simulated city block using the Monte Carlo method. This dataset was curated by J. M. Ghawaly Jr. and his team at Oak Ridge National Laboratory (ORNL) [34]. Figure 2 offers a visual representation capturing the core features of the dataset. The model simulated seven interconnected city blocks in three dimensions, encompassing various buildings, sidewalks, roads, parking areas, and other urban elements. The naturally occurring radioactive materials (NORMs) incorporated included $^{40}$K, $^{232}$Th and its progeny, as well as $^{238}$U/$^{235}$U and their respective progeny. The concentration of each component within the KUT (potassium, uranium, and thorium) might vary depending on the specific material. Each block’s background radiation was individually computed. The radioactive materials were potentially concealed in 15 distinct spots. Each radioactive source could exist in one of two states: either unshielded or shielded by 1 cm of lead. A NaI(Tl) detector navigated through these city blocks without the interference of cars or other forms of clutter.

The dataset comprises radioactive materials from two categories: special nuclear materials (SNMs) and common sources. The SNMs are represented by highly enriched uranium (HEU) and weapons-grade plutonium (WGPu), while the common sources are technetium-99m ($^{99m}$Tc), iodine-131 ($^{131}$I), and cobalt-60 ($^{60}$Co). Both HEU and WGPu are characterized by energy spectra dominated by prompt fission neutrons and prompt gamma rays, which are emitted during fission. These gamma rays possess a broad energy range, spanning from several hundred keV up to multiple MeV. Conversely, $^{99m}$Tc releases gamma rays predominantly around the 140-keV energy mark. $^{131}$I emits mainly beta particles accompanied by gamma rays; the dominant beta branch has a maximum energy near 606 keV. The emitted gamma rays have varying energy levels, with the most notable peaks observed at 364 keV and 637 keV. Finally, $^{60}$Co radiates gamma rays that prominently feature two energy peaks, one at 1.17 MeV and the other at 1.33 MeV [9].

Table 1 Overall flow of the proposed method


Fig. 2 (Color online) Schematic diagram of the fundamental characteristics of the dataset. This model consisted of seven modular city blocks, and the order of the blocks can be adjusted. The size of the model was 989–1047 m × 201 m × 158 m. For each component of the blocks, every NORM isotope in each material (asphalt, brick, granite, concrete, and soil) in its composition was modeled. These data form the background of the urban environment. A 2″ × 4″ × 16″ NaI(Tl) detector traversed the city block in the absence of cars or other forms of clutter. The velocity of the detector was a value in the range of 1–13.4 m/s and remained constant. The walls of the buildings in the model were 6 in (15.24 cm) thick [34]

Table 2 Radionuclide library

Specifically, 9700 samples were labeled and are listed in Table 2, of which 4900 were background samples without any radioactive materials, while the remaining 4800 samples contained radioactive materials.

Fig. 3 Confusion matrix for the proposed algorithm’s test results. Each cell within the matrix’s core is normalized according to the total observations of the respective class, illustrating the proportion of correctly identified samples within the whole dataset. The column summary indicates the percentage of correct and incorrect classifications for each predicted class, scaled by the overall observations of that predicted class. Similarly, the row summary portrays the percentage of correct and incorrect classifications for each actual class, adjusted by the total observations of that specific class

3.2 Comparative experiments

In this subsection, to optimize and assess the model’s performance while ensuring its practical applicability, the time-series detector response dataset was partitioned into three distinct subsets: training (60%), validation (20%), and testing (20%). This division was subjected to tenfold cross-validation. A stratified random split was adopted so that each class of radioactive material is represented in the same proportion across all subsets. The model was implemented in Python using the PyTorch framework. For comparative analysis, the Weka machine learning toolkit was employed. All experiments were executed on a system equipped with an Intel Core i7 processor, 16 GB of RAM, and an NVIDIA GeForce RTX 3070 graphics card.
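A stratified 60/20/20 split of the kind described above can be sketched with NumPy alone. The function name `stratified_split`, the per-class rounding rule, and the fixed seed are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split sample indices into train/val/test, class by class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    parts = [[], [], []]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        parts[0].extend(idx[:n_train])
        parts[1].extend(idx[n_train:n_train + n_val])
        parts[2].extend(idx[n_train + n_val:])   # remainder goes to test
    return [np.array(sorted(p), dtype=int) for p in parts]

# Imbalanced toy labels: 50 background samples and 10 source samples.
labels = [0] * 50 + [1] * 10
train_idx, val_idx, test_idx = stratified_split(labels)
```

Splitting each class separately keeps minority classes present in every subset, which matters when background samples make up roughly half of the data.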

To streamline the discussion, the proposed algorithm is referred to as TPW in the experimental results. In the experiments detailed in this subsection, the values of $K$, $p$, and $q$ were set to 5, 2, and 2, respectively. A comprehensive examination and discussion regarding the selection of these parameter values is given in Sect. 3.5. The testing accuracy achieved was 99.1%, with an F1 score of 0.9868. The confusion matrix derived from the test data is shown in Fig. 3.

In summary, the TPW algorithm has shown promising results in both passive backgrounds and active scenarios. Analysis of the column and row summaries indicates that the most notable misclassifications occur between class 1 (HEU), class 5 ($^{99m}$Tc), and class 6 (HEU+$^{99m}$Tc). The primary reason is that the radioactive materials present in class 6 scenes are also present in classes 1 and 5, which causes ambiguity in the identification process.

Furthermore, to provide a comprehensive comparison, the standard KNN (KNN) [16, 35], support vector machine (SVM) [17, 36], Bayesian network (BayesNet) [18, 37], random tree (RandomTree) [19, 38], and the proposed algorithm (TPW) were evaluated. These methods have been commonly used for radionuclide identification in recent years. Comparative experiments were conducted using the Weka machine learning toolkit [39] with a batch size of 100 and tenfold cross-validation. The main parameters used for each method were as follows. For standard KNN, the number of neighbors was set to 5 with no distance weighting. For SVM, the Poly Kernel was used as the kernel function, and the complexity and tolerance parameters were set to 1.0 and 0.001, respectively. For the Bayesian network, the initial network structure used for learning was the naive Bayes network, and the maximum number of parents that a node in the Bayes net can have was limited to 1. For Random Tree, the random number seed used for selecting attributes was 1, the minimum total weight of instances in a leaf was set to 1.0, and the maximum depth of the tree was unlimited.

Given the imbalanced nature of the dataset, relying solely on traditional classification accuracy can be misleading. For instance, a model might achieve high accuracy simply by categorizing all samples as the majority class, in this case, “Background.” Hence, a variety of evaluation metrics were utilized in this subsection to provide a holistic view of model performance. Figure 4 presents these metrics for different models. Comparing the TPW algorithm with four other methods (standard KNN, support vector machine, Bayesian network, and random tree) across five distinct evaluation metrics shows that TPW performs best. Specifically, TPW consistently achieved higher accuracy, F1 score, MCC, ROC area, and PRC area than its counterparts. This underlines TPW’s efficacy and reliability in tasks related to radionuclide identification.
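The point about accuracy on imbalanced data can be made concrete with a small sketch that computes per-class F1 from a confusion matrix. The toy labels and the function name `per_class_f1` are illustrative and unrelated to the paper's dataset.

```python
import numpy as np

def per_class_f1(y_true, y_pred, n_classes):
    """F1 for each class, computed from the confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                      # rows: actual, columns: predicted
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    denom = 2 * tp + fp + fn
    # Classes with no true or predicted samples get F1 = 0.
    return np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)

# A classifier that always answers "background" (class 0).
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100
accuracy = float(np.mean(np.array(y_true) == np.array(y_pred)))
f1 = per_class_f1(y_true, y_pred, n_classes=2)
macro_f1 = float(f1.mean())
```

Here the accuracy is high while the F1 of the minority class is zero, so the macro-averaged F1 exposes the failure that accuracy hides.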

Fig. 4 Multiple evaluation metrics across various models. The x-axis represents the evaluation metrics including accuracy (Acc), F1 measure (F-measure), Matthews correlation coefficient (MCC), receiver operating characteristic area (ROC area), and precision–recall curve area (PRC area). The y-axis shows the performance values for each method under the corresponding metrics for every class of samples

We further performed individual tests for each class of samples, comparing the TPW algorithm’s performance with the four other methods using the F1 measure. Figure 5 shows the classification results across different models for every sample class. Overall, the TPW algorithm was the top performer among all the tested methods. In particular, it classified samples from every class accurately, underscoring its robustness and adaptability to various sample types.

Examining the results in depth, we note that the efficacy of different methods varies considerably across classes. For some classes, the TPW algorithm notably surpasses the F1 measures of its competitors, while for others, the performance differences are smaller. This indicates that the TPW algorithm is especially adept at processing certain sample types, although its relative advantage might be less distinct for other sample types. Notably, the F1 measure for the samples of class 0 (background), 3 ($^{131}$I), and 4 ($^{60}$Co) is higher, whereas it is somewhat lower for class 1 (HEU), 2 (WGPu), 5 ($^{99m}$Tc), and 6 (HEU+$^{99m}$Tc). By analyzing the peak energies of these radioactive materials, we found that their characteristic peaks are all below 200 keV. This suggests that their accurate detection and identification might be compromised by the Compton continuum. Nevertheless, the TPW algorithm outperforms the other models, surpassing them by at least 0.18% in multi-isotope scenarios such as HEU+$^{99m}$Tc, and shows lower variability than other models when detecting radioactive materials.

Fig. 5 F1 measure for each class of samples across various models. The F1 measure values of different methods for each class of samples are plotted on the y-axis of the plot, while the x-axis of the plot indicates the corresponding class of samples as listed in Table 2

3.3 Discussion on temporal energy window

In this subsection, we explored the effects of varying parameters associated with the temporal energy window, specifically focusing on the number and duration of these windows. The movement of the detector poses challenges in determining the ideal length for an energy window. A brief window, such as 1 s, might not capture sufficient relevant data, while an extended window, such as 20 s, could introduce substantial noise, potentially overshadowing crucial signals. Figure 6 depicts the energy spectra for a sample from class 4 ($^{60}$Co) at varying energy window durations. As the window length increased, there was a noticeable rise in the counts of the energy spectrum. However, the count values across different energy points did not grow linearly with the window length, possibly due to statistical fluctuations and other influencing factors [29, 30]. For a closer look at the spectral lines, Fig. 7 highlights the variations in the shape and attributes of the energy spectra as the window length moves through five distinct durations.

Fig. 6 (Color online) Energy spectra with respect to different temporal energy window lengths. This figure illustrates the energy spectra of $^{60}$Co for different temporal energy window lengths of 1, 3, 5, 10, and 20 s. The energy range shown is between 0 and 3000 keV. The count value displayed on the Z-axis shows an increasing trend with the varying window lengths

Figure 8 illustrates the model’s performance changes in relation to varying window lengths and quantities. The data indicate that as the number of temporal energy windows increases, the model’s prediction accuracy also rises, particularly when the window quantity equals 5. With the increase in the number of temporal energy windows, the proposed algorithm captures a richer representation of the signal, enhancing the differentiation between signal and noise, and thus providing a more precise target prediction. Additionally, the model’s prediction accuracy trends upward with longer window durations. However, there is a significant decline in accuracy with excessively long windows. These experiments shed light on the influence of window lengths and quantities on classification efficacy, emphasizing the need for optimal temporal scale selection and feature extraction methods for accurate classification in the given task.

Fig. 8 Performance variations of models with different window lengths and quantities. The plot displays the F1 measure values for various parameter combinations, with the x-axis representing the energy window lengths

Fig. 7 (Color online) Detailed energy spectra across varied window lengths. These five diagrams offer an in-depth examination of the energy spectra of 60Co, captured at different temporal energy window durations, complementing Fig. 6. The inset images in the upper right magnify the spectrum within the highlighted red boxes. The irregular growth in count values at various energy addresses, as the window length expands, can be linked to statistical fluctuations and other potential influences

Table 3 Results of ablation experiments

3.4 Discussion on peak-ratio spectrum analysis

In this subsection, we explored the influence of individual and combined features on the classification performance using ablation experiments. Table 3 contrasts the classification outcomes of the aggregated energy spectrum counts, peak-flat ratio, and peak-peak ratio features against those using the joint features, shedding light on their individual and combined impacts.

The F1 measure comparisons reveal that combined features outperform their individual counterparts. The joint features perform better because they describe the energy spectrum from several complementary viewpoints. Combining the features raises classification precision beyond what any single feature achieves and compensates for weaknesses intrinsic to individual features, such as noise interference, data sparsity, or incomplete representation. Conversely, single features often fail to provide a complete and robust basis for classification [23]. Combined features therefore furnish a richer data representation and improve classification efficiency. In essence, the empirical findings support the use of joint features for more accurate radionuclide identification.

3.5 Discussion on classification model

In this subsection, we examined the proposed algorithm by assessing the impact of three factors: the distance metric, the value of K, and the distance weight. We experimented with various distance metrics, including Euclidean, City block, Chebyshev, Correlation, Spearman, Hamming, Jaccard, and Cosine. The value of K was varied between 1 and 20. For distance weighting, we considered three approaches: "Equal distance" (ED), which did not incorporate any weight; "Inverse distance" (ID), where the weight was based on the inverse of the distance to the data point; and "Inverse distance squared" (IDS), where the weight was determined by the inverse of the squared distance to the data point. The experimental results are listed in Table 4. During the experiments, K was set to 10 and all the samples were subjected to standardization. Bold numbers in Table 4 represent the best results under this experimental setting.
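The three weighting schemes can be sketched as a distance-weighted vote among the K nearest neighbors. This is a minimal illustration, assuming a Euclidean metric and a small constant `eps` to avoid division by zero; the function name `knn_predict` and the toy data are hypothetical, and the model used in this study may differ in its details.

```python
import math
from collections import defaultdict

def knn_predict(train_X, train_y, x, k=5, weighting="IDS", eps=1e-12):
    """Classify x by a weighted vote among its k nearest training points.

    weighting: "ED"  -> every neighbor counts equally
               "ID"  -> weight = 1 / distance
               "IDS" -> weight = 1 / distance**2
    """
    # Euclidean distance is an assumption; other metrics could be swapped in.
    nearest = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )[:k]
    votes = defaultdict(float)
    for d, label in nearest:
        if weighting == "ED":
            w = 1.0
        elif weighting == "ID":
            w = 1.0 / (d + eps)
        elif weighting == "IDS":
            w = 1.0 / (d * d + eps)
        else:
            raise ValueError(f"unknown weighting: {weighting}")
        votes[label] += w
    return max(votes, key=votes.get)

# Toy 1-D example: one nearby sample of a minority class (label 1)
# and two more distant samples of a majority class (label 0).
train_X = [(0.0,), (3.0,), (3.5,)]
train_y = [1, 0, 0]
query = (1.0,)

for scheme in ("ED", "ID", "IDS"):
    print(scheme, knn_predict(train_X, train_y, query, k=3, weighting=scheme))
```

In this toy case the unweighted vote follows the two distant majority samples, whereas both inverse-distance schemes follow the single close neighbor, which illustrates why distance weighting can help minority classes.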

Table 4 Results of discussion on classification model

Based on the observed F1 measure during validation and testing, the distance metric of Correlation was superior in the validation experiments, while the distance metric of City block outperformed the other distance metrics in terms of classification accuracy in the testing experiments. In terms of distance weighting, the IDS emerged as the superior performer. By significantly reducing the influence of far-off points on the classification decision, IDS led to more precise and dependable results.

Selecting an appropriate value of K was crucial for optimal model performance. A very small K makes the model vulnerable to noise in the feature points, which can greatly influence classification outcomes. Conversely, an overly large K dilutes the specificity of the model as the neighborhood around the training instance becomes too expansive, increasing the likelihood of misclassifications [16, 35]. Thus, striking a balance between noise resistance and model precision by carefully adjusting the K value is imperative. We undertook a series of tests to discern the effect of different K values, ranging from 1 to 20, on the efficacy of the proposed algorithm. The outcomes of these tests are depicted in Fig. 9.

From the results, it can be observed that the F1 score initially increases as the value of K increases from 1 to 5, reaching a peak value of 0.9868 at K = 5. The F1 score then fluctuates slightly before starting to decline as K increases further. Additionally, the results suggest that the model has a high overall performance, with F1 scores consistently above 0.95 for all values of K. This indicates that the model is effective in accurately predicting the class labels of the input data. This pattern of results suggests that increasing the value of K can lead to better classification performance up to a certain point, beyond which the neighborhood becomes too broad and the decision boundary is over-smoothed, resulting in a decline in performance.
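The rise and fall of the F1 score with K can be reproduced on a toy problem. The sketch below sweeps K with an unweighted KNN and leave-one-out evaluation on a small, imbalanced 1-D dataset containing one noisy label. The data, the leave-one-out protocol, and the helper names (`knn_label`, `macro_f1`, `loo_macro_f1`) are assumptions made for illustration; the study itself used separate validation and test sets.

```python
import math
from collections import Counter

def knn_label(train, x, k):
    """Majority label among the k nearest (point, label) pairs (unweighted)."""
    nearest = sorted(train, key=lambda pl: math.dist(pl[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def macro_f1(y_true, y_pred):
    """Unweighted mean of the one-vs-rest F1 over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    total = 0.0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        total += 2 * tp / denom if denom else 0.0
    return total / len(labels)

def loo_macro_f1(data, k):
    """Leave-one-out macro F1 for a given k."""
    y_true, y_pred = [], []
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        y_true.append(label)
        y_pred.append(knn_label(rest, x, k))
    return macro_f1(y_true, y_pred)

# Toy data (not from the study): a majority cluster (label 0), one
# noisy point labelled 1 inside it, and a small minority cluster (label 1).
data = [((float(v),), 0) for v in (0, 2, 4, 6, 8, 10)]
data += [((5.0,), 1)]
data += [((float(v),), 1) for v in (40, 42, 44)]

scores = {k: loo_macro_f1(data, k) for k in (1, 3, 5, 7)}
for k, s in scores.items():
    print(f"K={k}: macro F1 = {s:.3f}")
```

With K = 1 the noisy point corrupts its neighbors' predictions; with a moderate K the vote suppresses it; with a large K the majority class outvotes the small cluster, so the score drops.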

Fig. 9 Performance variations based on different K values. The figure shows the F1 measure of the proposed algorithm for K values spanning from 1 to 20, plotted on the X-axis. The Y-axis indicates the F1 measure. Two distinct curves denote the outcomes from the validation and test experiments, respectively

4 Conclusion

This study introduces a novel approach for detecting and identifying radioactive materials within urban settings.

From the conducted comparative experiments, we derive the following key conclusions: (1) Detector response data, when viewed as a time series, are effectively segmented using temporal energy windows. Segmenting the data in this manner mitigates the impact of shifting backgrounds, enhances the reliability and quality of energy spectrum data, and streamlines the downstream data processing. This segmentation yields an energy spectrum that emphasizes pivotal information pertaining to the radioactive materials. (2) Our feature extraction strategy taps into the formation mechanisms and the measurement principles of gamma-ray instruments, yielding deep insights into the physical nature of energy spectra. The features we extract, including aggregated counts, peak-to-flat ratios, and peak-to-peak ratios, offer a comprehensive view of the sample’s multidimensional attributes. This approach negates the need for individual peak analysis and peak searches, thereby enhancing the efficiency and precision of data processing. (3) The custom weighted KNN model crafted for detection and identification capitalizes on subtle variations in energy spectrum classes for identical nuclides. This shifts the classification challenge into a task of partitioning the feature space. By incorporating weights, our model counters the issues posed by imbalanced datasets, meets the requirements of real-time and multiclassification detection, and fortifies the robustness of the detection model, especially when operating against intricate urban backgrounds.

However, this technology does come with certain limitations. It struggles when tasked with measuring systems affected by improvised nuclear devices (INDs) or tactical nuclear artifacts. The detectors are vulnerable to electromagnetic pulses (EMPs) and prompt gamma rays, which span a broader energy spectrum. It is our hope that subsequent research will overcome these hurdles. A further limitation of this study is its exclusive reliance on simulated spectra for training and testing the algorithm. Future work needs to validate the findings against spectra measured with actual detectors. Moving forward, our research will prioritize the practical application of this method in real-life scenarios. This includes refining detection accuracy, minimizing false positives, and improving the algorithm’s computational efficiency. Moreover, this technique holds promise for broader applications, potentially benefiting areas such as nuclear safety, environmental conservation, and public health.

Acknowledgements The authors thank the Oak Ridge National Laboratory for providing the dataset available at https://doi.org/10.13139/ORNLNCCS/1597414.

Author contributions All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Hai-Bo Ji, Jiang-Mei Zhang, Cao-Lin Zhang, Jing Lu, and Xing-Hua Feng. The first draft of the manuscript was written by Hao-Lin Liu, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Data availability The data that support the findings of this study are openly available in Science Data Bank at https://doi.org/10.57760/sciencedb.10892 and http://cstr.cn/31253.11.sciencedb.10892.

Declarations

Conflict of interest The authors declare that they have no competing interests.