Radiomics model for distinguishing tuberculosis and lung cancer on computed tomography scans
2020-04-07
E-Nuo Cui, Tao Yu, Sheng-Jie Shang, Xiao-Yu Wang, Yi-Lin Jin, Yue Dong, Hai Zhao, Ya-Hong Luo, Xi-Ran Jiang
E-Nuo Cui, Hai Zhao, School of Computer Science and Engineering, Northeastern University, Shenyang 110619, Liaoning Province, China
E-Nuo Cui, School of Computer Science and Engineering, Shenyang University, Shenyang 110044, Liaoning Province, China
Tao Yu, Xiao-Yu Wang, Yue Dong, Ya-Hong Luo, Medical Imaging Department, Cancer Hospital of China Medical University, Liaoning Cancer Hospital and Institute, Shenyang 110042, Liaoning Province, China
Sheng-Jie Shang, Yi-Lin Jin, Xi-Ran Jiang, Department of Biomedical Engineering, China Medical University, Shenyang 110122, Liaoning Province, China
Abstract
BACKGROUND Pulmonary tuberculosis (TB) and lung cancer (LC) are common diseases with a high incidence and similar symptoms, which may be misdiagnosed by radiologists, thus delaying the best treatment opportunity for patients.
AIM To develop and validate radiomics methods for distinguishing pulmonary TB from LC based on computed tomography (CT) images.
METHODS We enrolled 478 patients (January 2012 to October 2018), who underwent preoperative CT screening. Radiomics features were extracted and selected from the CT data to establish a logistic regression model. A radiomics nomogram model was constructed, with the receiver operating characteristic, decision and calibration curves plotted to evaluate the discriminative performance.
RESULTS Radiomics features extracted from lesions with 4 mm radial dilation distances outside the lesion showed the best discriminative performance. The radiomics nomogram model exhibited good discrimination, with an area under the curve of 0.914 (sensitivity = 0.890, specificity = 0.796) in the training cohort, and 0.900 (sensitivity = 0.788, specificity = 0.907) in the validation cohort. The decision curve analysis revealed that the constructed nomogram had clinical usefulness.
CONCLUSION These proposed radiomic methods can be used as a noninvasive tool for differentiation of TB and LC based on preoperative CT data.
Key Words: Pulmonary tuberculosis; Lung cancer; Radiomics; Computed tomography; Computer-aided diagnosis; Nomogram
INTRODUCTION
Pulmonary tuberculosis (TB), which represents > 80% of clinical TB cases, is a global public health threat. Its effects on the lungs involve chronic inflammation that is reported to cause carcinogenesis of lung tissue[1]. Lung cancer (LC) has a poor prognosis and is one of the most common causes of cancer death worldwide[2]. Both diseases are common, with high prevalence and similar symptoms and clinical presentation. Hence, patients with LC are often misdiagnosed with pulmonary TB, which may delay timely treatment and even expose patients to inappropriate medication.
Previous studies have examined the association between pulmonary TB and LC, and their diagnosis, through clinical symptoms and signs and blood transcriptional profiles[3-5]. Such methods mainly relied on the subjective experience of clinicians and were therefore unreliable. Imaging examinations, such as computed tomography (CT), are useful tools. However, in clinical practice, due to the radiological similarities between TB and LC, even highly trained radiologists relying on CT data are prone to misdiagnosis or missed diagnosis. Therefore, the determination of TB or LC is based on histopathological analysis, such as invasive biopsy, with the inherent risks of these invasive procedures[6-8]. Thus, noninvasive and computer-aided alternatives are required to improve the discrimination of TB and LC.
In recent years, radiomics has attracted increasing attention. It involves high-throughput extraction and selection of discriminative features from medical imaging data, which are then used to construct machine learning classifiers and radiomics nomogram models to assist in disease diagnosis and in the prediction of disease status and response to treatment[9-11]. This approach has been shown to improve detection and discrimination performance on medical images compared with assessments made by radiologists[12-15]. The radiomics approach has been used to predict tumor subtype[16] and metastasis[17] in patients with lung disease. However, to the best of our knowledge, radiomics has not yet been applied to differentiating TB and LC. Thus, the present study aims to establish and validate radiomic methods to distinguish TB from LC, based on pretreatment CT data.
MATERIALS AND METHODS
Patients
The retrospective analysis conducted on lung CT data was approved by the Institutional Research Ethics Board of our institute. A total of 478 patients were enrolled between January 2012 and October 2018 in the Liaoning Cancer Hospital and Institute. The numbers of patients with pulmonary TB and LC were 244 and 234, respectively. All patients had pathologically confirmed pulmonary TB or LC, which is the gold standard. Inclusion criteria were as follows: (1) Patients aged > 18 years; (2) Patients who underwent thoracic CT screening before surgery; and (3) Patients who underwent surgical resection with pathological confirmation. Exclusion criteria were as follows: (1) Patients exhibiting other tumors; (2) Patients with a history of lung surgery, radiotherapy or chemotherapy; and (3) Patients with artifacts in CT images. All patients were randomly divided into the training and validation cohorts at a ratio of 2:1.
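The random 2:1 division described above can be illustrated with a minimal Python sketch. The function name `split_cohort`, the fixed seed, and the use of integer patient IDs are assumptions made for illustration; the study does not report the exact procedure or whether the split was stratified by diagnosis.

```python
import random


def split_cohort(patient_ids, train_fraction=2 / 3, seed=0):
    """Shuffle patient IDs and split them into training and validation lists.

    Hypothetical helper: the seed and the rounding rule are illustrative choices.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# 478 patients, as enrolled in the study; IDs 0..477 are placeholders.
train_ids, valid_ids = split_cohort(range(478))
print(len(train_ids), len(valid_ids))  # 319 159
```

A stratified variant would apply the same shuffle separately to the TB and LC groups so that both cohorts keep a similar class balance.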
CT image acquisition
All patients were scanned with a 64-slice spiral CT (Syngo 2009A; Siemens, Germany): voltage 120 kV, current 200-350 mAs, slice thickness 5.0 mm, and array 512 × 512. The obtained CT thoracic images with a resolution of 2457 × 1996 were interpreted on a Hologic breast computer-aided diagnosis workstation (SecureView Dx; Hologic) equipped with two 5-megapixel monitors, and stored in the Picture Archiving and Communication System of the hospital in Digital Imaging and Communications in Medicine format.
Segmentation and mask dilation
The lesion regions of interest (ROIs) were drawn manually for each patient by two radiologists with 12 and 14 years of experience using the ITK-SNAP software (version 3.6.0, www.itk-snap.org). Other senior radiologists and clinicians were invited to join the decision-making process whenever a disagreement occurred during the segmentation. None of the radiologists and clinicians had prior knowledge of the pathological results of these patients. The segmented ROIs were exported in MHA format and used for image feature extraction. To evaluate the discriminative power of the peritumoral tissues, dilated masks were obtained by dilating the original ROI of each CT slice with 10 different radial distances. The dilated radial distance was up to 10 mm outside the lesion region. The dilated masks are shown in Figure 1. The original ROI segmented by radiologists is colored red. Rings with different colors indicate various radial dilation distances surrounding the lesion.
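The slice-wise dilation step can be sketched as follows. This is a brute-force illustration in plain Python, not the authors' implementation: the function name `peritumoral_ring`, the isotropic `spacing_mm` parameter, the measurement of distance between pixel centers, and the choice to return only the ring outside the lesion (rather than the full dilated mask) are all assumptions.

```python
import math


def peritumoral_ring(roi, distance_mm, spacing_mm=1.0):
    """Return a 2D mask of pixels within distance_mm of the ROI but outside it.

    roi is a list of rows containing 0/1 values; spacing_mm is the assumed
    isotropic in-plane pixel spacing.
    """
    rows, cols = len(roi), len(roi[0])
    lesion = [(r, c) for r in range(rows) for c in range(cols) if roi[r][c]]
    ring = [[0] * cols for _ in range(rows)]
    if not lesion:
        return ring  # nothing to dilate on an empty slice
    for r in range(rows):
        for c in range(cols):
            if roi[r][c]:
                continue  # the lesion itself is excluded from the ring
            nearest = min(math.hypot(r - lr, c - lc) for lr, lc in lesion)
            if nearest * spacing_mm <= distance_mm:
                ring[r][c] = 1
    return ring


# Toy 7 x 7 slice with a single-pixel "lesion" in the center.
toy_roi = [[0] * 7 for _ in range(7)]
toy_roi[3][3] = 1
toy_ring = peritumoral_ring(toy_roi, distance_mm=2.0)
print(sum(map(sum, toy_ring)))  # 12 pixels lie within 2 mm of the center
```

On real CT slices a distance transform or morphological dilation from an image-processing library would be far faster; the brute-force loop is used here only to keep the sketch dependency-free.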
Feature extraction and selection
The imaging features included the following: First-order statistics, shape-based, gray-level co-occurrence matrix (GLCM), gray-level size zone matrix, gray-level run length matrix (GLRLM) and neighborhood gray-tone difference matrix[9,18]. These were extracted from the lesions using Python (version 3.6.5). Least absolute shrinkage and selection operator (LASSO) logistic regression was used to exclude redundant features while retaining the features predictive of pulmonary TB and LC[19]. The LASSO-selected features were then used to calculate a radiomics score for constructing the radiomics nomogram, following a routine radiomics analysis process[20].
Construction of the radiomics nomogram model
The radiomics score was calculated by a linear combination of selected features weighted by the respective LASSO coefficients for each patient[14,21]. A radiomics nomogram model for differentiating LC from TB was constructed based on the multivariable logistic regression analysis using the “rms” package in the R language (v. 3.5.0; available from URL: https://www.r-project.org).
Validation strategy
Figure 1 Example of dilated masks with various radial dilation distances on a computed tomography image of a lung cancer patient. Each color ring indicates 2.0 mm width.
The performance of binary classifications was evaluated using the receiver operating characteristic (ROC) curve analysis for both the nomogram model and machine learning classifiers. The optimal cut-off values of the ROC curves were selected based on the maximum Youden index[22]. The area under the ROC curve (AUC) values were calculated to quantify the discrimination performance. Three comparison metrics, including accuracy, sensitivity and specificity, were also computed following the standard formulas described previously[23]. Calibration curves were plotted to evaluate the calibration of the constructed radiomics nomogram model. A decision curve analysis (DCA) was conducted to assess the clinical utility of the nomogram, by quantifying the net benefits for a range of threshold probabilities in the training and validation groups. All algorithms were run on a 64-bit hexa-core 3.7 GHz Intel i7-6700K CPU with 128 GB of 3000 MHz DDR4 RAM.
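The ROC-based evaluation described above can be illustrated with a short Python sketch. The label coding (1 for LC, 0 for TB), the function names, and the toy scores are assumptions for illustration only; the Youden index is computed in its usual form, sensitivity + specificity - 1, and the AUC is computed as the probability that a positive case scores higher than a negative case.

```python
def roc_points(labels, scores):
    """Return (threshold, sensitivity, specificity) for each distinct score.

    A case is called positive when its score is >= the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        points.append((t, tp / pos, tn / neg))
    return points


def youden_cutoff(labels, scores):
    """Pick the threshold that maximizes sensitivity + specificity - 1."""
    return max(roc_points(labels, scores), key=lambda p: p[1] + p[2] - 1)


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: hypothetical labels (1 = LC, 0 = TB) and model scores.
toy_labels = [0, 0, 0, 1, 0, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8]
cutoff, sensitivity, specificity = youden_cutoff(toy_labels, toy_scores)
print(cutoff, sensitivity, specificity)  # 0.4 1.0 0.75
print(auc(toy_labels, toy_scores))
```

Both functions require at least one positive and one negative case; the pairwise AUC is quadratic in cohort size, which is acceptable for a few hundred patients.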
RESULTS
The best radial dilation distance
To evaluate the discriminative performance of peritumoral tissues, dilations of ten distances were performed from the original ROI. As shown in Table 1, the radiomics features were extracted from the ROI when the dilation was 0. The features were obtained from peritumoral tissues when the dilations were from 1 to 10.
The model with the least overfitting was obtained when the dilated radial distance equaled 4.0 mm. At this dilated distance, the highest AUCs of 0.914 and 0.900 on the training and validation cohorts, respectively, were also achieved.
Evaluation of the selected radiomics features
Eight radiomics features were selected by the LASSO process at the best dilation distance. Table 2 shows the selected features with the AUCs and P values in the training and validation cohorts. Figure 2 shows the boxplots of the eight selected radiomics features between the TB and LC groups.
Development of the radiomics nomogram model
The radiomics signature, which consisted of eight features from the best radial dilation distance, was obtained by logistic regression and was as follows:
CT score = 447.771
 - 360.807 × lbp-2D_firstorder_Entropy
 - 4.955 × lbp-3D-k_firstorder_10Percentile
 + 27.755 × log-sigma-3-0-mm-3D_glcm_Idn
 + 0.0000143 × log-sigma-5-0-mm-3D_glrlm_RunLengthNonUniformity
 - 0.0000753 × squareroot_gldm_DependenceNonUniformity
 + 33.277 × wavelet-HLH_glcm_Idn
 + 4.746 × wavelet-HLL_glcm_Idn
 - 195.455 × wavelet-LLL_glcm_Idmn
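As an illustration, the signature above can be evaluated for one patient with a few lines of Python. The intercept and coefficients are copied from the formula; the function name `radiomics_score` and the feature values in the example are hypothetical and do not come from the study data.

```python
INTERCEPT = 447.771

# Coefficients copied from the radiomics signature reported above.
COEFFICIENTS = {
    "lbp-2D_firstorder_Entropy": -360.807,
    "lbp-3D-k_firstorder_10Percentile": -4.955,
    "log-sigma-3-0-mm-3D_glcm_Idn": 27.755,
    "log-sigma-5-0-mm-3D_glrlm_RunLengthNonUniformity": 0.0000143,
    "squareroot_gldm_DependenceNonUniformity": -0.0000753,
    "wavelet-HLH_glcm_Idn": 33.277,
    "wavelet-HLL_glcm_Idn": 4.746,
    "wavelet-LLL_glcm_Idmn": -195.455,
}


def radiomics_score(features):
    """Linear combination of the eight selected features plus the intercept."""
    missing = set(COEFFICIENTS) - set(features)
    if missing:
        raise KeyError(f"missing features: {sorted(missing)}")
    return INTERCEPT + sum(w * features[name] for name, w in COEFFICIENTS.items())


# Hypothetical feature values for a single patient (illustration only).
example_features = {
    "lbp-2D_firstorder_Entropy": 1.2,
    "lbp-3D-k_firstorder_10Percentile": 0.5,
    "log-sigma-3-0-mm-3D_glcm_Idn": 0.95,
    "log-sigma-5-0-mm-3D_glrlm_RunLengthNonUniformity": 12000.0,
    "squareroot_gldm_DependenceNonUniformity": 8000.0,
    "wavelet-HLH_glcm_Idn": 0.97,
    "wavelet-HLL_glcm_Idn": 0.96,
    "wavelet-LLL_glcm_Idmn": 0.99,
}
print(radiomics_score(example_features))
```

The score is only the linear predictor; converting it to a probability requires the nomogram's logistic regression, whose coefficients are not listed in the text.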
A nomogram model was then constructed (Figure 3A), which includes the radiomics score for differentiating TB and LC in the second row. The favorable calibration of the present radiomics nomogram model was confirmed in the training and validation groups (Figure 3B and C). The calibration curves indicated good agreement between the nomogram-estimated probability and the actual outcome. The X and Y axes represent the calculated and actual probabilities, respectively. The diagonal blue line represents the performance of an ideal diagnostic model, and the red dotted line represents the performance of the constructed nomogram model. The closer the red dotted line is to the diagonal blue line, the better the calibration of the nomogram model. The nomogram model exhibited marked discriminative efficacy, with an AUC of 0.914 in the training group and 0.900 in the validation group (Figure 3D and E). Hence, the constructed nomogram model has good discriminative power in differentiating TB from LC.
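The idea behind a calibration curve, comparing predicted probabilities with observed outcome frequencies, can be shown with a simple binned table. This is a swapped-in simplification for illustration, not the calibration procedure of the R "rms" package used in the study; the function name, the equal-width bins, and the toy data are assumptions.

```python
def calibration_bins(labels, probs, n_bins=5):
    """Group predictions into equal-width probability bins.

    Returns one (mean predicted probability, observed positive fraction, count)
    tuple per non-empty bin. A well-calibrated model has the first two close.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            rows.append((mean_pred, observed, len(members)))
    return rows


# Toy data: hypothetical outcomes (1 = LC, 0 = TB) and predicted probabilities.
toy_rows = calibration_bins([0, 0, 1, 1], [0.1, 0.15, 0.8, 0.95], n_bins=2)
for mean_pred, observed, count in toy_rows:
    print(round(mean_pred, 3), observed, count)
```

Plotting the mean predicted probability against the observed fraction for each bin gives points that lie on the diagonal when calibration is perfect.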
Table 1 Discriminative performance of peritumoral tissues with different radial dilation distances on lung cancer and pulmonary tuberculosis
The decision curve analysis showed that our nomogram model for distinguishing TB and LC patients was advantageous, which indicates the good performance of the nomogram in terms of clinical application (Figure 4).
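The net benefit plotted in a decision curve can be computed as in the sketch below, which uses the standard definition: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The function name, the label coding (1 for LC, 0 for TB), and the toy data are assumptions for illustration.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating every case whose predicted probability >= threshold.

    threshold must lie strictly between 0 and 1.
    """
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy data: hypothetical outcomes and predicted probabilities of LC.
toy_labels = [1, 1, 0, 0]
toy_probs = [0.9, 0.6, 0.7, 0.2]
model_nb = net_benefit(toy_labels, toy_probs, threshold=0.5)
# "All patients have LC" reference strategy: every case is called positive.
treat_all_nb = net_benefit(toy_labels, [1.0] * len(toy_labels), threshold=0.5)
print(model_nb, treat_all_nb)  # 0.25 0.0
```

Repeating the calculation over a grid of thresholds yields the model curve and the "all LC" reference curve; the "all TB" reference strategy has a net benefit of zero at every threshold.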
DISCUSSION
Prompt diagnosis, such as the discrimination of TB from other chronic lung disorders, including LC, is important in providing appropriate and timely treatment[24]. Reports have shown that delay in the diagnosis and treatment of LC frequently leads to poor outcome and survival[25]. However, LC often exhibits similarities to TB, requiring invasive biopsy to distinguish these two diseases[4,26]. In clinical practice, even radiologists with decades of experience may still misdiagnose TB and LC using CT imaging data, or even miss the diagnosis altogether. There is little understanding of how to differentiate the two diseases in CT images using computer-aided methods, with no reported attempts. Therefore, radiomic methods were developed in this study to improve the differentiation between TB and LC.
The selected radiomics features from CT images included two local binary pattern, two Laplacian of Gaussian, one square root and three wavelet-filtered features. The original images were first filtered with the corresponding filters, then used to extract handcrafted features. The first-order features describe the distribution of voxel intensities in images. The GLCM features quantify the second-order joint probabilities of images. The GLDM and GLRLM features describe gray-level dependencies and gray-level runs in an image, respectively. Our findings might partially explain why radiologists find it hard to distinguish between TB and LC, since the discriminative CT markers all belonged to a high-dimensional space that can hardly be understood by naked eye examination. However, the selected features were all closely related to gray intensities, which indicates that changes in gray levels in lung lesion CT images can potentially assist in the differential diagnosis of TB and LC. The discriminative power of peritumoral tissues was also evaluated. We extracted imaging features from several radial dilation distances (up to a radial distance of 10 mm outside the lung lesion) from the original ROI. Our results revealed that the peritumoral area exhibited more discriminative power than the intratumoral area. The radiomics model with the least overfitting and the best AUCs for the training and validation cohorts was obtained at 4 mm outside the lesion. The results were consistent with previous studies and showed that CT-based peritumoral radiomics are important in the diagnosis of lung lesions[27,28].
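To make the notion of first-order features concrete, the sketch below computes two of the kinds that appear among the selected features, an intensity entropy and a 10th percentile, from a flat list of voxel values. It is a generic illustration in plain Python: the function names, the fixed bin width, and the linear-interpolation percentile rule are assumptions and may differ from the definitions in the extraction software used in the study.

```python
import math
from collections import Counter


def histogram_entropy(values, bin_width=1.0):
    """Shannon entropy (in bits) of intensities after fixed-width binning."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def percentile(values, q):
    """q-th percentile (0 to 100) with linear interpolation between sorted values."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


# Toy voxel intensities (hypothetical, not study data).
print(histogram_entropy([0, 0, 1, 1]))         # 1.0: two equally likely gray levels
print(percentile(list(range(0, 101, 10)), 10))  # 10.0
```

In the study these statistics were computed on filtered images (for example, local binary pattern maps), so the input values would be filter responses rather than raw intensities.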
Table 2 The eight radiomics features selected from the lung computed tomography images
Figure 2 Boxplots of the eight radiomics features correlated with pulmonary tuberculosis vs lung cancer. A: Lbp-2D_firstorder_Entropy; B: Lbp-3D-k_firstorder_10Percentile; C: Log-sigma-3-0-mm-3D_glcm_Idn; D: Log-sigma-5-0-mm-3D_glrlm_RunLengthNonUniformity; E: Squareroot_gldm_DependenceNonUniformity; F: Wavelet-HLH_glcm_Idn; G: Wavelet-HLL_glcm_Idn; H: Wavelet-LLL_glcm_Idmn. TB: Tuberculosis; LC: Lung cancer.
Figure 3 The radiomics nomogram for the differentiation of tuberculosis and lung cancer. A: The construction of the nomogram model; B, C: The calibration curves of the nomogram model in the training group (B) and validation group (C), respectively; D, E: The receiver operating characteristic curves of the nomogram model in the training group (D) and validation group (E), respectively. CT: Computed tomography.
Figure 4 The decision curve analysis for the constructed radiomics nomogram model. The X and Y axes represent the threshold probability and net benefit, respectively. The red line indicates the constructed nomogram model. The blue line represents the hypothesis that all patients had lung cancer. The black line represents the assumption that all patients had tuberculosis.
To the best of our knowledge, there is no previous report on differentiating TB and LC using CT radiomics. Our findings indicated the diagnostic value of peritumoral regions that have a dilation distance of approximately 4 mm outside the lesions. From the peritumoral area, 1967 imaging features were extracted. A radiomics signature was obtained using the LASSO algorithm, which reduces the dimensionality of the data and limits overfitting[19]. The constructed nomogram model exhibited favorable discrimination of TB and LC, with AUCs of 0.914 and 0.900 in the training and validation groups, respectively. In addition, good sensitivity and specificity were obtained, indicating low misdiagnosis and missed-diagnosis rates for our model. The calibration of the present nomogram model was confirmed by calibration-curve-based analysis, which revealed excellent agreement with the actual outcome. To further evaluate whether our nomogram-assisted diagnosis method could improve patient outcomes, the clinical usefulness of the model was assessed by DCA, which quantified net benefits for a range of threshold probabilities in the training and validation cohorts. To use the proposed radiomics nomogram model for differentiating TB and LC, radiologists first need to manually segment the lesions in the CT thoracic images for each patient and calculate the probabilities of TB or LC based on the nomogram model. The radiologists could then weigh these probabilities together with the clinical information and make a comprehensive decision on medical treatment.
There are a few limitations in the present study. First, all CT data were obtained from a single hospital, which may introduce inherent bias. Second, only CT images were used to perform the radiomics analyses. Clinical parameters should be incorporated in future studies[29]. Third, the ROIs in each image were manually segmented, which is time-consuming. A recent study developed a 3D U-net algorithm for lesion segmentation in CT thoracic images[21]. This encourages us to explore automatic segmentation and classification methods in future studies.
CONCLUSION
The radiomic methods for differentiating LC and TB using CT thoracic images are presented in this study. The established nomogram model exhibited favorable classification performance, indicating its potential as an assisting tool in future clinical applications.
ARTICLE HIGHLIGHTS
Research background
Pulmonary tuberculosis (TB) and lung cancer (LC) are common pulmonary diseases with high incidence and similar symptoms, which may be misdiagnosed by radiologists, thus delaying the best treatment opportunity for patients.
Research motivation
Due to the radiological similarities of TB and LC, even highly trained radiologists relying on computed tomography (CT) data are often prone to misdiagnosis, or missed diagnosis. Therefore, the determination of TB or LC is based on histopathological analysis, such as invasive biopsy, with the associated inherent risk of these invasive procedures. Thus, noninvasive and computer-aided alternatives are required to improve the discrimination of TB and LC.
Research objectives
This study aimed to develop and validate radiomic methods for distinguishing pulmonary TB from LC based on CT images.
Research methods
Radiomics features were extracted and selected from the CT images to establish a logistic regression model. A radiomics nomogram model was constructed, with the receiver operating characteristic, decision and calibration curves plotted to evaluate the discriminative performance.
Research results
This study found that radiomics features extracted from the lesion with 4 mm radial dilation distances outside the lesion showed the best discriminative performance. The radiomics nomogram model exhibited good discrimination performance, and decision curve analysis revealed that the constructed nomogram had clinical usefulness.
Research conclusions
The proposed radiomic methods can be used as noninvasive tools for differentiating TB and LC based on preoperative CT data.
Research perspectives
This study confirms the predictive performance of our proposed radiomics model. In the future, multimodal data combined with deep learning characteristics are desirable.