Early detection of colorectal cancer based on circular DNA and common clinical detection indicators

2022-09-01

Jian Li, Tao Jiang, Zeng-Ci Ren, Zhen-Lei Wang, Peng-Jun Zhang, Guo-An Xiang

Jian Li, Guo-An Xiang, The Second School of Clinical Medicine, Southern Medical University, Guangzhou 510515, Guangdong Province, China

Jian Li, Guo-An Xiang, Department of General Surgery, Guangdong Second Provincial General Hospital, Guangzhou 510317, Guangdong Province, China

Jian Li, Zeng-Ci Ren, Zhen-Lei Wang, Department of General Surgery, Henan Tumor Hospital, Affiliated Tumor Hospital of Zhengzhou University, Zhengzhou 450000, Henan Province, China

Tao Jiang, Medicine Innovation Research Division of Chinese PLA General Hospital, Beijing 100853, China

Peng-Jun Zhang, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Interventional Therapy Department, Peking University Cancer Hospital and Institute, Beijing 100142, China

Abstract

BACKGROUND Colorectal cancer (CRC) is the third most common cancer worldwide and the second leading cause of cancer death, accounting for approximately 9% of all cancer deaths. Early detection of CRC is urgently needed in clinical practice.

AIM To build a multi-parameter diagnostic model for early detection of CRC.

METHODS A total of 59 patients with colorectal polyps (CRP) and 101 CRC patients (38 early-stage CRC and 63 advanced CRC) were included for model establishment. In addition, 30 CRP patients and 62 CRC patients (30 early-stage CRC and 32 advanced CRC) were separately included to validate the model. The candidate variables were 51 commonly used clinical detection indicators and the 4 extrachromosomal circular DNA markers NDUFB7, CAMK1D, PIK3CD and PSEN2 that we had screened earlier. Four multi-parameter joint analysis methods (binary logistic regression analysis, discriminant analysis, classification tree and neural network) were used to establish a multi-parameter joint diagnosis model.

RESULTS The neural network that included carcinoembryonic antigen (CEA), ischemia-modified albumin (IMA), sialic acid (SA), PIK3CD and lipoprotein a (LPa) was chosen as the optimal multi-parameter combined auxiliary diagnosis model to distinguish the CRP and CRC groups. When it differentiated 59 CRP and 101 CRC patients, its overall accuracy was 90.8%, its area under the curve (AUC) was 0.959 (0.934-0.985), and its sensitivity and specificity were 91.5% and 82.2%, respectively. In validation, when distinguishing 30 CRP and 62 CRC patients, the AUC was 0.965 (0.930-1.000), and the sensitivity and specificity were 66.1% and 70.0%. When distinguishing 30 CRP and 32 early-stage CRC patients, the AUC was 0.960 (0.916-1.000), with a sensitivity and specificity of 87.5% and 90.0%. When distinguishing 30 CRP and 30 advanced CRC patients, the AUC was 0.970 (0.936-1.000), with a sensitivity and specificity of 96.7% and 86.7%.

CONCLUSION We built a multi-parameter neural network diagnostic model that included CEA, IMA, SA, PIK3CD and LPa for early detection of CRC. Compared with conventional CEA, it showed significant improvement.

Key Words: Colorectal cancer; Colorectal polyps; Multi-parameter; Circular DNA; Neural network

INTRODUCTION

Colorectal cancer (CRC) is the third most common cancer worldwide and the second leading cause of cancer death, accounting for approximately 9% of all cancer deaths. Currently, surgery is the most common treatment for nonmetastatic CRC[1]. Most patients with CRC are diagnosed at an advanced stage. The high morbidity and mortality of advanced CRC indicate an urgent need for clinical improvements in early CRC detection and individualized management[2].

In the era of precision oncology, liquid biopsy has become the primary method for characterizing circulating tumor components present in body fluids[3]. This noninvasive tool can identify relevant molecular alterations in CRC patients, including some that indicate disruption of epigenetic mechanisms. Epigenetic alterations found in solid and liquid biopsies have shown great utility as biomarkers for the early detection, prognosis, monitoring, and assessment of the treatment response in CRC patients[4]. The term “liquid biopsy” covers blood, the most commonly used human fluid sample, as well as other fluids, such as urine, ascites, pleural effusion, cerebrospinal fluid, and saliva[5,6]. Both primary tumors and metastases can release tumor material into these body fluids, mainly circulating tumor cells (CTCs), nucleic acids (cNA), and extracellular vesicles (cEVs)[7]. These circulating elements constitute a valuable source of noninvasive biomarkers[8-11].

At present, assays based on circulating tumor DNA (ctDNA) detect single-stranded or double-stranded DNA. With the development of high-throughput sequencing technology and single-cell gene amplification technology, new types of circular cell-free DNA have been discovered, such as extrachromosomal circular DNA (eccDNA)[12,13]. eccDNA refers to closed circular DNA located outside the chromosome in the form of single-stranded or double-stranded DNA, and it is widely found in eukaryotes, including humans[14,15]. Compared with free linear DNA, eccDNA is not easily degraded by nucleases, and its structure is more stable.

In this study, we aimed to build a multi-parameter diagnostic model based on commonly used clinical detection indicators and 4 eccDNA markers for the early detection of CRC, which is urgently needed in clinical practice.

MATERIALS AND METHODS

Study samples

After approval by the ethics committee, the research subjects signed informed consent forms. This project included 59 patients with colorectal polyps (CRP) and 101 CRC patients (38 early-stage CRC and 63 advanced CRC) for building the model. An additional 30 CRP and 62 CRC patients (30 early-stage CRC and 32 advanced CRC) were used to validate the model (Table 1).

Table 1 General clinical characteristics of study subjects

The inclusion criteria for the CRP group were villous/tubular adenoma, with or without mild-to-moderate hyperplasia, confirmed by colonoscopy and pathologically confirmed after adenoma removal, or focal high-grade neoplasia of villous tubular adenoma confirmed by pathology and immunohistochemistry. All biochemical and auxiliary examinations showed no abnormality, and the patients had no complaints of gastrointestinal discomfort, no signs of a tumor, adenomas with a diameter less than 1 cm, no villous adenoma or mixed adenoma, and no adenoma with moderate to severe dysplasia.

In the early CRC group, surgery confirmed that the adenocarcinoma of the intestinal wall was confined to the mucosa or submucosa without lymphatic metastasis, that is, stage 1 or 2, and pathology confirmed villous tubular adenoma with focal high-grade neoplasia or intestinal wall glands.

For the advanced CRC group, staging followed the American Joint Committee on Cancer tumor node metastasis system, and we defined pathologically confirmed colorectal cancer of stages 3 and 4 as advanced. No treatment, including surgery, chemotherapy, radiotherapy, or other treatments, was performed before sample collection, and no blood transfusion had occurred within the past 3 mo.

All enrolled patients provided colorectal cancer or polyp specimens and the corresponding clinical examination data. None of the patients received chemotherapy, radiotherapy or immunotherapy before surgery, and other tumors and gastrointestinal diseases were excluded by examination at the time of admission.

Fasting peripheral blood was collected from all subjects in the morning. For plasma, blood was collected in tubes containing EDTA as the anticoagulant and centrifuged at 3000 rpm for 10 min, and the plasma was transferred to a new sterile Eppendorf tube. For serum, blood was collected in tubes containing separation gel and a clot activator and centrifuged at 3000 rpm for 10 min, and the serum was transferred to new sterile Eppendorf tubes. Plasma and serum were stored at -80 °C until assayed. Hemolyzed and chylous blood samples were excluded during collection, and repeated freezing and thawing was avoided. Before testing, the samples were brought back to room temperature.

Detection of commonly used clinical indicators

There were 51 commonly used clinical detection indicators, including 13 common tumor-related markers and 38 clinical biochemical indicators. The 13 tumor-related indicators were carcinoembryonic antigen (CEA), alpha fetoprotein (AFP), carbohydrate antigen 125 (CA125), CA199, CA153, CA724, cytokeratin fragment 211 (Cyfra211), ferritin (Ferr), neuron-specific enolase (NSE), squamous cell carcinoma (SCC), pepsinogen (PG) I, PG II and PGI/II. The 38 clinical biochemical indicators were alanine aminotransferase (ALT), aspartate aminotransferase (AST), total protein (TP), albumin (ALB), total bilirubin (TB), direct bilirubin (DB), total bile acid (TBA), alkaline phosphatase (ALP), γ-glutamyl transfer enzyme (GGT), glucose (Glu), urea nitrogen (UN), creatinine (Cr), uric acid (UA), cholesterol (CHO), triglyceride esters (TG), creatine kinase (CK), lactate dehydrogenase (LDH), creatine kinase isoenzyme (CKMB), calcium (Ca), phosphorus (P), magnesium (Mg), potassium (K), sodium (Na), chlorine (Cl), carbon dioxide (CO2), lipoprotein a (LPa), high-density lipoprotein (HDL), low-density lipoprotein (LDL), apolipoprotein A1 (ApoA1), apolipoprotein B (ApoB), cysteine (CYS), sialic acid (SA), homocysteine (HCY), C-reactive protein (CRP), amylase (AMY), lipase (LPS), superoxide dismutase (SOD) and ischemia-modified albumin (IMA).

Standards, controls and detection kits were purchased from the following suppliers. CEA, AFP, CA199, CA724, CA125, CA153, Cyfra211, Ferr, NSE, ALT, AST, TP, ALB, ALP, GGT, Glu, UN, Cr, UA, CHO, TG, CK, Ca, P, Mg, K, Na, Cl, CO2, HDL, LDL, CRP, AMY, and LPS: Roche Diagnostics Ltd. ApoA1, ApoB, CYS, LPa, and CKMB: Beijing Leadman Biochemical Co., Ltd. SCC, PG I and PG II: Abbott Diagnostics. TBA and HCY: Beijing Jiuqiang Biotechnology Co., Ltd. TB and DB: Hitachi Diagnostics Co., Ltd. IMA: Changsha Yikang Technology Development Co., Ltd. SA: Zhejiang Dongou Diagnostic Products Co., Ltd. SOD: Fujian Fuyuan Biotechnology Co., Ltd. A modular 7600 automatic biochemical analyzer, Roche E170 immunoassay analyzer and Architect i2000 immunoassay system were used to complete the pre-assay quality control and calibration. After the analysis, the experimental data of each instrument were exported for statistical analysis.

Detection of differential eccDNA based on ddPCR

Cell-free DNA was extracted from plasma samples using the QIAamp DNA Blood Kit (Qiagen, 51192) according to the ddPCR detection method established in the second part of this study. ATP-dependent DNase (Epicenter, E310K) was added to the cell-free DNA to a final concentration of 0.4 U/μL, and the mixture was digested at 37 °C for 1.5 h to remove linear double-stranded DNA. The reaction was then held at 70 °C for 30 min to inactivate the ATP-dependent DNase, and the product was stored until analysis.

Based on the eccDNA sequence incorporated into the model, primers were designed using Primer3 software. After a homology search was performed with BLAST, the primers were synthesized by Invitrogen. The 5' ends of the primers were modified with a FAM fluorophore, and the 3' ends were modified with a BHQ1 quenching group. The primer sequences were as follows:

(1) NDUFB7. Forward sequence: TACCGTCAGCATCCACAGCCAT; reverse sequence: GCCTTCTCAGAAGGATGCCAGT;
(2) CAMK1D. Forward sequence: TGAGCAGATCCTCAAGGCGGAA; reverse sequence: GTCCTTCTCCATCAGGTTCCGA;
(3) PIK3CD. Forward sequence: TGCCAAACCACCTCCCATTCCT; reverse sequence: CATCTCGTTGCCGTGGAAAAGC; and
(4) PSEN2. Forward sequence: GCTGTTTGTGCCTGTCACTCTG; reverse sequence: TGTGTCCTCAGTGAATGGCGTG.

Primers and probes were diluted with deionized water to a storage concentration of 200 μmol/L and a working concentration of 10 μmol/L. The total PCR volume was 20 μL: 10 μL of 2× ddPCR™ Supermix, 1.8 μL each of forward and reverse primers (final concentration 900 nmol/L), 0.5 μL of probe (final concentration 250 nmol/L), 4 μg of template DNA, and ddH2O to 20 μL. The 20 μL reaction mixture was then added to the droplet generation card for droplet generation. All of the resulting droplets were transferred to a 96-well plate for PCR amplification. The PCR conditions were: 95 °C/10 min; 94 °C/30 s, 60 °C/1 min, 40 cycles; 98 °C/10 min. Finally, QuantaSoft 1.6 software (Bio-Rad, USA) was used to analyze the results. The system was flushed before each experiment, and the sample droplets were read once setup was complete. We then reviewed the run results, including the channels, scatterplots, concentration data, ratio data, and number of events.
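The stated final concentrations follow from the working concentration and the pipetted volumes through the dilution relation C1 × V1 = C2 × V2. The short Python sketch below checks the primer and probe figures given above; the function name is hypothetical and is used here only for illustration.

```python
def final_concentration_nmol_per_l(stock_umol_per_l, added_ul, total_ul):
    """Solve C1 * V1 = C2 * V2 for C2 and convert from umol/L to nmol/L."""
    return stock_umol_per_l * added_ul / total_ul * 1000.0


TOTAL_UL = 20.0        # total reaction volume stated in the text
WORKING_UMOL_L = 10.0  # working concentration stated in the text

# 1.8 uL of each primer and 0.5 uL of probe, as stated in the text.
primer_nmol = final_concentration_nmol_per_l(WORKING_UMOL_L, 1.8, TOTAL_UL)
probe_nmol = final_concentration_nmol_per_l(WORKING_UMOL_L, 0.5, TOTAL_UL)

print(f"primer: {primer_nmol:.0f} nmol/L, probe: {probe_nmol:.0f} nmol/L")
```

Both values agree with the 900 nmol/L and 250 nmol/L quoted for the reaction mixture.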

Evaluation of the diagnostic value of a single indicator

We compared the 51 common clinical indicators and 4 kinds of eccDNA between the CRP group and the CRC group. Indicators that differed between the groups were treated as potential markers, and the area under the curve (AUC) and the P value were used to evaluate their diagnostic value for distinguishing the CRP and CRC groups, the CRP and early CRC groups, and the CRP and advanced CRC groups.

Establishment and evaluation of the multiparameter diagnosis model

Based on the indicators with differential diagnostic value (CRP group vs CRC group), we established multiparameter combined auxiliary diagnostic models using four methods: binary logistic regression analysis, discriminant analysis, classification tree and neural network. Binary logistic regression analysis used the Forward: Conditional method. Discriminant analysis applied the Bayes discriminant method, with stepwise discriminant analysis in the function-fitting process. The classification tree used the CHAID method, and the classification tree model was evaluated by cross-validation. The artificial neural network was a multilayer perceptron.

Validation of the multiparameter diagnosis model

The diagnostic value of the binary logistic regression analysis, discriminant analysis, classification tree and neural network models was compared with that of single indices. The optimal multiparameter auxiliary diagnosis model was then selected, and 30 CRP patients and 62 CRC patients (30 early-stage CRC patients and 32 advanced CRC patients) were enrolled to validate it. Then, the stability of the model was evaluated. Finally, the validated model was compared with the commonly used clinical detection index CEA, and its clinical application value was evaluated by comparing the sensitivity, specificity, and AUC.

Statistical analysis

SPSS 22.0 was used for statistical analysis. Measurement data were expressed as medians (25%, 75%). Normally distributed data were compared by two independent samples t-tests, and nonnormally distributed data were compared by the rank-sum test. The AUC was used to assess the diagnostic value of each index. Four multiparameter analysis methods (binary logistic regression analysis, discriminant analysis, classification tree and neural network) were used to establish a multiparameter joint diagnosis model. The binary logistic regression model used the forward conditional method. The discriminant analysis used the Bayes discriminant method. The classification tree used the CHAID classification tree method, and the established classification tree model was evaluated by cross-validation. The artificial neural network was a multilayer perceptron. Univariate and multivariate logistic regression were used to analyze the Exp (B) of each index. The Z score test was used to compare the AUC of the different groups. P < 0.05 indicated that the difference was statistically significant.
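The AUC used throughout has a direct interpretation: it is the probability that a randomly chosen CRC patient has a higher marker value than a randomly chosen CRP patient, which ties it to the rank-sum statistic mentioned above. The Python sketch below computes it that way on made-up values; the function name and the data are illustrative assumptions, not study data or SPSS output.

```python
def auc_mann_whitney(scores_negative, scores_positive):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    wins = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Made-up marker values for illustration only (not study data).
polyp_values = [1.2, 2.0, 2.5, 3.1]
cancer_values = [2.5, 3.8, 4.4, 6.0]

auc = auc_mann_whitney(polyp_values, cancer_values)
print(f"AUC = {auc:.5f}")
```

An AUC of 0.5 corresponds to a marker with no discriminating power, and 1.0 to perfect separation of the two groups.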

RESULTS

Comparison of 51 common clinical indicators and 4 kinds of eccDNA between the colon polyp group and the colorectal cancer group

Thirteen tumor markers (CEA, AFP, CA125, CA199, CA153, CA724, CY211, Ferr, NSE, SCC, PG I/II, PG II, and PG I) and 38 blood biochemical indices (ALT, AST, TP, ALB, TB, DB, TBA, ALP, GGT, Glu, UN, Cr, UA, CHO, TG, CK, LDH, CKMB, Ca, P, Mg, K, Na, Cl, CO2, LPa, HDL, LDL, ApoA1, ApoB, CYS, SA, HCY, CRP, AMY, LPS, SOD, and IMA) were compared between the 59 CRP patients and the 101 CRC patients. Among the 51 commonly used clinical indicators, 22 indicators, including IMA, CEA, SA, LPa, CK, TB, HDL, NSE, ALT, Ferr, DB, CA125, LDH, AMY, CY211, CA724, HCY, CHO, P, LDL, Cl and CKMB, were significantly different between the CRP and CRC groups (P < 0.05). The remaining 29 indicators were not significantly different. Among the four eccDNA indices, two, CAMK1D and PIK3CD, showed significant differences between the CRP and CRC groups (P < 0.05). The other two were not significantly different, as shown in Table 2.

Table 2 Comparison of 51 common clinical indicators between colon polyp group and colorectal cancer group

CRP: Colorectal polyps; CRC: Colorectal cancer; CEA: Carcinoembryonic antigen; AFP: Alpha fetoprotein; CA125: Carbohydrate antigen 125; NSE: Neuron-specific enolase; SCC: Squamous cell carcinoma; PG: Pepsinogen; ALT: Alanine aminotransferase; AST: Aspartate aminotransferase; TP: Total protein; ALB: Albumin; TB: Total bilirubin; DB: Direct bilirubin; TBA: Total bile acid; ALP: Alkaline phosphatase; GGT: γ-glutamyl transfer enzyme; Glu: Glucose; UN: Urea nitrogen; Cr: Creatinine; UA: Uric acid; CHO: Cholesterol; TG: Triglyceride esters; CK: Creatine kinase; LDH: Lactate dehydrogenase; CKMB: Creatine kinase isoenzyme; Ca: Calcium; P: Phosphorus; Mg: Magnesium; K: Potassium; Na: Sodium; Cl: Chlorine; CO2: Carbon dioxide; LPa: Lipoprotein a; HDL: High-density lipoprotein; LDL: Low-density lipoprotein; ApoA1: Apolipoprotein A1; CYS: Cysteine; SA: Sialic acid; HCY: Homocysteine; CRP: C-reactive protein; AMY: Amylase; LPS: Lipase; SOD: Superoxide dismutase; IMA: Ischemia-modified albumin.

Diagnostic value of the differential indicators between the CRP and CRC groups

Based on the 22 commonly used clinical indicators and 2 kinds of eccDNA that showed significant differences between the CRP and CRC groups, receiver operating characteristic (ROC) curves were used to evaluate the diagnostic value, as shown in Table 3. Fifteen commonly used clinical indicators and 2 kinds of eccDNA (IMA, CEA, SA, LPa, CK, TB, HDL, NSE, ALT, Ferr, DB, CA125, LDH, AMY, CY211, CAMK1D and PIK3CD) showed statistically significant differences in the area under the curve (P < 0.05), while the other 7 commonly used clinical indicators (CA724, HCY, CHO, P, LDL, Cl and CKMB) showed no significant difference. Therefore, the 15 commonly used clinical indicators and 2 kinds of eccDNA with significant differences both between the groups and in the area under the ROC curve were selected for subsequent multiparameter combined auxiliary diagnosis model analysis.

Univariate logistic regression and multivariate logistic regression analysis

Indices with statistically significant differences both between the CRP and CRC groups and in the ROC analysis included IMA, CEA, SA, LPa, CK, TB, HDL, NSE, ALT, Ferr, DB, CA125, LDH, AMY, CY211, CAMK1D and PIK3CD (P < 0.05). First, univariate logistic regression analysis was performed, as shown in Table 4. The Exp (B)s of CEA, IMA, SA, PIK3CD and LPa were significantly different (P < 0.05), while those of CK, TB, HDL, NSE, CHO, P, LDL, Cl, CKMB and CAMK1D were not significantly different. Second, multivariate logistic regression analysis was performed on CEA, IMA, SA, PIK3CD and LPa. As shown in Table 5, the Exp (B)s were significantly different for all of them (P < 0.05). CEA, IMA, SA, PIK3CD and LPa were therefore included in the subsequent multiparameter joint auxiliary diagnosis model.

Multiparameter combined auxiliary diagnosis model building

Based on CEA, IMA, SA, PIK3CD and LPa, a multiparameter combined auxiliary diagnosis model was built to distinguish the CRP group (59 cases) from the CRC group (101 cases, including 38 cases of early CRC and 63 cases of advanced CRC).

As shown in Table 6, binary logistic regression analysis based on CEA, IMA, SA, PIK3CD and LPa showed that the correct rate for CRP was 76.3%, the correct rate for CRC was 85.1%, and the overall accuracy was 81.9%. With the predicted probability of each sample taken as an independent variable, as shown in Figure 1A, the AUC was 0.900 (0.855-0.946).

Table 3 Evaluation of the diagnostic value of 26 commonly used clinical indicators with statistical differences (colon polyp group vs colorectal cancer group)

Table 4 Univariate logistic regression analysis between the colon polyp group and the colorectal cancer group with statistically significant between-group and receiver operating characteristic indicators

Table 5 Multivariate logistic regression analysis Exp (B) indicators with statistical differences (colon polyp group vs colorectal cancer group)

Table 6 Multi-parameter combined auxiliary diagnosis model building

The discriminant analysis based on CEA, IMA, SA, PIK3CD and LPa showed that the correct rate for CRP was 86.4%, the correct rate for CRC was 69.3%, and the overall accuracy was 75.6%. With the predicted probability of each sample taken as an independent variable, as shown in Figure 1B, the AUC was 0.855 (0.794-0.916).

In the classification tree analysis based on CEA, IMA, SA, PIK3CD and LPa, the final independent variables included CEA, IMA, SA, PIK3CD and LPa, the number of nodes was 3, the number of terminal nodes was 2, and the depth was 1. The correct rate for CRP was 91.5%, the correct rate for CRC was 58.4%, and the overall accuracy was 70.6%. With the predicted probability of each sample taken as an independent variable, as shown in Figure 1C, the AUC was 0.750 (0.674-0.826).

In the artificial neural network analysis, CEA, IMA, SA, PIK3CD and LPa all entered the input layer. The network had 1 hidden layer, and the output layer had 2 units. The training set included 39 cases of CRP and 70 cases of CRC; the correct rate for identifying CRP was 79.5%, the correct rate for identifying CRC was 97.1%, and the overall accuracy was 90.8%. The test set included 20 cases of CRP and 31 cases of CRC; the correct rate for identifying CRP was 90.0%, the correct rate for identifying CRC was 87.1%, and the overall accuracy was 88.2%. With the predicted probability of each sample taken as an independent variable, as shown in Figure 1D, the AUC was 0.959 (0.934-0.985).
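Each overall accuracy above is the per-group correct rates pooled by group size. The Python sketch below recomputes the overall figures from the group sizes and correct rates reported in the text; the function name is hypothetical, and rounding each percentage back to a whole number of cases is an assumption about how the figures were derived.

```python
def overall_accuracy(groups):
    """Pool per-group correct rates into one accuracy, in percent.

    groups: list of (n_cases, percent_correct) pairs. Each percentage is
    first converted back to a whole number of correctly classified cases.
    """
    correct = sum(round(n * pct / 100.0) for n, pct in groups)
    total = sum(n for n, _ in groups)
    return 100.0 * correct / total


# (per-group figures, reported overall accuracy), all taken from the text.
reported = {
    "binary logistic regression": ([(59, 76.3), (101, 85.1)], 81.9),
    "discriminant analysis": ([(59, 86.4), (101, 69.3)], 75.6),
    "classification tree": ([(59, 91.5), (101, 58.4)], 70.6),
    "neural network, training set": ([(39, 79.5), (70, 97.1)], 90.8),
    "neural network, test set": ([(20, 90.0), (31, 87.1)], 88.2),
}

recomputed = {}
for model, (groups, stated) in reported.items():
    recomputed[model] = overall_accuracy(groups)
    print(f"{model}: recomputed {recomputed[model]:.2f}%, reported {stated}%")
```

Under that assumption, every recomputed value falls within 0.05 percentage points of the reported overall accuracy.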

Optimal multiparameter combined auxiliary diagnosis model selection and diagnostic evaluation

Based on CEA, IMA, SA, PIK3CD and LPa, binary logistic regression analysis, discriminant analysis, classification tree and neural network were used to predict the CRP and CRC groups, and the accuracy rates were 81.9%, 75.6%, 70.6%, and 90.8%, respectively. Therefore, we chose the neural network as the optimal multiparameter joint auxiliary diagnosis model. As shown in Figure 2A, when the CRP and CRC groups were differentiated, the overall accuracy was 90.8%, the area under the curve was 0.959 (0.934-0.985), and the sensitivity and specificity were 91.5% and 82.2%, respectively. As shown in Figure 2B, when the CRP and early CRC groups were differentiated, the area under the curve was 0.956 (0.921-0.992), and the sensitivity and specificity were 89.8% and 86.8%, respectively. As shown in Figure 2C, when the CRP and advanced CRC groups were differentiated, the area under the curve was 0.961 (0.932-0.990), and the sensitivity and specificity were 88.1% and 87.3%, respectively.
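Sensitivity and specificity depend on the cutoff applied to the model's predicted probability. The text does not state how the cutoff was chosen, so the Python sketch below assumes one common choice, the cutoff that maximizes Youden's J (sensitivity + specificity - 1). The function names and the toy data are illustrative assumptions, not study data.

```python
def sensitivity_specificity(probabilities, labels, cutoff):
    """Classify as positive when probability >= cutoff; labels are 1 or 0."""
    tp = fn = tn = fp = 0
    for p, y in zip(probabilities, labels):
        predicted_positive = p >= cutoff
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def best_youden_cutoff(probabilities, labels):
    """Return (J, cutoff, sensitivity, specificity) for the largest J.

    Candidate cutoffs are the observed probabilities; on a tie in J the
    lowest cutoff is kept.
    """
    best = None
    for cutoff in sorted(set(probabilities)):
        sens, spec = sensitivity_specificity(probabilities, labels, cutoff)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best


# Toy predicted probabilities (1 = cancer, 0 = polyp), for illustration only.
probs = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
truth = [0, 0, 0, 1, 0, 1, 1, 1]

j, cutoff, sens, spec = best_youden_cutoff(probs, truth)
print(f"cutoff={cutoff}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A different cutoff rule (for example, fixing specificity at a target value) would give a different sensitivity and specificity pair for the same AUC.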

Validation of the multi-index joint auxiliary diagnosis model

For distinguishing the CRP group from the CRC group, after the multiparameter joint analysis methods were compared, the neural network based on CEA, IMA, SA, PIK3CD and LPa was the optimal multiparameter joint auxiliary diagnosis model. Thirty independent CRP patients and 62 CRC patients (32 in the early-stage CRC group and 30 in the advanced CRC group) were enrolled to validate the model. After validation, as shown in Figure 3A, for distinguishing CRP and CRC, the area under the curve of the neural network for CEA, IMA, SA, PIK3CD and LPa was 0.965 (0.930-1.000), and its sensitivity and specificity were 66.1% and 70.0%; the area under the curve of the commonly used clinical indicator CEA was 0.723 (0.622-0.823), and its sensitivity and specificity were 96.8% and 86.7%, respectively. As shown in Figure 3B, for distinguishing CRP and 32 early-stage CRC patients, the area under the curve of the neural network model was 0.960 (0.916-1.000), with a sensitivity and specificity of 87.5% and 90.0%; the area under the curve of the commonly used clinical indicator CEA was 0.684 (0.548-0.821), and its sensitivity and specificity were 62.5% and 60.0%, respectively. As shown in Figure 3C, for distinguishing CRP and advanced CRC patients, the area under the curve of the neural network model was 0.970 (0.936-1.000), with a sensitivity and specificity of 96.7% and 86.7%; the area under the curve of the commonly used clinical indicator CEA was 0.763 (0.632-0.895), and its sensitivity and specificity were 76.7% and 63.3%, respectively.
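One simple way to compare two reported AUCs is a Z score built from their 95% confidence intervals. The Python sketch below does this for the validation-set AUCs of the neural network and CEA quoted above. It is an approximation under stated assumptions and not necessarily the procedure used in this study: the standard errors are backed out of the intervals as if they were symmetric normal intervals (the neural network's upper bound is capped at 1.000), and the two AUCs are treated as independent although they come from the same patients. The function names are hypothetical.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2.0 * z_crit)


def z_test_two_aucs(auc_a, ci_a, auc_b, ci_b):
    """Z score and two-sided P value for two AUCs, assuming independence."""
    se_a = se_from_ci(*ci_a)
    se_b = se_from_ci(*ci_b)
    z = (auc_a - auc_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided tail probability of the standard normal distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Validation-set values quoted in the text: neural network vs CEA.
z, p = z_test_two_aucs(0.965, (0.930, 1.000), 0.723, (0.622, 0.823))
print(f"z = {z:.2f}, two-sided P = {p:.2g}")
```

Because both markers were measured in the same patients, a paired method that accounts for the correlation between the two ROC curves would be the more rigorous choice; the independent-samples form is shown only because it needs nothing beyond the reported numbers.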

Figure 1 Diagnostic evaluation of multi-parameter combined auxiliary diagnosis model building.

Figure 2 Diagnostic evaluation of the neural network multi-parameter diagnostic model building.

Figure 3 Diagnostic evaluation of the neural network multi-parameter diagnostic model and carcinoembryonic antigen validation.

DISCUSSION

A biomarker is a biological molecule found in blood, other body fluids, or tissues that is a marker of a normal or abnormal process or disease. Biomarkers are primarily based on DNA, RNA, microRNA (miRNA), epigenetic changes, or antibodies. The term tumor marker, considered by some researchers to be synonymous with biomarker, refers to substances that represent biological structures (most typically proteins and glycolipids) that can be attributed to normal cell development or to different stages of cell development. For example, tumor-associated antigens (TAAs) are the largest group of clinically meaningful markers, and the concentration of a TAA usually correlates with the quantity (or quality) of specific tumor cells.

Discovered in 1965, CEA is still the only tumor marker with proven efficacy in monitoring treatment in CRC patients. CEA was initially thought to be CRC specific, but elevated CEA levels have since been detected in other tumors, e.g., gastric and pancreatic cancer, and in inflammatory states. Elevated CEA concentrations are rarely found in stage I CRC[16]. Furthermore, CEA cannot differentiate between benign and malignant polyps. Recently, several studies have explored the advantages of mRNA molecules encoding CEA for the detection of CRC, but the results were not superior to those of CEA[17]. In some studies, high CEA concentrations in patients with stage II and III CRC may be indicative of a more aggressive cancer type. CEA is the marker of choice for monitoring disseminated disease during systemic therapy. Sustained increases in CEA levels are often associated with disease progression, even though radiological examination may suggest otherwise. However, chemotherapy may also cause a temporary increase in CEA concentrations, which must be taken into account. Therefore, it is not recommended to measure CEA levels within 2 wk after chemotherapy, and in oxaliplatin-treated patients only after 4 to 6 wk.

Cancer antigen 19-9 (CA 19-9) is a glycoprotein whose relevance in the diagnosis of CRC remains unclear. Most investigators concluded that the sensitivity of CA 19-9 was much lower than that of CEA and that elevated CA 19-9 levels indicated a poor prognosis[18]. Other carbohydrate antigens, CA 19-5 and CA 50, have also been investigated with relatively disappointing results. CA 72-4 is a biomarker with poor sensitivity, ranging from 9% to 31%, and good specificity, ranging from 89% to 95%, for screening patients for CRC. The diagnostic information provided by CA 72-4 in recurrent CRC is borderline and far inferior to that of CEA. There is a consensus that CA 72-4 has a rather low sensitivity and incomplete specificity in the screening and follow-up of CRC patients[19]. Tissue polypeptide-specific antigen (TPS) and tissue polypeptide antigen (TPA), which detect cytokeratin 8, 18, and 19 fragments, are not recommended for CRC screening due to their lack of sensitivity and specificity. Most investigators found that elevated levels of TPA and TPS were observed in the metastatic stage of CRC. Further studies showed that the combination of TPA and CEA improved the sensitivity of these biomarkers in identifying patients with CRC recurrence. Other biomarkers, such as thymidine phosphorylase and DNA ploidy, were found to have no utility in the detection, staging or follow-up of CRC patients.

NDUFB is an accessory subunit of NADH dehydrogenase (complex I) of the mitochondrial membrane respiratory chain and is encoded by nuclear genes[20]. Mutations in NDUFB may promote tumor metastasis[21]. In addition, a SNP (rs7830235) associated with prostate cancer risk is located in the NDUFB gene[22]. Most of the other subunits of the NADH dehydrogenase (NDUFB1-8/11) family were also found to have significant prognostic value (DMFS) in breast cancer patients, and it was the mainstay of MDA-MB-231 breast cancer cell proliferation, inhibition of migration and invasion[23]. Its high expression is positively correlated with the prognosis of gastric cancer, suggesting that these proteins may serve as new candidate diagnostic and prognostic biomarkers for gastric cancer[24].

CAMK1D is a member of the calcium/calmodulin-dependent protein kinase 1 family[25]. It is involved in a variety of physiological processes, including activation of CREB-dependent gene transcription, differentiation and activation of neutrophils, and regulation of apoptosis in erythrocytic leukemia[26]. Recent studies have shown that overexpression of CAMK1D can promote the proliferation of breast cancer[27]. Knockdown of CAMK1D in HT-29 and SW480 cells significantly reduced cell proliferation and invasion/migration capacity and significantly increased apoptosis[28].

Activation of phosphoinositide 3-kinase (PI3K) signaling is one of the most common events in several human cancers, including CRC. PI3K is a family of lipid kinases that phosphorylate phosphatidylinositol 4,5-bisphosphate to generate phosphatidylinositol-3,4,5-triphosphate, which in turn activates serine-threonine[29-31]. In mammals, PI3Ks are classified into 3 classes according to their substrate specificity and structure. Of these, class I PI3Ks appear to be most associated with human cancers. Class I PI3Ks are further divided into subclasses IA and IB based on their adapters. Class IA PI3Ks contain a p110 catalytic subunit and a p85 regulatory subunit. The class IA catalytic isoforms p110α, p110β and p110δ are encoded by the genes PIK3CA, PIK3CB and PIK3CD, respectively. PIK3CB and PIK3CD are often overexpressed or amplified in cancer[32,33]. PIK3CD is mainly expressed in leukocytes and plays a key role in some hematological malignancies. Furthermore, PIK3CD has recently been associated with several human solid tumors, including hepatocellular carcinoma, glioma, glioblastoma, neuroblastoma, and breast cancer[33,34]. PIK3CD induces cell growth and invasion in colorectal cancer by activating AKT/GSK-3β/β-catenin signaling[35].

Presenilin 2 (PSEN2) is a protein-coding gene. Diseases associated with PSEN2 include Alzheimer’s disease[36]. Its related pathways include EPH-Ephrin signaling and p75 NTR receptor-mediated signaling. Presenilin (PSEN1 or PSEN2) mutations are generally thought to be present in Alzheimer’s disease patients with inherited disorders[37,38].

Although we have built a multi-parameter neural network diagnostic model for CRC, multicenter studies with larger sample sizes are still needed in the future.

CONCLUSION

In conclusion, we built a multi-parameter neural network diagnostic model including CEA, IMA, SA, PIK3CD and LPa for early detection of CRC. Compared with conventional CEA, it showed significant improvement.

ARTICLE HIGHLIGHTS

Research background

Most patients with colorectal cancer (CRC) are diagnosed at an advanced stage. The high morbidity and mortality of advanced CRC indicate an urgent need for clinical improvements in early CRC detection and individualized management.

Research motivation

Early detection of CRC is urgently needed in clinical practice. Commonly used biomarkers and extrachromosomal circular DNA (eccDNA) may have potential diagnostic value for CRC.

Research objectives

This study aimed to build a multi-parameter diagnostic model for early detection of CRC.

Research methods

A total of 59 patients with colorectal polyps (CRP) and 101 CRC patients (38 early-stage CRC and 63 advanced CRC) were included for model establishment. In addition, 30 CRP patients and 62 CRC patients (30 early-stage CRC and 32 advanced CRC) were separately included to validate the model. We analyzed 51 commonly used clinical detection indicators and the 4 eccDNA markers NDUFB7, CAMK1D, PIK3CD and PSEN2 that we screened earlier. Four multi-parameter joint analysis methods (binary logistic regression analysis, discriminant analysis, classification tree and neural network) were used to establish a multi-parameter joint diagnosis model.

Research results

A neural network including carcinoembryonic antigen (CEA), ischemia-modified albumin (IMA), sialic acid (SA), PIK3CD and lipoprotein a (LPa) was chosen as the optimal multi-parameter combined auxiliary diagnosis model to distinguish the CRP and CRC groups. When it differentiated 59 CRP and 101 CRC patients, its overall accuracy was 90.8%, its area under the curve (AUC) was 0.959 (0.934, 0.985), and its sensitivity and specificity were 91.5% and 82.2%, respectively. In validation, when distinguishing 30 CRP from 62 CRC patients, the AUC was 0.965 (0.930, 1.000), and the sensitivity and specificity were 66.1% and 70.0%. When distinguishing 30 CRP from 32 early-stage CRC patients, the AUC was 0.960 (0.916, 1.000), with a sensitivity and specificity of 87.5% and 90.0%; when distinguishing 30 CRP from 30 advanced CRC patients, the AUC was 0.970 (0.936, 1.000), with a sensitivity and specificity of 96.7% and 86.7%.
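For readers less familiar with the reported metrics, the following is a minimal sketch of how sensitivity, specificity and overall accuracy follow from confusion-matrix counts. The function name `diagnostic_metrics` and the counts in the example are hypothetical and chosen only for illustration; they are not taken from this study, and the sketch does not reproduce the authors' analysis pipeline.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity and accuracy from confusion-matrix counts.

    tp: diseased cases classified as diseased (true positives)
    fn: diseased cases classified as non-diseased (false negatives)
    tn: non-diseased cases classified as non-diseased (true negatives)
    fp: non-diseased cases classified as diseased (false positives)
    """
    positives = tp + fn  # all truly diseased cases
    negatives = tn + fp  # all truly non-diseased cases
    if positives == 0 or negatives == 0:
        raise ValueError("Both classes must contain at least one case.")
    return {
        "sensitivity": tp / positives,
        "specificity": tn / negatives,
        "accuracy": (tp + tn) / (positives + negatives),
    }


# Hypothetical counts for illustration only; NOT data from this study.
example = diagnostic_metrics(tp=90, fn=10, tn=40, fp=10)
print(example)
```

Because sensitivity and specificity are each computed within one class, they do not depend on the ratio of CRC to CRP cases, whereas overall accuracy does.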

Research conclusions

We built a multi-parameter neural network diagnostic model including CEA, IMA, SA, PIK3CD and LPa for early detection of CRC. Compared with conventional CEA, it showed significant improvement.

Research perspectives

Multi-center studies with larger sample sizes should be performed in the future to validate the diagnostic model.

FOOTNOTES

Author contributions: Li J and Xiang GA designed the study; Li J, Ren ZC and Jiang T performed the research; Li J, Wang ZL and Jiang T analyzed the data; Li J wrote the paper; Xiang GA and Zhang PJ revised the manuscript for final submission; Li J and Jiang T contributed equally to this study; Zhang PJ and Xiang GA are the co-corresponding authors.

Supported by National Natural Science Foundation of China, No. 81972010; National Key Research and Development Program of China, No. 2020YFC2002700; National Key Research and Development Program of China, No. 2020YFC2004604.

Institutional review board statement: The study was reviewed and approved by the Chinese PLA General Hospital Review Board.

Informed consent statement: All study participants, or their legal guardian, provided informed written consent prior to study enrollment.

Conflict-of-interest statement: We declare that we have no financial or personal relationships with other individuals or organizations that can inappropriately influence our work and that there is no professional or other personal interest of any nature in any product, service and/or company that could be construed as influencing the position presented in or the review of the manuscript.

Data sharing statement: No data are available to share.

Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/

Country/Territory of origin: China

ORCID number: Jian Li 0000-0002-2168-4240; Tao Jiang 0000-0002-2127-9085; Zeng-Ci Ren 0000-0002-0931-3535; Zhen-Lei Wang 0000-0001-7126-9136; Peng-Jun Zhang 0000-0002-7391-2495; Guo-An Xiang 0000-0002-4218-6160.

S-Editor: Wang JL

L-Editor: A

P-Editor: Wang JL