Screening and bioinformatics analysis of thyroid cancer-related hub genes

2020-08-27

Clinical Research Communications, 2020, Issue 3

Shu-Fei Wu, Shuang Yang, Jiu Pu, Zheng-Hai Ling, Zheng-Wei Leng, Ling-Mi Hou

1 Department of Anesthesiology, North Sichuan Medical College, Nanchong 637000, China.
2 Department of Clinical Medical, North Sichuan Medical College, Nanchong 637000, China.
3 Department of Hepatobiliary Surgery II, North Sichuan Medical College, Nanchong 637000, China.
4 Department of Thyroid and Mammary Surgery, Affiliated Hospital of North Sichuan Medical College, Nanchong 637000, China.

Abstract Objective: To identify thyroid cancer-related hub genes and pathways by bioinformatics analysis, laying a foundation for further study. Methods: Expression profile chip data for thyroid cancer were screened and downloaded from the Gene Expression Omnibus (GEO). GEO2R was used to identify differentially expressed genes between thyroid cancer tissues and normal thyroid tissues, and the Metascape website was used for pathway and function enrichment. A protein-protein interaction network was constructed with STRING and Cytoscape, and the Cytoscape plug-in cytoHubba was used to screen hub genes. Kaplan-Meier Plotter was used for survival analysis of the hub genes for further screening and discussion. Results: A total of 304 differentially expressed genes were identified. According to Metascape, they were mainly enriched in extracellular matrix, cell-substrate adhesion, response to wounding, muscle structure development, and hormone metabolic process, among other terms. The protein-protein interaction network contained 284 nodes; the ten genes with the highest Maximal Clique Centrality scores were taken as the highly connected hub genes of the network. Kaplan-Meier Plotter analysis indicated that 5 of the 10 hub genes were correlated with the prognosis of thyroid cancer patients. Conclusion: FN1, SPP1, TIMP1, VCAN, COL1A1, COL1A2, MMP1, DCN, COMP, and FMOD may play a significant role in the development of thyroid cancer. The genes with prognostic significance in the survival analysis were all related to the composition and regulation of the extracellular matrix.

Keywords: Bioinformatics, Thyroid cancer, Hub genes, Differentially expressed genes

Background

Thyroid cancer (TC) is the most common malignant tumor of the endocrine system and the fifth most common malignancy in women, and its incidence has continued to rise worldwide in recent years [1,2]. The etiology of TC is thought to be complex and multifactorial, involving obesity, radiation, insufficient iodine intake, genetic mutations, and stress, among other factors, but the critical factors that drive the development of TC remain elusive [3]. With the rapid development of biomarker and bioinformatics technologies in recent years [4], a large number of clinical tumor markers have been found by bioinformatics mining and analysis of data, which has significantly improved the early diagnosis and prognosis of tumors [5,6]. Bioinformatics is therefore of great value for further revealing the molecular mechanism of TC.

The present study used bioinformatics to analyze and integrate TC chip data from the Gene Expression Omnibus (GEO) database [7,8]. The molecular functions and signaling pathways identified help to clarify the etiology and potential molecular targets of TC [9,10]. Our study provides a basis for exploring the molecular mechanism of TC pathogenesis and for screening potential molecular targets for early clinical diagnosis and treatment.

Materials and methods

Screening of gene chips

In our study, the NCBI GEO database (https://www.ncbi.nlm.nih.gov/geo/) was searched for Homo sapiens series with normal controls using the keyword “thyroid cancer”, excluding series that involved interventions or non-tissue samples. The search covered the period from the establishment of the database to December 2019.

Data mining and processing

GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/) was used to run differential expression analyses on the gene chips obtained by screening. A volcano plot was drawn using the cutoffs adj.P.Val < 0.01 and |log2 FC| ≥ 1.5, and Venn diagrams were drawn with the Draw Venn Diagram website (http://bioinformatics.psb.ugent.be/webtools/Venn/). Differentially expressed genes (DEGs) were obtained by taking the intersection of the genes that passed these cutoffs in each chip.
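The filtering and intersection step can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: only the two cutoffs come from the text, while the row layout (gene symbol, adj.P.Val, log2 FC), the function names, and the rule that a gene must change in the same direction in both datasets are assumptions made for the example.

```python
from typing import Dict, Iterable, Set, Tuple

# Cutoffs taken from the text; everything else below is illustrative.
ADJ_P_CUTOFF = 0.01
LOG2FC_CUTOFF = 1.5

# Hypothetical row layout: (gene symbol, adj.P.Val, log2 FC).
Row = Tuple[str, float, float]


def filter_degs(rows: Iterable[Row]) -> Dict[str, str]:
    """Return {gene: 'up' or 'down'} for rows that pass both cutoffs."""
    degs: Dict[str, str] = {}
    for gene, adj_p, log2fc in rows:
        if not gene:
            # Skip probes without a gene symbol ("blank" genes).
            continue
        if adj_p < ADJ_P_CUTOFF and abs(log2fc) >= LOG2FC_CUTOFF:
            # A repeated symbol simply overwrites the earlier entry.
            degs[gene] = "up" if log2fc > 0 else "down"
    return degs


def intersect_degs(a: Dict[str, str], b: Dict[str, str]) -> Set[str]:
    """Genes passing the cutoffs in both datasets with the same direction
    (the same-direction requirement is an assumption of this sketch)."""
    return {gene for gene in a if gene in b and a[gene] == b[gene]}


if __name__ == "__main__":
    # Invented toy rows, not values from GSE29265 or GSE33630.
    chip_1 = filter_degs([("G1", 0.001, 2.0), ("G2", 0.5, 3.0), ("G3", 0.001, -1.5)])
    chip_2 = filter_degs([("G1", 0.002, 1.6), ("G3", 0.001, 2.0)])
    print(sorted(intersect_degs(chip_1, chip_2)))
```

In practice the rows would come from the tables exported by GEO2R for each dataset, with one call to filter_degs per chip.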

Gene ontology and enrichment analysis

Metascape (https://www.metascape.org/) [11] was used for function and pathway enrichment of the DEGs, with Homo sapiens as the background. Four ontologies were selected (GO Molecular Functions, KEGG Pathway, GO Biological Processes, and GO Cellular Components), and the enrichment was run as a custom analysis with Min Overlap = 3, P-Value Cutoff = 0.01, and Min Enrichment = 1.5.
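To make the three cutoffs concrete, the sketch below applies them to a single term. It assumes a standard one-sided hypergeometric over-representation test and defines the enrichment factor as observed overlap divided by the overlap expected by chance; Metascape's internal procedure may differ in detail, and the function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when n genes are drawn from a background of N genes,
    K of which are annotated to the term."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def enrichment_factor(k: int, N: int, K: int, n: int) -> float:
    """Observed overlap divided by the overlap expected by chance."""
    return k / (n * K / N)


def passes_cutoffs(k: int, N: int, K: int, n: int,
                   min_overlap: int = 3,
                   p_cutoff: float = 0.01,
                   min_enrichment: float = 1.5) -> bool:
    """Apply the Min Overlap, P-Value Cutoff and Min Enrichment settings."""
    return (k >= min_overlap
            and hypergeom_sf(k, N, K, n) < p_cutoff
            and enrichment_factor(k, N, K, n) >= min_enrichment)


if __name__ == "__main__":
    # Toy numbers: background of 10 genes, 3 in the term, 3 DEGs, all 3 overlap.
    print(passes_cutoffs(k=3, N=10, K=3, n=3))
```

A term is kept only when all three conditions hold, so a very small overlap is rejected even if its p-value is low.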

Construction of the protein-protein interaction network and functional modules

A protein-protein interaction (PPI) network of the DEGs was built with the STRING 11.0 online database (https://string-db.org/) [12], and the resulting files were imported into Cytoscape 3.7.2 for visual analysis [13]. The cytoHubba plug-in of Cytoscape was then used to filter the network and obtain the hub genes [14].

Prognostic survival analysis

To further examine the relationship between TC prognosis and hub gene expression, Kaplan-Meier Plotter (http://kmplot.com/analysis/) [15] was used for survival and statistical analysis, with P < 0.05 considered statistically significant.
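Kaplan-Meier Plotter performs this analysis online. For readers unfamiliar with the underlying curve, the following is a minimal sketch of the product-limit (Kaplan-Meier) estimator, S(t) = Π (1 - d_i / n_i) over event times t_i ≤ t. It is not the tool's own code, the function name and input layout are assumptions, and it omits the group comparison test that produces the P value.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct time with at least one death.

    events[i] is 1 if the death was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set.
        at_risk -= removed
    return curve


if __name__ == "__main__":
    # Invented follow-up times, not patient data from the study.
    print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

To compare high- and low-expression groups for a hub gene, one curve would be computed per group; the significance test between the curves is a separate step not shown here.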

Results

Screened chips and DEGs

Two gene expression profiling datasets (GSE29265 and GSE33630) were obtained from the GEO database. Together they comprised 89 TC tissue samples and 65 normal thyroid tissue samples, measured on the same array platform (GPL570, Affymetrix Human Genome U133 Plus 2.0 Array). The genes returned by the GEO2R analysis were filtered with adj.P.Val < 0.01 and |log2 FC| ≥ 1.5, as shown in Figure 1. After blank and duplicate genes were removed and the two gene lists were intersected, 304 overlapping DEGs remained, of which 181 were up-regulated (log2 FC ≥ 1.5) and 123 were down-regulated (log2 FC ≤ -1.5), as shown in Figure 2.

Function of DEGs and enrichment of pathways

Metascape was used to analyze the biological functions and pathway enrichment of the 304 DEGs obtained above, based on Gene Ontology and KEGG. The results are ordered by P-value and shown in Figure 3; the darker the color, the smaller the P-value. The DEGs were significantly enriched in a variety of terms, including extracellular matrix (ECM), cell-substrate adhesion, response to wounding, muscle structure development, hormone metabolic process, axon development, aging, ossification, ECM-receptor interaction, cell-matrix adhesion, and glycosaminoglycan binding. The enrichment networks are shown in Figure 4.

Protein-protein interaction network

The DEGs were imported into the STRING database, and the PPI network was generated and visualized with disconnected nodes hidden. The PPI network contained 284 nodes and 538 edges. Cytoscape was used to visualize the imported interacting genes (Figure 5A), and the cytoHubba plug-in was used to analyze the network further. According to the Maximal Clique Centrality (MCC) algorithm, the 10 genes with the highest and most stable scores in the network (FN1, SPP1, TIMP1, VCAN, COL1A1, COL1A2, MMP1, DCN, COMP, and FMOD) were selected as the hub genes (Figure 5B).
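As an illustration of how an MCC ranking works, the sketch below scores each node by summing (|C| - 1)! over the maximal cliques C that contain it, which is the definition given for MCC in the cytoHubba publication. It is a small pure-Python example using Bron-Kerbosch enumeration without pivoting, not the plug-in's implementation; the function names and the toy edge list are assumptions.

```python
from math import factorial
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected adjacency sets; nodes without edges never enter the graph."""
    adj: Graph = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj: Graph) -> List[Set[str]]:
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def mcc_scores(adj: Graph) -> Dict[str, int]:
    """MCC(v) = sum of (|C| - 1)! over maximal cliques C containing v."""
    scores = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        weight = factorial(len(clique) - 1)
        for v in clique:
            scores[v] += weight
    return scores


def top_hubs(scores: Dict[str, int], k: int) -> List[str]:
    """The k highest-scoring nodes, ties broken alphabetically."""
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


if __name__ == "__main__":
    # Toy network: a triangle A-B-C plus a pendant node D attached to C.
    graph = build_graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
    print(top_hubs(mcc_scores(graph), 1))  # prints ['C']
```

For a node whose neighbors share no edges, every maximal clique containing it has size 2, so its score reduces to its degree. Clique enumeration is exponential in the worst case, which is acceptable for a network of a few hundred nodes but would need a pivoting variant for much larger graphs.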

Kaplan-Meier analysis of hub genes

Prognostic evaluation of the hub genes with Kaplan-Meier Plotter indicated that the expression levels of TIMP1, MMP1, DCN, COMP, and FMOD in TC were significantly related to patient survival (P < 0.05). Higher expression of TIMP1 was associated with better survival; in contrast, expression of the other 4 genes correlated negatively with the overall survival of TC patients (Figure 6).

Discussion

In our study, 304 DEGs were identified from the GEO datasets GSE29265 and GSE33630, of which 181 were up-regulated and 123 were down-regulated in TC tissues. GO and KEGG enrichment of these genes with Metascape showed that the DEGs were mainly involved in ECM, cell-substrate adhesion, response to wounding, muscle structure development, hormone metabolic process, axon development, aging, ossification, ECM-receptor interaction, cell-matrix adhesion, and glycosaminoglycan binding.

Using cytoHubba and Kaplan-Meier Plotter, we identified 5 prognosis-related hub genes from the PPI network (TIMP1, MMP1, DCN, COMP, and FMOD) and found that all 5 are related to the composition and regulation of the ECM. MMP1 (matrix metalloproteinase 1) is involved in the degradation of the ECM and can promote the formation of blood vessels [16,17], and it is associated with the occurrence and development of breast cancer, malignant melanoma, and squamous cell carcinoma of the head and neck [16,18]. Related studies have shown that MMP1 is highly expressed in TC and closely related to the invasive and metastatic phenotype and to clinical prognosis [19,20].

The protein encoded by TIMP1 (tissue inhibitor of metalloproteinases 1) is an endogenous natural inhibitor of MMPs [21,22]. It is currently recognized as playing an inhibitory role in the occurrence and development of cancer, including inhibition of tumor vascular proliferation, invasion, and metastasis [22]. Studies of TIMP1 as a possible biomarker have shown that it is overexpressed at the molecular, cellular, and tissue levels and in the peripheral blood of TC patients, and that its mechanism relates to the balance between MMPs and TIMPs [23,24]. These studies proposed that the balance between MMP and TIMP expression is disrupted in tumor tissue, which activates MMP-mediated degradation of the ECM and enhances tumor invasion and metastasis, with TIMP expression rising reactively to inhibit tumor development [22,23].

As an essential part of the ECM, DCN (decorin) has multiple functions, including anti-tumor and anti-fibrotic activity, promotion of the inflammatory response, induction of autophagy, and inhibition of capillary formation and endothelial cell migration [25], and it therefore has potential for cancer treatment. Regarding the relationship between DCN and TC, studies at the molecular, protein, and cell levels have confirmed that DCN is under-expressed in TC and can inhibit tumor proliferation [26-28].

COMP (cartilage oligomeric matrix protein), also known as TSP5, is a member of the thrombospondin family of calcium-binding proteins and participates in the assembly and stabilization of the ECM through interaction with type I and type II collagen [29,30]. In addition to its recognized role in skeletal and joint diseases, COMP has been found to be overexpressed in breast cancer, colon cancer, and prostate cancer, and to be related to tumor cell proliferation, metastasis, relapse, and overall survival [29,31]. A few studies have reported that COMP is overexpressed in TC [32,33], but its specific role and mechanism still need to be elucidated. FMOD (fibromodulin) also participates in the production and assembly of the ECM by regulating the production of collagen fibers and the activity of TGF-β, and it plays an important role in angiogenesis, inflammation, apoptosis, and metastasis [34,35]. With regard to tumors, FMOD has been considered a new tumor-associated antigen in lymphoma and leukemia, and it has been found to be overexpressed in prostate cancer, lung cancer, and breast cancer [35,36]. However, the specific role and mechanism of FMOD in TC have not been accurately described and merit further study.

Conclusion

In summary, our study used bioinformatics methods to mine and process TC gene chip data. The results provide potential biological targets for the early diagnosis and treatment of TC and may help to guide subsequent experimental research.