DeepPurpose-based drug discovery in chondrosarcoma

2023-01-13

Jianrui Li, Mingyue Shi, Zhiwei Chen, Yuyan Pan

a Department of Plastic and Reconstructive Surgery, Zhongshan Hospital, Fudan University, Shanghai 200032, China

b Big Data and Artificial Intelligence Center, Zhongshan Hospital, Fudan University, Shanghai 200032, China

Keywords: Chondrosarcoma; Text mining; DeepPurpose; Drug therapy; Drug-target interaction

ABSTRACT

Background: Chondrosarcoma (CS) is the second most common primary bone tumor, accounting for approximately 30% of all malignant bone tumors. Unfortunately, the efficacy of currently available drug therapies is limited. Therefore, this study aimed to explore drug therapies for CS using novel computational methods.

Methods: Text mining, GeneCodis, STRING, and Cytoscape were used to identify genes closely related to CS, and the Drug Gene Interaction Database (DGIdb) was used to select drugs targeting these genes. Drug-target interaction prediction was then performed using DeepPurpose to obtain the candidate drugs with the highest predicted binding affinities.

Results: Text-mining searches identified 168 genes related to CS. Gene enrichment and protein-protein interaction analysis using GeneCodis, STRING, and Cytoscape generated 14 genes representing 10 pathways. Seventy drugs targeting genes closely related to CS were identified using DGIdb. DeepPurpose recommended 25 drugs, including integrin beta 3 inhibitors, hypoxia-inducible factor 1 alpha inhibitors, E1A binding protein P300 inhibitors, vascular endothelial growth factor A inhibitors, AKT1 inhibitors, tumor necrosis factor inhibitors, transforming growth factor beta 1 inhibitors, interleukin 6 inhibitors, mitogen-activated protein kinase 1 inhibitors, and protein tyrosine kinase inhibitors.

Conclusion: Drug discovery using in silico text mining and DeepPurpose may be an effective method to explore drugs targeting genes related to CS.

1. Introduction

Chondrosarcoma (CS), a group of heterogeneous primary malignant bone tumors characterized by hyaline cartilaginous neoplastic tissue, is the second most common primary bone tumor, accounting for approximately 30% of malignant bone tumors [1,2]. The main current treatments for CS, such as surgery and chemotherapy, remain unsatisfactory [3]. Moreover, research on drug therapy is limited, and adjuvant and neoadjuvant therapy for CS has developed slowly, so the recurrence and metastasis rates of CS have not decreased significantly.

Traditional drug research and development (R&D) is costly and time-consuming, and the combination of artificial intelligence and data mining has become a powerful alternative strategy for improving R&D efficiency [4]. Recently, deep learning has been shown to predict compound-protein interactions on a large scale using only limited data, and it has been applied successfully to new drug development, greatly reducing the associated time and cost [5,6]. DeepPurpose is a deep learning framework developed to predict the affinity between drugs and targets [7]. Through an encoding-decoding architecture, the framework uses a variety of embedding methods to convert sparse sequence-based features into dense vector features. A variety of deep neural networks extract the drug and target features automatically, and a connected network then learns to predict affinity. DeepPurpose integrates a variety of recent deep neural network models and provides 15 models pre-trained on the DAVIS, BindingDB-Kd, and kinase inhibitor bioactivity (KIBA) datasets [8-10]. The prediction is output as a binding affinity score between the drug and the target. Therefore, this in silico study focused on drugs targeting genes related to CS.

Target genes highly related to CS were identified via in silico text mining. DeepPurpose was used to predict drug-target interactions (DTI) and generate a ranked drug list. Finally, the drugs with the highest predicted binding scores were obtained. We hypothesized that drug discovery using in silico text mining and DeepPurpose may be an effective method to explore drugs targeting genes related to CS.

2. Methods

2.1. Text mining

Pubmed2ensembl (http://pubmed2ensembl.ls.manchester.ac.uk/) was used for text mining [11]. A query with the concept “chondrosarcoma” was performed, with “Ensembl Gene ID” and “Associated Gene Name” selected under attributes. The “Search for PubMed IDs” and “filter on Entrez:PMID” drop-down menus were chosen for the query, which returned a list of genes used in the next step.

2.2. Biological process and pathway analysis

GeneCodis (http://genecodis.cnb.csic.es/) was used for enrichment analysis of the genes related to CS [12]. Genes identified in the previous step were analyzed using Gene Ontology (GO) biological process categories, and the most significantly enriched biological processes were selected. Genes carrying the selected annotations were then analyzed for enrichment in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The pathways most relevant to CS pathology were selected, and the genes in these pathways were carried forward for further analysis.

2.3. Protein-protein interaction network

The STRING database (http://string-db.org) was used to construct the protein-protein interaction (PPI) network of the target genes [13]. The confidence score was set to the highest level (0.900) so that only genes with strong interactions were retained. Next, Cytoscape software was applied to analyze the interaction network [14]. CentiScaPe, an app that calculates network parameters, was used to analyze the topological characteristics of each node. “Degree” and “Betweenness” were chosen as the parameters for selecting key genes. A node whose degree and betweenness values were both greater than or equal to the respective mean values was selected as a key node for further analysis.
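The key-node rule above can be illustrated with a minimal sketch. It assumes an undirected, unweighted network and unnormalized betweenness computed with Brandes' algorithm; CentiScaPe's exact conventions may differ, and the function names and toy edges are hypothetical, not the STRING network from this study.

```python
from collections import deque


def degree_and_betweenness(edges):
    """Return (degree, betweenness) dicts for an undirected, unweighted graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    degree = {n: len(nb) for n, nb in adj.items()}
    betweenness = {n: 0.0 for n in adj}

    # Brandes' algorithm: one breadth-first search per source node.
    for s in adj:
        order = []                      # nodes in order of non-decreasing distance
        preds = {n: [] for n in adj}    # shortest-path predecessors
        sigma = {n: 0 for n in adj}     # number of shortest paths from s
        dist = {n: -1 for n in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:         # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to s.
        delta = {n: 0.0 for n in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                betweenness[w] += delta[w]

    # Each undirected node pair was counted once from each endpoint.
    for n in betweenness:
        betweenness[n] /= 2.0
    return degree, betweenness


def select_key_nodes(edges):
    """Keep nodes whose degree and betweenness are both >= the network mean."""
    degree, betweenness = degree_and_betweenness(edges)
    mean_deg = sum(degree.values()) / len(degree)
    mean_btw = sum(betweenness.values()) / len(betweenness)
    return sorted(n for n in degree
                  if degree[n] >= mean_deg and betweenness[n] >= mean_btw)


# Toy network with hypothetical edges.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
print(select_key_nodes(toy_edges))  # ['A', 'D']
```

In the toy network, nodes A and D are the only ones at or above both means (1.6 for degree and 1.6 for betweenness), so they are the only nodes retained.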

2.4. Drug-gene interactions

The Drug Gene Interaction Database (DGIdb) (http://www.dgidb.org) was used to explore drugs that interact with the selected genes [15]. The final list of genes was used as the input set. The query returned all drug hits together with related information, including interaction type, sources, PubMed Unique Identifiers (PMIDs), and scores.

2.5. DeepPurpose

DeepPurpose takes the primary structure sequences of drugs and targets as inputs. Drugs were represented in the simplified molecular-input line-entry system (SMILES), and targets were represented by their amino acid sequences. The binding affinity of each drug-target pair was predicted using the 15 pre-trained models provided by DeepPurpose. The threshold for the models trained on the DAVIS and BindingDB datasets was set to 7.0, while the threshold for the models trained on the KIBA dataset was set to 12.1 [8-10].
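The dataset-specific thresholds can be applied as in the following minimal sketch. It operates on already-computed prediction scores and does not call DeepPurpose itself; the model names, the `<drug encoder>_<target encoder>_<dataset>` naming pattern, and the example scores are assumptions made for illustration only.

```python
# Thresholds reported in the text, keyed by training dataset.
THRESHOLDS = {"DAVIS": 7.0, "BindingDB": 7.0, "KIBA": 12.1}


def dataset_of(model_name):
    """Infer the training dataset from a model name such as 'CNN_CNN_DAVIS'.

    The '<drug encoder>_<target encoder>_<dataset>' naming is an assumption
    of this sketch.
    """
    return model_name.rsplit("_", 1)[-1]


def models_above_threshold(scores):
    """Return the models whose predicted score exceeds their dataset threshold."""
    return sorted(
        name for name, score in scores.items()
        if score > THRESHOLDS[dataset_of(name)]
    )


# Hypothetical predicted scores for one drug-target pair.
example_scores = {
    "CNN_CNN_DAVIS": 7.4,
    "MPNN_CNN_BindingDB": 6.8,
    "Morgan_AAC_KIBA": 12.3,
}
print(models_above_threshold(example_scores))
# ['CNN_CNN_DAVIS', 'Morgan_AAC_KIBA']
```

Keeping the thresholds keyed by dataset makes explicit that KIBA scores are on a different scale from the DAVIS and BindingDB scores and should not be compared against the same cutoff.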

3. Results

3.1. Results of text mining, biological process, and pathway analysis

Fig. 1 shows the process used to explore potential drugs for CS; in summary, a text mining search revealed 168 genes related to CS (Fig. 2). To ensure that only the most enriched biological processes were selected, a P-value cutoff of P = 1.00E-8 was set for the analysis of enriched GO biological process annotations, resulting in 14 sets of annotations containing 83 genes (Table 1). The three most enriched biological process annotations were: (1) “extracellular matrix organization” (P = 6.16E-30); (2) “skeletal system development” (P = 8.55E-18); and (3) “proteolysis, extracellular matrix organization, collagen catabolic process” (P = 7.15E-14). In the analysis of enriched KEGG pathway annotations, the P-value cutoff was set at P < 1.00E-10, which yielded 11 pathways containing 54 genes (Table 2). The three most significantly enriched pathways were: (1) “proteoglycans in cancer” (P = 3.08E-19); (2) “pathways in cancer” (P = 7.36E-15); and (3) “MAPK signaling pathway” (P = 6.57E-13), while other highly enriched pathways included “osteoclast differentiation,” “PI3K-Akt signaling pathway,” and “cytokine-cytokine receptor interaction.”

Fig. 1. Flowchart showing an overview of the data mining process. Text mining and GeneCodis were used to identify genes associated with chondrosarcoma. Protein-protein interaction analysis was performed with STRING and Cytoscape. The Drug Gene Interaction Database (DGIdb) was used to select drugs targeting the genes highly related to chondrosarcoma. DeepPurpose was then used to select candidate drugs based on drug-target interaction.

Table 1 Summary of the GO biological process gene set enrichment analysis.

Table 2 Summary of the KEGG pathway gene set enrichment analysis.

3.2. Results of protein-protein interaction analysis

STRING was used to construct the PPI network of the target genes (Fig. 3). Cytoscape revealed a network of 54 nodes (Fig. 4), while CentiScaPe showed that the average degree and betweenness were 7.05 and 71.49, respectively. A total of 14 genes were selected as key nodes because their degree and betweenness were both greater than or equal to the mean. The final gene list included ITGB3, ITGAV, ITGB1, EP300, CREBBP, HIF1A, VEGFA, AKT1, TNF, TGFβ-1, MMP9, IL-6, MAPK1, and PTK2.

Fig. 2. Summary of data mining results. (A) Text mining: text mining was performed using the search term “chondrosarcoma” in pubmed2ensembl, yielding 269 genes in total, including 168 genes after deleting duplicates. (B) Gene set enrichment: 83 and 54 genes were enriched using GO biological processes and KEGG pathway analysis in GeneCodis, respectively. (C) Protein-protein interaction analysis: 14 genes were selected using STRING and Cytoscape. (D) Drug-gene interactions: 70 drugs were selected with DGIdb. (E) Drug-target interaction: 25 candidate drugs with the highest predicted binding affinity were finally derived. GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; DGIdb, Drug Gene Interaction Database.

Fig. 3. The highest-confidence (confidence score, 0.900) protein-protein interaction network of the targeted genes constructed with STRING. Network nodes represent proteins and different colored edges represent protein-protein associations.

Fig. 4. The protein-protein interaction network of the targeted genes constructed using Cytoscape. Network nodes represent proteins and edges represent protein-protein associations.

3.3. Results of drug-gene interactions

Using the final list of 14 genes as potential targets in the drug-gene interaction analysis, a list of 70 drugs was initially selected as possible candidates for the drug treatment of CS (Supplementary Table 1). These drugs were divided into integrin protein inhibitors, E1A binding protein P300 (EP300) inhibitors, hypoxia-inducible factor 1 alpha (HIF1A) inhibitors, vascular endothelial growth factor A (VEGFA) inhibitors, AKT1 inhibitors, tumor necrosis factor (TNF) inhibitors, transforming growth factor beta 1 (TGFB1) inhibitors, matrix metalloproteinase 9 (MMP9) inhibitors, interleukin 6 (IL-6) inhibitors, mitogen-activated protein kinase 1 (MAPK1) inhibitors, and protein tyrosine kinase 2 (PTK2) inhibitors.

3.4. Results of DeepPurpose

Drugs available in the SMILES format were selected for the DeepPurpose analysis. Each pre-trained model in DeepPurpose then generated a ranked list of the predicted binding affinities between drugs and targets. Drugs with binding affinity scores higher than the threshold were selected as potential treatments for CS (Table 3). The final drug list comprised 25 drugs, including inhibitors of ITGB3, HIF1A, EP300, VEGFA, AKT1, TNF, TGFB1, IL-6, MAPK1, and PTK2.

Table 3 Identification of drug candidates for chondrosarcoma by DeepPurpose analysis.

4. Discussion

CS is a common cartilage-forming tumor that accounts for approximately one-third of all malignant bone tumors [16]. Unfortunately, current therapeutic approaches have limited effectiveness. In this study, we combined text mining and deep learning to explore novel drugs for CS. In particular, DeepPurpose, a toolkit for predicting promising drugs for various diseases, proved useful for ranking candidate drugs targeting the relevant genes and selecting the 25 most likely drugs for CS based on DTI analysis. As an encoding-decoding framework, DeepPurpose provides the five encoders we used for drug molecules, namely Morgan, Daylight, convolutional neural network (CNN), transformer, and message passing neural network (MPNN) encoders, as well as the two encoders we used for protein targets, namely amino acid composition (AAC) and CNN. By combining different encoders with the three datasets mentioned above (DAVIS, BindingDB, and KIBA), we obtained 15 pre-trained models from DeepPurpose and thus 15 binding affinity score predictions. Here, we briefly introduce the 25 drugs selected using text mining and DeepPurpose.

Among these candidate drugs, eight target AKT1. The PI3K/AKT/mTOR pathway, in which AKT serves as an oncogene, is frequently activated in CS, where it promotes tumorigenesis, accelerates tumor aggression, and inhibits tumor-associated apoptosis [17]. Among the eight listed drugs, sirolimus has been used in patients with CS, whereas everolimus and paclitaxel have been approved for various solid tumors [18,19]. The others have shown antitumor activity but are still undergoing clinical testing. The MAPK pathway is a major oncogenic pathway in human cancer and is interlinked with the PI3K/AKT pathway, mediating tumor proliferation and invasion through co-regulated cancer-associated proteins. Derivatives of irofulven have been found to display significant antitumor activity, and clinical trials of refametinib in various solid tumors have shown antitumor effects [20]. Therefore, inhibition of the MAPK pathway may present a selective treatment option for CS.

Hypoxia is a critical characteristic of the microenvironment of solid tumors. HIF-1α is a key molecular mediator that is activated under hypoxia, as well as a proangiogenic transcription factor that can induce the expression of VEGF [21]. HIF-1α regulates cell metabolism, apoptosis, and neoangiogenesis and has been identified as a marker of malignancy and prognosis in CS [22]. As an inhibitor of HIF-1α, 2-methoxyestradiol has previously been tested for the treatment of hypertension and kidney injury, and for its antitumor effect in CS [23]. Notably, sunitinib, another inhibitor of HIF-1α, has been approved for the treatment of several solid tumors and has shown promising results in tests for CS [24-26].

Angiogenesis is also a critical step in tumor proliferation, invasion, and metastasis. VEGFA is a key angiogenic factor that promotes angiogenesis and metastasis in CS patients. The expression of VEGFA has been found to be associated with the clinical stage of CS [27]. Among the drugs on our list, vandetanib has been used to treat cancers such as thyroid cancer and non-small cell lung carcinoma, but has not been formally evaluated in the setting of CS [28].

Cytokines produced in the tumor microenvironment play important roles in tumor progression. Tumor cells modify their bone microenvironment by secreting osteoclast-activating factors such as the cytokines TNF and IL-6. There is evidence that TNF-α enhances the migration of CS cells [29]. Of the TNF inhibitors, midostaurin is an important drug for hematologic malignancies, and miltefosine is a classic treatment for leishmaniasis [30,31]. Arsenic trioxide, an IL-6 inhibitor, has evolved from an ancient Chinese medicine into a potent antitumor drug for acute promyelocytic leukemia and specific solid tumors [32]. These inhibitors require further experimental research in the treatment of CS.

Integrin beta 3 (ITGB3) is a transmembrane receptor of the integrin family that participates in tumor invasion and metastasis by activating the PTK2 and AKT signaling pathways [33]. Cilengitide, an inhibitor of the integrins αvβ3 and αvβ5 that interferes with signaling pathways responsible for cancer metabolism, has shown promise in patients with myeloma and advanced solid tumors [34]. Previous studies have shown that TGF-β1 contributes to the migration of CS cells [35]. The TGF-β1 inhibitors tretinoin and fenretinide are currently in clinical trials for cancers such as leukemia [36]. One drug on our list, garcinol, is an effective EP300 inhibitor that shows anti-neoplastic effects in cancer cell lines and experimental animal models [37,38] and may be considered a candidate anti-cancer drug; however, further studies are needed to ascertain its tolerability, efficacy, and safety.

The potential of machine learning models to predict the binding affinity between new drugs and targets has been confirmed in various studies, including studies of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [39-41], providing a method for drug R&D. For ensemble learning across multiple models, DeepPurpose can aggregate predictions as the mean, the maximum, or the mean of the mean and the maximum. However, this approach may not provide the best predictions, because different models can output different predictions for two reasons. First, the number of training samples may be insufficient. The advantage of deep learning lies in its large number of parameters and multilayer structure, which improve the feature representation of models. However, a complicated structure may also lead to overfitting. Therefore, a large number and a wide variety of training samples are needed to improve the generalization of models and reduce overfitting. Second, because the representation formats of SMILES have not been unified, models may be unable to extract sequence features when the representation format of the training samples differs from that of the samples being predicted, leading to inaccurate predictions.
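The three aggregation options mentioned above can be sketched in a few lines. This is an illustrative reading of "mean, maximum, and mean of the mean and maximum", not DeepPurpose's own code; the function name and example scores are hypothetical. Because the KIBA scale differs from the DAVIS and BindingDB scale (as the separate thresholds suggest), the sketch assumes that scores are aggregated only across models trained on the same dataset.

```python
def aggregate(scores):
    """Combine per-model scores for one drug-target pair in three ways."""
    mean_score = sum(scores) / len(scores)
    max_score = max(scores)
    return {
        "mean": mean_score,
        "max": max_score,
        # Mean of the mean and the maximum.
        "mean_max": (mean_score + max_score) / 2,
    }


# Hypothetical scores from three models trained on the same dataset.
print(aggregate([6.0, 7.0, 8.0]))
# {'mean': 7.0, 'max': 8.0, 'mean_max': 7.5}
```

The mean is sensitive to a single pessimistic model, while the maximum is sensitive to a single optimistic one; averaging the two is a compromise between them.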

5. Conclusion

Our study demonstrated that drug discovery using in silico text mining and DeepPurpose may be an effective method to explore drugs targeting genes highly related to CS. Our study provides a theoretical basis for the development of novel targeted therapies for CS.

Ethics approval and consent to participate

Not applicable.

Consent for publication

All authors have consented to publication.

Authors’ contributions

Li J: Writing-Original draft. Shi M: Writing-Original draft. Chen Z: Data curation, Writing-Review and editing. Pan Y: Writing-Review and editing.

Competing interests

The authors declare that they have no competing interests.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (grant no. 82102333).

Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.cjprs.2022.10.004.