Development status and trend of traditional Chinese medicine database
2022-12-06
Tao Xue, Shuai Gong, Tian-Hao Xie, Wen-Juan Li, Jian-Ping Huang*
1Alibaba Business School, Hangzhou Normal University, Hangzhou 310000, China.
2Qianjiang College, Hangzhou Normal University, Hangzhou 310000, China.
Abstract
Following the trend of information technology, the development of traditional Chinese medicine (TCM) databases has brought great changes to TCM data. For example, the medium for storing and disseminating data has shifted from paper books to the internet, and the content has expanded from basic information to comprehensive information such as the targets and molecular structures used in modern medicine. In recent years, the amount of information contained in TCM databases has grown at an unparalleled rate. However, database construction still faces challenges, including insufficient data volume, inconsistent construction standards, and a low level of platformization. Additionally, applications that are now prevalent in the field of TCM, such as network pharmacology, bioinformatics, and artificial intelligence, require a large volume of high-quality data. Generally speaking, advances in life science and artificial intelligence technology have outpaced the development of TCM databases. Therefore, this paper compiled the current status of TCM databases, discussed the benefits and drawbacks of various databases, and concluded that future development should be comprehensive, platform-based, and tightly integrated with modern life science and artificial intelligence technology, so as to assist research on the modernization of TCM.
Keywords Traditional Chinese medicine; Database; Modernization
Highlights
This paper summarized the construction of traditional Chinese medicine (TCM) databases, discussed the advantages and disadvantages using the application scope as a guide, and concluded the development trend.
Background
Traditional Chinese medicine (TCM) is a system of medicine with a long history, primarily derived from natural drugs and their processed products. When the ancestors of the Chinese people were looking for food, they found that some plants and animals were useful for treating disease. As a result, medicinal knowledge was collected, and a complete theoretical system of TCM was gradually established over thousands of years of continual enrichment of clinical experience and pharmacological information [1]. As modern life sciences and medicine advance, TCM is actively adjusting its development trajectory to better achieve modernization, which includes the construction of TCM databases. In China, for example, TCM data has steadily been computerized: the storage medium has changed from paper books to internet databases, and the databases' content has changed from basic information to comprehensive information, including compounds, targets, pathways, diseases, proteins, and drugs integrated with modern medicine. The extensive data in TCM databases can be used to investigate the interactions between TCM components, to find new drugs through discovery approaches such as network pharmacology, and even to explore the association between TCM and biomedicine [2].
Although many TCM databases have been created, the majority of them have issues or flaws: (1) Inadequate data coverage and volume. Some databases lack complete information on TCM, such as compounds, proteins, genetic loci, diseases, and other biological entities, as well as the corresponding correlations between different data types. (2) Poor verifiability of data. For example, some of the data in TCM databases lacks citations, so the validity of formulations, potential targets of action, and active ingredients in herbs cannot be confirmed. (3) The content and design of the databases do not take full advantage of the opportunities provided by cutting-edge research and technology. (4) The level of platformization is insufficient. Some databases offer no platform for data analysis, and others offer data mining and analysis platforms that could be improved. It would be easier to attain comprehensiveness, synthesis, and standardization if TCM databases continuously incorporated important data and information from modern medicine and kept pace with contemporary cutting-edge science and technology.
With the goal of assisting in the construction of TCM databases and promoting the continuous progress of TCM modernization, this paper compiled existing TCM databases and analyzed their possible future development trends in light of recent developments in life sciences and other related technologies.
The current state of the TCM database construction
A TCM database is the outcome of collecting, analyzing, organizing, and refining a huge number of original TCM materials using database technology, and it primarily covers names, properties, tastes, efficacy, principal treatments, usage, dose, and other basic TCM data. With the modernization of TCM, the development of related science and technology, and the accumulation and integration of data resources such as compounds, targets, pathways, diseases, and drugs, many databases integrated with modern medicine have emerged and become important platforms for the analysis of targets and drug effects [3]. TCM databases fall into three primary categories: ancient literature databases, comprehensive databases, and specialized databases. In the modernization of TCM, different types of databases serve different functions and missions.
The ancient literature database
The ancient literature of TCM recorded China's major medical achievements in text form; it represents the crystallization of the Chinese nation's wisdom and has played a crucial role in the development of China's TCM industry [4]. Digital means have been employed to process classic TCM books and create databases of ancient TCM books. On the one hand, such databases help preserve ancient books and reduce damage to them; on the other hand, they can be used to analyze and mine the tacit information in ancient literature.
In China, most TCM universities have created their own ancient literature databases. For example, the ancient literature database of Nanjing University of Traditional Chinese Medicine is a full-text database built around the features of the university's collection; it preserves the original appearance of the ancient literature while also providing a search function [5]. The multimedia database of rare ancient TCM books in Zhejiang, built by the Zhejiang Chinese Medical University Library, covers acupuncture, Chinese medicine, various clinical disciplines, and health care. In addition to the databases created by universities, various commercial businesses have created databases of historic TCM literature. For example, the Basic Chinese Ancient Texts Collection offers four sub-collections (philosophy, history and geography, literature and arts, and a general category) with 20 primary categories. The Hytung Ancient Book Database, which is constantly being updated, covers specialist ancient literature used in teaching and research in the domains of literature, history, and philosophy. It is also worth noting that the State Intellectual Property Office created a bilingual Chinese and English database of TCM patents, which meets the needs of patent examination for TCM research [6]. To some extent, it has raised awareness of intellectual property protection for TCM.
The creation of ancient TCM literature databases has increased the use of ancient literature, which benefits TCM research and development, the application of clinical guidance, and theoretical research (comparative argumentation with traditional theories), and has promoted the modernization of TCM to a certain extent. Table 1 describes the relevant information of the ancient literature databases listed in this paper. However, there are still issues in the development of ancient TCM literature databases, such as uneven technical specifications and standards, insufficient digitization, and a lack of intellectual property protection [7].
Table 1 The information of ancient literature database
The comprehensive database
Studying the composition, efficacy, mechanism of action, and related targets of TCM is a challenge for the modernization of TCM. Studying the pharmacological mechanisms of TCM at the molecular level will help to meet this challenge. For example, in the search for anti-malarial drugs, Tu [8] extracted the active ingredient artemisinin from Artemisia annua. However, earlier TCM databases focused on the basic data of TCM and did not include molecular-level information, which hindered the integration and development of TCM with modern science and technology.
Modern scientific and technological approaches and equipment must be fully utilized in the modernization of TCM. In this context, TCM databases began to incorporate information on the compounds and targets of modern life sciences. TCMD [9], TCM@Taiwan [10], TCM-ID [11], and TM-MC [12] are four representative examples. They provide basic information on the names, tastes, ascriptions, and effects of herbs and prescriptions, together with data on the compounds. To some extent, they can already help researchers study the molecular mechanisms of TCM, drug-target interactions, and so on. For example, Liang et al. [13,14] proposed a computational approach for studying the molecular mechanisms of TCM and applied it to analyze anti-AIDS formulations. Tou et al. [14] used a ligand-based drug design approach with the help of a database for virtual screening and discovered that Guineensine could act as a drug-like compound to alleviate neuropathic pain by inhibiting FLAT. Wang et al. [15] discovered that Macranthoside B has potent anticancer properties. However, the TCM data in these databases is incomplete. For example, TCM-ID lacks information on the targets and diseases on which the compounds act, while TCMD and TCM@Taiwan lack relevant drug and compound data.
As related research advances, both data volume and data variety continue to expand. As a result, the data volume of TCM databases has increased significantly, and new data types such as pathways have appeared. TCMID is a database with a wealth of data, including not only information on targets, diseases, and medications, but also a vast amount of chemical data [16]. The database contains the structures of the compounds (molecular formulae, SMILES structures, and PubChem IDs), as well as mass spectral information on the compounds. The relationships among herbs, diseases, active compounds, and targets can promote research on combination therapy and the understanding of the underlying mechanisms of TCM. For example, Wang et al. [17] discovered a viable technique for identifying potential asthma targets based on the efficacy of prescriptions.
Additionally, database platforms have been improved, and various analytical techniques have been implemented within them. YaTCM is a database similar to TCMID [18]; however, it established a more comprehensive network of the relationships among the components of TCM and included information on identified and unidentified protein targets and pathways, as well as data on herbs, prescriptions, and diseases. It also provides an analytical tool with functions such as similarity search and substructure search for potential structures. Its development facilitates the elucidation of the systemic mechanism of action of TCM and promotes the discovery of new drugs. For example, Liu et al. [19] demonstrated that isocoumarin A can improve fracture healing and postmenopausal osteoporosis. Although YaTCM has its own unique pathway information, it includes only a few prescriptions and no disease information. In contrast, the ETCM database, also known as the TCM encyclopedia, provides more extensive, rich, and standardized data, such as the relationships among formulas, herbs, and ingredients [20]. It also offers information on the origin of herbs, as well as the targets of herbs, prescriptions, and compounds, to allow functional and mechanistic investigations of TCM. After creating networks of chemical component-target and protein-protein interactions, Zhu and Hou [21] discovered that Kushen (Sophora flavescens Ait.) relieves inflammation mostly through IL-6, IL-1β, VEGFA, TNF-α, and PTGS2 (COX-2) and the NF-κB signaling pathway. Even though ETCM has a lot of data, there is still room to scale it in the future, for example by expanding the number of herbs.
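The similarity search mentioned above for YaTCM can be illustrated with a small example. The sketch below is not YaTCM's implementation: it assumes that each compound is represented by a structural fingerprint stored as a set of "on" bit indices and that the Tanimoto coefficient is the similarity measure, which is a common choice but is not stated in the source. The compound names and fingerprints are hypothetical placeholders.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits set in a structural fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |a & b| / |a | b| (defined here as 0.0 for two empty sets)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(
    query: Fingerprint,
    library: Dict[str, Fingerprint],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (compound, score) pairs scoring at least `threshold`, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    # Sort by descending score, then by name so that ties are ordered deterministically.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Hypothetical toy library; real fingerprints would come from a cheminformatics toolkit.
LIBRARY: Dict[str, Fingerprint] = {
    "compound_A": frozenset({1, 2, 3, 4}),
    "compound_B": frozenset({1, 2, 3, 9}),
    "compound_C": frozenset({7, 8}),
}

if __name__ == "__main__":
    for name, score in similarity_search(frozenset({1, 2, 3, 4}), LIBRARY):
        print(f"{name}\t{score:.2f}")
```

A substructure search would need a graph-matching step on the molecular structure itself, which is beyond what a set-based fingerprint comparison like this one can express.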
Many efforts have been devoted to clarifying the molecular mechanism of TCM, although it is still difficult due to the diversity and complexity of TCM components. For example, BATMAN-TCM is an online bioinformatics analysis tool designed specifically for the study of molecular mechanism of TCM, which aids in understanding the "multi-component, multi-target, and multi-pathway" therapeutic mechanism [22]. Moreover, Li et al. [23] used bioinformatics network pharmacology to identify potential targets and pathways of vitamin C and glycyrrhizic acid for COVID-19 treatment, as well as its pharmacological mechanism. However, the above-mentioned database is far from sufficient for the study of the molecular mechanism of TCM.
As life science and technology progress, it has become possible to combine TCM with gene therapy, and databases have emerged to study the relationships among herbs, genes, and diseases [24]. For instance, TCMGeneDIT, a database developed by Taiwan University to combine TCM and biomedical research, employed text mining techniques to collect and integrate relevant data on herbs, genes, and diseases, allowing researchers to better understand TCM therapeutic mechanisms and gene interactions [25]. For example, Chen et al. [26] used text mining to identify 53 genes associated with the combination of Danshen (Salvia miltiorrhiza Bunge, SM) and Sanqi (Panax notoginseng, PN) and then searched the database for information on protein-protein interactions of these genes, leading to the discovery of a major role for SM and a secondary role for PN in the treatment of coronary heart disease. Although the database emphasizes information about herbs, diseases, and genes, its information about diseases and medicines is scarce.
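The text-mining idea behind a resource such as TCMGeneDIT can be sketched as sentence-level co-occurrence counting. The example below is only an illustration under simplifying assumptions: terms are matched by dictionary lookup on whole words, and a herb and a gene are linked whenever they appear in the same sentence. The actual TCMGeneDIT pipeline is not described in this paper, and the vocabularies and sentences are invented placeholders rather than real findings.

```python
import re
from collections import Counter
from itertools import product
from typing import Iterable, Set


def find_terms(sentence: str, vocabulary: Set[str]) -> Set[str]:
    """Return the vocabulary terms that occur in the sentence as whole words."""
    found = set()
    for term in vocabulary:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.add(term)
    return found


def cooccurrences(sentences: Iterable[str], herbs: Set[str], genes: Set[str]) -> Counter:
    """Count how many sentences mention each (herb, gene) pair together."""
    counts: Counter = Counter()
    for sentence in sentences:
        pairs = product(find_terms(sentence, herbs), find_terms(sentence, genes))
        counts.update(pairs)
    return counts


# Hypothetical vocabularies and sentences, used only to exercise the functions.
HERBS = {"HerbA", "HerbB"}
GENES = {"GENE1", "GENE2", "GENE3"}
SENTENCES = [
    "HerbA extract reduced GENE1 and GENE2 expression.",
    "HerbB modulated GENE3 signalling.",
    "HerbA also lowered GENE1 levels in a second report.",
    "GENE2 was discussed without any herb.",
]

if __name__ == "__main__":
    for (herb, gene), n in sorted(cooccurrences(SENTENCES, HERBS, GENES).items()):
        print(herb, gene, n)
```

Co-occurrence alone does not establish a biological relationship, so counts like these would normally serve as candidates for curation or statistical filtering.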
The interaction of small molecules and proteins in biological processes is also very important in biomedical and pharmacological research, and pertinent databases exist for this kind of investigation. HIT, a database focused on providing complete information on herbs, components, and proteins, offers essential support for mechanistic investigations, new drug development, and prospective therapeutic target discovery in TCM [27]. For example, Georrge and Umrania [28] compared and analyzed the proteomes of disease-causing organisms and humans, screened for non-homologous proteins, listed essential proteins of the organism, performed KEGG pathway analysis to study protein function, and used the database to search for new drug targets, eventually identifying 105 new drug targets involved in the 24 specific pathways of Klebsiella pneumoniae MGH78578. However, it should be noted that the number of herbs and their respective constituent components and targets in HIT was limited, and the database did not include disease information. Notably, the TCM genome is important in the discovery of functional genes for active ingredients in TCM, as well as in the cultivation and improvement of TCM. TCM-Blast is a database and platform for searching TCM protein and DNA sequence similarity [29]. It provides batch searches on TCM genomic sequence data, improving the efficiency of TCM genomic research.
In addition to the databases listed above, TCM database development is progressing in lockstep with the advent of databases for systems pharmacology and network pharmacology analysis. TCMSP was based on a systems pharmacology framework and included data on the ingredients of herbs, targets, diseases, and their connections, as well as the ADME properties of the herbal components [30]. It can be used in studies to help uncover the nature of TCM theories and generate new herbal guidelines. For example, Liu et al. [31] discovered 23 active chemicals in Xijiao Dihuang Decoction (XJDH) and identified 118 relevant targets for viral hemorrhagic fever (VHF); using a systems pharmacology approach, they concluded that XJDH may benefit VHF patients through numerous pathways. On the other hand, the amount of data in TCMSP still needs to be expanded because the number of herbs, targets, and disorders was limited, and there was no corresponding prescription and drug information. In contrast, TCM-Mesh is a database integrated by a data mining system, which can be used for the network pharmacological analysis of Chinese pharmaceutical preparations [32]. It includes information on herbs, chemicals, genes, and diseases, as well as 211 records of side effects and 71 records of toxicity. The database was created to find the "compound-protein/gene-disease" route in order to better understand the molecular mechanisms of TCM formulations. Lee et al. [33] discovered that cordycepin has strong anti-cancer properties in breast cancer cells by triggering apoptosis. Although side effects and toxicity were reported in TCM-Mesh, there was no comparable data for formulations.
Additionally, TCM-Suite is made up of two sub-databases, i.e., Holmes-Suite and Watson-Suite, and combines the identification of biological components of TCM with downstream network pharmacology research [34]. The database contains biological ingredient data of TCM, herbs, formulas, compounds, proteins, gene targets, and diseases, as well as the corresponding correlations between different types of entries, which can aid in the management of TCM-related information, data-driven hypothesis generation, and potential drug discovery.
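The "compound-protein/gene-disease" route described for TCM-Mesh can be read as a chain of lookups across linked association tables. The following sketch assumes three simple mappings (herb to compounds, compound to genes, gene to diseases) and enumerates every route that starts from a given herb. The table layout and all identifiers are hypothetical and do not reflect the actual TCM-Mesh schema.

```python
from typing import Dict, Iterator, Set, Tuple

# Hypothetical association tables; a real database would load these from curated records.
HERB_TO_COMPOUNDS: Dict[str, Set[str]] = {
    "herb_1": {"cmpd_a", "cmpd_b"},
    "herb_2": {"cmpd_b"},
}
COMPOUND_TO_GENES: Dict[str, Set[str]] = {
    "cmpd_a": {"gene_x"},
    "cmpd_b": {"gene_x", "gene_y"},
}
GENE_TO_DISEASES: Dict[str, Set[str]] = {
    "gene_x": {"disease_1"},
    "gene_y": {"disease_2"},
}

Route = Tuple[str, str, str, str]  # (herb, compound, gene, disease)


def routes(herb: str) -> Iterator[Route]:
    """Yield every herb -> compound -> gene -> disease route, in sorted order."""
    for compound in sorted(HERB_TO_COMPOUNDS.get(herb, ())):
        for gene in sorted(COMPOUND_TO_GENES.get(compound, ())):
            for disease in sorted(GENE_TO_DISEASES.get(gene, ())):
                yield herb, compound, gene, disease


def diseases_for(herb: str) -> Set[str]:
    """Collect the diseases reachable from a herb through any route."""
    return {route[-1] for route in routes(herb)}


if __name__ == "__main__":
    for route in routes("herb_1"):
        print(" -> ".join(route))
```

Keeping the whole route, rather than only the final disease, preserves the intermediate compound and gene that support each herb-disease link.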
In recent years, research in TCM has shifted towards high-throughput transcriptome screening, and Zhao and his team constructed HERB [35]. HERB is a high-throughput, experimental, and reference-oriented database of TCM. It combines many databases and offers the widest selection of resources about herbs and ingredients available, as well as high-quality targets and disease information cross-referenced with modern drug databases. It explains the objective connections between TCM and current medications and strengthens the basis for pharmacology research. For example, Li et al. [36] used network pharmacology to uncover the possible mechanism of Huang Lian Jie Du Tang for treating metastatic melanoma and immune infiltration analysis to discover its potential mechanism for preventing melanoma metastasis. Despite having a great deal of data and citations, HERB regrettably lacks prescription data.
To give a more intuitive overview, we list the essential details of the comprehensive databases in Table 2, describe their benefits and drawbacks, and perform a quantitative comparison of the data for each database in Table 3. Furthermore, the creation of a TCM database depends heavily on basic databases that provide data sources and establish crosslinks. A few of them are described below: (1) TTD [37] and DrugBank [38], which mainly contain information on drugs and their targets; (2) HPO [39], a database of human disease phenotypes; (3) OMIM [40] and GAD [41], which contain information on protein-disease associations; (4) STITCH [42], which is used for compound-protein interactions; (5) PubChem [43], ChemSpider [44], and ChEMBL [45], which provide molecular structures of compounds; (6) PharmGKB [46], KEGG [47], and DisGeNET [48] (disease related), which contain gene information. These databases serve as a solid foundation for the development of TCM databases.
Table 2 The information of comprehensive database
Table 3 Statistical comparison of comprehensive database
The specialized databases
Aside from the aforementioned databases, there are some TCM databases designed for specialized areas (specific sorts of diseases). The typical examples are as follows (Table 4):
Table 4 The information of professional database
(1) TCMIO, a database on TCM immuno-oncology [49]. With it, researchers can investigate the molecular mechanisms by which TCM modulates the immune microenvironment of tumors. It contains over 120,000 small molecules against 400 immuno-oncology targets, mapped to the chemical composition of TCM, allowing researchers to identify TCM that interacts with immuno-oncology targets. UniProt, ChEMBL, TCMAnalyzer, TCMSP, and TCMID are the key sources of data. As the first database of TCM for immuno-oncology, it can aid in the study of the mechanisms of action in cancer immunity and the creation of new cancer immunotherapies. For example, Cai et al. [50] used a systems pharmacology method to reveal that three natural compounds (baicalin, wogonin, and oroxylin A) in Huangqin (Scutellaria baicalensis Georgi) might overcome resistance to recognized oncology medicines by modifying the tumor immunosuppressive microenvironment.
(2) CVDHD, a database of chemical components of TCM linked to cardiovascular disorders [51]. The database was primarily based on the Chinese Herbal Pharmacopoeia, the Chinese Pharmacopoeia, the CHDD, and UNDD produced by the research team, and it comprised information on herbs, natural products, and protein targets relevant to cardiovascular diseases. It can be used for natural product drug discovery, for studying the molecular mechanisms of TCM in cardiovascular diseases, and for network pharmacology studies of natural products connected to cardiovascular diseases. For example, Liang et al. [52] used a molecular docking approach to explore the interaction between numerous essential proteins in the cardiovascular system and the formulae's components, indicating the possible causes of the formulae's synergistic effects on cardiovascular protection.
(3) CEMTDD, a database of TCM for ethnic minorities in China's Xinjiang Uyghur area [53]. The database covers 621 herbs, 4060 chemicals, 2163 targets, and 210 diseases, most of which are used in geriatric medicine to treat inflammation, cardiovascular and neurodegenerative diseases. PubMed, the Chinese Pharmacopoeia, SciFinder, CNKI, and other databases were the main data sources for CEMTDD. Furthermore, CEMTDD has connections to OMIM and UniProt. The architecture of CEMTDD allows for research on molecular mechanisms in TCM and the development of future combination therapies (especially in geriatrics). For example, Pang et al. [54] found that Bai Xuan Xia Ta Re Pian (BXXTR) can inhibit psoriasis inflammation synergistically through multiple targets modulated by multiple components, making a significant contribution to the study of the active ingredients and mechanism of action of BXXTR for the treatment of psoriasis inflammation.
(4) SymMap, a database of TCM evidence and symptom associations [55]. It comprised 499 herbs and their 1717 TCM symptoms, as well as 5235 disorders associated with these symptoms, 19595 herbal components, 4302 pharmacological targets, and the connections between these six types of data. SymMap integrated TCM with contemporary medicine at the phenotypic and molecular levels. It enabled the integration of phenotype- and target-based knowledge, as well as the development of new medications using phenotype-based compound screening guided by current compound target and disease target data. For example, Yang et al. [56] discovered that the anti-inflammatory action of Ma Xing Shi Gan Decoction may be responsible for the therapeutic efficacy of Qingfei Paidu Decoction against COVID-19.
As reviewed in this paper, the ancient literature databases are primarily used for teaching and study, whereas the comprehensive and specialized databases are primarily used for research with modern medical methods. Databases such as TCMD, TCM@Taiwan, and TM-MC added relevant chemical composition data to help researchers study TCM at the molecular level. Building on this, databases such as TCM-ID, TCMID, YaTCM, BATMAN-TCM, and ETCM continued to enrich data content and data volume by adding various targets and other information, further assisting researchers in work such as new drug discovery. Meanwhile, TCMGeneDIT, HIT, TCM-Blast, and other TCM databases that include genetic data can help scholars perform meaningful TCM research in the setting of life sciences. As associated technologies and methodologies advance, the TCMSP, TCM-Mesh, and TCM-Suite databases can be used for systems pharmacology and network pharmacology research, while the recently created HERB database can be used for high-throughput transcriptome screening studies. Furthermore, TCM databases in related specialist disciplines, such as TCMIO, CVDHD, CEMTDD, and SymMap, have evolved alongside the ongoing development of comprehensive databases. All of them can be employed in modernization research on TCM in the specialized fields they cover.
Finally, it can be seen that database independence is diminishing, database association is growing, and database integrity and efficiency are improving. Correspondingly, the support of TCM databases is also essential for applications in life science and artificial intelligence, such as the development of intelligent diagnostic systems based on deep learning and the identification of active ingredient targets based on proteomics. Clearly, tracking the development of related technologies and integrating the large volumes of data generated by associated scientific fields will be major development directions for TCM databases in the future.
The possible development trend of TCM database
TCM, being a valuable resource, plays a vital part in China's economic and social growth. As China's modernization continues, population aging deepens, and the healthcare industry develops rapidly, people's demand for TCM grows, necessitating the inheritance and modernization of the TCM industry. However, TCM is a complex system with many components, multiple targets of action, and multiple pathways of action, and the material foundation and mechanism of action of its medicinal benefits are difficult to clarify [57]. To overcome the difficulties confronting TCM research and development, it is vital to combine TCM with fast-growing life sciences and other technologies, as well as to make full use of the relevant existing data resources, in order to seize the chance for progress.
Technologies connected to the modernization of TCM are now widely applied and have resulted in significant breakthroughs. Modern analytical methods, such as quantum medicine, molecular biotechnology, and computerized virtual screening techniques, have been used to study the composition of TCM [58]. Metabolomics and other omics methods have been used to study the mechanism of action of TCM; network pharmacology methods have been used to study the potential active ingredients and targets of action of prescriptions; and artificial intelligence technology has been used for data mining, assisted diagnosis, and new drug development. It should be obvious that the above technologies cannot be used without the support of data. The importance of data is self-evident, especially in the contemporary era of big data. The TCM database is a prerequisite and basis for obtaining successful outputs from various new technologies, and building it is a major task in the modernization of TCM. Therefore, we believe that breakthroughs in the construction of TCM databases can be sought in the following areas:
(1) Align closely with the development requirements of new life science and artificial intelligence technologies. Research methods based on transcriptomics, proteomics, metabolomics, and other high-throughput analysis and detection technologies do not focus on a single target and pathway, but rather evaluate thousands of targets and pathways to screen for changes in genes, proteins, metabolites, and other factors involved in biological signal transduction. Furthermore, artificial intelligence technologies are increasingly used in the field of TCM, for example in drug-target interaction prediction based on deep learning, protein interaction prediction, DNA sequence analysis, drug action target identification for complex network diseases, and data mining of TCM based on machine learning. All of the above studies require a large amount of high-quality data, and as the largest data source for scientific research, the data in TCM databases must be compatible with cutting-edge technologies.
(2) Data on TCM should preferably be comprehensive, verifiable, and up-to-date. As research continues, data is accumulated and updated, and text mining techniques can therefore be used to keep TCM databases updated automatically. Furthermore, data coverage must be broadened to include biological sources of pharmaceuticals, distribution, traditional applications, efficacy, harmful side effects, clinical applications, and references, as well as the corresponding relationships between different data types. Moreover, data such as the ingredients of herbs and prescriptions should be accompanied by citations of their sources to ensure authenticity and reliability.
(3) Encourage uniform construction standards. There are discrepancies in terminology between traditional medicine and modern medicine, for example in disease classification and pathological descriptions. Therefore, when developing TCM databases, it is critical to integrate the two and establish consistent standards [59]. Furthermore, differing database construction standards make integration and sharing between databases problematic, so in the future, we must optimize data and adopt unified standards to produce standardized TCM databases.
(4) Take full advantage of sophisticated computing technology and progress toward platformization. Elastic expansion, vast storage, low cost, safety, and reliability are the fundamental guarantees of a TCM database's healthy future development. Furthermore, building a user-friendly database platform has become mainstream. In addition to traditional database search functions, functions such as target prediction for Chinese medicine components, biofunctional analysis of targets, and association network visualization are required.
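The verifiability requirement in point (2) could be enforced when records are loaded into a database, for instance by flagging any association that lacks a supporting reference. The sketch below is one possible form of such a check, not a method proposed by the source; the record layout, field names, and identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Association:
    """One herb-compound-target record (hypothetical layout)."""
    herb: str
    compound: str
    target: str
    references: List[str] = field(default_factory=list)  # e.g. literature identifiers


REQUIRED_FIELDS = ("herb", "compound", "target")


def validation_errors(record: Association) -> List[str]:
    """List the problems that make a record incomplete or unverifiable."""
    errors = [
        f"missing {name}"
        for name in REQUIRED_FIELDS
        if not getattr(record, name).strip()
    ]
    # A record counts as verifiable only if at least one non-blank reference is attached.
    if not any(ref.strip() for ref in record.references):
        errors.append("no supporting reference")
    return errors


if __name__ == "__main__":
    records = [
        Association("herb_1", "cmpd_a", "gene_x", ["ref-001"]),
        Association("herb_1", "", "gene_x"),
    ]
    for record in records:
        print(record.herb, record.compound or "<blank>", validation_errors(record))
```

Returning a list of problems instead of raising on the first one lets a curator see everything that is missing from a record in a single pass.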
Conclusion
TCM databases, as the underlying service facilities for scientific research, not only provide data support for TCM modernization research but also speed up the update and distribution of TCM and associated information. Considering the construction of existing databases, it is clear that the development of TCM databases will be in line with cutting-edge technologies. Consequently, it is important to concentrate on the following issues when building a TCM database: (1) analyzing the data requirements of cutting-edge technologies to create the TCM database's data dictionary and provide data support for its development; (2) continually increasing the data volume and broadening the data coverage, in addition to using methods like text mining to track the most recent literature and maintain data timeliness and reliability; (3) making the database platform user-friendly and diverse in its analytical functions. However, the development of TCM databases and their application technologies should adhere to the scientific components of TCM's theoretical system and compensate for its shortcomings by actively comparing it to modern medicine.