A knowledge-guided and traditional Chinese medicine informed approach for herb recommendation∗
2023-11-06
Zhe JIN, Yin ZHANG‡, Jiaxu MIAO, Yi YANG, Yueting ZHUANG, Yunhe PAN
College of Computer Science and Technology,Zhejiang University,Hangzhou 310027,China
Abstract: Traditional Chinese medicine (TCM) is an interesting research topic in China’s thousands of years of history.With the recent advances in artificial intelligence technology,some researchers have started to focus on learning the TCM prescriptions in a data-driven manner.This involves appropriately recommending a set of herbs based on patients’ symptoms.Most existing herb recommendation models disregard TCM domain knowledge,for example,the interactions between symptoms and herbs and the TCM-informed observations(i.e.,TCM formulation of prescriptions).In this paper,we propose a knowledge-guided and TCM-informed approach for herb recommendation.The knowledge used includes path interactions and co-occurrence relationships among symptoms and herbs from a knowledge graph generated from TCM literature and prescriptions.The aforementioned knowledge is used to obtain the discriminative feature vectors of symptoms and herbs via a graph attention network.To increase the ability of herb prediction for the given symptoms,we introduce TCM-informed observations in the prediction layer.We apply our proposed model on a TCM prescription dataset,demonstrating significant improvements over state-of-the-art herb recommendation methods.
Key words: Traditional Chinese medicine;Herb recommendation;Knowledge graph;Graph attention network
1 Introduction
Traditional Chinese medicine(TCM)carries the experience and theoretical knowledge of the ancient Chinese people’s struggle against diseases and has played an indispensable role in health care for Asian people for thousands of years as the foundation of the Chinese medical system.It constantly combines the basic principles and methods of ancient Chinese philosophy with long-term specific practice.In recent decades,TCM is increasingly accepted worldwide(Cheung,2011),and with the continuous development of various medical specialities such as microbiology and immunology,more studies have begun to focus on the importance of TCM as a complementary and alternative therapy (Lu et al.,2022;Mao et al.,2022),providing new ideas and references for clinical practice.
Traditionally, TCM consultation has been mainly offline. Doctors obtain the symptoms of patients by four basic diagnostic methods called wang-wen-wen-qie and develop a prescription for treatment. The whole process is based mainly on the doctor’s experience in treating patients and clinical practice, which is highly subjective. Therefore, several studies have tried to learn such prescribing patterns and knowledge through the topic model and other machine learning algorithms (Yao et al., 2018; Zhao GS et al., 2018). For example, Yao et al. (2018) proposed a topic model to characterize the treatment process by regarding herbs and symptoms as observed variables, while treating syndromes, treatment methods, pathogeny, and other elements as hidden variables. However, the topic model has several limitations in such information extraction tasks of TCM prescription: (1) Most TCM texts are short and deficient in contextual information, which leads to data sparsity. The topic model is a bag-of-words model, which ignores the order and semantic relationships among words, and is therefore not suitable for semantic analysis. (2) The topics in TCM often overlap, which might confuse the topic model.
With the rapid development of artificial intelligence and deep learning, graph convolutional networks (GCNs) are gradually being used in tasks such as social network analysis and knowledge graph mining due to their high performance and interpretability. GCNs excel in leveraging the graph structure and aggregating node information from neighborhoods; hence, they have great expressive power to learn graph representations and have achieved superior performance in a wide range of tasks and applications (Zhang S et al., 2019). Because of the excellent performance of GCNs on graphs, researchers have suggested that the mutual relationship between the herbs and symptoms in TCM prescriptions can also be represented as a graph (Jin et al., 2020; Yang et al., 2022). For example, a TCM prescription in Dictionary of Traditional Chinese Medicine Prescriptions (Fig. 1) shows the name, composition, usage, function, and treatment. The herbs, such as Chinese Thorowax Root and Scutellaria baicalensis, are in the composition part, while symptoms, such as no appetite and vexation and vomiting, are listed in the treatment part. Jin et al. (2020) constructed a symptom-symptom synergy graph (S-S graph), an herb-herb synergy graph (H-H graph), and a symptom-herb synergy graph (S-H graph) based on prescriptions according to the frequencies of co-occurrence between herbs and symptoms. This method could be further improved by adding external knowledge and graph structure (Yang et al., 2022; Zhao W et al., 2022).
Fig. 1 An example traditional Chinese medicine (TCM) prescription “Minor Bupleurum Decoction” including composition, usage, function, and treatment. The herbs marked in red, orange, blue, and green are the jun (monarch), chen (minister), zuo (assistant), and shi (envoy) ones, respectively. This classification is obtained based on the TCM formulation principle and reveals the different roles that the herbs play in the prescription. References to color refer to the online version of this figure
Even though some of the previous studies have incorporated some external TCM knowledge,they still lack more comprehensive TCM knowledge to guide herb recommendation.The first type of knowledge concerns the TCM prescription process.In general,modern medicine understands and treats diseases mainly from the microscopic perspective of molecules,genes,and cells,while TCM is a holistic concept and uses evidence-based treatment.In TCM,the human body is an organic whole and the various components that make up the human body are structurally inseparable,functionally coordinated,and pathologically influenced by each other.Therefore,in the TCM therapeutic process,in addition to the doctor’s initial approach of collecting symptoms and the conclusion of prescribing treatment,there will be a syndrome induction stage between them.In this part,doctors will judge which syndrome the patient has on the basis of all symptoms and physical signs.In fact,one syndrome may encompass multiple symptoms and one symptom may appear in multiple syndromes,which makes the syndrome induction process complex.Inspired by Jin et al.(2020),we try to reproduce this prescription process and add the syndrome induction part with the possibility of multiple syndromes.
The second type of knowledge concerns the path interaction relationships from external knowledge graphs.Wang XY et al.(2019)used TransE(Bordes et al.,2013)to obtain knowledge graph embeddings,which are integrated into the topic model.Yang et al.(2022) constructed an herb knowledge graph,which is composed mainly of the herb’s effect,flavor,and tropism.Only herb’s one-hot embedding was used in the model by combination with the output of the H-H graph.In our study,we construct a much larger TCM knowledge graph that covers most of the mainstream TCM knowledge and provides richer and more comprehensive information.We also generate a subgraph based on the path interaction relationships between herbs and symptoms in the knowledge graph,which will be used to obtain the discriminative feature vectors of symptoms and herbs via a graph attention network(GAT).
The third type of knowledge is TCM-informed observations and patterns. Recently, a physics-informed neural network (PINN) (Raissi et al., 2019) was presented to enable the synergistic combination of mathematical models and data using principled physical laws as prior knowledge to act as a regularization agent. Similarly, during the long period of practice and development of TCM, many TCM-informed observations and patterns have been summarized. The most important and influential TCM-informed principle of prescription composition is jun-chen-zuo-shi from Shennong Bencao Jing (Variorum of Shennong’s Classic of Materia Medica), which symbolizes monarch, minister, assistant, and envoy, four types of people in government. This is a metaphor indicating that different herbs have distinct roles in the prescription. The monarch herbs, called jun, are the most irreplaceable and indispensable, and target the main cause or primary symptoms of a disease. The chen herbs assist the jun herbs in enhancing their effects, just like the minister. The zuo herbs are ancillary herbs that are used mainly to improve the effects of jun and chen and relieve or eliminate the toxicities or side effects of these herbs. The shi herbs harmonize other herbs and guide them to target organs or disease locations. In Fig. 1, a famous prescription, “Minor Bupleurum Decoction,” is composed of several herbs. Chinese Thorowax Root, marked in red, is the jun herb and relieves exterior and fever. The main symptoms treated by Chinese Thorowax Root are alternating chills and fever, and chest fullness or pain, which can be found in the treatment part of the whole prescription. Scutellaria baicalensis, marked in orange, is the chen herb because it clears heat and reconciles lesser yang. Ginseng, Pinellia Tuber, Fresh Ginger, and Chinese Dates, marked in blue, are the zuo herbs. They can reduce vomiting, nourish the spleen and stomach, provide stomach nutrients, and promote gastrointestinal digestion. Liquorice Root, marked in green, is used to regulate all herbs and plays the role of the shi herb. Note that the same herb may play different roles in different prescriptions.
Based on these three types of knowledge, in this work we propose a knowledge-guided and TCM-informed approach for herb recommendation. We construct a new large-scale TCM knowledge graph, TCMKG. Based on path interactions and co-occurrence relationships among symptoms and herbs from the knowledge graph, we determine the discriminative feature vectors of symptoms and herbs via a GAT. We also present a multi-syndrome prediction layer to simulate the possibility of multiple syndromes, motivated by the first knowledge type, and introduce the third knowledge type, the TCM-informed principle of prescription composition called jun-chen-zuo-shi, into the prediction layer.
The main contributions of this paper are summarized as follows:
1.We construct a large-scale TCM knowledge graph,TCMKG,which consists of herbs,prescriptions,pieces,diseases,symptoms,syndrome,tropism,properties,flavor,and functions,and covers most of the common knowledge in TCM.
2. We propose a knowledge-guided and TCM-informed approach combining path interactions and co-occurrence relationships from the knowledge graph, multi-syndrome prediction, and the TCM-informed principle of prescription composition.
3.Experimental results on a TCM prescription dataset show that our proposed approach has better performance than state-of-the-art TCM herb recommendation models.
2 Related works
2.1 Graph neural network
Graph neural networks (GNNs) are deep learning based methods that deal with arbitrary graph-structured data. Convolutional neural networks (CNNs) (LeCun et al., 1998) have achieved remarkable results in both computer vision (Girshick et al., 2014) and natural language processing (Gehring et al., 2017), but they can operate only on regular Euclidean data such as images and texts, whereas GNNs are applied to graphs in a non-Euclidean space. Based on CNNs and graphs, variants of GNNs have been proposed to collectively aggregate information from graph structure (Xu et al., 2019). GNNs have been used in a wide variety of fields, such as side effect prediction (Zitnik et al., 2018), traffic state prediction (Guo et al., 2019; Zheng et al., 2020), and chemical reaction prediction (Do et al., 2019).
Graph convolutional networks(GCNs)are variants of GNNs that generalize convolutional operation on the graph-structured data.GCNs fall into two categories: spectral information based methods and spatial information based methods.Spectral information based methods work with a spectral representation of the graphs and define convolution from the perspective of graph signal processing.For example,Bruna et al.(2014) extended convolution to general graphs,and Kipf and Welling (2017) proposed a localized first-order approximation of spectral graph convolutions.Spatial information based methods directly operate on the graph and propagate node information from groups of spatially close neighbors.Atwood and Towsley (2016) treated graph convolutions as a diffusion operation,and Hamilton et al.(2017)introduced GraphSAGE,which generates embeddings by sampling and aggregating features from a node’s local neighborhood.
Inspired by attention mechanisms (Vaswani et al., 2017), GAT (Veličković et al., 2018) assumes that the contributions of neighbor nodes to the central node are different and uses a multi-head attention mechanism to fuse the importance of neighbors. Zhang JN et al. (2018) introduced a self-attention mechanism to compute an additional attention score for each head. Wang X et al. (2019) proposed a heterogeneous GNN based on hierarchical attention, including node- and semantic-level attention, to fully use the information of different types of nodes and links.
2.2 Herb recommendation
Because medical records contain the core ideas of TCM,the prescriptions in them play an important role in the inheritance and practice of TCM clinical experience.Herb recommendations based on prescriptions include two main types of approaches.
2.2.1 Topic models
The topic model, which is usually used in natural language processing, provides a more principled approach for developing novel models for latent variable discovery and data analysis. TCM prescriptions are treated as documents containing herbs and symptoms as words, and the topic model may assume that herbs and symptoms occurring under the same topic are similar. Ma and Wang (2016) used a symptom-syndrome model to discover the correlation between symptoms and latent syndrome topics. Lin et al. (2016) considered “pathogenesis” as the latent topic to connect symptoms and herbs. Wang S et al. (2016) designed an asymmetric probability generation model to model symptoms, herbs, and diseases simultaneously. Ji et al. (2017) used a multi-content topic model for prescription recommendation by regarding pathogenesis as the latent topic to associate symptoms and herbs. Yao et al. (2018) proposed a prescription topic model inspired by li-fa-fang-yao, which represents principles, methods, prescriptions, and Chinese herbs, indicating the basic steps of diagnosis and treatment. Zhao GS et al. (2018) proposed a double-end fusion recommendation framework, including methods of adjusting weight proportion and similarity remapping. Wang XY et al. (2019) proposed the knowledge graph embedding enhanced topic model (KGETM), which adds co-occurrence information in TCM cases and comprehensive semantic relatedness of symptoms and herbs in the TCM knowledge graph. Chen et al. (2018) introduced TCM domain knowledge into a topic model to capture the herb compatibility regularities.
These models exceed traditional non-topic models in terms of generalizability and herb recommendation.However,the topic model is based on the bag-of-words concept,ignores the order and semantic relationships between words,and fails to adequately simulate the multiple relationships in TCM knowledge.Moreover,the topics in TCM often overlap,which might confuse the topic model.
2.2.2 Neural network models
Deep learning based methods have proven their effectiveness in several recent studies. Li and Yang (2017) adopted bi-directional recurrent neural networks (BRNNs) for text representation learning of the herb words in the TCM literature for the treatment complement task. Ruan et al. (2019a, 2019b) integrated the auto-encoder model with meta-path to mine the TCM heterogeneous information network. Li and Yang (2019) proposed a novel sequence-to-sequence model whose decoder is designed with a coverage mechanism and a soft loss function. As mentioned above, GNNs were proposed to leverage the graph structure and aggregate node information from the neighborhoods. Based on GNNs, which can fully learn graph representations, several methods have been proposed using the prescriptions’ own graphs and external graphs for herb recommendation. Zhou et al. (2021) fused a CNN with phenotype and molecule information. Jin et al. (2020) obtained low-dimensional embeddings of symptoms and herbs through a GCN, and then proposed a syndrome-aware mechanism to obtain the embedding of symptom sets in the prescription. Yang et al. (2022) constructed the herb knowledge graph and proposed multi-layer information fusion based on a GCN for knowledge-driven herb recommendation, using the properties of herbs in the knowledge graph as auxiliary information to help the model better fit the feature representation of herbs. Zhao W et al. (2022) considered the n-ary relationships among symptoms, state elements, syndrome types, and herbs, and proposed a prescription recommendation model based on a multi-graph convolutional network. Although some research has incorporated additional TCM knowledge, such as a fixed vector of herbs trained through knowledge graphs, more comprehensive TCM knowledge to guide herb recommendation is still lacking. In this work, we use a GAT to dynamically train both path interaction and co-occurrence relationship graphs to ensure that the representation of each node is also continuously updated during training.
3 Proposed methodology
We formulate the definition of herb recommendation in Section 3.1,and introduce our TCM knowledge graph and the overall architecture of our approach in Sections 3.2 and 3.3,respectively.
3.1 Problem definition
Herb recommendation aims to learn the recommended route of prescribing from symptoms to herbs based on a large training corpus of prescriptions; we formulate this problem as follows. Let $S=\{s_1,s_2,\ldots,s_M\}$ represent a symptom set containing $M$ different symptoms and $H=\{h_1,h_2,\ldots,h_N\}$ an herb set containing $N$ different herbs. Each prescription $p=\langle\{s_1,s_2,\ldots,s_m\},\{h_1,h_2,\ldots,h_n\}\rangle$ consists of at least one set of herbs and one set of symptoms (an example is shown in Fig. 1). The introduced directed TCM knowledge graph $G=(V,E)$ contains $n$ TCM-related nodes $V=\{V_1,V_2,\ldots,V_n\}$ and edges $E\subseteq V\times V$, where $(V_j,V_i)\in E$ denotes an edge from node $V_j$ to node $V_i$. We assume that every node $V_i\in V$ is represented by a $d$-dimensional vector.
Given a symptom set $sset$, the task of herb recommendation is to recommend an herb set $hset$. Usually this task is converted to learning a prediction function $\hat{y}_{sset}=f(sset,H\mid\theta)$, where $\hat{y}_{sset}$ represents the probability vector and $\theta$ indicates the trainable parameters. The value of dimension $i$ of $\hat{y}_{sset}$ represents the probability of the $i$-th herb in the recommendation.
3.2 Knowledge graph
Because there is no authoritative and public structured TCM knowledge graph,we have constructed our TCM knowledge graph by Neo4j.Texts from TCM textbooks,dictionaries,standards,and other officially released data are extracted into triples as the basic unit of the knowledge graph.Triples are 3-tuples consisting of a subject,a predicate (or relation),and an object,usually expressed as (s,r,o).Both the subject and the object are regarded as entities,and the triples contain the relationship between entities.
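As a minimal sketch of this representation, the snippet below stores a few (s, r, o) triples in Python and groups them by subject. The relation names (`contains_herb`, `treats_symptom`) and the `Triple` and `index_by_subject` helpers are hypothetical and chosen only for illustration; they are not TCMKG's actual schema. The entities are taken from the Fig. 1 example.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


# Illustrative triples; relation names are assumptions, not the TCMKG schema.
triples = [
    Triple("Minor Bupleurum Decoction", "contains_herb", "Chinese Thorowax Root"),
    Triple("Minor Bupleurum Decoction", "contains_herb", "Scutellaria baicalensis"),
    Triple("Minor Bupleurum Decoction", "treats_symptom", "no appetite"),
]


def index_by_subject(items):
    """Group (relation, object) pairs under each subject entity."""
    index = defaultdict(list)
    for t in items:
        index[t.subject].append((t.relation, t.obj))
    return dict(index)


kg_index = index_by_subject(triples)
```

Grouping by subject is only one possible access pattern; a graph database such as Neo4j maintains comparable indexes internally.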
We pre-define 10 types of entities based on the field of TCM: herb, prescription, pieces, disease, symptom, syndrome, tropism, property, flavor, and function. The pre-defined relations between them are shown in Fig. 2. Property, flavor, and tropism often appear in the description text of the herb, showing the understanding of the herb from the perspective of Chinese medicine. The property refers to the four natures, or four characteristics of cold, heat, warm, and cool, which reflect the body’s semi-interior phase change after the herb acts on the body. Flavor means the herb’s localization in Yin and Yang. In the five flavors, pungent and sweet belong to Yin, whereas sour, bitter, and salty belong to Yang. Tropism describes the therapeutic effect of the herb on the pathology of certain viscera and meridians. The remaining entities are common and can be easily defined.
Fig.2 Pre-defined types of entities and relations of TCMKG(Rectangles represent entities and arrows indicate relationships between two entities)
All entities and relationships are stored in the knowledge graph as triples.We have constructed 128 357 entities and 1 138 718 triples in TCMKG.Detailed information about TCMKG can be found in the Appendix.
3.3 Proposed model
Our proposed approach, called MKMI-GAT, is shown in Fig. 3, and the details of the graph embedding module are shown in Fig. 4. The graph embedding module takes symptom list $S=\{s_1,s_2,\ldots,s_M\}$, herb list $H=\{h_1,h_2,\ldots,h_N\}$, and TCM knowledge graph $G=(V,E)$ as input, and updates all embeddings of herbs and symptoms during each training iteration. The whole model MKMI-GAT takes a symptom set $sset=\{s_1,s_2,\ldots,s_m\}$ as input in both the training and testing stages, and outputs the predicted probability vector $\hat{y}_{sset}$, in which the value of dimension $i$ represents the probability of herb $i$ in the recommendation.
Fig.3 The overall architecture of our proposed approach,MKMI-GAT
3.3.1 Graph embedding module
We construct two graphs for learning discriminative feature embeddings of herbs and symptoms(Fig.4).The first graph is the path interaction graph.Distinct from other research efforts that have not fully used knowledge graph information,we incorporate TCMKG into the training of the entire model.For each herb and symptom that have occurred in prescriptions,we use TCMKG to take side information into account.
Assuming that a TCM prescription $p=\langle\{s_1,s_2\},\{h_1,h_2\}\rangle$ has two symptoms and two herbs, all the subgraphs of these four nodes are shown in the medical knowledge graph part of Fig. 4. Based on Neo4j, we use the all-shortest-paths function in Cypher to find the most relevant relationships or paths between two nodes while avoiding excessive computational burden. Every pair of nodes is queried once by the all-shortest-paths function. We regard these paths as their path interaction and collect all paths to construct a path interaction graph. For instance, as we see in Fig. 4, the shortest path between $h_1$ and $h_2$ is via a green node above them. In TCMKG, different herbs might have the same functions, properties, and flavors, which are exactly the possible categories of this green node. The shortest path between $h_1$ and $s_1$ can pass through a yellow node (syndrome entity) or a red node (disease entity), and both paths will be counted. The path interaction graph is defined as follows:
where $n_j,n_k\in p$, and $f_{asp}$ is the all-shortest-paths function in Cypher and is applied in TCMKG.
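The construction of the path interaction graph can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' Cypher query: a breadth-first search stands in for the all-shortest-paths function, edges are treated as undirected, and the toy graph with its node names (`function_x`, `syndrome_y`, `disease_z`) is hypothetical.

```python
from collections import deque
from itertools import combinations


def all_shortest_paths(adj, src, dst):
    """Return every shortest path from src to dst (BFS with parent lists)."""
    dist = {src: 0}
    parents = {src: []}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break  # all predecessors of dst have already been expanded
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                parents[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                parents[nxt].append(node)  # another equally short route
    if dst not in dist:
        return []
    paths = []

    def walk(node, suffix):
        if node == src:
            paths.append([src] + suffix)
            return
        for parent in parents[node]:
            walk(parent, [node] + suffix)

    walk(dst, [])
    return paths


def path_interaction_edges(adj, prescription_nodes):
    """Union of the edges on all shortest paths between every node pair."""
    edges = set()
    for a, b in combinations(sorted(prescription_nodes), 2):
        for path in all_shortest_paths(adj, a, b):
            for u, v in zip(path, path[1:]):
                edges.add(frozenset((u, v)))
    return edges


# Hypothetical toy knowledge graph (undirected adjacency lists).
adj = {
    "h1": ["function_x", "syndrome_y", "disease_z"],
    "h2": ["function_x"],
    "s1": ["syndrome_y", "disease_z"],
    "function_x": ["h1", "h2"],
    "syndrome_y": ["h1", "s1"],
    "disease_z": ["h1", "s1"],
}
edges = path_interaction_edges(adj, {"h1", "h2", "s1"})
```

In the toy graph, both the syndrome route and the disease route between `h1` and `s1` are kept, mirroring the statement that both paths are counted.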
The second graph was inspired by Jin et al.(2020,2022).We construct a co-occurrence relationship graph based on the co-occurrence frequency in prescriptions,which represents the intrinsic therapeutic correlations between herbs and symptoms.
Taking a prescription $p'=\langle\{s_1,s_2,\ldots,s_m\},\{h_1,h_2,\ldots,h_n\}\rangle$ as an example, we can obtain new treating-relationship edges according to the co-occurrence of its symptoms and herbs, in the form $\{(s_1,h_1),\ldots,(s_1,h_n),\ldots,(s_m,h_1),\ldots,(s_m,h_n)\}$. The whole co-occurrence relationship graph is defined as follows:
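A minimal Python sketch of this pairing step is given here, assuming that each prescription is a pair of symptom and herb lists and that the co-occurrence frequency is kept as an edge weight. The function name `cooccurrence_edges` and the toy prescriptions are hypothetical; whether the frequencies are thresholded before building the graph is not stated in the text.

```python
from collections import Counter
from itertools import product


def cooccurrence_edges(prescriptions):
    """Count how often each (symptom, herb) pair appears in one prescription."""
    counts = Counter()
    for symptoms, herbs in prescriptions:
        # Every symptom is paired with every herb of the same prescription.
        for s, h in product(set(symptoms), set(herbs)):
            counts[(s, h)] += 1
    return counts


# Hypothetical toy corpus: two prescriptions.
prescriptions = [
    (["s1", "s2"], ["h1", "h2"]),
    (["s1"], ["h1"]),
]
co_edges = cooccurrence_edges(prescriptions)
```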
3.3.2 Graph attention network
After the construction of the path interaction graph and co-occurrence relationship graph,the embeddings of nodes in the embedding layer are summarized as follows:
where $h$, $s$, and $o$ refer to herbs, symptoms, and other nodes, respectively, and $e_h\in\mathbb{R}^{N\times d}$, $e_s\in\mathbb{R}^{M\times d}$, and $e_o\in\mathbb{R}^{|O|\times d}$, where $|O|$ is the number of all other nodes besides herbs and symptoms. $V_G$ represents all nodes in graph $G$, and all the embeddings are randomly initialized and trained during optimization. With the initial representation of all nodes, according to GATv2 (Brody et al., 2022), the scoring function for every edge $(j,i)$, used to indicate the importance of the features of neighbor $j$ to node $i$, is formulated as follows:
where $a\in\mathbb{R}^{D'}$, $W\in\mathbb{R}^{D'\times 2D}$, $D$ is the dimension of the hidden layers, $D'$ is an internal hyperparameter, and concat() denotes vector concatenation. The attention function that calculates the attention scores across all neighbors $j'\in\mathrm{Neigh}(i)$ is defined as
where $\mathrm{Neigh}(i)=\{j\mid j\in V,(j,i)\in E\}$. We obtain the new representation of $i$ by computing the weighted average of itself and the transformed features of its neighbor nodes as follows:
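The three steps (edge scoring, normalization over neighbors, and aggregation) can be sketched in plain Python following the published GATv2 formulation of Brody et al. (2022), in which the score is $a^{\top}\mathrm{LeakyReLU}(W\cdot\mathrm{concat}(h_i,h_j))$ and is normalized with a softmax. Several details are assumptions made for illustration: the separate message matrix `W_msg`, the inclusion of node $i$ in its own neighbor list, the omission of the output nonlinearity, and all numeric values.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def leaky_relu(vector, slope=0.2):
    return [x if x > 0 else slope * x for x in vector]


def gatv2_scores(h, neighbors, i, W, a):
    """e(i, j) = a^T LeakyReLU(W . concat(h_i, h_j)) for each neighbor j of i."""
    scores = {}
    for j in neighbors[i]:
        hidden = leaky_relu(matvec(W, h[i] + h[j]))  # list '+' concatenates
        scores[j] = sum(ak * zk for ak, zk in zip(a, hidden))
    return scores


def softmax(scores):
    """Normalize the scores of one node's neighbors so they sum to 1."""
    top = max(scores.values())
    exps = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


def aggregate(h, alpha, W_msg):
    """Attention-weighted sum of transformed neighbor features."""
    out = [0.0] * len(W_msg)
    for j, weight in alpha.items():
        message = matvec(W_msg, h[j])
        out = [o + weight * m for o, m in zip(out, message)]
    return out


# Toy example with D = D' = 2; all values are made up.
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [0, 1, 2]}  # node 0 attends to itself and two neighbors
W = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]  # D' x 2D
a = [1.0, -1.0]
W_msg = [[1.0, 0.0], [0.0, 1.0]]  # assumed D' x D message transform

alpha = softmax(gatv2_scores(h, neighbors, 0, W, a))
new_h0 = aggregate(h, alpha, W_msg)
```

In practice this computation would be batched with a tensor library; the loop form is used here only to keep each step visible.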
3.3.3 Representation fusion
We have obtained herb and symptom features from the path interaction (PI) graph and co-occurrence relationship (CO) graph as $h_{PI}$, $h_{CO}$, $s_{PI}$, and $s_{CO}$, respectively, after each epoch’s neighbor information aggregation. Combined with herb embedding $e_h$ and symptom embedding $e_s$, we have
The input symptom set $sset=\{s_1,s_2,\ldots,s_m\}$ is transformed into one-hot form $s_{onehot}$ and used as a lookup mechanism to obtain the full input symptom representation:
where $s_{in}\in\mathbb{R}^{M\times D}$.
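One way to read this lookup, sketched here under assumptions, is that the one-hot (multi-hot) vector masks the rows of the fused symptom matrix, so the result keeps the shape $M\times D$ with zero rows for absent symptoms. The helper names `multi_hot` and `masked_lookup` and the toy matrix are hypothetical; the exact operation used by the model is not recoverable from the text.

```python
def multi_hot(symptom_ids, num_symptoms):
    """Encode a set of symptom indices as a 0/1 vector of length M."""
    vec = [0] * num_symptoms
    for i in symptom_ids:
        vec[i] = 1
    return vec


def masked_lookup(s_onehot, symptom_matrix):
    """Keep rows of present symptoms and zero the others (shape stays M x D)."""
    return [[flag * x for x in row] for flag, row in zip(s_onehot, symptom_matrix)]


# Toy fused symptom matrix with M = 3 symptoms and D = 2 dimensions.
symptom_matrix = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
s_onehot = multi_hot([0, 2], num_symptoms=3)
s_in = masked_lookup(s_onehot, symptom_matrix)
```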
3.3.4 Multi-syndrome prediction layer
As mentioned above, one syndrome may encompass multiple symptoms and one symptom may appear in multiple syndromes. Therefore, after obtaining the representation of the current input set of symptoms, we add a multi-syndrome prediction layer to predict the possible syndromes by introducing the multi-head self-attention mechanism with $N$ heads:
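For reference, the standard multi-head self-attention of Vaswani et al. (2017), applied to $s_{in}$, can be written as below. This is the generic formulation and may differ from the exact variant used in the model; the projection matrices $W_n^{Q}$, $W_n^{K}$, $W_n^{V}$, and $W^{O}$ and the per-head dimension $d_k=D/N$ follow that reference and are assumptions here.

```latex
\begin{aligned}
\mathrm{head}_n &= \mathrm{softmax}\!\left(\frac{(s_{in}W_n^{Q})(s_{in}W_n^{K})^{\top}}{\sqrt{d_k}}\right)(s_{in}W_n^{V}), \qquad n=1,\ldots,N,\\
\mathrm{MultiHead}(s_{in}) &= \mathrm{concat}(\mathrm{head}_1,\ldots,\mathrm{head}_N)\,W^{O}.
\end{aligned}
```

Under this reading, each head corresponds to one simulated syndrome, which is why $D$ must be divisible by $N$.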
3.3.5 TCM-informed jun-chen-zuo-shi module
To add constraints of TCM-informed principles in deep learning, we propose a TCM-informed jun-chen-zuo-shi module to control the final prediction. We assume that there are four matrices $M_{jun}$, $M_{chen}$, $M_{zuo}$, and $M_{shi}\in\mathbb{R}^{M\times N}$ to represent the relationship between symptoms and four types of herbs. We collect each herb’s jun-chen-zuo-shi occurrence for every symptom and herb tuple in Dictionary of Traditional Chinese Medicine Prescriptions and initialize $M_{jun}$, $M_{chen}$, $M_{zuo}$, and $M_{shi}$ separately after L1 normalization.
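The initialization of the four matrices can be sketched as follows. The record format, the function name `build_role_matrices`, and the choice to L1-normalize each symptom row are assumptions made for illustration; the text does not state along which axis the normalization is applied.

```python
ROLES = ("jun", "chen", "zuo", "shi")


def build_role_matrices(records, num_symptoms, num_herbs):
    """Count (symptom, herb) occurrences per role, then L1-normalize each row.

    records: iterable of (symptom_ids, [(herb_id, role), ...]) pairs.
    """
    mats = {
        role: [[0.0] * num_herbs for _ in range(num_symptoms)] for role in ROLES
    }
    for symptom_ids, herb_roles in records:
        for s in symptom_ids:
            for herb, role in herb_roles:
                mats[role][s][herb] += 1.0
    for matrix in mats.values():
        for row in matrix:
            total = sum(abs(x) for x in row)
            if total > 0:  # leave all-zero rows untouched
                row[:] = [x / total for x in row]
    return mats


# Hypothetical annotated prescriptions with 2 symptoms and 2 herbs.
records = [
    ([0], [(0, "jun"), (1, "chen")]),
    ([0, 1], [(1, "jun")]),
]
mats = build_role_matrices(records, num_symptoms=2, num_herbs=2)
```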
In each epoch, the one-hot form of symptoms, $s_{onehot}$, will be calculated with the jun-chen-zuo-shi matrices to obtain their first-step jun-chen-zuo-shi probability vectors as follows:
We integrate TCM-informed principles of prescription composition into the subsequent steps. As we have illustrated above, jun herbs target the main cause and are the most irreplaceable and indispensable; thus, the input symptom list could directly determine the probability vector of jun:
Chen herbs are used to assist jun herbs in enhancing their effects. Thus, the choice of chen herb will be influenced by the prior decision on the jun herb:
Zuo herbs are assisting herbs, and are used mainly to improve the effects of jun and chen herbs and relieve or eliminate the toxicities or side effects of these herbs. To implement this principle, the model will learn to adjust the parameters of in training according to:
There are generally only a few fixed shi herbs in datasets, and not all prescriptions contain shi herbs. In our statistics, only 55.4% of the prescriptions have shi herbs, and in 71.1% of them the shi herb is Liquorice Root. As most shi herbs are unrelated to other herbs, shi receives no adjustment:
3.3.6 Herb recommendation
where $W_{syn}$ and $b_{syn}$ are trainable parameters, and $\alpha$ is a hyperparameter used to balance the two parts of the predictions.
We take the $K$ herbs with the highest predicted probabilities (TOP@$K$) as the recommendation result and use BCELoss (Braga-Neto, 2020) as the loss function.
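A minimal sketch of these two steps in plain Python is shown below: selecting the $K$ highest-probability herbs and computing a binary cross-entropy averaged over herbs. The function names `top_k` and `bce_loss`, the clipping constant `eps`, and the toy numbers are assumptions; an actual implementation would normally rely on the deep learning framework's built-in loss.

```python
import math


def top_k(probs, k):
    """Indices of the k herbs with the highest predicted probability."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def bce_loss(probs, targets, eps=1e-12):
    """Binary cross-entropy averaged over all herbs."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


probs = [0.1, 0.9, 0.5]   # toy predicted probabilities for three herbs
targets = [0, 1, 1]       # toy ground-truth multi-hot herb vector
recommended = top_k(probs, 2)
loss = bce_loss(probs, targets)
```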
4 Experiments and analysis
4.1 Dataset
We use the benchmark dataset collected by Yao et al.(2018).After pre-processing there are 33 765 prescription pieces containing 390 symptoms and 811 herbs.
4.2 Evaluation
Given a symptom set,our proposed model outputs a set of herbs to relieve the symptoms.To evaluate the performance of our model,we adopt the following measures,which are commonly used in recommendation systems:
where herb-set is the ground-truth herb set in the prescription, and Top(,K) is the set of the top $K$ herbs with the highest prediction scores.
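The results section reports precision, recall, and F1 at cutoff $K$. Under their usual definitions in recommendation (hits divided by $K$, hits divided by the size of the ground-truth set, and their harmonic mean), they can be computed as in the sketch below. The function name and the toy scores are hypothetical, and the exact formulas used in the evaluation are assumed to match these standard forms.

```python
def precision_recall_f1_at_k(scores, true_herbs, k):
    """Precision@K, Recall@K, and F1@K for one prescription."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:k])
    truth = set(true_herbs)
    hits = len(top & truth)
    precision = hits / k
    recall = hits / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy example: four candidate herbs, herbs 0 and 3 are the ground truth.
p_at_2, r_at_2, f1_at_2 = precision_recall_f1_at_k([0.9, 0.1, 0.8, 0.3], {0, 3}, 2)
```

Dataset-level figures would then be obtained by averaging these values over all test prescriptions.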
4.3 Baselines
We compare the proposed MKMI-GAT with the following baselines:
KDHR (Yang et al., 2022) proposes a multi-layer information fusion based on a GCN using the properties of herbs in the knowledge graph as auxiliary information to help the model better fit the feature representation of herbs.
SMGCN (Jin et al., 2020) develops a series of GCNs to simultaneously learn the symptom embedding and herb embedding from the symptom-herb, symptom-symptom, and herb-herb graphs, and recommends herbs using the syndrome induction process.
HC-KGETM (Wang XY et al., 2019) integrates the TransE embeddings obtained from a TCM knowledge graph into the topic model, and adds co-occurrence information in TCM cases and comprehensive semantic relatedness of symptoms and herbs in a TCM knowledge graph.
GCNConv (Kipf and Welling, 2017) introduces the degree matrix of the node to solve the problem of self-loops and the normalization of the adjacency matrix, and updates the node by adding the embeddings of neighbor nodes.
GATConv (Veličković et al., 2018) proposes a shared attention mechanism to distinguish the importance of different neighboring nodes. Every node updates its representation by attending to its neighbors.
4.4 Implementation details
We implement our model MKMI-GAT and the comparative methods using PyTorch. For the topic model HC-KGETM, we follow the parameter settings in Wang XY et al. (2019). For other neural network models, we perform fivefold cross-validation on the training set. For SMGCN and KDHR, the initial learning rate is 0.0003. For MKMI-GAT and other baseline models, the initial learning rate is 0.0001. The learning rate is attenuated every seven epochs. The dropout ratio of MKMI-GAT is 0.5. The number of epochs is set to 200. An early stopping strategy is applied, which means that the model stops training when the loss on the validation set does not decrease for seven consecutive epochs. We use the Adam optimizer (Kingma and Ba, 2015) to train MKMI-GAT with a batch size of 256. The batch size is 512 for other methods. For MKMI-GAT, the dimensions of the embedding layer and the hidden layer are both 256. $\alpha$ is initialized with 0.5. GATConv, GCNConv, HC-KGETM, and SMGCN do not use any information from the knowledge graph. KDHR uses herb embeddings from TCMKG based on TransE.
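The schedule and stopping rule described here can be sketched in plain Python. The step size of seven epochs, the patience of seven epochs, and the 200-epoch limit come from the text; the decay factor `gamma=0.5`, the function names, and the toy loss curve are assumptions, since the attenuation factor is not reported.

```python
def step_decay(initial_lr, epoch, step=7, gamma=0.5):
    """Learning rate after attenuating by `gamma` every `step` epochs."""
    return initial_lr * gamma ** (epoch // step)


def early_stopping_epoch(val_losses, patience=7, max_epochs=200):
    """Epoch index at which training stops, given per-epoch validation losses."""
    best = float("inf")
    stale = 0
    losses = val_losses[:max_epochs]
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch  # no improvement for `patience` epochs in a row
    return len(losses) - 1


# Toy validation curve: improves twice, then plateaus.
toy_losses = [1.0, 0.9] + [0.95] * 10
stop_epoch = early_stopping_epoch(toy_losses)
```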
4.5 Results and analysis
The results of MKMI-GAT and the baseline models are shown in Table 1.
Table 1 Performance evaluation of different models
MKMI-GAT achieves the best performance among the compared models. It outperforms KDHR, whose performance ranks first among the baseline approaches, by 18.0%, 19.0%, and 18.6% in terms of F1@5, F1@10, and F1@20, respectively, showing that by integrating the path interaction and co-occurrence relationship graphs, the multi-syndrome prediction layer, and the TCM-informed jun-chen-zuo-shi module, our model can learn a better representation of herbs and symptoms and output a more accurate prediction of herbs.
Among GCNConv, GATConv, HC-KGETM, SMGCN, and KDHR, the GATConv and HC-KGETM models have the worst performances. For GATConv, the reason might be that the network, fused only with the attention mechanism, might lose too much global information. The problem with HC-KGETM (Wang XY et al., 2019) is that it ignores symptom set information and is weaker in collecting high-order information from the graph compared with GCN and GAT. With the construction of symptom-herb, symptom-symptom, and herb-herb graphs, SMGCN (Jin et al., 2020) learns better symptom embedding and herb embedding, and performs better than HC-KGETM. KDHR (Yang et al., 2022) uses a multi-layer information fusion mechanism to integrate the different levels of information obtained from each layer of GCN, adds external herb information from an herb knowledge graph, and outperforms the other baseline methods, which shows the importance of external knowledge.
4.5.1 Ablation study
To better understand and investigate the individual components of our proposed MKMI-GAT model, we record the results of ablation analysis (Table 2) and have the following findings: (1) Overall, all the knowledge integrated into our proposed model, including the graph embedding module, multi-syndrome prediction layer, and TCM-informed jun-chen-zuo-shi module, is verified to be effective based on the better performance of the full MKMI-GAT model. (2) Without the TCM-informed jun-chen-zuo-shi module, the accuracy and recall both drop slightly, indicating that the module improves the model and finds one or two correct herbs based on principles of prescription composition. (3) Both the path interaction and co-occurrence relationship graphs are necessary for the representation learning of herbs and symptoms. (4) The most effective part is the multi-syndrome prediction layer. Without this layer, there is a significant drop in precision. The reason for this is that there is often more than one symptom in the input prescription, and there may be primary and secondary relationships between them, as well as concomitant and other relationships. Through the multi-head self-attention mechanism in the multi-syndrome prediction layer, the model can learn multiple symptom-focused patterns to decide which symptom should be prioritized and to predict the output list of herbs. When we go further and remove the whole syndrome layer, the performance of Recall@20 drops severely, which indicates that for those prescriptions with a small number of symptoms, the generalization ability of the model becomes weaker when it lacks the attention mechanism.
Table 2 Ablation study results for MKMI-GAT
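The ablation findings are stated in terms of ranking metrics such as precision and Recall@20. As a minimal sketch, assuming the usual top-K definitions (hits among the K highest-ranked herbs, divided by K for precision and by the number of ground-truth herbs for recall), these metrics could be computed for a single prescription as follows. The function name and the placeholder herb names are hypothetical and are not taken from the paper's code or data.

```python
def precision_recall_at_k(ranked_herbs, true_herbs, k):
    """Return (precision@k, recall@k) for one prescription.

    ranked_herbs: herbs ordered by predicted probability, highest first.
    true_herbs:   herbs in the ground-truth prescription.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(true_herbs)
    # Count how many of the top-k recommended herbs are in the prescription.
    hits = sum(1 for herb in ranked_herbs[:k] if herb in truth)
    precision = hits / k
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical example: names such as "herb_c" are placeholders only.
ranked = ["cinnabar", "musk", "herb_c", "herb_d", "herb_e"]
truth = ["cinnabar", "musk", "herb_x", "herb_y"]
p, r = precision_recall_at_k(ranked, truth, k=5)
```

In an evaluation over a full test set, the per-prescription values would typically be averaged. Whether the paper averages in exactly this way is an assumption here.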
4.5.2 Impact of the syndrome number
We investigate the impact of the number of heads in the multi-syndrome prediction layer when using self-attention. The number of heads corresponds to the number of syndromes that we simulate and induce in the prediction layer. The results for a single head, i.e., a single syndrome, are recorded in Table 2. Because the embedding dimension must be divisible by the number of heads, we test two further cases, with two and four syndromes. As shown in Fig.5, the model performs better when the syndrome number is 2. This may be because, when the number of pre-defined syndromes is too large, the complexity of the model increases and the prediction accuracy decreases in the case of single-symptom input.
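The divisibility constraint on the number of heads can be illustrated with a minimal NumPy sketch of multi-head self-attention over a set of symptom embeddings. This is not the authors' implementation. The embedding size of 64, the random projection matrices (standing in for learned weights), the omission of an output projection, and all function names are assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, num_heads, rng):
    """Apply scaled dot-product self-attention with `num_heads` heads.

    x: array of shape (n_symptoms, d). Returns (output, attention_weights)
    with shapes (n_symptoms, d) and (num_heads, n_symptoms, n_symptoms).
    """
    n, d = x.shape
    if d % num_heads != 0:
        raise ValueError("embedding dimension must be divisible by num_heads")
    d_head = d // num_heads
    # Random matrices stand in for learned query/key/value projections.
    w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

    def split(m):
        # (n, d) -> (num_heads, n, d_head): each head sees a slice of the features.
        return m.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    out = weights @ v  # (num_heads, n, d_head)
    # Concatenate the heads back into one vector per symptom.
    return out.transpose(1, 0, 2).reshape(n, d), weights


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 64))  # five symptoms, assumed embedding size 64
for heads in (1, 2, 4):
    out, weights = multi_head_self_attention(x, heads, rng)
    print(heads, out.shape, weights.shape)
```

Under this reading, each head yields its own attention pattern over the input symptoms, which corresponds to one simulated syndrome. A head count such as 3 is rejected for a 64-dimensional embedding, which is why only head counts that divide the embedding dimension can be compared.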
4.5.3 Study of the TCM-informed jun-chen-zuo-shi module
We randomly select an example to visualize the four symptom-herb mapping matrices in the TCM-informed jun-chen-zuo-shi module after training, and the results are shown in Fig.6. In this example, the five symptoms on the vertical axis from top to bottom are asthma, opisthotonus, unconsciousness, fever, and epilepsy. The herbs on the horizontal axis are arranged from left to right according to the probability values in the final output matrix. This figure allows us to investigate the contribution of the jun-chen-zuo-shi module to each symptom in the top 20 herbs recommended in the final output.
Fig.6 A visualization example for symptom-herb mapping matrices in the jun-chen-zuo-shi module: (a) jun; (b) chen; (c) zuo; (d) shi. The five symptoms on the vertical axis from top to bottom are asthma, opisthotonus, unconsciousness, fever, and epilepsy. The herbs on the horizontal axis are arranged from left to right according to the probability values in the final output matrix
The correct herb with the highest prediction probability is cinnabar, which is a common jun herb in TCM for epilepsy, as can be seen in the dark grid in the lower left corner of Fig.6a. The same effect of the jun prediction matrix is reflected in the second predicted herb, musk, because musk is a famous herb to open the orifices and induce resuscitation, which is specific for unconsciousness. The third and fourth correctly predicted herbs are the sixth and eighth herbs in the figure, respectively, and it can be seen that the chen matrix is playing its role here. The herb with the fifth highest prediction probability is tall gastrodia tuber. Although this prediction is wrong, this herb is described in Pharmacopoeia of the People’s Republic of China as a suitable herb for treating pediatric convulsions, epileptic convulsions, hand and foot disorders, and numbness of the limbs. This also shows that this symptom-herb prediction module is of high value in practical scenarios to assist doctors in prescribing. The shi matrix in Fig.6d illustrates that the three other types of herbs are given much higher priority than a shi herb in prediction, which correctly reflects the status and priority of the four types of herbs in the principles of prescription composition.
Based on the above analysis, we believe that TCM-informed observations and principles are effective in allowing the model to add the constraint of jun-chen-zuo-shi knowledge, thus boosting the performance of herb recommendation.
5 Conclusions
In this study, we propose a knowledge-guided and TCM-informed approach based on GATs for herb recommendation. We construct a large-scale TCM knowledge graph, TCMKG, which covers most of the common knowledge in the TCM field. GATs are trained on the TCMKG path interaction graph and the co-occurrence relationship graph based on the treatment relationship from prescriptions, and output discriminative representations of herbs and symptoms. After these two representations are fused, the multi-syndrome prediction layer generates a possible representation of syndromes through multi-head attention. The model also introduces the TCM-informed jun-chen-zuo-shi module, which provides knowledge about the principles of prescription composition. Experiments on a TCM dataset confirm the effectiveness of our method, and the ablation study affirms the significance of integrating each type of external knowledge. In future work, we shall focus more on representation learning of TCMKG, because the current all-shortest-paths method does not make good use of multi-hop information among herbs, pieces, and prescriptions. We shall also try to modify the TCM-informed jun-chen-zuo-shi module to allow the jun-chen-zuo-shi principle to be added to the model holistically.
Contributors
Zhe JIN and Yunhe PAN designed the research.Zhe JIN and Jiaxu MIAO processed the data and drafted the paper.Yin ZHANG helped organize the paper.Yi YANG,Yueting ZHUANG,and Yin ZHANG revised and finalized the paper.
Compliance with ethics guidelines
Yi YANG, Yueting ZHUANG, and Yunhe PAN are editorial board members for Frontiers of Information Technology & Electronic Engineering, and they were not involved with the peer review process of this paper. All the authors declare that they have no conflict of interest.
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Appendix: TCMKG
We list the main data sources of TCMKG entities as follows:
Herb:Chinese Materia Medica(《中华本草》) and Dictionary of Traditional Chinese Medicine(《中药大辞典》).
Prescription:Dictionary of Traditional Chinese Medicine Prescriptions(《中医方剂大辞典》).
Disease:Clinic Terminology of Traditional Chinese Medical Diagnosis and Treatment—Diseases(《中医临床诊疗术语——疾病部分》).
Syndrome:Clinic Terminology of Traditional Chinese Medical Diagnosis and Treatment—Syndromes(《中医临床诊疗术语——证候部分》).
Symptom:Traditional Chinese Medical Symptom Differential and Diagnosis(《中医症状鉴别诊断学》).
Entities like property,flavor,tropism,function,and pieces are extracted from the description context of the main entity.