
RotatS: temporal knowledge graph completion based on rotation and scaling in 3D space①


High Technology Letters, 2023, No. 4

YU Yong (余 泳), CHEN Shudong②, TONG Da, QI Donglin, PENG Fei, ZHAO Hua

(∗Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100191, P.R.China)

(∗∗University of Chinese Academy of Sciences, Beijing 100191, P.R.China)

(∗∗∗Beijing Institute of Tracking and Telecommunications Technology, Beijing 100191, P.R.China)

Abstract

Key words: knowledge graph (KG), temporal knowledge graph (TKG), knowledge graph completion (KGC), rotation and scaling (RotatS)

0 Introduction

With the increasing application of knowledge graphs (KGs), a large number of knowledge graphs have been constructed. A KG consists of entities, relationships, attributes, and attribute values. Generally speaking, a KG can be represented by a set of triples. A triple is written as (H, R, T), where H (Head) and T (Tail) are the sets of head entities and tail entities, respectively, and R (Relation) is the set of relationships between them. Existing KGs include YAGO[1], Freebase[2], Wikidata[3], and WordNet[4], etc., which are used for intelligent search, social networks, deep question answering, etc. However, many KGs suffer from incompleteness, and the purpose of knowledge graph completion (KGC) is to add new triples to a KG, such as obtaining a new triple (COVID-19, belongs to, Class B infectious diseases) from existing knowledge graphs related to contagious diseases. This task makes the KG more complete. Therefore, KGC has drawn widespread attention.

At present, many KGC models do not take temporal information into account. However, in many KGs, the occurrence of events is related to time. This type of knowledge graph is called a temporal knowledge graph (TKG). Typically, a TKG is represented by quadruples (H, R, T, Γ), where H and T are the sets of head entities and tail entities respectively, R is the set of relationships between them, and Γ is the set of times at which events occur. For instance, there are many answers to the question (Obama, visit, ?) if the time factor is not taken into account. But if the time factor is considered, the answer to this question is specific, as (Obama, visit, China) took place on November 15, 2009, and (Obama, visit, Thailand) took place on November 18, 2012. Ignoring temporal information can introduce significant uncertainty into KGC and negatively affect downstream tasks.

Traditional KGC models, such as TransE[5], DistMult[6], RotatE[7], ComplEx[8], etc., learn low-dimensional embeddings of entities and relations and model the relationships between entities to perform KG completion. None of these models has a way to model temporal information, thereby ignoring the time at which a fact occurs. To address this problem, several researchers have proposed related models, including TTransE[9], HyTE[10], TA-TransE[11], ATiSE[12], TeRo[13], TempCaps[14], TKGC[15], HTKE[16], etc. These models obtain the time factor by encoding temporal information and integrating this information with entities and relationships respectively, so that the event is time-dependent.

However, the above-mentioned models are merely extensions of traditional KGC methods and lack the ability to extract the semantic information of time. As an example, the TeRo model defines the temporal evolution of entity embedding as the rotation of the entity in two-dimensional space from the initial time to the current time[13], and then uses the TransE method to obtain the relationship between the head entity and the tail entity after the rotation. Like other TKG completion methods, this model only considers the evolution of the entity itself in two-dimensional space with respect to time, and fails to fully capture the semantic association between entities, relations and time.

This paper proposes a new method based on rotation and scaling (RotatS) for temporal knowledge graph completion.The model maps entities,relations,and time into three-dimensional space,and defines the evolution of head entities through a combination of time and relations to obtain tail entities.

Through the mapping into three-dimensional space, entities, relations and time can obtain richer semantic information. Furthermore, this paper defines the temporal and relational evolution of entity embedding as the rotation and scaling of the head entity to the tail entity in 3D space; the model defines the relationship between head entity and tail entity in 3D space relative to relations and time, and thereby obtains rich semantic associations among entities, relations and time.

The advantages of the proposed model include:

(1) It solves the problem that entities cannot be well combined with relations and time in traditional methods. A general approach to TKG completion is proposed in this paper. For events with either timestamps or time intervals, different rotation axes are used, and the relationship between head entities and tail entities is defined for each type of event.

(2) The model considers the roles of time and relation in the evolution at the same time. The rotation helps to bring the directions of the head entity and the tail entity closer, and the scaling helps to bring their magnitudes closer. Through this method, entities, relations, and time can be better associated.

(3) The performance of the proposed method is evaluated on four public datasets. Experimental results show that the proposed approach has certain advantages over several existing methods.

(4) The paper also conducts a comparison experiment between rotation and scaling in 3D space and in 4D space. The experimental results show that the 3D RotatS model performs better in these cases.

1 Related work

1.1 Static graph completion

Static KGC models can be divided into translation-based models, tensor decomposition-based models and neural network-based models.

1.1.1 Translation based models

KGC is performed by learning low-dimensional embedding representations of entities and relations in the knowledge graph. These models are mainly based on TransE and its transformations of spatial mappings. TransE[5] defines a relation as the translation from head entity to tail entity, but can only handle one-to-one relationships. For relations such as one-to-many, TransH[17] establishes a plane and projects entities and relationships onto the plane. TransR[18] uses the projection matrix of a particular relation to project entities into the relation space. STransE[19] applies different transformations to the head entity and the tail entity, projecting them into separate spaces. TransD[20] simplifies the relational mapping matrix, thereby reducing the number of parameters. TranSparse[21] addresses the unbalanced distribution of relations in triples: relations that occur infrequently only need to train highly sparse matrices, while relations that occur frequently train denser matrices. TransM[22] focuses on the structure of the knowledge graph and constructs the model by pre-counting the weights of each relation in the training set. RotatE[7] introduces the complex domain on top of TransE, mapping entities and relations into a complex space and defining each relation as a rotation from head entity to tail entity.

1.1.2 Tensor decomposition based models

These models decompose the score of a triple into a tensor product. RESCAL[23] represents each relation with a full-rank matrix and calculates the product between entity embeddings and relation matrices. This model is simple and effective, but the number of parameters increases. DistMult[6] uses diagonal matrices for relations to reduce the number of parameters, but cannot model the directionality of relationships. ComplEx[8] introduces the complex domain and treats the head entity embedding and the tail entity embedding of a given entity as conjugate counterparts. Among them, DistMult cannot capture asymmetric relationships; for instance, the triples (h, r, t) and (t, r, h) always have equal scores.

1.1.3 Neural network based models

Neural network-based models perform KGC in various ways. The ProjE[24] model uses a feed-forward network to model the KG. ConvE[25] first concatenates the embedding of the head entity and the embedding of the relation, then performs a multi-layer convolution operation, and finally scores all candidate sets. ConvKB[26] modifies the form of ConvE's input data, obtains the feature layer by convolution, and finally concatenates and scores it. ConvR[27] performs the convolution operation by using relations directly as convolution kernels.

1.2 Temporal graph completion

Recently, some researchers have found that adding temporal information to the knowledge graph can further improve the performance of KGC. Some models are derived from TransE, such as TTransE[9], which embeds time information into the score function. TA-TransE and TA-DistMult[11] obtain the temporal evolution of relations by concatenating the predicate embedding with the temporal embedding using a long short-term memory (LSTM) network. HyTE[10] treats timestamps as hyperplanes and projects entity and relation embeddings onto the hyperplane to obtain new representations; this model extends TransH with respect to time by adding temporal information to the head entity, relation and tail entity respectively. TempCaps[14] proposes a capsule network-based model that constructs entity embeddings by dynamically routing retrieved temporal relations and neighbor information. TKGC[15] models global trends and local fluctuations by mapping entities and relations in the TKG to approximations of multivariate Gaussian processes (MGPs). HTKE[16] proposes a new hyperplane-based temporal-aware KG embedding model that incorporates temporal information into the entity-relation space to more effectively predict missing elements in KGs.

2 Methods

In this section, the TKG completion problem is defined and the proposed method RotatS is introduced in detail. This method integrates entities, relationships, and temporal information to better solve the TKG completion problem.

TKG completion is also called the link prediction task. The goal of this task is to predict hidden quadruples using observable quadruples[28]. The task can be divided into two parts: given a quadruple of the TKG in the form (H, R, ∗, Γ), predict the tail entity; and given a quadruple in the form (∗, R, T, Γ), predict the head entity.

Quaternions have been used for static knowledge graph embeddings[29]. The definition and basic mathematical properties of quaternions are presented below for a better introduction to the method proposed in this paper.

2.1 Quaternions

A quaternion[30] is obtained by expanding the imaginary part of a complex number. Usually a quaternion q is represented by (w, x, y, z), where w is the real part and x, y and z are the imaginary parts; it can also be expressed as q = [w, v] = w + xi + yj + zk (i, j, and k are imaginary units). Some of the rules and operations for quaternions are as follows.

Basic rules and derived identities:

i² = j² = k² = ijk = −1, ij = k, jk = i, ki = j, ji = −k, kj = −i, ik = −j

The inner product of quaternions: the inner product between q1 = w1 + x1 i + y1 j + z1 k and q2 = w2 + x2 i + y2 j + z2 k is the sum of the products of the corresponding real and imaginary parts,

⟨q1, q2⟩ = w1 w2 + x1 x2 + y1 y2 + z1 z2

Quaternion Hamilton product:

q1 ⊗ q2 = (w1 w2 − x1 x2 − y1 y2 − z1 z2) + (w1 x2 + x1 w2 + y1 z2 − z1 y2) i + (w1 y2 − x1 z2 + y1 w2 + z1 x2) j + (w1 z2 + x1 y2 − y1 x2 + z1 w2) k

Quaternion conjugate: the conjugate of q is

q* = w − xi − yj − zk

Quaternion inverse: the inverse of q is

q⁻¹ = q*/|q|²

Unit quaternion: the norm of a quaternion q is

|q| = √(w² + x² + y² + z²)

and q/|q| is the corresponding unit quaternion.

Rotation of quaternions[29]: in 3D spatial rotation, the rotation axis is assumed to be the unit quaternion u = xi + yj + zk, and the rotation quaternion is written as

q = cos θ + u sin θ

The vector v (whose real part is 0) is rotated by 2θ around the rotation axis u to obtain v∗ as

v∗ = q ⊗ v ⊗ q⁻¹
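For readers who prefer code, the operations above can be written compactly in NumPy. The sketch below is not taken from the paper's implementation; it simply encodes the standard Hamilton product, conjugation, and the rotation v∗ = q ⊗ v ⊗ q⁻¹ with q = cos θ + u sin θ.

```python
# A minimal NumPy sketch of the quaternion operations listed above.
# A quaternion is stored as an array (w, x, y, z).
import numpy as np

def hamilton(q1, q2):
    """Hamilton product q1 ⊗ q2."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def conjugate(q):
    """Conjugate q* = w - xi - yj - zk."""
    w, x, y, z = q
    return np.array([w, -x, -y, -z])

def rotate(v, axis, theta):
    """Rotate the 3D vector v by 2*theta around the (normalized) axis,
    via v* = q v q^{-1} with the unit quaternion q = cos(theta) + u sin(theta)."""
    u = np.asarray(axis, dtype=float)
    u = u / np.linalg.norm(u)                      # unit rotation axis
    q = np.concatenate(([np.cos(theta)], np.sin(theta) * u))
    v_quat = np.concatenate(([0.0], v))            # pure quaternion (real part 0)
    rotated = hamilton(hamilton(q, v_quat), conjugate(q))  # for unit q, q^{-1} = q*
    return rotated[1:]                             # back to a 3D vector
```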

2.2 RotatS

When the real part of a quaternion is 0, the quaternion can be expressed as v = xi + yj + zk. Such a quaternion can be represented as a 3D vector v = (x, y, z).

Similar to the rotation of TeRo[13] in 2D space, this paper uses the rotation and scaling of quaternions with zero real part in 3D space to perform the TKG completion task. This paper proposes a general model, RotatS, for TKG completion, which learns the embedding representations of entities, relations, and time in the TKG and continuously optimizes the score function, so that correct quadruples are scored better than erroneous ones. In 3D space, the model uses a rotation- and scaling-based approach in which the head entity is rotated and scaled, using a combination of time and relations, to obtain the tail entity. The general form of the model is as follows.

RotS_{R,Γ}(H) = T    (11)

where H and T represent the embeddings of the head entity and tail entity in 3D space respectively, R and Γ represent the embeddings of the relation and time in 3D space respectively, and RotS represents the rotation and scaling from head entity H to tail entity T in 3D space. All embeddings are represented in 3D space, that is, each of H, T, R and Γ is a vector in 3D space.

Through the embedding of entities, relations, and time in 3D space, vector representations with more dimensions can be obtained, thereby providing richer semantic information than TeRo. The evolution from head entity to tail entity is performed based on the representations of both relation and time, so as to obtain the semantic association between entity, relation, and time.

An event may occur at a timestamp or continuously over a time interval. The rotation axis should be varied to better model these different kinds of events. Subsections 2.3 and 2.4 present the model details for rotating and scaling around the time axis and the relation axis, respectively, to address link prediction for timestamp events and time-interval events.

2.3 Rotation and scaling around the time axis

The proposed model uses time as the axis of rotation when solving timestamp problems. The score function of the model is

F = ‖H ☉ QΓ − T‖

where H and T are the embeddings of the head entity and tail entity in 3D space respectively, and ☉ represents the rotation in 3D space. This paper defines QΓ = cos θΓ + uΓ sin θΓ, where uΓ = Γx i + Γy j + Γz k (Γx, Γy and Γz are the three components of the time embedding in 3D space, with Γx² + Γy² + Γz² = 1) is the time axis around which H is rotated in 3D space, and 2θΓ is the rotation angle associated with the time embedding.

Because the number of events occurring at a given timestamp is limited, this paper uses time as the rotation axis, which fixes the timestamp during inference, narrowing the target range of tail entities and thus improving the accuracy of inference. The role of the scoring function is to evolve the head entity H using QΓ to obtain the tail entity T, which defines a new type of entity relationship and yields richer semantic associations between entities. The goal of the model is that, through training, the evolved H and the T of a correct quadruple become closer and closer, i.e., the value of F approaches 0.

However, the above approach does not consider the effect of relations. Relations are important and can provide a wealth of information; in static knowledge graphs, the graph can be completed using relations alone. Therefore, this paper introduces relations to improve the performance of the model. R◁ is the unitization of the relation embedding, which can be represented in 3D space as a unit vector R◁ = (R′x, R′y, R′z) determined by two angles α and β,

where α and β are the angles of the relation embedding and are uniformly initialized between −π and π[29]. This paper uses ∘ to denote the Hadamard product between head entities and relations: H ∘ R◁ = (Hx, Hy, Hz) ∘ (R′x, R′y, R′z) = (Hx R′x, Hy R′y, Hz R′z). With the Hadamard product, relational information is incorporated into the head entity, thus enriching the semantic information. The improved score function obtained in this paper is as follows.

F = ‖(H ∘ R◁) ☉ QΓ − T‖

Using the above model, the head entity incorporating relational information is rotated in 3D space to obtain the tail entity, and the direction of the entity vector changes after the rotation. In order to make the entity vector obtained after the evolution closer to the tail entity, this paper performs both rotation and scaling of the entity, so that the size and direction of the evolved vector are closer to those of the tail entity vector, and defines u′Γ = WH WR (Γx i + Γy j + Γz k) = WH WR uΓ, where WH and WR are head entity-specific and relation-specific weights respectively. At this time, the value of QΓ is as follows.

QΓ = cos θΓ + u′Γ sin θΓ

The multiplication with the relation-specific bias WR in Ref.[29] is also used to improve performance. At this point, the score function is

F = ‖((H ∘ R◁) ☉ QΓ) · WR − T‖

Therefore, the rotation axis of the model is uΓ, the rotation angle is 2θΓ, and the scaling factor is WH WR.
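To make the scoring procedure concrete, the following sketch (our own, assuming scalar weights WH and WR, and reusing rotate() from the quaternion sketch in subsection 2.1) evaluates a score of the form ‖((H ∘ R◁) ☉ QΓ) · WR − T‖ for a single fact; in the actual model the weights and embeddings are learned parameters.

```python
# Hedged sketch of the time-axis score, not the authors' released code.
import numpy as np

def score_time_axis(H, T, R_unit, time_axis, theta, W_H, W_R):
    h = H * R_unit                            # Hadamard product H ∘ R◁
    rotated = rotate(h, time_axis, theta)     # rotate by 2*theta around the time axis
    scaled = rotated * (W_H * W_R)            # scaling factor W_H * W_R
    return np.linalg.norm(scaled * W_R - T)   # relation-specific bias W_R, then distance to T
```

A lower value of this score corresponds to a better fit, consistent with the training goal of driving F toward 0 for correct quadruples.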

2.4 Rotation and scaling around relational axes

Subsection 2.3 discussed the score function for timestamp events. When solving the time interval problem, the relation is used as the rotation axis. The score function is as follows.

F = ‖((H ∘ R◁) ☉ QR) · WR − T‖

where QR = cos θΓ + uR sin θΓ and uR = WH WR (Rx i + Ry j + Rz k).

An event may occur within a time interval. Here time is a range, and it is not reasonable to pick a random timestamp from the interval. Therefore the relation is used as the axis of rotation, and the head entity incorporating relational information is evolved around the relation axis to obtain the tail entity. The model has the same rotation angle and scaling as the model dealing with timestamp events, and the rotation axis is Rx i + Ry j + Rz k.
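Under the same assumptions, the time-interval case can be illustrated by reusing the scoring sketch from subsection 2.3 with the unitized relation embedding serving as the rotation axis; the embedding values below are arbitrary.

```python
# Hedged usage example for time-interval facts (relation as rotation axis).
import numpy as np

H = np.array([0.2, -0.5, 0.1])                  # toy head entity embedding
T = np.array([0.1, -0.4, 0.3])                  # toy tail entity embedding
R = np.array([0.6, 0.8, 0.0])                   # toy relation embedding, already unit length
F = score_time_axis(H, T, R_unit=R, time_axis=R, theta=0.3, W_H=1.0, W_R=1.0)
print(F)                                        # lower means a better fit
```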

2.5 Loss function

This paper uses the same loss function as TeRo[13] to optimize the RotatS model, with self-adversarial negative sampling. In the loss function, σ is the sigmoid function, y is the score of a correct quadruple (H, R, T, Γ), y′ is the score of a negative sample (H′, R′, T′, Γ′), γ is a fixed margin, and p is the ratio of negative training samples to positive training samples. The overall training algorithm is shown in Algorithm 1.
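As an illustration only, the following minimal sketch implements a margin-based negative-sampling loss of this general shape; the exact self-adversarial weighting used by TeRo and RotatS may differ, so treat it as an approximation rather than the paper's loss.

```python
# Hedged sketch: margin-based loss with negative samples (lower score = better fit).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def quadruple_loss(y_pos, y_neg, gamma):
    """y_pos: score of the correct quadruple; y_neg: scores of its negative
    samples; gamma: fixed margin."""
    pos_term = -np.log(sigmoid(gamma - y_pos))                        # pull correct score below the margin
    neg_term = -np.mean(np.log(sigmoid(np.asarray(y_neg) - gamma)))   # push negative scores above the margin
    return pos_term + neg_term

print(quadruple_loss(0.5, [3.0, 2.5, 4.0], gamma=2.0))
```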

Algorithm 1 Training of RotatS
Input: the train set TKG = {(H, R, T, Γ)}; the number of epochs n; the negative sample rate p.
Output: the minimum loss on the train set.
1: s, r, o, t ← Xavier_uniform(), loss_min = 0 // initialize the embedding vectors in 3D space
2: for each i ∈ [1, n] do
3:   for (s, r, o, t) ∈ TKG do
4:     construct negative samples D−
5:     if t is a time point then calculate the score function F by ((H ∘ R◁) ☉ QΓ) · WR − T
6:     end if
7:     if t is a time period then discretize t as t1, t2, …, tm; calculate the score function F by ((H ∘ R◁) ☉ QR) · WR − T
8:     end if
9:     loss = loss_{D+} + loss_{D−}
10:   end for
11:   loss_min = min(loss_min, loss)
12:   update parameters s, r, o, t with the Adagrad optimizer
13: end for
14: return loss_min

3 Experiments

In the experiments, this paper evaluates the performance of the RotatS model on four standard datasets and compares it with several baselines.

3.1 Datasets

Frequently used TKGs include ICEWS14, ICEWS05-15, YAGO11k and Wikidata12k. Among them, ICEWS14 and ICEWS05-15 are event-based datasets, both derived from ICEWS[31], corresponding to events in 2014 and events from 2005 to 2015, respectively; both record the timestamp at which each event occurs. YAGO11k and Wikidata12k are derived from YAGO[1] and Wikidata[3], respectively, and these two datasets record the time interval in which each event occurs.

3.2 Baselines

This paper compares the RotatS model with baselines, including static KGC models and temporal KGC models.

The static KGC models compared in this paper are translation-based models and tensor decomposition-based models, including TransE[5], DistMult[6], ComplEx[8], RotatE[7] and QuatE[32]. These models ignore temporal information when performing KGC.

The temporal KGC models compared in this paper include TTransE[9], TA-TransE[11], TA-DistMult[11], HyTE[10], DE-SimplE[33], ATiSE[12], TeRo[13], TempCaps[14], TKGC[15] and HTKE[16]. These models incorporate temporal information to improve the accuracy of KGC.

3.3 Evaluation methodology

This paper evaluates the RotatS model on the link prediction task. The goal of the task is to reason about the missing values in (∗, R, T, Γ) and (H, R, ∗, Γ), where ∗ represents the missing element. For each test quadruple, by substituting either H or T, this paper generates candidate quadruples, calculates scores for all possible quadruples, and finally ranks these quadruples in ascending order of score.

In the experiments, two common evaluation metrics are used: mean reciprocal rank (MRR) and Hits@k. MRR is the average reciprocal rank of all correctly predicted entities. Hits@k is the proportion of correctly predicted entities ranked among the top k candidates.
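Both metrics are easy to compute once the rank of each correct entity among its candidates is known; the short sketch below (with made-up example ranks) shows one way to do so.

```python
# Computing MRR and Hits@k from 1-based ranks of the correct entities.
import numpy as np

def mrr(ranks):
    """Mean reciprocal rank of the correct entities."""
    return float(np.mean(1.0 / np.asarray(ranks, dtype=float)))

def hits_at_k(ranks, k):
    """Fraction of test cases whose correct entity is ranked in the top k."""
    return float(np.mean(np.asarray(ranks) <= k))

ranks = [1, 3, 2, 10, 1]                 # hypothetical example ranks
print(mrr(ranks), hits_at_k(ranks, 3))   # ≈0.587 and 0.8
```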

3.4 Experimental setup

This paper evaluates the RotatS model on a TITAN Xp GPU, following the experimental setup described in TeRo[13]. The model is implemented using PyTorch, and the best model is selected by early stopping on the validation set according to MRR. The experiments choose an embedding size k of 500 and a batch size b of 512, and use Adagrad as the optimizer. Both the real and imaginary parts of the entity embeddings are initialized using uniform initialization.

The time granularity parameters u and thre are assigned different values for different datasets. This paper lists the parameters for each dataset[13]: lr = 0.1, gamma = 60, thre = 10 on YAGO11k; lr = 0.3, gamma = 20, thre = 200 on Wikidata12k; lr = 0.1, gamma = 110, u = 1 on ICEWS14; lr = 0.1, gamma = 120, u = 2 on ICEWS05-15.
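For convenience, these per-dataset settings can be collected in a single configuration table; the layout below is our own and is not part of any released code.

```python
# Per-dataset hyperparameters listed above: lr = learning rate, gamma = margin,
# thre and u = time-granularity parameters.
HYPERPARAMS = {
    "YAGO11k":     {"lr": 0.1, "gamma": 60,  "thre": 10},
    "Wikidata12k": {"lr": 0.3, "gamma": 20,  "thre": 200},
    "ICEWS14":     {"lr": 0.1, "gamma": 110, "u": 1},
    "ICEWS05-15":  {"lr": 0.1, "gamma": 120, "u": 2},
}
```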

3.5 Complexity

As shown in Table 1, the space complexity and scoring functions of the proposed model and several baselines are compared. In the scoring functions[13], Proj(·) represents the temporal projection of an embedding[11]; LSTM (long short-term memory) is a neural network model[10] that deals with time-series problems; → and ← represent the temporal and non-temporal parts of a diachronic entity embedding; r⁻¹ represents the inverse relation of r[11]; D_KL(·) represents the KL divergence between two Gaussian distributions[33]; P_{s,t}, P_{o,t} and P_{r,t} represent the Gaussian embeddings of s, r and o at time t[12]; s_t and o_t represent entity embeddings at a specific time. In the space complexity column[13], m_e, m_r, m_τ and m_token represent the number of entities, relations, timestamps, and temporal tokens, respectively; d is the dimension of the embedding.

Table 1 Comparison of space complexity

4 Results

4.1 Main results

As shown in Table 2 and Table 3, KGC experiments on facts with timestamps are conducted. The prediction accuracy of the RotatS model and of the benchmark models on the two datasets ICEWS14 and ICEWS05-15 is listed in the tables. The results of HTKE, TempCaps, and TKGC are taken from Refs[16], [14] and [15], respectively; the results of the other models in the tables are taken from TeRo[13]. In the experimental results, the RotatS model outperforms all baseline models, which demonstrates the effectiveness of the RotatS model in handling timestamp events. Compared with TeRo, the performance of the RotatS model improves by 2.3% and 4.5% on ICEWS14 and ICEWS05-15, respectively.

Table 2 Performance on the ICEWS14 dataset

Table 3 Performance on the ICEWS05-15 dataset

As shown in Table 4 and Table 5, KGC experiments on facts holding over a time interval are conducted, and the results of all models on the YAGO11k and Wikidata12k datasets are listed in the tables. In the experiments, the dataset used for validation has few kinds of relations, which affects the identification of different quadruples. To increase the variety of relations, this paper extends the relation set to a pair-dual relation set[13]. On the YAGO11k dataset, where the tail entity embedding is integrated with the relation embedding, the model outperforms the TeRo model in MRR, Hit@1 and Hit@3; on the Wikidata12k dataset, the performance of the RotatS model also improves over the TeRo model on Hit@3 and Hit@10.

Table 4 Performance on the YAGO11K dataset

Table 5 Performance on the Wikidata12k dataset

In terms of efficiency, RotatS has the same space complexity as TeRo, TTransE and HyTE. Regarding the consumption of GPU memory, this paper conducts comparative experiments on the ICEWS14 dataset for several recent models. RotatS with an embedding dimensionality of 500 has a storage size of 1886 MiB, whereas TeRo and ATiSE with the same embedding dimensionality have storage sizes of 1026 MiB and 1350 MiB. Meanwhile, on the TITAN Xp device, RotatS with an embedding dimensionality of 500 is trained on ICEWS14, ICEWS05-15, YAGO11k and Wikidata12k, and each epoch takes 9.1 s, 49.7 s, 4.2 s and 8.1 s, respectively, whereas ATiSE with the same embedding dimensionality takes 7.8 s, 43.0 s, 29.1 s and 33.7 s on the four datasets, respectively.

4.2 Parameter analysis and model analysis

In this work, the effects of scaling and embedding size on the RotatS model are analyzed.

Fig.1 shows the link prediction performance on ICEWS14 for different embedding dimensions. As the embedding dimension increases, the prediction performance gradually improves. When the embedding dimension is set to 500, the performance is good while the embedding dimension remains moderate. A lower embedding dimension reduces storage space and improves training speed; at the same time, it also enables the model to be applied to large-scale knowledge graphs.

To analyze the effect of scaling in the RotatS model,this paper compares the link prediction performance of RotatS and RotatS without scaling on ICEWS14.

Fig.1 Effect of dimensionality of embeddings

Fig.2 shows that scaling can significantly improve the link prediction performance of RotatS. This is because scaling makes the magnitudes of the head entity vector and the tail entity vector more similar, enhancing the training effect.

4.3 Comparative study

To further test the performance of the proposed TKG completion approach in spaces of different dimensionality, this paper also conducts a series of experiments on rotation and scaling in four-dimensional space, and compares this method with the 3D rotation method RotatS. The model in 4D space is as follows.

Fig.2 Effect of scaling

where H and T represent the embeddings of the head entity and tail entity in 4D space, respectively; R represents the embedding of the relation in 4D space; R◁ and Γ◁ are the unitizations of the relation embedding and the temporal embedding, respectively. The head entity H can be represented as (H1, H2, H3, H4) in 4D space; the tail entity T as (T1, T2, T3, T4); the relation R as (R1, R2, R3, R4); and time Γ as (Γ1, Γ2, Γ3, Γ4).

The symbol ⊗ denotes the Hamilton product. 4D rotation can be modeled by the Hamilton product[32]. When R◁ is used, its modulus is 1, so only rotation is performed in 4D space and no scaling is done; when R is used, its modulus is not 1, so not only rotation but also scaling is performed in 4D space. In this variant, the time information is incorporated into the head entity and the tail entity via the Hadamard product between the unitized temporal embedding and the entity embeddings, and the entity quaternions incorporating the time information are then rotated. There are thus scaled rotations and unscaled rotations; the following are the experimental results of the two methods.

Experiments are performed on the YAGO11k and ICEWS14 datasets using the rotation model in 4D space. Under the same experimental conditions as the RotatS model, this paper obtains the results in Table 6. By comparing the experimental results, it can be seen that the quaternion rotation model RotatS in three-dimensional space performs better in all test cases than the model in 4D space.
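The difference between the unscaled (R◁) and scaled (R) 4D rotations discussed above can be illustrated numerically with the hamilton() function from subsection 2.1: multiplying by a unit quaternion preserves the norm of the entity quaternion, while a non-unit quaternion rotates and rescales it. The values below are arbitrary and only demonstrate this property.

```python
# Hedged illustration of unscaled vs scaled 4D rotation via the Hamilton product.
import numpy as np

h = np.array([0.3, -0.1, 0.4, 0.2])     # toy head-entity quaternion in 4D space
r = np.array([0.5, 0.5, 0.5, 0.5])      # relation quaternion with |r| = 1
rotated_only = hamilton(h, r)            # unit R◁: pure rotation, norm preserved
rotated_scaled = hamilton(h, 2.0 * r)    # non-unit R: rotation plus scaling
print(np.linalg.norm(h), np.linalg.norm(rotated_only), np.linalg.norm(rotated_scaled))
```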

5 Conclusion

This paper has presented a new TKG completion model named RotatS, which can integrate the semantic relationships between entities, relations, and time. In the experiments, the rotation and scaling of the head entity to the tail entity is modeled. Compared with previous approaches, the RotatS model achieves a significant improvement over the baseline models in solving the TKG completion task in terms of prediction accuracy, including MRR, Hit@1, Hit@3 and Hit@10, under similar experimental conditions and space complexity. In the near future, the following research direction will continue to be explored: improving the model by combining different rotation and stretching methods.

Table 6 Rotation in 4D