Automated Multi-Document Biomedical Text Summarization Using Deep Learning Model

2022-08-23 Ahmed Almasoud, Siwar Ben Haj Hassine, Fahd Al Wesabi, Mohamed Nour, Anwer Mustafa Hilal, Mesfer Al Duhayyim, Manar Ahmed Hamza, and Abdelwahed Motwakel

Computers, Materials & Continua, 2022, Issue 6

Ahmed S. Almasoud, Siwar Ben Haj Hassine, Fahd N. Al-Wesabi, Mohamed K. Nour, Anwer Mustafa Hilal, Mesfer Al Duhayyim, Manar Ahmed Hamza, and Abdelwahed Motwakel

1 Department of Information Systems, College of Computer and Information Sciences, Prince Sultan University, Saudi Arabia

2 Department of Computer Science, College of Science and Arts at Mahayil, King Khalid University, Saudi Arabia

3 Faculty of Computer and IT, Sana’a University, Sana’a, Yemen

4 Department of Computer Science, College of Computing and Information System, Umm Al-Qura University, Saudi Arabia

5 Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam bin Abdulaziz University, AlKharj, Saudi Arabia

6 Department of Natural and Applied Sciences, College of Community-Aflaj, Prince Sattam bin Abdulaziz University, Saudi Arabia

Abstract: Owing to advances in the Internet and information technologies, the quantity of electronic data in the biomedical sector has increased exponentially. To handle this volume of data, automated multi-document biomedical text summarization offers an effective and robust way to access the growing body of technical and medical literature by summarizing multiple source documents while retaining the most informative content. Multi-document biomedical text summarization therefore plays a vital role in easing access to precise and up-to-date information. This paper presents a Deep Learning based Attention Long Short Term Memory (DL-ALSTM) model for multi-document biomedical text summarization. The proposed DL-ALSTM model first performs data preprocessing to convert the available medical data into a format compatible with further processing. Then, the DL-ALSTM model is executed to summarize the contents of multiple biomedical documents. To tune the summarization performance of the DL-ALSTM model, the chaotic glowworm swarm optimization (CGSO) algorithm is employed. Extensive experiments are performed to validate the DL-ALSTM model, and the results are investigated using the PubMed dataset. A comprehensive comparative analysis is carried out to demonstrate the efficiency of the proposed DL-ALSTM model against recently presented models.

Keywords: Biomedical; text summarization; healthcare; deep learning; LSTM; parameter tuning

1 Introduction

Automatic text processing tools play a vital role in efficient knowledge acquisition from massive sources of textual data in health care and the life sciences, such as clinical guidelines, electronic health records, and scientific publications [1]. Automatic text summarization is a subfield of text mining and Natural Language Processing (NLP) that aims to produce a condensed form of one or more input documents by extracting their most important content [2,3]. Text summarization tools can save clinicians and researchers time and resources by automatically identifying and presenting the key concepts within long documents, with no need to read the entire text [4]. Initially, text summarization relied on frequency features to recognize the most relevant content of textual documents. Later, several summarization tools integrated a broad range of heuristics and features into the content selection procedure. The most commonly used features include sentence length, sentence position, keywords extracted from the text, the presence of cue phrases, title words, centroid-based cohesion, co-occurrence features, and the presence of numerical content [5].

To overcome the limitations of such shallow features, other strands of research investigated techniques that use domain knowledge sources to map text onto concept-based representations [6]. This allows the informative content of the text to be measured in terms of the semantics and context behind the sentences, rather than shallow features. However, there are several difficulties in using biomedical knowledge sources in text analysis, especially in summarization [7]. Building, maintaining, and using knowledge bases can be challenging. A massive amount of automatic annotation is required to identify entities and concepts broadly and to capture the relationships among them. Selecting the relevant domain knowledge source is itself challenging and can seriously affect the performance of biomedical summarization [8]. Another problem is how to measure the informative content of sentences based on qualitative relationships among concepts. Deep neural network (DNN) based language models [9] can be used to tackle most of these knowledge-related problems in context-aware biomedical summarization. A deep language model is pre-trained on massive quantities of text and learns to represent units of text, mostly words, in a vector space [10]. The pre-trained embeddings can be fine-tuned on downstream tasks or used directly as numerical features.

The authors of [11] proposed a deep-reinforced, abstractive summarization method that can read a biomedical publication abstract and produce a summary in the form of a title or one-sentence headline. They present new reinforcement learning (RL) reward metrics based on biomedical expert systems, namely MeSH and the UMLS Metathesaurus, and show that the method can produce abstractive, domain-aware summaries. The authors of [12] presented a text summarization method for documents using Deep Learning Modifier Neural Network (DLMNN) classification. It produces an informative summary of the document based on entropy values. The DLMNN architecture comprises six stages. First, the input document is preprocessed. Next, features are extracted from the preprocessed data. Then, the most relevant features are selected by the improved fruit fly optimization algorithm (IFFOA). The entropy value of each selected feature is then calculated. The work in [13] focuses on applying ML methods in two distinct sub-areas associated with the medical industry. The first application is Sentiment Analysis (SA) of user-narrated drug reviews, and the second is engineering in food technology. Since ML and AI methods push the limits of scientific drug discovery, ML methods are chosen as an alternative technique for two main reasons. First, ML methods include diverse learning approaches and are feasible for numerous NLP tasks. Second, they have an inherent capacity to model the many features that capture the characteristics of sentiment in text.

The authors of [14] address this limitation by proposing a novel method that combines topic modelling, unsupervised neural networks, and document clustering to build effective document representations. First, a document clustering method based on the extreme learning machine (ELM) is applied to a massive text collection. Next, topic modelling is applied to the document collection to identify the topics present in each cluster. Then, each document is represented in a concept space using a matrix whose columns represent the cluster topics and whose rows represent the document sentences. The resulting matrix is used to train several unsupervised neural networks and ensemble learning algorithms that build abstract representations of the documents in the topic space. The authors of [15] designed a biomedical text summarization method that integrates two commonly used data mining methods: clustering and frequent itemset mining. A biomedical paper is represented as a set of biomedical concepts using the UMLS Metathesaurus. The K-means method is used to cluster similar sentences. Subsequently, the Apriori method is used to discover frequent itemsets among the clustered sentences. Finally, the relevant sentences from each cluster are selected to build the summary using the discovered frequent itemsets.

The authors of [16] combined sentence clustering and frequent itemset mining to build a single-document biomedical text summarization model. A biomedical document is represented as a set of UMLS concepts, and generic concepts are discarded. The vector space model is used to represent sentences. K-means clustering is used to group semantically similar sentences. Frequent itemsets are extracted across the clusters, and the discovered itemsets are used to compute sentence scores. The top N highest-scoring sentences are selected to form the final summary. The authors of [17] address this problem in the context of biomedical text summarization. They evaluate the efficiency of a graph-based summarizer with different kinds of contextualized and context-free embeddings. The word representations are generated by pre-training neural language models on massive amounts of biomedical text. The summarizer models the input text as a graph in which the strength of the relationships among sentences is measured using the domain-specific vector representations.

1.1 Paper Objective

The objective of this study is to design a novel deep learning based multi-document biomedical text summarization model with a hyperparameter tuning process.

1.2 Paper Contributions

This paper presents a Deep Learning based Attention Long Short Term Memory (DL-ALSTM) model for multi-document biomedical text summarization. The proposed DL-ALSTM model first performs data preprocessing to convert the available medical data into a format compatible with further processing. Then, the DL-ALSTM model is executed to summarize the contents of multiple biomedical documents. To tune the summarization performance of the DL-ALSTM model, the chaotic glowworm swarm optimization (CGSO) algorithm is employed. Extensive experiments are performed to validate the DL-ALSTM model, and the results are investigated using the PubMed dataset.

1.3 Paper Organization

The remaining sections of the paper are arranged as follows. Section 2 presents the proposed DL-ALSTM model and Section 3 discusses the performance validation. Finally, Section 4 concludes the study.

2 The Proposed Biomedical Text Summarization Technique

In this study, an effective DL-ALSTM model is presented for multi-document biomedical text summarization. The proposed DL-ALSTM model performs pre-processing, summarization, and hyperparameter optimization. These processes are detailed in the following sections.

2.1 Pre-processing

The summarization procedure begins with a pre-processing step. First, the parts of the input document that are not to be included in the summary are removed, and the main text is retained. The removed parts include the title, abstract, keywords, author information, section and subsection headers, figures and tables, and the bibliography section [18]. These parts are treated as unnecessary because they do not appear in the model summary that is used to evaluate summary quality. The removal phase can be customized depending on the structure of the input text and the user's preferences. When section and subsection titles are chosen for inclusion in the summary, additional information is stored with each sentence to specify the section to which the sentence belongs.

The feature extraction scripts of BERT take as input text files in which each sentence appears on a separate line and is tokenized, so pre-processing continues with splitting the input text into sentences and tokenizing them. Using the Natural Language ToolKit (NLTK), the summarizer splits the main text into a set of sentences and represents each sentence as a set of tokens. After these pre-processing steps, each input sentence is ready to be mapped to a contextualized vector representation.
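The sentence splitting and tokenization step can be sketched as follows. The text names NLTK for this step (its `sent_tokenize` and `word_tokenize` functions would be the natural choice); to stay runnable without downloading NLTK's tokenizer models, this sketch substitutes two simple regular expressions. The function names `split_sentences`, `tokenize`, and `to_bert_input_lines` and the regex rules are illustrative assumptions, not the authors' pipeline.

```python
import re

# Naive sentence boundary: ., ! or ? followed by whitespace and an uppercase
# letter or digit. This is a stand-in for NLTK's sentence tokenizer (assumption).
_SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")
# A token is a run of word characters (internal hyphens allowed) or one
# punctuation mark.
_TOKEN = re.compile(r"\w+(?:-\w+)*|[^\w\s]")


def split_sentences(text):
    """Split raw text into a list of sentence strings."""
    return [s.strip() for s in _SENT_BOUNDARY.split(text) if s.strip()]


def tokenize(sentence):
    """Split one sentence into word and punctuation tokens."""
    return _TOKEN.findall(sentence)


def to_bert_input_lines(text):
    """Return one space-joined, tokenized sentence per line."""
    return [" ".join(tokenize(s)) for s in split_sentences(text)]


if __name__ == "__main__":
    doc = "Statins lower LDL cholesterol. They are well-tolerated in most patients."
    for line in to_bert_input_lines(doc):
        print(line)
```

A rule-based splitter like this will mis-handle abbreviations such as "e.g." or "Fig. 2", which is one reason a trained sentence tokenizer is usually preferred for biomedical text.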

2.2 Biomedical Text Summarization

The preprocessed data is fed into the DL-ALSTM model to summarize the multi-document biomedical text. The LSTM cell comprises five essential components: input gate $i$, forget gate $f$, output gate $o$, recurrent cell state $c$, and hidden state output $h$. In this variant of the LSTM cell, Apaszke uses a different way to compute $h_t$. Concretely, at each time step $t$, an internal memory cell $c_t \in \mathbb{R}^n$ is added to compute $h_t$. At each time step $t$, the LSTM cell uses the previous hidden state $h_{t-1}$ and the input $x_t$ to produce the temporary internal memory state $c^{\mathrm{in}}_t$, and then uses the previous internal memory state $c_{t-1}$ and the temporary state $c^{\mathrm{in}}_t$ to produce the internal memory state $c_t$. This LSTM cell is used to reduce gradient explosion (GE). The presented technique uses the LSTM cell as the fundamental unit of both the encoder and decoder components [19].

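The fundamental calculations in the LSTM cell are as follows:

1. Gates

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$$

$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$$

$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)$$

2. Input transform

$$c^{\mathrm{in}}_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$

3. State update

$$c_t = f_t \odot c_{t-1} + i_t \odot c^{\mathrm{in}}_t$$

$$h_t = o_t \odot \tanh(c_t)$$

The training stage learns the parameters $W_{x*}$ and $W_{h*}$ for $x$ and $h$ respectively; $\sigma(\cdot)$ denotes the sigmoid function; $\tanh(\cdot)$ denotes the hyperbolic tangent function; $\odot$ denotes element-wise multiplication; and $b$ denotes the bias.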
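The three calculation steps listed above (gates, input transform, state update) can be sketched in plain Python as a single time step. This is a minimal illustration that assumes the standard LSTM formulation, in which each gate is a sigmoid of an affine function of $x_t$ and $h_{t-1}$; the parameter dictionary layout and the helper names `affine`, `lstm_step`, and `zero_params` are hypothetical.

```python
import math


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def affine(W_x, x, W_h, h, b):
    """Compute W_x @ x + W_h @ h + b using plain lists (rows of W are lists)."""
    return [
        sum(wx * xi for wx, xi in zip(row_x, x))
        + sum(wh * hi for wh, hi in zip(row_h, h))
        + bi
        for row_x, row_h, bi in zip(W_x, W_h, b)
    ]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; returns the new hidden state and cell state."""
    # 1. Gates
    i_t = sigmoid(affine(p["W_xi"], x_t, p["W_hi"], h_prev, p["b_i"]))
    f_t = sigmoid(affine(p["W_xf"], x_t, p["W_hf"], h_prev, p["b_f"]))
    o_t = sigmoid(affine(p["W_xo"], x_t, p["W_ho"], h_prev, p["b_o"]))
    # 2. Input transform (temporary internal memory state)
    c_in = [math.tanh(a) for a in affine(p["W_xc"], x_t, p["W_hc"], h_prev, p["b_c"])]
    # 3. State update
    c_t = [f * c + i * ci for f, c, i, ci in zip(f_t, c_prev, i_t, c_in)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def zero_params(n_in, n_hidden):
    """All-zero parameters, handy for checking the arithmetic by hand."""
    p = {}
    for gate in "ifoc":
        p["W_x" + gate] = [[0.0] * n_in for _ in range(n_hidden)]
        p["W_h" + gate] = [[0.0] * n_hidden for _ in range(n_hidden)]
        p["b_" + gate] = [0.0] * n_hidden
    return p


if __name__ == "__main__":
    h, c = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -1.0], zero_params(3, 2))
    print(h, c)
```

With all-zero parameters every gate evaluates to 0.5 and the input transform to 0, so the new cell state is simply half the previous one, which makes the state-update rule easy to verify by hand.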

Here, stacked LSTM layers are used in the vertical direction, where the input of the current LSTM layer is the output of the preceding layer. Fig. 1 illustrates the structure of the LSTM model.

Figure 1: LSTM structure

In effect, this technique replaces the classical RNN unit with an LSTM unit. Except for the first layer, the hidden state of the preceding layer, $h_t^{l-1}$, is passed as the input $x_t^{l}$ of the current layer, where $l$ denotes the layer. The activation of layer $l$ is therefore:

$$h_t^{l} = f_h^{l}\left(h_t^{l-1}, h_{t-1}^{l}\right)$$

This method is modeled on the Google neural machine translation system. It has three modules: the encoder, the decoder, and the attention network. In the PCA-LSTM technique, the encoder uses stacked LSTM layers consisting of one bi-directional LSTM layer and three uni-directional LSTM layers. In the bi-directional LSTM encoder, the information needed to paraphrase particular words on the output side may lie anywhere on the source side. Often the source-side information runs approximately left to right, as on the target side; however, depending on the language pair, the information for a particular output word may be distributed, or even split, across particular regions of the input side. The final hidden state $h_i$ of each input unit $x_i$ is the concatenation of the forward and backward hidden states. The decoder uses only a standard LSTM to generate the paraphrase sequence $y = y_0, \ldots, y_L$ by computing a sequence of hidden states $(s_0, \ldots, s_L)$, in which the context of the currently generated paraphrase unit is encoded from $s_{L-1}$. The standard attention mechanism computes a relevance score $\alpha_{ti}$ for each hidden state $h_i$, which is used to compute the context vector $c_t$ as:

$$c_t = \sum_{i} \alpha_{ti} h_i$$

The relevance score $\alpha_{ti}$ of a hidden state indicates which source units are most important to focus on and is calculated as:

$$\alpha_{ti} = \frac{\exp(e_{ti})}{\sum_{k} \exp(e_{tk})}$$

where $e_{ti}$ is the alignment score, computed using a neural network $f$ as follows:

$$e_{ti} = f(s_{t-1}, h_i)$$

where $f$ generally uses the $\tanh$ function with the two input parameters $s_{t-1}$ and $h_i$; in that case, the default value of $\beta$ can be regarded as 1. If words corresponding to the minimal value of the $\tanh$ function are assumed to play no role in generating the paraphrase, then nearly all words still play some role under the standard attention method. In practice, some words play no role in paraphrasing in the paraphrase generation (PG) problem (this differs from neural machine translation). To address this, a new parameter $\beta$, acting on the $\tanh$ function, is added to the technique. $\beta$ is called the penalty coefficient (PC) on the alignment function. The purpose of $\beta$ is to suppress the role of source-sequence words (those for which the value of the $\tanh$ function is -1) with respect to the current word in the paraphrase sequence. This directly modifies the attention weights, so the technique is called Penalty Coefficient Attention (PCA). The previous hidden state $s_{t-1}$, the most relevant source context $c_t$, and the previously generated text unit $y_{t-1}$ are then used to compute the decoder hidden state $s_t$:

$$s_t = g(s_{t-1}, y_{t-1}, c_t)$$

where $g$ represents the GRU unit. In this encoder-decoder structure based on the PCA-LSTM approach, the generation of the paraphrase sequence $y = y_0, \ldots, y_L$ is governed by the conditional distribution $P$ as:

$$P(y \mid x) = \prod_{t=0}^{L} P(y_t \mid y_0, \ldots, y_{t-1}, x)$$
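The attention computation described above (alignment score, softmax weights, context vector) can be illustrated with a short sketch. The exact way the penalty coefficient $\beta$ enters the alignment function is not recoverable from the text, so this sketch assumes that $\beta$ multiplies the $\tanh$ output, which is consistent with a default of $\beta = 1$ reproducing standard attention. The scoring weights `w_s` and `w_h` and all function names are hypothetical.

```python
import math


def alignment_scores(s_prev, H, w_s, w_h, beta=1.0):
    """Assumed form: e_ti = beta * tanh(w_s . s_prev + w_h . h_i)."""
    base = sum(a * b for a, b in zip(w_s, s_prev))
    return [beta * math.tanh(base + sum(a * b for a, b in zip(w_h, h))) for h in H]


def softmax(e):
    """Numerically stable softmax over a list of scores."""
    m = max(e)
    exps = [math.exp(v - m) for v in e]
    total = sum(exps)
    return [v / total for v in exps]


def context_vector(alpha, H):
    """Weighted sum of encoder hidden states: c_t = sum_i alpha_ti * h_i."""
    dim = len(H[0])
    return [sum(a * h[d] for a, h in zip(alpha, H)) for d in range(dim)]


if __name__ == "__main__":
    H = [[2.0, 1.0], [0.0, 0.0], [-2.0, -1.0]]   # toy encoder states
    s_prev = [0.5, -0.5]                         # toy previous decoder state
    w_s, w_h = [1.0, 1.0], [1.0, 1.0]
    for beta in (1.0, 3.0):
        alpha = softmax(alignment_scores(s_prev, H, w_s, w_h, beta))
        print(beta, alpha, context_vector(alpha, H))
```

Under this assumption, raising $\beta$ above 1 sharpens the softmax, so the source unit whose $\tanh$ score is closest to -1 receives a smaller attention weight than it would under standard attention.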

2.3 Design of CGSO Based Hyperparameter Optimization

To effectively adjust the hyperparameters of the DL-ALSTM model, the CGSO algorithm is utilized [20]. In the basic GSO, $n$ firefly (FF) individuals are randomly distributed in a $D$-dimensional search space, and each FF $u$ carries luciferin $l_u$. Each FF emits a certain amount of fluorescence, interacts with nearby FFs, and has its own decision-making area with radius $r_d^u$ ($0 < r_d^u \le r_s$). The luciferin level of an FF is tied to the objective function value at its position: the higher the luciferin, the brighter the FF and the better the target value at its location; conversely, a lower level indicates a worse target value. The radius of the decision-making area is adjusted according to the number of individuals in the neighborhood: when the density of neighboring FFs is low, the radius is enlarged to find more neighbors; otherwise, it shrinks. Eventually, most FFs gather at a few locations. Initially, all FFs carry the same luciferin concentration $l_0$ and perception radius $r_0$.

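Luciferin (fluorescein) update:

$$l_u(z) = (1 - \rho)\, l_u(z-1) + \gamma J(x_u(z))$$

where $J(x_u(z))$ is the objective function value of FF $u$ at position $x_u(z)$ in the $z$th iteration; $l_u(z)$ is the current luciferin value of the FF; $\rho \in (0, 1)$ is the luciferin decay rate; and $\gamma$ is the luciferin update rate.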

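Probability selection: the probability $p_{uv}(z)$ that FF $u$ selects the $v$th individual within its neighborhood set $N_u(z)$ is:

$$p_{uv}(z) = \frac{l_v(z) - l_u(z)}{\sum_{k \in N_u(z)} \left(l_k(z) - l_u(z)\right)}$$

Here, the neighborhood set is $N_u(z) = \{v : d_{uv}(z) < r_d^u(z);\ l_u(z) < l_v(z)\}$, where $r_d^u(z)$ is the perception radius of the individual FF. Fig. 2 shows the flowchart of the GSO technique.

Figure 2: Flowchart of GSO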

Location update:

$$x_u(z+1) = x_u(z) + s \cdot \frac{x_v(z) - x_u(z)}{\lVert x_v(z) - x_u(z) \rVert}$$

where $s$ represents the moving step.

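The dynamic decision area radius update:

$$r_d^u(z+1) = \min\left\{r_s,\ \max\left\{0,\ r_d^u(z) + \beta\left(n_t - |N_u(z)|\right)\right\}\right\}$$

where $\beta$ here is a constant rate of change and $n_t$ is the desired number of neighbors.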

The GSO technique thus consists of the initial distribution of FFs, the luciferin update, FF movement, and the decision-area update. To improve the performance of the GSO algorithm, the CGSO algorithm is derived by integrating concepts from chaos theory.
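The four GSO phases described above can be put together in a compact sketch. It follows the commonly published form of glowworm swarm optimization and is not the authors' implementation: the parameter defaults (`rho`, `gamma`, `beta`, `n_t`, `s`, `l0`) are values often quoted for basic GSO and are assumptions here, the sensor range `r_s` is simply set to the widest search dimension, and the objective `J` is a hypothetical stand-in for a validation score of the summarization model.

```python
import math
import random


def luciferin_update(l_prev, j_value, rho=0.4, gamma=0.6):
    """Decay the old luciferin and add a share of the current objective value."""
    return (1.0 - rho) * l_prev + gamma * j_value


def radius_update(r, n_neighbors, r_s, beta=0.08, n_t=5):
    """Grow the radius when neighbors are scarce, shrink it when crowded."""
    return min(r_s, max(0.0, r + beta * (n_t - n_neighbors)))


def gso(J, bounds, n=30, iters=100, s=0.03, l0=5.0, seed=0):
    """Maximize J over the box `bounds`; returns (best_position, best_value)."""
    rng = random.Random(seed)
    r_s = max(hi - lo for lo, hi in bounds)  # sensor range, also used as r0
    X = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    L = [l0] * n
    R = [r_s] * n
    for _ in range(iters):
        L = [luciferin_update(l, J(x)) for l, x in zip(L, X)]
        new_X = [x[:] for x in X]
        for u in range(n):
            # Neighbors: brighter individuals inside u's decision radius.
            nbrs = [v for v in range(n)
                    if L[v] > L[u] and math.dist(X[u], X[v]) < R[u]]
            if nbrs:
                # Select a neighbor with probability proportional to the
                # luciferin difference, then take one step of size s toward it.
                v = rng.choices(nbrs, weights=[L[k] - L[u] for k in nbrs])[0]
                d = math.dist(X[u], X[v])
                if d > 0.0:
                    new_X[u] = [min(max(xu + s * (xv - xu) / d, lo), hi)
                                for xu, xv, (lo, hi) in zip(X[u], X[v], bounds)]
            R[u] = radius_update(R[u], len(nbrs), r_s)
        X = new_X
    best = max(X, key=J)
    return best, J(best)


if __name__ == "__main__":
    sphere = lambda x: -sum(v * v for v in x)  # toy objective, maximum at origin
    print(gso(sphere, [(-1.0, 1.0), (-1.0, 1.0)]))
```

For hyperparameter tuning, each coordinate of a position would encode one hyperparameter (for example a learning rate or hidden size scaled into its range), and `J` would train and score the model for that setting; the chaotic variant would additionally replace some of the uniform random draws with values from a chaotic map.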

Chaos theory is a branch of mathematics that deals with nonlinear dynamical systems. Nonlinear means that the system's response cannot be predicted in proportion to its input, and dynamic means that the system changes from one state to another over time. Chaotic functions describe a dynamical system using deterministic equations. However, depending on the initial conditions, chaotic functions produce divergent and wildly unpredictable behavior. Chaotic functions can therefore improve the diversification and intensification of optimization methods, i.e., help them avoid local optima and move toward the global optimum. Such a function follows simple rules and has few interacting parts; yet in each iteration, the generated value depends on the initial condition and the previous values.

Here, three different chaotic maps are considered: the iterative map, the tent map, and the logistic map, applied to the computation of the power exponent (p) and sensory modality (c) from the BOA. The chaotic map retained is the one found to exhibit higher efficacy than the other chaotic maps.

Logistic map:

$$x_{t+1} = r\, x_t (1 - x_t)$$

Here, $x_t$ is the value at iteration $t$, and $r$ is the growth rate, which takes values in [3.0, 4.0].

Iterative map:

In the iterative map, the value of $P$ is chosen between zero and one, and the result $x_t$ is a chaotic variable that takes values in [0, 1].

Tent map:

The tent map is a 1D map similar to the logistic map. Here, the result $x_t$ is a chaotic variable that takes values in [0, 1].
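The three chaotic maps named above can be sketched as simple update functions. The source equations did not survive extraction, so the forms below are assumptions taken from variants commonly used in chaotic metaheuristics: the logistic map $x_{t+1} = r x_t (1 - x_t)$, an iterative map based on $\sin(P\pi / x_t)$ with an absolute value added so the output stays in [0, 1] as the text states, and a tent map with its peak at 0.7. The function names and default constants are hypothetical.

```python
import math


def logistic_map(x, r=4.0):
    """Logistic map; chaotic for growth rates r close to 4."""
    return r * x * (1.0 - x)


def iterative_map(x, P=0.7):
    """Iterative map (assumed form); abs() keeps the value in [0, 1]."""
    if x == 0.0:
        x = 1e-12  # nudge away from zero to avoid division by zero
    return abs(math.sin(P * math.pi / x))


def tent_map(x):
    """Tent map with peak at 0.7 (assumed variant), clamped to [0, 1]."""
    y = x / 0.7 if x < 0.7 else (10.0 / 3.0) * (1.0 - x)
    return min(1.0, max(0.0, y))  # guard against floating-point overshoot


def chaotic_sequence(step, x0, n):
    """Iterate a map n times from x0 and return the generated values."""
    xs, x = [], x0
    for _ in range(n):
        x = step(x)
        xs.append(x)
    return xs


if __name__ == "__main__":
    for name, step in [("logistic", logistic_map),
                       ("iterative", iterative_map),
                       ("tent", tent_map)]:
        print(name, [round(v, 4) for v in chaotic_sequence(step, 0.3, 5)])
```

A sequence produced this way can stand in for uniform random numbers wherever the optimizer needs values in [0, 1], for example when scattering the initial population or perturbing a step.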

3 Experimental Validation

The proposed DL-ALSTM model has been validated using the PubMed dataset [21], whose instances are stored in JSON format. The abstract, sections, and body are all sentence-tokenized. Each JSON object includes several fields: article_id, abstract_text, article_text, section_names, and sections.
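Reading one such record can be sketched as follows. Only the five field names come from the text; the sample record, the assumption that `abstract_text` and `article_text` are lists of sentences and `sections` is a list of sentence lists aligned with `section_names`, and the function name `load_record` are illustrative.

```python
import json

# Hypothetical record in the described layout (contents invented for illustration).
SAMPLE_LINE = json.dumps({
    "article_id": "PMC0000000",
    "abstract_text": ["Background sentence.", "Result sentence."],
    "article_text": ["First body sentence.", "Second body sentence."],
    "section_names": ["introduction", "methods"],
    "sections": [["First body sentence."], ["Second body sentence."]],
})


def load_record(line):
    """Parse one JSON object and pair each section name with its sentences."""
    rec = json.loads(line)
    return {
        "article_id": rec["article_id"],
        "abstract": rec["abstract_text"],
        "body": rec["article_text"],
        "sections": dict(zip(rec["section_names"], rec["sections"])),
    }


if __name__ == "__main__":
    record = load_record(SAMPLE_LINE)
    print(record["article_id"], list(record["sections"]))
```

Pairing names with sections up front makes it straightforward to drop or keep specific sections during pre-processing.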

The accuracy graph of the LSTM model is depicted in Fig. 3. The figure shows that the training and validation accuracies of the LSTM model increase with the epoch count. It is also noticed that the LSTM model achieves higher validation accuracy than training accuracy.

Fig. 4 reports the loss graph of the LSTM model on the test dataset. The figure shows that the loss of the LSTM model decreases as the epoch count rises. It is observed that the validation loss of the LSTM model is lower than its training loss.

Figure 3: Accuracy analysis of LSTM model

The accuracy graph of the DL-ALSTM model is shown in Fig. 5. The figure shows that the training and validation accuracies of the DL-ALSTM model improve with a higher epoch count. In addition, the DL-ALSTM model achieves higher validation accuracy than training accuracy.

Fig. 6 illustrates the loss graph of the DL-ALSTM model on the test dataset. The figure shows that the loss of the DL-ALSTM model decreases as the epoch count increases. The validation loss of the DL-ALSTM model is lower than its training loss.

Fig. 7 shows the ROUGE-1 analysis of the LSTM model on the applied dataset. The figure shows that the LSTM model attains moderately high ROUGE-1 values. For instance, the LSTM model attains a ROUGE-1 of 0.7471 in execution run-1, 0.7354 in run-2, 0.7470 in run-4, 0.7528 in run-8, and 0.7304 in run-10.

Figure 4: Loss analysis of LSTM model

Figure 5: Accuracy analysis of DL-ALSTM model

Figure 6: Loss analysis of DL-ALSTM model

Figure 7: Result analysis of LSTM model in terms of ROUGE-1

Fig. 8 depicts the ROUGE-2 analysis of the LSTM model on the applied dataset. The figure shows that the LSTM model reaches moderately high ROUGE-2 values. For instance, the LSTM model attains a ROUGE-2 of 0.3449 in execution run-1, 0.3223 in run-2, 0.3216 in run-4, 0.3307 in run-8, and 0.3562 in run-10.

Figure 8: Result analysis of LSTM model in terms of ROUGE-2
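For readers unfamiliar with the metric, ROUGE-N measures n-gram overlap between a generated summary and a reference summary. The sketch below computes clipped n-gram precision, recall, and F1 with whitespace tokenization; it is a simplified illustration, not the evaluation script used in the paper, and the paper does not state which of the three variants its reported values correspond to. The function names and example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Clipped n-gram overlap between a candidate and a reference summary."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # counts clipped to the reference
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    reference = "the drug reduced blood pressure in patients"
    candidate = "the drug reduced pressure in adults"
    print("ROUGE-1:", rouge_n(candidate, reference, n=1))
    print("ROUGE-2:", rouge_n(candidate, reference, n=2))
```

ROUGE-2 is typically much lower than ROUGE-1 for the same pair of texts, because matching two consecutive words is a stricter requirement than matching single words; this is consistent with the gap between the ROUGE-1 and ROUGE-2 values reported in this section.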

Tab. 1 presents the result analysis of the LSTM and DL-ALSTM models over different runs.

Table 1: Result analysis of proposed model with varying runs

Fig. 9 shows the ROUGE-1 analysis of the DL-ALSTM model on the applied dataset. The figure shows that the DL-ALSTM model reaches moderately high ROUGE-1 values. For instance, the DL-ALSTM model attains a ROUGE-1 of 0.7692 in execution run-1, 0.7369 in run-2, 0.7624 in run-4, 0.7886 in run-8, and 0.7640 in run-10.

Figure 9: Result analysis of DL-ALSTM model in terms of ROUGE-1

Fig. 10 displays the ROUGE-2 analysis of the DL-ALSTM model on the applied dataset. The figure shows that the DL-ALSTM model attains moderately high ROUGE-2 values. For instance, the DL-ALSTM model attains a ROUGE-2 of 0.3561 in execution run-1, 0.3231 in run-2, 0.3533 in run-4, 0.3357 in run-8, and 0.3712 in run-10.

Figure 10: Result analysis of DL-ALSTM model in terms of ROUGE-2

A brief comparative analysis of the DL-ALSTM model against recent approaches is given in Tab. 2. Fig. 11 compares the ROUGE-1 of the DL-ALSTM model with existing techniques. The figure shows that the Bayesian BS, BERT-Base, and SUMMA approaches obtain the lowest ROUGE-1 values of 0.7288, 0.7257, and 0.7098, respectively. The BioBERT-pubmed, CIBS, and BioBERT-pmc techniques obtain slightly higher ROUGE-1 values of 0.7376, 0.7345, and 0.7309, respectively. The LSTM and BERT-Large techniques yield reasonable ROUGE-1 values of 0.7528 and 0.7504, respectively. The proposed DL-ALSTM technique achieves the best performance, with a maximum ROUGE-1 of 0.7886.

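Table 2: Comparative analysis of DL-ALSTM model with existing approaches

Methods            ROUGE-1    ROUGE-2
Bayesian BS        0.7288     0.3143
BERT-Base          0.7257     0.3110
SUMMA              0.7098     0.3022
BioBERT-pubmed     0.7376     0.3203
CIBS               0.7345     0.3187
BioBERT-pmc        0.7309     0.3164
LSTM               0.7528     0.3562
BERT-Large         0.7504     0.3312
DL-ALSTM           0.7886     0.3712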

Figure 11: Comparative analysis of DL-ALSTM model in terms of ROUGE-1

Fig. 12 compares the ROUGE-2 of the DL-ALSTM model with existing algorithms. The figure shows that the Bayesian BS, BERT-Base, and SUMMA approaches reach the lowest ROUGE-2 values of 0.3143, 0.3110, and 0.3022, respectively. Likewise, the BioBERT-pubmed, CIBS, and BioBERT-pmc algorithms attain slightly higher ROUGE-2 values of 0.3203, 0.3187, and 0.3164, respectively. The LSTM and BERT-Large techniques yield reasonable ROUGE-2 values of 0.3562 and 0.3312, respectively. The presented DL-ALSTM technique achieves the best result, with a maximum ROUGE-2 of 0.3712.

Figure 12: Comparative analysis of DL-ALSTM model in terms of ROUGE-2

From the above tables and figures, it is apparent that the DL-ALSTM technique is an effective tool for biomedical text summarization.
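The size of the reported improvement can be made concrete with a little arithmetic on the comparison values quoted above. The sketch below stores the (ROUGE-1, ROUGE-2) pairs stated in the text and computes the relative gain of DL-ALSTM over the strongest competing method; the helper names are hypothetical, and the calculation says nothing about statistical significance, since only single best-run values are reported.

```python
# (ROUGE-1, ROUGE-2) per method, copied from the comparison reported in the text.
SCORES = {
    "Bayesian BS": (0.7288, 0.3143),
    "BERT-Base": (0.7257, 0.3110),
    "SUMMA": (0.7098, 0.3022),
    "BioBERT-pubmed": (0.7376, 0.3203),
    "CIBS": (0.7345, 0.3187),
    "BioBERT-pmc": (0.7309, 0.3164),
    "LSTM": (0.7528, 0.3562),
    "BERT-Large": (0.7504, 0.3312),
    "DL-ALSTM": (0.7886, 0.3712),
}


def best_method(metric):
    """Method with the highest score; metric 0 is ROUGE-1, 1 is ROUGE-2."""
    return max(SCORES, key=lambda name: SCORES[name][metric])


def strongest_baseline(proposed, metric):
    """Best-scoring method other than the proposed one."""
    others = (name for name in SCORES if name != proposed)
    return max(others, key=lambda name: SCORES[name][metric])


def relative_gain(proposed, baseline, metric):
    """Percentage improvement of the proposed method over a baseline."""
    new, old = SCORES[proposed][metric], SCORES[baseline][metric]
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    for metric, label in enumerate(("ROUGE-1", "ROUGE-2")):
        base = strongest_baseline("DL-ALSTM", metric)
        gain = relative_gain("DL-ALSTM", base, metric)
        print(f"{label}: best baseline {base}, relative gain {gain:.2f}%")
```

On both metrics the strongest competing method is the plain LSTM, so the comparison most directly reflects the contribution of the attention mechanism and the CGSO tuning.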

4 Conclusion

In this study, an effective DL-ALSTM model has been presented for multi-document biomedical text summarization. The proposed DL-ALSTM model first performs data preprocessing to convert the available medical data into a format compatible with further processing. Then, the DL-ALSTM model is executed to summarize the contents of multiple biomedical documents. To tune the summarization performance of the DL-ALSTM model, the CGSO algorithm is employed. Extensive experiments are performed to validate the DL-ALSTM model, and the results are investigated using the PubMed dataset. A comprehensive comparative analysis is carried out to demonstrate the efficiency of the proposed DL-ALSTM model against recently presented models. In future, the performance of the DL-ALSTM model can be improved by the use of advanced hybrid metaheuristic optimization techniques.

Acknowledgement: The authors would like to acknowledge the support of Prince Sultan University for paying the Article Processing Charges (APC) of this publication.

Funding Statement: This work is funded by the Deanship of Scientific Research at King Khalid University under Grant Number (RGP 1/279/42). www.kku.edu.sa.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
