COVID-19 Public Sentiment Insights:A Text Mining Approach to the Gulf Countries
2021-12-16SalehAlbahliAhmadAlgshamShamsulhaqAerajMuathAlsaeedMuathAlrashedHafizTayyabRaufMuhammadArifandMazinAbedMohammed
Saleh Albahli,Ahmad Algsham,Shamsulhaq Aeraj,Muath Alsaeed,Muath Alrashed,Hafiz Tayyab Rauf,Muhammad Arif and Mazin Abed Mohammed
1Department of Information Technology,College of Computer,Qassim University,Buraydah,Saudi Arabia
2Department of Computer Science,Univesity of Gurjat,Pakistan
3School of Computer Science,Guanzghou University,Guangzhou,510006,China
4College of Computer Science and Information technology,University of Anbar,11,Ramadi,Anbar,Iraq
Abstract:Social media has been the primary source of information from mainstream news agencies due to the large number of users posting their feedback.The COVID-19 outbreak did not only bring a virus with it but it also brought fear and uncertainty along with inaccurate and misinformation spread on social media platforms.This phenomenon caused a state of panic among people.Different studies were conducted to stop the spread of fake news to help people cope with the situation.In this paper,a semantic analysis of three levels(negative,neutral,and positive)is used to gauge the feelings of Gulf countries towards the pandemic and the lockdown,on basis of a Twitter dataset of 2 months,using Natural Language Processing(NLP)techniques.It has been observed that there are no mixed emotions during the pandemic as it started with a neutral reaction,then positive sentiments,and lastly,peaks of negative reactions.The results show that the feelings of the Gulf countries towards the pandemic depict approximately a 50.5%neutral,a 31.2%positive,and an 18.3%negative sentiment overall.The study can be useful for government authorities to learn the discrepancies between different populations from diverse areas to overcome the COVID-19 spread accordingly.
Keywords:COVID-19;sentiment analysis;natural language processing;twitter;social data mining;sentiment polarity
1 Introduction
In December 2019,COVID-19 has caused a significant amount of hospitalizations and a state of global panic.In this environment,people converged to social media to communicate and to express their views.Some people,as well,used social media for spreading misinformation and push their agendas.As such,social media has established itself as an essential and integral part of modern society.While it has been an important tool for businesses,advertising and broadcasting ideas and news,it has become a powerful tool to spread and form opinions,obtaining feedbacks,and creating virtual bonding with the masses.Notably,sentiment analysis,sometimes known as opinion mining is a process that involves performing text analysis.As people like to share their thoughts and ideas regarding present situations and diverse social phenomena,like hazards,cultural,social trends,etc.,opinion mining deals with extracting information about people’s thoughts and feelings from thecorpusof text data by using a technique called Natural Language Processing (NLP).
Due to the tremendous amount of people present on social media and sharing their reviews,social media has become the main source of getting information from traditional news agencies.Because of the ease of access to social media and the presence of peers,people rely on social media for information.Among those people,there are only a few who rely on validated sources before believing the news.That aside people also show their views on social media and hence their emotions.During these difficult times,it is of utmost importance to analyze the behavior of people in their homes,as this might help the government make policies that can help fight the pandemic in a better way.Importantly,Twitter is one of the main sources of communication in the Arab world and there is little-known work done on studying it in times of crisis [1].Consequently,this study aims to fulfill this gap by investigating and studying the role of social media in keeping the masses aware of the present situation and teaching them how to respond to the COVID-19 pandemic [2].
1.1 Novel Corona-Virus(COVID-19)and Its Effects on Social Media
1.1.1 Fake News and Misinformation
When articulating in general terms,COVID-19 possesses a lot of misinformation on social media.With the increase of population on social media due to easy access,everybody has an open right to manifest emotion on social media [3].While looking at it from one aspect it becomes the most popular way to transmit several sources of information.However,when looking at it from another aspect,anyone can write any wrong information.Due to this fact,there are many conspiracies theories present on the internet and many people are believing in this wrong information.Mis-information is deadlier than the epidemic itself and it also spread faster than the pandemic itself [4].
Many people are fighting this misinformation in their ways.Brennen et al.[5]conducting a study highlighting the fact that why people spread false information.They tested an intervention to increase the truthfulness on social media.Ferrara [6]study the quality of information on Twitter to end up with increasing the belief of misinformation which leads to misleading claims that fail to get corrected.Sharma et al.[7]analyzed 255 pieces of misinformation based on their types,sources,and claims.They reported casual clarifications all to tend to improve the effectiveness of countering misinformation.Gao et al.[8]confirmed the use of bots at the start of the pandemic to promote political conspiracies in the United States.Öztürk et al.[9]worked on sentiments of people regarding policies during pandemic using Twitter with two aspects positive and negative.They build a dashboard to show transparency into data discussions surrounding the pandemic.Accordingly,other methods beyond debunking are required.
1.1.2 People’s Sentiment during the Pandemic
There have been various studies to perform analysis of the mood of communities in response to a global stimulus.Ferrara [6]present a study that analyzes the factors affecting the journalist’s use of Twitter using a framework called the theory of planned behavior.In Sharma et al.[7],authors perform sentiment analysis of people in response to the global issue of terrorism.Similarly,there are various studies conducted to study the moods of the population towards natural disasters,such as floods,earthquakes,and hurricanes [8-10].It is noticeable that sentiment analysis cannot only be performed on those areas which are affected by a certain event.Contrary,there are various research performed to analyze sentiments of people who have only known about it through social,electronic,or print media and are not physically affected by the event.Abdul-Mageed et al.[11]show the behavior of people living in Turkey towards the Syrian refugee crisis since 2016.The authors also present the mood of people residing in Turkey in both languages Turkish and English.The analysis of both regions towards the same event provides additional information about the reaction of people living in different regions.
Table 1:Comparison table of previous methodologies
However,taking inspiration from the studies presented in [8,9,11],we present the sentiment analysis of Gulf countries towards the global pandemic of COVID-19.Our study aims at performing an analysis of People’s moods towards this event.We perform this study using the tweets extracted from Twitter using the Tweepy API.Then,data has been analyzed using TextBlob_ar and classify the tweets into three categories namely,positive,negative,and neutral.This study can help in analyzing the state of minds of the general population.Consequently,the analysis of this study can assist authorities in the implementation of various guidelines and rules in a particular region.Besides,this study can also highlight the needs of people during this global event,specifically,psychological assistance to cope with the prevailing conditions.Overall,the literature review of methods to perform sentiment analysis of the tweets posted during the COVID-19 pandemic in the Gulf States is summarized in Tab.1.
1.2 Contribution
• This paper as its significant contribution,proposes the pandemic mood of the people in Gulf countries,i.e.,Saudi Arabia,United Arab Emirates,Kuwait,Oman,Bahrain and Qatar.
• It deals with analysis of twitter data to find trends of people’s sentiment to track the rise of fear and panic that has been caused by rapid spread of COVID-19 infection.
• There have been studies conducted that explored custom studies as well as automated studies of sentiment analysis and there is a tremendous potential in this field.
• There have also been studies that uses deep learning for sentiment analysis [10,11]and studies that deals with sentiment analysis of Arabic language [11-13].
In this regard,our major contributions are performing the semantic analysis of the tweets that are posted from Gulf regions;study the pandemic mood of people;track the rise of fear and panic that has been caused by the rapid spread of COVID-19 infection,and examines behaviors and the new lifestyle that shifted to something new and decreases of information amplification upon hearing the news.Furthermore,discussion and conceptual insights are provided in this study with descriptive text mining and data visualization.The rest of the paper is organized as follows.Section 2 theory components and related works are outlined and how our methods are leveraged from other works.In Section 3,we present the result and analysis of the proposed work.We also discuss and present a comparison of the results in Section 4.Finally,we conclude the paper and provide further research areas.
2 Materials and Methods
Any community’s opinion can become the driving force of any cause.Therefore,the data derived by conducting sentiment analysis on the data collected by social media can not only provide insight into what people think but can also assist in the development of various policies.In [9],one of the most significant works performed in such a domain is discussed.Sentiment analysis of tweet data on the Syrian crisis was performed by the authors in [9].This study effectively presents the Turkish community’s reaction to this issue.In our work,however,we conduct a sentiment analysis of Arabic tweets posted with Arabic tweets during the COVID-19 pandemic.Only the gulf countries,i.e.,Saudi Arabia,Oman,Bahrain,Qatar,and the United Arab Emirates (UAE),were concerned with the data used for this work.The goal of this work is to examine people from a specific area towards a global pandemic.This work is significant because it captures the reaction of the community towards a very rare phenomenon and people have never seen anything like it before.Fig.1 shows the architecture of our sentiment analysis including data extraction and pre-processing.
2.1 Data Extraction
The data used in this work for semantic analysis is gathered from Twitter over a span of almost two months,i.e.,between 1 March 2020 and 28 April 2020.The position philtre was also specified,i.e.,only the tweets of the aforementioned Gulf countries were extracted.
Figure 1:Pipeline of the sentiment analysis including data extraction and pre-processing
Using Python’s tweepy package and Twitter API,this data collection was performed programmatically [14].The resulting data is collected in CSV files,where various files for each country in the Gulf region are listed.The CSV files also contained information about the tweet,date,and country of the person posting.The keywords corona,coronavirus,COVID,COVID-19,and sarscov22 were used to gather a total of 60,000 tweets.
2.2 Data Preprocessing
In machine learning,data preprocessing is an essential step [15,16].The data that is accessed using the Twitter API is not a live stream.Instead,knowledge is a series of tweets from a given time interval.Therefore,if we repeatedly hit the endpoints of Twitter’s REST API,our data might contain duplicate tweets.We delete all redundant tweets from our data to ensure our data is safe and thorough.Furthermore,we also screen out spam tweets,but this is achieved in the next step of data cleaning.With the reduced dataset containing 33496 tweets from all countries together,the data cleaning resulted in the reduction of our dataset.
There is a need for more cleaning to eliminate the unwanted content of the dataset after cleaning the data in the first level.First of all,in this stage,we remove all spam tweets and tokenize the text data.The method of separating the text into smaller units is Tokenization.There are different text tokenizers,and each has its own definition concerning the smallest unit.In certain instances,the smallest units or tokens are called words,phrases,numbers,and even punctuation.We delete hashtags,mentions,numbers,hyperlinks,non-Arabic words,emojis,punctuation marks,and single characters after we apply tokenization.Please note that,in addition to text details,the emojis provide additional sentiment information [17].For this job,however,the use of emojis for conducting semantic analysis is out of reach.It is noticeable that the information includes tweets that are posted in various regions by users.Thus,the text of the tweet is highly dependent on the region’s aesthetics and design.The data obtained is therefore not homogeneous.To make data homogenous,we,therefore,apply data transformations.In the later stages of our work,this helps us to standardize and normalize the text.In Arabic language,most letters have contextual letterforms,for instance ( ),so in this case we normalize to ().Similarly,the diacritics ( )that help to pronounce the letter in proper way are not required and are therefore removed as well.
In the last step,we remove the stop words.Stop words are the most common word that have no meaning,but to help sentence structure like ( ).We also remove these words before starting the analysis of our data.In addition,after the removal of stop words,we perform stemming of the clean data.Stemming Is the process of reducing the word stems in root form,e.g.,(,) are converted to ( ) after stemming.The algorithm we used for stemming the Arabic data is ISRIStemmer available in the NLTK library.
Figure 2:A visual model of TextBlob_ar classification
2.3 Word Tagging
After ensuring the text has been cleaned,each word of every tweet is looked to calculate the sematic score.The sentiment score of each word is then combined to compute the semantic score of the particular tweet.
In this work,each word’s semantic score is calculated using three levels of semantics,i.e.,negative,neutral,and positive as shown in Fig.2 TextBlob is a machine learning library for classifying the word’s sematic of English language.In this work,we focus on Arabic language tweets from the gulf region.Therefore,we make use of TextBlob_ar.TextBlob_ar performs semantic analysis on the words of Arabic language and it is built on the top of TextBlob and NLTK.Tab.2 presents the pros and cons of using the TextBlob for semantic analysis.
Table 2:Comparison of pros and cons of TextBlob_ar used in this work for performing tweet’s word tagging
3 Results and Analysis
Python3 is used to develop all our codes for the execution of semantic analysis.Moreover,we use following Python libraries.
• Pandas (for data manipulation and analysis)
• NLTK (for language processing)
• Tweepy (for accessing twitter REST API)
• TextBlob:(for natural language processing and word tagging)
3.1 Results
The sentiment property of the library TextBlob_ar for each word returns a tuple representing a sentiment,which includes polarity and subjectivity.The polarity score ∈(−1.0,1.0),i.e.,−1.0,0.0 and 1.0 represent the negative,neutral,and positive sentiment of the word,respectively.Similarly,the subjectivity score ∈(0.0,1.0),where 0.0 and 1.0 represent the objective and subjective opinion,respectively.Tab.3 presents a few instances of the polarity and subjectivity tests.
In Tab.4,we present a quantitative comparison of positive,negative,and neutral sentiments extracted from tweets using TextBlob_ar.Similarly,we also present these results visually in Figs.3-8.
Table 3:Examples of polarity and subjective score computation using TextBlob_ar
Table 4:Quantitative comparison of positive,negative,and neutral sentiments
Figure 3:A ratio of positive,negative,and neutral sentiments of tweets of UAE
As featured in Figs.3-8,the most data obtained or gathered is the tweets with a neutral feeling.For example,Oman has the lowest number of neutral tweets in the period from 1 March 2020 to 28 April 2020,i.e.,44.84% of the total tweets are neutral.On the other hand,within the same range,Qatar represents the largest number of neutral tweets,i.e.,54.73.The negative tweets from the lowest sentiment on the start of the covid19 pandemic in each of the gulf countries are visible from the figures.Oman is the country of the gulf region that represents the greatest number of negative tweets,i.e.,20.99%,whereas Qatar represents the lowest number of tweets with negative sentiments,which is 15.38%.
Figure 4:A ratio of positive,negative,and neutral sentiments of tweets of Saudi Arabia
Figure 5:A ratio of positive,negative,and neutral sentiments of tweets of Qatar
Figure 6:A ratio of positive,negative,and neutral sentiments of tweets of Oman
Similarly,United Arab Emirates tweeted the lowest number of tweets that are identified as tweets with positive sentiments.These tweets form 28.27% of the total data collected between 1st March 2020 to 28th April 2020 from the country.Contrary,Oman represent the highest number of positive tweets in the same time,i.e.,34.16%.
Figure 7:A ratio of positive,negative,and neutral sentiments of tweets of Kuwait
Figure 8:A ratio of positive,negative,and neutral sentiments of tweets of Bahrain
Fig.9 presents the total number of tweets with positive,negative,and neutral sentiments.These tweets are made in the time interval of 1st March 2020 to 28th April 2020 from the six countries of Gulf region.It is noticeable that neutral tweets form the major portion of the collected dataset,followed by the tweets with positive sentiment.The negative tweets represent the lowest amount of dataset,i.e.,18.3% of all tweets from Gulf region.
Figs.10-15 represent the wordclouds of each of the Gulf country.A wordcloud or tag cloud is a visual representation of text data.The wordcloud depicts the words that are most commonly used in the target dataset and provide information regarding the keywords of the text.As presented in figures,the most dominant word occurring in the wordcloud all Gulf countries iswhich mean “corona” or “COVID” in English.Another most dominant word being used iswhich means “virus.” Another noticeable word in the tag cloud iswhich mean pandemic or epidemic and refers to the COVID-19 pandemic in this case.
Fig.10 presents the wordcloud of Saudi Arabia.The most common term of this wordcloud arewhich translates to “stay home”,“curfew,” “permission,” “pandemic” and“Health ministry,” respectively.Fig.11 presents the wordcloud of UAE.The most common termsin the wordcloud of UAE are which means “Corona,” “virus,”“Saudi Health,” “new cases,” respectively.Fig.12 presents the wordcloud of Qatar.The most frequent terms of the word cloud aremeaning “month,” “Ramadan,”“Corona” “Application” and “Prince or Sheikh,” respectively.
Figure 9:A combined ratio of positive,negative,and neutral sentiments of tweets from Gulf region
Figure 10:The wordcloud of Saudi Arabia Emirates obtained from the tweet’s dataset
Figure 11:The wordcloud of United Arab Emirates obtained from the tweet’s dataset
Fig.13 represents the wordcloud of Oman tweets.The most common words in Oman’s wordcloud arewhich translates to “Corona,” “Ministry,” “new cases,”“Sultanate of Oman” and “virus,” respectively.
Figure 12:The wordcloud of Qatar obtained from the tweet’s dataset
Figure 13:The wordcloud of Oman obtained from the tweet’s dataset
Fig.14 presents the wordcloud of Kuwait’s tweet dataset.The most common words in Kuwait’s tweets arewhich translates to “Ministry,” “Kuwait,” “Pandemic,” “Corona” and “crisis,” respectively.Fig.15 presents the wordcloud of Bahrain’s tweets.The most frequent words in this wordcloud arewhich translates to“new cases,” “deaths,” “virus,” “Corona,” and “reason,” respectively.
Figure 14:The wordcloud of Kuwait obtained from the tweet’s dataset
It is evident from these results that the main topic on the minds of people during the interval of data collection,i.e.,1st March 2020 to 28th April 2020 are related to pandemic.The results also make it evident that the most common sentiment observed in the people of Gulf region during the aforementioned period is neutral.
Figure 15:The wordcloud of Bahrain obtained from the tweet’s dataset
4 Discussion
For sentiment analysis of the masses in Gulf countries,an analysis of the Twitter dataset was performed.Every nation’s data has been individually analyzed and the findings are collected both separately and collectively.Collectively speaking,a neutral assessment of the COVID-19 situation has been acknowledged by more than half of the population.There are positive reviews by 30 percent of the population and negative reviews by almost 20 percent of the population.This illustrates that people manage the situation nicely.Although 20 percent of the population with adverse feelings is concerned,the government should take action to deal with these individuals as conditions such as this can cause mental health issues.Hotlines and online consultations should be provided by the government as this could help manage the situation in a better way.Another way to deal with this problem is to monitor disinformation on social media sites,and some fake news could scare people.The government should hold interventions to stop the spread of this misinformation.Another factor of interest is to understand how people in Gulf countries are feeling towards the COVID-19 pandemic.After checking the timeframe of retrieved data since the pandemic hit the Gulf countries,we recognize the following:
(1) Firstly,at the beginning of the virus (March 2020),individuals show a neutral reaction that is very strange from any pandemic that can trigger widespread concern,panic attacks,apprehension,and tension as a natural feeling for any pandemic.In Gulf countries,this may reflect large preventive measures such as early closure of borders,social distance regulation,movement is limited most of the time during curfew hours,virtual work model to monitor the spread of COVID-19,and much more.
(2) Secondly,after attempting to flatten the virus curve from those nations,a positive response comes after the neutral feeling.Therefore,reducing new cases and deaths every day leads to people’s optimistic feelings as the virus will be a more regulated one.
(3) Finally,since mid-April,a greater concern for certain people has been adversely described as a result of growing deaths and cases.
Overall,it is observed that there are no mixed emotions in Gulf countries during the pandemic.Moreover,the positive feelings and satisfaction regarding society in Gulf countries are on the rise in most of the time.
5 Conclusion and Future Studies
This research comprises a semantic study of the Gulf countries during the period from 1 March 2020 to 28 April 2020.As never before,the prevailing coronavirus pandemic has infected the world.The main objective of this work is to observe the answer to this pandemic of the citizens of the Gulf countries,i.e.,Saudi Arabia,the United Arab Emirates,Kuwait,Oman,Qatar,and Bahrain.This is achieved by examining the tweets semantically posted from these regions.The knowledge about the tweets is gathered manually from the tweepy Twitter API.To delete inappropriate content,such as emojis,single characters,hashtags,re-tweets,spam tweets,hyperlinks,etc.,the collected data is then pre-processed.The data is often processed to extract and stem the stop words so that the resulting information can be properly analyzed.To perform the semantic analysis of Arabic knowledge,we employ TextBlob ar.The knowledge is divided into three semantic categories:positive,negative,and neutral.The findings indicate that the Gulf countries’tweets show roughly 50.5% neutral,31.2% positive,and 18.3% overall negative opinion.Also,by making the word clouds of the collected data,we analyze the collected data.The analysis reveals that the most common topic of discussion among the people of gulf countries during the considered time interval is a coronavirus,which is related to the ongoing pandemic.
We expect more study to strengthen our research by contrasting the semantics of the interval from 1 March 2020 to 28 April 2020 with the previous three months,i.e.,January to March,and the next three months,i.e.,May to July.This research would help to explain the complexities of the Gulf countries’semantics,which view the pandemic as a stimulus.This study may help to explain the different events caused by the pandemic.Therefore,by performing this study,we can estimate which semantic prevailed among the people due to this event.For instance,the world economy has been affected significantly by the coronavirus pandemic [12,13].Considering that the economy has been greatly affected,we want to look at the semantics of the Gulf countries.To undertake this analysis,the current data is not adequate.Moreover,until the symptoms of this pandemic hit the general populations of the area of concern,this research will not be conducted.We are also intended to recognize the weaknesses and to observe the response of the citizens of the major countries of COVID-19,i.e.,the USA,Spain,Italy,and others.
Acknowledgement:We would like to thank the Deanship of Scientific Research,Qassim University for funding the publication of this project.
Funding Statement:The author(s) received no specific funding for this study.
Conflicts of Interest:The authors declare that they have no conflicts of interest to report regarding the present study.
杂志排行
Computers Materials&Continua的其它文章
- Recognition and Detection of Diabetic Retinopathy Using Densenet-65 Based Faster-RCNN
- Adaptation of Vehicular Ad hoc Network Clustering Protocol for Smart Transportation
- Computational Microfluidic Channel for Separation of Escherichia coli from Blood-Cells
- A Fractal-Fractional Model for the MHD Flow of Casson Fluid in a Channel
- Simulation,Modeling,and Optimization of Intelligent Kidney Disease Predication Empowered with Computational Intelligence Approaches
- Prediction of Time Series Empowered with a Novel SREKRLS Algorithm