
Environmental complaint insights through text mining based on the driver, pressure, state, impact, and response (DPSIR) framework: Evidence from an Italian environmental agency

2023-10-24

Regional Sustainability, 2023, Issue 3

Fabiana MANSERVISI, Michele BANZI, Tomaso TONELLI, Paolo VERONESI, Susanna RICCI, Damiano DISTANTE, Stefano FARALLI, Giuseppe BORTONE

a Regional Agency for Prevention, Environment and Energy of Emilia-Romagna, Via Po 5, Bologna, 40139, Italy

b Department of Law and Economics, University of Rome Unitelma Sapienza, Piazza Sassari 4, Rome, 00161, Italy

c Department of Computer Science, Sapienza University of Rome, Viale Regina Elena 295, Rome, 00161, Italy

Keywords: Environmental complaints; Text mining approach; Term Frequency-Inverse Document Frequency (TF-IDF); Driver, pressure, state, impact, and response (DPSIR) framework; Semantic network analysis; Regional Agency for Prevention, Environment and Energy (Arpae)

A B S T R A C T

1. Introduction

The relationship between citizens and government has become a strategic focus area of modern public administrations. In this context, the interaction between these social actors is thought to provide a vector for the better provision of public services and greater efficiency in public institutions (Secchi, 2009; Ponte, 2015). Citizens may provide information to public administrations through different channels (e.g., online platforms, postal mail, call centers, and in-person services), and public administrations rely on such information to enhance their public services. In particular, the modern citizen is online, which has created a culture of participation and involvement (e-governance) that is replacing that of unilateral action. Recent research showed that e-governance supports transparency by improving administrative efficiency, promoting ethical behavior, and increasing trust and confidence in public institutions (Twizeyimana and Andersson, 2019). This new orientation towards citizens encourages public entities to improve and increase their access to service demands by implementing customer (citizen) relationship management (CRM) models, which aim at reducing operational costs, processing complaints, and shortening response times (Richter and Cornford, 2007; Chen, 2010). At the same time, citizens are becoming more proactive, demanding high-quality public services and timely responses from public authorities. In particular, complainants often expect public authorities to remediate the issues they raise, and this might be understood as a key driver of their satisfaction. In this sense, CRM models may contribute to changing the traditional and self-referential approach of government into one that is more citizen-centered, with improved service quality and efficiency to meet the needs of citizens (Kannabiran et al., 2004; Al-Khouri, 2012). Meanwhile, the rapid development of information technology and the growing environmental awareness of citizens have increased the number of environmental complaints, which express grievances about issues such as poor air quality, illegal waste disposal, water contamination, and general pollution. An intelligent assessment of citizens’ demands and complaints may contribute to detecting problems within a city or area, fostering dialogue between decision-makers, reallocating resources, improving service efficiency, enhancing service coordination, and predicting citizens’ demand. Furthermore, the management of environmental complaints may also enhance the relationship between government and citizens, and increase citizens’ satisfaction (Hartmann et al., 2017).

Rapid growth in the quantity of user-generated data makes big data analysis more necessary. Big data analysis provides a valuable tool for understanding realities by condensing a large amount of data into a small amount of information (Ghodousi et al., 2019). Recent progress in algorithms extends the application of big data analysis even to unstructured text data, including customer suggestions, complaints, and feedback (Katir et al., 2020; Lucini et al., 2020). As a result, many studies use text mining approaches to identify customer needs in order to guide the development of market-oriented products (Zhan et al., 2009; Aguwa et al., 2017; Joung et al., 2019).

Since knowledge acquisition is the primary need for the continuous adjustment of programming and planning, government authorities must receive and redress grievances in a timely and efficient manner. In this sense, high-quality complaint-management systems may not only contribute to environmental compliance assurance, but also be considered as drivers of “public value”, which can satisfy the current and future needs of citizens (Deidda Gagliardo, 2002). From the perspective of environmental management, this means protecting the environment for its own sake and ensuring that the interests of future generations are considered through sustainable environmental solutions (Sami et al., 2021).

The primary step in this process is understanding the content of environmental complaints, which reflect local environmental problems or issues and are filed with the expectation that the complaint-handling body will intervene to address the situation. The driver, pressure, state, impact, and response (DPSIR) framework (Smeets and Weterings, 1999) is an important tool for understanding the interaction between people and the environment, and for identifying opportunities to develop sustainable and healthy communities. The framework recognizes that the state of the environment (air, water and soil quality, nature, and landscape) is influenced by certain pressures, such as chemicals, waste, electromagnetic radiation, and noise. These pressures are in turn influenced by economic and social driving forces (industry, population, and transport). Furthermore, the state or quality of the environment has impacts on health, well-being, businesses, and the economy. For this reason, policy-makers may adopt different interventions or responses, such as controls, pressures, and actions, to mitigate impacts.

The DPSIR framework is based on the pressure-state-response framework initially proposed by Rapport and Friend (1979); it was adapted and largely promoted by the Organization for Economic Cooperation and Development (OECD) for environmental reporting (OECD, 1993). Several international organizations, including the United States Environmental Protection Agency (EPA, 1994), the United Nations Environment Programme (UNEP, 1994), and the EU, have adopted the framework, with the latter describing it as the most appropriate way of structuring environmental information (European Environmental Agency (EEA), 2005a). Indicators of state and impact fall under the remit of the EEA (European Commission, 1999), which is required to communicate the state of the environment to policy-makers. The DPSIR framework is widely used for problem solving by both natural and social scientists, who further refine and apply the framework and its derivatives in an ongoing process tailored to a wide range of applications (Patrício et al., 2016).

The literature reports many cases in which the application of the DPSIR framework highlighted essential connections and relations, promoting a more complete understanding of human-ecological systems (Cook et al., 2014). For instance, the framework has been used in several aquatic and marine ecosystem assessments to bridge the gap between science and decision-making (Gebremedhin et al., 2018; Mozumder et al., 2019), and it has also been applied to determine the economic drivers and activities that contribute to coastal erosion and vulnerabilities (Tscherning et al., 2012; Lewison et al., 2016; Semeoshenkova et al., 2017). The DPSIR framework has mostly been used to develop indicators (Gabrielsen and Bosch, 2003); however, it has also been applied to organize the information contained in management plans and the communication between stakeholders (Khunanake et al., 2018). Finally, the framework has been used for new models and decision support tools to evaluate and compare decision outcomes (Cormier et al., 2013).

Decision-making happens “within the context of a social system that includes different levels of capacity, commitment, economics, political mandates and pressures, and cultural and traditional frameworks” (Loomis and Paterson, 2014). Thus, decision-making should be based on the best available evidence from a wide range of sources, including citizen reports. To support this, policies and processes must be comprehensively and systematically improved (Brewer, 2007). In particular, a complaint system that categorizes the contents of complaints according to the DPSIR framework might provide a systemic approach to responding to unknown or underappreciated drivers, pressures, and impacts (European Commission and Directorate-General for Environment, 2020), while providing crucial information for decision-makers to monitor policies and respond to unexpected events.

However, few studies have applied the DPSIR framework to categorize the various perspectives and perceptions of stakeholders in order to foster a better understanding of sustainability challenges in different geographical contexts. A case study in Australia traced the link between local residents’ perceptions of a range of water quality issues and the institutions responsible for managing the water sources, drawing on the DPSIR framework to guide the data collection and establish response measures (Larson and Stone-Jovicich, 2011). Another study used focus group discussions and key informant interviews with local community members to identify the drivers and impacts of land use or land cover and climate change on the environment, water, and livelihoods in the Lake Kyoga Basin (Uganda) (Obubu et al., 2022). In doing so, these studies gained insight into areas of convergence and divergence between residents’ perceptions and institutional responsibilities and responses.

Although the DPSIR framework is a formal structure, some adaptations can be identified. Currently, there are at least 25 derivative schemes of the DPSIR framework, and the framework is widely and increasingly used to structure and analyze information for management and decision-making across ecosystems (Patrício et al., 2016). Since differences in terms and definitions may generate misunderstanding and ambiguity in the interpretation of results, a declaration of the adopted terms is required (Bruno et al., 2020). The present study relies on the rigorous system of definitions developed by the EPA, which distinguishes among social, economic, and environmental factors. This choice was made to reduce confusion regarding the meaning of the DPSIR framework and to ensure that the process can be easily replicated and applied to different systems, topics, and decision contexts. Figure 1 presents the hierarchy of the DPSIR framework identified by the EPA, which represents the basis for the classification of keywords applied in this study (Bradley, 2015).

First, the present study applied a text mining approach to identify the relevant keywords (denoting issues) within environmental complaints made by different stakeholders in the nine provinces of the Emilia-Romagna Region, Italy. The keywords revealed by the text mining approach might represent a starting point for finding repeated patterns in environmental complaints, thus providing the basis for automated support of complaint processes. Second, the DPSIR framework was used to categorize complaints into driver, pressure, state, impact, and response, with the aims of supporting the planning and organization of information, identifying stakeholder concerns and priorities, and measuring and evaluating indicators.

Overall, by applying a hybrid of powerful analytical capabilities based on the DPSIR framework, the present study proposed both quantitative and qualitative approaches to investigate pressing environmental complaints.

2. Materials and methods

This section reports the materials and methods employed in the present study. Figure 2 shows the progressive steps taken to analyze the environmental complaints. The process began with the collection of data from the Regional Agency for Prevention, Environment and Energy (Arpae) in the Emilia-Romagna Region, Italy. Local challenges were identified by analyzing the location of each complaint and determining regional differences between stakeholder perceptions of environmental complaints.

2.1. Study area

Fig. 1. Hierarchy of the driver, pressure, state, impact, and response (DPSIR) framework identified by the Environmental Protection Agency (EPA).

Arpae is an environmental and technical body that supports local authorities in protecting people’s health, ecosystems, and territorial safety in the Emilia-Romagna Region of Italy. Its activities cover all aspects of the environment, including the monitoring of environmental components, the surveillance of human activities and their impacts, the assessment of the environmental impacts of plans and projects, and the creation and management of environmental information systems. In addition, since 1 January 2016, Arpae has been the permitting and inspection authority in the Emilia-Romagna Region. Typically, a permit refers to an environmental authorization, which establishes limits for pollutant emissions into air and water, and for the generation and management of waste, together with other environmental conditions that are specific to individual installations. Following receipt of a valid application, Arpae consults with other competent authorities to gather facts and opinions that might contribute to the determination of the application, including those of the public. This is consistent with the initiative advocated by the Integrated Environmental Permitting Guidelines for Eastern Europe, Caucasus and Central Asia (EECCA) Countries to increase stakeholders’ involvement in environmental affairs (OECD, 2005). Then, regular environmental inspections are conducted to verify compliance with environmental conditions. As the number of environmental complaints increases, the frequency of such inspections may increase.

The stakeholders include: (1) members of the public (including citizen associations), who may file complaints after discovering cases of non-compliance or maladministration and submit requests for information; (2) companies, as the main applicants for environmental authorizations; and (3) local authorities, who may request technical support or pass on complaints about matters falling outside their areas of competence.

Fig. 2. Progressive steps taken to analyze the environmental complaints.

The Emilia-Romagna Region is home to 4.40×10⁶ residents, of whom 8.9% live in its capital, Bologna City (Italian National Statistical Institute (ISTAT), 2023). The region is divided into nine provinces: Piacenza, Parma, Reggio-Emilia, Modena, Bologna, Ferrara, Ravenna, Forlì-Cesena, and Rimini. The population density was 200 persons/km² in 2019, which is close to the national average. Moreover, the population is evenly distributed. There is no dominant large city, but an axis of medium-sized cities along the Via Emilia (Italy), where two-thirds of the population and the majority of the industrial production are concentrated.

The Emilia-Romagna Region is well known for its automotive and motorcycle industries, food production, and ceramic industry, which have developed in the form of large and successful industrial districts spreading over the entire region. The manufacturing sector dominates the regional economy. As shown in Table 1, the largest manufacturing segments include: (1) food products, which are mainly distributed in Parma, Modena, Bologna, and Forlì-Cesena provinces; (2) tobacco products, which are only scattered in Bologna Province; (3) textile and wearing apparel industries, which are least distributed in Modena Province; (4) wood and products of wood and cork, straw, and plaiting materials, which are least distributed in Forlì-Cesena Province; (5) basic pharmaceutical products and pharmaceutical preparations, which are mainly distributed in Piacenza Province; (6) other non-metallic mineral products (including ceramics), which are mostly distributed in Ferrara and Forlì-Cesena provinces; and (7) machinery and equipment factories (including automotive and motorcycle industries), which are mainly distributed in Bologna, Reggio-Emilia, and Modena provinces.

Agriculture and farming (especially the pig and poultry sectors) and related agro-industrial activities are critical for the Emilia-Romagna Region. Within the agricultural sector, the most important crops are fruit (grapes), horticultural crops, tuber plants, legumes, wheat, maize, and sorghum. Nevertheless, the dominant activity is tomato processing, with the Emilia-Romagna Region being the second largest producer in Italy. The Emilia-Romagna Region is also among the most productive regions in Italy in the dairy sector, with a regional cow milk production of approximately 1.77×10⁶ t. Since the milk of the Emilia-Romagna Region is largely used to produce Parmigiano Reggiano and Grana Padano hard cheeses with relatively low whey, the production of whey and buttermilk is also very high, at approximately 1.45×10⁶ t (Motola et al., 2009).

2.2. Customer relationship management in Regional Agency for Prevention, Environment and Energy (Arpae)

The Public Relations Office (URP, from the Italian “Ufficio Relazioni con il Pubblico”) represents the entry point for all individuals who wish to connect with Arpae. It collects complaints, petitions, reports, suggestions, and claims, with the aim of facilitating the customer-agency relationship and guaranteeing the right to information as enshrined in Italian Law 241/90 on the transparency of the Public Administration, and in other laws. The responsibilities of the URP are defined by Italian Law 150 of 7 June 2000, “Discipline of the information and communication activities of the Public Administration”. Arpae has four URPs, one for each territorial area (west, central, metropolitan area, and east). Individuals can contact these URPs for the following reasons: requesting information on Arpae’s services or on the environment; registering disservices and complaints; filing environmental incident reports; and citing events that may cause harm or potential harm to an environmental receptor (e.g., air, water, land, wildlife, and local habitat). In the latter case, the URP manually assigns the report to the environmental prevention area of Arpae (territorial services) responsible for the territory covered by the report, so that the problem can be addressed.

An individual reporting a complaint, disservice, or environmental incident, or requesting information or data, typically needs to fill out an online form titled “Contact Arpae”, which is available on the Arpae website (www.arpae.it). Graphic elements help individuals navigate, and the form helps them structure their complaint so that it includes all the relevant information, with recourse to drop-down lists. In particular, individuals must complete the following fields: identity of the complainant, identification and location of the complaint, and the reason for the complaint. When a complaint is filed, the system is configured to notify the URP operators by email. The URP operators take charge of the complaint within five working days. Currently, all the incoming user data are processed via a pipeline that is fully human-supervised by the URP operators. This costly and time-consuming effort involves human analysis by means of reading text data. Subsequently, the URP operators process each claim or general request for information according to an internal classification. On the basis of this human-supervised ticket tagging, the URP operators manually assign the request to a specific department of Arpae.

Table 1 Socio-economic context of the Emilia-Romagna Region reported by the Italian National Statistical Institute (ISTAT).

Additionally, Arpae provides a 24-h emergency service for environmental incidents throughout the region, concerning pollution phenomena that may cause serious and immediate damage to the environment. To activate an environmental emergency intervention, individuals must contact Arpae by phone. Arpae’s operators who receive the call then enter the data into the “Environmental incident reporting” portal to identify the individual making the claim and the characteristics of their report. The personal data provided by users are processed in accordance with Regulation (EU) 2018/1725 of the European Parliament and of the Council of 23 October 2018 on the protection of natural persons with regard to the processing of personal data by the institutions, bodies, offices, and agencies of the EU.

2.3. Data sources and processing

The proposed approach is based on keyword extraction and semantic analysis of mined data regarding environmental complaints submitted to Arpae by different stakeholders (public, citizen associations, local authorities, and companies) in the Emilia-Romagna Region of Italy. Data were retrieved for one year (from 1 January 2021 to 31 December 2021) from two online complaint submission systems: Arpae’s online claim submission system (“Contact Arpae”) and Arpae’s internal platform for environmental pollution (“Environmental incident reporting portal”), which are managed by Arpae environmental technicians.

These sources represent the main communication channels between stakeholders and Arpae. Each record in the complaint dataset extracted from “Contact Arpae” (in CSV format) contained the date, complaint ID, province, municipality, personal contact information, user classification, claim topic, claim content, URP response, and response date (Table 2). Some of these fields were prompted by Arpae through drop-down menus within the contact form (e.g., user classification and claim topic). This online reporting platform was established to facilitate individuals’ filing of complaints regarding environmental pollution and general information requests.

Table 2 Example of complaint records from the online claim submission system “Contact Arpae”.

Each record in the complaint dataset extracted from the “Environmental incident reporting portal” (in CSV format) contained the date, complaint ID, province and municipality, exact location of the incident (address), position on the map (geolocalization), environmental incident classification, intervention level, territorial area responsible for managing the incident, and incident severity. These reports were filled in by Arpae’s operators on the basis of calls.

In total, 732 valid records were obtained from “Contact Arpae” and 1745 valid records from the “Environmental incident reporting portal”. Records with missing or mismatched geographical information were excluded, as were requests or claims that were unrelated to environmental issues. Ultimately, 2477 records were evaluated. We classified environmental complaints based on the reported information, which includes environmental categories (air pollution, water pollution, noise pollution, waste, odor, soil, weather-climate, sea-coast, and electromagnetic radiation), stakeholders’ claim topic, and the spatial distribution of complaints, using the Microsoft Power BI tool (Microsoft Power Platform, Washington, the United States).

Appropriate precautions were taken to ensure anonymity by removing identifiable information and aggregating data by province or municipality and by stakeholder group. A regular expression was developed to detect several types of identifiers, including addresses, emails, phone numbers, and zip codes. A privacy statement explains how Arpae uses and shares users’ information and must be accepted before filing a request or complaint with the agency.
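
The regular-expression step can be illustrated with a minimal Python sketch. The patterns, placeholder labels, and the function name `redact` are assumptions made for illustration; the expressions actually used by the agency are not reported, and the address pattern in particular is deliberately crude.

```python
import re

# Hypothetical patterns (not the agency's actual expressions).
# Order matters: phone numbers are replaced before 5-digit zip codes.
PATTERNS = [
    ("[EMAIL]", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    # Italian street prefixes followed by a name and a house number.
    ("[ADDRESS]", re.compile(r"\b(?:via|viale|piazza|corso)\s+[^\d,;]+?\s*\d+\b",
                             re.IGNORECASE)),
    # Optional +39 prefix, area code, then 5 to 8 digits.
    ("[PHONE]", re.compile(r"(?<!\d)(?:\+39[\s.-]?)?\d{2,4}[\s.-]?\d{5,8}(?!\d)")),
    # Stand-alone five-digit postal code.
    ("[ZIP]", re.compile(r"(?<!\d)\d{5}(?!\d)")),
]


def redact(text):
    """Replace each detected identifier with a neutral placeholder."""
    for label, pattern in PATTERNS:
        text = pattern.sub(label, text)
    return text


sample = ("Odor near Via Po 5, 40139 Bologna; "
          "call 051 6223811 or write mario.rossi@example.it")
print(redact(sample))
```

A rule-based filter of this kind will miss free-form identifiers and may over-match ordinary phrases, so in practice it would presumably be combined with the aggregation by province and stakeholder group described above.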

2.3.1. Keywords extraction and the classification of the driver, pressure, state, impact, and response (DPSIR) framework

Natural language processing was used to extract keywords from the datasets. For this task, the open-source tool Orange (University of Ljubljana, Ljubljana, Slovenia) (Demsar et al., 2013) was employed to define the workflow depicted in Figure 3. Each component (widget) embedded a data retrieval, preprocessing, visualization, modeling, or evaluation task. Data preprocessing was performed using the Orange widget Preprocess Text to Corpus, which takes a text file in a specific data format as input. This preprocessing phase included transformation, tokenization, filtering, and normalization steps as follows:

Transformation converted the input corpus to lowercase.

Tokenization broke documents up into elements called tokens (words, n-grams, and phrases), which were used as input for text mining. For this process, “Regexp” was used to split the text by a given regex.

Filtering reduced the number of corpus features to those that were most significant by removing so-called stop words. A bespoke stop words dictionary was employed to filter out all terms considered irrelevant to the research topic.

Normalization further simplified the corpus features using a lemmatization process that involved a full morphological analysis to accurately identify the lemma of each word. Lemmatization removed inflectional suffixes and reduced each word to its lemma, defined as the canonical form of a term, stripped of most conjugations, suffixes, and transformations.

N-gram ranging created n-grams from tokens by means of a sliding window of size n. A range of one to two grams was used to allow for high granularity in the basic unit of data.
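
The five steps above can be sketched in plain Python as follows. This is an illustrative stand-in rather than the Orange workflow itself: the stop-word set, the lemma lookup table, and the function name `preprocess` are assumptions, and English toy data replace the Italian complaint text. A real lemmatizer would perform morphological analysis instead of a dictionary lookup.

```python
import re

# Hypothetical stand-ins for the bespoke stop-word dictionary and the lemmatizer.
STOP_WORDS = {"the", "a", "of", "and", "from", "is", "in"}
LEMMAS = {"smells": "smell", "factories": "factory", "coming": "come"}


def preprocess(text, n_range=(1, 2)):
    """Return the list of n-grams produced by the five preprocessing steps."""
    lowered = text.lower()                             # transformation
    tokens = re.findall(r"\w+", lowered)               # regex tokenization
    kept = [t for t in tokens if t not in STOP_WORDS]  # stop-word filtering
    lemmas = [LEMMAS.get(t, t) for t in kept]          # normalization (toy lookup)
    grams = []
    low, high = n_range
    for n in range(low, high + 1):                     # sliding window of size n
        for i in range(len(lemmas) - n + 1):
            grams.append(" ".join(lemmas[i:i + n]))
    return grams


print(preprocess("Strong smells from the factories"))
```

With the default range of one to two grams, each document yields its unigrams followed by its bigrams, which is the unit later counted in the document-term matrix.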

After preprocessing, a document-term matrix was built using the Bag of Words widget. Subsequently, data were input to the t-distributed Stochastic Neighbor Embedding (t-SNE) widget to generate a document map. We selected a subset of documents for each environmental category, and extracted the top 10 keywords for each environmental category using the Extract Keywords widget based on the Term Frequency-Inverse Document Frequency (TF-IDF) (Eq. 1) (Salton and Buckley, 1988). The TF-IDF computes the significance of a term t to a document d by combining two scores: the Term Frequency (TF), which is the frequency of term t in document d (Eq. 2), and the Inverse Document Frequency (IDF), which is derived from the number of documents in the corpus containing t, regardless of how often t occurs in them (Eq. 3). If a term t appears frequently in a document d, but less frequently in other documents, the term can be considered more significant for distinguishing document d and expressing its core content.

where f(t, d) is the number of times term t appears in document d; D is the total number of documents; and |{d : t ∈ d}| is the number of documents that include the term t.
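
For reference, a standard formulation that is consistent with the symbols defined above can be written as follows. It is given here as an assumption: the use of the raw count for TF and the base of the logarithm are not stated in the text, and variants with normalized or smoothed terms are also common.

```latex
\begin{align}
\mathrm{TF\text{-}IDF}(t, d) &= \mathrm{TF}(t, d) \times \mathrm{IDF}(t) \tag{1} \\
\mathrm{TF}(t, d) &= f(t, d) \tag{2} \\
\mathrm{IDF}(t) &= \log \frac{D}{\left|\{d : t \in d\}\right|} \tag{3}
\end{align}
```

Under this form, a term that occurs in every document has IDF equal to zero and therefore contributes nothing to the ranking, whatever its frequency.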

Fig. 3. Orange’s workflow for keyword extraction. t-SNE, t-distributed Stochastic Neighbor Embedding.

We classified the keywords with high TF-IDF values according to the categories and sub-categories of the DPSIR framework to define the interconnections between environment-related aspects within complaints (Bradley, 2015).
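
The ranking and classification described above can be sketched as follows, assuming raw term counts and a natural logarithm. The function names and the keyword-to-category lookup are hypothetical and only illustrate the mechanics; they do not reproduce the classification made in the study, which was performed manually against the EPA hierarchy.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))          # count each term once per document
    scores = []
    for doc in docs:
        tf = Counter(doc)                  # raw term frequency f(t, d)
        scores.append({t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf})
    return scores


def top_keywords(score, k=10):
    """Highest-scoring terms first; ties broken alphabetically."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def classify(keywords, lookup):
    """Map each keyword to a DPSIR category, or 'unassigned' if unknown."""
    return {kw: lookup.get(kw, "unassigned") for kw in keywords}


# Hypothetical lookup, for illustration only.
DPSIR_LOOKUP = {"factory": "driver", "spillage": "pressure", "canal": "state"}

docs = [["odor", "factory", "odor"],
        ["canal", "foam", "factory"],
        ["canal", "spillage"]]
scores = tf_idf(docs)
print(classify(top_keywords(scores[1], k=2), DPSIR_LOOKUP))
```

Keywords left as "unassigned" would be the ones requiring expert judgment, which is the part of the procedure that a lookup table cannot replace.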

2.3.2. Semantic network analysis of keywords in the DPSIR framework

Keywords were used to generate a semantic network illustrating the links between the categories of the DPSIR framework. For the semantic network analysis, text data were coded as nodes (words) and node pairs (word co-occurrences within a sentence or paragraph), which generated an n×n matrix.

No state-of-the-art ontology with explicit directed or undirected semantic relations was reported in the literature. The creation of such an ontology would require different approaches, possibly leveraging sophisticated ontology learning algorithms (Khadir et al., 2021) and human expertise. Instead, following the distributional hypothesis (Harris, 1954), the present work focused only on co-occurrence graphs, which represent the most basic knowledge that can be extracted and highlight the semantic relations between pairs of co-occurring words. Co-occurrence graphs were used to evaluate the centrality measures of the semantic network analysis, providing a basis for further analysis.
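
As an illustration of how such a co-occurrence matrix can be built, the following minimal sketch counts, for each pair of vocabulary words, the number of sentences in which both appear. The function name and the toy data are assumptions; the study derived its network with the Orange widget described in the next paragraph.

```python
from itertools import combinations


def cooccurrence(sentences, vocab):
    """Return a symmetric n x n matrix of sentence-level co-occurrence counts."""
    index = {word: i for i, word in enumerate(vocab)}
    n = len(vocab)
    matrix = [[0] * n for _ in range(n)]
    for tokens in sentences:
        # Each vocabulary word counts once per sentence.
        present = sorted({t for t in tokens if t in index}, key=index.get)
        for a, b in combinations(present, 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1   # undirected relation, so mirror the count
    return matrix


vocab = ["odor", "factory", "night"]
sentences = [["odor", "factory", "night"], ["odor", "factory"], ["night"]]
matrix = cooccurrence(sentences, vocab)
print(matrix)
```

The diagonal stays at zero because a word is not paired with itself, and the matrix is symmetric because co-occurrence carries no direction.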

In the present study, the semantic network of the corpus was created using the Orange widget Corpus to Network. We mined the top 160 noun, adjective, and verb keywords (with a frequency of 40 or more) according to a frequency analysis. Subsequently, the Orange widget Network Explorer was used to visualize the network graph (Fig. 4).

Fig. 4. Orange’s workflow for the semantic network analysis.

A one-mode symmetric matrix was derived. At this stage, the Pajek tool (Batagelj and Mrvar, 1998) was used to identify the structure of the links between words and to analyze centrality, thus quantifying the degree of relationships. Degree centrality was analyzed to identify the keywords at the center of the network; it indicates the number of nodes (keywords) to which a given node is directly connected, with high values indicating many connective relationships. Additionally, betweenness centrality was calculated to identify the degree to which a node lies between other network nodes. Keywords with greater betweenness centrality play an important role in facilitating connective flows within a network, thereby helping the network run seamlessly. For this reason, betweenness centrality is a useful metric for identifying central concepts in a corpus (Roche, 2011). Finally, the 160 keywords were manually clustered into the five DPSIR framework categories. In the Pajek tool, we created a random partition and selected five clusters and one-mode. Then, we assigned a cluster number to each vertex (1=driver; 2=pressure; 3=state; 4=impact; and 5=response) on the basis of the DPSIR framework. We finally drew the network and the corresponding partition, and observed the relationships among the five categories of the DPSIR framework.
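
The two centrality measures can be made concrete with a short sketch for an unweighted, undirected graph. This is not the Pajek implementation: it is a plain-Python version of degree counting and of Brandes' algorithm for betweenness, without normalization, and the function names and toy graph are assumptions.

```python
from collections import deque


def degree_centrality(adj):
    """Number of direct neighbors of each node."""
    return {v: len(neighbors) for v, neighbors in adj.items()}


def betweenness_centrality(adj):
    """Unnormalized shortest-path betweenness (Brandes' algorithm)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                          # nodes in order of discovery
        preds = {v: [] for v in adj}        # predecessors on shortest paths
        sigma = {v: 0 for v in adj}         # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                        # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):           # accumulate dependencies backwards
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


adj = {"odor": {"factory"}, "factory": {"odor", "night"}, "night": {"factory"}}
print(degree_centrality(adj))
print(betweenness_centrality(adj))
```

In this toy graph, "factory" is the only node lying on a shortest path between two others, so it is the only one with non-zero betweenness, which mirrors the bridging role attributed above to keywords with high betweenness centrality.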

3. Results and discussion

3.1. Stakeholders’ environmental complaints at the regional scale

We analyzed the differences in stakeholders’ claim topics among the public, citizen associations, local authorities, and companies, based on their self-classification through a drop-down menu in the online claim submission system “Contact Arpae” and on the entries in the “Environmental incident reporting portal” managed by environmental technicians of Arpae. The public represented the main stakeholder group, accounting for 84.0% of all registered complaints, followed by companies (9.0%) and local authorities, including municipalities, police departments, fire departments, and the forest service (5.0%). Finally, 2.0% of the environmental complaints were registered by citizen associations.

Table 3 shows stakeholders’ claim topic on environmental complaints.The public was more attentive to air pollution, which accounted for 32.8% of their registered complaints, followed by water and noise pollution,accounting for 17.5% and 16.8%, respectively.Companies demonstrated the most balanced focus, with 28.9% of their complaints focusing on water pollution, likely due to corporate requirements for sewage discharge standards,12.4% referring to the weather-climate, and 11.9% pertaining to information requests related to environmental authorizations.Local authorities showed the greatest concern for noise pollution (34.4%), reflecting the fact that,they were the reference authorities in the Emilia-Romagna Region for noise pollution complaints.Indeed, the upon receipt of a complaint, municipalities opened formal administrative proceedings for the alleged acoustic pollution and request a phono-metric investigation from Arpae.The same applied for electromagnetic radiation, which accounted for 12.5% of local authorities’ complaints.Local authorities also paid close attention to water (21.9%)and air (17.2%) pollution.Citizen associations represented people, groups, and organizations with environmental concerns in the local community.Although the proportion of environmental complaints from citizen associations was low, their grievances focused mainly on air (25.0%) and noise (20.0%) pollution, possibly reflecting local situations that require a response from Arpae.

Table 3 Stakeholders’ claim topics on environmental complaints.

3.2.Distribution of environmental complaints

The location and content of environmental complaints were analyzed to investigate spatial differences between stakeholder groups in their attention to different environmental complaints (Tables 4-7). With regard to the public, air pollution complaints were mostly associated with Modena Province, while noise and water pollution complaints were predominantly connected to Bologna and Parma provinces, respectively (Table 4). Companies located in Ravenna Province ranked water pollution as their first concern. In contrast, companies requesting data on the weather-climate or information related to environmental authorizations were mainly located in Bologna Province (Table 5). Most complaints forwarded by local authorities were from the eastern Emilia-Romagna Region, with Rimini, Forlì-Cesena, and Ravenna provinces submitting the most noise pollution complaints. Local authorities located in Rimini Province also gave the greatest attention to air and water pollution (Table 6). Citizen associations in Parma Province were most attentive to air pollution, while those in Modena Province expressed more grievances about odor. Of note, citizen associations were also concerned about electromagnetic radiation (Table 7).

Table 4 Regional distribution of environmental complaints raised by the public.

Table 5 Regional distribution of environmental complaints raised by companies.

Table 6 Regional distribution of environmental complaints raised by local authorities.

Table 7 Regional distribution of environmental complaints raised by citizen associations.

3.3.Keywords of environmental complaints

The TF-IDF values were calculated to extract the most important keywords associated with eight categories of environmental complaints (air pollution, water pollution, waste, noise pollution, electromagnetic radiation, soil, odor, and weather-climate and sea-coast), indicating the most recurrent user-selected claim topics. Table 8 shows the top ten keywords for each category. Keywords related to air pollution complaints included typical words such as “odor”, sensory words such as “strong” (smell) and “burnt plastic”, and words indicating probable pollution sources (“factory”). Among the keywords of water pollution complaints, the pollution receptor “canal” ranked first, followed by “foam”, “spillage”, and “stream”. The waste complaint keywords with the highest TF-IDF scores were “littering”, “asbestos”, “material”, and “fire”. Within noise pollution complaints, the most important keywords were “request”, “factory”, “check”, and “trouble”. The words “house”, “condominium”, and “bar” also scored highly, mainly reflecting neighbor noise. The most critical keywords within electromagnetic radiation complaints comprised “request”, “radiobase stations”, “monitoring”, and “measurement”. For soil complaints, “spreading manure” and “sludge” from “farm” emerged as the most relevant. The keywords of odor complaints were strongly associated with words related to the duration of the annoyance (“hours”, “days”, and “night”) and the suspected cause, such as “spreading manure” from a nearby “farm” and industrial complexes (as demonstrated by the keyword “factory”). Regarding the keywords of weather-climate and sea-coast complaints, citizens were most interested in receiving “data” (“rainfall data”).

Table 8 Top 10 keywords within each environmental complaint category according to term frequency-inverse document frequency (TF-IDF) score.
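The ranking step behind Table 8 can be illustrated with a minimal sketch. It assumes one common TF-IDF variant (relative term frequency multiplied by the natural logarithm of the inverse category frequency) and treats all complaints of a category as a single document; the weighting actually applied in the Orange workflow may differ. The toy corpus and the function names `tfidf_by_category` and `top_keywords` are hypothetical and are not taken from the Arpae dataset.

```python
import math
from collections import Counter

# Hypothetical toy corpus: category -> tokenized complaints (not Arpae data).
corpus = {
    "air": [["strong", "odor", "factory"], ["odor", "burnt", "plastic"]],
    "water": [["foam", "canal"], ["spillage", "canal", "odor"]],
    "noise": [["bar", "music", "night"], ["factory", "noise", "night"]],
}

def tfidf_by_category(corpus):
    """Return {category: {term: score}}, treating each category as one document."""
    counts = {cat: Counter(tok for doc in docs for tok in doc)
              for cat, docs in corpus.items()}
    n_cats = len(counts)
    # Document frequency: number of categories in which each term occurs.
    df = Counter(term for c in counts.values() for term in c)
    scores = {}
    for cat, c in counts.items():
        total = sum(c.values())
        scores[cat] = {term: (n / total) * math.log(n_cats / df[term])
                       for term, n in c.items()}
    return scores

def top_keywords(scores, category, k=10):
    """Top-k terms of a category; ties are broken alphabetically."""
    ranked = sorted(scores[category].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

scores = tfidf_by_category(corpus)
print(top_keywords(scores, "water", k=3))
```

In this toy corpus, “odor” occurs in two of the three categories, so its score in the air category falls below that of category-specific terms such as “strong”. This is the down-weighting of widespread terms that TF-IDF is designed to provide.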

3.4.Keywords of the DPSIR framework

Keywords with a TF-IDF value greater than 0.001 were manually grouped into the categories of driver, pressure, state, impact, and response for each environmental complaint. The following sections summarize the main environmental complaints on the basis of the keyword selection.
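The filtering and grouping step can be sketched as follows. Only the 0.001 threshold comes from the text; the scores are invented for illustration, and the lookup table `manual_labels` is a hypothetical stand-in for the manual assignment, which in the study was made by human judgment rather than by code.

```python
THRESHOLD = 0.001  # cut-off stated in the text

# Hypothetical TF-IDF scores for one complaint category (illustrative values).
scores = {
    "factory": 0.021, "smoke": 0.012, "burnt plastic": 0.009,
    "headache": 0.004, "inspection": 0.003, "street": 0.002, "the": 0.0002,
}

# Hand-written lookup standing in for the manual DPSIR assignment.
manual_labels = {
    "factory": "driver",
    "smoke": "pressure",
    "burnt plastic": "state",
    "headache": "impact",
    "inspection": "response",
}

def group_by_dpsir(scores, labels, threshold=THRESHOLD):
    """Keep keywords above the threshold and group them by DPSIR category."""
    categories = ("driver", "pressure", "state", "impact", "response", "unlabeled")
    groups = {name: [] for name in categories}
    for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if score > threshold:
            # Keywords without a manual label are set aside for later review.
            groups[labels.get(term, "unlabeled")].append((term, score))
    return groups

groups = group_by_dpsir(scores, manual_labels)
for name, members in groups.items():
    print(name, members)
```

Keeping an explicit “unlabeled” group makes the subjective part of the procedure visible: every keyword that passes the threshold but has not been assigned by hand is listed rather than silently dropped.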

3.4.1.Air pollution and odor keywords

Economic activities such as factories were perceived as the most important causes of air pollution and odor. The distribution of air pollution complaints was closely related to population and specific industrial sectors. For example, the words “foundry cooperatives” and “ceramics” were most associated with Modena Province, which has ceramic districts within its eight municipalities. According to the environmental complaints, the agricultural sector was another important driver of air pollution. In particular, the “farm” sector (“cattle” and “dairy farm”) was considered a key contributor to odor. The word “farm” occurred in complaints from Ferrara, Forlì-Cesena, Ravenna, Modena, Parma, and Rimini provinces, in decreasing order of relevance, which aligns with the widespread production of animal products (e.g., Parma ham, Bologna mortadella, and Parmigiano Reggiano cheese) in these provinces. Furthermore, complainants considered “incinerators” and “biogas plant” to be causally linked to air pollution. In general, individuals living close to foundries, ceramic factories, farms, incinerators, and biogas plants complained about air pollution and odor. Pollutants induced changes to the environment and resulted from the operations of industry (“smoke”, “fumes”, “emissions”, and “dusts”), human activities (biomass “burning”), or agricultural practices (“spreading slurries” and “spreading manure”). Alterations in air quality (“burnt plastic”, “acrid smell”, “bitumen smell”, “chemical smell”, “ammonia smell”, and “unbreathable air”) were considered the environmental triggers of “sore throat”, “headache”, “eye irritation”, and “respiratory problems”. Indeed, unpleasant odor could be a warning sign of potential risks to human health. Air pollution and odor also affected the quality of life, leading to “not sleeping” and “awakening”. As temperatures rose in summer, odor complaints became more frequent, with individuals complaining of not being able to close their windows and having to avoid sitting outside. Furthermore, odor frequency (“hours”, “days”, “night”, and “evening”) was an important predictor of odor (Table 9).

Table 9 Keywords and TF-IDF scores of air pollution and odor in the DPSIR framework.

3.4.2.Water pollution keywords

“Factory” and “farm” were considered the main economic drivers of water pollution. The formation of “foam” in freshwater ecosystems such as “canal” and “river” was reported several times. Complainants often referred to land-spreading activities associated with “spreading slurries” and to “gas oil spillage” as sources of human pressure on water quality. Leaks or “spillage” could also emerge from machinery inside factories, or from breaks in containers and pipes holding dangerous liquids and gases (“gas oil spilling”). Oil slicks eventually moved toward the shore, harming aquatic life (causing the death of “fish”) and damaging recreational areas. High quantities of nutrients in water from industrial crop fertilizers and animal waste might cause eutrophication and “algae” blooms that deplete oxygen levels in water and kill marine life. Among their cited impacts, complainants also reported “strong smell” and “nauseating smell” (Table 10).

Table 10 Keywords and TF-IDF scores of water pollution in the DPSIR framework.

3.4.3.Waste keywords

“Littering”, “asbestos”, “plastic”, “abandoned rubbers”, and “demolition waste” were the most frequent triggers of waste complaints. Similar to the water pollution complaints, waste complaints referred to altered ecosystems (“river” and “soil”), particularly in the event of “spillage”. A “bad smell” and perceived “danger” emerged as the main reasons for waste complaints (Table 11).

Table 11 Keywords and TF-IDF scores of waste in the DPSIR framework.

3.4.4.Noise pollution keywords

As expected, the major drivers of noise pollution were commercial establishments (“bar”, “hotel”, “restaurant”, and “supermarket”), as well as store animals (“barking of dogs”). Noise caused by loud “music” could be linked to commercial establishments, parks or playgrounds, streets or sidewalks, residences, and religious establishments (“bells”). Noise pollution associated with street traffic (“motorway”), transport (“train”), and construction (“yard”) presented annoyances and, at worst, impacted the quality of life. Heating, ventilation, and “air conditioning system” were reported as drivers of noise pollution. Noise, especially when persistent or continuous throughout the day, could be distressing and reduce the quality of life (“trouble”, “discomfort”, and impact on “rest”) (Table 12).

Table 12 Keywords and TF-IDF scores of noise pollution in the DPSIR framework.

3.4.5.Electromagnetic radiation keywords

Sources of electromagnetic radiation (“radiobase stations”, “mobile phone”, “antenna”, “cell tower”, and “cab”) were thought to expose local residents to low but continuous levels of electromagnetic radiation (Table 13). Complainants were often concerned about the health effects of this exposure. Notably, within the entire dataset, the word “health” was reported in 48 complaints, and more than 30.0% of these related to electromagnetic radiation. The concerns expressed in these complaints reflected a high risk perception linked to this issue.

Table 13 Keywords and associated TF-IDF scores of electromagnetic radiation in the DPSIR framework.

3.4.6.Soil keywords

The soil keywords showed many similarities with the water pollution keywords, except for the TF-IDF score ranking of some words. In particular, complainants attributed the main soil contamination pressures to “spreading slurries”, “spreading manure”, and “zootechnical effluents” (Table 14).

Table 14 Keywords and associated TF-IDF scores of soil in the DPSIR framework.

3.4.7.Keywords of weather-climate and sea-coast

With regard to weather-climate and sea-coast issues, the messages received by Arpae were not environmental complaints, but requests for “information” and “data”. Indeed, citizens mainly asked for data to support their personal “study”, “research project”, “analysis”, and “insurance”. Among the impacts, we classified “modeling elaborations”, with regard to environmental impact assessments supporting ecosystem services (Table 15).

Table 15 Keywords and associated TF-IDF scores of weather-climate and sea-coast in the DPSIR framework.

With regard to responses, complainants requested typical regulatory approaches in response to all environmental issues (“inspection”, “check”, and “controls”) or self-reporting to ensure compliance with regulations, with violations subject to penalties (fines and compensation payments). Additionally, some complainants suggested management actions such as the direct regulation of “waste disposal” and waste recovery and transport, or state-based responses such as the “remediation” of contaminated waters (e.g., rivers and aquifers), even if such measures were long-term and expensive and, in some cases, not even feasible.

3.5.Semantic network analysis of environmental complaints

Table 16 ranks the 30 most salient keywords according to both degree and betweenness centrality. The keywords “odor”, “report”, “request”, “presence”, “municipality”, and “hours” were the most influential and meaningful concepts, as demonstrated by their high degree and betweenness centrality values. Thus, complainants were primarily concerned with odor, for which they sought a specific response (they made a “request” or submitted a “report”). The word “request” was extremely relevant to Arpae’s CRM, which aimed at responding quickly, succinctly, and accurately to complaints and requests for general information about policies, practices, and procedures. The word “municipality” represented another central hub, indicating two different semantic concepts: the first referring to the geographical content of the complaints (often very precise and local); and the second referring to the local authorities responsible for managing requests and complaints. In some cases, Arpae was not the competent authority to handle a complaint, and it instead delegated responsibility to different authorities or local municipalities. Duration or occurrence, in terms of the complaint topic, may be understood as a proxy for the magnitude of the concern raised (e.g., the keyword “hours” may have referred to the “presence” of a persistent or recurring compliance problem). Keywords such as “strong” and “odors” represented local hubs, with relatively high degree centrality but low betweenness centrality. This suggested that these words were primarily related to odor and air pollution complaints. Finally, bridging concepts, with relatively high betweenness centrality but low degree centrality, included “site” and “data”. The keyword “site” referred to both the location of the event and the Arpae website, where users searched for data or information regarding environmental topics. Furthermore, the keyword “data” was connected with many environmental issues (air and water quality, weather forecasts, rainfall, monitoring stations, etc.). The semantic network analysis provided a visual and quantitative method of representing and analyzing the hybrid connections among environmental driver, pressure, state, impact, and response. Indeed, environmental processes were visually interconnected, and the whole set of interactions was more diffuse than hierarchical (Cañas et al., 2012).

Table 16 Most salient keywords according to degree and betweenness centrality.
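The distinction between local hubs and bridging concepts can be made concrete with a small sketch of the two centrality measures. It assumes an unweighted, undirected keyword network and uses Brandes’ algorithm for betweenness; the six-node graph is a hypothetical toy example that borrows a few keywords from the text, and the values it yields are unrelated to those in Table 16.

```python
from collections import deque

def build_graph(edges):
    """Adjacency sets for an undirected graph given as (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def degree_centrality(graph):
    """Share of the other nodes to which each node is directly linked."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def betweenness_centrality(graph):
    """Unnormalized betweenness (Brandes' algorithm, unweighted graph)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths from s
        dist = dict.fromkeys(graph, -1)   # -1 marks an unvisited node
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                      # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:                      # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected shortest path was counted from both of its endpoints.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical toy network (not the network analyzed in the study).
edges = [("odor", "strong"), ("odor", "factory"), ("strong", "factory"),
         ("odor", "site"), ("site", "data"), ("data", "rainfall")]
graph = build_graph(edges)
degree = degree_centrality(graph)
betweenness = betweenness_centrality(graph)
for word in sorted(graph, key=lambda w: -betweenness[w]):
    print(f"{word:10s} degree={degree[word]:.2f} betweenness={betweenness[word]:.1f}")
```

In this toy network, “strong” and “site” have the same degree centrality, but only “site” lies on shortest paths between otherwise separate groups of words. The first behaves as a local node within the odor cluster and the second as a bridge, which mirrors the interpretation given above.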

Table 17 reports the top 30 interrelationships between the environmental complaint keywords in the DPSIR framework. Keywords connecting odor (classified as impacts) and air pollution (classified as state) were the most important (such as “odor-burnt plastic” and “odor-acrid”). Complainants perceived odor annoyance as a primary environmental concern, possibly related to two main drivers: “odor-factory” and “odors-farms”. In addition, odor was related to pressure arising from industrial activities (“odor-smoke”) and agricultural practices (“odor-spreading manure” and “odor-spreading slurries”). Notably, grievances referring to an odor smelling like “burnt plastic” might be related to specific productive sectors that can produce olfactory annoyance, such as ceramics with digital printing systems. Indeed, olfactometric studies performed by Arpae revealed that colors applied using an organic carrier turn, during the subsequent firing, into substances characterized by a low olfactory threshold, which are released through the ovens’ atmospheric emissions, often causing odor issues (“burnt plastic” and “acrid”) in the surrounding area. For the Emilia-Romagna Region, this manufacturing sector represented a key economic sector but, at the same time, might contribute to environmental pressures, particularly aldehyde emissions, characterized by a high odorous charge (Capuano et al., 2018). Water complaints emphasized interconnections between the biotic state of the environment as a living habitat and pressures (“water-discharge”, “stream-foam”, “canal-spillage”, and “canal-foam”). The main water-related impacts on human well-being were identified as odor complaints: “water-smelly”, “smelly-fumes”, “foam-odor”, and “canal-odor”. Noise pollution complainants primarily requested intervention strategies to be executed by local authorities (“municipality-noise”), aimed at mitigating or eliminating the reported issue (“report-trouble”).

Table 17 Top 30 interrelationships between environmental complaint keywords in the DPSIR framework.
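Keyword pairs of the kind listed in Table 17 can be obtained by counting how often two keywords appear in the same complaint. The sketch below assumes this simple within-complaint co-occurrence definition, which may differ from the window or weighting used in the study; the four tokenized complaints are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical complaints, already reduced to their keywords.
complaints = [
    ["odor", "burnt plastic", "factory"],
    ["odor", "burnt plastic", "night"],
    ["canal", "foam", "odor"],
    ["canal", "foam"],
]

def cooccurrence_counts(docs):
    """Count unordered keyword pairs that appear in the same complaint."""
    pairs = Counter()
    for doc in docs:
        # set() counts a pair once per complaint; sorted() fixes the pair order.
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs

pairs = cooccurrence_counts(complaints)
for (a, b), n in pairs.most_common(3):
    print(f"{a}-{b}: {n}")
```

The resulting pair counts can serve as edge weights of the semantic network, so that the strongest edges correspond to the most frequent interrelationships.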

4.Theoretical and practical implications

The results of the present study have several theoretical and practical implications. In terms of the theoretical implications, the proposed methodology can be adopted to improve environmental management and engage citizens in policy-related activities.

First, keyword extraction applied to environmental complaints by TF-IDF helps to summarize the content of texts and recognize the main topics discussed. This is of relevance for public administration in the framework of the CRM.

Specifically, keyword extraction based on TF-IDF might provide an accurate entry point for the identification of proactive measures. For example, the present text mining revealed multiple reasons for air pollution or odor complaints, including facility emissions, waste management, livestock processing, and agricultural practices. Thus, local authorities responsible for regulating the impact of odor emissions (as well as Arpae) might define certain odor mitigation measures at the planning stage, including: (i) site assessment and building design for odor control; (ii) stock density planning; and (iii) the selection of different farms or fields at a geographical distance, to prevent the same receptors from being constantly affected and mitigate cumulative odor impacts from the routine spreading of farm waste. For these reasons, a map visualizing the location of complaints together with data collected by local authorities on a regular basis might provide novel insights at a lower cost and with greater accuracy than new data collection approaches.

Furthermore, semantic network analysis might contribute to the identification of key nodes with a large number of connecting arcs, representing useful indicators that are likely to bear on a large number of issues (Niemeijer and de Groot, 2008). Practically, the identification of central hub keywords in a semantic network might support the selection and investigation of a set of keywords related to decision-making processes in the DPSIR framework. For example, the present analysis revealed that the central hub keyword “odor” was associated with the spreading of manure or slurries. Thus, relevant environmental complaints might relate to farming practices and management, including the use, production, and withdrawal or import of manure, as well as emissions indicators (ammonia, methane, and nitrous oxide) (FAO, 2003; EEA, 2005b; OECD, 2006). Furthermore, because the DPSIR framework hints at the dynamics of the system, it can provide the conceptual foundation for the development of mathematical simulation models for forecasting the effects of alternative decision-making scenarios on the long-term sustainability and health of communities (Salem et al., 2021; Roy et al., 2022; Roy et al., 2023). Such analyses may support the development of a more integrated environmental knowledge base, as a prerequisite for a wider exploitation of analytics for relevant decision-making processes and management activities.

Second, this study is the first to integrate a validated systems-thinking framework (the DPSIR framework) with environmental complaint data obtained by using text mining. Computational models for complex systems are inherently more complicated than a conceptual framework, but the DPSIR framework can serve to highlight key variables and relationships, for which parameter estimates and functions will be needed, and to identify areas where existing models may be appropriate or where new models are needed (Malmir et al., 2021). Addressing environmental challenges requires a process of problem structuring, to transform unstructured problems into ones that can be effectively addressed with sound evidence about ecological and social system structure and function (Lewison et al., 2016). This process also requires information about how citizens perceive and define the issues. By helping to structure the analysis of complex systems, the DPSIR framework can act as a heuristic tool for complex system analysis. Theoretically, framing the environmental complaints within the DPSIR framework encourages decision-makers to adopt a system approach and think about challenges to the problem within the larger system, even if in some cases the relationship among driver, pressure, state, impact, and response might appear obvious.

An example of a decision-making process associated with complaint handling refers to foundry cooperatives in Modena Province. As previously reported, despite regular compliance monitoring, many complaints regarding air pollution and odor issues in the present dataset revealed a relationship between these economic drivers (foundry cooperatives) and increased environmental pressures (emissions and fumes). From 2017 to 2022, citizens of Modena Province took several courses of action to find an effective remedy for their pollution-related complaints, including petitioning enforcement authorities and higher levels of government, appealing to the media, and engaging in collective action (demonstrations). Arpae took the frequency, duration, intensity, and hedonic tone of odor complaints into account, applying dispersion modelling of odor emission rates to determine the number of hours of odor exposure to receptors in one year. Following the failure of other enforcement tools, in March 2022, Arpae decided to revoke the permits of foundry cooperatives in Modena Province, thus definitively closing the installations (Arpae Emilia-Romagna, 2022). This case provides a clear illustration of public involvement in decision-making processes.

However, the proposed framework is unable to identify all possible interactions among variables, as anthropogenic and natural systems are extremely complex and can hardly be represented by a set of keywords and indicators. Moreover, the categorization of keywords is a subjective process that has not yet been codified, so the keywords selected to represent environmental variables may differ from those determined by other methodologies. Furthermore, the DPSIR framework has biases toward the perspectives and concerns of several stakeholder groups, and social and economic aspects (Svarstad et al., 2008). It has been suggested to couple the DPSIR framework with analytical methods like multiple-criteria decision-making (MCDM) (Malekmohammadi and Jahanishakib, 2017), Analytic Hierarchy Process (AHP) (Neshat et al., 2014; Sun et al., 2016; Liu et al., 2018), and the structural equation model (Sun et al., 2018).

Nonetheless, the proposed approaches (methods and tools) to environmental issue monitoring and decision-making support can be used by different organizations (e.g., the public, local authorities, companies, and citizen associations) in different contexts, with minimal human effort, through the customization of application dictionaries and domain-specific keywords and topic clusters.

5.Conclusions and future research

The present study established a framework for the text mining of Italian environmental complaints and analyzed one year of environmental complaints recorded by Arpae in the Emilia-Romagna Region. The DPSIR framework was applied as a conceptual model to cluster the main keywords describing environmental complaints, using a “bottom-up” approach (from citizens to policy-makers and decision-makers). The proposed approach recognized that local communities (represented by municipalities) are critical to sustainable environmental management measures, as they determine what and when they will adjust. The present results showed that the majority of complaints were made by citizens and associated with air pollution and odor. The important drivers of environmental complaints included factories (particularly foundries and ceramic industries) and farms, which mainly affect human well-being. Inspections, controls, and management actions represented the main response activities requested by complainants.

Despite the significance of the findings, a limitation of the research is that the data cannot necessarily represent the environmental concerns of citizens who are not active in filing complaints to Arpae. Furthermore, demographic variables were not included to control for bias in the geographical distribution of complaints, and the study analyzed data for only one year. Consequently, changes in the occurrence of complaints over time could not be addressed. For this reason, care should be taken when generalizing the results, as the data might not be representative of all environmental complaints within the Emilia-Romagna Region or in Italy more broadly. Additionally, environmental complaints may be influenced by many factors (complainants’ education quality, digitalization, geographical location, age, gender, etc.).

Future work should aim at exploring how stakeholder perceptions change in particular communities, as strategies are incorporated to increase awareness and address environmental public health concerns. Over time, a greater understanding of environmental public health issues (such as those discussed in this paper) might improve communication to key segments of the public. The DPSIR framework might further allow Arpae to refine and focus its communication strategies and information dissemination to reach out to stakeholders who may be particularly impacted by environmental management.

The development of mathematical simulation models requires integration and coordination across several disciplines including social, ecological, and health sciences. When fully parameterized, the DPSIR framework can support economic cost-benefit analysis of various responses (management actions) and the value of the impacts (ecosystem goods and services).

As text mining tools mature, they are being combined with structured data analysis and reporting to better integrate into decision support processes and systems. Advanced analytics tools and well-designed platforms that accept requests through multiple channels can now rapidly identify and visualize public concerns at a global scale, and thereby address unknown, unappreciated, or underestimated drivers, pressures, and impacts.

Future implementations of Arpae’s complaint-handling process could be based on classification rules trained on the complaint keywords revealed by text mining, to realize the automation (or partial automation) of the complaint process. The goal is to train a machine learning model that provides automated support for finding recurring patterns in the environmental complaints, thus providing a human employee with a starting point for ticket (claim) tagging, by classifying a record as a request for information, data, complaints, disservices, or environmental incidents, each of which should be handled separately and individually. Machine learning algorithms can be used to automatically categorize and prioritize environmental complaints based on their content and other factors. This can help to reduce the workload of URP operators and ensure that tickets are handled more efficiently. Furthermore, machine learning algorithms might generate automated responses to environmental complaints based on the content of the ticket. This can help to reduce response times and improve the overall customer service experience.
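As an illustration of what such ticket tagging might look like, the sketch below trains a multinomial naive Bayes classifier with add-one smoothing on keyword lists. The choice of algorithm, the class name `NaiveBayesTagger`, the two labels, and the four training tickets are all assumptions made for this example; the study does not specify a model, and a production system would require far more labelled data and proper evaluation.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesTagger:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_docs = Counter()              # tickets seen per label
        self.term_counts = defaultdict(Counter)  # label -> keyword counts
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_docs[label] += 1
            self.term_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_docs.items():
            total = sum(self.term_counts[label].values())
            score = math.log(n / n_docs)  # log prior of the label
            for tok in tokens:
                if tok in self.vocab:     # keywords never seen in training are skipped
                    count = self.term_counts[label][tok]
                    score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical labelled tickets (keywords only, not Arpae data).
train_docs = [
    ["strong", "odor", "factory", "night"],
    ["foam", "canal", "spillage"],
    ["request", "rainfall", "data", "study"],
    ["request", "data", "monitoring", "station"],
]
train_labels = ["complaint", "complaint",
                "information request", "information request"]

tagger = NaiveBayesTagger().fit(train_docs, train_labels)
print(tagger.predict(["odor", "canal"]))
print(tagger.predict(["rainfall", "data", "request"]))
```

A tagger of this kind would only propose a label; in line with the text, the proposal is meant as a starting point that a human operator confirms or corrects.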

To this end, the present research could be used as a valuable historical reference case. Future research could also explore topic-based sentiment analysis to monitor public opinion, study related discourse, and understand the impact of environmental issues on human health and well-being. Moreover, inspired by the literature on domain-specific aspect-based analysis (Distante et al., 2021; Faralli et al., 2021), sentiment analysis could be further developed by leveraging users’ emotions, as abstracted from their claims. This could automatically distinguish complaints that express emotions such as fear or anger from those that are emotionally neutral. In conclusion, the present study represents a preliminary step toward a better understanding of environmental complaints. Arpae is interested in exploring this line of research, with the aim of enhancing citizen participation in decision-making and promoting sustainable development.

Finally, in order to make the research replicable and reproducible, the developed Orange projects and some project data are available at https://github.com/fabimanservisi/ORANGE-ARPAE-/.

Authorship contribution statement

Fabiana MANSERVISI, Damiano DISTANTE, and Stefano FARALLI: conceptualization, methodology, writing - original draft, and editing. Michele BANZI, Giuseppe BORTONE, Susanna RICCI, Tomaso TONELLI, and Paolo VERONESI: conceptualization, methodology, writing - review & editing.

Declaration of competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.