
Practice of marine data sharing: a case study of online sharing for marine data in private network environment

2023-02-20

Marine Science Bulletin, 2023, Issue 2

CAO Shengwen, SONG Lili, JIANG Xiaoyi, YUE Xinyang*

1. Technology Innovation Center of Marine Information, MNR, Tianjin 300171, China;

2. National Marine Data and Information Service, Tianjin 300171, China

Abstract: Marine data is characterized by its wide variety and large volume. A large part of the data has not yet been shared because classification and grading standards for marine data have not been well established. For non-public data, how to make it available to users without allowing it to be downloaded has become one of the key issues of marine data-sharing services. This paper reports the practice of a one-stop online sharing service for marine data, which provides online use, downloading, and visualization of data. Virtual terminals are used to build a data application platform for users, which provides an initial solution to the conflict between data confidentiality and data sharing. Finally, some suggestions are made regarding the current state of data sharing in the marine field.

Keywords: marine data, open access, online sharing, virtualization

1 Introduction

Marine science dates back to the 1840s. It is a knowledge system that studies the natural phenomena, properties and changes of the ocean, as well as matters related to the development and exploitation of the ocean. Marine science is an important part of earth science. Its research object is the ocean, which covers 71% of the earth's surface, including seawater, dissolved and suspended substances in seawater, living organisms in the sea, seabed sediments and the seabed lithosphere, as well as the atmospheric boundary layer above the sea surface and the estuarine coastal zone. The research fields of marine science include basic research on physical, chemical, biological and geological processes in the ocean, as well as applied research on the development and exploitation of marine resources and on military activities at sea. With the development of modern technology, marine science has gradually developed from the traditional category of physical geography into an independent discipline, forming a comprehensive modern marine science system consisting of many disciplines including physical oceanography, biological oceanography, marine geology, etc. [1].

Since its emergence, marine science has made continuous progress and breakthroughs in basic theories, research methods, observation systems, scientific research and other aspects [2], and its branches and disciplines have interpenetrated and become deeply integrated. The Internet of Things, cloud computing, big data, network communications, satellite remote sensing and other technologies have promoted the in-depth development of three-dimensional ocean observation, data analysis and mining, and other fields. The marine field has entered the era of big data. Comprehensive, continuous, multi-source and three-dimensional observation has brought marine data resources to the EB level [2], with a daily increase at the TB level, and marine data has the typical 5V characteristics (Volume, Velocity, Variety, Value, Veracity) of big data. In the era of big data, it is important to promote the open sharing of scientific data in order to strengthen the utilization and enhance the application value of these data [3]. Marine data contains great value and provides unprecedentedly rich information for humans to perceive, understand and control the physical world more deeply [4]. Marine data is an important national strategic resource, and promoting its sharing is of great significance for humans to care about, understand and manage the ocean.

Marine data comes in a wide variety and in large volume. However, due to the lack of classification and grading standards for marine data, a large part of the data has not been shared [5], which makes it difficult for ocean-related organizations, scientific research institutes and enterprises to make full use of existing data in scientific research and business. For public marine data, there are many data-sharing platforms that provide data-sharing services [6-16]. For non-public data, how to make it available to users without allowing it to be downloaded has become one of the key issues of marine data-sharing services. As a marine data management center, the National Marine Data and Information Service has shared the marine data it collects with ocean-related institutes, to maximize the value of the data and improve the exchange and service capacity of marine data.

2 Data planning and management

2.1 Data planning and classification

The online sharing service of marine data practiced in this paper mainly includes marine environmental data such as marine surveys, operational observation and monitoring, international operational marine data, international cooperation and exchange data, basic geographic and remote sensing data products, analysis and forecast products, marine management data, atlas reports and information service interfaces. Among them, the marine survey data are based on the comprehensive survey and evaluation of China's offshore areas (908 project) and cover 9 disciplines: hydrology, meteorology, biology, chemistry, optics, geology (bottom material), topography, geophysics, and water remote sensing. Operational observation data include station, buoy, radar and cross-section survey data. International operational marine data include GTSPP, WOD, GTS, IMMA, DBCP, etc. International cooperation and exchange data include 14 survey projects, such as the atmospheric survey, the China-USA air-sea interaction survey, and the China-USA joint survey of the Yangtze River Estuary. The analysis and forecast products include real-situation analysis products, reanalysis products and statistical analysis products for the western Pacific region, with grid resolutions of 0.125° and 0.5°, and the elements cover temperature, salinity, density, sound velocity, geostrophic current, etc.

2.2 Data organization and database table design

In this paper, the above data are managed at data set granularity, and the data organization mode is 'metadata + data list + entity file' or 'metadata + data list + entity file + element information'. The level of data organization depends on the type of data and the requirements of users. If the entity data files of a data set are in a standardized format, such as NC format, and there is a need for splitting or visualization, the entity data files can be further parsed and stored in the corresponding element table.

The division of data sets is flexible. Generally, the same or similar data can be classified into the same data set according to the specific acquisition means, disciplines, or elements. For example, the observation data from 14 public stations in China can be classified as one data set. If the number of data files of the same category is large and the data set is large, the data set can be split by time, such as the seawater temperature data set (2010-2019) and the seawater temperature data set (2000-2009).

The metadata table records the information of all data sets loaded by the platform, including the list table name, data size, list number, data description, data format description, data time, corresponding element table name, creation time, update time, copyright and other information corresponding to the data set. The list table records the list information of all files in a data set, including file name, file size, update frequency, update time, file format and file path. In principle, one list record corresponds to one entity file, and the information in the list table varies slightly according to the type of data set. For example, the list table of the Chinese station observation data set includes the station code to distinguish different stations. The element table stores information that is read and parsed from the entity files of a data set and that is needed separately for display, so that the front-end application of the platform can call it directly, reducing data parsing time and improving user experience. The relationship between the metadata table, data list table, element information table, and entity file is shown in Fig.1.
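The 'metadata + data list + entity file' organization described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely for illustration; the paper does not state which database is used, and every table name, column name, file path and sample record below is a hypothetical assumption rather than the platform's actual design.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative assumptions only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metadata (
    dataset_id    INTEGER PRIMARY KEY,
    dataset_name  TEXT NOT NULL UNIQUE,
    list_table    TEXT NOT NULL,  -- name of the list table of this data set
    element_table TEXT,           -- NULL if entity files are not parsed
    data_format   TEXT,
    data_time     TEXT
);
CREATE TABLE station_obs_list (
    file_id      INTEGER PRIMARY KEY,
    dataset_id   INTEGER NOT NULL REFERENCES metadata(dataset_id),
    station_code TEXT NOT NULL,   -- extra column specific to station data
    file_name    TEXT NOT NULL,
    file_size    INTEGER,
    file_path    TEXT NOT NULL    -- one list record points to one entity file
);
""")

# Made-up sample records for demonstration.
conn.execute(
    "INSERT INTO metadata VALUES "
    "(1, 'station observations', 'station_obs_list', NULL, 'txt', '2010-2019')"
)
conn.executemany(
    "INSERT INTO station_obs_list VALUES (?, 1, ?, ?, ?, ?)",
    [
        (1, "ST01", "st01_2010.txt", 2048, "/data/station/st01_2010.txt"),
        (2, "ST02", "st02_2010.txt", 4096, "/data/station/st02_2010.txt"),
    ],
)


def entity_files(conn, dataset_name, station_code=None):
    """Resolve a data set to its entity file paths via metadata -> list table."""
    row = conn.execute(
        "SELECT dataset_id, list_table FROM metadata WHERE dataset_name = ?",
        (dataset_name,),
    ).fetchone()
    if row is None:
        return []
    dataset_id, list_table = row
    # Table names cannot be bound as parameters, so validate before formatting.
    if not list_table.isidentifier():
        raise ValueError(f"unsafe list table name: {list_table!r}")
    sql = f"SELECT file_path FROM {list_table} WHERE dataset_id = ?"
    params = [dataset_id]
    if station_code is not None:
        sql += " AND station_code = ?"
        params.append(station_code)
    sql += " ORDER BY file_name"
    return [path for (path,) in conn.execute(sql, params)]
```

In this sketch the metadata record names the list table of its data set, so a lookup goes from the data set to its list records and from there to the entity file paths. An element table would be attached in the same way through the `element_table` column.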

Fig.1 Data organization and database table design diagram

3 System function and architecture

3.1 Service architecture design

The marine data online sharing service system mainly consists of three parts: the portal service system, the approval management system, and the data download and upload tools, as shown in Fig.2. The portal service system mainly includes data retrieval, data collection, data order, data export, virtual machine application and other functions. All operations related to data applications are performed in the portal service system. The approval management system is mainly responsible for the approval of user-related applications, including user applications, data applications, data preparation, virtual machine applications, and data export approval. The approval results can be viewed through the portal system. The download and upload tools are mainly used in virtual machines. They download to the virtual machine the data that users have applied to use online and the local files that users have uploaded, and they upload the results that users want to export, which then await approval.

Fig.2 Function module diagram of marine data online sharing service system

The online sharing service system uses virtual terminals as a bridge to build a data usage platform for users. The virtual machine is similar to an ordinary PC, but it does not allow users to copy data to their local machine, in order to prevent illegal data transmission. When using the virtual machine, users need to import and export data, that is, to export requested data to their local machine and to upload their personal data to the virtual machine. All application operations of users are carried out in the portal service system, including data application and data export functions. Since data applications and result export applications involve system approval, an intermediate machine is needed for data transit for security reasons. Fig.3 shows the relationship between the virtual machine, intermediate machine and portal system.

Fig.3 Schematic diagram of the relationship between virtual machine, intermediate machine and portal system

Fig.4 The data flow diagram of data export

Fig.5 The data flow diagram of user data upload

3.2 Service function design

The marine data online sharing service system provides catalog services, data sharing, data collection and data order submission, virtual machine online use and other services. Submitting a data order involves filling in and submitting the corresponding application approval forms online, which facilitates review in the approval management system, avoids the mailing of data applications that was previously required, and greatly reduces approval time.

3.2.1 Catalog service

The data catalog service organizes the metadata of the above data products into catalog lists according to the standards, providing a metadata catalog service for data exchange and helping users to discover, retrieve and locate spatial data. The catalog service is mainly based on the metadata and the data set list. It organizes data resources into data and product catalogs according to unified standard rules, which makes it easier for users to locate, query and download data and products, and it supports the dynamic configuration of catalogs. The catalog service mainly provides the registration, query, maintenance, release and navigation of metadata, user management and logging, as well as ocean data extraction and other functional services.

3.2.2 Data sharing

Data-sharing methods include data retrieval, online download, online use, point-to-point distribution and visual display. Data retrieval provides query and retrieval conditions suited to the characteristics of marine data, including arbitrary query and fuzzy matching query, and provides data details and an online preview function. Online download and online use are the sharing methods for public data and non-public data, respectively. Public data can be downloaded directly to the user's local machine, while non-public data must be used in virtual machines. Point-to-point distribution is the distribution and sharing function for users with special needs for specific data, such as regular distribution of operational data. According to the types of marine data and products and the attributes of subject elements, visual display provides statistical analysis, chart visualization, multi-dimensional data visualization and other services to visually present the characteristics of marine data resources.
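The rule that public data is shared by online download and non-public data by online use can be written as a small decision function. This is only an illustrative sketch: the class names, field names and data set names are hypothetical, and the paper does not describe how the platform records whether a data set is public.

```python
from dataclasses import dataclass
from enum import Enum


class SharingMode(Enum):
    ONLINE_DOWNLOAD = "online download"
    ONLINE_USE = "online use in a virtual machine"


@dataclass(frozen=True)
class Dataset:
    # Hypothetical record; the real platform's fields are not given in the paper.
    name: str
    is_public: bool


def sharing_mode(dataset: Dataset) -> SharingMode:
    """Public data may be downloaded; non-public data stays in the virtual machine."""
    if dataset.is_public:
        return SharingMode.ONLINE_DOWNLOAD
    return SharingMode.ONLINE_USE
```

For example, `sharing_mode(Dataset("internal survey data", is_public=False))` yields `SharingMode.ONLINE_USE`, so such data would only be offered through a virtual machine application.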

3.2.3 Collections and orders

The data order module provides users with application and approval services for the online use of internal data and the downloading of public data, and manages the resulting application orders. Users select the data they require, apply for online use or download of the data they are interested in, and then go to the data order module.

The data order module is similar to the 'Purchased Product' function of Taobao. It allows users to view the status and details of the orders they have submitted, including the details of the data under each order, whether the order has been processed and the data approved, and the deadline for data usage. Users can also cancel orders that have been submitted but not yet processed.
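The cancellation rule above (an order can be cancelled only while it is submitted and not yet processed) can be sketched as follows. The status names, the `Order` class and the `cancel` function are hypothetical illustrations, not the platform's actual data model.

```python
from dataclasses import dataclass
from enum import Enum


class OrderStatus(Enum):
    # Assumed status set; the paper only distinguishes processed/unprocessed.
    SUBMITTED = "submitted, not yet processed"
    PROCESSED = "processed"
    CANCELLED = "cancelled"


@dataclass
class Order:
    order_id: int
    status: OrderStatus = OrderStatus.SUBMITTED


def cancel(order: Order) -> None:
    """Cancel an order, which is allowed only before it has been processed."""
    if order.status is not OrderStatus.SUBMITTED:
        raise ValueError(
            f"order {order.order_id} cannot be cancelled: {order.status.value}"
        )
    order.status = OrderStatus.CANCELLED
```

Raising an error for processed or already cancelled orders keeps the rule explicit; a real system might instead hide the cancel action in the portal for such orders.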

3.2.4 Virtual machine

The virtual machine is a virtual terminal for online data usage. It adopts a virtualization security policy to provide users with an access portal to the virtual working environment: illegal user connections are prevented, data interaction between the local and virtual working environments is prohibited, and illegal data transmission is prevented. Users can upload their own data to the virtual machine, or export requested data to their local machine after approval. Data that users request to export from the virtual machine must be uploaded to the intermediate machine through the upload and download tool (transmission client). Users can then view the current data export application in the web portal system and, if the application is approved, click the link to download the data.

3.2.5 Data export

Data export is used to export the data requested by the user from the virtual machine to the user's local machine. The data to be exported must first be uploaded from the virtual machine to the intermediate machine through the transmission client. Users can then view the current data export application in the web system and, if the application is approved, click the link to download the data. 'Upload' and 'Download' are actions that need to be triggered by users.

Users can query the data they have applied for and view the approval status. If the current application is approved, the 'download link' of the data appears, and users click it to download. The data download process is the same as the order data download process, and the file copy publishing service is also called.
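The export procedure described above can be read as a small state machine: upload to the intermediate machine, approval, then download through the link. The sketch below is one possible way to model it. The state names, action names and the rejected state are assumptions made for illustration, since the paper only describes the approved path.

```python
from enum import Enum


class ExportState(Enum):
    IN_VM = "result file in the virtual machine"
    PENDING_APPROVAL = "uploaded to the intermediate machine, awaiting approval"
    APPROVED = "approved, download link available"
    REJECTED = "rejected"  # assumed outcome; not described in the paper
    DOWNLOADED = "downloaded to the local machine"


# Allowed (state, action) pairs. 'upload' and 'download' are triggered by the
# user; 'approve' and 'reject' come from the approval management system.
TRANSITIONS = {
    (ExportState.IN_VM, "upload"): ExportState.PENDING_APPROVAL,
    (ExportState.PENDING_APPROVAL, "approve"): ExportState.APPROVED,
    (ExportState.PENDING_APPROVAL, "reject"): ExportState.REJECTED,
    (ExportState.APPROVED, "download"): ExportState.DOWNLOADED,
}


def advance(state: ExportState, action: str) -> ExportState:
    """Return the next state, or raise if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} while {state.value}") from None
```

With this table, a download is possible only from the approved state, which mirrors the rule that the download link appears only after the export application has been approved.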

3.2.6 Data upload

Since users are not allowed to copy local files directly to the virtual machine, the data upload function in the portal system is the only entry point for users to import personal data into the virtual machine. Users upload their data through the portal, the uploaded records are then shown in the client on the virtual machine, and users can choose to download them to the virtual machine. Data upload does not need to be approved. 'Upload' and 'Download' are actions that need to be triggered by users.

4 Conclusion

The marine data online sharing service system is deployed on the marine information and communication network and provides data-sharing services for the ocean-related institutes of the Ministry of Natural Resources. Since 2015, the system has been providing quality-controlled marine environment observation data and coastal remote sensing data to the North China Sea Bureau, East China Sea Bureau, and South China Sea Bureau every quarter. The data service has replaced the previous process of submitting data use applications on paper, making applications more convenient for users.

This paper has put into practice a one-stop online sharing service for marine data that addresses the application service needs of private network users for marine data resources. The service provides online use, download and visualization of data and builds a data usage platform for users by means of virtual terminals, providing an initial solution to the conflict between data confidentiality and data sharing.

Acknowledgments

This work was supported by the independent research projects of Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) (No. SML2021SP102) and China Knowledge Center for Engineering Science and Technology Project (Grant No. CKCEST2022-1-4).