Research on the Construction and Application of Russian Abbreviations Corpora

2020-12-19

School of Foreign Language, Dalian Maritime University, Dalian, China. Email: wyxydxm@dlmu.edu.cn

Li Yaoyao

School of Foreign Language, Dalian Maritime University, Dalian, China. Email: 739832466@qq.com

Liu Xiao

School of Foreign Language, Dalian Maritime University, Dalian, China. Email: 445232329@qq.com

Zhang Yimin

School of Foreign Language, Dalian Maritime University, Dalian, China. Email: 196038629@qq.com

[Abstract] With the development of computer technology, corpus linguistics has gradually gained the attention of scholars because of its accuracy, dynamism and quantitative character. Building and applying corpora in various fields has become a trend, especially in the era of artificial intelligence. Russia is an important partner of China, and in the new era the China-Russia comprehensive strategic partnership of cooperation is becoming more mature. The Russian-language expressions used in the fields of cooperation and exchange between the two countries, especially abbreviations, are therefore worthy of study and will complement existing research in corpus linguistics. Taking abbreviations in Russian mass media as its research object, this paper analyzes the characteristics of these abbreviations, the necessity of building a corpus, and the design and basic composition of that corpus, and elaborates on its application value and practical significance.

[Keywords] Corpus Linguistics; Russian Abbreviations; Construction and Application of Corpus; China-Russia Relationship

Introduction

As an important driving force in the new round of scientific and technological revolution and industrial transformation, artificial intelligence not only brings new opportunities for scientific development but also provides new methods for language research. With the development of Internet technology and machine learning, natural language processing is also becoming more mature. The corpus approach is a milestone in this area, and its practical value has been proven in many application systems. A modern corpus has been regarded as “an integration of computer-processable linguistic data containing large amounts of information” (Yi et al., 2004). Establishing a corpus helps to handle a wide range of data and provides language researchers with data resources, improving the authenticity, openness and reusability of language research.

Abbreviations are an important part of the lexical system. They are compact in structure, concise in form and clear in meaning, while being very numerous and covering a wide range of fields. Building a corpus of abbreviations in various fields can therefore help us collect and supplement abbreviations together with examples of their use, and explore the rules of their development more accurately and intuitively.

Against this Russian-Chinese linguistic background, the authors chose Russian abbreviations in the various fields of cooperation and exchange between China and Russia as the research object. The aims are to build a query system for Russian abbreviations, to explore how these abbreviations are constructed and used, and to elaborate on the application value of the corpus.

Corpus Linguistics and Russian Abbreviations

Corpus linguistics is an emerging interdisciplinary field between contemporary linguistics and computer science, regarded as “a language research based on real-life examples of language use” (McEnery et al., 1996). The Russian corpus linguist В. П. Захаров defined “корпус” (corpus) more concretely as “an authoritative integration of structured electronic language data through a considerable scale of unified tagging, aiming to solve specific language problems” (Анатольевна, 2012, p. 159). Corpus linguistics is thus a method of studying language through a computer-operated corpus, and it is a major development in modern linguistics. Corpus-linguistic study of the Russian language began in the 1970s. In China, however, corpus linguistics for Russian has received less attention, and few schools and colleges have established programs in this field.

Russian abbreviations have developed since the end of the 19th century. They emerged in large numbers especially after the October Revolution and showed a trend of diversified development at the end of the 20th century (Yang, 2013, p. 81). “Abbreviation is one of the most effective ways of forming words in modern society, which is very common in both spoken and written language. They are mostly found in the forms of countries, political parties, institutions, scientific literature, media, etc., and they continue to penetrate into people's daily lives and discourse.” (Блох, 2014, p. 187) Much of the significant research on Russian abbreviations takes the form of abbreviation dictionaries, such as the Russian abbreviation search site www.sokr.ru. Because Russian abbreviations are so numerous and so variable, it is necessary to classify them by field. Moreover, the variation of Russian abbreviations reflects the development of various fields of Russian society and is a body of data that should not be neglected.

Construction and Usage Features of Russian Abbreviations

Russia is China’s largest neighbor to the north. After 70 years of changing international circumstances, the relationship between the two countries has become more stable and resilient. Standing at a new historical starting point and facing new historical opportunities, both countries are jointly planning to advance the China-Russia comprehensive strategic partnership of cooperation into a new era. Consequently, many news reports in Russia’s domestic mass media discuss cooperation and exchanges between the two countries in various fields, including the strategies set out at major conferences in China, international meetings of heads of state, Russian-Chinese cooperation in particular fields, and speeches by national leaders. Many abbreviations appear in these reports because they involve many names of countries, organizations, institutions and policy programs. Traditional manual search methods alone cannot meet the practical need to process such large volumes of information. Moreover, as new policies, organizations and institutions take shape, a new batch of words and concepts emerges, which requires us to identify the regularities and characteristics of their construction in a timely manner and to promote standardized use of abbreviations in this field.

The characteristics of usage are mainly reflected in the following aspects:

Economy principle. Economy is one of the basic functions of abbreviations, whether in news reports or in spoken language. Using abbreviations makes articles more concise and compact, saving space and making conversation easier.

Context-dependent meaning. For example: Причём - опираясь на основополагающие для АТЭС принципы и вне зависимости от какой-либо геополитической конъюнктуры. (Moreover, regardless of the geopolitical situation, we should abide by the basic principles of APEC.) In the dictionary, “АТЭС” corresponds to two full forms: “атомная теплоэлектростанция” (nuclear thermal power plant) and “Азиатско-Тихоокеанское экономическое сотрудничество” (Asia-Pacific Economic Cooperation). In the context of this sentence, however, “АТЭС” can only refer to Asia-Pacific Economic Cooperation.

Ease of dissemination. Policy documents need abbreviations that are easy to spread and understand. Their conciseness and clarity make them very convenient in actual use.

Fast updating. As new policies are continually introduced and new organizations are created, new abbreviations are generated as well. Some existing abbreviations may also acquire new meanings over time.

These features show that abbreviations, being numerous and constantly updated, make communication and researchers' translation work more difficult. A digital, computer-operated corpus system would therefore greatly facilitate the natural language processing tasks involved.
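The context dependence illustrated by “АТЭС” can be sketched in code. The following is a minimal illustration, not the query system described in this paper: the sense inventory `SENSES`, the cue words attached to each sense, and the function names are all hypothetical, and a real system would derive its context evidence from the corpus rather than from a hand-written list.

```python
import re

# Hypothetical sense inventory. The two full forms of "АТЭС" are taken from the
# example above; the cue words are illustrative assumptions, not corpus data.
SENSES = {
    "АТЭС": [
        {
            "full_form": "атомная теплоэлектростанция",
            "gloss": "nuclear thermal power plant",
            "cues": {"реактор", "энергоблок", "топливо", "теплоснабжение"},
        },
        {
            "full_form": "Азиатско-Тихоокеанское экономическое сотрудничество",
            "gloss": "Asia-Pacific Economic Cooperation",
            "cues": {"саммит", "форум", "принципы", "геополитической"},
        },
    ]
}


def tokenize(text):
    """Lowercase the text and return the set of Cyrillic/Latin word tokens."""
    return set(re.findall(r"[а-яёa-z]+", text.lower()))


def disambiguate(abbr, sentence):
    """Return the sense whose cue words overlap most with the sentence.

    Returns None if the abbreviation is unknown or no cue word occurs,
    so that the case can be passed on to a human annotator.
    """
    senses = SENSES.get(abbr, [])
    if not senses:
        return None
    words = tokenize(sentence)
    best = max(senses, key=lambda sense: len(sense["cues"] & words))
    return best if best["cues"] & words else None


sentence = (
    "Причём - опираясь на основополагающие для АТЭС принципы и вне "
    "зависимости от какой-либо геополитической конъюнктуры."
)
sense = disambiguate("АТЭС", sentence)
print(sense["gloss"] if sense else "undecided")
```

Returning `None` when no cue matches is a deliberate choice in this sketch: an ambiguous abbreviation is left for manual review rather than silently resolved to the first dictionary entry.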

Although traditional linguistics has no unified classification of the basic structures of Russian abbreviations, their formation basically follows G. K. Zipf’s principle of least effort, which balances economy for the speaker against clarity for the listener (Yi et al., 2013, pp. 18-21). One relatively comprehensive classification combines the approaches of two representative monographs in linguistics: Russian Grammar, published by the Academy of Sciences of the USSR in the 1980s, and the Encyclopaedic Dictionary of Linguistics, published in the 1990s. It divides Russian abbreviations into six types: initial type, syllabic type, mixed type, compound noun type, oblique case type, and initial-final type (Сюй, 2012). Russian abbreviations in this field are mostly of the initial type and fall mainly into the following categories:

The first category is the acronym. The initial sounds of the original words combine into a form that is pronounced as an ordinary word, for example:

МИД - Министерство иностранных дел (Ministry of Foreign Affairs)

Инициатива “Один пояс, один путь” не является долговой ловушкой, факты свидетельствуют о значительных выгодах для всех стран-участниц инициативы, которые активно ее поддерживают, заявил на пресс-конференции в пятницу глава МИД КНР Ван И. (RUSONLINE.ORG 19.04.2019)

The second category is the initialism. The initial letters of the original words form a new word, but it is pronounced letter by letter, for example:

ЭПШП - Экономический пояс Шелкового пути (The Silk Road Economic Belt)

В связи с этим Си Цзиньпин призвал применять на пространстве Евразии новые модели сотрудничества, общими усилиями формировать ЭПШП и предложил пять необходимых для этого мер. (РИА НОВОСТИ 14.05.2017)

The third category is the hybrid type, for example:

ЕАЭС - Евразийский экономический союз (Eurasian Economic Union)

Россия и Китай договорились активнее работать по интеграции процессов в рамках ЕАЭС и китайской инициативы «Один пояс — один путь». Об этом сегодня, 3 октября, заявил президент РФ Владимир Путин на пленарном заседании дискуссионного клуба «Валдай». (EADaily 03.10.2019)

The fourth category covers special formation types that follow no fixed pattern, for example loanwords:

АСЕАН - Ассоциация государств Юго-Восточной Азии (Association of Southeast Asian Nations, ASEAN)

А пять лет к стратегии «Один пояс — один путь», объединившей предложения по интеграции, присоединились Россия, другие страны ЕАЭС, АСЕАН, Саудовская Аравия и другие государства. (ИЗВЕСТИЯ 06.11.2018)

ССАГПЗ - Совет сотрудничества арабских государств Персидского залива (Cooperation Council for the Arab States of the Gulf, GCC)

Главным стимулом создания организации стали политические цели, включая окончание ирано-иракской войны. ССАГПЗ стал первым союзом в сфере обеспечения региональной безопасности. (Мировая экономика 6.2014 С.88)

In summary, abbreviations in the fields of cooperation and communication between the two countries are mainly names of countries, organizations, institutions and cooperation alliances. Their meanings are therefore basically fixed and their formation is relatively simple, which makes it convenient to build a corpus and process the data. It also benefits the study of the relationship between abbreviated and full forms and of tasks such as abbreviation prediction.
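The relationship between full forms and abbreviated forms can be probed with a very simple check: generate the string of initial letters from a full form and compare it with the attested abbreviation. The sketch below is only an illustration under stated assumptions; the function names are hypothetical, and the check works on spelling alone, so it cannot distinguish acronyms from initialisms, which differ in pronunciation.

```python
import re


def initial_letters(full_form):
    """Join the upper-cased first letter of every word in the full form.

    Words are split on whitespace and hyphens; leading punctuation such as
    quotation marks is skipped.
    """
    letters = []
    for word in re.split(r"[\s\-]+", full_form.strip()):
        first = next((ch for ch in word if ch.isalpha()), "")
        letters.append(first.upper())
    return "".join(letters)


def is_pure_initial_type(abbr, full_form):
    """True if the abbreviation is exactly the initials of the full form."""
    return abbr.upper() == initial_letters(full_form)


# Pairs taken from the examples in this section.
pairs = [
    ("МИД", "Министерство иностранных дел"),
    ("ЭПШП", "Экономический пояс Шелкового пути"),
    ("ЕАЭС", "Евразийский экономический союз"),
    ("АСЕАН", "Ассоциация государств Юго-Восточной Азии"),
]
for abbr, full_form in pairs:
    label = "initial type" if is_pure_initial_type(abbr, full_form) else "needs another rule"
    print(abbr, initial_letters(full_form), label)
```

On these pairs the check accepts МИД and ЭПШП, while ЕАЭС (hybrid) and АСЕАН (loanword) are not reproduced by their initials, which is where additional formation rules or manual tagging would be needed.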

Construction of Russian Abbreviations Corpora:Steps and Challenges

The overall design requires planning the corpus from a comprehensive perspective. Given the authors' professional background and the practical needs of this project, the Russian abbreviations corpus is defined as a Russian-Chinese parallel corpus, built in the following steps:

Data collection, entry and storage. “The main phase of building up a corpus is collecting texts, including encoding (manual typing, scanning and format conversion) and proofreading.” (Салчак, 2013, p. 408) This is a stage with a relatively large workload. Data collection should follow the principles of diversity, completeness and accuracy, and should be based on the original text. However, because the sources are so varied and the volume so large, it is difficult to cover all the data. In addition, news reports are highly repetitive, which makes it difficult to extract useful texts and to sort them later.

Alignment processing. Corpus alignment refers to aligning the texts with each other. To build a Russian-Chinese parallel corpus, the Russian and Chinese texts must be aligned at sentence level; the main tool used is Déjà Vu. However, this tool cannot achieve complete accuracy when processing text: some text is lost, and the order of some text may change. Correcting this requires a large amount of manual work and additional time.

Tagging rules and tagging. A tagging system should be set up according to the characteristics of the texts. The tags should record the pronunciation, construction, concrete origin and formation of each abbreviation. In addition to this specific information, further classifications can be made according to practical needs. For example, abbreviations can be divided into political, economic, cultural, military, diplomatic and other types according to the fields they belong to. However, abbreviations are dynamic, open language elements whose forms and meanings change constantly, so it is difficult to develop a completely fixed tagging system. Tagging them without ambiguity is another difficulty. Moreover, multi-functional processing software still needs to be developed to suit different research objectives and application needs.

Timely updating of information in the corpus. Once the corpus is built, it still has to be updated, maintained and managed. Corpus linguistics is a discipline formed at the intersection of linguistics and computer science, and it requires researchers to have professional knowledge of computers and mathematics. However, many language researchers lack knowledge in these fields.
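Two of the steps above, filtering repeated news text during collection and attaching tags to each abbreviation, can be sketched as follows. This is a hedged illustration rather than the structure actually used for the corpus: the `AbbreviationEntry` fields, the tag values and the domain assigned to each sample entry are assumptions made for the example, and the English glosses stand in for the Chinese equivalents that a Russian-Chinese parallel corpus would store.

```python
import hashlib
import re
from dataclasses import dataclass, field


@dataclass
class AbbreviationEntry:
    """One tagged abbreviation (hypothetical schema)."""
    abbr: str
    full_form: str
    translation: str   # Chinese equivalent in the real corpus; English gloss here
    formation: str     # e.g. "acronym", "initialism", "hybrid", "loanword"
    domain: str        # e.g. "political", "economic", "diplomatic"
    examples: list = field(default_factory=list)


def normalize(sentence):
    """Collapse whitespace and lowercase so trivial variants compare equal."""
    return re.sub(r"\s+", " ", sentence).strip().lower()


def dedupe(sentences):
    """Keep the first occurrence of each sentence, comparing normalized text."""
    seen = set()
    unique = []
    for sentence in sentences:
        key = hashlib.sha1(normalize(sentence).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(sentence)
    return unique


def by_domain(entries, domain):
    """Select the entries tagged with the given field of use."""
    return [entry for entry in entries if entry.domain == domain]


# Sample entries; the domain labels are illustrative assumptions.
entries = [
    AbbreviationEntry("МИД", "Министерство иностранных дел",
                      "Ministry of Foreign Affairs", "acronym", "diplomatic"),
    AbbreviationEntry("ЭПШП", "Экономический пояс Шелкового пути",
                      "The Silk Road Economic Belt", "initialism", "economic"),
]
print([entry.abbr for entry in by_domain(entries, "economic")])
```

Exact-match de-duplication of normalized sentences only removes verbatim repeats; near-duplicate reports that were lightly rewritten would need a similarity measure, which is outside this sketch.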

Construction of Russian Abbreviations Corpora:Values and Significance

One key reason why corpora are flourishing is their great application value and practical significance. As the applied linguist McCarthy said, “the corpus represents advanced technology and methods, and it embodies a dramatic change in linguistic research. This change will alter the conceptions of the role of teachers, cultural background in education and technology that have existed in our minds for a long time.” (McCarthy, 2001) In addition, as a language phenomenon, abbreviations are bound up with cooperation between Russia and China in multiple fields. Building this corpus is therefore significant both for the study of language itself and for understanding the cooperation between the two countries.

From the perspective of lexicology, the corpus makes it possible to build a database of lexical resources, providing rich data for compiling professional terminology dictionaries and glossaries, for machine translation, and even for building multilingual parallel corpora.

From the perspective of linguistics, a mechanism for sharing the basic resources of the corpus can be established. Under such a mechanism, researchers can study the linguistic and rhetorical functions of abbreviations, or the different features that abbreviations show at different stages of cooperation between Russia and China, and thereby improve the relevant linguistic theories.

From the perspective of learners and researchers, the corpus can provide foreign language learners with a large amount of authentic language material and a reliable data platform for vocabulary learning and thesis writing. It can also offer foreign language researchers an empirical research method based on corpus data, making their research more rigorous.

From the perspective of cross-cultural communication, “the construction of a corpus can provide rich linguistic data for diplomacy and foreign language broadcast and translation, which greatly improves the quality and efficiency of foreign language translation.” (Zhang, 2018, p. 43) Language is a part of culture. Understanding, translating and disseminating these terms well can help the world understand China’s position and culture more comprehensively and objectively.

Conclusion

The development of information technology has brought natural language processing to maturity and provided new corpus-based methods for linguistic research. A corpus method that can process data on a large scale can turn seemingly chaotic linguistic phenomena into regular and meaningful linguistic knowledge, enabling us to study language deeply and efficiently and to explore unknown areas. We should seize the opportunity offered by scientific and technological development, apply these advances to existing research and learning, seek new ideas and methods for linguistic research, and explore more convenient and reliable technical support for corpus linguistics in the new era.

Acknowledgements

This paper is part of the National Social Science Foundation of China project “Research on compilation of Russian-Chinese academic dictionaries based on parallel corpus” (grant #17BYY220) and the Liaoning Province Key Research and Development Plan and Guidance Plan project “Sino-Russian Cooperation on Ice Silk Road and Cultivation and Construction of Arctic Waterway Talents” (grant #2018401030).