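In the era of big data, research that uses quantitative analysis of massive data sets to reveal trends in the evolution of human culture is known as “culturomics”. The concept originates from “Quantitative Analysis of Culture Using Millions of Digitized Books”, an article published in Science in 2011 by the research group of J.-B. Michel and E. L. Aiden at Harvard University. Aiden and Michel later collaborated again and in 2013 published Uncharted: Big Data as a Lens on Human Culture①, which presents culturomics research and its applications in detail. Culturomics has brought the natural sciences and the humanities together and has contributed to the formation of the new field of “digital humanities”. This article gives a brief review of the book in the hope of drawing scholarly attention to culturomics and of capturing the new trends in humanities research in the era of big data.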


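The book consists of seven chapters. Chapter 1 gives a general definition of culturomics, namely the quantitative study of human culture using big data. The authors argue that big data will change the humanities, transform the social sciences, and redefine the relationship between the world inside the ivory tower and the world outside it (8). The research is based on the Google Books corpus, which contains the digitized texts of 30 million books published since the 16th century in seven languages (English, French, German, Spanish, Russian, Chinese, and Hebrew), totalling 500 billion words and accounting for 6% of all books ever published. Because its texts span five centuries, the corpus can reflect changes in patterns of human behavior, cultural shifts, and even the rise and fall of civilizations; it is therefore not only “big data” but also “long data”. Owing to copyright restrictions, however, researchers cannot work directly with the content of the books. The authors therefore developed the “Google Books N-gram② Viewer” (hereafter N-gram Viewer), which plots as a curve how the frequency of a word in the corpus changes from year to year. It thus acts like a prism through which the evolution of human culture can be observed.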
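
The kind of curve the N-gram Viewer draws can be sketched as a relative yearly frequency: the occurrences of a word in a given year divided by the total number of words printed that year. The Python sketch below is an illustration only; the function name `relative_frequency` and the toy counts are assumptions made for this example and are not taken from the book or from Google's tooling.

```python
def relative_frequency(word_counts, total_counts):
    """Return {year: occurrences of the word / total words that year}.

    Both arguments are dicts keyed by year. Years whose total is zero
    are skipped so that we never divide by zero.
    """
    freq = {}
    for year, total in sorted(total_counts.items()):
        if total > 0:
            freq[year] = word_counts.get(year, 0) / total
    return freq


# Toy numbers, invented purely for illustration.
word_counts = {1900: 5, 1950: 20, 2000: 80}
total_counts = {1900: 1000, 1950: 2000, 2000: 4000}
curve = relative_frequency(word_counts, total_counts)
```

Normalizing by the yearly total matters because far more books are printed in recent years; raw counts alone would make almost every word appear to be rising.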




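Chapter 3 uses the corpus to explore the “blind spots” of lexicography, that is, words not recorded in dictionaries. First, the research finds that most English dictionaries record only high-frequency words, whereas low-frequency words, which make up 52% of the lexicon, never enter a dictionary; these form the “lexical dark matter”④ of the language. The authors therefore regard the English lexicon as, to some extent, still an “undiscovered continent” (76). Second, the counts show that the English lexicon already exceeded 550,000 words around 1900, had grown only to 600,000 by 1950, and then reached 1,000,000 by 2000; about 8,400 words are now added each year, so the lexicon is growing at an accelerating rate. In addition, the research finds that although lexicographers try hard to track new words, dictionaries still fail to reflect the latest changes in English vocabulary in time. Take the fourth edition of The American Heritage Dictionary, published in 2000: its new entries include mesclun, netiquette, and amplidyne, yet the N-gram Viewer shows that mesclun and netiquette had already reached the frequency threshold for inclusion in that dictionary by 1992, while amplidyne peaked in frequency as early as 1950 and was already obsolete by 2000. The N-gram Viewer can thus locate the “rise and fall” of words, support the updating of dictionaries, and help explore the “uncharted world” of the lexicon.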

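Chapter 4 uses the N-gram Viewer to measure fame. If a person’s fame is taken to be the frequency with which his or her name appears in Google Books, fame becomes computable. In general, name-frequency curves in Google Books share a common shape with four stages: first becoming famous, rapid growth, a peak, and slow decline. The authors measure fame in five respects: (1) age at first fame; (2) the time fame takes to double; (3) age at peak fame; (4) the half-life of fame; (5) the relationship between fame and occupation. They find that the age at which fame peaks is generally stable at about 75, whereas the other measures change over time. Taking 1800 and 1950 as the earlier and later reference points, the age at first fame fell from 43 to 29, the doubling time fell from 8.1 years to 3.3 years, and the half-life fell from 120 years to 71 years. In short, people today become famous earlier and faster but are also forgotten faster. The relationship between fame and occupation also yields striking findings. According to the data, actors become famous earliest, generally at around 30; writers become famous at around 40 but end up more famous, and for longer; politicians become famous at around 50, late, but attain the greatest fame; scientists become famous at around 60; artists and mathematicians are the least likely to become famous. The N-gram Viewer thus turns fame, a subjective matter, into something that can be measured quantitatively and objectively. In fact, Veres and Bohannon (2011) had already ranked the fame of more than 4,000 scientists by quantitative methods and published “The Science Hall of Fame” in Science; this chapter can be seen as an extension of that article.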
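
Two of the measures listed above, age at peak fame and doubling time, can be read off a name-frequency curve with very little code. The following Python sketch shows one plausible way to do so; the function names, the toy curve, and the birth year are all hypothetical, and the authors’ actual procedure may differ.

```python
def peak_year(series):
    """Year in which the name's frequency is highest."""
    return max(series, key=series.get)


def doubling_time(series, start_year):
    """Years from start_year until frequency first reaches twice its start value.

    Returns None if the frequency never doubles within the data.
    """
    target = 2 * series[start_year]
    for year in sorted(series):
        if year > start_year and series[year] >= target:
            return year - start_year
    return None


# Invented toy curve: frequency of one name by year of publication.
fame = {1900: 1.0, 1903: 1.5, 1906: 2.2, 1920: 6.0, 1950: 9.0, 1980: 7.0}
birth_year = 1875  # hypothetical
age_at_peak = peak_year(fame) - birth_year
years_to_double = doubling_time(fame, 1900)
```

With real data the curve would first need smoothing, since year-to-year noise can otherwise produce a spurious peak or an early false doubling.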

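Chapter 5 shows how the N-gram Viewer can be used to trace censorship of publications and political suppression. If certain words or names suddenly “vanish” from the corpus during a particular period, the likely reason is that they were banned from books. The authors compare the German and English Google Books corpora to examine censorship and political suppression in Nazi Germany. Google Books shows that the Jewish painter Marc Chagall began to gain fame around 1910. In English books his fame kept rising, whereas in German books it fell to a low between 1936 and 1944; evidently the Nazi persecution of Jews “silenced” the painter. Historically, some campaigns of political suppression were large and affected many people, and those suppressed were “blacklisted” without necessarily being put on record, as in the Great Purge in the Soviet Union under Stalin and the political censorship surrounding the “Hollywood Ten” in the United States. By examining changes in the frequency of words or names, however, the N-gram Viewer makes it possible to detect automatically whether a person or an idea has been censored or suppressed.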
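
The cross-language comparison described above can be turned into a simple score: how far a name dips during a period in one corpus, relative to how it behaves in a reference corpus over the same period. The Python sketch below is a hedged illustration of that idea; `period_ratio` and `relative_suppression` are names invented here, the frequencies are toy values, and this is not claimed to be the index the authors used.

```python
from statistics import mean


def period_ratio(series, start, end):
    """Mean frequency inside [start, end] divided by mean frequency outside it."""
    inside = [f for y, f in series.items() if start <= y <= end]
    outside = [f for y, f in series.items() if not start <= y <= end]
    return mean(inside) / mean(outside)


def relative_suppression(target, reference, start, end):
    """Ratio of the target corpus's period ratio to the reference corpus's.

    Values well below 1 mean the name dipped in the target corpus
    while holding up in the reference corpus.
    """
    return period_ratio(target, start, end) / period_ratio(reference, start, end)


# Invented toy frequencies (arbitrary units) for a single name.
german = {1930: 4.0, 1938: 1.0, 1942: 1.0, 1950: 4.0}
english = {1930: 4.0, 1938: 4.0, 1942: 4.0, 1950: 4.0}
score = relative_suppression(german, english, 1936, 1944)
```

Dividing by the reference corpus guards against mistaking a worldwide drop in interest for local censorship; a low score flags a candidate for closer reading and does not prove suppression.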

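Chapter 6 uses big data to study collective memory and collective forgetting. The authors point out that concepts such as collective memory have usually been excluded from scientific investigation, yet studying them with the N-gram Viewer is not difficult (153). They take year numbers as an example to explore the characteristics of collective memory, observing how the events of a given year are remembered by tracking changes in the frequency of that year’s number. The research shows that a year is forgotten quickly at first and more slowly later, in line with the Ebbinghaus forgetting curve. As society develops, however, people forget faster and faster and soon lose interest in the past. For example, the half-life of the year number 1872 is 24 years, whereas that of 1973 is only 10 years. The authors also examine the counterpart of collective forgetting, “collective learning”, that is, how new things enter the “collective consciousness”. Taking 147 patented inventions listed in Wikipedia as examples, they observe how new things come to be accepted by the public. The statistics show that in the early 19th century an advanced technology needed about 65 years to be accepted by mainstream culture, whereas by the early 20th century it needed only 26 years; people are thus accepting new things faster and faster.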
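
The half-life of a year number can be estimated as the time from the peak of its frequency curve until the frequency first falls to half the peak value. The Python sketch below illustrates this under simple assumptions; the function name `half_life` and the data are invented for the example and do not reproduce the authors’ figures or method.

```python
def half_life(series):
    """Years from the peak until frequency first falls to half the peak value.

    Returns None if the series never drops that far.
    """
    peak = max(series, key=series.get)
    half = series[peak] / 2
    for year in sorted(series):
        if year > peak and series[year] <= half:
            return year - peak
    return None


# Invented toy curve: how often one year number is mentioned over time.
# The decline is steep at first and then flattens out.
mentions = {1973: 10.0, 1976: 8.0, 1979: 6.5, 1983: 5.0, 1990: 4.2, 2000: 3.9}
years_to_half = half_life(mentions)
```

Comparing this value across many year numbers is what would reveal whether forgetting has sped up over time.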




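First, the book bridges the sciences and the humanities and advances the “digital humanities”. Several years ago, Professor Gary King of Harvard University predicted that, with the emergence and use of big data, the empirical foundation of social science research as a whole would change substantially, and that the convergence of qualitative and quantitative research might even accelerate (King 2009). Drawing on quantitative analysis, the book explores important topics in the humanities and social sciences: grammatical change, lexicography, the measurement of fame, censorship and suppression, and collective forgetting and collective memory. These areas have traditionally been considered hard to study quantitatively, but the book presents them fairly objectively by means of a vast database. Culturomics can be said to offer humanities research a new research method and to have advanced the discipline of “digital humanities”. In just the past two or three years, scholars abroad have applied the culturomics perspective to sentiment mining, conflict prediction, changes in university rankings, climate change, the measurement of complex relationships, and other areas, producing no fewer than a hundred papers, which attests to the book’s great influence and far-reaching significance.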



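First, research based on the N-gram Viewer is detached from context and can sometimes overgeneralize. The N-gram Viewer relies too heavily on word-frequency analysis and cannot examine the context in which words occur. In the discussion of fame, for example, the frequency of a name in Google Books measures only how famous a person is, not whether the fame is good or bad. Moreover, although using word frequency alone as a proxy for cultural influence is easy to operationalize, a curve alone cannot show whether a change is statistically significant. The research could be taken further if N-gram Viewer data were processed in more depth with statistical methods; one example is Acerbi et al. (2013), who combined an affective lexicon (WordNet Affect) with Porter’s Algorithm to study changes in emotional expression in 20th-century English Google Books.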




English Education: Needs and Mission, by YE Xingguo, p.1

This speech begins by exploring the evolution of English education in China, probes into the relationship between English education and the state’s needs, analyses the significant contribution of English education to the realization of state strategies, and concludes that a university should, at the juncture of the promulgation of the National Criteria of Teaching Quality for Bachelor Degree Foreign Language Programs, find its “niche”, or specific state needs, and aim to satisfy them by working out its own criteria of English teaching quality.

On Collaborative Innovation of Translation Education under the New Normal, by YE Xingguo, p.5

The speech sets forth the new normal of the translation circles, analyses the different value orientations of the subjects of the collaborative innovation, namely the relevant circles of administration, enterprises, education, research and clients, and points out the six main problems and their solutions.

Innovation of English Teaching under the New Normal, by YE Xingguo, p.9

The speaker discusses how new domestic needs, the international situation, and ICT development are challenging English teaching, why new ideas, standards, and methods should be applied, and what the new normal of English teaching is, and emphasizes the importance of keeping up with the times and of teaching innovation.

An Overview of Linguistic Landscape Study in China and the Prospect, by ZHANG Baicheng, p.14

Linguistic landscape study in China dates back to the 1980s. In the past forty years, Chinese scholars have achieved remarkable progress in this domain, and the numerous studies mainly cover three themes: (1) linguistic landscape translation and its norms; (2) features of domain-specific linguistic landscapes; (3) theory and methodology in linguistic landscape study. The studies investigate many types of linguistic landscapes, including public signs and labels, publicity language, slogans, the names of streets, roads, stores, and institutions, and couplets. The limitations of the studies lie in four aspects, including an emphasis on description at the expense of interpretation, inadequate theoretical and methodological exploration, and insufficient attention to multimodal signs per se. Future study can be furthered by focusing on five aspects, including shifting the research focus and exploring theoretical and methodological issues.

On the Nature of Middle Verbs and Middle Constructions, by YANG Yongzhong, p.19

Middle constructions are a well-studied topic in linguistics. Based on a summary of the properties and features of middle verbs, this paper proposes that middle constructions are composed of two verbs: the first, serving as the predicate, denotes an action characterized by a conventional property or feature, while the second, serving as a complement clause, denotes a result. The combination of the two verbs denotes a complete event. On this basis, it is argued that all middle verbs must be of this nature in terms of underlying structure. Once this is accepted, many long-standing puzzles related to middle constructions are readily solved.

A General Review of Dynamic Assessment and Second Language Learning, by WANG Hua, p.25

In dynamic assessment, important information about a learner’s abilities and changes can be obtained during the assessment. Dynamic assessment is a procedure for simultaneously assessing and promoting the development of learners’ cognitive processes and abilities, and it has been confirmed and applied in research on second language teaching and learning. Based on a wide retrieval of the literature, this study briefly reviews and comments on research into dynamic assessment and its application in second language learning.

Innovation or “Old Wine in a New Glass”: On the Use of Neologism in Skinner’s Verbal Behavior, by JIANG Daohua, p.31

Verbal Behavior analyzes, from a functional perspective, the cause-and-effect relations of human verbal behavior, and the key to understanding its theoretical framework lies in its use of neologistic terms. Taking this as its point of departure, the paper discusses the misunderstandings of Skinner’s behavioral theory initiated by Noam Chomsky and points out the great innovation and insightfulness of Skinner’s work.

A Study on English Majors’ Pragmatic Awareness in English Gratitude Context: Sex Roles and Social Situations, by CAI Chen & WANG Yinyin, p.46

In this article, we found that more and more males and females show androgynous characteristics and that masculinity is associated with higher sensitivity in pragmatic awareness than femininity. Participants show different pragmatic awareness across social situations and differ significantly in their perception of the burden of kindness. Meanwhile, different sex roles still perceive the same social situation differently. The results reveal that participants construct their pragmatic awareness in the communication process. Successful communication requires participants to improve their sensitivity to differences in sex roles and social situations, so intercultural communication teaching should concentrate on cultivating students’ critical intercultural communicative competence.

Role of Information Grounding in Literary Translation for Discourse Structuring: A Study of Three English Translations of a Chinese Prose “Zuiwengting Ji”, by LI Ming, p.60

Any discourse, a conglomeration of different sentences, features background information as well as foreground information at both the clause level and the discourse level. The information which knits the thread of a discourse and moves it forward is called foreground information, and the information which does not immediately and crucially contribute to the speaker’s goal, but merely assists, amplifies, or comments on it, is background information. Information grounding theory holds that an acceptable discourse results from the modulation of both foreground and background information. Taking three translations of the first paragraph of the Chinese literary discourse “Zuiwengting Ji” as an instance, and extracting and back-translating into Chinese their respective foreground information, the present paper aims to make readers fully aware of the important role that foreground information plays in achieving global coherence in discourse structuring.

On Mental Access between Topic and Subject in Text Translation, by ZHONG Shuneng, YOU Liping & ZHANG Yunxia, p.65

It is revealed that, on the one hand, a topic finds its way into a Chinese text by means of an NP, a pronoun, or a zero form and works as a subject. On the other hand, a topic is embodied in its English counterpart in the form of either an NP or a pronoun and functions as a subject. The present paper indicates that a topic chain is established by means of metonymy when a topic accesses a series of subjects. It concludes that the topic chain plays a crucial role in developing a naturally coherent text.

Exploration on the Compilation Method of Special Dictionaries Based on the English-Chinese Parallel Corpus, by ZHANG Yushuang & GUAN Xinchao, p.69

This article describes a method of compiling special dictionaries based on an English-Chinese parallel corpus. In comparison with the traditional compilation method, the corpus-based method can improve a dictionary’s systematization and standardization, whatever its size. How to choose corpus texts and how to compile word-frequency statistics are the keys to compilation. The meanings of general words in this kind of dictionary contribute to its learning functions and are useful for understanding the specialties concerned. The determination of true special terms depends on the compilation goal and the dictionary’s users. A corpus-based dictionary can also provide a linking service for special dictionaries. Admittedly, compiling a dictionary in this way has disadvantages, which should be treated carefully during the compilation process.



