How Big Is the Internet, Really?
2018-11-28
By Stephanie Pappas (translated by Zhou Zhen)
The Internet is a busy place. Every second, approximately 6,000 tweets are tweeted; more than 40,000 Google queries are searched; and more than 2 million emails are sent, according to Internet Live Stats, a website of the international Real Time Statistics Project. [Translator's note: the figures come from http://www.internetlivestats.com/one-second/; this article was written in 2016, so the numbers are no longer current.]
[2] But these statistics only hint at the size of the Web. As of September 2014, there were 1 billion websites on the Internet, a number that fluctuates by the minute as sites go defunct and others are born. And beneath this constantly changing (but sort of quantifiable) Internet that’s familiar to most people lies the “Deep Web,” which includes things Google and other search engines don’t index. [Translator's note: the name echoes the contrast between shallow seas and the deep sea.] Deep Web content can be as innocuous as the results of a search of an online database or as secretive as black-market forums accessible only to those with special Tor software. [Translator's note: Tor, short for The Onion Router, is an implementation of second-generation onion routing; it lets users guard against traffic filtering and sniffing analysis and communicate anonymously on the Internet.] (Though Tor isn’t only for illegal activity, it’s used wherever people might have reason to go anonymous online.)
[3] Combine the constant change in the “surface” Web with the unquantifiability of the Deep Web, and it’s easy to see why estimating the size of the Internet is a difficult task. However, analysts say the Web is big and getting bigger.
Data-driven
[4] With about 1 billion websites, the Web is home to many more individual Web pages. One of these pages, www.worldwidewebsize.com, seeks to quantify the number using research by Internet consultant Maurice de Kunder. De Kunder and his colleagues published their methodology in February 2016 in the journal Scientometrics. [Translator's note: a Springer journal focused on quantitative methods in the study of science and scientific research.] To come to an estimate, the researchers sent a batch of 50 common words to be searched by Google and Bing. (Yahoo Search and Ask.com used to be included but are not anymore because they no longer show the total results.) The researchers knew how frequently these words have appeared in print in general, allowing them to extrapolate the total number of pages out there based on how many contain the reference words. Search engines overlap in the pages they index, so the method also requires estimating and subtracting the likely overlap.
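To make that extrapolation concrete, here is a minimal sketch of the word-frequency method. Every number in it (the probe words, their frequencies, the hit counts, and the overlap fraction) is invented for illustration; these are not values from the Scientometrics paper.

```python
# Sketch of de Kunder-style index-size estimation (hypothetical numbers).
# If a word appears on a fraction f of all pages and a search engine
# reports h hits for it, that engine's index holds roughly h / f pages.

word_freq = {"the": 0.60, "and": 0.45, "science": 0.04}     # assumed fractions
hits_a = {"the": 2.8e9, "and": 2.1e9, "science": 0.19e9}    # engine A hits
hits_b = {"the": 2.2e9, "and": 1.6e9, "science": 0.15e9}    # engine B hits

def index_size(hits, freq):
    """Average the per-word extrapolations h / f."""
    estimates = [hits[w] / freq[w] for w in freq]
    return sum(estimates) / len(estimates)

size_a = index_size(hits_a, word_freq)
size_b = index_size(hits_b, word_freq)

# Engines index many of the same pages, so estimate and subtract overlap.
overlap = 0.55        # assumed share of engine B's index also covered by A
total = size_a + size_b * (1 - overlap)
print(f"Estimated indexed Web: {total / 1e9:.1f} billion pages")
```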
[5] According to these calculations, there were at least 4.66 billion Web pages online as of mid-March 2016. This calculation covers only the searchable Web, however, not the Deep Web.
[6] So how much information does the Internet hold? There are three ways to look at that question, said Martin Hilbert, a professor of communications at the University of California, Davis.
[7] “The Internet stores information, the Internet communicates information and the Internet computes information,” Hilbert said. The communication capacity of the Internet can be measured by how much information it can transfer, or how much information it does transfer at any given time, he said.
[8] In 2014, researchers published a study in the journal Supercomputing Frontiers and Innovations estimating the storage capacity of the Internet at 10²⁴ bytes, or 1 million exabytes. A byte is a data unit comprising 8 bits, and is equal to a single character in one of the words you’re reading now. An exabyte is 1 billion billion bytes. [Translator's note: it is worth tabulating the byte units: 1 B = 8 bits; 1 KB = 1,024 B; 1 MB = 1,024 KB; 1 GB = 1,024 MB; 1 TB = 1,024 GB; 1 PB = 1,024 TB; 1 EB = 1,024 PB; 1 ZB = 1,024 EB; 1 YB = 1,024 ZB.]
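Note that the article's round figures use decimal (SI) prefixes, while the translator's table above uses the 1,024-based binary steps. A quick check of the headline number in the decimal convention (the arithmetic is mine, not the study's):

```python
# Decimal (SI) byte prefixes, matching the article's round figures.
EB = 10**18              # one exabyte: a billion billion bytes
storage = 10**24         # the 2014 estimate of Internet storage, in bytes
print(storage // EB)     # -> 1000000, i.e. "1 million exabytes"
```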
[9] One way to estimate the communication capacity of the Internet is to measure the traffic moving through it. According to Cisco’s Visual Networking Index initiative, the Internet is now in the “zettabyte era.” A zettabyte equals 1 sextillion bytes (10²¹ under the SI convention), or 1,000 exabytes. By the end of 2016, global Internet traffic will reach 1.1 zettabytes per year, according to Cisco, and by 2019, global traffic is expected to hit 2 zettabytes per year.
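Those two forecasts imply a compound annual growth rate that is easy to back out; the calculation below is mine, not Cisco's:

```python
# Implied compound growth: 1.1 ZB/yr (end of 2016) to 2 ZB/yr (2019).
rate = (2.0 / 1.1) ** (1 / 3) - 1
print(f"~{rate:.0%} traffic growth per year")   # roughly 22%
```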
[10] One zettabyte is the equivalent of 36,000 years of high-definition video, which, in turn, is the equivalent of streaming Netflix’s entire catalog 3,177 times, Thomas Barnett Jr., Cisco’s director of thought leadership, wrote in a 2011 blog post about the company’s findings. [Translator's note: Netflix is the world's largest paid online streaming service for TV and film.]
[11] In 2011, Hilbert and his colleagues published a paper in the journal Science estimating the communication capacity of the Internet at 3 × 10¹² kilobits per second, a measure of bandwidth. This was based on hardware capacity, and not on how much information was actually being transferred at any moment.
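For a sense of scale, that bandwidth converts to bytes per second as follows (a plain unit conversion, not a figure from the paper):

```python
# 3 × 10**12 kilobits per second, expressed in bytes per second.
bits_per_s = 3e12 * 1e3           # kilobits -> bits
bytes_per_s = bits_per_s / 8      # bits -> bytes
print(f"~{bytes_per_s / 1e12:.0f} TB per second")   # ~375 terabytes/s
```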
[12] In one particularly offbeat study, an anonymous hacker measured the size of the Internet by counting how many IPs (Internet Protocol addresses) were in use. IPs are the wayposts of the Internet through which data travels, and each device online has at least one IP address. According to the hacker’s estimate, there were 1.3 billion IP addresses used online in 2012.
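Since IPv4 has a fixed address space of 2³² addresses (a protocol fact), the 1.3 billion figure can be put in proportion; the comparison itself is mine:

```python
# Share of the IPv4 address space found in use by the 2012 estimate.
total_ipv4 = 2 ** 32        # all possible IPv4 addresses (~4.3 billion)
in_use_2012 = 1.3e9         # the anonymous hacker's estimate
print(f"~{in_use_2012 / total_ipv4:.0%} of IPv4 space in use")   # ~30%
```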
[13] The Internet has vastly altered the data landscape. In 2000, before Internet use became ubiquitous, telecommunications capacity was 2.2 optimally compressed exabytes, Hilbert and his colleagues found. In 2007, the number was 65 exabytes. This capacity includes phone networks and voice calls as well as access to the enormous information reservoir that is the Internet. However, data traffic over mobile networks was already outpacing voice traffic in 2007, the researchers found.
The physical Internet
[14] If all of these bits and bytes feel a little abstract, don’t worry: In 2015, researchers tried to put the Internet’s size in physical terms. The researchers estimated that it would take 2 percent of the Amazon rainforest to make the paper to print out the entire Web (including the Dark Web), they reported in the Journal of Interdisciplinary Science Topics. For that study, they made some big assumptions about the amount of text online by estimating that an average Web page would require 30 pages of A4 paper (8.27 by 11.69 inches). With this assumption, the text on the Internet would require 1.36 × 10¹¹ pages to print a hard copy. (A Washington Post reporter later aimed for a better estimate and determined that the average length of a Web page was closer to 6.5 printed pages, yielding an estimate of 305.5 billion pages to print the whole Internet.)
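The 1.36 × 10¹¹ figure is consistent with the page count from paragraph 5; the multiplication below is mine, and the small gap suggests the study started from a slightly different base count:

```python
# Printing the indexed Web at the study's assumed 30 A4 sheets per page.
web_pages = 4.66e9               # indexed Web pages, mid-March 2016
sheets = web_pages * 30
print(f"{sheets:.2e} sheets")    # ~1.40e+11, near the reported 1.36e11
```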
[15] Of course, printing out the Internet in text form wouldn’t include the massive amount of nontext data hosted online. According to Cisco’s research, 8,000 petabytes per month of IP traffic was dedicated to video in 2015, compared with about 3,000 petabytes per month for Web, email and data transfer. (A petabyte is a million gigabytes or 2⁵⁰ bytes.) All told, the company estimated that video accounted for most Internet traffic that year, at 34,000 petabytes. File sharing came in second, at 14,000 petabytes.
[16] Hilbert and his colleagues took their own stab at visualizing the world’s information. In their 2011 Science paper, they calculated that the information capacity of the world’s analog and digital storage was 295 optimally compressed exabytes. To store 295 exabytes on CD-ROMs would require a stack of discs reaching to the moon (238,900 miles, or 384,400 kilometers), and then a quarter of the distance from the Earth to the moon again, the researchers wrote. That’s a total distance of 298,625 miles (480,590 km). By 2007, 94 percent of information was digital, meaning that the world’s digital information alone would overshoot the moon if stored on CD-ROM. It would stretch 280,707.5 miles (451,755 km).
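The stack can be roughly re-derived from standard disc specs; the ~700 MB capacity and 1.2 mm thickness below are my assumptions rather than the paper's, which is why the result only approximates the reported distance:

```python
# Height of a CD-ROM stack holding 295 exabytes, under assumed disc specs.
data_bytes = 295 * 10**18        # 295 optimally compressed exabytes
disc_bytes = 700 * 10**6         # ~700 MB per disc (assumption)
disc_mm = 1.2                    # ~1.2 mm per disc (assumption)
stack_km = data_bytes / disc_bytes * disc_mm / 1e6   # mm -> km
print(f"~{stack_km:,.0f} km")    # ~506,000 km vs the reported 480,590 km
```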
[17] The Internet’s size is a moving target, Hilbert said, but it’s growing by leaps and bounds. There’s just one saving grace when it comes to this deluge of information: Our computing capacity is growing even faster than the amount of data we store.
[18] While world storage capacity doubles every three years, world computing capacity doubles every year and a half, Hilbert said. In 2011, humanity could carry out 6.4 × 10¹⁸ instructions per second with all of its computers, similar to the number of nerve impulses per second in the human brain. Five years later, computational power is up in the ballpark of about eight human brains. That doesn’t mean, of course, that eight people in a room could outthink the world’s computers. In many ways, artificial intelligence already outperforms human cognitive capacity (though A.I. is still far from mimicking general, humanlike intelligence). Online, artificial intelligence determines which Facebook posts you see, what comes up in a Google search and even 80 percent of stock market transactions. The expansion of computing power is the only thing making the explosion of data online useful, Hilbert said.
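The "eight brains" figure follows from straight doubling; the model below is a sketch that takes the 2011 total as one brain, as the article does:

```python
# Computing capacity doubling every 1.5 years from the 2011 baseline.
def brains_after(years, doubling_period=1.5):
    """Capacity relative to 2011, in 'human brain' units."""
    return 2 ** (years / doubling_period)

print(f"{brains_after(4.5):.0f} brains")   # 8, after three doublings
print(f"{brains_after(5.0):.1f} brains")   # ~10.1 after five full years
```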
[19] “We’re going from an information age to a knowledge age,” he said. ■