
AlphaGo: Using Machine Learning to Master the Ancient Game of Go
阿尔法围棋:机器学习掌握围棋这项古老技艺

By Demis Hassabis
Translated by Zhuang Xiaoxu (庄晓旭); reviewed by Yan Ran (闫冉)

The World of English (英语世界), 2017, Issue 9
2017-02-08
Keywords: game position (棋局), Alpha (阿尔法), board (棋盘)

围棋起源于中国,至今已有2500多年的历史。孔子曾为围棋作文,它也是中国文人骚客必须掌握的四艺之一。全世界的围棋手总数超过4000万,围棋的规则简单:棋手在棋盘上行白子或黑子,努力吃掉对方的棋子或在棋盘上围地。下围棋主要靠个人的直觉与感觉,其美妙、精微与蕴含的智慧,让几千年来的人们为之神往。

[2]虽然围棋规则简单,下起来却极其复杂。可能的棋位多达1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000种,这比宇宙中的原子数量都要多,比国际象棋大10的100次方倍。

The game of Go originated in China more than 2,500 years ago. Confucius wrote about the game, and it is considered one of the four essential arts required of any true Chinese scholar. Go is played by more than 40 million people worldwide, and its rules are simple: players take turns to place black or white stones on a board, trying to capture the opponent’s stones or surround empty space to make points of territory. The game is played primarily through intuition and feel, and because of its beauty, subtlety and intellectual depth it has captured the human imagination for centuries.

[2] But as simple as the rules are,Go is a game of profound complexity.There are1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 possible positions—that’s more than the number of atoms in the universe,and more than a googol times larger than chess.
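The scale of these numbers can be checked with a rough sketch. It assumes the standard 19×19 board, on which each of the 361 points is empty, black, or white, so 3^361 is an upper bound that also counts illegal arrangements. The atom count (about 10^80) and the chess figure (about 10^47) are commonly cited order-of-magnitude estimates assumed here for illustration; they are not taken from the article.

```python
BOARD_POINTS = 19 * 19            # a standard Go board has 361 intersections
upper_bound = 3 ** BOARD_POINTS   # empty / black / white per point (includes illegal positions)

GOOGOL = 10 ** 100
ATOMS_ESTIMATE = 10 ** 80         # assumption: commonly cited order of magnitude
CHESS_ESTIMATE = 10 ** 47         # assumption: rough estimate of chess positions


def order_of_magnitude(n: int) -> int:
    """Return floor(log10(n)) for a positive integer, computed exactly from its digits."""
    return len(str(n)) - 1


go_magnitude = order_of_magnitude(upper_bound)
exceeds_atoms = upper_bound > ATOMS_ESTIMATE
exceeds_googol_times_chess = upper_bound > GOOGOL * CHESS_ESTIMATE

print(f"Upper bound on Go board configurations: about 10^{go_magnitude}")
print(f"More than the assumed atom count: {exceeds_atoms}")
print(f"More than a googol times the assumed chess figure: {exceeds_googol_times_chess}")
```

Under these assumptions both comparisons hold with a wide margin, so the conclusion does not depend on the exact estimates chosen.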

[3]围棋的复杂性使其对计算机有很大难度,也正因此围棋成为人工智能研究者渴望征服的挑战。这些研究者们以各类游戏为实验,以发明出可以解决问题的智能、灵活的计算程序,有时解决的方式与人类相似。电脑可以胜任的第一个游戏是“井字游戏”(又叫作“一字棋”),时间是1952年。1994年,掌握了跳棋。1997年,电脑“深蓝”因战胜棋王加里·卡斯帕罗夫而闻名。计算机的战绩并不局限于棋牌游戏——2011年IBM的沃森在《危险边缘》节目中,击败该节目的两位冠军;2014年,我们通过原始像素输入开发出掌握雅丽达游戏几十种玩法的计算机程序。但到现在为止,人工智能工程师依旧不能开发出百战百胜的围棋计算机程序。

[4]传统的人工智能方法是构建覆盖所有可能位置的搜索树,而这并不能在围棋中实现。因而当我们着手征服围棋的时候,采取了不同的方法。我们建立了名为“阿尔法围棋”(AlphaGo)的体系。该体系结合了高级树形检索与深度神经网络,我们给这些神经网络中输入棋局并用含有数百万类神经元连接的12个不同的网络层对其处理。一个神经网络,即“策略网络”,可以选择下一步棋的走法;另一个神经网络,即“价值网络”,则可预测棋局的赢家。

[3] This complexity is what makes Go hard for computers to play, and therefore an irresistible challenge to artificial intelligence (AI) researchers, who use games as a testing ground to invent smart, flexible algorithms that can tackle problems, sometimes in ways similar to humans. The first game mastered by a computer was noughts and crosses (also known as tic-tac-toe) in 1952. Then fell checkers in 1994. In 1997 Deep Blue famously beat Garry Kasparov at chess. It’s not limited to board games either: IBM’s Watson bested two champions at Jeopardy in 2011, and in 2014 our own algorithms learned to play dozens of Atari games just from the raw pixel inputs. But to date, Go has thwarted AI researchers.

Note 2 (noughts and crosses): a line-making game played on a 3×3 grid. Because the board is usually drawn without a border, its grid lines form the shape of the character 井, which gives the game its Chinese name, 井字游戏. Two players, one marking circles (○) and the other crosses (×), take turns placing their symbol on the grid; the first to complete a horizontal, vertical, or diagonal line wins. Hence it is also called 一字棋.
Note 3 (Jeopardy): an American television quiz show.

[4] Traditional AI methods, which construct a search tree over all possible positions, don’t have a chance in Go. So when we set out to crack Go, we took a different approach. We built a system, AlphaGo, that combines an advanced tree search with deep neural networks. These neural networks take a description of the Go board as an input and process it through 12 different network layers containing millions of neuron-like connections. One neural network, the “policy network,” selects the next move to play. The other neural network, the “value network,” predicts the winner of the game.
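To make the contrast concrete, the sketch below builds the kind of exhaustive search tree that traditional methods rely on, using noughts and crosses instead of Go. It is an illustration only and not AlphaGo’s code; the board encoding (a nine-character string) and all function names are hypothetical. On a 3×3 board the full tree is small enough to enumerate, which is exactly what stops being possible on a Go board.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board, with cells indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: str):
    """Return "X" or "O" if that player has completed a line, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def successors(board: str, player: str):
    """Yield every position reachable by one move of `player`."""
    for i, cell in enumerate(board):
        if cell == ".":
            yield board[:i] + player + board[i + 1:]


@lru_cache(maxsize=None)
def minimax(board: str, player: str) -> int:
    """Value for X under perfect play by both sides: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0
    other = "O" if player == "X" else "X"
    values = [minimax(nxt, other) for nxt in successors(board, player)]
    return max(values) if player == "X" else min(values)


def count_games(board: str, player: str) -> int:
    """Count every distinct move sequence from this position to the end of a game."""
    if winner(board) is not None or "." not in board:
        return 1
    other = "O" if player == "X" else "X"
    return sum(count_games(nxt, other) for nxt in successors(board, player))


EMPTY = "." * 9
perfect_play_value = minimax(EMPTY, "X")   # 0 means a draw with best play
total_games = count_games(EMPTY, "X")
print(perfect_play_value, total_games)     # prints: 0 255168
```

The whole tree holds 255,168 complete games and is searched in well under a second. A policy network and a value network are a way to avoid this enumeration: the first narrows which moves are worth exploring, and the second estimates the outcome without playing every line to the end.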

[5]我们用人类专家围棋比赛中的3000万个走法强化这套神经网络系统,直到它可以预测57%的人类落子(在阿尔法围棋之前,这个纪录是44%)。但我们的目标不是模仿人类选手,而是要战胜他们。要实现这个目标,阿尔法围棋掌握了如何为自身发现新战略,即在神经网络中对棋局进行成千上万次计算,并运用试差法调整系统间的连接(这一过程又叫强化学习)。当然,上文种种都要求强大的运算能力,所以我们也大量使用了谷歌云平台。

[6]在种种强化之后,我们开始让阿尔法围棋参与实战。首先,我们举办了阿尔法围棋与其他顶级计算机围棋程序间的锦标赛。阿尔法围棋在500场竞赛中只输了一场。接着我们邀请了蝉联三届欧洲围棋冠军的樊麾,他从12岁起就投身围棋,是职业选手中的精英。我们邀请他到伦敦的工作室来参加挑战赛。在2015年10月的闭门比赛中,阿尔法围棋5∶0赢得了比赛。这是电脑程序第一次战胜职业围棋手。

[5] We trained the neural networks on 30 million moves from games played by human experts, until they could predict the human move 57 percent of the time (the previous record before AlphaGo was 44 percent). But our goal is to beat the best human players, not just mimic them. To do this, AlphaGo learned to discover new strategies for itself, by playing thousands of games between its neural networks, and adjusting the connections using a trial-and-error process known as reinforcement learning. Of course, all of this requires a huge amount of computing power, so we made extensive use of Google Cloud Platform.
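The trial-and-error idea can be sketched on a toy scale. The example below is not AlphaGo’s training procedure: it swaps the neural networks for a small lookup table and swaps Go for a hypothetical stone-taking game (remove one to three stones; whoever takes the last stone wins). The game, the table, and every parameter value are assumptions chosen for illustration. What it keeps is the loop described above: the program plays against itself, then nudges its estimates toward the results it observed.

```python
import random
from collections import defaultdict

MAX_TAKE = 3  # hypothetical toy game: remove 1-3 stones; taking the last stone wins


def legal_moves(pile: int) -> list[int]:
    """Moves available with `pile` stones left."""
    return list(range(1, min(MAX_TAKE, pile) + 1))


def greedy_move(q, pile: int) -> int:
    """Pick the move with the highest estimated value; ties go to the larger move."""
    return max(legal_moves(pile), key=lambda m: (q[(pile, m)], m))


def self_play_episode(q, start_pile: int, epsilon: float, rng: random.Random):
    """Play one game of the current policy against itself."""
    history = {0: [], 1: []}   # (pile, move) pairs chosen by each player
    pile, player = start_pile, 0
    while True:
        if rng.random() < epsilon:
            move = rng.choice(legal_moves(pile))   # explore: try something new
        else:
            move = greedy_move(q, pile)            # exploit: current best guess
        history[player].append((pile, move))
        pile -= move
        if pile == 0:
            return history, player                 # this player took the last stone
        player = 1 - player


def train(episodes=5000, start_pile=10, alpha=0.1, epsilon=0.2, seed=0):
    """Learn move values from self-play outcomes (+1 for a win, -1 for a loss)."""
    rng = random.Random(seed)
    q = defaultdict(float)     # estimated outcome of playing `move` with `pile` stones left
    for _ in range(episodes):
        history, winner = self_play_episode(q, start_pile, epsilon, rng)
        for player, moves in history.items():
            outcome = 1.0 if player == winner else -1.0
            for state_action in moves:
                # Nudge the estimate toward the observed result.
                q[state_action] += alpha * (outcome - q[state_action])
    return q


q_table = train()
for pile in range(1, 11):
    print(pile, greedy_move(q_table, pile))
```

In this toy game the known best strategy is to leave the opponent a multiple of four stones. With enough episodes the table usually moves toward that strategy, although how closely depends on the exploration rate and step size assumed here.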

[6] After all that training it was time to put AlphaGo to the test. First, we held a tournament between AlphaGo and the other top programs at the forefront of computer Go. AlphaGo won all but one of its 500 games against these programs. So the next step was to invite the reigning three-time European Go champion Fan Hui, an elite professional player who has devoted his life to Go since the age of 12, to our London office for a challenge match. In a closed-doors match in October 2015, AlphaGo won by 5 games to 0. It was the first time a computer program had ever beaten a professional Go player.

[7] We are thrilled to have mastered Go and thus achieved one of the grand challenges of AI. However, the most significant aspect of all this for us is that AlphaGo isn’t just an “expert” system built with hand-crafted rules; instead it uses general machine learning techniques to figure out for itself how to win at Go. While games are the perfect platform for developing and testing AI algorithms quickly and efficiently, ultimately we want to apply these techniques to important real-world problems. Because the methods we’ve used are general-purpose, our hope is that one day they could be extended to help us address some of society’s toughest and most pressing problems, from climate modelling to complex disease analysis. We’re excited to see what we can use this technology to tackle next!

Note 4 (general-purpose): 通用的, that is, applicable to many kinds of problems.

[7]我们很开心能够掌握围棋诀窍,攻破人工智能众多难点中的一个。但是,对我们来说最大的亮点在于,阿尔法围棋不是靠人工建立的“专家”系统,而是运用一般的机器学习技巧,自己赢得围棋比赛。虽然各类游戏是迅速高效地开发和检测人工智能计算程序的完美平台,但我们最终的目标是把这些技巧用于解决重要的现实问题。我们使用的方法是通用的,因而我们希望有一天能拓展这些方法来解决社会中一些最艰难、最紧迫的问题,比如气候模型和复杂疾病分析等。我们很希望看到,接下来我们可以用这项技术解决哪些问题。
