2018-08-05

OpenAI Five for Dota 2

2018.6.25

Watch the esports match on August 5

Our team of five neural networks, OpenAI Five, has started to defeat amateur human teams at Dota 2. While today we play with restrictions, we aim to beat a team of top professionals at The International in August subject only to a limited set of heroes. We may not succeed: Dota 2 is one of the most popular and complex esports games in the world, with creative and motivated professionals who train year-round to earn part of Dota’s annual $40M prize pool (the largest of any esports game).

OpenAI Five plays 180 years worth of games against itself every day, learning via self-play. It trains using a scaled-up version of Proximal Policy Optimization running on 256 GPUs and 128,000 CPU cores — a larger-scale version of the system we built to play the much-simpler solo variant of the game last year. Using a separate LSTM for each hero and no human data, it learns recognizable strategies. This indicates that reinforcement learning can yield long-term planning with large but achievable scale — without fundamental advances, contrary to our own expectations upon starting the project.
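For readers unfamiliar with Proximal Policy Optimization, the sketch below shows the standard clipped surrogate objective in plain Python. It illustrates the general technique only: the function name, the clip value of 0.2, and the toy numbers are all assumptions, and nothing here reflects OpenAI Five's actual training code or hyperparameters.

```python
import math

def ppo_clipped_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Average clipped surrogate objective (a quantity to maximize).

    clip_eps=0.2 is an assumed, commonly used value, not one taken from the post.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Use the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Toy batch of three samples; every number here is made up for illustration.
old = [math.log(0.5), math.log(0.2), math.log(0.3)]
new = [math.log(0.75), math.log(0.1), math.log(0.3)]
adv = [1.0, -1.0, 2.0]
objective = ppo_clipped_objective(new, old, adv)
print(round(objective, 6))  # 0.8
```

The clipping removes any incentive to move the new policy far from the policy that collected the data, which is one reason this objective is often chosen for large, noisy self-play runs.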

To benchmark our progress, we’ll host a match versus top players on August 5th. Follow us on Twitch to view the live broadcast, or request an invite to attend in person!

The problem

One AI milestone is to exceed human capabilities in a complex video game like StarCraft or Dota. Relative to previous AI milestones like Chess or Go, complex video games start to capture the messiness and continuous nature of the real world. The hope is that systems which solve complex video games will be highly general, with applications outside of games.


Dota 2 is a real-time strategy game played between two teams of five players, with each player controlling a character called a “hero”. A Dota-playing AI must master the following:

Long time horizons. Dota games run at 30 frames per second for an average of 45 minutes, resulting in 80,000 ticks per game. Most actions (like ordering a hero to move to a location) have minor impact individually, but some individual actions like town portal usage can affect the game strategically; some strategies can play out over an entire game. OpenAI Five observes every fourth frame, yielding 20,000 moves. Chess usually ends before 40 moves, Go before 150 moves, with almost every move being strategic.
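The horizon figures above follow from simple arithmetic, reproduced here as a quick check. The constant names are illustrative; the values are the ones quoted in the paragraph, and the exact results show that the post rounds them.

```python
FRAMES_PER_SECOND = 30      # from the post
AVERAGE_GAME_MINUTES = 45   # from the post
FRAME_SKIP = 4              # OpenAI Five observes every fourth frame

ticks_per_game = FRAMES_PER_SECOND * 60 * AVERAGE_GAME_MINUTES
moves_per_game = ticks_per_game // FRAME_SKIP

print(ticks_per_game)   # 81000, which the post rounds to 80,000
print(moves_per_game)   # 20250, which the post rounds to 20,000

# Horizon relative to the board-game figures quoted in the post.
CHESS_MOVES = 40
GO_MOVES = 150
print(moves_per_game / CHESS_MOVES)  # 506.25
print(moves_per_game / GO_MOVES)     # 135.0
```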

Partially-observed state. Units and buildings can only see the area around them. The rest of the map is covered in a fog hiding enemies and their strategies. Strong play requires making inferences based on incomplete data, as well as modeling what one’s opponent might be up to. Both chess and Go are full-information games.

High-dimensional, continuous action space. In Dota, each hero can take dozens of actions, and many actions target either another unit or a position on the ground. We discretize the space into 170,000 possible actions per hero (not all valid each tick, such as using a spell on cooldown); not counting the continuous parts, there are an average of ~1,000 valid actions each tick. The average number of actions in chess is 35; in Go, 250.
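To make "discretizing" a continuous target concrete, here is a minimal sketch that maps a ground offset onto a grid cell and filters out actions that are unavailable on a given tick. The post does not describe the actual scheme, so the grid size, the range, and the names `discretize_offset` and `valid_actions` are all hypothetical.

```python
def discretize_offset(dx, dy, max_range, bins_per_axis):
    """Map a continuous (dx, dy) offset to a single grid-cell index.

    Offsets are clamped to [-max_range, max_range] on each axis, so
    out-of-range targets fall into the nearest edge cell.
    """
    def to_bin(value):
        clamped = max(-max_range, min(max_range, value))
        fraction = (clamped + max_range) / (2 * max_range)  # 0.0 to 1.0
        return min(bins_per_axis - 1, int(fraction * bins_per_axis))

    return to_bin(dy) * bins_per_axis + to_bin(dx)

def valid_actions(action_names, on_cooldown):
    """Keep only the actions that are usable this tick (hypothetical rule)."""
    return [name for name in action_names if name not in on_cooldown]

# A 9x9 grid over a +/-400 unit square; both numbers are assumptions.
center_cell = discretize_offset(0, 0, max_range=400, bins_per_axis=9)
corner_cell = discretize_offset(1000, -1000, max_range=400, bins_per_axis=9)
usable = valid_actions(["move", "attack", "spell_q"], on_cooldown={"spell_q"})
print(center_cell, corner_cell, usable)  # 40 8 ['move', 'attack']
```

A finer grid gives more precise targeting at the cost of a larger action count, which is the trade-off behind the large per-hero total quoted above.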

High-dimensional, continuous observation space. Dota is played on a large continuous map containing ten heroes, dozens of buildings, dozens of NPC units, and a long tail of game features such as runes, trees, and wards. Our model observes the state of a Dota game via Valve’s Bot API as 20,000 (mostly floating-point) numbers representing all information a human is allowed to access. A chess board is naturally represented as about 70 enumeration values (an 8x8 board of 6 piece types and minor historical info); a Go board as about 400 enumeration values (a 19x19 board of 2 piece types plus Ko).
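As a rough sanity check on these sizes, the snippet below recomputes the board dimensions and compares them with the Dota observation. The constant names are illustrative, and the "about 70" and "about 400" totals are the post's own approximations rather than exact counts.

```python
CHESS_SQUARES = 8 * 8       # 64 squares; the post's "about 70" adds minor history
GO_POINTS = 19 * 19         # 361 points; the post's "about 400" includes Ko
DOTA_OBSERVATION = 20_000   # numbers per observation, from the post

CHESS_APPROX = 70
GO_APPROX = 400

print(CHESS_SQUARES, GO_POINTS)                    # 64 361
print(round(DOTA_OBSERVATION / CHESS_APPROX, 1))   # 285.7
print(DOTA_OBSERVATION // GO_APPROX)               # 50
```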

The Dota rules are also very complex: the game has been actively developed for over a decade, with game logic implemented in hundreds of thousands of lines of code. This logic takes milliseconds per tick to execute, versus nanoseconds for Chess or Go engines. The game also gets an update about once every two weeks, constantly changing the environment semantics.

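This AI is a neural network obtained through learning. Its behavior is not computed directly; at most it can be called indirect computation.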
