A team of researchers at Google-owned DeepMind has achieved a new milestone in artificial intelligence by creating a program capable of an unlikely task: learning how to play video games. Quentin Hardy reports for The New York Times that the system can teach itself how to play and win a range of 1980s video games. While the achievement is similar to the development of IBM’s chess-winning computer Deep Blue or its Jeopardy-winning Watson, there’s a key difference: both Deep Blue and Watson were taught techniques that would help them within their games. Google’s program taught itself the rules of 49 Atari 2600 computer games, figuring out the strategies it needed to win. Demis Hassabis, the noted artificial intelligence researcher and computer game designer who led the project, described the program as a “single general learning system.”
Nicola Twilley, writing for The New Yorker, notes that the Nature paper detailing the research, authored by Hassabis and colleagues Volodymyr Mnih, Koray Kavukcuoglu, and David Silver, appears just over a year after Hassabis’s company, DeepMind, made its public debut. Google acquired the company for $650 million in January 2014 after Hassabis first demonstrated the gaming abilities of his program, a program the DeepMind team now describes as a “novel artificial agent” that combines two existing forms of machine intelligence: a deep learning neural network and a reinforcement learning algorithm.
Deep neural networks are inspired by the human brain, and use layers of connected nodes to filter raw sensory data into useful patterns. When the program first sees an Atari game, the pixel data that it receives is meaningless. But Twilley explains that from there, the program begins analyzing the pixels, sorting them by color, finding edges and patterns, and eventually developing the ability to recognize shapes and the ways in which they fit together.
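To make the edge-finding step concrete, here is a minimal sketch in Python of one filter sliding over a tiny grid of pixel values. It is an illustration only, not DeepMind’s network: the `SCREEN` grid, the `VERTICAL_EDGE` filter, and the helper names are all hypothetical, and the filter weights are picked by hand, whereas a deep neural network would learn its weights from experience.

```python
# Hypothetical 4x6 grayscale "screen": dark pixels (0) on the left,
# bright pixels (9) on the right. Not real Atari data.
SCREEN = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

# Hand-picked 3x3 filter that responds where pixels go from dark to bright,
# left to right. A trained network would learn weights like these itself.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]


def convolve(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products.

    This is the cross-correlation that deep learning calls "convolution".
    """
    k = len(kernel)
    out = []
    for r in range(len(image) - k + 1):
        row = []
        for c in range(len(image[0]) - k + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(k) for j in range(k)))
        out.append(row)
    return out


def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]


edges = relu(convolve(SCREEN, VERTICAL_EDGE))
for row in edges:
    print(row)
```

The response is zero over the flat dark and flat bright regions and large only where the filter straddles the boundary, which is the sense in which a layer can turn raw pixels into an “edge” signal for later layers to build on.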
The program’s second form of intelligence, reinforcement learning, means the program is built to treat the game score as a reward, to analyze its own performance, and to change its behavior to raise that score. Combined, the two types of intelligence let the program approximate a human gamer’s ability to interpret what’s happening on the screen, learn from past mistakes, and want to win. The dual capability, which the authors of the paper call a “deep Q-network,” or DQN, gives the program skill in “mastering and understanding structure,” according to Hassabis.
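The reward-driven loop described above can be sketched with plain tabular Q-learning, a simpler relative of the DQN. Everything in the sketch is an assumption made for illustration: the five-cell corridor “game,” the `step` and `choose_action` helpers, and the values of `ALPHA`, `GAMMA`, and `EPSILON` are hypothetical and do not come from the paper. A DQN would replace the small lookup table with a deep neural network that reads the screen’s pixels.

```python
import random

N_STATES = 5        # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)  # index 0 = step left, index 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # illustrative values, not from the paper


def step(state, action_index):
    """Hypothetical game: a score of 1 for reaching the last cell, else 0."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_row, rng):
    """Epsilon-greedy: mostly take the best-known action, sometimes explore."""
    if rng.random() < EPSILON or q_row[0] == q_row[1]:
        return rng.randrange(2)
    return 0 if q_row[0] > q_row[1] else 1


def train(episodes=500, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # estimated value per (cell, action)
    for _episode in range(episodes):
        state = 0
        for _step in range(max_steps):
            action = choose_action(q[state], rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: nudge the estimate toward the reward plus
            # the discounted value of the best action in the next cell.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)
```

The agent is never told that moving right is the way to score. It starts by moving at random, and the update rule gradually raises its estimates for the actions that led toward the reward, so in this toy setup it comes to prefer stepping right in every cell.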
In 43 of the 49 games, the DQN outperformed previous computational efforts to win, and in more than half of the games, the program could eventually play at least three-quarters as well as a professional human games tester. By teaching itself the rules of each game, the DQN surprised the researchers with novel winning strategies. Hardy notes that when it played “Seaquest,” the program determined that the game’s submarine could survive by staying near the surface for the entire game, while in “Breakout,” it figured out a new way to get through a wall of bricks.
The achievement is more impressive than it might sound on the surface. While IBM’s Deep Blue beat grandmaster Garry Kasparov at chess, a more intellectual game, chess has a much more limited “feature space,” according to computer scientist Zachary Mason, who spoke to The New Yorker. Deep Blue only needed to consider the position of each piece on the board over a little more than 100 turns, a task that played into its strengths of memory and brute-force computing. But Mason explains that “there’s a byte or so of information per pixel” in an Atari game, and that combined with hundreds of thousands of turns adds up to a much messier set of data for DeepMind’s program to process. And while Deep Blue was programmed with a library of moves and rules, the DQN used the same all-purpose code to teach itself the rules and strategies of a wide range of different games.
The team is now focusing on bringing the program up to speed on 1990s games and their more complex three-dimensional spaces. Hassabis expects to achieve that capability within five years, and in the future thinks that “if it can drive a car in a racing game it should theoretically drive a car” in the real world. But for now, the handful of games in which the program failed to achieve human-level performance were those that required long-term planning and sophisticated pathfinding, like “Ms. Pac-Man,” “Private Eye,” and “Montezuma’s Revenge.” Hassabis thinks a possible solution could be to make the program more willing to take risks.
But as Hardy notes, the findings make it clear how far researchers are from developing artificial intelligence that can approximate any human type of intelligence. The DQN isn’t capable of developing the type of conceptual knowledge that would enable it to learn what a submarine is, or to transfer what it learned in one game to another. Hassabis explains that humans “have prior knowledge that we bring from the real world,” and thinks that “it will require some new kind of algorithm” for the program to master abstractions or conceptual thought.
Longer-term, the researchers aim to build an artificial intelligence system with the capability of a toddler. Toddlers, like older humans, can apply prior knowledge to a new situation. But beyond that, Twilley notes that it’s unclear whether the combination of a deep neural network and reinforcement learning could ever lead to conceptual cognition, which would enable the program not only to gain fluency with a game like “Sub Command,” but also to understand what a submarine, water, or oxygen actually is. Hassabis considers this “an open question.”
But Mason offers Twilley a less optimistic view of Hassabis’s algorithm. While he thinks “their current line of research leads to StarCraft in five or ten years and Call of Duty in maybe twenty, and controllers for drones in live battle spaces in maybe fifty,” his assessment is that “it never, ever leads to a toddler.” Toddlers can interact with the world in sophisticated ways, recognize objects when the light or shadows change, and manipulate objects in space. Mason thinks that these kinds of tasks won’t be achieved by the kind of artificial intelligence that can teach itself arcade games, and that it will take an algorithm capable of a much richer model of cognition to move artificial intelligence closer to the abilities of the human mind.