One of the first big triumphs of artificial intelligence was chess. The game has a vast number of possible combinations, but it was relatively tractable because it is structured by a set of clear rules. An algorithm can always have perfect knowledge of the state of the game and know every move that it and its opponent can make. The state of the game can be evaluated simply by looking at the board.
But many other games are not so simple. Take something like Pac-Man: figuring out the best move means factoring in the shape of the maze, the locations of the ghosts, the location of any areas left to clear, the availability of power-ups, and so on, and the best-laid plan can end in disaster if Blinky or Clyde makes an unexpected move. We have developed AIs that can handle these games as well, but they had to take a very different approach from the ones that conquered chess and Go.
At least until now. Today, however, Google's DeepMind division published a paper describing a new AI architecture that can handle both chess and Atari classics.
Reinforcing trees
The methods that have worked on games like chess and Go do their planning using a tree-based approach, in which they look ahead along all the branches that stem from the different actions available in the present. This approach is computationally expensive, and the methods rely on knowing the rules of the game, which allows the current game state to be projected forward into possible future game states.
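As a rough illustration (not any of DeepMind's actual engines), the sketch below shows rule-based lookahead on a toy Nim-style game, where players alternately take one to three stones and whoever takes the last stone wins. Because the rules are known, every legal move can be expanded into the future states it produces and scored.

```python
# A minimal sketch of rule-based lookahead on a toy game, not real chess/Go code.
# Players alternately remove 1-3 stones; whoever takes the last stone wins.

def legal_moves(stones):
    return [n for n in (1, 2, 3) if n <= stones]

def minimax(stones, maximizing):
    # Terminal state: the previous player took the last stone and won.
    if stones == 0:
        return -1 if maximizing else 1
    scores = [minimax(stones - m, not maximizing) for m in legal_moves(stones)]
    return max(scores) if maximizing else min(scores)

def best_move(stones):
    # Expand every legal move and keep the one whose subtree guarantees the best outcome.
    return max(legal_moves(stones), key=lambda m: minimax(stones - m, False))

if __name__ == "__main__":
    print(best_move(10))  # prints 2: leaving 8 stones is a losing position for the opponent
```

Real chess and Go engines prune and guide this search far more aggressively, but the core idea of projecting the known rules forward into future states is the same.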
Other games required an approach in which the algorithm essentially does not care about the state of the game. Instead, the algorithm simply evaluates what it "sees", typically something like the positions of pixels on the screen of an arcade game, and chooses an action based on that. There is no internal model of the game state, and the training process largely involves figuring out what response is appropriate given that information. There have been some attempts to model a game's state based on inputs such as pixel information, but they have not done as well as the algorithms that simply respond to what is on screen.
The new system, which DeepMind is calling MuZero, is based in part on DeepMind's work with AlphaZero, an AI that taught itself rule-based games such as chess and Go. But MuZero adds a new twist that makes it substantially more flexible.
That twist is called "model-based reinforcement learning." In a system that uses this approach, the software uses what it can see of a game to build an internal model of the game state. Critically, that state is not pre-structured based on any understanding of the game; the AI has a lot of flexibility regarding what information is or is not included in it. The reinforcement learning part refers to the training process, which allows the AI to learn to recognize when the model it is using is accurate and contains the information it needs to make decisions.
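To make the idea concrete, here is a minimal, hypothetical sketch (not MuZero's actual architecture) of a "representation function" that maps raw observations, such as screen pixels, into a learned latent state. The single linear layer, the sizes, and the names are all illustrative assumptions; the point is that the latent state is whatever training finds useful, not a hand-designed description of the board or screen.

```python
import numpy as np

rng = np.random.default_rng(0)

class RepresentationFunction:
    """Maps raw observations to an abstract latent state (illustrative only)."""

    def __init__(self, obs_size: int, latent_size: int):
        # A single linear layer stands in for whatever network is actually used.
        self.weights = rng.normal(scale=0.01, size=(obs_size, latent_size))

    def __call__(self, observation: np.ndarray) -> np.ndarray:
        # The output is a learned latent state, not a reconstruction of the screen.
        return np.tanh(observation @ self.weights)

# Example: a flattened 84x84 grayscale frame mapped to a 32-dimensional latent state.
h = RepresentationFunction(obs_size=84 * 84, latent_size=32)
frame = rng.random(84 * 84)
latent_state = h(frame)
print(latent_state.shape)  # (32,)
```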
Predictions
The model it creates is used to make a number of predictions. These include the best possible move given the current state, and the state of the game that results from that move. Critically, the predictions are based on its internal model of game states, not on the actual visual representation of the game, such as the locations of the chess pieces. The predictions themselves are made based on past experience, which is also subject to training.
Finally, the value of a move is evaluated using the algorithm's predictions of any immediate rewards gained from that move (the point value of a piece taken in chess, for example) and of the final state of the game, such as a win or a loss. These evaluations can involve the same searches down trees of potential game states performed by earlier chess algorithms, but in this case the trees consist of the AI's own internal game models.
If that is confusing, you can also think of it this way: MuZero runs three evaluations in parallel. One (the policy process) chooses the next move given the current model of the game state. A second predicts the resulting new state and any immediate rewards from the transition. And a third draws on past experience to inform the policy decision. Each of these is the product of training, which focuses on minimizing the errors between these predictions and what actually happens in the game.
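The sketch below is a schematic of those three predictions and of the kind of error that training would drive down. The shapes, names, and the single-step loss are illustrative assumptions rather than DeepMind's published code.

```python
import numpy as np

rng = np.random.default_rng(1)
LATENT, ACTIONS = 32, 4

def linear(inp, out):
    return rng.normal(scale=0.1, size=(inp, out))

W_policy = linear(LATENT, ACTIONS)                 # 1) policy: pick a move from the latent state
W_dynamics = linear(LATENT + ACTIONS, LATENT + 1)  # 2) dynamics: next latent state + immediate reward
W_value = linear(LATENT, 1)                        # 3) value: long-run outcome, learned from experience

def policy(state):
    logits = state @ W_policy
    return np.exp(logits) / np.exp(logits).sum()   # probabilities over moves

def dynamics(state, action_onehot):
    out = np.concatenate([state, action_onehot]) @ W_dynamics
    return np.tanh(out[:-1]), out[-1]              # (next latent state, predicted reward)

def value(state):
    return (state @ W_value).item()                # predicted eventual win/loss value

# One training target: shrink the gap between predictions and what actually happened.
state = np.tanh(rng.normal(size=LATENT))
action = np.eye(ACTIONS)[0]
next_state, pred_reward = dynamics(state, action)

observed_reward, observed_outcome = 1.0, 0.5
search_policy = np.array([0.7, 0.1, 0.1, 0.1])         # move preferences from the lookahead search
loss = ((pred_reward - observed_reward) ** 2           # reward prediction error
        + (value(next_state) - observed_outcome) ** 2  # value prediction error
        - (search_policy * np.log(policy(state))).sum())  # policy should match the search result
print(f"loss = {loss:.3f}")  # gradient descent on this loss would train all three functions jointly
```

In the real system these functions are deep networks trained jointly, and the policy target comes from the tree search over the model's own latent states described above.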
Does it work?
Obviously, this being DeepMind, there would not be a paper in Nature if it did not work. MuZero needed roughly a million games against its predecessor AlphaZero to reach a similar level of performance at chess and shogi. For Go, it surpassed AlphaZero after only half a million games. In all three cases, MuZero can be considered far superior to any human player.
But MuZero also excelled at a suite of Atari games, something that previously required a completely different AI approach. Compared with the previous best algorithm, which does not use an internal model, MuZero posted a higher mean and median score on 42 of the 57 games tested. So, while it still lags behind in some situations, it has made model-based AI competitive on these games while retaining the ability to handle rule-based games like chess and Go.
Overall, this is an impressive achievement and an indication of how AIs are growing in sophistication. A few years ago, training an AI on a single task, such as recognizing a cat in photos, was an accomplishment. But now it is possible to train multiple aspects of an algorithm simultaneously; here, the components that built the model, chose the move, and predicted future rewards were all trained at once.
In part, this is the result of greater processing power, which makes playing millions of games of chess possible. But in part it is a recognition that this is what we need to do if an AI is ever going to be flexible enough to master multiple, distantly related tasks.
Nature, 2020. DOI: 10.1038/s41586-020-03051-4 (About DOIs).
Listing image by Richard Heaven / Flickr