DeepMind develops version of AlphaGo that learns to play go without human knowledge

Google’s AI division DeepMind has developed a variant of AlphaGo that teaches itself the game of go. The system, AlphaGo Zero, does not need a large set of go games played by humans as training input.

Previous versions needed that data, but AlphaGo Zero learns only by playing against itself, DeepMind writes in a paper and an accompanying blog post. The new version starts out by playing completely random go games. After three days, AlphaGo Zero reached the level of the version of the system that defeated Lee Sedol. After 21 days, it reached the level of the Master variant, which won 60 online games against top players and beat Chinese top player Ke Jie. After 40 days, the new system was stronger than any previous AlphaGo version, according to DeepMind.

Elo ratings of the different AlphaGo variants

This is made possible by a new form of reinforcement learning. The system starts with a neural network that knows nothing about go except the rules, and that network plays games against itself with the help of a search algorithm, the organization explains. During play, the neural network predicts moves and the eventual winner of each game. Because the process repeats over many iterations, the system improves with every round. According to DeepMind, the advantage is that the new version of AlphaGo is no longer ‘constrained by the limits of human knowledge’ and can start completely from scratch.
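The self-play idea can be illustrated with a toy sketch. This is not DeepMind's method: it swaps go for a tiny stone-taking game, replaces the neural network with a plain lookup table, and leaves out the search algorithm entirely. All names here (`legal_moves`, `self_play_game`, `train`) and the update factors are assumptions chosen for illustration. What it shows is the loop itself: start with random play, play against yourself, then nudge the move preferences toward whatever the winner did.

```python
import random


def legal_moves(stones):
    """Toy game (an assumption, not go): take 1, 2 or 3 stones per turn."""
    return [m for m in (1, 2, 3) if m <= stones]


def self_play_game(policy, stones, rng):
    """Play one game against itself; whoever takes the last stone wins.

    Returns the (stones, move) pairs chosen by each player and the winner.
    """
    history = {0: [], 1: []}
    player = 0
    while True:
        moves = legal_moves(stones)
        # Unseen (state, move) pairs get weight 1.0, so play starts out random.
        weights = [policy.get((stones, m), 1.0) for m in moves]
        move = rng.choices(moves, weights=weights)[0]
        history[player].append((stones, move))
        stones -= move
        if stones == 0:
            return history, player
        player = 1 - player


def train(iterations=2000, start=10, seed=0):
    """Repeat self-play, reinforcing the winner's moves after every game."""
    rng = random.Random(seed)
    policy = {}  # (stones, move) -> preference weight
    for _ in range(iterations):
        history, winner = self_play_game(policy, start, rng)
        for player, steps in history.items():
            factor = 1.1 if player == winner else 0.9  # hypothetical step sizes
            for key in steps:
                updated = policy.get(key, 1.0) * factor
                policy[key] = min(max(updated, 1e-3), 1e3)  # keep weights bounded
    return policy


if __name__ == "__main__":
    learned = train()
    for stones in (1, 2, 3):
        print(stones, {m: round(learned.get((stones, m), 1.0), 3)
                       for m in legal_moves(stones)})
```

Running the script prints the learned preferences for the last few positions of the toy game, where taking all remaining stones wins on the spot.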

There are other differences from previous versions as well. For example, AlphaGo Zero has only one neural network, where there were previously two. In the earlier versions, one network selected the next move and the other predicted the winner from each new position. The new variant combines these functions in a single network. Because the system has become more efficient over time, it also needs fewer TPUs. The AlphaGo version that beat Lee Sedol used 48 of these chips, while the current version uses only four. After three days of training, AlphaGo Zero defeated the Lee version with a score of 100-0.
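To make the single-network idea concrete, here is a minimal sketch of one network with a shared body and two outputs: a probability for each move and a prediction of the winner. It is only an illustration under stated assumptions. The layer sizes, the random weights and the names `make_network` and `forward` are hypothetical, and the real system is far larger and is trained rather than randomly initialized.

```python
import math
import random


def make_network(n_inputs, n_hidden, n_moves, seed=0):
    """Build random weights for one shared layer and two output heads."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "shared": matrix(n_hidden, n_inputs),      # used by both heads
        "policy_head": matrix(n_moves, n_hidden),  # scores each move
        "value_head": matrix(1, n_hidden),         # predicts the winner
    }


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def forward(net, position):
    """Return (move probabilities, value in [-1, 1]) for one position."""
    hidden = [math.tanh(h) for h in matvec(net["shared"], position)]

    # Policy head: softmax over move scores.
    logits = matvec(net["policy_head"], hidden)
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in logits]
    total = sum(exps)
    move_probs = [e / total for e in exps]

    # Value head: +1 means the player to move is expected to win, -1 to lose.
    value = math.tanh(matvec(net["value_head"], hidden)[0])
    return move_probs, value


if __name__ == "__main__":
    net = make_network(n_inputs=9, n_hidden=8, n_moves=9)
    probs, value = forward(net, [1.0, 0.0, -1.0, 0.0, 0.0, 1.0, 0.0, -1.0, 0.0])
    print([round(p, 3) for p in probs], round(value, 3))
```

Both outputs are computed from the same hidden layer, so the work of reading the position is done once rather than in two separate networks.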
