Go is often described as the “Chinese version of chess,” but that description barely does this deceptively simple game justice. The object is to control more of the board than your opponent. You do so by placing your black (or white) stones on the grid and using them to surround your opponent’s stones, which are then captured and removed from the board. If it sounds less complicated than chess, it isn’t: each move in chess gives you about 40 options, while each move on the 19-by-19 Go grid affords you around 200.
“There are more configurations on the board than there are atoms in the universe”
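To get a feel for that quote, here is a rough back-of-the-envelope calculation (illustrative only, not an exact count of legal positions): each of the 361 points on a 19-by-19 board can be empty, black, or white, giving 3^361 possible configurations as a crude upper bound, versus the commonly cited estimate of roughly 10^80 atoms in the observable universe.

```python
# Rough arithmetic behind the "more configurations than atoms" claim.
# 3^361 is a crude upper bound (it includes illegal positions), but even
# this simple bound dwarfs the estimated atom count of the universe.
from math import log10

go_upper_bound = 3 ** 361   # empty / black / white at each of 361 points
atoms_estimate = 10 ** 80   # common order-of-magnitude estimate

print(f"Go configurations (upper bound): ~10^{int(log10(go_upper_bound))}")
print(f"Atoms in the universe (estimate): ~10^80")
```

Even after subtracting all the illegal positions, the number of reachable Go configurations still comes out around 10^170, comfortably past the atom count.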
DeepMind applied machine learning with not one but two neural networks, called the “policy” and “value” networks. Both grapple with Go’s myriad gameplay possibilities, but in quite different ways. The policy network narrows the field of possible moves to a handful of promising ones, while the value network estimates how favorable a position is without searching all the way to every possible conclusion of the game. Trained on some 30 million moves from games by human Go experts, the policy network predicts expert moves up to 57% of the time; the previous record was 44%. AlphaGo then essentially plays millions of games against itself, learning to be a better Go player through trial and error and reinforcement learning.
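The division of labor described above can be sketched in a few lines of code. This is only an illustration of the idea, not DeepMind’s implementation: the `policy_net` and `value_net` functions here are random stubs standing in for the real deep networks, and `top_k`, `choose_move`, and `play` are hypothetical names invented for this sketch.

```python
# A minimal sketch of a policy network and a value network cooperating to
# pick a move. The two "networks" are random stubs, stand-ins for the real
# models trained on expert games and self-play.
import random

BOARD_POINTS = 19 * 19  # all 361 intersections on a Go board

def policy_net(position):
    """Stub: assign a prior probability to every point on the board.
    The real policy network narrows ~200 legal moves to a promising few."""
    scores = [random.random() for _ in range(BOARD_POINTS)]
    total = sum(scores)
    return [s / total for s in scores]

def value_net(position):
    """Stub: estimate the chance of winning from this position,
    without playing the game out to its conclusion."""
    return random.random()

def play(position, move):
    """Stub: return the position reached after playing `move`."""
    return position + [move]

def choose_move(position, top_k=5):
    # 1. Policy: keep only the top_k most promising candidate moves.
    priors = policy_net(position)
    candidates = sorted(range(BOARD_POINTS),
                        key=lambda m: priors[m], reverse=True)[:top_k]
    # 2. Value: evaluate the position each candidate leads to; pick the best.
    return max(candidates, key=lambda m: value_net(play(position, m)))

print(choose_move([]))  # prints a board-point index in 0..360
```

In the real system these two evaluations are combined inside a Monte Carlo tree search rather than a single greedy pass, but the principle is the same: the policy prunes the breadth of the search and the value cuts its depth.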
Holy Catfish! If you are interested in reviewing the games AlphaGo played, you can see them here.