[R] AlphaGo Zero: Learning from scratch | DeepMind

It's really interesting seeing the game progressions. Within the first ~10 hours the games shown seem nonsensical; at the 12-hour mark things make a little sense, if a bit unorthodox. At 16 hours and beyond it starts to look vaguely human.

Some things I took notice of:

- At ~16 hours it prefers the 3-4 probe instead of the traditional knight's approach.
- It experiments with the small avalanche joseki (one of the most complicated) at ~18 hours.
- From hour 46 onward, it seems to prefer early 3-3 invasions, played in a manner similar to the more recent AlphaGo versions.
- Probably the largest deviation from both previous AlphaGo versions and conventional wisdom is the early kick shown at hours 55 and 70. That move is generally considered bad unless you already have a pincering stone, and you would get yelled at by any Go teacher for playing it. Interestingly, the opposing AlphaGo at hour 70 does not play the traditional follow-up move that is thought to punish the kick. It would be interesting to know what AlphaGo is afraid of, or whether it just thinks the corner invasion is better. In any case, the resulting position from the kick still looks bad for the kicker to my naive eyes. I wonder if pros will start experimenting with it.

/r/MachineLearning Thread Link - deepmind.com