Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Brian Sheppard via Computer-go
I see the same dynamics that you do, Darren. The 400-game match always has some probability of being won by the challenger. It is just much more likely if the challenger is stronger than the champion.
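
To put a rough number on that chance, here is a back-of-the-envelope sketch (ignoring draws and assuming an exactly equal challenger, so every game is a fair coin flip; not from the paper):

    # Chance that an exactly equal challenger still clears a 55% gate
    # over 400 games, with each game treated as a fair coin flip.
    from math import comb

    n = 400
    needed = int(0.55 * n)                                   # 220 points
    p_fluke = sum(comb(n, k) for k in range(needed, n + 1)) / 2 ** n
    print(p_fluke)                                           # ~0.026, a few percent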

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Eric Boesch
I could be drawing wrong inferences from incomplete information, but as Darren pointed out, this paper does leave the impression AlphaZero is not as strong as the real AlphaGo Zero, in which case it would be clearer to say so explicitly. Of course the chess and shogi results are impressive regardl

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Darren Cook
>> One of the changes they made (bottom of p.3) was to continuously >> update the neural net, rather than require a new network to beat >> it 55% of the time to be used. (That struck me as strange at the >> time, when reading the AlphaGoZero paper - why not just >50%?) Gian wrote: > I read that a

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Brian Sheppard via Computer-go
Requiring a margin > 55% is a defense against a random result. A 55% score in a 400-game match is 2 sigma. But I like the AZ policy better, because it does not require arbitrary parameters. It also improves more fluidly by always drawing training examples from the current probability distributi
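
A quick check of the 2-sigma figure (a sketch under the simplifying assumption that draws are ignored and both programs are equally strong):

    # Standard deviation of the observed score fraction over n games
    # when both programs are equally strong (p = 0.5).
    import math

    n, p = 400, 0.5
    sigma = math.sqrt(p * (1 - p) / n)
    print(sigma)    # 0.025, so 55% is (0.55 - 0.50) / 0.025 = 2 sigma above even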

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Brian Sheppard via Computer-go
The chess result is 64-36: a 100 rating point edge! I think the Stockfish open source project improved Stockfish by ~20 rating points in the last year. Given the number of people/computers involved, Stockfish’s annual effort level seems comparable to the AZ effort. Stockfish is really, reall
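
For anyone wanting to reproduce the conversion from a 64-36 score to roughly 100 rating points, here is a sketch using the standard logistic Elo model (nothing specific to the paper):

    import math

    def elo_diff(score):
        # Elo difference implied by an expected score, from
        # E = 1 / (1 + 10 ** (-diff / 400)) solved for diff.
        return -400 * math.log10(1 / score - 1)

    print(elo_diff(0.64))    # ~ +100 Elo for a 64-36 result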

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Petr Baudis
On Wed, Dec 06, 2017 at 09:57:42AM -0800, Darren Cook wrote: > > Mastering Chess and Shogi by Self-Play with a General Reinforcement > > Learning Algorithm > > https://arxiv.org/pdf/1712.01815.pdf > > One of the changes they made (bottom of p.3) was to continuously update > the neural net, rather

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Gian-Carlo Pascutto
On 6/12/2017 19:48, Xavier Combelle wrote: > Another result is that chess is really drawish, in contrast to shogi We sort-of knew that, but OTOH isn't that also because the resulting engine strength was close to Stockfish, unlike in other games? -- GCP

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Ingo Althöfer
> The AlphaZero paper shows it out-performs AlphaGoZero, but they are > comparing to the 20-block, 3-day version. Not the 40-block, 40-day > version that was even stronger. > As papers rarely show failures, can we take it to mean they couldn't > out-perform their best go bot, do you think? ... > >

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Gian-Carlo Pascutto
On 6/12/2017 18:57, Darren Cook wrote: >> Mastering Chess and Shogi by Self-Play with a General Reinforcement >> Learning Algorithm >> https://arxiv.org/pdf/1712.01815.pdf > > One of the changes they made (bottom of p.3) was to continuously update > the neural net, rather than require a new networ

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Xavier Combelle
Another result is that chess is really drawish, in contrast to shogi. On 06/12/2017 at 18:50, Richard Lorentz wrote: > One chess result stood out for me, namely, just how much easier it was > for AlphaZero to win with white (25 wins, 25 draws, 0 losses) rather > than with black (3 wins, 47 d

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Ingo Althöfer
"Joshua Shriver" asked: > What about arimaa? My personal impression: Arimaa should be rather easy for the AlphaZero approach. My questions: * How well does the AlphaZero approach perform in Non-zero-sum games? (or in games with more than two players) * How well does the AlphaZero approach perf

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Darren Cook
> Mastering Chess and Shogi by Self-Play with a General Reinforcement > Learning Algorithm > https://arxiv.org/pdf/1712.01815.pdf One of the changes they made (bottom of p.3) was to continuously update the neural net, rather than require a new network to beat it 55% of the time to be used. (That s
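
The practical difference between the two pipelines fits in a few lines. An illustrative sketch only (the function names are made up; only the 400-game / 55% figures come from the AlphaGo Zero paper):

    # AlphaGo Zero style: a candidate network must win an evaluation match
    # against the current best network before it is used for self-play.
    def gated_update(best_net, candidate, play_match, games=400, threshold=0.55):
        score = play_match(candidate, best_net, games)   # fraction of points won
        return candidate if score >= threshold else best_net

    # AlphaZero style: no evaluation gate; the latest network always
    # generates the next batch of self-play games.
    def continuous_update(best_net, candidate):
        return candidate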

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Richard Lorentz
One chess result stood out for me, namely, just how much easier it was for AlphaZero to win with white (25 wins, 25 draws, 0 losses) rather than with black (3 wins, 47 draws, 0 losses). Maybe we should not give up on the idea of White to play and win in chess! On 12/06/2017 01:24 AM, Hiroshi Y
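
To quantify the colour asymmetry, here is a quick calculation using the usual draws-count-half scoring and the logistic Elo model (the per-colour results are the ones quoted above):

    import math

    def score(wins, draws, losses):
        return (wins + 0.5 * draws) / (wins + draws + losses)

    def elo_diff(s):
        return -400 * math.log10(1 / s - 1)

    white = score(25, 25, 0)    # 0.75 -> about +190 Elo
    black = score(3, 47, 0)     # 0.53 -> about  +21 Elo
    print(white, elo_diff(white))
    print(black, elo_diff(black))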

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Joshua Shriver
What about Arimaa? On Wed, Dec 6, 2017 at 9:28 AM, "Ingo Althöfer" <3-hirn-ver...@gmx.de> wrote: > It seems we are living in extremely > heavy times ... > > I want to go to bed now and meditate for three days. > >> DeepMind makes strongest Chess and Shogi programs with AlphaGo Zero method. >> Ma

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread David Wu
Hex: https://arxiv.org/pdf/1705.08439.pdf This is not on a 19x19 board, and it was not tested against the current state of the art (MoHex 1.0 was the state of the art at its time, but is at least several years old now, I think), but they do get several hundred Elo points stronger than this old ver

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-06 Thread Aja Huang
2017-12-06 13:52 GMT+00:00 Gian-Carlo Pascutto : > On 06-12-17 11:47, Aja Huang wrote: > > All I can say is that first-play-urgency is not a significant > > technical detail, and that's why we didn't specify it in the paper. > > I will have to disagree here. Of course, it's always possible I'm > m

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-06 Thread Álvaro Begué
My hand-wavy argument succumbs to experimental data. And to a better argument. :) I stand corrected. Thanks, Álvaro. On Wed, Dec 6, 2017 at 8:52 AM, Gian-Carlo Pascutto wrote: > On 06-12-17 11:47, Aja Huang wrote: > > All I can say is that first-play-urgency is not a significant > > technica

Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Ingo Althöfer
It seems we are living in extremely heavy times ... I want to go to bed now and meditate for three days. > DeepMind makes strongest Chess and Shogi programs with AlphaGo Zero method. > Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning > Algorithm > https://arxiv.or

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-06 Thread Gian-Carlo Pascutto
On 06-12-17 11:47, Aja Huang wrote: > All I can say is that first-play-urgency is not a significant > technical detail, and that's why we didn't specify it in the paper. I will have to disagree here. Of course, it's always possible I'm misunderstanding something, or I have a program bug that I'm

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-06 Thread Andy
Thanks for letting us know the situation Aja. It must be hard for an engineer to not be able to discuss the details of his work! As for the first-play-urgency value, if we indulge in some reading between the lines: It's possible to interpret the paper as saying first-play-urgency is zero. After re
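
To make the detail under discussion concrete, here is a rough sketch of a PUCT-style selection step with an explicit first-play-urgency value (illustrative code only, not DeepMind's; fpu=0.0 matches the zero reading above, while some other programs substitute the parent's Q, possibly minus a small penalty):

    import math
    from dataclasses import dataclass

    @dataclass
    class Child:
        prior: float            # policy prior P(s,a) from the network
        visits: int = 0         # N(s,a)
        value_sum: float = 0.0  # W(s,a), sum of backed-up values

    def select_child(children, c_puct=1.5, fpu=0.0):
        # PUCT: pick the child maximising Q(s,a) + U(s,a).  For an unvisited
        # child Q is undefined; whatever stands in for it is the
        # "first-play-urgency" value being discussed.
        total = sum(c.visits for c in children)
        def puct(c):
            q = c.value_sum / c.visits if c.visits > 0 else fpu
            u = c_puct * c.prior * math.sqrt(total) / (1 + c.visits)
            return q + u
        return max(children, key=puct)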

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-06 Thread Aja Huang
2017-12-06 9:23 GMT+00:00 Gian-Carlo Pascutto : > On 03-12-17 17:57, Rémi Coulom wrote: > > They have a Q(s,a) term in their node-selection formula, but they > > don't tell what value they give to an action that has not yet been > > visited. Maybe Aja can tell us. > > FWIW I already asked Aja this

[Computer-go] AlphaZero

2017-12-06 Thread cazenave
Hi, It appears AlphaZero surpasses AlphaGo Zero at Go, Stockfish at Chess and Elmo at Shogi in a few hours of self-play... https://arxiv.org/pdf/1712.01815.pdf Best, Tristan.

[Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2017-12-06 Thread Hiroshi Yamashita
Hi, DeepMind makes strongest Chess and Shogi programs with AlphaGo Zero method. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm https://arxiv.org/pdf/1712.01815.pdf AlphaZero(Chess) outperformed Stockfish after 4 hours, AlphaZero(Shogi) outperformed elmo

[Computer-go] Google did it again this time with Chess and Shogi!

2017-12-06 Thread valkyria
https://arxiv.org/abs/1712.01815 Best Magnus

Re: [Computer-go] action-value Q for unexpanded nodes

2017-12-06 Thread Gian-Carlo Pascutto
On 03-12-17 17:57, Rémi Coulom wrote: > They have a Q(s,a) term in their node-selection formula, but they > don't tell what value they give to an action that has not yet been > visited. Maybe Aja can tell us. FWIW I already asked Aja this exact question a bit after the paper came out and he told m
