Re: [Computer-go] AlphaGo Zero self-play temperature

2017-11-07 Thread uurtamo .
It's interesting to leave unused parameters or unnecessary parameterizations in the paper. It telegraphs what was being tried as opposed to simply writing something more concise and leaving the reader to wonder why and how those decisions were made. s. On Nov 7, 2017 10:54 PM, "Imran Hendley" wr

Re: [Computer-go] AlphaGo Zero self-play temperature

2017-11-07 Thread Imran Hendley
Great, thanks guys!

On Tue, Nov 7, 2017 at 1:51 PM, Gian-Carlo Pascutto wrote:
> On 7/11/2017 19:07, Imran Hendley wrote:
> > Am I understanding this correctly?
>
> Yes.
>
> It's possible they had in-betweens or experimented with variations at some point, then settled on the simplest case. You

Re: [Computer-go] AlphaGo Zero self-play temperature

2017-11-07 Thread Gian-Carlo Pascutto
On 7/11/2017 19:07, Imran Hendley wrote:
> Am I understanding this correctly?

Yes.

It's possible they had in-betweens or experimented with variations at some point, then settled on the simplest case. You can vary the randomness if you define it as a softmax with varying temperature, that's harder
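
To illustrate the "softmax with varying temperature" idea mentioned above, here is a minimal sketch. The function name, the example scores, and the temperature values are assumptions chosen for illustration; nothing here is taken from the AlphaGo Zero code.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; hypothetical helper for illustration.

    A high temperature flattens the distribution (more random play),
    a low temperature concentrates it on the best-scoring move.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for the zero limit")
    scaled = [s / temperature for s in scores]
    # Subtract the maximum before exponentiating to avoid overflow.
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    scores = [1.0, 2.0, 3.0]  # made-up move scores
    for t in (10.0, 1.0, 0.1):
        print(t, [round(p, 3) for p in softmax_with_temperature(scores, t)])
```

As the temperature shrinks, the output approaches a one-hot vector on the highest score, which is the deterministic case discussed in this thread.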

Re: [Computer-go] AlphaGo Zero self-play temperature

2017-11-07 Thread uurtamo .
If I understand your question correctly, "goes to 1" can happen as quickly or slowly as you'd like. Yes?

On Nov 7, 2017 7:26 PM, "Imran Hendley" wrote:

Hi, I might be having trouble understanding the self-play policy for AlphaGo Zero. Can someone let me know if I'm on the right track here? The

Re: [Computer-go] AlphaGo Zero self-play temperature

2017-11-07 Thread Álvaro Begué
Your understanding matches mine. My guess is that they had a temperature parameter in the code that would allow for things like slowly transitioning from random sampling to deterministically picking the maximum, but they ended up using only those particular values. Álvaro. On Tue, Nov 7, 2017
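
A minimal sketch of what such a temperature parameter could look like, assuming the rule as I understand it from the paper: move probabilities proportional to the MCTS visit counts raised to 1/τ, with τ = 1 for early moves and τ close to 0 afterwards. The function names, the `sampling_moves=30` default, and the cutoff used to treat τ as zero are assumptions for illustration, not DeepMind's code.

```python
import numpy as np


def visit_counts_to_policy(counts, temperature):
    """Map visit counts N(a) to probabilities proportional to N(a)**(1/temperature).

    Assumes at least one count is positive. Hypothetical helper.
    """
    counts = np.asarray(counts, dtype=np.float64)
    if temperature <= 1e-3:
        # Limit of temperature -> 0: all mass on the most-visited move.
        policy = np.zeros_like(counts)
        policy[np.argmax(counts)] = 1.0
        return policy
    # Work in log space so that large counts and small temperatures do not overflow.
    with np.errstate(divide="ignore"):
        logits = np.log(counts) / temperature
    logits -= logits.max()
    weights = np.exp(logits)
    return weights / weights.sum()


def select_move(counts, move_number, sampling_moves=30, rng=None):
    """Sample proportionally to visits early in the game, pick the maximum later."""
    rng = rng if rng is not None else np.random.default_rng()
    temperature = 1.0 if move_number < sampling_moves else 0.0
    policy = visit_counts_to_policy(counts, temperature)
    return int(rng.choice(len(policy), p=policy))
```

With only the two values 1 and 0 in use, the code reduces to "sample by visit count" and "take the most-visited move"; intermediate temperatures would give the slow transition described above.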

[Computer-go] AlphaGo Zero self-play temperature

2017-11-07 Thread Imran Hendley
Hi, I might be having trouble understanding the self-play policy for AlphaGo Zero. Can someone let me know if I'm on the right track here?

The paper states:

In each position s, an MCTS search is executed, guided by the neural network f_θ. The MCTS search outputs probabilities π of playing each m