It's interesting that they left unused parameters and unnecessary
parameterizations in the paper: it telegraphs what was tried, instead of
presenting something more concise and leaving the reader to wonder how
and why those decisions were made.
On Nov 7, 2017 10:54 PM, "Imran Hendley" wrote:
Great, thanks guys!
On Tue, Nov 7, 2017 at 1:51 PM, Gian-Carlo Pascutto wrote:
> On 7/11/2017 19:07, Imran Hendley wrote:
> > Am I understanding this correctly?
>
> Yes.
>
> It's possible they had in-betweens or experimented with variations at
> some point, then settled on the simplest case. You can vary the
> randomness if you define it as a softmax with varying temperature,
> that's harder
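A minimal sketch of that idea, assuming move selection from the MCTS
root visit counts N(a) with pi(a) proportional to N(a)^(1/tau), which is
equivalently a softmax over log N(a) with temperature tau; the visit
counts below are made up for illustration:

    import numpy as np

    def select_move(visit_counts, tau):
        """Sample a move index with pi(a) ~ N(a)^(1/tau)."""
        counts = np.asarray(visit_counts, dtype=np.float64)
        if tau == 0.0:
            # tau -> 0 limit: deterministically play the most-visited move.
            return int(counts.argmax())
        # Divide by the max count before exponentiating so 1/tau can be
        # large without overflowing float64.
        scaled = (counts / counts.max()) ** (1.0 / tau)
        probs = scaled / scaled.sum()
        return int(np.random.choice(len(counts), p=probs))

    counts = [220, 40, 30, 10]    # hypothetical root visit counts
    select_move(counts, tau=1.0)  # samples proportionally to visit counts
    select_move(counts, tau=0.0)  # always plays move 0 (the argmax)

With tau = 1 this is exactly "proportional to visit count", and lowering
tau sharpens the distribution smoothly toward the argmax.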
If I understand your question correctly, "goes to 1" can happen as quickly
or slowly as you'd like. Yes?
On Nov 7, 2017 7:26 PM, "Imran Hendley" wrote:
Hi, I might be having trouble understanding the self-play policy for
AlphaGo Zero. Can someone let me know if I'm on the right track here?
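Concretely, "as quickly or slowly as you'd like" is set by how tau is
annealed over the game. The paper itself just uses tau = 1 for the first
30 moves and tau -> 0 afterwards; the smooth geometric decay below is a
hypothetical alternative (cutoff and halflife are made-up knobs):

    def temperature(move_number, cutoff=30, halflife=10):
        """Hypothetical schedule: tau = 1 during the opening, then
        decaying geometrically toward 0 (i.e., toward argmax play)."""
        if move_number < cutoff:
            return 1.0
        return 0.5 ** ((move_number - cutoff) / halflife)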
Your understanding matches mine. My guess is that they had a temperature
parameter in the code that would allow for things like slowly transitioning
from random sampling to deterministically picking the maximum, but they
ended up using only those particular values.
Álvaro.
On Tue, Nov 7, 2017, Imran Hendley wrote:
Hi, I might be having trouble understanding the self-play policy for
AlphaGo Zero. Can someone let me know if I'm on the right track here?
The paper states:
In each position s, an MCTS search is executed, guided by the neural
network f_θ. The MCTS search outputs probabilities π of playing each move.
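Putting the pieces together, here is a sketch of one self-play step as I
read the paper: pi is computed from the root visit counts (not from the
raw network policy p), used to pick the move, and stored as the training
target. `run_mcts` and `root.child_visits` are hypothetical stand-ins
for whatever search implementation you have:

    import numpy as np

    def self_play_move(root, move_number, num_simulations=800):
        run_mcts(root, num_simulations)  # search guided by f_theta
        counts = np.asarray(root.child_visits, dtype=np.float64)
        tau = 1.0 if move_number < 30 else 0.0  # the paper's schedule
        if tau == 0.0:
            # tau -> 0: all probability mass on the most-visited move.
            pi = np.zeros_like(counts)
            pi[counts.argmax()] = 1.0
        else:
            pi = (counts / counts.max()) ** (1.0 / tau)
            pi /= pi.sum()
        move = int(np.random.choice(len(pi), p=pi))
        return move, pi  # pi is also the target for training f_theta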