Re: [Computer-go] Playout policy optimization

2017-02-13 Thread Gian-Carlo Pascutto
On 12/02/2017 5:44, Álvaro Begué wrote:
> I thought about this for about an hour this morning, and this is what I
> came up with. You could make a database of positions with a label
> indicating the result (perhaps from real games, perhaps similarly to how
> AlphaGo trained their value network). L

Re: [Computer-go] Playout policy optimization

2017-02-12 Thread Brian Sheppard via Computer-go
Sent: Saturday, February 11, 2017 11:44 PM
To: computer-go
Subject: [Computer-go] Playout policy optimization
Hi, I remember an old paper by Rémi Coulom ("Computing Elo Ratings of Move Patterns in the Game of Go") where he computed "gammas" (exponentials of scores that

[Computer-go] Playout policy optimization

2017-02-11 Thread Álvaro Begué
Hi, I remember an old paper by Rémi Coulom ("Computing Elo Ratings of Move Patterns in the Game of Go") where he computed "gammas" (exponentials of scores that you could feed to a softmax) for different move features, which he fit to best explain the move probabilities from real games. Similarly,
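To make the "gammas" idea concrete, here is a minimal Python sketch. It assumes each candidate move is described by a list of features and that a move's strength is the product of its features' gammas, which is the same as a softmax over the summed per-feature scores. The function name, feature names, and gamma values are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
import math


def move_probabilities(move_features, gamma):
    """Return a probability for each candidate move.

    move_features: dict mapping move -> list of feature names (hypothetical)
    gamma: dict mapping feature name -> positive strength (exp of its score)
    """
    # A move's strength is the product of the gammas of its features,
    # i.e. exp(sum of the per-feature scores). A move with no features
    # gets strength 1.0 (the empty product).
    strength = {
        move: math.prod(gamma[f] for f in feats)
        for move, feats in move_features.items()
    }
    # Normalizing the strengths is exactly a softmax over the summed scores.
    total = sum(strength.values())
    return {move: s / total for move, s in strength.items()}


# Made-up features and gamma values, for illustration only.
gamma = {"atari_escape": 4.0, "pattern_hane": 2.0, "edge": 0.5}
candidates = {
    "A": ["atari_escape"],
    "B": ["pattern_hane", "edge"],
    "C": [],
}
probs = move_probabilities(candidates, gamma)
```

Fitting the gammas then amounts to choosing the values that make the moves actually played in the game records as probable as possible under this model.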