Thanks to everybody for the links! They have given me a good amount of stuff to look at that will help with the proposal.
Many of these are very much in the same spirit as what I am proposing, though most seem to be concerned primarily with the tree rather than the playouts.

It’s interesting that starting the counters at 1 is equivalent to using a uniform prior. More generally, with a Beta(prior wins, prior losses) prior, the posterior is Beta(prior wins + observed wins, prior losses + observed losses), so the posterior expected value is just total wins / total runs (prior counts included), the same ratio that is used everywhere.

Thank you again for your time and the links,
Alex

> On Sep 25, 2014, at 4:06 PM, Álvaro Begué <alvaro.be...@gmail.com> wrote:
>
> I believe this has been discussed in the mailing list before: If your prior
> distribution of the win rate of a move is uniform, after L losses and W wins
> the posterior distribution will be a beta distribution with alpha=W+1 and
> beta=L+1. The expected value of this distribution is alpha/(alpha+beta) =
> (W+1)/(W+L+2), which is equivalent to the common trick of starting the
> counters W and L at 1 instead of at 0.
>
> Of course one could start with a different prior, but I think staying within
> the family of beta distributions makes sense because it's very tractable.
>
> Is that the kind of thing you were looking for?
>
> Álvaro.
>
> On Thu, Sep 25, 2014 at 6:28 PM, Alexander Terenin <atere...@ucsc.edu> wrote:
> Hello everybody,
>
> I’m a PhD student in statistics at the University of California, Santa Cruz
> who previously worked on the Go program Orego, currently in the process of
> applying for the NSF fellowship. I am working on a Bayesian statistics -
> related research proposal that I would like to use in my application, and
> wanted to know if someone was aware of any research related to my topic that
> has been done.
>
> Currently, it seems most MCTS-based Go programs, in the playouts, treat the
> strength (win rate) of each move as a fixed, unknown value, which is then
> estimated using frequentist techniques (specifically, by playing a random
> game, and taking the estimate to be wins / total runs). Has anyone attempted
> to instead statistically estimate the strength of each move using Bayesian
> techniques, by defining a set of prior beliefs about the strength of a
> certain move, playing a random game, and then integrating the information
> gained from the random game together with the prior beliefs using Bayes'
> Rule? Equivalently, has anyone defined the strength of each move to be a
> random variable rather than a fixed and unknown value? Without making this
> email too long, there are some theoretical advantages that might allow for more
> information to be extracted from each playout if this setup is used.
>
> If you are aware of any work in this direction that has been done, I would
> love to hear from you! I’ve been looking through a variety of papers, and
> have yet to find anything - it seems that any work remotely related to Bayes’
> Rule has concerned the tree, not the playouts.
> Thank you in advance,
>
> Alex Terenin
> atere...@ucsc.edu
> _______________________________________________
> Computer-go mailing list
> Computer-go@dvandva.org
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
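
The Beta update discussed in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from any Go program: the class name `BetaWinRate` and its methods `update`, `mean`, and `variance` are hypothetical, and each playout is assumed to be a simple win/loss outcome for the move in question.

```python
from dataclasses import dataclass


@dataclass
class BetaWinRate:
    """Hypothetical Beta(alpha, beta) belief about one move's win rate."""
    alpha: float = 1.0  # prior wins; alpha = beta = 1 is the uniform prior
    beta: float = 1.0   # prior losses

    def update(self, won: bool) -> None:
        # Conjugate update: a playout adds 1 to alpha on a win, to beta on a loss.
        if won:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def mean(self) -> float:
        # Posterior mean: (prior wins + observed wins) / (prior runs + observed runs).
        return self.alpha / (self.alpha + self.beta)

    def variance(self) -> float:
        # Variance of a Beta(alpha, beta) distribution.
        total = self.alpha + self.beta
        return self.alpha * self.beta / (total * total * (total + 1.0))


# Uniform prior, then W = 3 wins and L = 1 loss.
move = BetaWinRate()
for outcome in [True, True, False, True]:
    move.update(outcome)

print(move.mean())      # (W + 1) / (W + L + 2) = 4 / 6
print(move.variance())  # 4 * 2 / (6 * 6 * 7)
```

With the default uniform prior, `mean()` reproduces the "start the counters at 1" trick; passing other values for `alpha` and `beta` gives a different prior from the same Beta family, and `variance()` exposes the uncertainty that a plain wins / runs estimate does not carry.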