Brian Sheppard wrote:
In this strategy, one chooses a random number p, and then selects the
strategy with the highest historical mean if p > epsilon, and the
strategy taken least often otherwise. If epsilon = C*log(n)/n, where
n is the number of experiments so far, then the strategy has zero
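For anyone who wants to try the rule quoted above, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not Brian's or Pebbles' actual code: the names choose_arm and run, the default C = 1.0, the win/loss reward model, and forcing exploration while n < 2 (where log(n)/n is undefined or zero) are all hypothetical choices.

```python
import math
import random


def choose_arm(means, counts, n, c=1.0, rng=random):
    """Return the index of the strategy to try next.

    means[i]  -- historical mean reward of strategy i
    counts[i] -- how often strategy i has been tried
    n         -- number of experiments so far
    c         -- the constant C in epsilon = C*log(n)/n (assumed value)
    """
    # Assumption: with fewer than 2 experiments, always explore.
    epsilon = min(1.0, c * math.log(n) / n) if n >= 2 else 1.0
    p = rng.random()  # uniform in [0, 1)
    if p > epsilon:
        # Exploit: strategy with the highest historical mean.
        return max(range(len(means)), key=lambda i: means[i])
    # Explore: strategy taken least often.
    return min(range(len(counts)), key=lambda i: counts[i])


def run(win_probs, trials, c=1.0, seed=0):
    """Simulate the rule on strategies with fixed win probabilities."""
    rng = random.Random(seed)
    counts = [0] * len(win_probs)
    means = [0.0] * len(win_probs)
    for n in range(trials):
        arm = choose_arm(means, counts, n, c, rng)
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means
```

Calling run([0.3, 0.7], 2000) returns how often each strategy was tried and its observed mean. Since epsilon shrinks as n grows, the share of exploratory picks shrinks with it.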
Gian-Carlo Pascutto wrote:
Remi Coulom has done some work in this area:
http://remi.coulom.free.fr/QLR/
It sounds very interesting (v-optimal sampling). But I don't understand
it enough to implement it. Your idea sounds simpler, but the enumeration
would be a problem, for parameters with wide
On that topic, I have around 17 flags that enable or disable features in
my pure-playout bots, and I want to search for the best combination of
them. I know this is almost a dream, but does anyone know the best way to
approximate this?
Pebbles randomly chooses (using a zero asymptotic regret strategy)