Hi all,
After a long search of the computer-go mailing list archive, and after reading
and rereading the paper by Gelly and Silver (ICML 2007), I didn't find
answers to my question.
In this paper they introduce a way to select the next move, at a given
state, using the RAVE and UCT value of its
On Fri, Mar 28, 2008 at 11:20 AM, Jaonary Rabarisoa [EMAIL PROTECTED] wrote:
Hi all,
After a long search of the computer-go mailing list archive, and after reading
and rereading the paper by Gelly and Silver (ICML 2007), I didn't find
answers to my question.
In this paper they introduce a way to
On Fri, 2008-03-28 at 11:20 +0100, Jaonary Rabarisoa wrote:
- its RAVE and UCT values are defined (in this case we can
compute the above score)
- only the RAVE value is defined (in this situation n(s,a)
= 0 and the UCT value is not defined)
-
Mark Boon wrote:
Sorry, without a bit more explanation, the assembler code is very
hard to understand. What exactly does it do?
The first source code was just an example to see what kind of code
is generated. The second is useful, if you understand asm you should
understand it. The board
So if I understand, at each node we need to play every possible action once
at first, even though many of these actions are surely non-optimal. And this may be
slow if the number of possible actions at this node is huge.
When you talk about FPU, does it mean that you give a kind of default value
for
I use FPU for both values for precisely the reasons you describe.
On Mar 28, 2008, at 9:36 AM, Jaonary Rabarisoa [EMAIL PROTECTED]
wrote:
So if I understand, at each node we need to play every possible
action once at first, even though many of these actions are surely non
terry mcintyre wrote:
It is possible to get some remarkably high correlation
between the moves played by pros and a predictor - yet
still not have a good program. Why?
We have a random variable, the place at which a player
plays, and some variables that we can compute. The
distribution of
On Fri, Mar 28, 2008 at 2:36 PM, Jaonary Rabarisoa [EMAIL PROTECTED] wrote:
So if I understand, at each node we need to play every possible action once
at first, even though many of these actions are surely non-optimal. And this may be
slow if the number of possible actions at this node is huge.
On 28-Mar-08, at 09:43, Jacques Basaldúa wrote:
The first source code was just an example to see what kind of code
is generated. The second is useful, if you understand asm you
should understand it.
Well, the only serious assembler I ever wrote was on a 6502 :-) And
that was a very long
So to sum up we have the following pseudo code :
at a given node :
- find the child (among the visited children only) that maximizes the UCT-RAVE
value
- if this maximum UCT-RAVE value is less than the FPU value and if there still
exist unvisited nodes :
choose one unvisited node
- continue
Is
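For readers following the thread, the pseudo code above can be sketched roughly as below. This is only a minimal sketch and not the method from the paper or from any particular program: the `Child` class, the `select_child` function and the `fpu` parameter are hypothetical names, and the combined UCT-RAVE value is assumed to be already computed and stored in `score` for every visited child.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Child:
    """Hypothetical child node; field names are illustrative assumptions."""
    move: str
    visits: int = 0      # n(s, a): number of simulations through this child
    score: float = 0.0   # combined UCT-RAVE value, meaningful only if visits > 0


def select_child(children: List[Child], fpu: float,
                 rng: random.Random = random.Random()) -> Optional[Child]:
    """Pick a child following the pseudo code in the message above."""
    visited = [c for c in children if c.visits > 0]
    unvisited = [c for c in children if c.visits == 0]

    # Best child among the visited children only.
    best = max(visited, key=lambda c: c.score, default=None)

    # If nothing visited yet, or the best visited score is below the FPU
    # value, try one of the remaining unvisited children instead.
    if unvisited and (best is None or best.score < fpu):
        return rng.choice(unvisited)

    # Otherwise continue with the best visited child (None if no children).
    return best
```

With a high `fpu` the unvisited children get tried first; with a low `fpu` the search keeps going down the best visited child and may never expand some of the others, which is the point of the question about huge numbers of actions.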
On Fri, Mar 28, 2008 at 3:10 PM, Jaonary Rabarisoa [EMAIL PROTECTED] wrote:
So to sum up we have the following pseudo code :
at a given node :
- find the child (among the visited children only) that maximizes the UCT-RAVE
value
- if this maximum UCT-RAVE value is less than the FPU value and if there