Re: [Bug-gnubg] TD(lambda) training for neural networks -- a question

2009-05-21 Thread Øystein Johansen
boomslang wrote:
> Hi Øystein / others,
>
> I didn't know gnubg used just TD(0). This does make things easier for
> me. The Sutton/Barto you're referring to..., is that the book
> "Reinforcement Learning: An Introduction"?

Yes! It's even available online in HTML formatting.

> I do have a questi

Re: [Bug-gnubg] TD(lambda) training for neural networks -- a question

2009-05-21 Thread boomslang
Hi Øystein / others, thanks for your quick answer. I didn't know gnubg used just TD(0). This does make things easier for me.  The Sutton/Barto you're referring to..., is that the book "Reinforcement Learning: An Introduction"? I do have a question about this supervised training, tho

Re: [Bug-gnubg] TD(lambda) training for neural networks -- a question

2009-05-21 Thread Øystein Johansen
Ian Shaw wrote:
>> Our experience is: TD is nice for kickstarting the training
>> process. But supervised training is the real thing. Make a big
>> database of positions and the rollout results according to these
>> positions and train supervised.
>>
>> If you still would like to do TD training w

RE: [Bug-gnubg] TD(lambda) training for neural networks -- a question

2009-05-21 Thread Ian Shaw
> -----Original Message-----
> From: Øystein Johansen
> Sent: 21 May 2009 09:19
>
> Our experience is: TD is nice for kickstarting the training
> process. But supervised training is the real thing. Make a
> big database of positions and the rollout results according
> to these positions and

Re: [Bug-gnubg] integration with gnubg

2009-05-21 Thread Øystein Johansen
Alexander Smirnov wrote:
> Hello
>
> I wonder if it is possible to reuse gnubg engine in my application. I'm
> developing open source backgammon game for K Desktop Environment and
> looking for strong computer opponent. Looks like gnubg is the greatest
> opponent we have now! :-)

Cool! How far ha

Re: [Bug-gnubg] TD(lambda) training for neural networks -- a question

2009-05-21 Thread Øystein Johansen
boomslang wrote:
> Hi all,
>
> I have a question regarding TD(lambda) training by Tesauro (see
> http://www.research.ibm.com/massive/tdl.html#h2:learning_methodology).
>
> The formula for adapting the weights of the neural net is
>
> w(t+1)-w(t) = a * [Y(t+1)-Y(t)] * sum(lambda^(t-k) * nabla(w)Y
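
For readers following the thread, the update rule quoted above can be sketched in a few lines of Python. This is only an illustration, not gnubg's or Tesauro's code: the single-layer sigmoid "network", the function names (`predict`, `gradient`, `td_lambda_episode`) and the default values of `alpha` and `lam` are all assumptions made for the example. The sum over past gradients is kept incrementally as an eligibility trace, and the final game outcome stands in for Y(t+1) at the last position.

```python
import math


def predict(weights, features):
    """Sigmoid of a weighted sum: a stand-in for the network output Y."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def gradient(weights, features):
    """Gradient of the sigmoid output with respect to each weight."""
    y = predict(weights, features)
    return [y * (1.0 - y) * x for x in features]


def td_lambda_episode(weights, states, outcome, alpha=0.1, lam=0.7):
    """Apply TD(lambda) updates over one game and return the new weights.

    states  -- feature vectors for the positions seen in one game (assumed input)
    outcome -- final result, used as the target after the last position
    """
    weights = list(weights)
    trace = [0.0] * len(weights)
    for t, features in enumerate(states):
        y_t = predict(weights, features)
        grad = gradient(weights, features)
        # trace holds sum over k <= t of lam^(t-k) * grad Y(k), built step by step
        trace = [lam * e + g for e, g in zip(trace, grad)]
        if t + 1 < len(states):
            y_next = predict(weights, states[t + 1])
        else:
            y_next = outcome
        delta = y_next - y_t  # the [Y(t+1) - Y(t)] factor
        weights = [w + alpha * delta * e for w, e in zip(weights, trace)]
    return weights
```

With `lam=0` the trace reduces to the current gradient alone, which is the TD(0) case mentioned elsewhere in this thread.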