MK: I didn't say the selection was random. The self-play moves were random.

These two statements appear contradictory to me. What do you mean by 
"selection" if not the "self-play moves"? Please clarify your understanding of 
which parts of the training are random.

My understanding is this (sketched in code after the list):
0) the weights of the neural net are initialised to random values. This is the 
only random part of the process (apart from the dice rolls themselves).
1) the bot generates a list of legal moves. 
2) the neural net evaluates each move and gives it a score for wins, gammon 
wins, bg wins, gammon losses, bg losses
3) the bot plays the HIGHEST scoring move (This is NOT a randomly selected 
move, but a simple calculation - a sort - to find the move with the best equity)
4) the bot uses the outcomes of the games to adjust the weights of the neural 
network according to an algorithm
5) repeat the process of steps 1 - 4 many times until the bot gets good at 
backgammon
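
To make step 3 concrete, here is a rough sketch of steps 1-4 in Python. The 
helper names (generate_legal_moves, apply_move, initial_position, game_over, 
game_result) and the net interface are placeholders of my own, not actual 
TD-Gammon or GnuBG code:

import random

def equity(probs):
    # One common way to collapse the five outputs (p win, p gammon win,
    # p backgammon win, p gammon loss, p backgammon loss) into one number.
    w, gw, bgw, gl, bgl = probs
    return (2 * w - 1) + gw + bgw - gl - bgl

def best_move(net, position, roll):
    # Steps 1-3: list the legal moves, score each with the net, and play the
    # top-scoring one. This is an argmax over scores, not a random choice.
    # (Side-to-move bookkeeping is omitted for brevity.)
    moves = generate_legal_moves(position, roll)
    return max(moves, key=lambda m: equity(net.evaluate(apply_move(position, m))))

def self_play_game(net):
    # One training game; the dice are the only random element per move.
    position, trajectory = initial_position(), []
    while not game_over(position):
        roll = (random.randint(1, 6), random.randint(1, 6))
        position = apply_move(position, best_move(net, position, roll))
        trajectory.append(position)
    # Step 4: adjust the weights from the visited positions and the final
    # result, e.g. with a TD(lambda)-style update.
    net.update(trajectory, game_result(position))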

Which steps, if any, do you disagree with? You seem to be saying that step 3 is 
a random selection of a move.

Based on my understanding of step 3, I wrote:

IS: How do you propose to rank double vs no double, and take vs pass?
MK: To answer your last question, just like checker decisions, cube decisions 
to double, take, pass, etc. would be random

If my understanding is correct, your response above does not successfully 
answer my question, so I come back to it: How do you propose to rank double vs 
no double, and take vs pass (at step 3 of the above procedure)?
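
(For reference, the conventional way a bot ranks these at step 3 is by 
comparing cubeful equities from the doubler's side; a minimal sketch, assuming 
such equities are somehow available, is below. How a randomly trained net 
would produce them is exactly what I'm asking.)

def rank_cube_decision(nd_equity, take_equity):
    # nd_equity:   doubler's cubeful equity if no double is given
    # take_equity: doubler's cubeful equity if the double is taken
    # Both in units of the current cube value; a pass scores +1.0 for the doubler.
    double_equity = min(take_equity, 1.0)      # opponent picks the reply worse for the doubler
    should_double = double_equity > nd_equity  # double only if it raises the doubler's equity
    should_take = take_equity < 1.0            # taking beats passing while the doubler gets < +1.0
    return should_double, should_take

# e.g. nd_equity=0.55, take_equity=0.70 -> (double, take)
#      nd_equity=0.55, take_equity=1.20 -> (double, pass)
#      nd_equity=0.55, take_equity=0.40 -> (no double, take)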

Regards,
Ian

-----Original Message-----
From: Murat K <playbg-...@yahoo.com> 
Sent: Wednesday, April 10, 2024 10:48 PM
To: bug-gnubg@gnu.org; Ian Shaw <ian.s...@riverauto.co.uk>
Subject: Cubeful and matchful training a BG bot

Hi Ian,

Since this specific subject has also strayed from the value of cube ownership, 
I'll split it off into a new thread and post it to RGB and Bgonline as well. My 
response to you is below the quoted posts.

 > ---------------------------------------------------------
 > *From:* MK <playbg-...@yahoo.com>
 > *Sent:* Wednesday, April 3, 2024 10:01:17 PM
 > *To:* Ian Shaw <ian.s...@riverauto.co.uk>; GnuBg Bug <bug-gnubg@gnu.org>
 > *Subject:* Re: Interesting question/experiment about value of cube ownership
 >
 > On 4/2/2024 5:13 AM, Ian Shaw wrote:
 >
 >> What would be your proposed structure for training a cubeful bot?
 >> What gains and obstacles do you foresee?
 >
 > I don't know what you mean by "structure". What I propose is doing the
 > same thing done training TD-Gammon v.1, i.e. random self-play, but this
 > time also cubeful and matchful, i.e. random cube as well as checker
 > decisions.
 >
 > Apparently Tesauro still works at IBM with access to huge CPU power.
 > Perhaps he can be put to shame for the damage he caused to BG AI by
 > what he did with TD-Gammon v.2 and be urged to redeem himself.
 >
 > In other forums, people talk about doing "XG rollouts on Amazon's cloud
 > servers", etc. Doing more biased rollouts is plain stupid/illogical.
 > Any such efforts would be put to better use in training a new bot
 > instead. The question is who would volunteer to do it.
 >
 > People like the Alpha-Zero team, etc. don't seem to want to touch
 > "gamblegammon" with a ten-foot pole, possibly because of the gambling
 > nature of the game.
 >
 > In the past, I have suggested in RGB that a random rollout feature can
 > be added to GnuBG and results from trustable users can be collected
 > over time in a central database to gradually create a bot that won't
 > rely on concocted, biased/inaccurate cube formulas and match equity
 > tables.
 >
 > Unfortunately the faithful are happy with their dogmas and no better
 > bots are likely in the near future... :(

 > ----------------------------------------------------------

On 4/3/2024 11:44 PM, Ian Shaw wrote:
 > MK: What I PROPOSE is doing the same thing done training TD-Gammon
 > v.1, I.E. random self-play, but this time also cubeful and MATCHFUL,
 > i.e. random cube as well as checker decisions.
 >
 > As I remember it (though it's many years since I read the research),
 > the self-play wasn't accomplished by picking random moves. It was the
 > initial network weights that were random. The move picked was the
 > best-ranked move of all the evaluated moves. This is a calculation,
 > not a random selection.
 >
 > How do you propose to rank double vs no double, and take vs pass?

 > ----------------------------------------------------------

I didn't say the selection was random. The self-play moves were random. There 
were no "calculations" either. Moves were compared and the better-performing 
ones rose up in rank. It was a kind of "bubble sorting" of large amounts of 
statistical data. I remember that Tom Keith used the expression "percolating 
up" in describing how he trained a Hypergammon bot through cubeless random 
self-play. It's the only way, using "empirical data and scientific method", to 
train a "non-human-biased" BG bot, or at least one with as little human bias as 
is technically possible.
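
Roughly, what I have in mind looks like the sketch below. The helper names are 
made up, and this is only my reading of the "percolating up" idea, not Tom 
Keith's actual code:

import random
from collections import defaultdict

# position key -> [games seen, single wins, gammon wins, backgammon wins]
stats = defaultdict(lambda: [0, 0, 0, 0])

def one_random_game():
    position, visited = initial_position(), []
    while not game_over(position):
        roll = (random.randint(1, 6), random.randint(1, 6))
        move = random.choice(generate_legal_moves(position, roll))  # purely random play
        position = apply_move(position, move)
        visited.append(encode(position))
    return visited, game_result(position)   # 1, 2 or 3 points, signed for the winner

for _ in range(1_000_000):                   # the more games, the better the statistics
    visited, result = one_random_game()
    for key in visited:
        row = stats[key]
        row[0] += 1
        if abs(result) >= 1: row[1] += 1     # (side-on-roll bookkeeping omitted for brevity)
        if abs(result) >= 2: row[2] += 1
        if abs(result) >= 3: row[3] += 1

# Positions whose outcome frequencies look better "percolate up": when a choice
# has to be made later, the position with the better counts is preferred.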

To answer your last question: just like checker decisions, cube decisions to 
double, take, pass, etc. would also be random, and the "correct" cube decisions 
would "bubble up" the same way. It will take huge amounts of computing power 
and time, but nowadays we have both.
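
Continuing the same sketch (again with made-up helper names), the cube 
bookkeeping could look like this:

# (position key, action) -> [games, total points won]
cube_stats = defaultdict(lambda: [0, 0])

def random_cube_turn(position, decisions):
    # may_double() is a made-up placeholder for the usual legality check.
    if may_double(position):
        action = random.choice(["double", "no double"])
        decisions.append((encode(position), action))
        if action == "double":
            reply = random.choice(["take", "pass"])
            decisions.append((encode(position), reply))

# After each game, for every (key, action) recorded in `decisions`:
#     cube_stats[(key, action)][0] += 1
#     cube_stats[(key, action)][1] += points_won
# With enough games, the action with the better average points at each position
# "bubbles up" out of the counts, just like the checker moves.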

For "matchful" play, checker and/or cube decisions based on match score need to 
be random as well, even if that requires exponentially more computing power and 
time. Again, we have both. It's just a matter of whether we want to do it. We 
can distribute the task and/or spread it over time to let the empirical, 
statistical data trickle in and accumulate.
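
In terms of the same sketch, "matchful" just means the tally key grows, which 
is where the extra computing power goes (my illustration, not any existing 
code):

def match_key(position, cube_value, cube_owner, my_score, opp_score, match_length):
    # Matchful training: every tally is kept per cube state and match score as
    # well as per position, so the table is vastly larger than the cubeless one.
    return (encode(position), cube_value, cube_owner, my_score, opp_score, match_length)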

Perhaps other people more knowledgeable in bot training can suggest ways to go 
about it in more technical detail.

MK