Re: [Computer-go] Monte-Carlo Tree Search as Regularized Policy Optimization

2020-07-19 Thread Kensuke Matsuzaki
>> although it could be less or more depending on match conditions
>> and what neural net is used and other things. So for LZ at least, "ACT"-like
>> behavior at low visits is not new.
>>
>> On Sun, Jul 19, 2020 at 5:39 AM Kensuke Matsuzaki …
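
For context, the ACT scheme from the paper under discussion selects moves by sampling from the regularized policy pi_bar rather than from raw visit counts. Below is a minimal sketch of computing pi_bar, based on my reading of the paper (Grill et al., 2020); the scaling constant and the interface are assumptions, not LZ's or any engine's actual code.

import numpy as np

def regularized_policy(prior, q, n_total, c=1.25, iters=60):
    # pi_bar(a) = lam * prior(a) / (alpha - q(a)); alpha is chosen by binary
    # search so that pi_bar sums to 1 (closed form given in the paper).
    prior = np.asarray(prior, dtype=float)
    q = np.asarray(q, dtype=float)
    lam = c * np.sqrt(n_total + 1) / (n_total + len(prior))  # assumed scaling
    lo = np.max(q + lam * prior)   # sum(pi_bar) >= 1 at this alpha
    hi = np.max(q) + lam           # sum(pi_bar) <= 1 at this alpha
    for _ in range(iters):
        alpha = 0.5 * (lo + hi)
        if np.sum(lam * prior / (alpha - q)) > 1.0:
            lo = alpha
        else:
            hi = alpha
    pi_bar = lam * prior / (hi - q)
    return pi_bar / pi_bar.sum()   # normalize away residual bisection error

Sampling the move to play from pi_bar instead of from the visit counts is what the paper calls ACT.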

Re: [Computer-go] Crazy Stone is playing on CGOS 9x9

2020-05-08 Thread Kensuke Matsuzaki
…9/SGF/2020/05/08/998312.sgf

> I am not strong enough to appreciate all the subtleties, but the complexity
> looks amazing.

--
Kensuke Matsuzaki

Re: [Computer-go] AI Ryusei 2018 result

2018-12-19 Thread Kensuke Matsuzaki
Hi,

> using rollouts to compensate for Leela's network being trained with the
> "wrong" komi for this competition:

Yes, and it seems that rollouts aren't useful when the trained komi is
"correct".

> Our program Natsukaze also used Leela Zero recent 70 selfplay games to
> train DNN.

What would happen …
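
To make the rollout compensation concrete, here is a minimal sketch of the kind of blend being discussed: the playouts are scored with the komi actually in use, and their winrate is mixed into the value-net estimate, AlphaGo-style. The mixing weight and the function names are illustrative assumptions, not Natsukaze's or Leela's actual code.

def blended_winrate(value_net_winrate, rollout_winrate, mix=0.5):
    # Rollouts are scored with the real komi, so they can correct a value
    # network trained with a different komi; 'mix' weights the two estimates.
    return (1.0 - mix) * value_net_winrate + mix * rollout_winrate

# Example: the net says 0.55 under its trained komi, rollouts under the real
# komi say 0.40 -> blended estimate 0.475.
print(blended_winrate(0.55, 0.40))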

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Kensuke Matsuzaki
Hi,

> How do you get the V(s) for those datasets? You play out the endgame
> with the Monte Carlo playouts?

Yes, I use the result of 100 playouts from the endgame. Sometimes the result
stored in the SGF differs from the result of the playouts.

zakki
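
A minimal sketch of that labeling step: average the outcomes of 100 playouts from each endgame position to get V(s). The playout interface here is a hypothetical stand-in, not Ray/Rn's actual API.

def label_value(position, playout_fn, num_playouts=100):
    # playout_fn plays one random game to the end from `position` and
    # returns +1 for a Black win, -1 for a White win (hypothetical interface).
    total = sum(playout_fn(position) for _ in range(num_playouts))
    # Map the average result in [-1, 1] to a Black win probability in [0, 1].
    return (total / num_playouts + 1.0) / 2.0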

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Kensuke Matsuzaki
Hi,

I couldn't get positive experiment results on Ray. Rn's network structures
for V and W are similar and share parameters; only the final convolutional
layers differ. I trained Rn's network to minimize the MSE of V(s) + W(s).
It uses only the KGS and GoGoD data sets, no self-play with an RL policy.
Wh…
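
A minimal sketch of the shared-trunk architecture described above: V and W share all layers except their final convolutions, and the training loss is the sum of the two MSE terms. Layer sizes and the PyTorch framing are assumptions, not Rn's actual implementation.

import torch
import torch.nn as nn

class TwoHeadValueNet(nn.Module):
    # V and W share the convolutional trunk; only the final conv layers differ.
    def __init__(self, in_planes=48, width=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_planes, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
        )
        self.v_head = nn.Conv2d(width, 1, 1)   # final layer for V(s)
        self.w_head = nn.Conv2d(width, 1, 1)   # final layer for W(s)

    def forward(self, x):
        h = self.trunk(x)
        v = torch.tanh(self.v_head(h).mean(dim=(1, 2, 3)))
        w = torch.tanh(self.w_head(h).mean(dim=(1, 2, 3)))
        return v, w

def loss_fn(v_pred, w_pred, v_target, w_target):
    # Training objective: sum of the MSE losses of the two heads.
    return (nn.functional.mse_loss(v_pred, v_target) +
            nn.functional.mse_loss(w_pred, w_target))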