This is starting to feel like asking, "how can I explain this to
myself, or improve on what's already been done, in a way that will
make this whole process run faster on my hardware?"

It really doesn't look like there are many obvious shortcuts. That's
the lesson of the decision trees humans imposed on the game for 20+
years; they weren't really better.

A good way to convince yourself of these things would be to challenge
each assumption in divergent branches (as suggested earlier) and watch
the resulting players' strength over time. Yes, this might take a year
or more on your hardware.

I feel like maybe a lot of this is sour grapes; let's please again
acknowledge that the hobbyists aren't there yet, without trying to
tear down the accomplishments of others.

s.

On Nov 27, 2017 7:36 PM, "Eric Boesch" <ericboe...@gmail.com> wrote:

> I imagine implementation determines whether transferred knowledge is
> helpful. It's like asking whether forgetting is a problem -- it often is,
> but evidently not for AlphaGo Zero.
>
> One crude way to encourage stability is to include an explicit or implicit
> age parameter that forces the program to perform smaller modifications to
> its state during later stages. If the parameters you copy from problem A to
> problem B also include that age parameter, so the network acts old even
> though it is faced with a new problem, then its initial exploration may be
> inefficient. For an MCTS-based example, if an MCTS node is initialized to a
> 10877-6771 win/loss record based on evaluations under slightly different
> game rules, then with a naive implementation, even if the program discovers
> the right refutation under the new rules right away, it would still need to
> revisit that node thousands of times to convince itself the node is now
> probably a losing position.
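>
> To make that concrete, here is a minimal Python sketch of the naive
> case (the class and the loop are just an illustration I made up, not
> anything from a real engine):
>
>     class Node:
>         def __init__(self, wins=0, visits=0):
>             self.wins = wins      # transferred win count
>             self.visits = visits  # transferred visit count
>
>         def update(self, won):
>             # Plain incremental average: each new result moves the
>             # estimate by only about 1/visits.
>             self.wins += won
>             self.visits += 1
>
>         def value(self):
>             return self.wins / self.visits
>
>     node = Node(wins=10877, visits=10877 + 6771)
>     for _ in range(1000):  # 1000 straight losses under the new rules
>         node.update(0)
>     print(node.value())    # still ~0.58 -- nowhere near "losing"
>
> Even a perfectly consistent stream of refutations is swamped by the
> transferred statistics.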
>
> But unlearning bad plans in a reasonable time frame is already a feature
> you need from a good learning algorithm. Even AlphaGo almost fell into trap
> states; from their paper, it appears AlphaGo stuck with 1-1 as an opening
> move for much longer than you would expect from a program that was probably
> much better than 40 kyu. Even if it's unrealistic for Go specifically, you
> could imagine some other game where after days of analysis, the program
> suddenly discovers a reliable trick that adds one point for white to every
> single game. The effect would be the same as your komi change -- a mature
> network now needs to adapt to a general shift in the final score. So the
> task of adapting to handle similar games may be similar to the task of
> adapting to analysis reversals within a single game, and improvements to
> one could lead to improvements to the other.
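>
> One illustrative way out (a sketch under my own assumptions, not
> anything from the AlphaGo paper) is to discount old evidence, so the
> node's effective memory is bounded no matter how large the
> transferred record was:
>
>     def update_decayed(node, won, decay=0.999):
>         # Exponentially forget old evidence before adding the new
>         # result. The effective sample size is capped near
>         # 1 / (1 - decay) = 1000 visits.
>         node.wins = node.wins * decay + won
>         node.visits = node.visits * decay + 1
>
> With those numbers, the 10877-6771 node above crosses 0.5 after
> roughly 1,600 straight losses instead of over 4,000 with the plain
> average, and the gap keeps widening as the estimate heads toward
> zero.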
>
>
>
> On Fri, Nov 24, 2017 at 7:54 AM, Stephan K <stephan.ku...@gmail.com>
> wrote:
>
>> 2017-11-21 23:27 UTC+01:00, "Ingo Althöfer" <3-hirn-ver...@gmx.de>:
>> > My understanding is that the AlphaGo hardware is standing
>> > somewhere in London, idle and waiting for new action...
>> >
>> > Ingo.
>>
>> The announcement at
>> https://deepmind.com/blog/applying-machine-learning-mammography/ seems
>> to disagree:
>>
>> "Our partners in this project wanted researchers at both DeepMind and
>> Google involved in this research so that the project could take
>> advantage of the AI expertise in both teams, as well as Google’s
>> supercomputing infrastructure - widely regarded as one of the best in
>> the world, and the same global infrastructure that powered DeepMind’s
>> victory over the world champion at the ancient game of Go."
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
