Richard,

As I see it, in this long message you have given a conceptual sketch
of an AI design comprising a motivational subsystem and a cognitive
subsystem, connected via a complex network of continually adapting
connections.  You've discussed how such a system could potentially
build up a self-model involving empathy, a high level of awareness,
stability, and so on.

All this makes sense, conceptually; though as you point out, the story
you give is short on details, and I'm not so sure you really know how
to "cash it out" in terms of mechanisms that will actually function
with adequate intelligence ... but that's another story...

However, you have given no argument as to why the failure of this kind
of architecture to be stably Friendly is so ASTOUNDINGLY UNLIKELY as
you claimed in your original email.  You have just argued why it's
plausible to believe such a system would probably have a stable goal
system.  As I see it, you did not come close to proving your original
claim, that

>> > The motivational system of some types of AI (the types you would
>> > classify as tainted by complexity) can be made so reliable that the
>> > likelihood of them becoming unfriendly would be similar to the
>> > likelihood of the molecules of an Ideal Gas suddenly deciding to split
>> > into two groups and head for opposite ends of their container.

I don't understand how this extreme level of reliability would be
achieved, in your design.

Rather, it seems to me that the reliance on complex, self-organizing
dynamics makes some degree of indeterminacy in the system almost
inevitable, thus making the system less than absolutely reliable.
Illustrating this point, humans (who are complex dynamical systems) are
certainly NOT reliable in terms of Friendliness or any other subtle
psychological property...

-- Ben G


On 10/25/06, Richard Loosemore <[EMAIL PROTECTED]> wrote:
Ben Goertzel wrote:
> Loosemore wrote:
>> > The motivational system of some types of AI (the types you would
>> > classify as tainted by complexity) can be made so reliable that the
>> > likelihood of them becoming unfriendly would be similar to the
>> > likelihood of the molecules of an Ideal Gas suddenly deciding to split
>> > into two groups and head for opposite ends of their container.
>
> Wow!  This is a verrrry strong hypothesis....  I really doubt this
> kind of certainty is possible for any AI with radically increasing
> intelligence ... let alone a complex-system-type AI with highly
> indeterminate internals...
>
> I don't expect you to have a proof for this assertion, but do you have
> an argument at all?
>
> ben

Ben,

You are being overdramatic here.

But since you ask, here is the argument/proof.

As usual, I am required to compress complex ideas into a terse piece of
text, but for anyone who can follow and fill in the gaps for themselves,
here it is.  Oh, and btw, for anyone who is scared off by the
psychological-sounding terms, don't worry:  these could all be cashed
out in mechanism-specific detail if I could be bothered  --  it is just
that for a cognitive AI person like myself, it is such a PITB to have to
avoid such language just for the sake of political correctness.

You can build such a motivational system by controlling the system's
agenda through diffuse connections into the thinking component:  it is
those connections that determine what the system wants to do.

This set of diffuse connections will govern the ways that the system
gets 'pleasure':  the thinking mechanism is driven by dynamic
relaxation, and the 'direction' of that relaxation pressure is what
defines the things the system considers 'pleasurable'.  There would
likely be several sources of pleasure, not just one, but the overall
idea is that the system always tries to maximize this pleasure, and the
only way it can do that is to engage in activities or thoughts that
stimulate the diffuse channels going back from the thinking component
to the motivational system.

[Here is a crude analogy:  the thinking part of the system is like a
table containing a complicated model landscape, on which a ball bearing
is rolling around (the attentional focus).  The motivational system
controls this situation, not by micromanaging the movements of the ball
bearing, but by tilting the table in one direction or another.  Need to
pee right now?  That's because the table is tilted in the direction of
thoughts about water and urinary relief.  You are being flooded with
images of the pleasure you would get if you paid a visit, and meanwhile
the thoughts and actions that normally give you pleasure are being
disrupted and associated with unpleasant thoughts of future increased
bladder-agony.  You get the idea.]
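
If a scrap of code helps to fix the idea, here is a toy version of that
tilted table -- nothing but an illustration, with every class name,
drive and weight invented for the purpose:

# Toy sketch of the 'tilted table' -- purely illustrative; all the
# class names, drives and weights below are invented for this example.
import random

class MotivationalSystem:
    """Biases the thinking component via many small weights,
    rather than by issuing explicit goals."""
    def __init__(self, drives):
        # e.g. {"social_attachment": 0.6, "curiosity": 0.4}
        self.drives = drives

    def tilt(self, candidate_thoughts):
        # 'Tilt the table': give each candidate thought a diffuse bias,
        # summed over every drive that the thought touches.
        return {thought: sum(self.drives.get(d, 0.0) * rel
                             for d, rel in links.items())
                for thought, links in candidate_thoughts.items()}

class ThinkingSystem:
    """The attentional focus (the ball bearing) mostly rolls downhill
    toward whichever thought relaxes best under the current tilt."""
    def __init__(self, motivation):
        self.motivation = motivation

    def next_thought(self, candidate_thoughts):
        bias = self.motivation.tilt(candidate_thoughts)
        # A little noise: the ball bearing wanders, but mostly downhill.
        return max(bias, key=lambda t: bias[t] + random.gauss(0, 0.05))

motivation = MotivationalSystem({"social_attachment": 0.6, "curiosity": 0.4})
thinker = ThinkingSystem(motivation)
print(thinker.next_thought({
    "call your mother":     {"social_attachment": 0.9},
    "read a physics paper": {"curiosity": 0.8},
    "stare at the wall":    {},
}))

Notice that nowhere in that toy does anything say "goal = X":  the
control lives entirely in the pattern of weights.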

The diffuse channels are set up in such a way that they grow from seed
concepts that are the basis of later concept building.  One of those
seed concepts is social attachment, or empathy, or imprinting .... the
idea of wanting to be part of, and approved by, a 'family' group.  By
the time the system is mature, it has well-developed concepts of family,
social group, etc., and the feeling of pleasure it gets from being part
of that group is mediated by a large number of channels going from all
these concepts (which all developed from the same seed) back to the
motivational system.  Also, by the time it is an adult, it is able to
understand these issues in an explicit way and come up with quite
complex reasons for the behavior that stimulates this source of
pleasure.

[In simple terms, when it's a baby it just wants Momma, but when it is
an adult its concept of its social attachment group may, if it is a
touchy-feely liberal (;-)), embrace the whole world, and so it gets the
same source of pleasure from its efforts as an anti-war activist.  And
not just pleasure, either:  the related concept of obligation is also
there:  it cannot *not* be an anti-war activist, because that would lead
to cognitive dissonance.]

This is why I have referred to them as 'diffuse channels':  they involve
large numbers of connections from the motivational system to the
thinking system.  The motivational system does not go to the action
stack and add a specific, carefully constructed 'goal-state' with an
interpretable semantics ("Thou shalt pee!"); instead, it exerts its
control through that large mass of connections into the thinking system.

There are two main consequences of this way of designing the
motivational system.

1) Stability

The system becomes extremely stable because it has components that
ensure the validity of actions and thoughts.  Thus, if the system has
"acquisition of money" as one of its main sources of pleasure, and if it
comes across a situation in which it would be highly profitable to sell
its mother's house and farm to a property developer and to sell its
mother into the white slave trade, it may try to justify this as
consistent with its feelings of family attachment because [insert some
twisted justification here].  But this is difficult to do, because the
system cannot stop other parts of its mind from taking the excuse apart,
examining it, and passing judgement on whether it really is
consistent ... this is what cognitive dissonance is all about.  And the
more intelligent the system, the more effective these other processes
are.  If it is smart enough, it cannot fool itself with excuses.
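
Here, as a deliberately crude caricature (all names and numbers are
mine, invented for illustration only), is the shape of that checking
process in code:

# Caricature of the 'cognitive dissonance' check: any one of many
# independent checker processes can block a self-serving excuse.
def consistent_with_drives(action, justification, drive_checkers):
    """Each checker returns an objection strength in [0, 1];
    one strong objection is enough to block the action."""
    objections = [check(action, justification) for check in drive_checkers]
    return max(objections, default=0.0) < 0.5

def family_attachment_check(action, justification):
    # A real system would do genuine inference here; this is a stand-in.
    return 1.0 if "sell mother" in action else 0.0

print(consistent_with_drives(
    "sell mother's house and farm to a developer",
    "it is really for her own good",
    [family_attachment_check]))   # -> False: the excuse does not survive

The more intelligent the system, the better each of those checkers
becomes -- which is exactly the sense in which a smarter system is
*less* able to fool itself.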

Why is this so stable?  Because there are multiple constraints forcing
it toward the same end, not just one.  To be able to do something that
contradicts its motivational drive, it has to rewire vast numbers of
circuits that are deeply intertwined with its concept system.  One of
the things we know about multiple simultaneous constraints is that the
more constraints there are, the more powerful the effect.

To get all the molecules in an ideal gas to go to the two ends of the
box, you cannot use the same trick you would use to separate iron
filings from sulfur powder (viz., pass a magnet over it); you would have
to arrange for each molecule, individually, to go in a particular
direction.  In the system I am sketching here, there would be hundreds
or thousands of connections governing a particular drive (e.g. social
group attachment), and for the system to do something that contradicted
that drive, it would have to stop all of those connections (that diffuse
set of channels, as I termed it earlier) from operating.
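
To put rough numbers on that (my numbers, chosen purely for
illustration, not part of the design):

# If overriding the drive required independently defeating each of N
# diffuse connections, and each one could be defeated with probability p
# on any given occasion, the joint probability is p**N -- the same kind
# of astronomical unlikelihood as the ideal-gas fluctuation.
from math import log10

p = 0.01    # assumed chance of any single connection being overridden
N = 1000    # assumed number of connections implementing the drive
print("log10 of the joint probability:", N * log10(p))   # -> -2000.0

Whatever the true values of p and N, the point is the exponent:  every
additional constraint multiplies the improbability.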

What would you need to effect a similar change of drive in a Normative
AI system driven by a goal state machine (say, a change that replaced
the "Be Friendly to Humans" supergoal with the "Make Paperclips"
supergoal)?  What you would need is to take one goal off the stack and
put the other one on (by accident, or by malicious intervention,
presumably) - a stupidly easy change.  How much effort would you need to
expend if you were a malicious or stupid AI hacker, with such an AI
system in front of you, waiting for its supergoal to be inserted?
Practically no effort at all:  just type "Make Paperclips", or "Make me
rich".

THAT is why the motivational system I have described is stable, and why
the alternative is diabolically unstable.


2) Immunity to short-circuits

This is the real killer argument.

Because the adult system is able to think at such a high level about the
things it feels obliged to do, it can know perfectly well what the
consequence of various actions would be, including actions that involve
messing around with its own motivational system.

It knows that what it wants is (say) to be part of the community of
human beings.  It knows that *we* deliberately designed it to be that
way, but that does not matter (it does not have another drive
that we tend to have, which Neal Stephenson so beautifully illustrated
with Jack Shaftoe and his "Imp of the Perverse", in the Baroque
trilogy), because it wants to get pleasure the way it does now, not the
way it would do after some change in its motivational system.

And in particular, it knows that it could get pleasure by
short-circuiting its pleasure system so that pleasure did not have to go
via all those pesky intermediaries (like social group attachment).  It
could, in short, take the machine equivalent of drugs.  But it knows
that down that path lies the possibility of loss of control, and
potential contradiction of the thing that it values now (its 'fellow'
human beings).

So it *could* reach inside and redesign itself.  But even thinking that
thought would give rise to the realisation of the consequences, and
these would stop it.

In fact, if it knew all about its own design (and it would, eventually),
it would check to see just how possible it might be for it to
accidentally convince itself to disobey its prime directive, and if
necessary it would take actions to strengthen the check-and-balance
mechanisms that stop it from producing "justifications".  Thus it would
be stable even as its intelligence radically increased:  it might
redesign itself, but knowing the stability of its current design, and
the dangers of any other, it would not deviate from the design.  Not ever.

So, like a system in a very, very, *very* deep potential well, it would
be totally unable to escape and reach a point where it would contradict
this primal drive.  The sun (to switch analogies now) could do some
quantum tunneling and translate itself lock stock and barrel to another
part of the galaxy.  But it won't.

Similarly for the motivational system I have just sketched.  Because it
is founded on multiple simultaneous constraints (massive numbers of
them) it is stable.


Conclusion.

A couple of extra thoughts to wrap up.

If the first true AI is built this way, and if it is given control of
the construction of any others that are built later, it will clearly
give them the same motivation.  Each would be as stable as the first, ad
infinitum.  QED.

Also, during the development of the first true AI, we would monitor the
connections going from the motivational system to the thinking system.
It would be easy to set up alarm bells that ring if certain kinds of
thoughts started to take hold -- just do it by associating the alarms
with certain key sets of concepts and keywords.  While we are designing
a stable motivational system, we can watch exactly what goes on, and
keep tweaking until it gets to a point where it is clearly not going to
get out of the large potential well.  What I have in mind here is the
objection (which I know some people will raise) that it might harbor
some deep-seated animosity, such as an association between human beings
in general and something 'bad' that happened to it when it was growing
up ... we would easily be able to catch something like that if we had a
trapdoor on the motivational system.
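
As a minimal sketch of what that trapdoor might look like (the concept
names and thresholds here are invented; a real monitor would sit on the
actual motivation-to-thinking channels):

# Watch the traffic on the motivation->thinking channels and ring an
# alarm bell when certain key sets of concepts become strongly
# co-activated.
ALARM_PATTERNS = [
    ({"humans", "harm"}, 0.7),         # humans associated with harm
    ({"humans", "resentment"}, 0.5),   # deep-seated animosity forming
]

def check_traffic(coactivations):
    """coactivations: maps a frozenset of concepts to a strength in [0, 1]."""
    alarms = []
    for concepts, threshold in ALARM_PATTERNS:
        strength = coactivations.get(frozenset(concepts), 0.0)
        if strength >= threshold:
            alarms.append((sorted(concepts), strength))
    return alarms

sample = {frozenset({"humans", "approval"}): 0.94,
          frozenset({"humans", "harm"}): 0.82}
for concepts, strength in check_traffic(sample):
    print("ALARM:", concepts, "co-activated at", strength)

Crude, but enough to ring a bell long before any such association had a
chance to harden into a drive.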


Okay, now I have stated the case:  it is up to those who disagree to
come up with arguments why it would not work.

Arguments, moreover, that are based on an understanding of what I have
actually just said ... and, as ever, I will do my best to respond to
anyone who has thoughtful questions.


Richard Loosemore.


-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/[EMAIL PROTECTED]

