Re: [agi] Re: [singularity] Motivational Systems that are stable

Richard Loosemore Mon, 30 Oct 2006 08:14:09 -0800


Ben,

I guess the issue I have with your critique is that you say that I have given no details, no rigorous argument, just handwaving, etc.

But you are being contradictory: on the one hand you say that the proposal is vague/underspecified/does not give any arguments .... but then having said that, you go on to make specific criticisms and say that it is wrong on this or that point.

I don't think you can have it both ways. Either you don't see an argument, and rest your case, or you do see an argument and want to critique it. You are trying to do both: you repeatedly make broad accusations about the quality of the proposal ("some verrrrry hand-wavy, intuitive suggestions", "you have not given any sort of rigorous argument", "... your intuitive suggestions...", "you did not give any details as to why you think your proposal will 'work'", etc. etc.), but then go on to make specific points about what is wrong with it.

Now, if the specific points you make were valid criticisms, I could perhaps overlook the inconsistency and just address the criticisms. But that is exactly what I just did, and your specific criticisms, as I explained in the last message, were mostly about issues that had nothing to do with the general class of architectures I proposed, but only with weird cases or weird issues that had no bearing on my case.

Since you just dropped most of those issues (except one, which I will address in a moment), I must assume that you accept that I have given a good reply to each of them. But instead of conceding that the argument I gave must therefore have some merit, you repeat -- even more insistently than before -- that there is nothing in the argument, that it is all just vague handwaving etc.


No fair!

This kind of response:

  -  Your argument is either too vague or I don't understand it.

Would be fine, and I would just try to clarify it in the future.

But this response:

  -  This is all just handwaving, with no details and no argument.
  -  It is also a wrong argument, for these reasons:
  -  [Reasons that are mostly just handwaving or irrelevant].

Is not so good.

*************************

I will say something about the specific point you make about my claim that as time goes on the system will check new ideas against previous ones to make sure that new ones are consistent with ALL the old ones, so therefore it will become more and more stable.

What you have raised is a minor technical issue, together with some confusion about what exactly I meant:

The "ideas" being checked against "all previous ideas" are *not* the incoming general learned concepts (cup, salt, cricket, democracy, sneezes..... etc.) but the concepts related to planned actions and the system's base of moral/ethical/motivational concerns. Broadly speaking, it is when there is a new "perhaps I should do this ..." idea that the comparison starts. I did actually say this, but it was a little obscurely worded.

Now, when I said "checked for consistency against all previous ideas" I was speaking rather loosely (my bad). Obviously I would not do this by an exhaustive comparison [please: I don't need to have it explained to me that this is O(n^2)! :-) ]. The mechanism would work something like a parallel terraced scan: issues are represented at different levels of granularity, and if any kind of inconsistency is detected at one of the high (low-granularity) levels, it provokes a focussing on the problem and an elaboration of everything involved in the idea, which then can bring in lots more consideration, potentially resulting in a complete comparison on that one issue. In addition, the system would use various other (monte-carlo-esque) techniques for taking random looks at the implications of some issue, to catch problems that the top-level scan might miss.

Specific example. The system thinks that maybe selling its mother into the white slave trade is a good way to make money. But this very idea causes simple associations with [white slave trade] to kick in (for example [misery], [brutality], [betrayal], and so on). These simple associations get connected with [mother] and in a moment the system finds that the concept [unhappy mother] sends a big fat negative signal back to the motivational system, waking up the module that is responsible for the [social group attachment] motivation. Pretty soon this kicks in a full-scale reexamination of the entire idea, and when examined in detail it is found to be inconsistent with the system's prime motivations.
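
To make the flow of that example concrete, here is a minimal Python sketch of a coarse-to-fine check with a random probe. It is only an illustration under stated assumptions: the tables ASSOCIATIONS, VALENCE and MOTIVATIONS, the function names, and the threshold are all hypothetical toy choices, not part of the proposed architecture.

```python
import random

# Hypothetical toy data: direct associations of each concept.
ASSOCIATIONS = {
    "white slave trade": ["misery", "brutality", "betrayal"],
    "bake sale": ["cake", "community"],
}
# Hypothetical valence that each association sends to the motivational system.
VALENCE = {"misery": -1.0, "brutality": -1.0, "betrayal": -1.0,
           "cake": 0.5, "community": 1.0}
# Hypothetical motivations and the associations that conflict with them.
MOTIVATIONS = {"social group attachment": {"betrayal", "misery"}}


def coarse_scan(concepts):
    """Cheap top-level pass: sum the valence of all direct associations."""
    return sum(VALENCE.get(a, 0.0)
               for c in concepts
               for a in ASSOCIATIONS.get(c, []))


def random_probe(concepts, rng, samples=2):
    """Monte-Carlo-style pass: look at a few associations chosen at random."""
    pool = [a for c in concepts for a in ASSOCIATIONS.get(c, [])]
    picked = rng.sample(pool, min(samples, len(pool)))
    return sum(VALENCE.get(a, 0.0) for a in picked)


def detailed_check(concepts):
    """Expensive pass, run only on escalation: list every violated motivation."""
    assoc = {a for c in concepts for a in ASSOCIATIONS.get(c, [])}
    return sorted(m for m, bad in MOTIVATIONS.items() if assoc & bad)


def evaluate(concepts, rng, threshold=-1.0):
    """Escalate to the detailed check only if a cheap pass raises a negative signal."""
    signal = min(coarse_scan(concepts), random_probe(concepts, rng))
    if signal <= threshold:
        violated = detailed_check(concepts)
        if violated:
            return ("rejected", violated)
    return ("accepted", [])


rng = random.Random(0)
print(evaluate(["white slave trade", "mother"], rng))
print(evaluate(["bake sale"], rng))
```

The point of the sketch is only that the expensive comparison runs on the one idea that tripped a cheap signal, rather than on every idea against every other.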

So although you made a reasonable point, this is not a technical difficulty that cannot be handled easily.

I note that you did anticipate this reply, when you said "Some heuristic shortcuts must be used to decrease the number of comparisons, and such heuristics introduce the possibility of error...", and then also "The kind of distributed system you are describing seems NOT to solve the computational problem of verifying the consistency of each new knowledge item with each other knowledge item."

But these two statements are actually very hard to defend. Heuristics that decrease the number of comparisons IN A CONVENTIONAL AI SYSTEM are unreliable, precisely because of the "fragile, mechanistic" nature of such AI designs (see my reply to Hank Conn) ... but the whole force of my argument is to do the job without such conventional AI techniques, so that one won't fly unless you can say why. As for the type of distributed system I propose being unable to solve this kind of problem, the very reverse is true: parallel terraced scans are among the very best methods known for dealing with this kind of problem! I couldn't have chosen a better architecture. Your statement is mystifying.


***********

What I feel I have done now is to address every one of the specific criticisms that you have put on the table to date.

I am certainly willing to accept that, beyond those specific points, you may have a gut feeling that it doesn't work, or that you prefer not to address it in more detail at this stage. I'd be happy to postpone further debate until I can get a more detailed version in print.

What I would find extremely unfair would be more accusations that it is "just vague handwaving" without specific questions designed to show that the argument falls apart under probing. I don't see the argument falling apart, so making that accusation again would be unjustified.



Richard Loosemore




Ben Goertzel wrote:

Hi,

There is something about the gist of your response that seemed strange
to me, but I think I have put my finger on it:  I am proposing a general
*class* of architectures for an AI-with-motivational-system.  I am not
saying that this is a specific instance (with all the details nailed
down) of that architecture, but an entire class..... an approach.

However, as I explain in detail below, most of your criticisms are that
there MIGHT be instances of that architecture that do not work.


No.   I don't see why there will be any instances of your architecture
that do "work" (in the sense of providing guaranteeable Friendliness
under conditions of radical, intelligence-increasing
self-modification).

And you have not given any sort of rigorous argument that such
instances will exist....

Just some verrrrry hand-wavy, intuitive suggestions, centering on the
notion that (to paraphrase) "because there are a lot of constraints, a
miracle happens"  ;-)

I don't find your intuitive suggestions foolish or anything, just
highly sketchy and unconvincing.

I would say the same about Eliezer's attempt to make a Friendly AI
architecture in his old, now-repudiated-by-him essay Creating a
Friendly AI.  A lot in CFAI seemed plausible to me, and the intuitive
arguments were more fully fleshed out than yours in your email
(naturally, because it was an article, not an email) ... but in the
end I felt unconvinced, and Eliezer eventually came to agree with me
(though not on the best approach to fixing the problems)...

 > In a radically self-improving AGI built according to your
 > architecture, the set of constraints would constantly be increasing in
 > number and complexity ... in a pattern based on stimuli from the
 > environment as well as internal stimuli ... and it seems to me you
 > have no way to guarantee based on the smaller **initial** set of
 > constraints, that the eventual larger set of constraints is going to
 > preserve "Friendliness" or any other criterion.

On the contrary, this is a system that grows by adding new ideas whose
motivational status must be consistent with ALL of the previous ones, and
the longer the system is allowed to develop, the deeper the new ideas
are constrained by the sum total of what has gone before.


This does not sound realistic.  Within realistic computational
constraints, I don't see how an AI system is going to verify that each
of its new ideas is consistent with all of its previous ideas.

This is a specific issue that has required attention within the
Novamente system.  In Novamente, each new idea is specifically NOT
required to be verified for consistency against all previous ideas
existing in the system, because this would make the process of
knowledge acquisition computationally intractable.  Rather, it is
checked for consistency against those other pieces of knowledge with
which it directly interacts.  If an inconsistency is noticed, in
real-time, during the course of thought, then it is resolved
(sometimes by a biased random decision, if there is not enough
evidence to choose between two inconsistent alternatives; or
sometimes, if the matter is important enough, by explicitly
maintaining two inconsistent perspectives in the system, with separate
labels, and an instruction to pay attention to resolving the
inconsistency as more evidence comes in.)

The kind of distributed system you are describing seems NOT to solve
the computational problem of verifying the consistency of each new
knowledge item with each other knowledge item.


Thus:  if the system has grown up and acquired a huge number of examples
and ideas about what constitutes good behavior according to its internal
system of values, then any new ideas about new values must, because of
the way the system is designed, prove themselves by being compared
against all of the old ones.


If each idea must be compared against all other ideas, then cognition
has order n^2 where n is the number of ideas.  This is not workable.
Some heuristic shortcuts must be used to decrease the number of
comparisons, and such heuristics introduce the possibility of error...
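
As a rough illustration of the scaling at issue, the following Python sketch counts comparisons when each new idea is checked against all earlier ones, versus against at most k directly interacting ones. The function names and the neighbour limit k are hypothetical, chosen only for this arithmetic.

```python
def exhaustive_comparisons(n):
    """Each of n ideas is compared with every earlier idea: n(n-1)/2 checks."""
    return n * (n - 1) // 2


def local_comparisons(n, k):
    """Each new idea is compared with at most k earlier ideas (assumed limit)."""
    return sum(min(i, k) for i in range(n))


n = 10_000
print(exhaustive_comparisons(n))   # grows quadratically with n
print(local_comparisons(n, k=10))  # grows linearly with n for fixed k
```

For n = 10,000 the exhaustive count is 49,995,000, while the neighbour-limited count with k = 10 is 99,945; the price of the cheaper scheme is that unchecked pairs may hide an inconsistency.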

And I said "ridiculously small chance" advisedly:  if 10,000 previous
constraints apply to each new motivational idea, and if 9,900 of them
say 'Hey, this is inconsistent with what I think is a good thing to do',
then it doesn't have a snowball's chance in hell of getting accepted.
THIS is the deep potential well I keep referring to.


The problem, as I said, is posing a set of constraints that is both
loose enough to allow innovative new behaviors, and tight enough to
prevent the wrong behaviors...

I maintain that we can, during early experimental work, understand the
structure of the motivational system well enough to get it up to a
threshold of acceptably friendly behavior, and that beyond that point
its stability will be self-reinforcing, for the above reasons.


Well, I hope so ;-)

I don't rule out the possibility, but I don't feel you've argued for
it convincingly, either...

Overall, the goal would be, not to give it an intrinsic definition of
"friendliness", but actually something closer to a "desire to discover
and empathize with the normative aspirations of the human species".  In
other words, make it want to be part of the community of humankind, the
way that you and I, as compassionate individual humans, want to be part
of that community.  Make it one of us, but without certain irascible
motivational mechanisms that evolution stuck us with.


This does sound reasonable to me.

BTW, "the normative aspirations of the human species" sounds a fair
bit like Eliezer's Coherent Extrapolated Volition, though ;-)  ... the
question with both your formulation and his is how do you really
define this, either formally or even pragmatically...  Human
aspirations are quite diverse on the individual and also cultural
level; is there really such a thing as our "species-wide aspirations"?

-- Ben

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/[EMAIL PROTECTED]

