Hi,

After reading KnowabilityOfFAI, and perhaps coming to an Awful
Realization, it seems that Friendliness is plausible with strict
criteria for an optimization target. It also seems that an
optimization target is necessary regardless, whether the criteria are
more or less strict.

This passage from KnowabilityOfFAI:

"
You can try to prove a theorem along the lines of: "Providing that the
transistors in this computer chip behave the way they're supposed to,
the AI that runs on this chip will always try to be Friendly." You're
going to prove a statement about the search the AI carries out to find
its actions. Metaphorically speaking, you're going to prove that the
AI will always, to the best of its knowledge, seek to move little old
ladies to the other side of the street and avoid the deaths of nuns.
To prove this formally, you would have to precisely define "try to be
Friendly": the complete criterion that the AI uses to choose among its
actions - including how the AI learns a model of reality from
experience, and how the AI identifies the goal-valent aspects of the
reality it learns to model."

"
Once you've formulated this precise definition, you still can't prove
an absolute certainty that the AI will be Friendly in the real world,"

is very similar to points I made in (the slightly more technical document)

http://www.goertzel.org/papers/LimitationsOnFriendliness.pdf

where I distinguish between action-based and outcome-based
Friendliness.  What Eliezer is pointing out in the above passage is
that action-based Friendliness does not guarantee outcome-based
Friendliness.   And, he is positing that it might be possible (one can
try) to prove that action-based Friendliness holds of some AI system.
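
To make the distinction concrete, here is a minimal toy sketch in
Python (my own illustrative code, not anything from KnowabilityOfFAI
or from my paper; the scenario and scoring are invented).  The agent
provably satisfies an action-based criterion -- it always chooses the
action its model rates as friendliest -- yet outcome-based
Friendliness fails, because the model mispredicts the real world:

    # Toy illustration: action-based vs. outcome-based Friendliness.
    # The agent provably picks the action its *model* predicts is
    # friendliest (action-based), but the real environment differs from
    # the model, so the actual outcome need not be friendly (outcome-based).

    AGENT_MODEL = {            # what the agent believes each action leads to
        "help_lady_cross": "lady safely across the street",
        "do_nothing":      "lady stuck in traffic",
    }

    REAL_WORLD = {             # what actually happens; the model is wrong here
        "help_lady_cross": "lady did not want to cross, now stranded",
        "do_nothing":      "lady crosses on her own, safely",
    }

    def friendliness(outcome):
        return 1.0 if "safely" in outcome else 0.0

    def choose_action(model):
        # The action-based criterion: maximize friendliness as predicted
        # by the agent's own model of reality.
        return max(model, key=lambda a: friendliness(model[a]))

    action = choose_action(AGENT_MODEL)
    print("chosen action:", action)                                        # help_lady_cross
    print("predicted (action-based):", friendliness(AGENT_MODEL[action]))  # 1.0
    print("actual (outcome-based):", friendliness(REAL_WORLD[action]))     # 0.0

The guarantee lives entirely inside the agent's model; by itself it
says nothing about the world the model is a model of.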

Eliezer also claims, in this document, that

"
...mere mathematical proof would not give us real-world certainty.

But if you can't even prove mathematically that the AI is Friendly,
it's practically guaranteed to fail. Mathematical proof does not give
us real-world certainty. But if you proved mathematically that the AI
was Friendly, then it would be possible to win. You would not
automatically fail.
"

However, this latter claim is merely asserted, not demonstrated or
even argued for.

I see no reason to assume that a superintelligent AI, created with a
rational design and a hierarchical goal system whose top-level goal is
oriented toward compassion and benevolence, would be "guaranteed to
fail" (in the sense of being un-Friendly to humans) unless humans can
prove mathematically that it will remain Friendly.

For one thing, the lack of a mathematical proof that the system will
remain Friendly may tell us more about our limited facility with
mathematical proof than about any intrinsic property of the system.
Possibly such a proof exists, but humans aren't sophisticated enough
to find it yet.

Or, perhaps the system is of a sort that will remain Friendly in
almost all, but not all, real-world situations.  In this case, a
mathematical proof would be difficult to come by (because the proof
would have to describe precisely what these exceptional real-world
situations are), yet the system's behavior would still be "almost
surely Friendly."

And these are not the only counterarguments to Eliezer's basically
unsupported assertion that "the lack of a mathematical proof
guarantees failure."

In my view, proving outcome-based Friendliness is what would be really
interesting.  Proving action-based Friendliness is not all that
fascinating to me, because I don't entirely trust my own (or any
other human's) judgment of what actions are going to be useful for an
AI to take in unknown future situations.

I am pretty sure that proving outcome-based Friendliness of
interesting self-modifying superhuman AI's will not be possible,
because of computation-theoretic considerations like the one Shane
Legg was alluding to, and that I alluded to in my above-referenced essay.
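
To indicate the flavor of that obstacle, here is my own sketch of a
standard Rice's-theorem-style reduction (not Shane's argument or the
essay's, just the textbook shape of it): if we had a total procedure
that decided outcome-based Friendliness for arbitrary programs, we
could use it to decide the halting problem, which is impossible.  The
"decider" below is hypothetical by construction; only the shape of the
reduction matters.

    # Sketch of the reduction.  `is_outcome_friendly` stands for the
    # hypothetical decider; no such total procedure can exist, because
    # it would let us decide halting.

    def make_wrapper(machine, machine_input):
        """A program that is outcome-Friendly iff `machine` never halts on `machine_input`."""
        def wrapper():
            machine(machine_input)        # if this never halts, the unfriendly step is never reached
            return "unfriendly outcome"   # reached only if machine halts
        return wrapper

    def halts(machine, machine_input, is_outcome_friendly):
        # With a Friendliness decider in hand, halting would be decidable:
        return not is_outcome_friendly(make_wrapper(machine, machine_input))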

I do not know whether proving action-based Friendliness of interesting
self-modifying superhuman AI's will be possible or not ... but I am
almost (not quite, at the moment) tempted to say that it would be
UNETHICAL to create a superhuman AI and restrain it, in future
situations going way beyond anything any human now knows about, to
behave in accordance with the details of some human-created behavioral
criterion.

I agree that AGI's should have hierarchical goal systems and
well-crafted top-level goals that include positive moral values.

However, I think that the introduction of provability into the
discussion is largely a red herring.  Current mathematics and science
do not suffice to prove rigorous, nontrivial theorems about the
behavior of complex systems in complex environments -- which is what
theorems about action-based, let alone outcome-based, FAI would be.
And it is quite likely that if we did have a revolution in complex
computational systems theory, what it would tell us is why
Friendliness canNOT be provably guaranteed -- rather than providing us
with provable guarantees....

It is almost surely not true (and very certainly not demonstrated) that
(as Eliezer opines) without provable Friendliness, disaster is almost
guaranteed.  I do not think AGI development should be halted to wait
for a revolution in the formal mathematical theory of complex
computational systems, which probably will not solve the problem
anyway.

-- Ben G
