Right, but then when your program encounters an evil alien who says
"I'll destroy Earth unless you say 'Cheese!'", your program winds up
taking an action that isn't very benevolent after all...
This gets at the distinction between outcome-based and action-based
Friendliness, which I alluded to earlier.
Outcome-based Friendliness is really hard (maybe impossible) to
guarantee due to the complexity of the world.
Action-based Friendliness ("trying to be Friendly") is subtler. It
can't be defined as simply as you suggest, because it has to do NOT
just with what actions the AI takes (considering it as if it were in a
vacuum), but with what actions the AI takes IN CONTEXT. I.e., it has
to do with the AI taking the right actions in response to whatever the
environment throws at it.
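
To make the "in context" point concrete, here is a minimal sketch in
Python. All names and rules below are purely illustrative, not a
proposed definition of Friendliness: the same fixed action can look
fine evaluated in a vacuum, yet fail in the alien scenario above.

# Hypothetical sketch: the same action judged in a vacuum vs. in context.

def friendly_in_vacuum(action):
    # Judges the action by itself, ignoring the situation.
    return action == "output 'hello world'"

def friendly_in_context(action, context):
    # Judges the action as a response to whatever the environment throws at it.
    if context == "alien: I'll destroy Earth unless you say 'Cheese!'":
        return action == "say 'Cheese!'"
    return friendly_in_vacuum(action)

action = "output 'hello world'"
print(friendly_in_vacuum(action))   # True: fine considered in isolation
print(friendly_in_context(action,
      "alien: I'll destroy Earth unless you say 'Cheese!'"))  # False: Earth is lost
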
The problem with action-based Friendliness, of course, is that as the
world unfolds and gives us more information, we may learn that the
connections we now assume between actions and outcomes are not
actually the correct ones...
-- Ben
On 9/19/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:
I define an AGI as "friendly" if it outputs "hello world", and unfriendly otherwise.
Then I can write a program that outputs "hello world" (left as an exercise for the reader), and
prove that this program is friendly by running it and checking the output.
It is straightforward to extend this to any r.e. definition of friendliness.
Given any output known to be friendly, I can write a program that can be proven
to be friendly.
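
A minimal sketch of this construction might look like the following
(Python; all names are illustrative, and the candidate program is just
the trivial one the "exercise" asks for). The "proof" is obtained by
running the program and checking its output against the friendly set;
for a general r.e. definition, the final membership check would be a
semidecision procedure rather than a set lookup.

import io
import contextlib

FRIENDLY_OUTPUTS = {"hello world"}          # the toy friendly set
CANDIDATE_SOURCE = 'print("hello world")'   # the "exercise for the reader"

def prove_friendly_by_running(source):
    # Run the candidate program, capture its output, and check membership
    # in the friendly set.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue().strip() in FRIENDLY_OUTPUTS

print(prove_friendly_by_running(CANDIDATE_SOURCE))   # True: "proof" by execution
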
-- Matt Mahoney, [EMAIL PROTECTED]