Hi,
After reading KnowabilityOfFAI, and perhaps coming to an Awful Realization, it seems that Friendliness is plausible given strict criteria for an optimization target. It also seems that an optimization target is necessary regardless, with more or less strict criteria.
This passage from KnowabilityOfFAI:

"You can try to prove a theorem along the lines of: 'Providing that the transistors in this computer chip behave the way they're supposed to, the AI that runs on this chip will always try to be Friendly.' You're going to prove a statement about the search the AI carries out to find its actions. Metaphorically speaking, you're going to prove that the AI will always, to the best of its knowledge, seek to move little old ladies to the other side of the street and avoid the deaths of nuns. To prove this formally, you would have to precisely define 'try to be Friendly': the complete criterion that the AI uses to choose among its actions - including how the AI learns a model of reality from experience, and how the AI identifies the goal-valent aspects of the reality it learns to model. ... Once you've formulated this precise definition, you still can't prove an absolute certainty that the AI will be Friendly in the real world."

is very similar to points I made in the (slightly more technical) document

http://www.goertzel.org/papers/LimitationsOnFriendliness.pdf

where I distinguish between action-based and outcome-based Friendliness.

What Eliezer is pointing out in the above passage is that action-based Friendliness does not guarantee outcome-based Friendliness. And he is positing that it might be possible (one can try) to prove that action-based Friendliness holds of some AI system.

Eliezer also claims, in this document, that

"...mere mathematical proof would not give us real-world certainty. But if you can't even prove mathematically that the AI is Friendly, it's practically guaranteed to fail. Mathematical proof does not give us real-world certainty. But if you proved mathematically that the AI was Friendly, then it would be possible to win. You would not automatically fail."

However, this latter claim is merely asserted, not demonstrated nor even argued for. I see no reason to assume that a superintelligent AI, created with a rational design and a hierarchical goal system whose top-level goal is oriented toward compassion and benevolence, would be "guaranteed to fail" (in the sense of being un-Friendly to humans) unless humans can prove mathematically that it will remain Friendly.

For one thing, the lack of a mathematical proof that the system will remain Friendly may tell us more about our limited ability at mathematical proof than about any intrinsic property of the system. Possibly such a proof exists, but humans aren't sophisticated enough to find it yet. Or perhaps the system is of a sort that will remain Friendly in almost but not all real-world situations. In this case, a mathematical proof would be difficult to come by (because the proof would have to describe precisely what these exceptional real-world situations are), yet the system's behavior would still be "almost surely Friendly." And these are not the only counterarguments to Eliezer's basically unsupported assertion that "the lack of a mathematical proof guarantees failure."

In my view, proving outcome-based Friendliness is what would be really interesting. Proving action-based Friendliness is not all that fascinating to me, because I don't fully trust my own (or any other human's) judgment of what actions are going to be useful for an AI to take in unknown future situations.
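To make the action/outcome distinction concrete, here is a rough Python-style sketch. The names and interfaces (agent.choose, environment.run, and so on) are purely illustrative inventions of mine, not anything defined in Eliezer's document or in my essay:

    # Illustrative only: "action-based" Friendliness is a property of the
    # agent's decision procedure, judged against the agent's own world-model;
    # "outcome-based" Friendliness is a property of what actually happens
    # when that procedure runs in the real environment.

    def action_friendly(agent, criterion):
        """Every action the agent selects satisfies the human-specified
        criterion, relative to the agent's own model of the situation."""
        return all(criterion(agent.choose(s), agent.model(s))
                   for s in agent.representable_situations())

    def outcome_friendly(agent, environment, good_outcome):
        """Running the agent in the actual environment always yields
        outcomes that humans would endorse."""
        return all(good_outcome(environment.run(agent, s))
                   for s in environment.possible_situations())

    # The gap: the first predicate quantifies over the agent's model and its
    # chosen actions; the second quantifies over the real environment. If
    # agent.model(s) diverges from how the environment actually behaves, the
    # first can hold while the second fails.

Proving the first may be a well-posed formal problem about the agent's search procedure; proving the second requires a formal handle on the environment itself, which is exactly where the trouble starts.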
I am pretty sure that proving outcome-based Friendliness of interesting self-modifying superhuman AIs will not be possible, because of computation-theoretic considerations like the one Shane Legg was alluding to, and I alluded to in my above-referenced essay (a toy sketch of the kind of obstacle I have in mind is in the P.S. below). I do not know whether proving action-based Friendliness of interesting self-modifying superhuman AIs will be possible or not ... but I am almost (not quite, at the moment) tempted to say that it would be UNETHICAL to create a superhuman AI and restrain it, in future situations going way beyond anything any human now knows about, to behave in accordance with the details of some human-created behavioral criterion.

I agree that AGIs should have hierarchical goal systems and well-crafted top-level goals that include positive moral values. However, I think that the introduction of provability into the discussion is largely a red herring. Current mathematics and science do not suffice to prove rigorous, nontrivial theorems about the behavior of complex systems in complex environments -- which is what theorems about action-based, let alone outcome-based, FAI would be. And it is quite likely that if we did have a revolution in complex computational systems theory, what it would tell us is why Friendliness canNOT be provably guaranteed -- rather than providing us with provable guarantees....

It is almost surely not true (and very certainly not demonstrated) that, as Eliezer opines, without provable Friendliness disaster is almost guaranteed. I do not think AGI development should be halted to wait for a revolution in the formal mathematical theory of complex computational systems, which probably will not solve the problem anyway.

-- Ben G
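P.S. For anyone who wants the computation-theoretic point made a little more concrete, here is a toy Rice's-theorem-style sketch. It is my own illustrative construction (the names is_friendly and friendly_stub are hypothetical), not Shane's argument or anything from Eliezer's document:

    # Toy sketch: suppose is_friendly were a total procedure that decided,
    # for an arbitrary program, whether its behavior satisfies some
    # nontrivial outcome property. Assume friendly_stub is a program known
    # to have the property, and that a program which never acts at all does
    # not have it. Then we could decide the halting problem, which is
    # impossible; so no such total decider can exist.

    def reduction(is_friendly, friendly_stub, program, input_):
        def hybrid(x):
            program(input_)          # loops forever if program(input_) never halts
            return friendly_stub(x)  # otherwise behaves like the known-good program
        # hybrid behaves like friendly_stub iff program(input_) halts, so
        # is_friendly(hybrid) would answer whether program(input_) halts.
        return is_friendly(hybrid)

Of course a real argument has to be more careful than this (the property must be a property of the program's behavior, not of its text), but this is the flavor of obstacle I mean.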
