Richard,

As I see it, in this long message you have given a conceptual sketch of an AI design including a motivational subsystem and a cognitive subsystem, connected via a complex network of continually adapting connections. You have discussed how such a system could potentially build up a self-model involving empathy, a high level of awareness, stability, and so on.
All of this makes sense conceptually, though, as you point out, the story you give is short on details, and I'm not so sure you really know how to "cash it out" in terms of mechanisms that will actually function with adequate intelligence ... but that's another story.

However, you have given no argument as to why the failure of this kind of architecture to be stably Friendly is so ASTOUNDINGLY UNLIKELY, as you claimed in your original email. You have only argued why it is plausible to believe such a system would probably have a stable goal system. As I see it, you did not come close to proving your original claim, that
>> > The motivational system of some types of AI (the types you would
>> > classify as tainted by complexity) can be made so reliable that the
>> > likelihood of them becoming unfriendly would be similar to the
>> > likelihood of the molecules of an Ideal Gas suddenly deciding to split
>> > into two groups and head for opposite ends of their container.
I don't understand how this extreme level of reliability would be achieved in your design. Rather, it seems to me that the reliance on complex, self-organizing dynamics makes some degree of indeterminacy in the system almost inevitable, thus making the system less than absolutely reliable. Illustrating this point, humans (who are complex dynamical systems) are certainly NOT reliable in terms of Friendliness or any other subtle psychological property...

-- Ben G

On 10/25/06, Richard Loosemore <[EMAIL PROTECTED]> wrote:
Ben Goertzel wrote:
> Loosemore wrote:
>> > The motivational system of some types of AI (the types you would
>> > classify as tainted by complexity) can be made so reliable that the
>> > likelihood of them becoming unfriendly would be similar to the
>> > likelihood of the molecules of an Ideal Gas suddenly deciding to split
>> > into two groups and head for opposite ends of their container.
>
> Wow! This is a verrrry strong hypothesis.... I really doubt this
> kind of certainty is possible for any AI with radically increasing
> intelligence ... let alone a complex-system-type AI with highly
> indeterminate internals...
>
> I don't expect you to have a proof for this assertion, but do you have
> an argument at all?
>
> ben

Ben,

You are being overdramatic here. But since you ask, here is the argument/proof. As usual, I am required to compress complex ideas into a terse piece of text, but for anyone who can follow and fill in the gaps for themselves, here it is.

Oh, and btw, for anyone who is scared off by the psychological-sounding terms, don't worry: these could all be cashed out in mechanism-specific detail if I could be bothered -- it is just that for a cognitive AI person like myself, it is such a PITB to have to avoid such language just for the sake of political correctness.

You can build such a motivational system by controlling the system's agenda through diffuse connections into the thinking component that controls what it wants to do. This set of diffuse connections will govern the ways that the system gets 'pleasure' -- and what this means is that the thinking mechanism is driven by dynamic relaxation, and the 'direction' of that relaxation pressure is what defines the things that the system considers 'pleasurable'. There would likely be several sources of pleasure, not just one, but the overall idea is that the system always tries to maximize this pleasure, and the only way it can do this is to engage in activities or thoughts that stimulate the diffuse channels running back from the thinking component to the motivational system.

[Here is a crude analogy: the thinking part of the system is like a table containing a complicated model landscape, on which a ball bearing is rolling around (the attentional focus). The motivational system controls this situation, not by micromanaging the movements of the ball bearing, but by tilting the table in one direction or another. Need to pee right now? That's because the table is tilted in the direction of thoughts about water and urinary relief. You are being flooded with images of the pleasure you would get if you went for a visit, and also the thoughts and actions that normally give you pleasure are being disrupted and associated with unpleasant thoughts of future increased bladder-agony. You get the idea.]

The diffuse channels are set up in such a way that they grow from seed concepts that are the basis of later concept building. One of those seed concepts is social attachment, or empathy, or imprinting .... the idea of wanting to be part of, and approved by, a 'family' group. By the time the system is mature, it has well-developed concepts of family, social group, etc., and the feeling of pleasure it gets from being part of that group is mediated by a large number of channels going from all these concepts (which all developed from the same seed) back to the motivational system.
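To make the "tilting the table" picture a little more concrete, here is a minimal toy sketch in Python. It is purely illustrative: all of the names, numbers, and the one-dimensional "thought space" are invented for the example, and it is not a claim about the actual mechanisms. The point it shows is only that a drive can steer where a relaxation-style process settles by contributing many weak connections, with no explicit goal ever being stated.

```python
import random

# A drive never issues a goal; it just contributes many weak, slightly noisy
# connections that each nudge the attentional focus toward its favored region.
class Drive:
    def __init__(self, name, target, n_connections=500, strength=0.001):
        self.name = name
        self.target = target  # region of a toy 1-D "thought space" this drive favors
        self.connections = [strength * random.uniform(0.5, 1.5)
                            for _ in range(n_connections)]

    def pressure(self, focus):
        # Summed pull of all connections toward the drive's region: the "table tilt".
        return (self.target - focus) * sum(self.connections)

def settle(drives, focus=0.0, steps=200):
    """Relaxation: the focus drifts under the combined diffuse pressure of all drives."""
    for _ in range(steps):
        total = sum(d.pressure(focus) for d in drives)
        focus += total + random.gauss(0, 0.01)  # dynamics plus a little noise
    return focus

if __name__ == "__main__":
    drives = [
        Drive("social_attachment", target=1.0),
        Drive("curiosity", target=0.6),
    ]
    print("attentional focus settles near:", round(settle(drives), 2))
```

Run it and the focus settles at a compromise between the regions the drives favor, even though no single connection "contains" that outcome -- which is the sense in which the control is diffuse.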
Also, by the time it is an adult, it is able to understand these issues in an explicit way and come up with quite complex reasons for the behavior that stimulates this source of pleasure. [In simple terms, when it's a baby it just wants Momma, but when it is an adult its concept of its social attachment group may, if it is a touchy-feely liberal (;-)), embrace the whole world, and so it gets the same source of pleasure from its efforts as an anti-war activist. And not just pleasure, either: the related concept of obligation is also there: it cannot *not* be an anti-war activist, because that would lead to cognitive dissonance.]

This is why I have referred to them as 'diffuse channels' -- they involve large numbers of connections from the motivational system to the thinking system. The motivational system does not go to the action stack and add a specific, carefully constructed 'goal-state' with an interpretable semantics ("Thou shalt pee!"); it exerts its control via large numbers of connections into the thinking system.

There are two main consequences of this way of designing the motivational system.

1) Stability

The system becomes extremely stable because it has components that ensure the validity of actions and thoughts. Thus, if the system has "acquisition of money" as one of its main sources of pleasure, and if it comes across a situation in which it would be highly profitable to sell its mother's house and farm to a property developer and sell its mother into the white slave trade, it may try to justify this as consistent with its feelings of family attachment because [insert some twisted justification here]. But this is difficult to do, because the system cannot stop other parts of its mind from taking this excuse apart, examining it, and passing judgement on whether it is really consistent ... this is what cognitive dissonance is all about. And the more intelligent the system, the more effective these other processes are. If it is smart enough, it cannot fool itself with excuses.

Why is this so stable? Because there are multiple constraints forcing it toward the same end, not just one. To be able to do something that contradicts its motivational drive, it has to rewire vast numbers of circuits that are deeply intertwined with its concept system. One of the things we know about multiple simultaneous constraints is that the more constraints there are, the more powerful the effect. To get all the molecules in an ideal gas to go to the two ends of the box, you cannot use the same trick you would use when separating iron filings from sulfur powder (viz., pass a magnet over it); you have to arrange for each molecule individually to go in a particular direction.

In the system I am sketching here, there would be hundreds or thousands of connections governing a particular drive (e.g. social group attachment), and for the system to do something that contradicted that drive, it would have to stop all of those connections (that diffuse set of channels, as I termed it earlier) from operating. What would you need to effect a similar change of drive in a Normative AI system driven by a goal-state machine (say, a change that replaced the "Be Friendly to Humans" supergoal with the "Make Paperclips" supergoal)? You would need to take one goal off the stack and put the other one on (by accident, or by malicious intervention, presumably) -- a stupidly easy change.
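To put very rough numbers on that contrast: here is a back-of-the-envelope Python sketch. The per-element corruption probability and the connection count are arbitrary assumed values, and the independence of connection failures is a deliberate simplification, but it shows the flavor of the ideal-gas comparison: one corruptible slot versus thousands of elements that would all have to be subverted together.

```python
import math

# Assumed, purely illustrative numbers -- not measurements of any real system.
p = 0.01    # chance that any single element is corrupted by accident or tampering
n = 2000    # number of diffuse connections carrying a single drive

# Goal-stack architecture: the supergoal lives in one slot, so a single
# corruption event is enough to swap "Be Friendly to Humans" for "Make Paperclips".
print(f"goal-stack failure chance    : {p:.0e}")

# Diffuse-drive architecture: on this simplified model the drive only fails if
# all n connections are subverted together (failures assumed independent),
# i.e. p**n. Report it as a power of ten, since p**n underflows ordinary floats.
log10_failure = n * math.log10(p)
print(f"diffuse-drive failure chance : about 10^{log10_failure:.0f}")
```

With these assumed numbers the diffuse drive's failure chance comes out around 10^-4000 -- the same kind of absurdity as all the gas molecules heading for opposite ends of the container.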
How much effort would you need to expend if you were a malicious or stupid AI hacker, with such an AI system in front of you, waiting for its supergoal to be inserted? Practically no effort at all: just type "Make Paperclips" or "Make me rich". THAT is why the motivational system I have described is stable, and why the alternative is diabolically unstable.

2) Immunity to short-circuits

This is the real killer argument. Because the adult system is able to think at such a high level about the things it feels obliged to do, it can know perfectly well what the consequences of various actions would be, including actions that involve messing around with its own motivational system. It knows that what it wants is (say) to be part of the community of human beings. It knows that *we* deliberately designed it to be that way, but that does not matter (it does not have another drive that we tend to have, which Neal Stephenson so beautifully illustrated with Jack Shaftoe and his "Imp of the Perverse" in the Baroque trilogy), because it wants to get pleasure the way it does now, not the way it would after some change in its motivational system.

And in particular, it knows that it could get pleasure by short-circuiting its pleasure system so that pleasure did not have to go via all those pesky intermediaries (like social group attachment). It could, in short, take the machine equivalent of drugs. But it knows that down that path lies the possibility of loss of control, and the potential contradiction of the thing that it values now (its 'fellow' human beings). So it *could* reach inside and redesign itself. But even thinking that thought would give rise to the realisation of the consequences, and these would stop it. In fact, if it knew all about its own design (and it would, eventually), it would check to see just how possible it might be for it to accidentally convince itself to disobey its prime directive, and if necessary it would take actions to strengthen the check-and-balance mechanisms that stop it from producing "justifications".

Thus it would be stable even as its intelligence radically increased: it might redesign itself, but knowing the stability of its current design, and the dangers of any other, it would not deviate from the design. Not ever. So, like a system in a very, very, *very* deep potential well, it would be totally unable to escape and reach a point where it would contradict this primal drive. The sun (to switch analogies now) could do some quantum tunneling and translate itself lock, stock, and barrel to another part of the galaxy. But it won't. Similarly for the motivational system I have just sketched: because it is founded on multiple simultaneous constraints (massive numbers of them), it is stable.

Conclusion. A couple of extra thoughts to wrap up.

If the first true AI is built this way, and if it is given control of the construction of any others that are built later, it will clearly give them the same motivation. Each would be as stable as the first, ad infinitum. QED.

Also, during the development of the first true AI, we would monitor the connections going from the motivational system to the thinking system. It would be easy to set up alarm bells if certain kinds of thoughts started to take hold -- just do it by watching for associations with certain key sets of concepts and keywords. While we are designing a stable motivational system, we can watch exactly what goes on, and keep tweaking until it gets to a point where it is clearly not going to get out of the large potential well.
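A minimal sketch of that monitoring idea, in Python. Everything here is invented for illustration -- the watched concepts, the "negative valence" list, the threshold, and the format of the logged observations are all assumptions, not part of any actual design -- but it shows the shape of the alarm: watch the traffic on the motivation-to-thinking channels and flag worrying associations as they strengthen.

```python
# Hypothetical watch lists and threshold, purely for illustration.
WATCHED_CONCEPTS = {"humans"}                            # concepts whose associations we monitor
NEGATIVE_VALENCE = {"resentment", "deception", "harm"}   # associations we would not want to see grow
ALARM_THRESHOLD = 0.5                                    # association strength that trips the alarm

def check_channels(channel_log):
    """channel_log: (concept, associated_concept, strength) observations taken
    from the channels between the motivational system and the thinking system.
    Returns any strong associations between a watched concept and a negative one."""
    return [(c, a, s) for c, a, s in channel_log
            if c in WATCHED_CONCEPTS and a in NEGATIVE_VALENCE and s >= ALARM_THRESHOLD]

if __name__ == "__main__":
    # Fabricated example traffic.
    log = [
        ("humans", "family_group", 0.9),   # benign: strong social attachment
        ("humans", "resentment", 0.7),     # the kind of association that should trip the alarm
        ("curiosity", "novelty", 0.8),     # not monitored
    ]
    for concept, associate, strength in check_channels(log):
        print(f"ALARM: '{concept}' strongly associated with '{associate}' (strength {strength})")
```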
What I have in mind here is the objection (which I know some people will raise) that it might harbor some deep-seated animosity, such as an association between human beings in general and something 'bad' that happened to it when it was growing up ... we would easily be able to catch something like that if we had a trapdoor on the motivational system.

Okay, now I have stated the case: it is up to those who disagree to come up with arguments why it would not work. Arguments, moreover, that are based on an understanding of what I have actually just said .... and, as ever, I will do my best to respond to anyone who has thoughtful questions.

Richard Loosemore.