Elliotte Rusty Harold wrote:

> For past protocols like HTTP and URLs, we can plead ignorance and
> lack of imagination. We never knew how bad things were going to get.
> Now we do. We no longer have any excuses for knowingly designing
> systems that are open to spoofing, denial of service, or outright
> system cracking. Mistakes will of course continue to be made, but we
> have to try to make as few as possible and fix the problems where we
> can as soon as we can. There are legacy problems in HTTP, DNS, URLs,
> and many other systems; but when we're designing something truly new
> like internationalized domain names it only makes sense to avoid
> these known problems.
And I'm with you all the way to this point. Where we part company, I think, is at the implied "and so..."

If the basic requirements are that we find a way (for IDN) to present meaningful strings to end users (note, not any natural language phrase, but just a suitably contained, meaningful subset thereof that users can live with) and then find a foolproof way to map that to IP numbers, *and* that those meaningful strings be truly internationalized and not just the current restricted subset of ASCII, then we have a problem.

Either you have to more or less completely ignore the structure and integrity of writing systems, and try to constrain the problem down to a totally etic, psychological, perception-based notion of no visual confusion allowed in visible symbols to be represented in strings, anywhere, anytime. Or you have to admit that internationalizing the strings even just the teensiest bit (e.g. allowing Cyrillic in the door along with ASCII, or for that matter just allowing in accented Latin letters along with ASCII) is going to increase the confusability level of the visible symbols used in strings.

The reductio ad absurdum of the first position is that allowing even a single additional character in domain names, no matter how distinct or innocuous, incrementally increases the opportunity for confusion, spoofing, or other monkey business over the current situation. So if we "no longer have any excuses" to do anything that might knowingly increase the opportunity for security holes, then logically we should just shut down the whole IDN effort and proclaim to the world, "Let them eat ASCII!"

Heck, it doesn't even have to be close to visual confusability to cause a problem. What if IDN allowed just two Han characters in, and nothing else, and those two were the characters for nihon (Japanese for "Japan")?
Then somebody could register Microsoft<nihon>.com, and never mind the naive user: even the knowledgeable, biliterate English/Japanese user could be spoofed into thinking that was Microsoft's Japan division, instead of Trojans 'R Us.

I think that rather than coming to the Unicode list to proclaim "Unicode is a security risk! The sky is falling!", the better way to conceive of this is that globalization of the world's IT infrastructure is a difficult business, one that presents many new possibilities for security risks if the internationalization of existing protocols, and the handling of textual data from around the world, is not done carefully.

If the customers of the Internet are demanding that it be internationalized better than it currently is (and I believe they are), and if part of that internationalization is responding to demands that Japan be able to have Japanese domain names, China have Chinese domain names, etc. (as I believe it is), then we just have to come to grips with the complexity of text handling that that implies.

And in turn that means that, just as years ago system programmers learned to their chagrin that their systems broke because they had been doing casemapping with c -= 0x20 assignments, so Internet protocol developers are going to have to learn that their security is broken if it depends on the structure and constraints of ASCII, or on the use of small glyph sets where all the glyphs are visually distinct from each other.

--Ken
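For readers who want to see the two ASCII assumptions concretely, here is a minimal Python sketch (an illustration added by the editor, not anything from the original exchange). The helpers `script_of`, `scripts_in`, and `naive_upper` are hypothetical. In particular, `script_of` is a crude heuristic that reads the first word of a character's Unicode name. It is not the Unicode Script property, and a real check would rely on the Unicode security data instead. The first part shows that a Cyrillic "о" (U+043E) can stand in for a Latin "o" in a label that may render identically. The second part shows the legacy `c -= 0x20` casemapping idiom failing on non-ASCII letters.

```python
import unicodedata

# 1. Visually confusable labels are different code-point sequences.

def script_of(ch: str) -> str:
    # Hypothetical, crude heuristic: the first word of the character's
    # Unicode name ("LATIN", "CYRILLIC", ...). NOT the Unicode Script property.
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def scripts_in(label: str) -> set:
    # Collect the (heuristic) scripts of the letters in a label.
    return {script_of(ch) for ch in label if ch.isalpha()}

latin = "microsoft"
mixed = "micr\u043es\u043eft"        # U+043E CYRILLIC SMALL LETTER O replaces "o"
all_cyrillic = "\u0441\u0435\u0440"  # Cyrillic es, ie, er: can render like Latin "cep"

# 2. The legacy "c -= 0x20" casemapping idiom.

def naive_upper(s: str) -> str:
    # Subtract 0x20 only from ASCII a-z; every other character passes through unchanged.
    return "".join(chr(ord(c) - 0x20) if "a" <= c <= "z" else c for c in s)

if __name__ == "__main__":
    print(latin == mixed)                    # False, though both may look the same
    print(sorted(scripts_in(mixed)))         # ['CYRILLIC', 'LATIN']: a mixed-script check flags it
    print(sorted(scripts_in(all_cyrillic)))  # ['CYRILLIC']: single-script, so it passes
    print(naive_upper("caf\u00e9"), "caf\u00e9".upper())      # CAFé CAFÉ
    print(naive_upper("stra\u00dfe"), "stra\u00dfe".upper())  # STRAßE STRASSE
```

The all-Cyrillic string makes the point in miniature. Forbidding script mixing narrows the problem, but once a second script is admitted at all, a single-script label can still be confusable with an ASCII one. The casemapping half shows how code that was correct for ASCII quietly does the wrong thing for "é" and "ß".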