* Elliotte Rusty Harold
|
| Let's say I register microsoft.com, only the fifth letter isn't a
| lower-case Latin o. It's actually a lower case Greek omicron.

I'll grant you that this is possible, perhaps even likely, and that it may cause problems, but I'm far from convinced that this in any way supports the "there are security problems in Unicode" thesis. There are many characters which look alike and yet are different, and which can cause problems of this kind. There are, for example, already viruses which exploit the visual similarity between 1 and l in the Windows system font to keep themselves from being discovered in file listings.

So if this really is considered a problem, it would seem to me that you would need to deal with the problem of [EMAIL PROTECTED], [EMAIL PROTECTED], and [EMAIL PROTECTED] looking very similar to [EMAIL PROTECTED] in lots of fonts. To exploit this, all you need to know is what email client someone uses, and usually every email they write will have that information in its headers. It seems to me that this problem really needs some other fix than the merging of all similar-looking characters in all character sets. I just can't see that working.

Similarly, the "security problems" caused by using Unicode encoding tricks to hide or mangle text in, say, contracts are no different from those caused by using HTML or CSS (or whatever) tricks to achieve the same effect, and yet nobody is talking about security problems with HTML or CSS. See [1] for one way of dealing with it that is now being worked on.

So while I accept that there is a problem, it does not seem to me that Unicode is the problem. To me the problem seems to be the complexity of the relationship between the bytes sent to the user and what the user actually sees and reacts to. That complexity is not going to disappear, and aspects of the same "problem" exist with just about any information representation, so clearly the solution must be something other than changing all of these syntaxes/formats/encodings.
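
To make the gap between the bytes and what the user sees a bit more concrete, here is a minimal Python sketch. The names genuine, spoofed and describe_differences are made up for illustration, and the spoofed string is just my reconstruction of the domain described in the quote. The two strings may well look the same in many fonts, but they are different code point sequences, so an exact comparison tells them apart.

```python
import unicodedata

genuine = "microsoft.com"
# Hypothetical spoof: the fifth letter is U+03BF GREEK SMALL LETTER OMICRON.
spoofed = "micr\u03bfsoft.com"


def describe_differences(a, b):
    """Return (index, name_in_a, name_in_b) for each position where a and b differ.

    Assumes both strings have the same length, as in this example.
    """
    return [
        (i, unicodedata.name(x), unicodedata.name(y))
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


# The strings compare unequal even if a font draws them identically.
print(genuine == spoofed)
# Shows which position differs and what each character really is.
print(describe_differences(genuine, spoofed))
# The encoded bytes differ too: the omicron takes two bytes in UTF-8.
print(genuine.encode("utf-8"))
print(spoofed.encode("utf-8"))
```

The point of the sketch is only that software comparing the strings is not fooled; it is the human reading the rendered text who is.
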

In the specific case you cite, for example, a better solution might be for the user's email client to keep track of all the user's contacts and for it to indicate in some clearly visible way whether the current email comes from one of them or not. Whether it uses string matching of email addresses or digital signatures to do that doesn't really matter; it solves the problem in your example either way.

[1] <URL: http://www.w3.org/TR/xmldsig-core/#sec-Seen >

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >