> On Jun 21, 2016, at 8:47 AM, John McCall via swift-evolution
> <[email protected]> wrote:
>
>> On Jun 20, 2016, at 7:07 PM, Xiaodi Wu <[email protected]> wrote:
>> On Mon, Jun 20, 2016 at 8:58 PM, John McCall via swift-evolution
>> <[email protected]> wrote:
>>> On Jun 20, 2016, at 5:22 PM, Jordan Rose via swift-evolution
>>> <[email protected]> wrote:
>>> IIRC, some languages require zero-width joiners (though not zero-width
>>> spaces, which are distinct) to properly encode some of their characters.
>>> I'd be very leery of having Swift land on a model where identifiers can be
>>> used with some languages and not others; that smacks of ethnocentrism.
>>
>> None of those languages require zero-width characters between two Latin
>> letters, or between a Latin letter and an Arabic numeral, or at the end of a
>> word. Since standard / system APIs will (barring some radical shift) use
>> those code points exclusively, it's justifiable to give them some special
>> attention.
>>
>> Although the practical implementation may need to be more limited in scope,
>> the general principle doesn't need to privilege Latin letters and Arabic
>> numerals. If, in any context, the presence or absence of a zero-width glyph
>> cannot possibly be distinguished by a human reading the text, then the
>> compiler should also be indifferent to its presence or absence (or,
>> alternatively, its presence should be a compile-time error).
>
> Sure, that's obvious. Jordan was observing that the simplest way to enforce
> that, banning such characters from identifiers completely, would still
> interfere with some languages, and I was pointing out that just doing enough
> to protect English would get most of the practical value because it would
> protect every use of the system and standard library. A program would then
> only become attackable in this specific way for its own identifiers using
> non-Latin characters.
>
> All that said, I'm not convinced that this is worthwhile; the
> identifier-similarity problem in Unicode is much broader than just invisible
> characters. In fact, Swift still doesn't canonicalize identifiers, so
> canonically equivalent compositions of the same glyph will actually produce
> different names. So unless we're going to fix that and then ban all sorts of
> things that are known to generally be represented with a confusable glyph in
> a typical fixed-width font (like the mathematical alphabets), this is just a
> problem that will always exist in some form.
Any discussion about this ought to start from UAX #31, the Unicode
Consortium's recommendations on identifiers in programming languages:

http://unicode.org/reports/tr31/

Section 2.3 specifically calls out the situations in which ZWJ and ZWNJ need
to be allowed. The document also describes a stability policy for handling new
Unicode versions, other confusability issues, and many of the other problems
with adopting Unicode in a programming language's syntax.

-Joe
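
For anyone who wants to see the two problems from the quoted text concretely,
here is a rough sketch. It is written in Python only because its standard
library exposes Unicode normalization directly; it says nothing about how the
Swift compiler is or should be implemented. The helper names `canonical_key`
and `has_zero_width` are made up for illustration, and the zero-width check
simply flags the characters wherever they appear. It does not implement the
context rules from UAX #31 Section 2.3.

```python
import unicodedata

# Zero-width code points discussed in this thread:
# U+200B ZERO WIDTH SPACE, U+200C ZWNJ, U+200D ZWJ.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d"}


def canonical_key(identifier: str) -> str:
    """Hypothetical helper: fold canonically equivalent spellings
    of an identifier to a single form (NFC)."""
    return unicodedata.normalize("NFC", identifier)


def has_zero_width(identifier: str) -> bool:
    """Hypothetical helper: report whether any zero-width code point
    is present, regardless of context."""
    return any(ch in ZERO_WIDTH for ch in identifier)


# The same visible name, spelled two canonically equivalent ways.
composed = "caf\u00e9"     # precomposed U+00E9
decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Compared code point by code point, these are different names...
assert composed != decomposed
# ...but they agree once both are canonicalized.
assert canonical_key(composed) == canonical_key(decomposed)

# An invisible joiner between two Latin letters also yields a distinct name.
assert "ab" != "a\u200db"
assert has_zero_width("a\u200db")
assert not has_zero_width("ab")
```

A blanket check like `has_zero_width` is the "ban them completely" option,
which is exactly the one that would interfere with some languages. A real
implementation would need the per-context rules that UAX #31 describes.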
