> On Jun 21, 2016, at 8:47 AM, John McCall via swift-evolution 
> <[email protected]> wrote:
> 
>> On Jun 20, 2016, at 7:07 PM, Xiaodi Wu <[email protected]> wrote:
>> On Mon, Jun 20, 2016 at 8:58 PM, John McCall via swift-evolution 
>> <[email protected]> wrote:
>>> On Jun 20, 2016, at 5:22 PM, Jordan Rose via swift-evolution 
>>> <[email protected]> wrote:
>>> IIRC, some languages require zero-width joiners (though not zero-width 
>>> spaces, which are distinct) to properly encode some of their characters. 
>>> I'd be very leery of having Swift land on a model where identifiers can be 
>>> used with some languages and not others; that smacks of ethnocentrism.
>> 
>> None of those languages require zero-width characters between two Latin 
>> letters, or between a Latin letter and an Arabic numeral, or at the end of a 
>> word.  Since standard / system APIs will (barring some radical shift) use 
>> those code points exclusively, it's justifiable to give them some special 
>> attention.
>> 
>> Although the practical implementation may need to be more limited in scope, 
>> the general principle doesn't need to privilege Latin letters and Arabic 
>> numerals. If, in any context, the presence or absence of a zero-width glyph 
>> cannot possibly be distinguished by a human reading the text, then the 
>> compiler should also be indifferent to its presence or absence (or, 
>> alternatively, its presence should be a compile-time error).
> 
> Sure, that's obvious.  Jordan was observing that the simplest way to enforce 
> that, banning such characters from identifiers completely, would still 
> interfere with some languages, and I was pointing out that just doing enough 
> to protect English would get most of the practical value because it would 
> protect every use of the system and standard library.  A program would then 
> only become attackable in this specific way for its own identifiers using 
> non-Latin characters.
> 
> All that said, I'm not convinced that this is worthwhile; the 
> identifier-similarity problem in Unicode is much broader than just invisible 
> characters.  In fact, Swift still doesn't canonicalize identifiers, so 
> canonically equivalent compositions of the same glyph will actually produce 
> different names.  So unless we're going to fix that and then ban all sorts of 
> things that are known to generally be represented with a confusable glyph in 
> a typical fixed-width font (like the mathematical alphabets), this is just a 
> problem that will always exist in some form.

Any discussion about this ought to start from UAX #31, the Unicode Consortium's 
recommendations on identifiers in programming languages:

http://unicode.org/reports/tr31/

Section 2.3 specifically calls out the situations in which ZWJ and ZWNJ need to 
be allowed. The document also describes a stability policy for handling new 
Unicode versions, other confusability issues, and many of the other problems 
with adopting Unicode in a programming language's syntax.

-Joe
