Re: [swift-evolution] A path forward on rationalizing unicode identifiers and operators

Kenny Leung via swift-evolution Sun, 01 Oct 2017 22:30:35 -0700

In theory, you could have two variables whose names look alike but are 
actually different identifiers holding different values, which could let 
someone slip obfuscated malicious code past a reviewer.


-Kenny
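A minimal Swift sketch of the look-alike concern, assuming Swift 4 or later. The two constants below are spelled with LATIN SMALL LETTER A (U+0061) and CYRILLIC SMALL LETTER A (U+0430), which Swift accepts as distinct identifier characters; depending on the compiler version, a confusable-character warning may be emitted, but the code still compiles. The `containsNonASCII` helper is hypothetical: a crude review aid, not a proposed language rule.

```swift
// `a` is LATIN SMALL LETTER A (U+0061).
let a = 1
// `а` is CYRILLIC SMALL LETTER A (U+0430): a different identifier that
// renders almost identically in most fonts.
let а = 2

print(a, а) // prints "1 2": two separate values behind look-alike names

// Hypothetical helper: flag identifiers containing any non-ASCII scalar so a
// reviewer can inspect them. Crude, since it also flags legitimate names like "café".
func containsNonASCII(_ identifier: String) -> Bool {
    return identifier.unicodeScalars.contains { !$0.isASCII }
}

print(containsNonASCII("a"))        // false
print(containsNonASCII("\u{0430}")) // true
```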


> On Oct 1, 2017, at 10:01 PM, Chris Lattner <clatt...@nondot.org> wrote:
> 
>> 
>> On Oct 1, 2017, at 9:26 PM, Kenny Leung via swift-evolution 
>> <swift-evolution@swift.org> wrote:
>> 
>> Hi All.
>> 
>> I’d like to help as well. I have fun with operators.
>> 
>> There is also the issue of code security with invisible unicode characters 
>> and characters that look exactly alike.
> 
> Unless there is a compelling reason to add them, I think we should ban 
> invisible characters.  What is the harm of characters that look alike?
> 
> -Chris
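One way to make "ban invisible characters" concrete is a check over Unicode general category Cf ("format"). This is a minimal sketch assuming Swift 5.0 or later (for `Unicode.Scalar.Properties`); the function name `invisibleScalars` and the sample source line are hypothetical.

```swift
// Returns every scalar in `source` whose Unicode general category is Cf
// ("format"). Cf includes ZERO WIDTH SPACE (U+200B), ZERO WIDTH JOINER (U+200D)
// and ZERO WIDTH NO-BREAK SPACE (U+FEFF), which normally render as nothing.
func invisibleScalars(in source: String) -> [Unicode.Scalar] {
    var found: [Unicode.Scalar] = []
    for scalar in source.unicodeScalars where scalar.properties.generalCategory == .format {
        found.append(scalar)
    }
    return found
}

// A hypothetical line of source with a zero-width space hidden inside a name.
let line = "let total\u{200B}Count = 0"

for scalar in invisibleScalars(in: line) {
    // Format the code point as U+XXXX for the report.
    let hex = String(scalar.value, radix: 16, uppercase: true)
    print("invisible scalar U+\(hex)")
}
// prints "invisible scalar U+200B"
```

Note that a category-based check also catches the ZERO WIDTH JOINER used inside multi-scalar emoji sequences, so a real rule would have to decide how those are treated.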
> 
> 
>> (They should make a coding font that ensures all characters look different.) 
>> Was that ever resolved? Googling, I found this:
>> 
>> https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160620/021446.html
>> 
>> Which seems to have been left at this:
>> 
>> https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160725/025555.html
>> 
>> https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160919/thread.html#27229
>> 
>> Should we throw all of this into the same pot, and make any characters that 
>> aren’t on the approved list illegal?
>> 
>> -Kenny
>> 
>> 
>>> On Sep 30, 2017, at 4:13 PM, Xiaodi Wu via swift-evolution 
>>> <swift-evolution@swift.org> wrote:
>>> 
>>> I’m happy to participate in the reshaping of the proposal. It would be nice 
>>> to gather a group of people again to help drive it forward.
>>> 
>>> That said, it’s not clear to me that superscript T is an operator, any 
>>> more than superscript H (Hermitian), superscript 2, or superscript 3 would 
>>> be. But at any rate, this would be a discussion for the future workgroup.
>>> 
>>> I would strongly advocate that the things-that-are-identifiers group be 
>>> closely tied to the existing, complete Unicode standard for identifiers, 
>>> and that the critical parts of the previous document about normalization 
>>> be retained.
>>> 
>>> On Sat, Sep 30, 2017 at 17:59 Chris Lattner via swift-evolution 
>>> <swift-evolution@swift.org> wrote:
>>> 
>>> The core team recently met to discuss PR609 - Refining identifier and 
>>> operator symbology:
>>> https://github.com/xwu/swift-evolution/blob/7c2c4df63b1d92a1677461f41bc638f31926c9c3/proposals/NNNN-refining-identifier-and-operator-symbology.md
>>> 
>>> The proposal correctly observes that the partitioning of unicode codepoints 
>>> into identifiers and operators is a mess in some cases.  It really is an 
>>> outright bug for 🙂 to be an identifier, but ☹️ to be an operator.  That 
>>> said, the proposal itself is complicated and is defined in terms of a bunch 
>>> of unicode classes that may evolve in the “wrong way for Swift” in the 
>>> future.
>>> 
>>> The core team would really like to get this sorted out for Swift 5, and 
>>> sooner is better than later :-).  Because it seems that this is a really 
>>> hard problem and that perfection is becoming the enemy of good 
>>> <https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good>, the core team 
>>> requests the creation of a new proposal with a different approach.  The 
>>> general observation is that there are three kinds of characters: things 
>>> that are obviously identifiers, things that are obviously math operators, 
>>> and things that are non-obvious.  Things that are non-obvious can be made 
>>> into invalid code points, and legislated later in follow-up proposals 
>>> if/when someone cares to argue for them.
>>> 
>>> 
>>> To make progress on this, we suggest a few separable steps:
>>> 
>>> First, please split out the changes to the ASCII characters (e.g. the . and \ 
>>> operator parsing rules) into their own (small) proposal. Those changes are 
>>> unrelated to the unicode changes, so that proposal can make progress 
>>> independently.
>>> 
>>> 
>>> Second, someone should take a look at the concrete set of unicode 
>>> identifiers that are accepted by Swift 4 and write a new proposal that 
>>> splits them into the three groups: those that are clearly identifiers 
>>> (which become identifiers), those that are clearly operators (which become 
>>> operators), and those that are unclear or don’t matter (these become 
>>> invalid code points).
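The three-group split above, which is also the "approved list" idea raised earlier in the thread, can be sketched as an allowlist with a default of "invalid". This is a minimal Swift sketch under stated assumptions: the sets below are tiny hypothetical samples, not a proposed classification, although they follow the later suggestions that Greek letters like alpha and "faces" such as 🙂 and ☹ are identifiers.

```swift
// The three groups described above.
enum ScalarKind {
    case identifier
    case operatorSymbol
    case invalid
}

// Hypothetical, deliberately tiny allowlists. A real proposal would list every
// accepted code point (or range) explicitly.
let lowercaseASCII: ClosedRange<Unicode.Scalar> = "a"..."z"
let uppercaseASCII: ClosedRange<Unicode.Scalar> = "A"..."Z"
let identifierScalars: Set<Unicode.Scalar> = ["_", "\u{03B1}", "\u{03B2}", "\u{1F642}", "\u{2639}"] // _ α β 🙂 ☹
let operatorScalars: Set<Unicode.Scalar> = ["+", "-", "\u{00D7}", "\u{00F7}", "\u{2218}"]         // + - × ÷ ∘

func classify(_ scalar: Unicode.Scalar) -> ScalarKind {
    if lowercaseASCII.contains(scalar) || uppercaseASCII.contains(scalar) ||
        identifierScalars.contains(scalar) {
        return .identifier
    }
    if operatorScalars.contains(scalar) {
        return .operatorSymbol
    }
    // Anything not explicitly approved becomes an invalid code point.
    return .invalid
}
```

The design point is the final `return .invalid`: widening the language later only means adding entries to a table, which is why follow-up proposals would not be source breaking.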
>>> 
>>> I suggest that the criteria be based on utility for Swift code, not on the 
>>> underlying unicode classification.  For example, the discussion thread for 
>>> PR609 mentions that the T character in “xᵀ” is defined in unicode as a 
>>> latin “letter”.  Despite that, its use in Swift would clearly be as a 
>>> postfix operator, so we should classify it as an operator.
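To illustrate "utility over Unicode classification", here is a minimal sketch assuming Swift 5.0 or later. It confirms that the superscript T (U+1D40 MODIFIER LETTER CAPITAL T) is a letter in Unicode's eyes (general category Lm), then shows a hypothetical override table consulted before any Unicode-derived default. The `Role` enum, the `overrides` table and the alphabetic fallback are illustrative assumptions only.

```swift
// U+1D40 MODIFIER LETTER CAPITAL T: the superscript T in xᵀ.
let superscriptT: Unicode.Scalar = "\u{1D40}"

// Unicode files it under letters: general category Lm (modifier letter).
print(superscriptT.properties.generalCategory == .modifierLetter) // true

enum Role { case identifier, postfixOperator, invalid }

// Hypothetical override table: decisions made on utility for Swift code,
// consulted before any Unicode-derived default.
let overrides: [Unicode.Scalar: Role] = ["\u{1D40}": .postfixOperator]

func role(of scalar: Unicode.Scalar) -> Role {
    if let decided = overrides[scalar] {
        return decided
    }
    // Placeholder default for this sketch only: alphabetic scalars are
    // identifiers, everything else is rejected.
    return scalar.properties.isAlphabetic ? .identifier : .invalid
}

print(role(of: superscriptT) == .postfixOperator) // true
print(role(of: "x") == .identifier)               // true
```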
>>> 
>>> Other suggestions:
>>>  - Math symbols are operators excepting those primarily used as identifiers 
>>> like “alpha”.  If there are any characters that are used for both, this 
>>> proposal should make them invalid.
>>>  - While there may be useful ranges for some identifiers (e.g. to handle 
>>> European accented characters), the Emoji range should probably have each
>>> codepoint independently judged, and currently unassigned codepoints should 
>>> not get a meaning defined for them.
>>>  - Unicode “faces”, “people”, “animals” etc are all identifiers.
>>>  - In order to reduce the scope of the proposal, it is a safe default to 
>>> exclude characters that are unlikely to be used by Swift code today, 
>>> including Braille, weird currency symbols, or any set of characters that 
>>> are so broken and useless in Swift 4 that it isn’t worth worrying about.
>>>  - The proposal is likely to turn a large number of code points into 
>>> rejected characters.  In the discussions, some people will be tempted to 
>>> argue endlessly about individual rejections.  To control that, we can 
>>> require that people point out an example where the character is already in 
>>> use, or where it has a clear application to a domain that is known today: 
>>> the discussion needs to be grounded and practical, not theoretical.
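Several of the suggestions above (a range rule for accented letters, per-code-point emoji decisions, no meaning for unassigned code points, and rejecting everything else by default) can be combined in one lookup. A minimal sketch, assuming Swift 5.0 or later; the table contents are hypothetical samples. For the Latin-1 block, the split shown (U+00C0 to U+00FF as letters, with × and ÷ carved out as operators) is consistent with how the Swift grammar's identifier-head and operator-head ranges already treat that block.

```swift
enum Kind { case identifier, operatorSymbol, invalid }

// Range rule: Latin-1 accented letters U+00C0...U+00FF are identifiers,
// except MULTIPLICATION SIGN (U+00D7) and DIVISION SIGN (U+00F7).
let latin1Letters: ClosedRange<UInt32> = 0xC0...0xFF
let latin1Operators: Set<UInt32> = [0xD7, 0xF7]

// Emoji are decided one code point at a time (hypothetical sample entries).
let emojiDecisions: [Unicode.Scalar: Kind] = [
    "\u{1F642}": .identifier, // 🙂 SLIGHTLY SMILING FACE
    "\u{1F436}": .identifier, // 🐶 DOG FACE
]

func kind(of scalar: Unicode.Scalar) -> Kind {
    // Currently unassigned code points get no meaning.
    if scalar.properties.generalCategory == .unassigned { return .invalid }
    if let decided = emojiDecisions[scalar] { return decided }
    if latin1Operators.contains(scalar.value) { return .operatorSymbol }
    if latin1Letters.contains(scalar.value) { return .identifier }
    // Braille, unusual currency symbols, and everything else: rejected by default.
    return .invalid
}
```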
>>> 
>>> 
>>> Third, if there is interest sometime in the future, we can have subsequent 
>>> proposals that expand the range of accepted code points, motivated by the 
>>> specific application domain that cares about them.  These proposals will 
>>> not be source breaking, so they can happen at any time.
>>> 
>>> 
>>> Is anyone interested in helping to push this effort forward?
>>> 
>>> -Chris
>>> 
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution@swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution