Has anyone figured out whether character sequences that are non-canonical (de)compositions but could be recomposed to the same result are the same identifier or not?

That is: are identifiers merely sequences of characters, or are they intended to be comparable as “Unicode strings” (under some sort of compatibility rule)?

On Jun 5, 2014, at 11:27 AM, Martin v. Löwis <mar...@v.loewis.de> wrote:

> On 04.06.14 11:28, Andre Schappo wrote:
>> The restrictions seem a little like IDNA2008. Anyone have links to
>> info giving a detailed explanation/tabulation of allowed and non
>> allowed Unicode chars for Swift Variable and Constant names?
>
> The language reference is at
>
> https://developer.apple.com/library/prerelease/ios/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html
>
> For reference, the definition of identifier-character is (read each
> line as an alternative)
>
> identifier-character → Digit 0 through 9
> identifier-character → U+0300–U+036F, U+1DC0–U+1DFF, U+20D0–U+20FF, or U+FE20–U+FE2F
> identifier-character → identifier-head
>
> where identifier-head is
>
> identifier-head → Upper- or lowercase letter A through Z
> identifier-head → U+00A8, U+00AA, U+00AD, U+00AF, U+00B2–U+00B5, or U+00B7–U+00BA
> identifier-head → U+00BC–U+00BE, U+00C0–U+00D6, U+00D8–U+00F6, or U+00F8–U+00FF
> identifier-head → U+0100–U+02FF, U+0370–U+167F, U+1681–U+180D, or U+180F–U+1DBF
> identifier-head → U+1E00–U+1FFF
> identifier-head → U+200B–U+200D, U+202A–U+202E, U+203F–U+2040, U+2054, or U+2060–U+206F
> identifier-head → U+2070–U+20CF, U+2100–U+218F, U+2460–U+24FF, or U+2776–U+2793
> identifier-head → U+2C00–U+2DFF or U+2E80–U+2FFF
> identifier-head → U+3004–U+3007, U+3021–U+302F, U+3031–U+303F, or U+3040–U+D7FF
> identifier-head → U+F900–U+FD3D, U+FD40–U+FDCF, U+FDF0–U+FE1F, or U+FE30–U+FE44
> identifier-head → U+FE47–U+FFFD
> identifier-head → U+10000–U+1FFFD, U+20000–U+2FFFD, U+30000–U+3FFFD, or U+40000–U+4FFFD
> identifier-head → U+50000–U+5FFFD, U+60000–U+6FFFD, U+70000–U+7FFFD, or U+80000–U+8FFFD
> identifier-head → U+90000–U+9FFFD, U+A0000–U+AFFFD, U+B0000–U+BFFFD, or U+C0000–U+CFFFD
> identifier-head → U+D0000–U+DFFFD or U+E0000–U+EFFFD
>
> As the construction principle for this list, they say
>
> "Identifiers begin with an upper case or lower case letter A through Z,
> an underscore (_), a noncombining alphanumeric Unicode character in the
> Basic Multilingual Plane, or a character outside the Basic Multilingual
> Plane that isn’t in a Private Use Area. After the first character, digits
> and combining Unicode characters are also allowed."
>
> Regards,
> Martin
> _______________________________________________
> Unicode mailing list
> Unicode@unicode.org
> http://unicode.org/mailman/listinfo/unicode
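
To make the question at the top concrete, here is a minimal sketch in Python (chosen only for illustration; it says nothing about what the Swift compiler actually does). The ranges are transcribed from the grammar quoted above, the underscore is taken from the quoted prose rather than the quoted list, and the function names are hypothetical. It shows that a precomposed and a decomposed spelling of the same word are both accepted by the grammar, differ as code point sequences, and only compare equal after normalization.

```python
import unicodedata

# Inclusive code point ranges transcribed from the quoted identifier-head
# alternatives. A-Z, a-z and "_" are handled separately below.
HEAD_RANGES = [
    (0x00A8, 0x00A8), (0x00AA, 0x00AA), (0x00AD, 0x00AD), (0x00AF, 0x00AF),
    (0x00B2, 0x00B5), (0x00B7, 0x00BA),
    (0x00BC, 0x00BE), (0x00C0, 0x00D6), (0x00D8, 0x00F6), (0x00F8, 0x00FF),
    (0x0100, 0x02FF), (0x0370, 0x167F), (0x1681, 0x180D), (0x180F, 0x1DBF),
    (0x1E00, 0x1FFF),
    (0x200B, 0x200D), (0x202A, 0x202E), (0x203F, 0x2040), (0x2054, 0x2054),
    (0x2060, 0x206F),
    (0x2070, 0x20CF), (0x2100, 0x218F), (0x2460, 0x24FF), (0x2776, 0x2793),
    (0x2C00, 0x2DFF), (0x2E80, 0x2FFF),
    (0x3004, 0x3007), (0x3021, 0x302F), (0x3031, 0x303F), (0x3040, 0xD7FF),
    (0xF900, 0xFD3D), (0xFD40, 0xFDCF), (0xFDF0, 0xFE1F), (0xFE30, 0xFE44),
    (0xFE47, 0xFFFD),
]
# Planes 1 through 14: U+10000-U+1FFFD up to U+E0000-U+EFFFD.
HEAD_RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(1, 15)]

# Combining ranges allowed after the first character.
COMBINING_RANGES = [
    (0x0300, 0x036F), (0x1DC0, 0x1DFF), (0x20D0, 0x20FF), (0xFE20, 0xFE2F),
]


def _in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_identifier_head(ch):
    if "A" <= ch <= "Z" or "a" <= ch <= "z" or ch == "_":
        return True
    return _in_ranges(ord(ch), HEAD_RANGES)


def is_identifier_character(ch):
    if "0" <= ch <= "9":
        return True
    if _in_ranges(ord(ch), COMBINING_RANGES):
        return True
    return is_identifier_head(ch)


def is_identifier(s):
    """True if s matches the quoted grammar (keywords are not considered)."""
    if not s:
        return False
    return is_identifier_head(s[0]) and all(
        is_identifier_character(c) for c in s[1:]
    )


precomposed = "caf\u00E9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT
ligature = "\uFB01le"       # U+FB01 LATIN SMALL LIGATURE FI, then "le"

# Both spellings are valid under the grammar, yet differ code point by code point.
both_valid = is_identifier(precomposed) and is_identifier(decomposed)
same_raw = precomposed == decomposed
# Canonical normalization (NFC) makes them compare equal.
same_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
# A compatibility rule (NFKC) would additionally fold the ligature into "file".
same_nfkc = unicodedata.normalize("NFKC", ligature) == "file"
```

Whether a compiler should compare identifiers raw, under NFC, or under a compatibility form such as NFKC is exactly the open question; the sketch only shows that the three choices give different answers for identifiers the quoted grammar permits.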