> I think we're using terminology differently here. What you call "character 
> normalization" is what I'm calling canonicalization. NFC is described in UAX 
> #15 as "canonical decomposition followed by canonical composition" and I'm 
> just using the word "canonicalization" because it's shorter. If Swift 
> represents each identifier in an NFC-transformed form (what I call 
> canonicalized), then I understand the identifier to be canonicalized. What is 
> the distinction you're drawing here?

There is a small difference between normalisation and canonicalisation, but
it's mostly splitting hairs. Both put something into a well-defined
representation, but canonicalisation additionally implies choosing a single
preferred representation out of several equivalent ones. Web addresses are a
good example: http://www.apple.com and http://apple.com are both valid,
normalised addresses, but only the former is the canonical address for the
Apple website.
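
To make the distinction concrete, here is a minimal Swift sketch that only
deals with host names, not full URLs. The alias table `canonicalHosts` and the
helpers `normaliseHost` and `canonicaliseHost` are hypothetical. Normalising
lowercases the host, which leaves the two Apple addresses distinct; canonicalising
then maps both onto the one host chosen as canonical.

```swift
// Hypothetical alias table: each known alias of a site maps to the single
// host that has been chosen as its canonical form.
let canonicalHosts: [String: String] = [
    "apple.com": "www.apple.com",
    "www.apple.com": "www.apple.com",
]

// Normalisation: bring a host name into a consistent shape. DNS host names
// are compared case-insensitively, so lowercasing keeps the same meaning.
func normaliseHost(_ host: String) -> String {
    return host.lowercased()
}

// Canonicalisation: normalise first, then collapse equivalent hosts onto
// the one representative listed in the (hypothetical) alias table.
func canonicaliseHost(_ host: String) -> String {
    let normalised = normaliseHost(host)
    return canonicalHosts[normalised] ?? normalised
}

print(normaliseHost("Apple.com"))      // apple.com
print(normaliseHost("WWW.apple.com"))  // www.apple.com
print(canonicaliseHost("Apple.com"))   // www.apple.com
```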

> Just re-read UAX #31. I see two different issues here too--do these match up 
> with what you're saying above?
> 
> * Disallowing certain glyphs in identifiers. To do so, we can implement the 
> recommendation to disallow all glyphs in UAX #31 Table 4, except ZWJ and ZWNJ 
> in the specific scenarios outlined in section 2.3.
> 
> * Internally, when comparing two identifiers A and B, compare NFC(A) and 
> NFC(B) without modifying or otherwise restricting the actual user-facing code 
> to contain only NFC-normalized strings. This would be the approach 
> recommended in section 1.3.

Yes, that's correct. The proposal would be to normalise identifiers to NFC and
then canonicalise them by ignoring invisible characters, except for ZWJ and
ZWNJ in the contexts described in section 2.3 of UAX #31.
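
A minimal sketch of that comparison in Swift with Foundation, under several
assumptions: `identifierKey`, `sameIdentifier` and `isPermittedJoinerContext`
are hypothetical names; "invisible characters" is approximated by Unicode's
Default_Ignorable_Code_Point property; and the section 2.3 joiner contexts are
only a stub that always rejects, so ZWJ and ZWNJ are always ignored here. The
sketch removes invisible characters before applying NFC, because removing a
joiner can put a base letter next to a combining mark that NFC would then
compose. Only the comparison key is transformed; the user-facing source text
is left untouched.

```swift
import Foundation

// Hypothetical hook for the UAX #31 section 2.3 contexts in which ZWJ/ZWNJ
// are meaningful. This stub does NOT implement those rules: it always says
// "no", so every default-ignorable scalar is dropped.
func isPermittedJoinerContext(_ scalars: [Unicode.Scalar], at index: Int) -> Bool {
    return false
}

// Comparison key for an identifier: drop invisible (default-ignorable)
// scalars unless the hook permits them, then normalise the result to NFC.
func identifierKey(_ identifier: String) -> String {
    let scalars = Array(identifier.unicodeScalars)
    var kept = String.UnicodeScalarView()
    for (i, scalar) in scalars.enumerated() {
        if scalar.properties.isDefaultIgnorableCodePoint
            && !isPermittedJoinerContext(scalars, at: i) {
            continue
        }
        kept.append(scalar)
    }
    return String(kept).precomposedStringWithCanonicalMapping
}

// Two identifiers are the same if their keys have identical scalar sequences.
func sameIdentifier(_ a: String, _ b: String) -> Bool {
    return Array(identifierKey(a).unicodeScalars)
        == Array(identifierKey(b).unicodeScalars)
}

print(sameIdentifier("caf\u{E9}", "cafe\u{301}"))  // true (NFC)
print(sameIdentifier("a\u{200D}b", "ab"))          // true (ZWJ ignored)
```

Comparing scalar arrays rather than `String` values makes the NFC step
explicit, since Swift's `==` on `String` already treats canonically
equivalent strings as equal.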

Sincerely,
João Pinheiro
_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution
