Re: Compatibility Casefold Equivalence

Carl via Unicode Thu, 22 Nov 2018 12:01:12 -0800

(It looks like my HTML email got scrubbed; sorry for the double post.)

Hi,

In Chapter 3, Section 13, the Unicode spec defines D146:

"A string X is a compatibility caseless match for a string Y if and only if: 
NFKD(toCasefold(NFKD(toCasefold(NFD(X))))) = 
NFKD(toCasefold(NFKD(toCasefold(NFD(Y)))))"
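To make sure I am reading D146 correctly, here is a minimal Python sketch of it. It assumes that Python's `str.casefold()` (full case folding) is an acceptable stand-in for toCasefold, and it uses `unicodedata.normalize` for NFD/NFKD. The results depend on the Unicode version bundled with the Python release, and the helper names are my own.

```python
import unicodedata


def compat_caseless_key(s: str) -> str:
    """Return NFKD(toCasefold(NFKD(toCasefold(NFD(s))))) as written in D146."""
    # Assumption: str.casefold() performs the full case folding that toCasefold denotes
    step = unicodedata.normalize("NFD", s).casefold()
    step = unicodedata.normalize("NFKD", step).casefold()
    return unicodedata.normalize("NFKD", step)


def is_compat_caseless_match(x: str, y: str) -> bool:
    # X and Y match if and only if their D146 keys are identical
    return compat_caseless_key(x) == compat_caseless_key(y)


print(is_compat_caseless_match("\ufb01le", "FILE"))  # U+FB01 LATIN SMALL LIGATURE FI
print(is_compat_caseless_match("\u3392", "mhz"))     # U+3392 SQUARE MHZ
```

The U+3392 case shows why the fold is applied a second time: the compatibility decomposition of SQUARE MHZ yields "MHz" with capitals, and only the second toCasefold lowers them.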

I am trying to understand the "if and only if" part of this. Specifically,
why is the outermost NFKD necessary? Could it also be an NFKC normalization?
Would wrapping the outer NFKD in an NFC or NFKC on both sides of the equation be okay?

My use case is storing user-provided tags in a database. I
would like the tags to be deduplicated based on compatibility and caseless
equivalence, which is how I ended up looking at D146. However, because
decomposition can result in much larger strings, I would prefer to keep the
stored version in NFC or NFKC (I *think* this doesn't matter after doing the 
casefolding as described above).
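For the storage side, this is roughly what I have in mind: compute the D146 key, then compose it with NFC before saving it as the deduplication column. The `storage_key` and `dedupe_tags` names are hypothetical, `str.casefold()` again stands in for toCasefold, and whether the final NFC (or NFKC) step preserves the equivalence is exactly what I am asking above. The spot check here is an illustration, not a proof.

```python
import unicodedata


def d146_key(s: str) -> str:
    # NFKD(toCasefold(NFKD(toCasefold(NFD(s))))), with str.casefold() as toCasefold
    step = unicodedata.normalize("NFD", s).casefold()
    step = unicodedata.normalize("NFKD", step).casefold()
    return unicodedata.normalize("NFKD", step)


def storage_key(tag: str) -> str:
    # Hypothetical compaction step: re-compose the D146 key with NFC before storing it
    return unicodedata.normalize("NFC", d146_key(tag))


def dedupe_tags(tags: list[str]) -> dict[str, str]:
    # Keep the first tag seen for each storage key
    kept: dict[str, str] = {}
    for tag in tags:
        kept.setdefault(storage_key(tag), tag)
    return kept


# Precomposed, uppercase, and decomposed spellings of "café", plus plain "cafe"
print(dedupe_tags(["Caf\u00e9", "CAF\u00c9", "Cafe\u0301", "cafe"]))
```

With these inputs the first three spellings collapse onto a single stored key, and the stored key keeps the precomposed U+00E9 rather than the longer decomposed sequence.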

Thanks,

Carl
