Re: Folding algorithm and canonical equivalence

Asmus Freytag Sun, 18 Jul 2004 01:16:02 -0700

At 11:15 PM 7/17/2004, John Cowan wrote:

I agree that in the TR#30 context, the Right Thing is to remove the
character pair mappings altogether, and all of the single-character
mappings that have canonical decompositions

In other words, in your opinion, the reasonable thing would be for someone to apply AccentFolding as defined in the TR, and then apply a DiacriticFolding to fold the cases where, even in NFD, the accents don't exist as separate characters.

That's certainly reasonable, and it's not the only case where it's interesting to have chained foldings.
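As a rough illustration of the chaining described above, here is a minimal Python sketch. It is not the TR#30 data: the accent set is simplified to the Combining Diacritical Marks block (U+0300..U+036F), and the `DIACRITIC_FOLD` table is a hypothetical handful of letters that have no canonical decomposition. The function names are made up for the example.

```python
import unicodedata

# Assumption for this sketch only: treat U+0300..U+036F as "the accents".
ACCENTS = {chr(cp) for cp in range(0x0300, 0x0370)}

# Hypothetical second-stage table: letters whose diacritic is not
# separable even in NFD (they have no canonical decomposition).
DIACRITIC_FOLD = {
    "\u00F8": "o", "\u00D8": "O",  # o with stroke
    "\u0142": "l", "\u0141": "L",  # l with stroke
    "\u0111": "d", "\u0110": "D",  # d with stroke
}

def accent_fold(text):
    """Decompose to NFD, then drop the accents listed in ACCENTS."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if ch not in ACCENTS)

def diacritic_fold(text):
    """Fold letters that carry a diacritic but do not decompose."""
    return "".join(DIACRITIC_FOLD.get(ch, ch) for ch in text)

def chained_fold(text):
    """Apply accent folding first, then diacritic folding."""
    return diacritic_fold(accent_fold(text))
```

With these tables, `accent_fold` alone leaves a letter such as U+00F8 untouched, and only the second stage maps it to a plain "o". Keeping the stages separate lets a caller apply either one without the other.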

Jony is arguing to extend AccentFolding to Hebrew (fold to unpointed). His suggestion is to fold *all* combining marks used with Hebrew in that case. I want to double check whether he really means all combining marks in the Hebrew block, or just some of them.

AccentFolding can't just fold all gc=Mn, since that would include quite a few marks that are script-specific, as well as the marks for symbols, for which different folding rules might need to apply in some contexts. So I think I'll use, as the set of accents to remove, all the ones that show up as part of decompositions, plus as many Hebrew accents as Jony can confirm.
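One way to approximate "the marks that show up as part of decompositions" is sketched below in Python. It relies on the Unicode data bundled with whatever Python version runs it, not on the 2004 data, and the function name is made up for the example. It collects every gc=Mn character that appears in a canonical (untagged) decomposition mapping.

```python
import sys
import unicodedata

def marks_in_canonical_decompositions():
    """Collect gc=Mn characters that occur in canonical decomposition mappings."""
    marks = set()
    for cp in range(sys.maxunicode + 1):
        mapping = unicodedata.decomposition(chr(cp))
        # Skip characters with no mapping, and compatibility mappings,
        # which start with a tag such as "<compat>".
        if not mapping or mapping.startswith("<"):
            continue
        for field in mapping.split():
            ch = chr(int(field, 16))
            if unicodedata.category(ch) == "Mn":
                marks.add(ch)
    return marks
```

A set built this way should contain common accents such as U+0301, but not marks that never occur in a decomposition, such as the Hebrew accent U+0591. That is why any Hebrew accents would have to be added to the set explicitly.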

(Another alternative would be to make the Hebrew folding a separate definition, to allow people to apply one but not the other.)

I'll make another draft of DiacriticFolding.txt with the canonical decomp derivables removed.

A./
