Re: Folding algorithm and canonical equivalence

Asmus Freytag Sat, 17 Jul 2004 17:07:00 -0700

Thank you for reviewing this.

DiacriticFolding (unlike AccentFolding) is selective about which combining marks it removes for which base character. I wonder whether that's truly intended, or whether it could be replaced by a combination of

AccentFolding
OtherDiacriticFolding

where AccentFolding removes *all* nonspacing marks following Latin, Greek or Cyrillic letters, and OtherDiacriticFolding is what remains of DiacriticFolding once we remove all cases that are already handled by AccentFolding.
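
To make the AccentFolding rule concrete, here is a rough Python sketch of "remove all nonspacing marks following a Latin, Greek or Cyrillic letter". It is only an illustration, not the TR30 definition: the names accent_fold and is_lgc_letter are made up for this example, working on NFD text is my assumption, and script membership is approximated from the character name because Python's unicodedata module does not expose the Script property.

```python
import unicodedata

# Assumption: script is approximated by the Unicode character name prefix,
# since the standard library has no Script property lookup.
LGC_PREFIXES = ("LATIN", "GREEK", "CYRILLIC")


def is_lgc_letter(ch):
    """True if ch is a letter whose name marks it as Latin, Greek or Cyrillic."""
    if not unicodedata.category(ch).startswith("L"):
        return False
    return unicodedata.name(ch, "").startswith(LGC_PREFIXES)


def accent_fold(text):
    """Drop every nonspacing mark (Mn) that follows a Latin/Greek/Cyrillic letter."""
    out = []
    strip = False  # True while we are in the mark sequence of an LGC base letter
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch) == "Mn":
            if strip:
                continue  # mark belongs to an LGC base: remove it
        else:
            strip = is_lgc_letter(ch)  # a new base character starts here
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))
```

Under this sketch e-acute folds to e, while Hebrew bet + dagesh and o with stroke (which has no canonical decomposition) pass through unchanged, which is exactly the residue that the second folding or a data file would have to cover.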

That still doesn't take care of Hebrew, so we would need to decide how to handle that. Perhaps you would like to put forth a proposal as to what accents or diacritics should be folded for Hebrew, and in what context. Is it just Dagesh?

The other alternative would be to limit the nonspacing marks to those that actually occur with Latin / Greek / Cyrillic letters as ordinary diacritics (i.e. all the diacritics that show up in DiacriticFolding.txt), but then remove them if they follow *any* base character from that set, not just in certain fixed combinations.
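
A sketch of this second alternative, again in Python and again only as an illustration: the mark set below is a hypothetical placeholder (the real list would be derived from the diacritics that appear in DiacriticFolding.txt), and restricted_fold and is_lgc_letter are names invented for the example.

```python
import unicodedata

# Hypothetical placeholder: grave, acute, circumflex, diaeresis, cedilla.
# The actual set would be taken from DiacriticFolding.txt.
FOLDABLE_MARKS = frozenset("\u0300\u0301\u0302\u0308\u0327")

# Assumption: script is approximated by the Unicode character name prefix.
LGC_PREFIXES = ("LATIN", "GREEK", "CYRILLIC")


def is_lgc_letter(ch):
    """True if ch is a letter whose name marks it as Latin, Greek or Cyrillic."""
    if not unicodedata.category(ch).startswith("L"):
        return False
    return unicodedata.name(ch, "").startswith(LGC_PREFIXES)


def restricted_fold(text, marks=FOLDABLE_MARKS):
    """Remove only the listed marks, but after *any* LGC base letter."""
    out = []
    after_lgc = False  # True while inside the mark sequence of an LGC base
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch) == "Mn":
            if after_lgc and ch in marks:
                continue  # listed mark on an LGC base: remove it
        else:
            after_lgc = is_lgc_letter(ch)
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))
```

The point of the condition-based form is that x + combining acute folds to x even though no such fixed combination would ever be listed, while a mark outside the set (tilde, with this placeholder list) is left alone.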

Rather than list the mappings in a file, we would simply list the conditions, similar to AccentFolding (see http://www.unicode.org/reports/tr30/Foldings.txt), and reduce the data file to those cases where there are no mappings (o with stroke -> o, combining stroke overlay, etc.).

John, you proposed the initial set. Do you have any suggestion here?

A./
