Roozbeh Pournader wrote:
Well, anything that is completely ignored in collation creates problems
with deterministic sorting.

I don't think you mean "deterministic". UCA is deterministic; it just sorts many strings as equal.


There are certain words in Persian, with
completely different meanings, that differ only in a ZWNJ[1]. Having ZWNJ
ignored by default means they may appear in this or that order, possibly
based on the original order of input. I guess this is not what we want for deterministic collation.
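
A minimal sketch of the effect described here, assuming a toy sort key that simply drops ZWNJ (a stand-in for "completely ignored", not the real UCA algorithm). The strings are placeholders, not actual Persian words, and the name `ignoring_key` is hypothetical. Because the two strings get identical keys and Python's sort is stable, the result follows the input order:

```python
ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER


def ignoring_key(s):
    """Toy key that drops ZWNJ entirely (assumption: models 'completely ignored')."""
    return s.replace(ZWNJ, "")


# Placeholder strings that differ only in a ZWNJ (not real Persian words).
with_zwnj = "ab" + ZWNJ + "cd"
without_zwnj = "abcd"

# Same two strings, fed to the sort in opposite orders.
order_a = sorted([with_zwnj, without_zwnj], key=ignoring_key)
order_b = sorted([without_zwnj, with_zwnj], key=ignoring_key)

# The keys are equal, so the stable sort keeps each input order as-is.
print(ignoring_key(with_zwnj) == ignoring_key(without_zwnj))
print(order_a == order_b)
```

Each individual comparison is still deterministic; what varies is the final order of the strings that compare as equal.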


The desired behavior for ZWNJ is to be treated like punctuation: ignored in the first levels, but considered at the end. (Personal note:
write something for UTC on this.)

Possible. I assume that ZWNJ is ignored in UCA because that is the expected behavior for many other languages. Not ignoring ZWNJ is possible with a tailoring that gives it some non-zero weights.
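
A rough sketch of what such a tailoring achieves, assuming a toy two-level key rather than real UCA weights: the first level ignores ZWNJ, and a final level compares the original string by code point so that ZWNJ breaks ties. The name `tailored_key` and the strings are hypothetical placeholders; an actual tailoring would assign collation weights in the collation library being used.

```python
ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER


def tailored_key(s):
    """Toy two-level key (assumption: not real UCA weights).

    Level 1: the string with ZWNJ removed, so ZWNJ is ignored at first.
    Level 2: the original string, so ZWNJ is considered as a tie-breaker.
    """
    return (s.replace(ZWNJ, ""), s)


# Placeholder strings (not real Persian words).
with_zwnj = "ab" + ZWNJ + "cd"
without_zwnj = "abcd"
other = "abce"

# Two different input orders of the same three strings.
order_a = sorted([other, with_zwnj, without_zwnj], key=tailored_key)
order_b = sorted([without_zwnj, other, with_zwnj], key=tailored_key)

# Both input orders now produce the same output order.
print(order_a == order_b)
```

With this key, the ZWNJ variant still sorts next to its ZWNJ-free counterpart and ahead of `other`, but the pair no longer compares as equal, so the result does not depend on input order.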


Note that many languages require tailorings for at least a couple of characters to follow national standards.

markus

--
Opinions expressed here may not reflect my company's positions unless otherwise noted.



