Magda Danish (Unicode) wrote:
 > -----Original Message-----
 > From: Vladimir Ivanov [mailto:[EMAIL PROTECTED]

 > It is clearly seen that there are letters on both sides of ZWNJ within the
 > word boundaries. Placing ZWNJ on an edge of the word doesn’t make sense in
 > Persian. From this point of view ZWNJ should be treated as a special
 > character rather than a delimiter.

The Unicode Collation Algorithm (UCA), for which allkeys.txt is the default weight table, does treat ZWNJ and a number of other characters as special: the UCA ignores them completely, just as if you had stripped them from the text.


 > But in Allkeys Table it is placed on line #68 well before other popular
 > delimiters: HORIZONTAL TABULATION line #192,

The order of entries in allkeys is irrelevant; what is relevant is the assignment of weights, and ZWNJ gets all-zero weights. You need to implement the algorithm, not just reproduce the relative order of entries in the file. (allkeys does sort its entries by shifted, multi-level weights, but the order among same-weight characters does not matter.)
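
To illustrate, here is a minimal Python sketch of how a sort key is built from weights. The table TOY_WEIGHTS and the function sort_key are made up for this example, and the non-zero weight values are invented rather than copied from allkeys.txt; only the all-zero entry for ZWNJ reflects the real table. It also leaves out most of the real algorithm (normalization, contractions, variable weighting).

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Hypothetical weight table: character -> (primary, secondary, tertiary).
# ZWNJ is listed first on purpose; its position in the table plays no role.
TOY_WEIGHTS = {
    ZWNJ: (0x0000, 0x0000, 0x0000),  # all-zero weights: completely ignorable
    "a": (0x1000, 0x0020, 0x0002),   # invented values, for illustration only
    "b": (0x1001, 0x0020, 0x0002),   # invented values, for illustration only
}


def sort_key(text):
    """Build a simplified three-level sort key for text.

    For each level in turn, append every non-zero weight at that level,
    with a 0 separator between levels. Zero weights are skipped, so a
    character whose weights are all zero contributes nothing to the key.
    """
    elements = [TOY_WEIGHTS[ch] for ch in text]
    key = []
    for level in range(3):
        if level > 0:
            key.append(0)  # level separator
        key.extend(w[level] for w in elements if w[level] != 0)
    return tuple(key)


if __name__ == "__main__":
    with_zwnj = "a" + ZWNJ + "b"
    # Both strings produce the same key, so they compare as equal.
    print(sort_key(with_zwnj) == sort_key("ab"))
```

Because the zero weights are dropped while the key is built, the string with ZWNJ and the string without it end up with identical keys, which is what "not there" means in practice.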


 > I’ve solved this problem for myself by placing ZWNJ somewhere after
 > delimiters, but what are the theoretical reasons for putting
 > it before them?
 > In order to get what? In what languages?

"Before" is wrong; see above. Think of ZWNJ as "not there" for UCA.


 > By the way, the sorting algorithm built into MS Windows puts compound words
 > with ZWNJ AFTER their simple components. So in this respect it acts on the
 > principles different from Allkeys Table.

Windows does not implement the Unicode Collation Algorithm, as far as I know.


Best regards,
markus

--
Opinions expressed here may not reflect my company's positions unless otherwise noted.



