> -----Original Message-----
> From: Vladimir Ivanov [mailto:[EMAIL PROTECTED]
> It is clearly seen that there are letters on both sides of ZWNJ within the word boundaries. Placing ZWNJ on an edge of the word doesn’t make sense in Persian. From this point of view ZWNJ should be treated as a special character rather than a delimiter.
The Unicode Collation Algorithm (UCA), for which allkeys.txt is the default weight table, does treat ZWNJ and a number of other characters as special: they are completely ignored by the UCA, just as if you had stripped them from the text.
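To make "same as if you stripped them" concrete, here is a minimal sketch. It is not an implementation of the UCA: toy_key is a hypothetical helper that simply drops ZWNJ and then compares by code point, and the one-element IGNORABLE set is an assumption for illustration (the real set is whatever has all-zero weights in allkeys.txt).

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Assumption: a hand-picked set for illustration only. In the UCA the
# ignorable characters are those with all-zero weights in allkeys.txt.
IGNORABLE = {ZWNJ}

def toy_key(text):
    """Toy sort key: drop ignorable characters, then use raw code points.

    Real UCA keys are built from multi-level weights, not code points.
    """
    return [ord(ch) for ch in text if ch not in IGNORABLE]

# A string with ZWNJ inside it gets the same key as the string without it.
print(toy_key("ab" + ZWNJ + "cd") == toy_key("abcd"))
```

With a key like this, a word containing ZWNJ and the same word without it compare as equal, so ZWNJ neither sorts "before" nor "after" anything.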
> But in Allkeys Table it is placed on line #68 well before other popular delimiters: HORIZONTAL TABULATION line #192,
The order of entries in allkeys is irrelevant; what is relevant is the assignment of weights, and ZWNJ gets all-zero weights. You need to implement the algorithm, not just the relative order of entries in the file. (allkeys does sort its entries by shifted, multi-level weights, but order for same-weight characters does not matter.)
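As a rough sketch of reading weights rather than line positions, the snippet below parses two lines written in the style of allkeys.txt. The ZWNJ line uses the all-zero weights described above; the TAB weights are placeholders I made up, not values from any real version of the file. The bracketed, dot-separated layout and the number of weight fields are assumptions about the file version, and parse_weights and is_completely_ignorable are hypothetical names.

```python
import re

# Sample lines in allkeys.txt style. ZWNJ has all-zero weights; the TAB
# weights are made-up placeholders, NOT data from a real allkeys.txt.
SAMPLE = """\
200C ; [.0000.0000.0000.0000] # ZERO WIDTH NON-JOINER
0009 ; [*0001.0002.0003.0009] # HORIZONTAL TABULATION (placeholder weights)
"""

# One collation element: '[', a '.' or '*' flag, dot-separated hex weights, ']'.
ELEMENT = re.compile(r"\[([.*])([0-9A-Fa-f.]+)\]")

def parse_weights(text):
    """Map each single code point to its list of weight tuples."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line or line.startswith("@"):
            continue  # skip blank lines and directives
        chars, elements = line.split(";", 1)
        code_points = chars.split()
        if len(code_points) != 1:
            continue  # multi-code-point entries are out of scope here
        weights = [tuple(int(w, 16) for w in m.group(2).split("."))
                   for m in ELEMENT.finditer(elements)]
        table[chr(int(code_points[0], 16))] = weights
    return table

def is_completely_ignorable(table, ch):
    """True if every weight of every collation element of ch is zero."""
    return all(w == 0 for element in table[ch] for w in element)

table = parse_weights(SAMPLE)
print(is_completely_ignorable(table, "\u200c"))
print(is_completely_ignorable(table, "\t"))
```

Swapping the two sample lines changes nothing in the resulting table, which is the point: the line number carries no information, only the weights do.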
> I’ve solved this problem for myself by placing ZWNJ somewhere after delimiters, but what are the theoretical reasons for putting it before them? In order to get what? In what languages?
"Before" is wrong; see above. Think of ZWNJ as "not there" for the UCA.
> By the way, the sorting algorithm built into MS Windows puts compound words with ZWNJ AFTER their simple components. So in this respect it acts on the principles different from Allkeys Table.
Windows does not implement the Unicode Collation Algorithm, as far as I know.
Best regards, markus
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.