Wow! Great news for sorting Unicode!

On May 30, 2006, at 5:08 PM, Devin Asay wrote:
> I got your code to work by making some simple changes in the sortCodeFromRussian function:
Devin, I've been processing some bits of UTF-8, and something dawned on me that is probably well known to the Unicode experts.
**** A lexical byte sort of well-formed UTF-8 results in a Unicode code point sort! ****
That avoids the NUL problem in sort, since well-formed UTF-8 contains a zero byte only where the text itself contains U+0000. That means that russianLex() can return the UTF-8 of the string with your character conversions applied.
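For anyone who wants to check the claim, here is a small sketch in Python rather than Revolution script. The sample strings are made up for illustration and cover 1-, 2-, 3-, and 4-byte UTF-8 sequences. It compares a code point sort with a plain byte sort of the UTF-8 encodings, and also checks that none of the encodings contains a zero byte.

```python
# Sample strings spanning 1-, 2-, 3-, and 4-byte UTF-8 sequences (illustrative data).
samples = ["z", "a", "я", "ё", "Ё", "€", "\uffff", "\U0001F600", "ab", "aя"]

# Python compares str values code point by code point.
by_code_point = sorted(samples)

# bytes objects compare byte by byte, which is a plain lexical byte sort.
by_utf8_bytes = sorted(samples, key=lambda s: s.encode("utf-8"))


def has_nul(s):
    """Return True if the UTF-8 encoding of s contains a zero byte."""
    return b"\x00" in s.encode("utf-8")


same_order = by_code_point == by_utf8_bytes
any_nul = any(has_nul(s) for s in samples)
print(same_order, any_nul)
```

Both sorts give the same order, and no zero bytes show up because none of the samples contains U+0000.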
I think the replace command will work with UTF-8, so you can even avoid a character loop. All you need is 34 replaces and then a return. OK, that might actually be slower than a character loop.
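Here is a rough sketch of the idea, again in Python rather than Revolution script. The real russianLex() and sortCodeFromRussian are not shown in this thread, so `russian_lex`, `RUSSIAN_ALPHABET`, and the choice of U+0430 as the starting sort code are all assumptions made for illustration. A single translate pass stands in for the series of replaces; case folding is also my assumption.

```python
# Hypothetical stand-in for the thread's russianLex(); the original is not shown here.
RUSSIAN_ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"  # 33 letters, dictionary order

# Map each letter to a code point whose numeric order matches dictionary order.
# Starting at U+0430 is an arbitrary choice made for this sketch.
_SORT_CODE = {ord(ch): 0x0430 + i for i, ch in enumerate(RUSSIAN_ALPHABET)}


def russian_lex(text):
    """Return a UTF-8 byte key that sorts in Russian dictionary order."""
    # Assumption: case is folded so that "Ё" and "ё" sort together.
    return text.lower().translate(_SORT_CODE).encode("utf-8")


words = ["ёлка", "яблоко", "жук", "еда", "арбуз"]
print(sorted(words))                   # plain code point order puts "ёлка" last
print(sorted(words, key=russian_lex))  # "ёлка" lands between "еда" and "жук"
```

Because the key is UTF-8 bytes, a plain byte sort on it is enough, and there are no zero bytes to trip up the sort.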
Dar
Unicode Sophomore
_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution