On Fri, Aug 04, 2006 at 10:02:58PM -0700, Cory Nelson wrote:
> On 8/4/06, Trevor Talbot <[EMAIL PROTECTED]> wrote:
> >On 8/4/06, Cory Nelson <[EMAIL PROTECTED]> wrote:
> >
> >> But, since you brought it up - I have no expectations of SQLite
> >> integrating a full Unicode locale library, however it would be a great
> >> improvement if it would respect the current locale and use wcs*
> >> functions when available, or at least order by standard Unicode order
> >> instead of completely mangling things on UTF-8 codes.
> >
> >What do you mean by "standard Unicode order" in this context?
> >
>
> Convert UTF-8 to UTF-16 (or both to UCS-4 if you want to be entirely
> correct) while sorting, to at least make them follow the same pattern.
Huh? UTF-8 handled in the naive way (using "memcmp", like sqlite does) will
automagically give you sorting by Unicode code point (probably the only useful
meaning of "standard Unicode order" here). UTF-16 handled in the naive way
(either using "memcmp" or lexicographically on 2-byte integers) will sort
things by code point, mostly, sort of, and otherwise by a weird order that
falls out of details of the UTF-16 standard accidentally (e.g. characters
outside the BMP, which are encoded as surrogate pairs in the range
0xD800-0xDFFF, sort before U+E000..U+FFFF when you compare 16-bit units).[1]

Perhaps you're using a legacy system that standardized on UTF-16 before the
BMP ran out, and want to be compatible with its idiosyncratic sorting; then
converting things to UTF-16 before comparing makes sense. But that's not
really appropriate to make as a general recommendation... better to convert
UTF-16 to UTF-8, if you want to be entirely correct :-).

[1] see e.g. http://icu.sourceforge.net/docs/papers/utf16_code_point_order.html

-- Nathaniel

--
Details are all that matters; God dwells there, and you never get to see Him
if you don't struggle to get them right.
  -- Stephen Jay Gould
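The claim about naive byte-wise ordering is easy to check. The following is a minimal, illustrative Python sketch (not SQLite's code; the sample characters are an arbitrary choice) that sorts the same strings three ways: by true code point, by raw UTF-8 bytes, and by UTF-16 code units. UTF-16 is encoded big-endian here so that comparing bytes matches comparing 16-bit integers. A little-endian memcmp would give yet another, even less meaningful order.

```python
# Illustrative sketch: compare naive byte-wise ordering of UTF-8 and
# UTF-16 encodings against true Unicode code point order.

def by_code_point(s):
    # Python lists of ints compare lexicographically, i.e. by code point.
    return [ord(c) for c in s]

def by_utf8_bytes(s):
    # bytes objects compare lexicographically, like memcmp on encoded data.
    return s.encode("utf-8")

def by_utf16_units(s):
    # Big-endian, so byte order matches comparing 16-bit code units.
    return s.encode("utf-16-be")

# ASCII, Latin-1, CJK, a BMP character above the surrogate range
# (U+FF5E), and two supplementary-plane characters (surrogate pairs).
samples = ["A", "\u00e9", "\u4e2d", "\uff5e", "\U00010000", "\U0001F600"]

cp_order = sorted(samples, key=by_code_point)
utf8_order = sorted(samples, key=by_utf8_bytes)
utf16_order = sorted(samples, key=by_utf16_units)

print([f"U+{ord(s):04X}" for s in utf8_order])
# → ['U+0041', 'U+00E9', 'U+4E2D', 'U+FF5E', 'U+10000', 'U+1F600']
print([f"U+{ord(s):04X}" for s in utf16_order])
# → ['U+0041', 'U+00E9', 'U+4E2D', 'U+10000', 'U+1F600', 'U+FF5E']
```

The UTF-8 ordering matches code point order exactly. In the UTF-16 ordering, U+FF5E lands after the two supplementary characters, because their leading surrogates (0xD800, 0xD83D) compare lower than 0xFF5E.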