On 17 Nov 2009, at 6:37pm, Igor Tandetnik wrote:

> Simon Slavin <slav...@bigfraud.org> wrote:
>> On 17 Nov 2009, at 5:52pm, Igor Tandetnik wrote:
>> 
>>> But for your goals, it has to be sortable, right? In a proper
>>> Unicode collation, U+0041 U+0301 would behave quite differently from
>>> U+0301 U+0041. Consider "A ' E" (where ' stands for a combining
>>> acute accent). In most locales, this would sort between AE and BE.
>>> Now, if we reverse it naively, we'll end up with "E ' A", with the
>>> accent now attached to E and not A. The result would sort between EA
>>> and FA, rather than between EA and EB as you would probably want.   
>> 
>> Obviously, your routine to reverse a string must be unicode-aware. 
> 
> Tim Romano seems to insist on precisely the opposite.

That would be sufficient for Tim, but it's too weak to be useful for many people, 
so it's probably never going to be written.

>> First split the string into characters, then reassemble them in
>> reverse order.
> 
> The problem is, in Unicode it's not quite clear what constitutes a 
> "character". Are we talking about codepoints, sort elements, graphemes? 
> Depending on the application, either definition might make sense.

I agree about the problem, but sort elements are the obvious answer in this 
case.  By the way, for those of you wondering what it would take to support 
Unicode in an index (i.e. to sort Unicode strings), here's an outline of the 
problems involved and what's necessary:

<http://unicode.org/reports/tr10/>
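
To make the "A ' E" example from earlier in the thread concrete, here is a 
minimal sketch in Python.  It is only an illustration, not anything SQLite 
provides: the function names are made up for this example, and treating "a base 
character plus any following marks with a non-zero combining class" as one unit 
is a simplifying assumption.  It is cruder than real grapheme segmentation and 
cruder still than the collation elements described in the report above.

```python
import unicodedata

def reverse_codepoints(s):
    # Naive reversal: every code point is treated as a "character".
    return s[::-1]

def split_combining_sequences(s):
    # Hypothetical helper: group each base character with the combining
    # marks that follow it (non-zero canonical combining class).
    # Assumption: this approximates "characters"; it is not UAX #29
    # grapheme segmentation and not UTS #10 collation elements.
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def reverse_combining_sequences(s):
    # Reverse the order of the groups, keeping each mark after its base.
    return "".join(reversed(split_combining_sequences(s)))

text = "A\u0301E"                          # A, combining acute, E
naive = reverse_codepoints(text)           # accent ends up after E
aware = reverse_combining_sequences(text)  # accent stays with A
```

With the naive version the accent attaches to the E; with the grouped version 
it stays on the A, which is the behaviour Igor's example calls for.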

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
