On 19 Sep 2009, at 3:07am, Igor Tandetnik wrote:

> Simon Slavin wrote:
>> Thanks to you and Jay for explanations.  I hadn't encountered ICU at
>> all before.  Your descriptions make perfect sense and are very
>> interesting since ICU is a good attempt to get around one of the
>> fundamental problems of Unicode.
> Out of curiosity - what do you consider a fundamental problem of
> Unicode? The fact that different people may prefer their strings sorted
> differently?

Only in that it's a fundamental problem with the way Unicode was  
defined.  I completely recognise that the question of sorting cannot  
be answered at the level of characters for the reasons we discussed:  
different alphabets have different meanings for the same characters,  
and Unicode has just one entry for the character.  It might have made  
more sense to define two levels of character definitions: one which  
says what 'c with a hat on' looks like, and another that defines  
alphabets, character alternatives, and where 'c with a hat on' comes  
in various alphabets.
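
As a rough illustration of why one code point cannot carry a single sort
position, here is a small Python sketch.  The two key functions and the
alphabet string are hypothetical helpers written for this example, not
part of ICU or SQLite; they only mimic the well-known difference that
German dictionary order files 'ä' with 'a' while Swedish places it
after 'z'.

```python
import unicodedata

# Hypothetical per-alphabet ordering: Swedish puts å, ä, ö after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"


def german_key(word):
    """Dictionary-style key: 'ä' is filed with 'a' (accents ignored first)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # The original word breaks ties so the result is deterministic.
    return (base, word)


def swedish_key(word):
    """Key that ranks each letter by its position in the Swedish alphabet.

    Raises ValueError for characters outside SWEDISH_ALPHABET; a real
    collator would need a fallback, which this sketch omits.
    """
    return [SWEDISH_ALPHABET.index(ch) for ch in word.lower()]


words = ["zebra", "\u00e4pple", "apple"]   # "\u00e4" is 'ä'
german_order = sorted(words, key=german_key)
swedish_order = sorted(words, key=swedish_key)
```

The same three strings come out in a different order depending on which
key is used, which is the job a locale-aware collation library takes
over in practice.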

The problem I was referring to is that there's no consistent way of  
picking up which characters are variants of other characters.  In the  
Roman alphabet, it would be very useful to be able to look at the  
codes for 'l' and capital 'L' and realise that they're somehow the  
same.  In Hebrew it would be useful to be able to match not only capital  
and lower-case characters, but also the variants used when a character  
occurs at the end of a word.
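
The kind of matching described above can be sketched in Python.  The
standard library calls used (str.casefold, unicodedata.name,
unicodedata.lookup) are real; the helper names fold_variant and
same_letter are made up for this example, and deriving the base letter
from the character name is only an illustrative assumption, not a
standard Unicode operation.

```python
import unicodedata


def fold_variant(ch):
    """Map a single character to a representative 'base' form.

    Case folding handles 'L' vs 'l' and also Greek final sigma.  Hebrew
    has no case distinction, so casefold() leaves it alone; the final
    forms are handled here by the (assumed) trick of dropping "FINAL "
    from the Unicode character name and looking the result up again.
    """
    folded = ch.casefold()
    if len(folded) != 1:
        return folded  # e.g. 'ß' folds to 'ss'; nothing more to do
    name = unicodedata.name(folded, "")
    if name.startswith("HEBREW LETTER FINAL "):
        return unicodedata.lookup(name.replace("FINAL ", "", 1))
    return folded


def same_letter(a, b):
    """True if two characters are variants of the same letter."""
    return fold_variant(a) == fold_variant(b)


latin_match = same_letter("l", "L")
hebrew_match = same_letter("\u05da", "\u05db")  # FINAL KAF vs KAF
greek_match = same_letter("\u03c2", "\u03a3")   # final sigma vs capital sigma
```

Note that the relationship is not visible from the code point values
themselves; it has to come from separate tables such as the case-folding
data or, in this sketch, the character names.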

ICU is a great way to approach these problems and similar ones.  I  
have no problem with it.

On 19 Sep 2009, at 3:17am, Roger Binns wrote:

> Errr, this is not the fault of Unicode.

Your reaction to my post is amusingly similar to my reaction when  
people assume that database synchronisation is simple.

Sorry to have irritated you.  I understand Unicode in more detail than  
we've discussed here.  I do not consider these things to be 'the fault  
of Unicode' but rather, in the words I used, 'problems with Unicode'.  And  
I do consider Unicode to be far superior to the mess of code pages we  
used to have to implement before it became popular.

sqlite-users mailing list
