As Igor mentioned in a previous post, Hebrew has no capital and lower-case
versions of a letter. It has only what's called Mantzpach: five letters
that take a different form at the end of a word. IIRC this is never an
issue for sorting, since each final form is encoded right next to its
regular letter (immediately before it, in Unicode) and has its own
character value (unlike capital/lower-case letters in English, where the
two cases sit in separate ranges). Niqqud (the diacritical signs) are
considered characters in their own right, so you should either strip them
(a common solution) or take them into account; here I'm not sure what the
default behavior is.
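
To make the two points above concrete, here is a minimal Python sketch of
stripping niqqud and folding the five final forms to their regular letters.
The helper names (strip_niqqud, fold_hebrew) and the mapping table are my
own illustration, not part of SQLite or ICU, and the sketch assumes that
treating a final form as equal to its regular letter is what you want.

```python
import unicodedata

# Final-form letter -> regular letter (the five "Mantzpach" letters).
# Hypothetical helper table for illustration only.
FINAL_TO_REGULAR = {
    "\u05DA": "\u05DB",  # final kaf   -> kaf
    "\u05DD": "\u05DE",  # final mem   -> mem
    "\u05DF": "\u05E0",  # final nun   -> nun
    "\u05E3": "\u05E4",  # final pe    -> pe
    "\u05E5": "\u05E6",  # final tsadi -> tsadi
}


def is_hebrew_mark(ch):
    """True for combining marks (niqqud, cantillation) in the Hebrew block."""
    return "\u0591" <= ch <= "\u05C7" and unicodedata.category(ch) == "Mn"


def strip_niqqud(text):
    """Remove Hebrew combining marks; every other character is kept."""
    return "".join(ch for ch in text if not is_hebrew_mark(ch))


def fold_hebrew(text):
    """Strip niqqud, then replace each final form with its regular letter."""
    return "".join(FINAL_TO_REGULAR.get(ch, ch) for ch in strip_niqqud(text))
```

fold_hebrew is meant for building comparison keys only; the stored text
should keep its final forms and points.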

Also, since Hebrew letters are only used in Hebrew texts (unlike Latin
characters), sorting is hardly an issue if you have a Unicode representation
of the character (usually a two-byte one). The issues ICU is more likely to
help you with are logical-to-visual conversion (and back) and other BiDi
issues.
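
As a rough illustration of the sorting point, the sketch below uses Python's
built-in sqlite3 module (not ICU) to compare the default binary ordering
with a custom collation that ignores niqqud. The collation name
HEBREW_NOPOINTS, the table, and the sample words are all made up for the
example.

```python
import sqlite3
import unicodedata


def without_points(text):
    """Drop Hebrew combining marks (niqqud, cantillation) from text."""
    return "".join(
        ch for ch in text
        if not ("\u0591" <= ch <= "\u05C7" and unicodedata.category(ch) == "Mn")
    )


def hebrew_nopoints(a, b):
    """Collation callback: compare by code point after removing the marks."""
    ka, kb = without_points(a), without_points(b)
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
conn.create_collation("HEBREW_NOPOINTS", hebrew_nopoints)
conn.execute("CREATE TABLE words (w TEXT)")

ba = "\u05D1\u05D0"                  # bet, alef (unpointed)
bat_pointed = "\u05D1\u05B7\u05EA"   # bet, patah, tav (pointed)
conn.executemany("INSERT INTO words VALUES (?)", [(ba,), (bat_pointed,)])

# Default BINARY ordering: the patah (U+05B7) sorts before every letter,
# so the pointed word comes first.
binary = [r[0] for r in conn.execute("SELECT w FROM words ORDER BY w")]

# Custom collation: points are ignored, so alef sorts before tav.
collated = [
    r[0]
    for r in conn.execute(
        "SELECT w FROM words ORDER BY w COLLATE HEBREW_NOPOINTS"
    )
]
```

With unpointed text the two orderings agree, which is the sense in which
plain code point ordering is usually good enough for Hebrew.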

That's my own grasp of things, never had to use ICU myself.

Itamar.

-----Original Message-----
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Simon Slavin
Sent: Saturday, September 19, 2009 6:44 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Most wanted features of SQLite ?


On 19 Sep 2009, at 3:07am, Igor Tandetnik wrote:

> Simon Slavin wrote:
>> Thanks to you and Jay for explanations.  I hadn't encountered ICU at 
>> all before.  Your descriptions make perfect sense and are very 
>> interesting since ICU is a good attempt to get around one of the 
>> fundamental problems of Unicode.
>
> Out of curiosity - what do you consider a fundamental problem of 
> Unicode? The fact that different people may prefer their strings 
> sorted differently?

Only in that it's a fundamental problem with the way Unicode was defined.  I
completely recognise that the question of sorting cannot be answered at the
level of characters for the reasons we discussed: different alphabets have
different meanings for the same characters, and Unicode has just one entry
for the character.  It might have made more sense to define two levels of
character definitions: one which says what 'c with a hat on' looks like, and
another that defines alphabets, character alternatives, and where 'c with a
hat on' comes in various alphabets.

The problem I was referring to is that there's no consistent way of picking
up which characters are variants of other characters.  In the Roman
alphabet, it would be very useful to be able to look at the codes for 'l'
and capital 'L' and realise that they're somehow the same.  In Hebrew it
would be useful to be able to match not only capital and lower-case
characters, but also the variants used when a character occurs at the end
of a word.
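
For readers who want to see what the character-level data does and does not
relate, here is a small Python sketch using the standard unicodedata module.
It is only an illustration of the kinds of variants mentioned above, not a
description of how ICU works.

```python
import unicodedata

# Case variants: Unicode case folding maps 'L' and 'l' to the same string.
same_letter = "L".casefold() == "l".casefold()

# 'c with a hat on' (U+0109) canonically decomposes into a plain 'c'
# followed by a combining circumflex (U+0302).
decomposed = unicodedata.normalize("NFD", "\u0109")
base_letter = decomposed[0]

# Hebrew final kaf (U+05DA) has no decomposition mapping and no case
# mapping, so nothing at this level ties it to regular kaf (U+05DB).
final_kaf_decomposition = unicodedata.decomposition("\u05DA")
final_kaf_folded = "\u05DA".casefold()
```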

ICU is a great way to approach these problems and similar ones.  I have no
problem with it.


On 19 Sep 2009, at 3:17am, Roger Binns wrote:

> Errr, this is not the fault of Unicode.

Your reaction to my post is amusingly similar to my reaction when people
assume that database synchronisation is simple.

Sorry to have irritated you.  I understand Unicode in more detail than we've
discussed here.  I do not consider these things to be 'the fault of Unicode'
but rather, in the words I used, 'problems with Unicode'.  And I do consider
Unicode to be far superior to the mess of code pages we used to have to
implement before it became popular.

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


