On Mon, Nov 30, 2009 at 08:56:48PM +0000, Simon Slavin wrote:
> On 30 Nov 2009, at 6:49pm, Nicolas Williams wrote:
> > See my previous message: it would make no sense to have a column with
> > data-dependent collations.  But perhaps I'm missing something.  Can you
> > describe the semantics of data-dependent collations?
> 
> Suppose I am preparing a sales system.  It can be used by users in
> many countries, but a user in each country sees only products sold in
> that country, and they see those products sorted into alphabetic order
> -- however their own country sorts its alphabet.  So in one table I
> have
> 
> Product ID    Market          Product name
> ----------    ------          ------------
> 1             US              Box
> 2             US              Bible
> 3             US              Barrel
> 4             France          BoƮte
> 5             France          Bible
> [...]
> 
> So in the same column I have words in three different languages.  If
> German had a different alphabet to English, depending on which
> customer I am SELECTing for I will need to use a different collation
> to sort the results.  Of course, I expect this is relatively rare, and
> can be perhaps be handled using VIEWs or some other trick, but I
> wanted to demonstrate that it is sometimes needed.

This falls into case (b): localization.  Here it'd be nice if SQLite3
allowed collations to be named via parameters, so you could compile a
query once.  But it does not, so you need to compile a query for each
language.  But that's as far as it goes: you're still specifying the
collation rather than letting the data specify a collation.

I.e., it'd be nice if you could compile

    SELECT product_name FROM products
        WHERE market = :country ORDER BY product_name COLLATE :lang ASC;

bind in the country and language, and then execute, but as it is SQLite3
doesn't let you parametrize collations.  So you have to mprintf and
compile a separate query for each collation.

Oh, I suppose one might want to allow a collation to be the result of
another query, so that you could lookup collations by country code, or
what have you

    SELECT product_name FROM products
        WHERE market = :country
        ORDER BY (SELECT lang FROM country_to_lang WHERE country = :country)

But that's still not a collation that is derived from the data being
sorted, but from other data.

This:

    SELECT product_name FROM products
        WHERE market = :country ORDER BY lang_of(product_name);

makes no sense!  That's because nothing guarantees that
lang_of(product_name) will be singular here.  (Also, not only does it
make no sense, it doesn't help you since SQLite3 will not compile such a
query.)

> >>> Note too that Unicode has codepoints for specifying the language that
> >>> the subsequent text is written in.
> > 
> > http://unicode.org/unicode/faq/languagetagging.html#2
> > http://www.unicode.org/versions/Unicode5.0.0/ch16.pdf#G17521
> > (section 16.9)
> 
> Many thanks.  I did not know this and it saves me from having to
> invent my own system in something else I am doing.  I do understand
> why it should not normally be used and I will not over-use it.

I don't think you should use Unicode language tags though.  See above
and previous messages.

> On 30 Nov 2009, at 7:16pm, Nuno Lucas wrote:
> > Note that some countries have different collations depending on the 
> > objective of the output. For example, dictionary order may be different 
> > from phone-book order and different from other general listing order.
> 
> Arghhhhhhhh !

Yes, it all sucks.  At least some language academies are now working to
reduce such problems.  For example, it used to be that case folding is
Spanish did preserve accents in the to-upper-case direction, and Spanish
used to have special sort orders for 'ch' and 'll' (which were as though
a single character).

Nico
-- 
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to