Re: [sqlite] Custom Collation comparing only firt character?

Roberto Colnaghi Fri, 26 Aug 2011 04:04:49 -0700

Thank you for your detailed reply.
Unfortunately I cannot use DLLs, since the target operating system is iPhone
iOS (Mac OS X). I was hoping for a collation callback that is called for all
characters, not only the first.
For my subset of data, it fits just perfectly. All compared fields are UTF-8
VARCHAR.
Shouldn't the callback registered with sqlite3_create_collation be called for
every single character? Let's say the names being compared are "São Paulo" and
"Santos":

SELECT * FROM Game WHERE TeamHome = 'SANTOS' COLLATE anyCIAI;

The LOG function shows a comparison between S and the other first character only:

41 65 A - 53 83 S = -18
43 67 C - 53 83 S = -16
46 70 F - 53 83 S = -13
53 83 S - 53 83 S = 0
49 73 I - 53 83 S = -10
46 70 F - 53 83 S = -13
50 80 P - 53 83 S = -3
43 67 C - 53 83 S = -16
47 71 G - 53 83 S = -12

I was expecting it to go further in the comparison: "São Paulo" and "Santos"
should log S - S, ã - a, o - n -> stops here, not what you're looking for.
When using it on ORDER BY, it is clear that only the first char is compared.
Since ICU is not an option for iPhone, I've run out of options and ideas here.
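
For reference, SQLite invokes a collation callback once per pair of strings, passing both byte lengths and both pointers, so walking through the characters is the callback's own job. A minimal sketch of such a callback follows, under these assumptions: the text is UTF-8, the names utf8_next, fold_ciai and any_ciai_compare are hypothetical, and the folding table covers only ASCII letters and the accented letters in U+00C0..U+00FF. It is an illustration, not a complete Unicode collation.

```c
/* Decode one UTF-8 code point from s[0..n) into *cp and return the number of
   bytes consumed. Continuation bytes are not validated; a stray or truncated
   lead byte is consumed alone and reported as U+FFFD. */
static int utf8_next(const unsigned char *s, int n, unsigned *cp)
{
    int len = s[0] < 0x80 ? 1
            : (s[0] & 0xE0) == 0xC0 ? 2
            : (s[0] & 0xF0) == 0xE0 ? 3
            : (s[0] & 0xF8) == 0xF0 ? 4 : 1;
    if (len > n) len = 1;
    if (len == 1) { *cp = s[0] < 0x80 ? s[0] : 0xFFFD; return 1; }
    *cp = s[0] & (0xFF >> (len + 1));          /* payload bits of lead byte */
    for (int i = 1; i < len; i++) *cp = (*cp << 6) | (s[i] & 0x3F);
    return len;
}

/* Fold a code point to an unaccented upper-case ASCII letter where the
   (deliberately small) table knows one; otherwise return it unchanged. */
static unsigned fold_ciai(unsigned cp)
{
    /* Base letters for U+00C0..U+00FF in code point order; '?' = no folding. */
    static const char base[] =
        "AAAAAA?CEEEEIIII" "?NOOOOO?OUUUUY??"
        "AAAAAA?CEEEEIIII" "?NOOOOO?OUUUUY?Y";
    if (cp >= 'a' && cp <= 'z') return cp - 'a' + 'A';
    if (cp >= 0xC0 && cp <= 0xFF && base[cp - 0xC0] != '?')
        return (unsigned)base[cp - 0xC0];
    return cp;
}

/* Same parameter list as the comparison callback that
   sqlite3_create_collation expects: user data, then length and pointer
   for each of the two strings. */
int any_ciai_compare(void *unused, int n1, const void *p1,
                     int n2, const void *p2)
{
    const unsigned char *a = p1, *b = p2;
    int i = 0, j = 0;
    (void)unused;
    while (i < n1 && j < n2) {                 /* loop over ALL characters */
        unsigned ca, cb;
        i += utf8_next(a + i, n1 - i, &ca);
        j += utf8_next(b + j, n2 - j, &cb);
        ca = fold_ciai(ca);
        cb = fold_ciai(cb);
        if (ca != cb) return ca < cb ? -1 : 1; /* first difference decides */
    }
    /* One string is a prefix of the other: the shorter one sorts first. */
    return (i < n1) - (j < n2);
}
```

Such a function would be registered with something like sqlite3_create_collation(db, "anyCIAI", SQLITE_UTF8, NULL, any_ciai_compare). With it, "São Paulo" against "Santos" compares S - S, ã - a, o - n and stops at the third character.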
Thanks again,

Date: Thu, 25 Aug 2011 13:30:57 +0200
To: colna...@msn.com
From: j...@antichoc.net
Subject: Re: [sqlite] Custom Collation comparing only firt character?



Hi Roberto,


It all depends on your data source(s).  If you're sure you have all
accented codepoints dealt with in your custom collation, then it may be
enough.  But if there is any possibility that your application will some
day have to deal with codepoints that you didn't consider in your
collation, then you're going to have to change it, possibly several times,
while the app is in the wild.  That may be a serious issue with
embedded systems ...


FYI I am forwarding you a download link to a small SQLite extension I wrote
for dealing with Unicode text from several locales.  Brief background:
my wife and I run an e-shop.  We have customers in 27 countries and
suppliers in India and China, among other regions.  For instance I
had to syndicate catalogs and price lists from several Indian sources,
some of them written in Indian scripts and using Indian digits.


In short, the extension offers locale-independent unaccenting, casing
and collation(s).  There are a number of other text-related
functions inside as well, like a locale-independent fuzzy
compare.


You can download the extension
here.


It isn't a replacement for ICU: ICU is a _huge_ beast, is slow and
requires you to select a _specific_ locale to work with for every
operation.  My extension is oriented towards locale independence,
allowing you to perform operations on columns holding text from anywhere
(using any Unicode codepoint).  Of course locale-independent
collation is less than perfect when you focus on a given specific locale:
for that ICU is way better.  The full Windows x86 footprint is
<180 KB and even includes functions for both UTF-8 and UTF-16 text to avoid
back and forth UTF conversions.  Compare that to the 16+ MB of ICU...


While it's likely that the whole package is not a 100% fit for your use
case, you can still grab ideas, code and tables from the included
source.  Be sure to take the time to read the explanations at the top of
the C source.


The archive comes with a ready-to-use x86 Windows DLL which allows you to
play with the various functions without requiring you to write a single
line of code, for instance using a third-party SQLite manager like SQLite
Expert (by far my favorite at any rate).


Feel free to do whatever you want with the source but please report bugs
and/or issues.

Don't hesitate to chime in here should you have any questions.


Best regards.


--

j...@antichoc.net                                         
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
