Thank you for your detailed reply. Unfortunately I cannot use DLLs, since the target is an iPhone (iOS / Mac OS X operating system).

I was hoping for a collation callback that is called for all characters, not only the first. For my subset of data, my custom collation fits perfectly. All compared fields are UTF-8 VARCHAR. Shouldn't the comparison callback registered through sqlite3_create_collation be called for every single character? Let's say the names being compared are "São Paulo" and "Santos".

-> SELECT * FROM Game WHERE TeamHome = 'SANTOS' COLLATE anyCIAI;
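For reference, here is a minimal sketch of a comparison callback with the signature that sqlite3_create_collation expects. As documented, SQLite invokes the callback once per pair of strings and passes both values in full, together with their byte lengths, so any character-by-character walk has to happen inside the callback itself. The names any_ciai_compare and fold_next and the small Latin-1 folding table are assumptions made for illustration; this is not the code discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: read one code point at s[*i] (only 1- and 2-byte
   UTF-8 sequences are decoded) and fold it to an unaccented uppercase
   ASCII letter where this small table knows a mapping. Anything else is
   returned unchanged. Advances *i past the bytes consumed. */
static unsigned fold_next(const unsigned char *s, int n, int *i)
{
    unsigned c = s[(*i)++];
    if (c >= 0xC2 && c <= 0xDF && *i < n && (s[*i] & 0xC0) == 0x80)
        c = ((c & 0x1F) << 6) | (s[(*i)++] & 0x3Fu);

    if (c >= 'a' && c <= 'z') return c - 'a' + 'A';
    /* Latin-1 lowercase (U+00E0..U+00FE, except the division sign)
       maps to its uppercase form 0x20 below. */
    if (c >= 0xE0 && c <= 0xFE && c != 0xF7) c -= 0x20;
    if (c >= 0xC0 && c <= 0xC5) return 'A';
    if (c == 0xC7)              return 'C';
    if (c >= 0xC8 && c <= 0xCB) return 'E';
    if (c >= 0xCC && c <= 0xCF) return 'I';
    if (c == 0xD1)              return 'N';
    if (c >= 0xD2 && c <= 0xD6) return 'O';
    if (c >= 0xD9 && c <= 0xDC) return 'U';
    return c;
}

/* Same parameter list as the xCompare argument of
   sqlite3_create_collation(). The strings are given by pointer and byte
   length and are not necessarily NUL-terminated. Returns negative,
   zero, or positive. */
int any_ciai_compare(void *ctx, int n1, const void *p1,
                     int n2, const void *p2)
{
    const unsigned char *a = p1;
    const unsigned char *b = p2;
    int i = 0, j = 0;
    (void)ctx;                       /* unused in this sketch */
    assert(n1 >= 0 && n2 >= 0);

    /* Walk both strings in full; stop at the first folded difference. */
    while (i < n1 && j < n2) {
        unsigned ca = fold_next(a, n1, &i);
        unsigned cb = fold_next(b, n2, &j);
        if (ca != cb) return ca < cb ? -1 : 1;
    }
    if (i < n1) return 1;            /* a has characters left over */
    if (j < n2) return -1;           /* b has characters left over */
    return 0;
}

/* Registration would look roughly like this (needs sqlite3.h and an
   open database handle db):
   sqlite3_create_collation(db, "anyCIAI", SQLITE_UTF8, NULL,
                            any_ciai_compare);                      */
```

With a callback shaped like this, "São Paulo" and "Santos" compare equal on S and on ã/a and only differ at o versus n, while 'SANTOS' and "Santos" compare as equal.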
The LOG function shows a comparison between S and the first character only:

41 65 A - 53 83 S = -18
43 67 C - 53 83 S = -16
46 70 F - 53 83 S = -13
53 83 S - 53 83 S = 0
49 73 I - 53 83 S = -10
46 70 F - 53 83 S = -13
50 80 P - 53 83 S = -3
43 67 C - 53 83 S = -16
47 71 G - 53 83 S = -12

I was expecting it to go further in the comparison: "São Paulo" and "Santos" should log - Sã - ao - n -> stops here, not what you're looking for.

When using it in ORDER BY, it is clear that only the first character is compared. Since ICU is not an option on the iPhone, I've run out of options and ideas here.

Thanks again,

Date: Thu, 25 Aug 2011 13:30:57 +0200
To: colna...@msn.com
From: j...@antichoc.net
Subject: Re: [sqlite] Custom Collation comparing only firt character?

Hi Roberto,

It all depends on your data source(s). If you're sure that your custom collation deals with all the accented codepoints you have, then it may be enough. But if there is any possibility that your application will some day have to deal with codepoints you didn't consider in your collation, then you're going to have to change it, possibly several times, while the app is in the wild. That may be a serious issue with embedded systems ...

FYI, I am forwarding you a download link to a small SQLite extension I wrote for dealing with Unicode text from several locales. Brief background: my wife and I run an e-shop. We have customers in 27 countries and suppliers in India and China, among other regions. For instance, I had to syndicate catalogs and price lists from several Indian sources, some of them written in Indian scripts and using Indian digits.

In short, the extension offers locale-independent unaccenting, casing and collation(s). There are a number of other text-related functions inside as well, like a locale-independent fuzzy compare. You can download the extension here.

It isn't a replacement for ICU: ICU is a _huge_ beast, is slow and requires you to select a _specific_ locale to work with for every operation.
My extension is oriented towards locale independence, allowing you to perform operations on columns holding text from anywhere (using any Unicode codepoint). Of course, locale-independent collation is less than perfect when you focus on a given specific locale: for that, ICU is way better. The full Windows x86 footprint is <180 KB and even includes functions for both UTF-8 and UTF-16 text to avoid back-and-forth UTF conversions. Compare that to the 16+ MB of ICU...

While it's likely that the whole baby will not be a 100% fit for your use case, you can still grab ideas, code and tables from the included source. Be sure to take the time to read the explanations at the top of the C source.

The archive comes with a ready-to-use x86 Windows DLL which allows you to play with the various functions without writing a single line of code, for instance using a third-party SQLite manager like SQLite Expert (by far my favorite at any rate).

Feel free to do whatever you want with the source, but please report bugs and/or issues. Don't hesitate to chime in here should you have any questions.

Best regards.

--
j...@antichoc.net
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users