For those who were interested in the discussion about iOS NOCASE collation
<http://sqlite.1065341.n5.nabble.com/Collation-advice-td70668.html>, here
is some news.

First of all, we made the original algorithm safer. The most important
step was to limit the set of characters to which plain ASCII comparison is
applied (now only lowercase a-z, digits 0-9, and space). This was done by
modifying the SAFE_CHAR macro:

#define SAFE_CHAR(x)    ( (0x61<=(x) && (x)<=0x7A)  ||  (0x30<=(x) && (x)<=0x39)  ||  (x)==32 )
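
To illustrate what the macro accepts, here is a minimal sketch. The helper
all_safe() is hypothetical and not part of the original code; it only assumes
that plain byte comparison would be attempted when every byte of a string
passes SAFE_CHAR, with everything else left to the system comparison.

```c
#include <stddef.h>

#define SAFE_CHAR(x)    ( (0x61<=(x) && (x)<=0x7A)  ||  (0x30<=(x) && (x)<=0x39)  ||  (x)==32 )

/*
 * Hypothetical helper (an assumption, not the original algorithm):
 * returns 1 if every one of the n bytes in s is lowercase a-z,
 * a digit 0-9, or a space, and 0 otherwise.
 */
static int all_safe(const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Cast so that bytes >= 0x80 (UTF-8 sequences) are not negative. */
        unsigned char c = (unsigned char)s[i];
        if (!SAFE_CHAR(c))
            return 0;
    }
    return 1;
}
```

Note that uppercase letters are deliberately rejected: in a case-insensitive
collation 'B' must sort after 'a', which a raw byte comparison would get wrong.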

After running a huge number of tests, the code was released to production.
However, from time to time we received reports of SQLite crashes. Closer
analysis showed integrity problems in the indexes.

At that point we decided to run collation tests (checking reflexivity,
symmetry, and transitivity) on randomly generated strings, and we found
things like this:

'qsnB;QQ' < 'qsS1&u4'  &&  'qsS1&u4' < 'qß:0,hg', but 'qsnB;QQ' > 'qß:0,hg' !!!

It is hard to believe that a system comparison function can violate
transitivity like this, but this is how the iOS call
CFStringCompare(s1, s2,
kCFCompareCaseInsensitive|kCFCompareLocalized|kCFCompareDiacriticInsensitive|kCFCompareNonliteral);
works.

This particular problem is caused by the kCFCompareLocalized flag. Needless
to say, there is no documentation that would warn you about it.

I thought this was worth publishing, as it may save quite a few headaches for
iOS programmers who need to write their own sorting algorithm.




