>So, for example, if one wanted to find all rows where myNormalColumn
>ENDS WITH 'fi c d', one could search myFlippedColumn like this:
>
>select * from LEXICON where myFlippedColumn LIKE 'd c if%' -- allows index use

Make this

select * from LEXICON where myFlippedColumn LIKE flip('fi c d') || '%'

and you get rid of _this_ issue. But if you happen to have the decomposed A grave 'À' that Igor used as an example stored as a single codepoint (or vice versa), or with any spacing modifier (or half an infinity of them!), then you lose any chance of a match. Also, as Igor just replied, collation wouldn't work nicely.

>This doesn't really require combining-form intelligence on the part of
>the developer's code either. As long as the search-term on the RHS gets
>flipped codepoint-by-codepoint and no attempt is made to "be
>intelligent" about the combining form, everything will be honky-dory.

That seems to me another good instance of the "know your data" rule. The best bet for a given proprietary database would be to work with strings conforming to some set of well-defined rules and stick with them, at least for data subject to comparison. The rules don't even have to be one of the Unicode normalization forms; any consistent invariant that fits the needs will do, the simpler the better of course. If collation is needed, then much more complex flipping is required in the general case.

Anyway, since the vast majority of DB applications appear to be in the business area, is there a common need to work with anything other than the most compact and easy-to-handle NFC (Normalization Form C) strings, possibly with exotic spacing or modifiers filtered out, at the DB storage level? I mean this for the "typical" data one is likely to index, search and compare in most applications.

BTW, this raises a side question. Without hijacking the thread, I for one would be interested to know how other major RDBMSs handle Unicode data in their predefined fixed-size CHAR(25) columns. My wild guess is that the filtering layers apply a severe filter to every input field, to avoid having 12 significant characters represented by a 453-codepoint sequence and truncated to the first 25 codepoints, several of them non-informational.
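
For what it's worth, here is a minimal sketch of the "pick one invariant and stick with it" idea, in Python with the standard sqlite3 and unicodedata modules. LEXICON, myNormalColumn and myFlippedColumn come from the thread; the body of flip(), the ends_with() helper, the index name and the sample rows are only my assumptions for illustration. It assumes NFC is the agreed storage form and does nothing about collation or about '%' and '_' inside the search term.

```python
import sqlite3
import unicodedata


def flip(text):
    # Assumption: NFC is the agreed storage invariant. Normalize first,
    # then reverse the string codepoint by codepoint.
    return unicodedata.normalize("NFC", text)[::-1]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LEXICON (myNormalColumn TEXT, myFlippedColumn TEXT)")
# Hypothetical index name; the flipped column is the one being searched.
conn.execute("CREATE INDEX lexicon_flipped ON LEXICON (myFlippedColumn)")

# Sample rows: the first one arrives decomposed (a + combining grave),
# the second one precomposed (single codepoint A grave).
samples = ["a\u0300 fi c d", "\u00c0 c d", "fi c d", "unrelated"]
for raw in samples:
    nfc = unicodedata.normalize("NFC", raw)
    conn.execute("INSERT INTO LEXICON VALUES (?, ?)", (nfc, flip(nfc)))


def ends_with(suffix):
    # An "ends with" search becomes a prefix search on the flipped column.
    # The pattern is built on the application side and bound as a parameter.
    pattern = flip(suffix) + "%"
    rows = conn.execute(
        "SELECT myNormalColumn FROM LEXICON "
        "WHERE myFlippedColumn LIKE ? ORDER BY myNormalColumn",
        (pattern,),
    )
    return [row[0] for row in rows]


print(ends_with("fi c d"))
# The search term is decomposed here, the stored row is precomposed.
print(ends_with("A\u0300 c d"))
```

Because both the stored values and the search term go through the same normalization before being flipped, the decomposed search term still finds the precomposed row. Whether SQLite actually uses the index for such a LIKE depends on its LIKE optimization rules (case sensitivity and collation among them), so it is worth checking with EXPLAIN QUERY PLAN rather than taking it for granted.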
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users