On Tue, Jul 24, 2012 at 8:29 AM, Steven E. Harris <s...@panix.com> wrote:
> Richard Hipp <drh-czdrofg0bjidnm+yrof...@public.gmane.org> writes:
>
>> Many keys can be decoded, but for TEXT keys with application-defined
>> collating sequences, the encoding is not reversible. Many applications
>> will use the ucol_getSortkey() interface to ICU to get the TEXT keys,
>> and there is no way to reverse that encoding to recover the original
>> text.
>
> I take it that in this case, the key encoding is not bijective; rather,
> the encoding is a surjective function, collapsing multiple source values
> into the same encoded value.
If you want normalization-insensitive matching/collating, or if you want
case-insensitivity (and the standard Unicode collations basically are),
then it does indeed follow that getSortkey cannot be injective: you can
construct two distinct strings that compare equal, and therefore the
getSortkey output for them must also be equal. (Strictly speaking, it is
this many-to-one property, not surjectivity, that collapses multiple
source values into one key and makes the encoding irreversible.)

If we ignore normalization you can have an injective, and therefore
reversible, getSortkey type of function for *some* collations, though
there's little value in making it easy to invert given that in the common
case getSortkey must be many-to-one.

Nico
--
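To make the "two strings that compare equal" argument concrete, here is a minimal sketch. It does not use ICU at all: `toy_sort_key` is a hypothetical stand-in built on Python's standard `unicodedata` module, and it only approximates a normalization- and case-insensitive collation. It is enough to show three distinct inputs collapsing to one key, so that no decoder could recover which one was stored.

```python
import unicodedata


def toy_sort_key(text: str) -> bytes:
    """Hypothetical stand-in for a collation sort key (not ICU's algorithm)."""
    # Canonical decomposition: precomposed "é" becomes "e" + COMBINING ACUTE.
    decomposed = unicodedata.normalize("NFD", text)
    # Case folding: "A" and "a" become the same character.
    folded = decomposed.casefold()
    # Normalize once more, since case folding can disturb normalization.
    return unicodedata.normalize("NFD", folded).encode("utf-8")


precomposed = "caf\u00e9"   # "é" as a single code point
combining = "cafe\u0301"    # "e" followed by COMBINING ACUTE ACCENT
upper = "CAF\u00c9"         # upper-case, precomposed

# Three distinct source strings...
distinct_inputs = {precomposed, combining, upper}
# ...but only one distinct key, so the mapping is many-to-one.
keys = {toy_sort_key(s) for s in distinct_inputs}

print(len(distinct_inputs), len(keys))
```

Given the single key, there is no way to tell which of the three strings produced it, which is the irreversibility described in the quoted message.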