On Tue, Jul 24, 2012 at 8:29 AM, Steven E. Harris <s...@panix.com> wrote:
> Richard Hipp <drh-czdrofg0bjidnm+yrof...@public.gmane.org> writes:
>> Many keys can be decoded, but for TEXT keys with application-defined
>> collating sequences, the encoding is not reversible. Many applications
>> will use the ucol_getSortKey() interface to ICU to get the TEXT keys,
>> and there is no way to reverse that encoding to recover the original
>> text.
>
> I take it that in this case, the key encoding is not bijective; rather,
> the encoding is a surjective function, collapsing multiple source values
> into the same encoded value.

If you want normalization-insensitive matching/collating, or if you
want case-insensitivity (and the standard Unicode collations basically
are), then it does indeed follow that getSortKey cannot be injective:
you can construct two distinct strings that compare equal, and
therefore the getSortKey output for them must also be equal.  (Strictly
speaking the property in question is "non-injective", i.e. many-to-one,
rather than "surjective".)  If we ignore normalization you can have an
injective, and so reversible, getSortKey type of function for *some*
collations, though there's little value in making it easy to invert
given that in the common case getSortKey must be many-to-one.
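
To make that concrete, here is a minimal sketch in Python.  It does not
call ICU: toy_sort_key() is a hypothetical stand-in that only normalizes
and case-folds, which is far simpler than a real collation key, but it
shows the same many-to-one behaviour.

```python
import unicodedata


def toy_sort_key(text: str) -> bytes:
    """Hypothetical stand-in for a collation sort key (not ICU's algorithm).

    Decomposes to NFD, case-folds, then re-normalizes, so that strings
    differing only in case or in composed/decomposed form get the same key.
    """
    folded = unicodedata.normalize("NFD", text).casefold()
    return unicodedata.normalize("NFD", folded).encode("utf-8")


composed = "\u00e9"      # e-acute as a single code point
decomposed = "e\u0301"   # 'e' followed by a combining acute accent
upper = "\u00c9"         # capital E-acute

# Three distinct inputs...
assert len({composed, decomposed, upper}) == 3

# ...collapse to a single key, so the key alone cannot tell us which
# of the three strings was originally stored.
keys = {toy_sort_key(s) for s in (composed, decomposed, upper)}
assert len(keys) == 1
```

Any function that inverted such a key could at best return one
representative of the equivalence class, not the original text.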

Nico
--