Hello all,

I have a case where the user needs to perform a search in a text column of
a table with many rows. Typically the user enters the first n matching
characters as a search string and the application issues a SELECT statement
that uses the LIKE operator with the search string:

SELECT * FROM ATable WHERE AColumn LIKE :SearchString || '%'

According to the LIKE optimization this statement will use an index so it
will be fast.

The application is used by Greek users. The greek alphabet has some letters
that are visually identical to corresponding latin letters when in
capitals, (e.g. A, E, I, O, B, H, K, etc), some that are different but
match deterministically to a single latin letter (e.g. Σ = S, Φ = F, Ω = O
etc.) and some that don't have a deterministic match (e.g. Γ = Gh or Yi).

The table contains strings that consist of words that can be written in
either latin or greek characters; sometimes even mixed (the user changed
input locale midword before typing the first non-common letter). I have a
request that the search should match strings that are written with either
latin or greek or mixed letters, e.g. "MIS" should match "MISKO" (all
latin), "ΜΙΣΚΟ" (all greek) or "MIΣΚΟ" (first two letters latin, rest
greek). I thought of using a custom collation that does this type of
comparison, have the column use that collation and create an index on that
column to speed up the search but I discovered that the LIKE operator
either will not use collations other than BINARY and NOCASE (pragma
case_sensitive_like) or (if overloaded to perform custom matching) will not
use an index, and, worse yet, its behaviour will be the same to all string
comparisons regardless of collation. So, a full table scan seems inevitable.

I was wondering whether it is realistic to ask for the LIKE operator to use
by default the assigned collation of a column. I assume that an index on
that column is using by default the specified collation of the column for
comparisons, so a LIKE clause like the aforementioned can use the index and
perform a fast search while using the "mixed" comparison I need. This would
transparently solve my problem and make the case_sensitive_like pragma
redundant, but for backward compatibility this behaviour could be activated
by a new pragma.

Are there any details I am missing that prevent this from being implemented?

Thanks in advance.

--Constantine.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to