Your analysis is correct.

One way this is often fixed in full-text-search systems is to keep
index stats so that the more specific bits of the query can come
first.  In this case, "nicky" AND "nigel" gives you a small enough
result set that it would be more efficient to scan the matched
documents for "Indiana" than it would be to pull the "Indiana" hit
list and intersect with it.

I wouldn't hold your breath on that, though, it would require a lot of work :-).

You might be able to cobble something together above fts3.  Like maybe
isolating the city, state, and zip columns into another table (joined
with the fts3 table on rowid=docid), and indexed on those columns.
Then you could use that table to drop the common items from the query
(something like "select rowid from state_table where state = ? limit
1" will be very fast even if there are many matches).  Run the
resulting query against the fts3 table, putting the results in a temp
table, then join that against the state table for the words you
dropped out of the query.  I know that sounds absurd, but it might end
up not being THAT hard (at least compared to cobbling together an
alternate system entirely).

At minimum, for a query like 'nicky nigel indiana', you should be able
to quickly mock up a test by moving state to a separate table joined
with your fts table on rowid, and then have your WHERE clause throw
'nicky nigel' as a match against the fts table and 'indiana' for the
state in the state table.  If that's still terribly slow, then no need
to implement further.

[BTW, very impressed that you're pumping 20M rows in.  I certainly did
that in testing, but still amazing.]

-scott


On Fri, Oct 9, 2009 at 10:54 AM, Mark <godef...@gmail.com> wrote:
> I have a database containing a single table, with FTS3:
>
> create virtual table my_fts_table using fts3
> (
>        last_name   text not null,
>        first_name  text not null,
>        middle_name text not null,
>        identifiers text not null,
>        address_1   text not null,
>        city        text not null,
>        state       text not null,
>        zip         text not null
> );
>
> I loaded a large number of records (20 million) into this table. I then
> proceeded to run queries against this table, of the following form:
>
> select rowid from my_fts_table where my_fts_table match @query limit 21
>
> This query (usually) returns very quickly. For example, when @query is
> 'nicky nigel', the query returns in mere milliseconds (59, and that's as
> measured by a web browser client when the server's doing the work). If I
> modify the query to be something like 'nicky nigel indiana', the query
> suddenly takes MUCH longer. Over 17 seconds, in fact. It returns fewer
> rows (2 versus 4), but it seems like it must be searching for each term
> individually and then doing the intersection of those results. 'indiana'
> of course returns LOTS of rows (7927809 to be exact), because it's not
> very selective.
>
> I'm wondering if there's something I'm doing wrong, or whether this is
> just an artifact of the FTS3 implementation. I'd love to use it, it's
> simple and perfect for my needs (and I'm already using SQLite in this
> project so it's a great fit), but having the query suddenly take
> "forever" because someone included a state in the search just won't work
> for me.
>
> Mark
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to