Re: [sqlite] Need advice on using nested selects in JOIN statements as a logical 'AND'

Tristan Van Berkom Sat, 30 Nov 2013 09:41:32 -0800

On Thu, 2013-11-28 at 00:20 -0500, Igor Tandetnik wrote:
> On 11/27/2013 11:52 PM, Tristan Van Berkom wrote:
> > ================================================
> > SELECT DISTINCT summary.uid, summary.vcard FROM 'folder_id' AS summary
> > LEFT OUTER JOIN 'folder_id_phone_list' AS phone_list
> >      ON phone_list.uid = summary.uid
> > LEFT OUTER JOIN 'folder_id_email_list' AS email_list
> >      ON email_list.uid = summary.uid
> > WHERE (phone_list.value IS NOT NULL AND phone_list.value LIKE '%0505')
> >    AND (email_list.value IS NOT NULL AND email_list.value LIKE 'eddie%')
> 
> Why are you using outer joins when your WHERE clause discards unmatched 
> records anyway? If you replace LEFT OUTER with INNER, the end result 
> would be exactly the same.
> 
> I have a strong feeling that, once you replace outer joins with regular 
> joins, this statement would run just as fast as your convoluted one with 
> nested selects. By using outer joins, you prevent SQLite from reordering 
> the conditions and using the most efficient search strategy - it is 
> forced to perform those joins left to right, which results in bad 
> performance because, apparently, you don't have indexes on 
> phone_list.uid or email_list.uid.
> 
> Try this straightforward query:
> 
> SELECT DISTINCT summary.uid, summary.vcard FROM folder_id AS summary
> JOIN folder_id_phone_list AS phone_list
>      ON phone_list.uid = summary.uid
> JOIN folder_id_email_list AS email_list
>      ON email_list.uid = summary.uid
> WHERE phone_list.value LIKE '%0505'  AND email_list.value LIKE 'eddie%';


This has been a great help, I've run benchmarks and this does perform
well with large data sets (large for addressbooks at least).

But I have now one more question.

I have a tradeoff that depends on whether I create an index
on email_list.uid or not.

The reason it was important to create an index on the uid
column (related to the primary key in the main summary table),
is because it speeds up inserts.

Maybe my insert statements are wrong, what I do at insert time
is first:
   DELETE FROM 'email_list' WHERE uid = 'the uid to remove'

And then I run the normal INSERT OR (REPLACE/FAIL) INTO
the summary table, and then populate the email_list table
again (it's important to replace all the rows in the email_list
table with the new ones from the new vCard).

When we have huge books, the index on email_list.uid helps
a lot.

However, in a statement formed like the one you proposed 
above, it screws with the query optimizer in SQLite I suspect,
i.e. when searching for a prefix on an email address,
SQLite (I suspect) decides to prioritize the UID index instead
of the perfectly good (much better even) index on email_list.value.

For exact matches, it would seem email_list.value is correctly
chosen for index traversal, for prefix matches, SQLite seems to
want to just prioritize the UID.

Here are my results:

Prefix matches on the email_list.value column:
  https://people.gnome.org/~tvb/filter-by-short-email-address-prefix.png

Exact matches on the email_list.value column:
  https://people.gnome.org/~tvb/filter-by-email-address.png

Inserting contacts:
  https://people.gnome.org/~tvb/contact-saving.png

The green line you can ignore, we're interested in "Master" and
"Experimental".

Master adds no index to the email_list.uid column

Experimental adds an index to the email_list.uid column

So, is there a way that I can tell SQLite forcibly to
prioritize the index on email_list.value when making
a prefix match ?

Would an explicit "COLLATE email_list.value" placed 
somewhere in the above query help ?

Thanks everyone for your help, I've come a long way with
the advice I've gotten from this list.

Best,
    -Tristan


> 
> >     o Leverage the index which I've created on 'folder_id_email_list'
> >       (I am using case insensitive LIKE statements so the indexes
> >       work in that statement).
> 
> Normally, you need case-sensitive LIKE in order to use the index, unless 
> the index is created with COLLATE NOCASE. You could use EXPLAIN QUERY 
> PLAN to confirm that the index is indeed being utilized.


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Need advice on using nested selects in JOIN statements as a logical 'AND'

Reply via email to