On Thu, 2013-11-28 at 00:20 -0500, Igor Tandetnik wrote: > On 11/27/2013 11:52 PM, Tristan Van Berkom wrote: > > ================================================ > > SELECT DISTINCT summary.uid, summary.vcard FROM 'folder_id' AS summary > > LEFT OUTER JOIN 'folder_id_phone_list' AS phone_list > > ON phone_list.uid = summary.uid > > LEFT OUTER JOIN 'folder_id_email_list' AS email_list > > ON email_list.uid = summary.uid > > WHERE (phone_list.value IS NOT NULL AND phone_list.value LIKE '%0505') > > AND (email_list.value IS NOT NULL AND email_list.value LIKE 'eddie%') > > Why are you using outer joins when your WHERE clause discards unmatched > records anyway? If you replace LEFT OUTER with INNER, the end result > would be exactly the same. > > I have a strong feeling that, once you replace outer joins with regular > joins, this statement would run just as fast as your convoluted one with > nested selects. By using outer joins, you prevent SQLite from reordering > the conditions and using the most efficient search strategy - it is > forced to perform those joins left to right, which results in bad > performance because, apparently, you don't have indexes on > phone_list.uid or email_list.uid. > > Try this straightforward query: > > SELECT DISTINCT summary.uid, summary.vcard FROM folder_id AS summary > JOIN folder_id_phone_list AS phone_list > ON phone_list.uid = summary.uid > JOIN folder_id_email_list AS email_list > ON email_list.uid = summary.uid > WHERE phone_list.value LIKE '%0505' AND email_list.value LIKE 'eddie%';
This has been a great help, I've run benchmarks and this does perform well with large data sets (large for addressbooks at least). But I have now one more question. I have a tradeoff that depends on whether I create an index on email_list.uid or not. The reason it was important to create an index on the uid column (related to the primary key in the main summary table), is because it speeds up inserts. Maybe my insert statements are wrong, what I do at insert time is first: DELETE FROM 'email_list' WHERE uid = 'the uid to remove' And then I run the normal INSERT OR (REPLACE/FAIL) INTO the summary table, and then populate the email_list table again (it's important to replace all the rows in the email_list table with the new ones from the new vCard). When we have huge books, the index on email_list.uid helps a lot. However, in a statement formed like the one you proposed above, it screws with the query optimizer in SQLite I suspect, i.e. when searching for a prefix on an email address, SQLite (I suspect) decides to prioritize the UID index instead of the perfectly good (much better even) index on email_list.value. For exact matches, it would seem email_list.value is correctly chosen for index traversal, for prefix matches, SQLite seems to want to just prioritize the UID. Here are my results: Prefix matches on the email_list.value column: https://people.gnome.org/~tvb/filter-by-short-email-address-prefix.png Exact matches on the email_list.value column: https://people.gnome.org/~tvb/filter-by-email-address.png Inserting contacts: https://people.gnome.org/~tvb/contact-saving.png The green line you can ignore, we're interested in "Master" and "Experimental". Master adds no index to the email_list.uid column Experimental adds an index to the email_list.uid column So, is there a way that I can tell SQLite forcibly to prioritize the index on email_list.value when making a prefix match ? Would an explicit "COLLATE email_list.value" placed somewhere in the above query help ? Thanks everyone for your help, I've come a long way with the advice I've gotten from this list. Best, -Tristan > > > o Leverage the index which I've created on 'folder_id_email_list' > > (I am using case insensitive LIKE statements so the indexes > > work in that statement). > > Normally, you need case-sensitive LIKE in order to use the index, unless > the index is created with COLLATE NOCASE. You could use EXPLAIN QUERY > PLAN to confirm that the index is indeed being utilized. _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users