On Wed, 27 Nov 2013 at 21:21:43 -0800, Igor Tandetnik wrote
> Why are you using outer joins when your WHERE clause discards
> unmatched records anyway? If you replace LEFT OUTER with INNER, the
> end result would be exactly the same.

Not for all queries.

Consider the query
 (or (beginswith "full_name" "foo") (beginswith "email" "foo"))

The full_name field is in the main table, since each vcard only has
*one* full_name field. There can be multiple email addresses, so those
are in an auxiliary table.

The SQL query becomes:

 SELECT DISTINCT summary.uid, summary.vcard from 'folder_id' AS summary
 LEFT OUTER JOIN 'folder_id_email_list' AS email_list
     ON email_list.uid = summary.uid
 WHERE (email_list.value like 'foo%')
    OR (summary.full_name like 'foo%');

If that *isn't* a LEFT OUTER JOIN, and there's a record in the main
'folder_id' table which doesn't have *any* email addresses, then it will
be incorrectly omitted from the results.

As it happens, the performance on the LEFT OUTER JOIN sucks somewhat,
since it doesn't use any indices on folder_id(full_name):

0|0|0|SCAN TABLE folder_id AS summary USING INDEX sqlite_autoindex_folder_id_1
0|1|1|SEARCH TABLE folder_id_email_list AS email_list USING INDEX 
UID_INDEX_email_folder_id (uid=?)
0|0|0|USE TEMP B-TREE FOR DISTINCT

There's an optimisation there which maybe it would be nice if sqlite
would see for itself. It works a *whole* lot faster if we express it
thus:

 SELECT DISTINCT summary.uid, summary.vcard from 'folder_id' as summary
 JOIN 'folder_id_email_list' as email_list
     ON +email_list.uid = summary.uid
  WHERE (email_list.value like 'foo%')
UNION
 SELECT summary.uid, summary.vcard from 'folder_id' as summary
  WHERE (summary.full_name like 'foo%');

(Note that this time it *can* be an INNER JOIN.) 

1|0|1|SEARCH TABLE folder_id_email_list AS email_list USING INDEX 
INDEX_email_folder_id (value>? AND value<?)
1|1|0|SEARCH TABLE folder_id AS summary USING INDEX 
sqlite_autoindex_folder_id_1 (uid=?)
2|0|0|SEARCH TABLE folder_id AS summary USING INDEX INDEX_full_name_folder_id 
(full_name>? AND full_name<?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION)

FWIW on my data set of about 240,000 records from the company address
book, the top query runs in about 1400ms while the last one takes about
6ms. Since this is the query that happens when we start typing into the
To: or Cc: field of an email composer window, we want it to be fast :)

Tristan's original query was about using nested SELECT to implement
logical 'AND'. I don't care much about that, but I sure as hell want him
to use UNION to implement logical 'OR'.

Unless there's a better way? Should we be asking for sqlite to actually
spot the optimisation for itself?

-- 
David Woodhouse                            Open Source Technology Centre
david.woodho...@intel.com                              Intel Corporation
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to