On Saturday, 8 November, 2014 06:56, Tristan Van Berkom <tris...@upstairslabs.com> said:
>On Sat, 2014-11-08 at 06:23 -0700, Keith Medcalf wrote: >> How about the direct approach: >> >> SELECT uid >> FROM resource >> WHERE uid NOT IN (SELECT resource_uid >> FROM event_participant, event >> WHERE event_participant.event_uid = event.uid >> AND event.shift_uid = :shift_uid >> AND event.date = :event_date) >> >> or perhaps >> >> SELECT uid >> FROM resource >> WHERE NOT EXISTS (SELECT 1 >> FROM event_participant, event >> WHERE event_participant.event_uid = event.uid >> AND resource_uid = resource.uid >> AND event.shift_uid = :shift_uid >> AND event.date = :event_date) >> >> Is not the "right way to do it" the one that obtains the result >required, not the one that uses a "checkbox" implementation? > >I've presented 2 queries which both obtain the required result, and I >would have to say both of them have varying levels of correctness. > >If obtaining the correct result was all that was required, then I could >simply issue multiple queries and balloon my memory with result UIDs >in between each of them, all for the sake of better readability (but >then, I'm not sure why I would need a nice tool like SQLite to do it >in the first place). > >So I would have to say, the "right way to do it" is the most efficient >way, the one which provides SQLite with the best indications of how >to plot an efficient query plan. > >The heuristics of the table is generally that resources are rather >finite, while events grow over time (possibly 10 events each with >10 participants for 3 shifts per day, meaning after a year that >table will be large, and we'll still want to find the available >resources hopefully in under 30ms, ideally 10ms). > >So yeah, speed at query time (not insert/update/delete time), is very >important. While either of the above are direct translations, if you wish to minimize materialization of intermediate data then the second would be preferable since it uses only inner joins. However, the first may be far more efficient depending on the shape of your data and the number of rows, how much memory is available for SQLite to use (Bytes, KBytes, MBytes), and the speed of your I/O devices. Given that selecting all the resources based on your stated heuristics of growth, having proper indexes, and the constraints given in the query, I would say that the first NOT IN will be more efficient since it is only building a list of a (couple) hundred or so exclusions, and NOT IN is very efficient. The second form will use a few thousand bytes less memory but perform significantly more I/O. Theoretically, of course, the execution plan of a query which obtains a specified result should not depend on the phrasing of the query. They should all result in the same execution plan and the same result. However, this is not true in practice because of limitations in optimization. Some optimizers need a bit of help in figuring out how to generate an optimum plan, and some others will generate overly complicated and inefficient plans simply because they prefer to use newly added or nifty features only available in that particular database engine rather than a simpler, faster, more direct solution. I don't know why an outer join would be preferable in any case -- you are going to use extra CPU and I/O to perform a join on which a significant proportion of the results will be discarded -- or depend on the optimizer to notice this and "optimize" away the outer join thus ending up with a "more direct" phrasing which could have been written in the first place (and is far more understandable for future maintainers). Many optimizers can only optimize within specific constraints and specifying a cute complicated query will result in the most efficient cute and complicated solution it can muster (ie, a local optimum, not a global optimum). Your statement 1 implements NOT IN indirectly by doing a LEFT JOIN then discarding the intersection set. This requires an extra join and discard over a more direct NOT IN phrasing. It is also more difficult to understand. It could be optimized into the NOT IN or NOT EXISTS form by the optimizer if you are lucky (presently it will not). Your statement 2 implement the correlated subquery NOT EXISTS by an indirect expression, again using a LEFT JOIN then discarding the intersection results. Same comments regarding extra operations and optimizations that cannot (presently) be done apply. If you prefer the ON syntax, then you may of course use it (though, except in the case of a LEFT OUTER JOIN, it is merely syntactic sugar for a list of tables to join and where conditions. SELECT uid FROM resource WHERE uid NOT IN (SELECT resource_uid FROM event_participant JOIN event ON event_participant.event_uid = event.uid WHERE event.shift_uid = :shift_uid AND event.date = :event_date) SELECT uid FROM resource WHERE NOT EXISTS (SELECT 1 FROM event_participant JOIN event ON event_participant.event_uid = event.uid WHERE resource_uid = resource.uid AND event.shift_uid = :shift_uid AND event.date = :event_date) _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users