Hi all,

Today I've stumbled on a situation where I think I really need to use a
RIGHT OUTER JOIN, and looking at all the examples on the internet I
could find so far, I'm not finding a way to simulate it properly using
LEFT OUTER JOINs.

So I thought, before I commit to an inefficient alternative I would
check to see if someone on this list has an idea of how I can form an
efficient query for this.

I'll start by highlighting some of the schema which is relevant to the
query:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CREATE TABLE shift (
  uid INTEGER PRIMARY KEY AUTOINCREMENT,
    
  ... name and start time/duration of a shift ...
);

CREATE TABLE resource (
  uid INTEGER PRIMARY KEY AUTOINCREMENT,
    
  ... details about the resource ...
);

CREATE TABLE event (
  uid        INTEGER  PRIMARY KEY AUTOINCREMENT,
  date       INTEGER  NOT NULL, /* unix timestamp */
  shift_uid  INTEGER  NOT NULL REFERENCES shift (uid),

  ... other details about the event ...
);

CREATE TABLE IF NOT EXISTS event_participant (
  id              INTEGER CHECK (id > 0),
  event_uid       INTEGER REFERENCES event (uid) ON DELETE CASCADE,
  resource_uid    INTEGER REFERENCES resource (uid),

  ... other details about this participant ...

  PRIMARY KEY (event_uid, id)
);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now my goal here, is to pull out every resource UID which is _not_
assigned to any event on a given date and shift UID.

My first attempt that gets the job done looks like this:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~                              Statement 1                             ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SELECT resource.uid

/* All resources... */
FROM   resource

/* Subtract from this set all the booked resources */
LEFT JOIN (

    /* All resources which are booked on the given date / shift_uid */
    SELECT participant.resource_uid AS uid
    FROM event
    JOIN event_participant
      AS participant
      ON (participant.event_uid = event.uid)
    WHERE (event.date      = 1411185600 AND
           event.shift_uid = 1 AND
           participant.resource_uid IS NOT NULL)

) AS booked_resource ON (booked_resource.uid = resource.uid)

WHERE booked_resource.uid IS NULL;

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXPLAIN SAYS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1|0|0|SEARCH TABLE event USING INDEX event_shift_uid_idx (shift_uid=?)
(~2 rows)
1|1|1|SEARCH TABLE event_participant AS participant USING INDEX
event_participant_pk_idx (event_uid=?) (~5 rows)
0|0|0|SCAN TABLE resource (~1000000 rows)
0|1|1|SEARCH SUBQUERY 1 AS booked_resource USING AUTOMATIC COVERING
INDEX (uid=?) (~2 rows)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now, this is not all that bad, except that I try to avoid nesting select
statements in cases like this as it prevents the query planner from
doing anything intelligent.


So I made another attempt which discards the nested statement:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~                              Statement 2                             ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SELECT resource.uid

/* All Resources */
FROM resource

/* Get the cartesian product of event * resources, dangerously big,
 * but hopefully trimmed down by constraints found in the final WHERE
 * clause
 */
JOIN event

/* Subtract from this set all the booked resources */
LEFT JOIN event_participant
  AS participant
  ON (participant.event_uid    = event.uid AND
      participant.resource_uid = resource.uid)
WHERE event.date               = 1411185600 AND
      event.shift_uid          = 1 AND
      participant.resource_uid IS NULL;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXPLAIN SAYS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0|0|1|SEARCH TABLE event USING INDEX event_shift_uid_idx (shift_uid=?)
(~2 rows)
0|1|0|SCAN TABLE resource (~1000000 rows)
0|2|2|SEARCH TABLE event_participant AS participant USING INDEX
event_participant_resource_uid_idx (resource_uid=?) (~2 rows)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It turns out that this query is almost semantically the same as the
first, however without the nested select, EXPLAIN QUERY PLAN seems to
indicate the performance is better (the SCAN on the 'resource' table
is still present, but the extra SEARCH SUBQUERY is not gone).

Finally, after doing some more digging, I think I've found that the
correct way to make this query should really be with RIGHT OUTER JOIN as
follows:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~                              Statement 3                             ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SELECT resource.uid

/* All events on the date specified in the final WHERE clause */
FROM event

/* Join participants by indexed event key, only those with assigned
 * resources
 */
JOIN event_participant
  AS participant
  ON (participant.event_uid = event.uid AND
      participant.resource_uid IS NOT NULL)

/* Join in all the resources which do not appear in the preceding set */
RIGHT OUTER JOIN resource
  ON (resource.uid = participant.resource_uid)
WHERE event.DATE      = 1411185600 AND
      event.shift_uid = 1 AND
      participant.resource_uid IS NULL;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

However, unlike other simple use cases of LEFT OUTER JOIN, this one
cannot simply be inverted to get the same performance/results as
with RIGHT OUTER JOIN (inverting the statement only works in the
very simple case where you really only have 2 tables to join).

So, in closing, I have only been doing SQL for a couple of years now
and would consider myself a 'newbie' at this - and I was wondering if
someone more enlightened could point out something I've missed.

If it's the case that RIGHT OUTER JOIN would be the correct way but
SQLite simply doesn't offer that (and there is no way to efficiently
work around this), I will gladly except that and probably go with a
variation of 'Statement 2' above.

I would also be interested to know if 'Statement 1' may be superior
to 'Statement 2' due to the dangerously huge cartesian product of
the 'event' and 'resource' tables.

Thank you all for your attention :)

Best Regards,
    -Tristan


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to