> On Jul 21, 2020, at 7:42 PM, Ian Jackson <ian.jack...@eu.citrix.com> wrote:
> 
> Stuff the two queries together: we use the firsty query as a WITH
> clause.  This is significantly faster, perhaps because the query
> optimiser does a better job but probably just because it saves on
> round trips.
> 
> No functional change.
> 
> Perf: subjectively this seemed to help when the cache was cold.  Now I
> have a warm cache and it doesn't seem to make much difference.
> 
> Perf: runtime of my test case now ~5-7s.
> 
> Example queries before (from the debugging output):
> 
> Query A part I:
> 
>            SELECT f.flight AS flight,
>                   j.job AS job,
>                   f.started AS started,
>                   j.status AS status
>                     FROM flights f
>                     JOIN jobs j USING (flight)
>                     JOIN runvars r
>                             ON  f.flight=r.flight
>                            AND  r.name=?
>                    WHERE  j.job=r.job

Did these last two get mixed up?  My limited experience w/ JOIN ON and WHERE 
would lead me to expect we’re joining on `f.flight=r.flight and r.job = j.job`, 
and having `r.name = ?` as part of the WHERE clause.  I see it’s the same in 
the combined query as well.

>                      AND  f.blessing=?
>                      AND  f.branch=?
>                      AND  j.job=?
>                      AND  r.val=?
>                      AND  (j.status='pass' OR j.status='fail'
>                           OR j.status='truncated'!)
>                      AND  f.started IS NOT NULL
>                      AND  f.started >= ?
>                 ORDER BY f.started DESC
> 
> With bind variables:
>     "test-amd64-i386-xl-pvshim"
>     "guest-start"
> 
> Query B part I:
> 
>            SELECT f.flight AS flight,
>                   s.job AS job,
>                   NULL as started,
>                   NULL as status,
>                   max(s.finished) AS max_finished
>                      FROM steps s JOIN flights f
>                        ON s.flight=f.flight
>                     WHERE s.job=? AND f.blessing=? AND f.branch=?
>                       AND s.finished IS NOT NULL
>                       AND f.started IS NOT NULL
>                       AND f.started >= ?
>                     GROUP BY f.flight, s.job
>                     ORDER BY max_finished DESC
> 
> With bind variables:
>    "test-armhf-armhf-libvirt"
>    'real'
>    "xen-unstable"
>    1594144469
> 
> Query common part II:
> 
>        WITH tsteps AS
>        (
>            SELECT *
>              FROM steps
>             WHERE flight=? AND job=?
>        )
>        , tsteps2 AS
>        (
>            SELECT *
>              FROM tsteps
>             WHERE finished <=
>                     (SELECT finished
>                        FROM tsteps
>                       WHERE tsteps.testid = ?)
>        )
>        SELECT (
>            SELECT max(finished)-min(started)
>              FROM tsteps2
>          ) - (
>            SELECT sum(finished-started)
>              FROM tsteps2
>             WHERE step = 'ts-hosts-allocate'
>          )
>                AS duration

Er, wait — you were doing a separate `duration` query for each row of the 
previous query?  Yeah, that sounds like it could be a lot of round trips. :-)

> 
> With bind variables from previous query, eg:
>     152045
>     "test-armhf-armhf-libvirt"
>     "guest-start.2"
> 
> After:
> 
> Query A (combined):
> 
>            WITH f AS (
>            SELECT f.flight AS flight,
>                   j.job AS job,
>                   f.started AS started,
>                   j.status AS status
>                     FROM flights f
>                     JOIN jobs j USING (flight)
>                     JOIN runvars r
>                             ON  f.flight=r.flight
>                            AND  r.name=?
>                    WHERE  j.job=r.job
>                      AND  f.blessing=?
>                      AND  f.branch=?
>                      AND  j.job=?
>                      AND  r.val=?
>                      AND  (j.status='pass' OR j.status='fail'
>                           OR j.status='truncated'!)
>                      AND  f.started IS NOT NULL
>                      AND  f.started >= ?
>                 ORDER BY f.started DESC
> 
>            )
>            SELECT flight, max_finished, job, started, status,
>            (
>        WITH tsteps AS
>        (
>            SELECT *
>              FROM steps
>             WHERE flight=f.flight AND job=f.job
>        )
>        , tsteps2 AS
>        (
>            SELECT *
>              FROM tsteps
>             WHERE finished <=
>                     (SELECT finished
>                        FROM tsteps
>                       WHERE tsteps.testid = ?)
>        )
>        SELECT (
>            SELECT max(finished)-min(started)
>              FROM tsteps2
>          ) - (
>            SELECT sum(finished-started)
>              FROM tsteps2
>             WHERE step = 'ts-hosts-allocate'
>          )
>                AS duration
> 
>            ) FROM f

I mean, in both queries (A and B), the transform should basically result in the 
same thing happening, as far as I can tell.

I can try to analyze the duration query and see if I can come up with any 
suggestions, but that would be a different patch anyway.

 -George

Reply via email to