Re: [sqlalchemy] Transforming SQL to SQLAlchemy

D.S. Ljungmark Mon, 27 Jul 2015 09:19:26 -0700

Thanks!
Going through this of how to go backwards from a query to SQLAlchemy
is the part that I was missing. I'm personally not wedded to the
Postgres method, actually the opposite. But it's also been the one
call that didn't stall massively on larger datasets.


The performance characteristics are pretty strange, as the one that
you suggested at first went up way high, but goes down again when you
disable seqscan..

Here are the explain outputs (ignoring my version, as that is even
slower and actually pages out to disk for sorting due to having too
many rows)


And frankly, I do not have the knowledge I need to understand "why"
the difference between seqscan enabled and disabled is happening.
Guessing that my indexes aren't correctly set up, but as said, I lack
the necessary DBA skills for this. I feel as if I'm in over my head.


Now, since this query is turning out to be a whole lot more annoying
to work on than I'd ever imagined, is there a better way of
representing my data store for this?

Duplicating the latest timestamp in a 1:1 relationship, or something else?

//D.S.




caramel=# explain analyze SELECT (c).csr_id, (c).not_before,
(c).not_after FROM (SELECT (SELECT c FROM certificate c WHERE
c.csr_id=csr.id ORDER BY c.not_after DESC LIMIT 1) AS c FROM csr
offset 0) s;

                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on s  (cost=0.00..989627.43 rows=196332 width=32)
(actual time=4.288..6130.111 rows=196332 loops=1)
   ->  Limit  (cost=0.00..987664.11 rows=196332 width=4) (actual
time=4.286..6030.657 rows=196332 loops=1)
         ->  Index Only Scan using csr_rejected_idx4 on csr
(cost=0.00..987664.11 rows=196332 width=4) (actual
time=4.284..5968.457 rows=196332 loops=1)
               Heap Fetches: 0
               SubPlan 1
                 ->  Limit  (cost=0.00..4.96 rows=1 width=2029)
(actual time=0.029..0.029 rows=1 loops=196332)
                       ->  Index Scan Backward using
certificate_csr_id_not_after_idx on certificate c  (cost=0.00..24.80
rows=5 width=2029) (actual time=0.028..0.028 rows=1 loops=196332)
                             Index Cond: (csr_id = csr.id)
 Total runtime: 6163.357 ms



caramel=# explain analyze  select c.csr_id, c.not_before, c.not_after
from certificate as c
join csr on c.csr_id = csr.id
where c.csr_id = (
    select csr_id from certificate where c.csr_id=csr.id
    order by c.not_after desc limit 1
);

         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=16391.49..395380.31 rows=5 width=20) (actual
time=21552.556..21574.528 rows=1 loops=1)
   Hash Cond: (c.csr_id = csr.id)
   Join Filter: ((SubPlan 1) = c.csr_id)
   Rows Removed by Join Filter: 571550
   ->  Seq Scan on certificate c  (cost=0.00..289383.98 rows=555898
width=20) (actual time=1.029..16866.441 rows=571551 loops=1)
   ->  Hash  (cost=13937.34..13937.34 rows=196332 width=4) (actual
time=88.475..88.475 rows=196332 loops=1)
         Buckets: 32768  Batches: 1  Memory Usage: 6903kB
         ->  Index Only Scan using csr_rejected_idx4 on csr
(cost=0.00..13937.34 rows=196332 width=4) (actual time=0.018..42.298
rows=196332 loops=1)
               Heap Fetches: 0
   SubPlan 1
     ->  Limit  (cost=0.00..0.14 rows=1 width=4) (actual
time=0.006..0.006 rows=1 loops=571551)
           ->  Result  (cost=0.00..75707.40 rows=555898 width=4)
(actual time=0.006..0.006 rows=1 loops=571551)
                 One-Time Filter: (c.csr_id = csr.id)
                 ->  Index Only Scan using certificate_csr_id_idx on
certificate  (cost=0.00..75707.40 rows=555898 width=4) (actual
time=0.005..0.005 rows=1 loops=571551)
                       Heap Fetches: 0
 Total runtime: 21574.638 ms
(16 rows)


caramel=# SET enable_seqscan = OFF;
SET
caramel=# explain analyze  select c.csr_id, c.not_before, c.not_after
from certificate as c
join csr on c.csr_id = csr.id
where c.csr_id = (
    select csr_id from certificate where c.csr_id=csr.id
    order by c.not_after desc limit 1
);

         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1600.52..2226996.44 rows=5 width=20) (actual
time=0.026..12796.020 rows=1 loops=1)
   Merge Cond: (csr.id = c.csr_id)
   Join Filter: ((SubPlan 1) = c.csr_id)
   Rows Removed by Join Filter: 571550
   ->  Index Only Scan using csr_rejected_idx4 on csr
(cost=0.00..13937.34 rows=196332 width=4) (actual time=0.007..422.941
rows=196332 loops=1)
         Heap Fetches: 0
   ->  Index Scan using certificate_csr_id_idx on certificate c
(cost=0.00..2127137.37 rows=555898 width=20) (actual
time=0.005..8224.643 rows=571551 loops=1)
   SubPlan 1
     ->  Limit  (cost=0.00..0.14 rows=1 width=4) (actual
time=0.006..0.006 rows=1 loops=571551)
           ->  Result  (cost=0.00..75707.40 rows=555898 width=4)
(actual time=0.006..0.006 rows=1 loops=571551)
                 One-Time Filter: (c.csr_id = csr.id)
                 ->  Index Only Scan using certificate_csr_id_idx on
certificate  (cost=0.00..75707.40 rows=555898 width=4) (actual
time=0.005..0.005 rows=1 loops=571551)
                       Heap Fetches: 0
 Total runtime: 12798.066 ms
(14 rows)


caramel=# \d certificate
                                     Table "public.certificate"
   Column   |            Type             |
Modifiers
------------+-----------------------------+----------------------------------------------------------
 id         | integer                     | not null default
nextval('certificate_id_seq'::regclass)
 pem        | text                        | not null
 not_before | timestamp without time zone | not null
 not_after  | timestamp without time zone | not null
 csr_id     | integer                     | not null
Indexes:
    "certificate_pkey" PRIMARY KEY, btree (id)
    "certificate_csr_id_idx" btree (csr_id)
    "certificate_csr_id_not_after_idx" btree (csr_id, not_after)
Foreign-key constraints:
    "certificate_csr_id_fkey" FOREIGN KEY (csr_id) REFERENCES csr(id)

caramel=# \d csr
                                  Table "public.csr"
   Column   |         Type          |                    Modifiers
------------+-----------------------+--------------------------------------------------
 id         | integer               | not null default
nextval('csr_id_seq'::regclass)
 sha256sum  | character(64)         | not null
 pem        | text                  | not null
 orgunit    | character varying(64) |
 commonname | character varying(64) |
 rejected   | boolean               |
Indexes:
    "csr_pkey" PRIMARY KEY, btree (id)
    "csr_sha256sum_key" UNIQUE CONSTRAINT, btree (sha256sum)
    "csr_rejected_idx4" btree (id, rejected)
Referenced by:
    TABLE "accesslog" CONSTRAINT "accesslog_csr_id_fkey" FOREIGN KEY
(csr_id) REFERENCES csr(id)
    TABLE "certificate" CONSTRAINT "certificate_csr_id_fkey" FOREIGN
KEY (csr_id) REFERENCES csr(id)
    TABLE "icky" CONSTRAINT "icky_csr_id_fkey" FOREIGN KEY (csr_id)
REFERENCES csr(id)


On Mon, Jul 27, 2015 at 5:05 PM, Mike Bayer <mike...@zzzcomputing.com> wrote:
>
>
> On 7/27/15 3:47 AM, D.S. Ljungmark wrote:
>>
>> But we're not looking for one row in the Certificate table, what it
>> does is, for _every_ CSR, take the Certificate metadata that matches
>> the _oldest_ certificate.
>
>
> whoops, right I looked at it for too long and didn't see us selecting all of
> csr.
>
>>
>> The equivalent using a Join would be the following (the below is the
>> generated code):
>>
>> SELECT certificate.csr_id AS certificate_csr_id,
>> certificate.not_before AS certificate_not_before,
>> certificate.not_after AS certificate_not_after
>> FROM certificate JOIN (SELECT certificate.csr_id AS csr_id,
>> max(certificate.not_after) AS not_after
>> FROM certificate GROUP BY certificate.csr_id) AS latest ON
>> certificate.csr_id = latest.csr_id AND certificate.not_after =
>> latest.not_after
>>
>> And while I'm pretty certain the above query isn't in any way the
>> optimal one,  I'm still a bit out of my depth on the SQL side of
>> things.
>
> I bet if you run EXPLAIN ANALYZE on that query it will perform better (or
> definitely not worse) than the correlated subquery version.
>
> Here's another correlated subquery version that uses standard SQL, just move
> the correlated subq you have to the WHERE clause so that it only need refer
> to one column:
>
> select c.csr_id, c.not_before, c.not_after from certificate as c
> join csr on c.csr_id = csr.id
> where c.csr_id = (
>     select csr_id from certificate where c.csr_id=csr.id
>     order by c.not_after desc limit 1
> )
>
>
> now suppose that no, you really want to use the Postgresql-specific syntax.
> That's fine.   There shoudn't be much need to convert it to a SQL expression
> construct - you can just run session.execute("your query").     If you need
> ORM objects back from it, run
> session.query(Certificate).from_statement("your query").
>
>
> Otherwise, to begin supporting PG's syntaxes, you'd need to use the compiler
> extension to build up constructs that do all of these special Postgresql
> syntaxes (http://docs.sqlalchemy.org/en/rel_1_0/core/compiler.html).
>
> PG's syntaxes in this area are geared towards turning SQL itself into an
> "object oriented" programming language (see
> http://ledgersmbdev.blogspot.com/2012/08/intro-to-postgresql-as-object.html
> for a long series on this philosophy) which is *potentially* awkward in
> conjunction with a tool like SQLAlchemy, which is trying to build up similar
> kinds of patterns on the Python side instead.   This is why SQLAlchemy
> doesn't get into this area of PG syntax; it would require an entirely new
> set of constructs that work smoothly and consistently with PG's way of
> thinking here, and would first off be huge; it really should be a
> third-party extension that attempts to support all, or a large portion, of
> PG's "object-relational" system.  But after all that,  it may very well turn
> out that using another programming language like Python on top of all of
> that starts to look redundant (or maybe it would be amazing, I don't know.
> It's just way outside the scope of "SQL mapping" because PG is really trying
> to invent their own language in this area). If your application is tailored
> towards this architecture that might suggest that it prefers Postgresql's
> syntaxes directly, and you only need an object marshalling layer in your
> app.
>
> SQLAlchemy has always stressed that literal SQL is always an option.
> Historically it has meant that you lose flexibility with ORM features like
> eager loads and such, so I've added
> https://bitbucket.org/zzzeek/sqlalchemy/issues/3501/ for 1.1, which takes
> advantage of some improved result-row-mapping features in 1.0 so that we can
> finally map any raw SQL of any kind directly to full object graphs without
> losing any flexibility.
>
>
>
>
>
>
>
>
>
>
>
>
>
>>
>>
>>
>> On Sun, Jul 26, 2015 at 8:33 PM, Mike Bayer <mike...@zzzcomputing.com>
>> wrote:
>>>
>>>
>>> On 7/26/15 12:56 PM, Dennis Ljungmark wrote:
>>>
>>> Hello, all.
>>>
>>>     I am trying to reshape the (somewhat tricky) SQL query below into
>>> something SQLAlchemy can generate, and I just hit a wall:
>>>
>>> This is the query:
>>> SELECT (c).csr_id, (c).not_before, (c).not_after FROM (SELECT (SELECT c
>>> FROM
>>> certificate c WHERE c.csr_id=csr.id ORDER BY c.not_after DESC LIMIT 1) AS
>>> c
>>> FROM csr offset 0) s;
>>>
>>> And, I just can't seem to get SQLAlchemy to emit a correlated scalar
>>> subquery like this.
>>>
>>> correlated scalar subqueries are illustrated in the Core tutorial at
>>> http://docs.sqlalchemy.org/en/rel_1_0/core/tutorial.html#scalar-selects.
>>> I'm not sure why inklesspen's example isn't making use of as_scalar() or
>>> stmt.label(), which would be required.     Query supports the same use if
>>> you are working with ORM queries.
>>>
>>> However, the statement "SELECT c FROM certificate c" and the "(c).csr_id"
>>> syntax appears to be using a very unusual postgresql-specific syntax
>>> which
>>> is rolling the whole inner row into a single-column tuple and then
>>> referencing it that way, which is odd because the row itself is already a
>>> tuple, not sure what forcing it into a tuple that contains exactly one
>>> tuple
>>> accomplishes here.
>>>
>>> While we can certainly use tuple_() and probably create some extra custom
>>> constructs around it to support this syntax, I see nothing in this
>>> specific
>>> statement to justify the use of these syntaxes, nor the need to produce
>>> correlated scalar subqueries when we just are looking for one row from
>>> the
>>> "certificate" table.  I don't at the moment see why you can't use this
>>> statement, which is easier to read, platform agnostic, and way more
>>> efficient:
>>>
>>> SELECT c.csr_id, c.not_before, c.not_after FROM certificate AS c JOIN csr
>>> ON
>>> csr.id = c.csr_id ORDER BY c.not_after DESC LIMIT 1
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> There ought to be something, but for the life of me I can't seem to make
>>> it
>>> happen.
>>>
>>> InklessPen tried to help me on IRC, and we got something, but it's still
>>> not
>>> quite the same,
>>>   https://gist.github.com/inklesspen/49e69e1f33f3852d348a
>>>
>>> A reduced model is below,
>>>
>>>
>>> class CSR(Base):
>>>      certificates = _orm.relationship("Certificate", backref="csr")
>>>
>>> class Certificate(Base):
>>>      not_before = _sa.Column(_sa.DateTime, nullable=False)
>>>      not_after = _sa.Column(_sa.DateTime, nullable=False)
>>>      csr_id = _fkcolumn(CSR.id, nullable=False)
>>>
>>>
>>> Regards,
>>>    D.S.
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "sqlalchemy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to sqlalchemy+unsubscr...@googlegroups.com.
>>> To post to this group, send email to sqlalchemy@googlegroups.com.
>>> Visit this group at http://groups.google.com/group/sqlalchemy.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>>
>>> --
>>> You received this message because you are subscribed to a topic in the
>>> Google Groups "sqlalchemy" group.
>>> To unsubscribe from this topic, visit
>>> https://groups.google.com/d/topic/sqlalchemy/ooNUhATaTEU/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to
>>> sqlalchemy+unsubscr...@googlegroups.com.
>>> To post to this group, send email to sqlalchemy@googlegroups.com.
>>> Visit this group at http://groups.google.com/group/sqlalchemy.
>>> For more options, visit https://groups.google.com/d/optout.
>
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "sqlalchemy" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/sqlalchemy/ooNUhATaTEU/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Transforming SQL to SQLAlchemy

Reply via email to