Re: [sqlalchemy] Transforming SQL to SQLAlchemy

D.S. Ljungmark Tue, 28 Jul 2015 02:14:14 -0700

On Mon, Jul 27, 2015 at 7:36 PM, Mike Bayer <mike...@zzzcomputing.com> wrote:
>
>
> On 7/27/15 12:19 PM, D.S. Ljungmark wrote:
>>
>> Thanks!
>> Going through this of how to go backwards from a query to SQLAlchemy
>> is the part that I was missing. I'm personally not wedded to the
>> Postgres method, actually the opposite. But it's also been the one
>> call that didn't stall massively on larger datasets.
>>
>> The performance characteristics are pretty strange, as the one that
>> you suggested at first went up way high, but goes down again when you
>> disable seqscan..
>>
>> Here are the explain outputs (ignoring my version, as that is even
>> slower and actually pages out to disk for sorting due to having too
>> many rows)
>>
>>
>> And frankly, I do not have the knowledge I need to understand "why"
>> the difference between seqscan enabled and disabled is happening.
>> Guessing that my indexes aren't correctly set up, but as said, I lack
>> the necessary DBA skills for this. I feel as if I'm in over my head.
>>
>>
>> Now, since this query is turning out to be a whole lot more annoying
>> to work on than I'd ever imagined, is there a better way of
>> representing my data store for this?
>>
>> Duplicating the latest timestamp in a 1:1 relationship, or something else?
>
>
> i think your schema is fine, profiling is a whole can of beans and I'd
> recommend googling PG documentation and blogs on that.  However, correlated
> subqueries are always less than ideal - the query you have that uses just
> JOIN and GROUP BY should perform the best.  The use case of "select all the
> X with a max/min/count/avg/etc. of related Y" is very common and the join to
> subquery w/ GROUP BY approach is the way to do it.   Also, SQL is
> "declarative" but is a leaky abstraction; you always need to be savvy about
> things like indexes, table scans, etc.  That's just how it is.



Thanks, it didn't feel as if it should be as hard as I'm finding it.
Alas, I'll have to look at this in a few weeks again, work-related
things.

Thanks again for the help.


//D.S.


>>
>> //D.S.
>>
>>
>>
>>
>> caramel=# explain analyze SELECT (c).csr_id, (c).not_before,
>> (c).not_after FROM (SELECT (SELECT c FROM certificate c WHERE
>> c.csr_id=csr.id ORDER BY c.not_after DESC LIMIT 1) AS c FROM csr
>> offset 0) s;
>>
>>                    QUERY PLAN
>>
>> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>   Subquery Scan on s  (cost=0.00..989627.43 rows=196332 width=32)
>> (actual time=4.288..6130.111 rows=196332 loops=1)
>>     ->  Limit  (cost=0.00..987664.11 rows=196332 width=4) (actual
>> time=4.286..6030.657 rows=196332 loops=1)
>>           ->  Index Only Scan using csr_rejected_idx4 on csr
>> (cost=0.00..987664.11 rows=196332 width=4) (actual
>> time=4.284..5968.457 rows=196332 loops=1)
>>                 Heap Fetches: 0
>>                 SubPlan 1
>>                   ->  Limit  (cost=0.00..4.96 rows=1 width=2029)
>> (actual time=0.029..0.029 rows=1 loops=196332)
>>                         ->  Index Scan Backward using
>> certificate_csr_id_not_after_idx on certificate c  (cost=0.00..24.80
>> rows=5 width=2029) (actual time=0.028..0.028 rows=1 loops=196332)
>>                               Index Cond: (csr_id = csr.id)
>>   Total runtime: 6163.357 ms
>>
>>
>>
>> caramel=# explain analyze  select c.csr_id, c.not_before, c.not_after
>> from certificate as c
>> join csr on c.csr_id = csr.id
>> where c.csr_id = (
>>      select csr_id from certificate where c.csr_id=csr.id
>>      order by c.not_after desc limit 1
>> );
>>
>>           QUERY PLAN
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>   Hash Join  (cost=16391.49..395380.31 rows=5 width=20) (actual
>> time=21552.556..21574.528 rows=1 loops=1)
>>     Hash Cond: (c.csr_id = csr.id)
>>     Join Filter: ((SubPlan 1) = c.csr_id)
>>     Rows Removed by Join Filter: 571550
>>     ->  Seq Scan on certificate c  (cost=0.00..289383.98 rows=555898
>> width=20) (actual time=1.029..16866.441 rows=571551 loops=1)
>>     ->  Hash  (cost=13937.34..13937.34 rows=196332 width=4) (actual
>> time=88.475..88.475 rows=196332 loops=1)
>>           Buckets: 32768  Batches: 1  Memory Usage: 6903kB
>>           ->  Index Only Scan using csr_rejected_idx4 on csr
>> (cost=0.00..13937.34 rows=196332 width=4) (actual time=0.018..42.298
>> rows=196332 loops=1)
>>                 Heap Fetches: 0
>>     SubPlan 1
>>       ->  Limit  (cost=0.00..0.14 rows=1 width=4) (actual
>> time=0.006..0.006 rows=1 loops=571551)
>>             ->  Result  (cost=0.00..75707.40 rows=555898 width=4)
>> (actual time=0.006..0.006 rows=1 loops=571551)
>>                   One-Time Filter: (c.csr_id = csr.id)
>>                   ->  Index Only Scan using certificate_csr_id_idx on
>> certificate  (cost=0.00..75707.40 rows=555898 width=4) (actual
>> time=0.005..0.005 rows=1 loops=571551)
>>                         Heap Fetches: 0
>>   Total runtime: 21574.638 ms
>> (16 rows)
>>
>>
>> caramel=# SET enable_seqscan = OFF;
>> SET
>> caramel=# explain analyze  select c.csr_id, c.not_before, c.not_after
>> from certificate as c
>> join csr on c.csr_id = csr.id
>> where c.csr_id = (
>>      select csr_id from certificate where c.csr_id=csr.id
>>      order by c.not_after desc limit 1
>> );
>>
>>           QUERY PLAN
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>   Merge Join  (cost=1600.52..2226996.44 rows=5 width=20) (actual
>> time=0.026..12796.020 rows=1 loops=1)
>>     Merge Cond: (csr.id = c.csr_id)
>>     Join Filter: ((SubPlan 1) = c.csr_id)
>>     Rows Removed by Join Filter: 571550
>>     ->  Index Only Scan using csr_rejected_idx4 on csr
>> (cost=0.00..13937.34 rows=196332 width=4) (actual time=0.007..422.941
>> rows=196332 loops=1)
>>           Heap Fetches: 0
>>     ->  Index Scan using certificate_csr_id_idx on certificate c
>> (cost=0.00..2127137.37 rows=555898 width=20) (actual
>> time=0.005..8224.643 rows=571551 loops=1)
>>     SubPlan 1
>>       ->  Limit  (cost=0.00..0.14 rows=1 width=4) (actual
>> time=0.006..0.006 rows=1 loops=571551)
>>             ->  Result  (cost=0.00..75707.40 rows=555898 width=4)
>> (actual time=0.006..0.006 rows=1 loops=571551)
>>                   One-Time Filter: (c.csr_id = csr.id)
>>
>>                   ->  Index Only Scan using certificate_csr_id_idx on
>> certificate  (cost=0.00..75707.40 rows=555898 width=4) (actual
>> time=0.005..0.005 rows=1 loops=571551)
>>                         Heap Fetches: 0
>>   Total runtime: 12798.066 ms
>> (14 rows)
>>
>>
>> caramel=# \d certificate
>>                                       Table "public.certificate"
>>     Column   |            Type             |
>> Modifiers
>>
>> ------------+-----------------------------+----------------------------------------------------------
>>   id         | integer                     | not null default
>> nextval('certificate_id_seq'::regclass)
>>   pem        | text                        | not null
>>   not_before | timestamp without time zone | not null
>>   not_after  | timestamp without time zone | not null
>>   csr_id     | integer                     | not null
>> Indexes:
>>      "certificate_pkey" PRIMARY KEY, btree (id)
>>      "certificate_csr_id_idx" btree (csr_id)
>>      "certificate_csr_id_not_after_idx" btree (csr_id, not_after)
>> Foreign-key constraints:
>>      "certificate_csr_id_fkey" FOREIGN KEY (csr_id) REFERENCES csr(id)
>>
>> caramel=# \d csr
>>                                    Table "public.csr"
>>     Column   |         Type          |                    Modifiers
>>
>> ------------+-----------------------+--------------------------------------------------
>>   id         | integer               | not null default
>> nextval('csr_id_seq'::regclass)
>>   sha256sum  | character(64)         | not null
>>   pem        | text                  | not null
>>   orgunit    | character varying(64) |
>>   commonname | character varying(64) |
>>   rejected   | boolean               |
>> Indexes:
>>      "csr_pkey" PRIMARY KEY, btree (id)
>>      "csr_sha256sum_key" UNIQUE CONSTRAINT, btree (sha256sum)
>>      "csr_rejected_idx4" btree (id, rejected)
>> Referenced by:
>>      TABLE "accesslog" CONSTRAINT "accesslog_csr_id_fkey" FOREIGN KEY
>> (csr_id) REFERENCES csr(id)
>>      TABLE "certificate" CONSTRAINT "certificate_csr_id_fkey" FOREIGN
>> KEY (csr_id) REFERENCES csr(id)
>>      TABLE "icky" CONSTRAINT "icky_csr_id_fkey" FOREIGN KEY (csr_id)
>> REFERENCES csr(id)
>>
>>
>> On Mon, Jul 27, 2015 at 5:05 PM, Mike Bayer <mike...@zzzcomputing.com>
>> wrote:
>>>
>>>
>>> On 7/27/15 3:47 AM, D.S. Ljungmark wrote:
>>>>
>>>> But we're not looking for one row in the Certificate table, what it
>>>> does is, for _every_ CSR, take the Certificate metadata that matches
>>>> the _oldest_ certificate.
>>>
>>>
>>> whoops, right I looked at it for too long and didn't see us selecting all
>>> of
>>> csr.
>>>
>>>> The equivalent using a Join would be the following (the below is the
>>>> generated code):
>>>>
>>>> SELECT certificate.csr_id AS certificate_csr_id,
>>>> certificate.not_before AS certificate_not_before,
>>>> certificate.not_after AS certificate_not_after
>>>> FROM certificate JOIN (SELECT certificate.csr_id AS csr_id,
>>>> max(certificate.not_after) AS not_after
>>>> FROM certificate GROUP BY certificate.csr_id) AS latest ON
>>>> certificate.csr_id = latest.csr_id AND certificate.not_after =
>>>> latest.not_after
>>>>
>>>> And while I'm pretty certain the above query isn't in any way the
>>>> optimal one,  I'm still a bit out of my depth on the SQL side of
>>>> things.
>>>
>>> I bet if you run EXPLAIN ANALYZE on that query it will perform better (or
>>> definitely not worse) than the correlated subquery version.
>>>
>>> Here's another correlated subquery version that uses standard SQL, just
>>> move
>>> the correlated subq you have to the WHERE clause so that it only need
>>> refer
>>> to one column:
>>>
>>> select c.csr_id, c.not_before, c.not_after from certificate as c
>>> join csr on c.csr_id = csr.id
>>> where c.csr_id = (
>>>      select csr_id from certificate where c.csr_id=csr.id
>>>      order by c.not_after desc limit 1
>>> )
>>>
>>>
>>> now suppose that no, you really want to use the Postgresql-specific
>>> syntax.
>>> That's fine.   There shoudn't be much need to convert it to a SQL
>>> expression
>>> construct - you can just run session.execute("your query").     If you
>>> need
>>> ORM objects back from it, run
>>> session.query(Certificate).from_statement("your query").
>>>
>>>
>>> Otherwise, to begin supporting PG's syntaxes, you'd need to use the
>>> compiler
>>> extension to build up constructs that do all of these special Postgresql
>>> syntaxes (http://docs.sqlalchemy.org/en/rel_1_0/core/compiler.html).
>>>
>>> PG's syntaxes in this area are geared towards turning SQL itself into an
>>> "object oriented" programming language (see
>>>
>>> http://ledgersmbdev.blogspot.com/2012/08/intro-to-postgresql-as-object.html
>>> for a long series on this philosophy) which is *potentially* awkward in
>>> conjunction with a tool like SQLAlchemy, which is trying to build up
>>> similar
>>> kinds of patterns on the Python side instead.   This is why SQLAlchemy
>>> doesn't get into this area of PG syntax; it would require an entirely new
>>> set of constructs that work smoothly and consistently with PG's way of
>>> thinking here, and would first off be huge; it really should be a
>>> third-party extension that attempts to support all, or a large portion,
>>> of
>>> PG's "object-relational" system.  But after all that,  it may very well
>>> turn
>>> out that using another programming language like Python on top of all of
>>> that starts to look redundant (or maybe it would be amazing, I don't
>>> know.
>>> It's just way outside the scope of "SQL mapping" because PG is really
>>> trying
>>> to invent their own language in this area). If your application is
>>> tailored
>>> towards this architecture that might suggest that it prefers Postgresql's
>>> syntaxes directly, and you only need an object marshalling layer in your
>>> app.
>>>
>>> SQLAlchemy has always stressed that literal SQL is always an option.
>>> Historically it has meant that you lose flexibility with ORM features
>>> like
>>> eager loads and such, so I've added
>>> https://bitbucket.org/zzzeek/sqlalchemy/issues/3501/ for 1.1, which takes
>>> advantage of some improved result-row-mapping features in 1.0 so that we
>>> can
>>> finally map any raw SQL of any kind directly to full object graphs
>>> without
>>> losing any flexibility.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>>
>>>>
>>>> On Sun, Jul 26, 2015 at 8:33 PM, Mike Bayer <mike...@zzzcomputing.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> On 7/26/15 12:56 PM, Dennis Ljungmark wrote:
>>>>>
>>>>> Hello, all.
>>>>>
>>>>>      I am trying to reshape the (somewhat tricky) SQL query below into
>>>>> something SQLAlchemy can generate, and I just hit a wall:
>>>>>
>>>>> This is the query:
>>>>> SELECT (c).csr_id, (c).not_before, (c).not_after FROM (SELECT (SELECT c
>>>>> FROM
>>>>> certificate c WHERE c.csr_id=csr.id ORDER BY c.not_after DESC LIMIT 1)
>>>>> AS
>>>>> c
>>>>> FROM csr offset 0) s;
>>>>>
>>>>> And, I just can't seem to get SQLAlchemy to emit a correlated scalar
>>>>> subquery like this.
>>>>>
>>>>> correlated scalar subqueries are illustrated in the Core tutorial at
>>>>>
>>>>> http://docs.sqlalchemy.org/en/rel_1_0/core/tutorial.html#scalar-selects.
>>>>> I'm not sure why inklesspen's example isn't making use of as_scalar()
>>>>> or
>>>>> stmt.label(), which would be required.     Query supports the same use
>>>>> if
>>>>> you are working with ORM queries.
>>>>>
>>>>> However, the statement "SELECT c FROM certificate c" and the
>>>>> "(c).csr_id"
>>>>> syntax appears to be using a very unusual postgresql-specific syntax
>>>>> which
>>>>> is rolling the whole inner row into a single-column tuple and then
>>>>> referencing it that way, which is odd because the row itself is already
>>>>> a
>>>>> tuple, not sure what forcing it into a tuple that contains exactly one
>>>>> tuple
>>>>> accomplishes here.
>>>>>
>>>>> While we can certainly use tuple_() and probably create some extra
>>>>> custom
>>>>> constructs around it to support this syntax, I see nothing in this
>>>>> specific
>>>>> statement to justify the use of these syntaxes, nor the need to produce
>>>>> correlated scalar subqueries when we just are looking for one row from
>>>>> the
>>>>> "certificate" table.  I don't at the moment see why you can't use this
>>>>> statement, which is easier to read, platform agnostic, and way more
>>>>> efficient:
>>>>>
>>>>> SELECT c.csr_id, c.not_before, c.not_after FROM certificate AS c JOIN
>>>>> csr
>>>>> ON
>>>>> csr.id = c.csr_id ORDER BY c.not_after DESC LIMIT 1
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> There ought to be something, but for the life of me I can't seem to
>>>>> make
>>>>> it
>>>>> happen.
>>>>>
>>>>> InklessPen tried to help me on IRC, and we got something, but it's
>>>>> still
>>>>> not
>>>>> quite the same,
>>>>>    https://gist.github.com/inklesspen/49e69e1f33f3852d348a
>>>>>
>>>>> A reduced model is below,
>>>>>
>>>>>
>>>>> class CSR(Base):
>>>>>       certificates = _orm.relationship("Certificate", backref="csr")
>>>>>
>>>>> class Certificate(Base):
>>>>>       not_before = _sa.Column(_sa.DateTime, nullable=False)
>>>>>       not_after = _sa.Column(_sa.DateTime, nullable=False)
>>>>>       csr_id = _fkcolumn(CSR.id, nullable=False)
>>>>>
>>>>>
>>>>> Regards,
>>>>>     D.S.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups
>>>>> "sqlalchemy" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an
>>>>> email to sqlalchemy+unsubscr...@googlegroups.com.
>>>>> To post to this group, send email to sqlalchemy@googlegroups.com.
>>>>> Visit this group at http://groups.google.com/group/sqlalchemy.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to a topic in the
>>>>> Google Groups "sqlalchemy" group.
>>>>> To unsubscribe from this topic, visit
>>>>> https://groups.google.com/d/topic/sqlalchemy/ooNUhATaTEU/unsubscribe.
>>>>> To unsubscribe from this group and all its topics, send an email to
>>>>> sqlalchemy+unsubscr...@googlegroups.com.
>>>>> To post to this group, send email to sqlalchemy@googlegroups.com.
>>>>> Visit this group at http://groups.google.com/group/sqlalchemy.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>>
>>> --
>>> You received this message because you are subscribed to a topic in the
>>> Google Groups "sqlalchemy" group.
>>> To unsubscribe from this topic, visit
>>> https://groups.google.com/d/topic/sqlalchemy/ooNUhATaTEU/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to
>>> sqlalchemy+unsubscr...@googlegroups.com.
>>> To post to this group, send email to sqlalchemy@googlegroups.com.
>>> Visit this group at http://groups.google.com/group/sqlalchemy.
>>> For more options, visit https://groups.google.com/d/optout.
>
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "sqlalchemy" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/sqlalchemy/ooNUhATaTEU/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Transforming SQL to SQLAlchemy

Reply via email to