Re: [sqlalchemy] How do Insert statements for single-table inheritance tables get batched together?

vineet Wed, 11 Oct 2017 00:03:37 -0700

Hey Mike,

Thanks again for the detailed explanations!


I went ahead and tested the code snippet you gave me - I'm not sure I 
totally understand when this will change behavior though.

I tried the following code snippet:
(Snippet #1)
for i in xrange(10):
    db.session.add(A(id=i))
db.session.flush()

This calls through to the "do_executemany" handler and only executes 1 
insert statement with multiple VALUES. However, this was already the 
existing behavior right?

When I don't specify a primary key:
(Snippet #2)
for i in xrange(10):
    db.session.add(A()) # don't pre-specify a primary key
db.session.flush()

This does *not* call through to the do_executemany handler, and executes 10 
insert statements.

If I 
understand http://initd.org/psycopg/docs/extras.html#fast-execution-helpers 
correctly, execute_batch does *not* actually combine the insert statements 
into a single statement with multiple VALUES clauses, but instead does 
something like sends multiple INSERT statements to the server at once. We 
would obviously still expect to see performance benefits from the reduced 
network round-trips, but it looks like the above tests do not show any 
improvement there.

Let me know if I'm testing that incorrectly - I'm also happy to test this 
on our actual production use cases to give you some more accurate speed-up 
times.

On Tuesday, October 10, 2017 at 10:36:33 AM UTC-7, Mike Bayer wrote:
>
> I've added 
> https://bitbucket.org/zzzeek/sqlalchemy/issues/4109/support-psycopg2-batch-mode
>  
> which will have a code review up soon. 
>
> On Tue, Oct 10, 2017 at 1:06 PM, Mike Bayer <mik...@zzzcomputing.com 
> <javascript:>> wrote: 
> > On Tue, Oct 10, 2017 at 1:47 AM,  <vin...@benchling.com <javascript:>> 
> wrote: 
> >> Hello! SQLAlchemy has the (really useful) behavior of batching together 
> >> inserts when all the primary keys are defined. It looks like it groups 
> >> similar tables together (while maintaining insert order ) to minimize 
> the 
> >> number of Insert statements sent to the database. 
> >> 
> >> I'm unsure what the expected behavior here is for single-table 
> inheritance 
> >> tables. Here's some example models: 
> >> class Base(db.Model): 
> >>     __tablename__ = 'base' 
> >>     id = db.Column(db.Integer, primary_key=True) 
> >>     base_type = db.Column(db.String) 
> >> 
> >>     __mapper_args__ = { 
> >>         'polymorphic_on': base_type, 
> >>     } 
> >> 
> >>     def __init__(self, id_): 
> >>         self.id = id_ 
> >> 
> >> 
> >> class SubBase1(Base): 
> >>     __mapper_args__ = { 
> >>         'polymorphic_identity': '1', 
> >>     } 
> >> 
> >>     col = db.Column(db.String) 
> >> 
> >> 
> >> class SubBase2(Base): 
> >>     __mapper_args__ = { 
> >>         'polymorphic_identity': '2', 
> >>     } 
> >> 
> >>     col2 = db.Column(db.String) 
> >> 
> >> 
> >> class OtherModel(db.Model): 
> >>     id = db.Column(db.Integer, primary_key=True) 
> >> 
> >>     def __init__(self, id_): 
> >>         self.id = id_ 
> >> 
> >> 
> >> Base, SubBase, and SubBase2 form a tree, while OtherModel is just 
> another 
> >> model. 
> >> 
> >> When creating alternating SubBase1s and OtherModels, SQLAlchemy batches 
> the 
> >> INSERTs such that only two INSERT statements are sent to the DB: 
> >> (Snippet #1) 
> >> for i in xrange(0, 10, 2): 
> >>     db.session.add(SubBase1(i)) 
> >>     db.session.add(OtherModel(i + 1)) 
> >> 
> >> db.session.flush()  # Performs 2 insert statements, each with 5 
> elements in 
> >> the VALUES clause 
> >> 
> >> However, when creating alternating SubBase1s and SubBase2s, it actually 
> >> performs 10 separate insert statements: 
> >> (Snippet #2) 
> >> for i in xrange(0, 10, 2): 
> >>     db.session.add(SubBase1(i)) 
> >>     db.session.add(SubBase2(i + 1)) 
> >> db.session.flush()  # Performs 10 insert statements, each with 1 
> element in 
> >> the VALUES clause 
> >> 
> >> 
> >> It seems like SQLAlchemy maintains the insert_order *across* SubBase1 
> and 
> >> SubBase2, but also isn't able to do a single INSERT statement with 10 
> >> elements in the VALUES clause (or 2 INSERT statements with 5 elements 
> each). 
> >> 
> >> Questions: 
> >> 1) Is Snippet #2 the desired behavior? I read through the unit-of-work 
> >> summary at http://www.aosabook.org/en/sqlalchemy.html, but I wasn't 
> totally 
> >> sure what the desired behavior was for single-table inheritance models. 
> > 
> > yes because you'll note because sub1 and sub2 have different columns 
> > mapped, the actual columns in each INSERT statement are different. 
> > At the same time it is maintaining INSERT order into the table 
> > overall, so it has no choice but to break things up in this way. 
> > 
> >> 2) If this is desired behavior, do you have any suggestions of how to 
> >> optimize the number of queries here in e.g. a before_flush hook? 
> Obviously 
> >> one way would be to rewrite our application such that inserts of like 
> tables 
> >> happen together, but if there is a way to do this without rewriting the 
> >> application code, that would be even better. 
> > 
> > if your INSERTs are very simplistic then you can still consider using 
> > the bulk_save option, gathering your classes into lists and then doing 
> > your own bulk operation where you ensure all like-rows are grouped 
> > together ahead of time. 
> > 
> > Otherwise, if you really are trying to thread the needle through "I 
> > want full ORM flush convenience" plus "I need to optimize INSERTs into 
> > batches", you have one more option which is to manipulate the 
> > insert_order on the states like this: 
> > 
> > for i in xrange(0, 10, 2): 
> >     s.add(SubBase1(i)) 
> >     s.add(SubBase2(i + 1)) 
> > 
> > from sqlalchemy import inspect 
> > 
> > for idx, obj in enumerate( 
> >     sorted( 
> >         s.new, 
> >         key=lambda obj: (obj.__class__, inspect(obj).insert_order) 
> >     ) 
> > ): 
> >     inspect(obj).insert_order = idx 
> > 
> > 
> > I've not documented "insert_order" as an official thing but I can 
> support that. 
> > 
> > But also if you're really trying to maximize INSERT performance you 
> > need to use the psycopg2 executemany() extensions: 
> > http://initd.org/psycopg/docs/extras.html#fast-execution-helpers. 
> > These were added specifically from an email thread I started with them 
> > in response to user complaints about performance, but SQLAlchemy has 
> > not yet built these in as features. 
> > 
> > I've done no testing at all of this, however if you have the 
> > resources, you can turn on statement logging on your Postgresql 
> > database which will hopefully show INSERT statements being collapsed 
> > into a single statement with multiple VALUES: 
> > 
> > from psycopg2 import extras 
> > from sqlalchemy import event 
> > 
> > e = create_engine("postgresql://scott:tiger@localhost/test", echo=True) 
> > 
> > @event.listens_for(e, "do_executemany") 
> > def do_executemany(cursor, statement, parameters, context): 
> >     context.dialect.supports_sane_multi_rowcount = False 
> >     extras.execute_batch(cursor, statement, parameters) 
> >     return True 
> > 
> > if you get the time to test this, please let me know as I do want to 
> > add support for "execute_batch" directly into the psycopg2 dialect. 
> > thanks! 
> > 
> > 
> > 
> > 
> > 
> > 
> > One option I've thought of is: 
> >>   - Overwrite the insert_order in the before_flush hook to rearrange 
> the 
> >> models in the order I want. (Not sure if this has effects that I'm 
> unaware 
> >> of though) 
> >> 
> >> Thanks! 
> >> 
> >> -- 
> >> SQLAlchemy - 
> >> The Python SQL Toolkit and Object Relational Mapper 
> >> 
> >> http://www.sqlalchemy.org/ 
> >> 
> >> To post example code, please provide an MCVE: Minimal, Complete, and 
> >> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> >> description. 
> >> --- 
> >> You received this message because you are subscribed to the Google 
> Groups 
> >> "sqlalchemy" group. 
> >> To unsubscribe from this group and stop receiving emails from it, send 
> an 
> >> email to sqlalchemy+...@googlegroups.com <javascript:>. 
> >> To post to this group, send email to sqlal...@googlegroups.com 
> <javascript:>. 
> >> Visit this group at https://groups.google.com/group/sqlalchemy. 
> >> For more options, visit https://groups.google.com/d/optout. 
>

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] How do Insert statements for single-table inheritance tables get batched together?

Reply via email to