Hi Vineet, Mike, @Vineet, thank you for the interesting blog post on bulk insert with SQLAlchemy ORM: https://benchling.engineering/sqlalchemy-batch-inserts-a-module-for-when-youre-inserting-thousands-of-rows-and-it-s-slow-16ece0ef5bf7
A few questions: 1. Do we need to get the incremented IDs from Postgresql itself, or can we just fetch the current max ID on a table and increment IDs in Python without querying the DB for the incremented values? 2. I was intrigued by the following phrase: > * P.S. execute_values in psycopg2 v2.8 supports getting returned values back, so it’s possible that SQLAlchemy may support batch inserting these models (with an auto-incrementing primary key) in the future. @Mike @Vineet, do you know if this is the case, ie if bulk insert now works out of the box (without pre-assigning incremented IDs)? 3. Does this imply any change in case of bulk insert of multiple models with foreign keys referring to each other? This answer <https://stackoverflow.com/a/36387887/11750716> seems to suggest pre-assigning IDs for it to work. On Friday, February 21, 2020 at 3:49:54 PM UTC+1 Mike Bayer wrote: > Hi Vineet - > > glad that worked! I'll have to find some time to recall what we worked > out here and how it came out for you, I wonder where on the site this kind > of thing could be mentioned. we have 3rd party dialects listed out in > the docs but not yet a place for extensions. > > On Wed, Feb 19, 2020, at 9:28 PM, Vineet Gopal wrote: > > Hi Mike, > > Thanks for all of your help getting this working again. We've used this > solution in production for two years now, and it's helped our performance > significantly. > > We just open-sourced the solution that we built so others can use it, and > are also writing a blog post to cover some of the details. I'm attaching a > copy of the blog post here. Obviously not expected, but if you are > interested in taking a look, we are happy to incorporate any comments that > you may have before publishing. > > Here's a link to the repo as well: > https://github.com/benchling/sqlalchemy_batch_inserts > > Best, > Vineet > > On Mon, Oct 9, 2017 at 10:27 PM <vin...@benchling.com> wrote: > > if you're using Postgresql, there's a vastly easier technique to use > which is just to pre-fetch from the sequence: > identities = [ > val for val, in session.execute( > "select nextval('mytable_seq') from " > "generate_series(1,%s)" % len(my_objects)) > ) > ] > for ident, obj in zip(identities, my_objects): > obj.pk = ident > > Wow, that's a great idea! I got it working for most of our models. I have > some questions about how inserts for joined-table inheritance tables are > batched together, but I'll ask them in a separate post since they're > somewhat unrelated to this. > > So the complexity of adding multi-values insert with sequences would > benefit an extremely narrow set of use cases, would be very labor > intensive to implement and maintain, and is unnecessary for the single > target platform in which this case would work. > > That makes sense, thanks for the explanation! > > > > On Monday, October 9, 2017 at 8:44:51 AM UTC-7, Mike Bayer wrote: > > On Mon, Oct 9, 2017 at 4:15 AM, <vin...@benchling.com> wrote: > > Hello! I've spent some time looking at SQLAlchemy's ability to batch > > inserts, and have a few questions about bulk_save_objects (and flushing > in > > general). > > > > Two statements that I think are true: > > > > Right now, bulk_save_objects does not fetch primary keys for inserted > rows > > (unless return_defaults is passed in, which mitigates the performance > gains > > by performing one insert statement per model). > > the point of bulk save is that the objects passed to it are considered > as throwaway after the operation, to avoid the overhead of maintaining > the objects' persistence state in memory. Using it with > return_defaults is nearly pointless as this will blow away the > efficiency gains you might get from the bulk operation, and is there > mostly to allow the joined-table inheritance use case to be usable. > > > > When running db.session.flush(), it looks like SQLAlchemy can batch > model > > creation into a single multi-value insert statement *only if all of the > > primary keys are already defined on the model*. (Verified by looking at > > sqlalchemy/lib/sqlalchemy/orm/persistence.py:_emit_insert_statements) > > > > > > Questions: > > > > Would it be possible for bulk_save_objects to fetch primary keys while > still > > batching INSERTs into a single query? > > What do you mean "single query", do you mean, executemany()? the > answer is no because DBAPI drivers don't return result sets with > executemany(). Do you mean, a single INSERT..VALUES with all the > parameters in one statement ? The answer is no because SQLAlchemy > doesn't include multi-values as a transparent option; the DBAPI > drivers instead make use of the multi-values syntax within their > executemany() implementations, where again, they don't return result > sets. > > Right now, only the MySQL drivers do this by default, the psycopg2 > driver does it with a recently added option that SQLAlchemy does not > directly support, however you can set up via a connection event. For > psycopg2 it can vastly speed up inserts as psycopg2's normal > executemany() implementation has some performance issues. > > > (This > > would help with e.g. inserting a bunch of rows into a table with an > > auto-incrementing primary key). > > if you're using Postgresql, there's a vastly easier technique to use > which is just to pre-fetch from the sequence: > > identities = [ > val for val, in session.execute( > "select nextval('mytable_seq') from " > "generate_series(1,%s)" % len(my_objects)) > ) > ] > for ident, obj in zip(identities, my_objects): > obj.pk = ident > > > Now you don't need to do RETURNING or anything and the inserts can be > at their top efficiency. > > > > > > > At least in Postgres (haven't verified for other databases), it looks > like > > one can use RETURNING for inserts with multi-value VALUES clauses. At > the > > surface, it seems like one could extend the current behavior for a > single > > row INSERT: > > > > INSERT INTO table ... VALUES (1) RETURNING id > > > > to multi-value INSERTS: > > > > INSERT INTO table ... VALUES (1), (2), ... RETURNING id > > > > and get all the benefits of the ORM while still batching inserts. > > > This would be an enormous undertaking to implement, test, and release. > It would have to exist as an entirely additional series of codepaths > within persistence.py as multi-valued INSERT is not available on most > databases as well as on earlier versions of the databases that do > support it. For all of that complexity, the approach would work on > exactly: Postgresql only - SQLite and MySQL don't support RETURNING, > Oracle, SQL Server and others don't support INSERT..VALUES with > multiple sets. > > If you are using Postgresql, you can instead pre-fetch the sequence up > front. This can even be integrated into a before_flush() event. > > So the complexity of adding multi-values insert with sequences would > benefit an extremely narrow set of use cases, would be very labor > intensive to implement and maintain, and is unnecessary for the single > target platform in which this case would work. > > > > > > I'm sure this is something that was thought about already, so I'd mostly > > love to hear any thoughts about what makes this hard. We have a lot of > > complex relationships and joined-table inheritance, so working with > > bulk_save_objects has been a bit challenging. From the comments in > > sqlalchemy/lib/sqlalchemy/sql/dml.py:return_defaults(), it seems like > > SQLAlchemy is already aware that RETURNING for insert statements with > > multi-values VALUES clauses is supported, so it's possible there is a > reason > > that this is hard or can't be done. > > > > This is also something that, if it just a “missing feature”, I would > love to > > learn more and see if I could contribute. > > > > -- > > SQLAlchemy - > > The Python SQL Toolkit and Object Relational Mapper > > > > http://www.sqlalchemy.org/ > > > > To post example code, please provide an MCVE: Minimal, Complete, and > > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > > description. > > --- > > You received this message because you are subscribed to the Google > Groups > > "sqlalchemy" group. > > To unsubscribe from this group and stop receiving emails from it, send > an > > email to sqlalchemy+...@googlegroups.com. > > To post to this group, send email to sqlal...@googlegroups.com. > > Visit this group at https://groups.google.com/group/sqlalchemy. > > For more options, visit https://groups.google.com/d/optout. > > > -- > SQLAlchemy - > The Python SQL Toolkit and Object Relational Mapper > > http://www.sqlalchemy.org/ > > To post example code, please provide an MCVE: Minimal, Complete, and > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > description. > --- > You received this message because you are subscribed to a topic in the > Google Groups "sqlalchemy" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/sqlalchemy/GyAZTThJi2I/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > sqlalchemy+...@googlegroups.com. > To post to this group, send email to sqlal...@googlegroups.com. > Visit this group at https://groups.google.com/group/sqlalchemy. > For more options, visit https://groups.google.com/d/optout. > > > -- > SQLAlchemy - > The Python SQL Toolkit and Object Relational Mapper > > http://www.sqlalchemy.org/ > > To post example code, please provide an MCVE: Minimal, Complete, and > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > description. > --- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to sqlalchemy+...@googlegroups.com. > > To view this discussion on the web visit > https://groups.google.com/d/msgid/sqlalchemy/CAJYnotsAJ39FuiVYS_UD6338AEktOZQmw1ChuQK2HOXPkMCwLw%40mail.gmail.com > > <https://groups.google.com/d/msgid/sqlalchemy/CAJYnotsAJ39FuiVYS_UD6338AEktOZQmw1ChuQK2HOXPkMCwLw%40mail.gmail.com?utm_medium=email&utm_source=footer> > . > > > *Attachments:* > > - BatchInsertsBlogPost.pdf > > > -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/8a216e1d-b421-48df-9739-9f261a745571n%40googlegroups.com.