On Fri, Feb 12, 2021, at 2:06 PM, Cristian Bulgaru wrote: > Hi Vineet, Mike, > > @Vineet, thank you for the interesting blog post on bulk insert with > SQLAlchemy ORM: > https://benchling.engineering/sqlalchemy-batch-inserts-a-module-for-when-youre-inserting-thousands-of-rows-and-it-s-slow-16ece0ef5bf7 > > A few questions: > > 1. Do we need to get the incremented IDs from Postgresql itself, or can we > just fetch the current max ID on a table and increment IDs in Python without > querying the DB for the incremented values? > > 2. I was intrigued by the following phrase: > > > * P.S. execute_values in psycopg2 v2.8 supports getting returned values > > back, so it’s possible that SQLAlchemy may support batch inserting these > > models (with an auto-incrementing primary key) in the future.
SQLAlchemy now supports this in version 1.4 including that's it's used by the ORM: https://docs.sqlalchemy.org/en/14/changelog/migration_14.html#psycopg2-dialect-features-execute-values-with-returning-for-insert-statements-by-default basically it's now the most efficient INSERT + autoincrement PK we can get with the psycopg2 driver. > > @Mike @Vineet, do you know if this is the case, ie if bulk insert now works > out of the box (without pre-assigning incremented IDs)? > > 3. Does this imply any change in case of bulk insert of multiple models with > foreign keys referring to each other? This answer > <https://stackoverflow.com/a/36387887/11750716> seems to suggest > pre-assigning IDs for it to work. > On Friday, February 21, 2020 at 3:49:54 PM UTC+1 Mike Bayer wrote: >> __ >> Hi Vineet - >> >> glad that worked! I'll have to find some time to recall what we worked out >> here and how it came out for you, I wonder where on the site this kind of >> thing could be mentioned. we have 3rd party dialects listed out in the >> docs but not yet a place for extensions. >> >> On Wed, Feb 19, 2020, at 9:28 PM, Vineet Gopal wrote: >>> Hi Mike, >>> >>> Thanks for all of your help getting this working again. We've used this >>> solution in production for two years now, and it's helped our performance >>> significantly. >>> >>> We just open-sourced the solution that we built so others can use it, and >>> are also writing a blog post to cover some of the details. I'm attaching a >>> copy of the blog post here. Obviously not expected, but if you are >>> interested in taking a look, we are happy to incorporate any comments that >>> you may have before publishing. >>> >>> Here's a link to the repo as well: >>> https://github.com/benchling/sqlalchemy_batch_inserts >>> >>> Best, >>> Vineet >>> >>> On Mon, Oct 9, 2017 at 10:27 PM <vin...@benchling.com> wrote: >>>>> if you're using Postgresql, there's a vastly easier technique to use >>>>> which is just to pre-fetch from the sequence: >>>>> identities = [ >>>>> val for val, in session.execute( >>>>> "select nextval('mytable_seq') from " >>>>> "generate_series(1,%s)" % len(my_objects)) >>>>> ) >>>>> ] >>>>> for ident, obj in zip(identities, my_objects): >>>>> obj.pk = ident >>>> Wow, that's a great idea! I got it working for most of our models. I have >>>> some questions about how inserts for joined-table inheritance tables are >>>> batched together, but I'll ask them in a separate post since they're >>>> somewhat unrelated to this. >>>> >>>>> So the complexity of adding multi-values insert with sequences would >>>>> benefit an extremely narrow set of use cases, would be very labor >>>>> intensive to implement and maintain, and is unnecessary for the single >>>>> target platform in which this case would work. >>>> That makes sense, thanks for the explanation! >>>> >>>> >>>> >>>> On Monday, October 9, 2017 at 8:44:51 AM UTC-7, Mike Bayer wrote: >>>>> On Mon, Oct 9, 2017 at 4:15 AM, <vin...@benchling.com> wrote: >>>>> > Hello! I've spent some time looking at SQLAlchemy's ability to batch >>>>> > inserts, and have a few questions about bulk_save_objects (and flushing >>>>> > in >>>>> > general). >>>>> > >>>>> > Two statements that I think are true: >>>>> > >>>>> > Right now, bulk_save_objects does not fetch primary keys for inserted >>>>> > rows >>>>> > (unless return_defaults is passed in, which mitigates the performance >>>>> > gains >>>>> > by performing one insert statement per model). >>>>> >>>>> the point of bulk save is that the objects passed to it are considered >>>>> as throwaway after the operation, to avoid the overhead of maintaining >>>>> the objects' persistence state in memory. Using it with >>>>> return_defaults is nearly pointless as this will blow away the >>>>> efficiency gains you might get from the bulk operation, and is there >>>>> mostly to allow the joined-table inheritance use case to be usable. >>>>> >>>>> >>>>> > When running db.session.flush(), it looks like SQLAlchemy can batch >>>>> > model >>>>> > creation into a single multi-value insert statement *only if all of the >>>>> > primary keys are already defined on the model*. (Verified by looking at >>>>> > sqlalchemy/lib/sqlalchemy/orm/persistence.py:_emit_insert_statements) >>>>> > >>>>> > >>>>> > Questions: >>>>> > >>>>> > Would it be possible for bulk_save_objects to fetch primary keys while >>>>> > still >>>>> > batching INSERTs into a single query? >>>>> >>>>> What do you mean "single query", do you mean, executemany()? the >>>>> answer is no because DBAPI drivers don't return result sets with >>>>> executemany(). Do you mean, a single INSERT..VALUES with all the >>>>> parameters in one statement ? The answer is no because SQLAlchemy >>>>> doesn't include multi-values as a transparent option; the DBAPI >>>>> drivers instead make use of the multi-values syntax within their >>>>> executemany() implementations, where again, they don't return result >>>>> sets. >>>>> >>>>> Right now, only the MySQL drivers do this by default, the psycopg2 >>>>> driver does it with a recently added option that SQLAlchemy does not >>>>> directly support, however you can set up via a connection event. For >>>>> psycopg2 it can vastly speed up inserts as psycopg2's normal >>>>> executemany() implementation has some performance issues. >>>>> >>>>> >>>>> (This >>>>> > would help with e.g. inserting a bunch of rows into a table with an >>>>> > auto-incrementing primary key). >>>>> >>>>> if you're using Postgresql, there's a vastly easier technique to use >>>>> which is just to pre-fetch from the sequence: >>>>> >>>>> identities = [ >>>>> val for val, in session.execute( >>>>> "select nextval('mytable_seq') from " >>>>> "generate_series(1,%s)" % len(my_objects)) >>>>> ) >>>>> ] >>>>> for ident, obj in zip(identities, my_objects): >>>>> obj.pk = ident >>>>> >>>>> >>>>> Now you don't need to do RETURNING or anything and the inserts can be >>>>> at their top efficiency. >>>>> >>>>> >>>>> >>>>> > >>>>> > At least in Postgres (haven't verified for other databases), it looks >>>>> > like >>>>> > one can use RETURNING for inserts with multi-value VALUES clauses. At >>>>> > the >>>>> > surface, it seems like one could extend the current behavior for a >>>>> > single >>>>> > row INSERT: >>>>> > >>>>> > INSERT INTO table ... VALUES (1) RETURNING id >>>>> > >>>>> > to multi-value INSERTS: >>>>> > >>>>> > INSERT INTO table ... VALUES (1), (2), ... RETURNING id >>>>> > >>>>> > and get all the benefits of the ORM while still batching inserts. >>>>> >>>>> >>>>> This would be an enormous undertaking to implement, test, and release. >>>>> It would have to exist as an entirely additional series of codepaths >>>>> within persistence.py as multi-valued INSERT is not available on most >>>>> databases as well as on earlier versions of the databases that do >>>>> support it. For all of that complexity, the approach would work on >>>>> exactly: Postgresql only - SQLite and MySQL don't support RETURNING, >>>>> Oracle, SQL Server and others don't support INSERT..VALUES with >>>>> multiple sets. >>>>> >>>>> If you are using Postgresql, you can instead pre-fetch the sequence up >>>>> front. This can even be integrated into a before_flush() event. >>>>> >>>>> So the complexity of adding multi-values insert with sequences would >>>>> benefit an extremely narrow set of use cases, would be very labor >>>>> intensive to implement and maintain, and is unnecessary for the single >>>>> target platform in which this case would work. >>>>> >>>>> >>>>> > >>>>> > I'm sure this is something that was thought about already, so I'd >>>>> > mostly >>>>> > love to hear any thoughts about what makes this hard. We have a lot of >>>>> > complex relationships and joined-table inheritance, so working with >>>>> > bulk_save_objects has been a bit challenging. From the comments in >>>>> > sqlalchemy/lib/sqlalchemy/sql/dml.py:return_defaults(), it seems like >>>>> > SQLAlchemy is already aware that RETURNING for insert statements with >>>>> > multi-values VALUES clauses is supported, so it's possible there is a >>>>> > reason >>>>> > that this is hard or can't be done. >>>>> > >>>>> > This is also something that, if it just a “missing feature”, I would >>>>> > love to >>>>> > learn more and see if I could contribute. >>>>> > >>>>> > -- >>>>> > SQLAlchemy - >>>>> > The Python SQL Toolkit and Object Relational Mapper >>>>> > >>>>> > http://www.sqlalchemy.org/ >>>>> > >>>>> > To post example code, please provide an MCVE: Minimal, Complete, and >>>>> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full >>>>> > description. >>>>> > --- >>>>> > You received this message because you are subscribed to the Google >>>>> > Groups >>>>> > "sqlalchemy" group. >>>>> > To unsubscribe from this group and stop receiving emails from it, send >>>>> > an >>>>> > email to sqlalchemy+...@googlegroups.com. >>>>> > To post to this group, send email to sqlal...@googlegroups.com. >>>>> > Visit this group at https://groups.google.com/group/sqlalchemy. >>>>> > For more options, visit https://groups.google.com/d/optout. >>>> >>>> -- >>>> SQLAlchemy - >>>> The Python SQL Toolkit and Object Relational Mapper >>>> >>>> http://www.sqlalchemy.org/ >>>> >>>> To post example code, please provide an MCVE: Minimal, Complete, and >>>> Verifiable Example. See http://stackoverflow.com/help/mcve for a full >>>> description. >>>> --- >>>> You received this message because you are subscribed to a topic in the >>>> Google Groups "sqlalchemy" group. >>>> To unsubscribe from this topic, visit >>>> https://groups.google.com/d/topic/sqlalchemy/GyAZTThJi2I/unsubscribe. >>>> To unsubscribe from this group and all its topics, send an email to >>>> sqlalchemy+...@googlegroups.com. >>>> To post to this group, send email to sqlal...@googlegroups.com. >>>> Visit this group at https://groups.google.com/group/sqlalchemy. >>>> For more options, visit https://groups.google.com/d/optout. >>> >>> -- >>> SQLAlchemy - >>> The Python SQL Toolkit and Object Relational Mapper >>> >>> http://www.sqlalchemy.org/ >>> >>> To post example code, please provide an MCVE: Minimal, Complete, and >>> Verifiable Example. See http://stackoverflow.com/help/mcve for a full >>> description. >>> --- >>> You received this message because you are subscribed to the Google Groups >>> "sqlalchemy" group. >>> To unsubscribe from this group and stop receiving emails from it, send an >>> email to sqlalchemy+...@googlegroups.com. >>> To view this discussion on the web visit >>> https://groups.google.com/d/msgid/sqlalchemy/CAJYnotsAJ39FuiVYS_UD6338AEktOZQmw1ChuQK2HOXPkMCwLw%40mail.gmail.com >>> >>> <https://groups.google.com/d/msgid/sqlalchemy/CAJYnotsAJ39FuiVYS_UD6338AEktOZQmw1ChuQK2HOXPkMCwLw%40mail.gmail.com?utm_medium=email&utm_source=footer>. >>> >>> >>> *Attachments:* >>> * BatchInsertsBlogPost.pdf >> > > -- > SQLAlchemy - > The Python SQL Toolkit and Object Relational Mapper > > http://www.sqlalchemy.org/ > > To post example code, please provide an MCVE: Minimal, Complete, and > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > description. > --- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to sqlalchemy+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/sqlalchemy/8a216e1d-b421-48df-9739-9f261a745571n%40googlegroups.com > > <https://groups.google.com/d/msgid/sqlalchemy/8a216e1d-b421-48df-9739-9f261a745571n%40googlegroups.com?utm_medium=email&utm_source=footer>. -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/4fc6b69f-8f17-491b-ab2a-5250e8329ce6%40www.fastmail.com.