Hi Vineet - glad that worked! I'll have to find some time to recall what we worked out here and how it came out for you, I wonder where on the site this kind of thing could be mentioned. we have 3rd party dialects listed out in the docs but not yet a place for extensions.
On Wed, Feb 19, 2020, at 9:28 PM, Vineet Gopal wrote: > Hi Mike, > > Thanks for all of your help getting this working again. We've used this > solution in production for two years now, and it's helped our performance > significantly. > > We just open-sourced the solution that we built so others can use it, and are > also writing a blog post to cover some of the details. I'm attaching a copy > of the blog post here. Obviously not expected, but if you are interested in > taking a look, we are happy to incorporate any comments that you may have > before publishing. > > Here's a link to the repo as well: > https://github.com/benchling/sqlalchemy_batch_inserts > > Best, > Vineet > > On Mon, Oct 9, 2017 at 10:27 PM <vin...@benchling.com> wrote: >>> if you're using Postgresql, there's a vastly easier technique to use >>> which is just to pre-fetch from the sequence: >>> identities = [ >>> val for val, in session.execute( >>> "select nextval('mytable_seq') from " >>> "generate_series(1,%s)" % len(my_objects)) >>> ) >>> ] >>> for ident, obj in zip(identities, my_objects): >>> obj.pk = ident >> Wow, that's a great idea! I got it working for most of our models. I have >> some questions about how inserts for joined-table inheritance tables are >> batched together, but I'll ask them in a separate post since they're >> somewhat unrelated to this. >> >>> So the complexity of adding multi-values insert with sequences would >>> benefit an extremely narrow set of use cases, would be very labor >>> intensive to implement and maintain, and is unnecessary for the single >>> target platform in which this case would work. >> That makes sense, thanks for the explanation! >> >> >> >> On Monday, October 9, 2017 at 8:44:51 AM UTC-7, Mike Bayer wrote: >>> On Mon, Oct 9, 2017 at 4:15 AM, <vin...@benchling.com> wrote: >>> > Hello! I've spent some time looking at SQLAlchemy's ability to batch >>> > inserts, and have a few questions about bulk_save_objects (and flushing >>> > in >>> > general). >>> > >>> > Two statements that I think are true: >>> > >>> > Right now, bulk_save_objects does not fetch primary keys for inserted >>> > rows >>> > (unless return_defaults is passed in, which mitigates the performance >>> > gains >>> > by performing one insert statement per model). >>> >>> the point of bulk save is that the objects passed to it are considered >>> as throwaway after the operation, to avoid the overhead of maintaining >>> the objects' persistence state in memory. Using it with >>> return_defaults is nearly pointless as this will blow away the >>> efficiency gains you might get from the bulk operation, and is there >>> mostly to allow the joined-table inheritance use case to be usable. >>> >>> >>> > When running db.session.flush(), it looks like SQLAlchemy can batch model >>> > creation into a single multi-value insert statement *only if all of the >>> > primary keys are already defined on the model*. (Verified by looking at >>> > sqlalchemy/lib/sqlalchemy/orm/persistence.py:_emit_insert_statements) >>> > >>> > >>> > Questions: >>> > >>> > Would it be possible for bulk_save_objects to fetch primary keys while >>> > still >>> > batching INSERTs into a single query? >>> >>> What do you mean "single query", do you mean, executemany()? the >>> answer is no because DBAPI drivers don't return result sets with >>> executemany(). Do you mean, a single INSERT..VALUES with all the >>> parameters in one statement ? The answer is no because SQLAlchemy >>> doesn't include multi-values as a transparent option; the DBAPI >>> drivers instead make use of the multi-values syntax within their >>> executemany() implementations, where again, they don't return result >>> sets. >>> >>> Right now, only the MySQL drivers do this by default, the psycopg2 >>> driver does it with a recently added option that SQLAlchemy does not >>> directly support, however you can set up via a connection event. For >>> psycopg2 it can vastly speed up inserts as psycopg2's normal >>> executemany() implementation has some performance issues. >>> >>> >>> (This >>> > would help with e.g. inserting a bunch of rows into a table with an >>> > auto-incrementing primary key). >>> >>> if you're using Postgresql, there's a vastly easier technique to use >>> which is just to pre-fetch from the sequence: >>> >>> identities = [ >>> val for val, in session.execute( >>> "select nextval('mytable_seq') from " >>> "generate_series(1,%s)" % len(my_objects)) >>> ) >>> ] >>> for ident, obj in zip(identities, my_objects): >>> obj.pk = ident >>> >>> >>> Now you don't need to do RETURNING or anything and the inserts can be >>> at their top efficiency. >>> >>> >>> >>> > >>> > At least in Postgres (haven't verified for other databases), it looks >>> > like >>> > one can use RETURNING for inserts with multi-value VALUES clauses. At the >>> > surface, it seems like one could extend the current behavior for a single >>> > row INSERT: >>> > >>> > INSERT INTO table ... VALUES (1) RETURNING id >>> > >>> > to multi-value INSERTS: >>> > >>> > INSERT INTO table ... VALUES (1), (2), ... RETURNING id >>> > >>> > and get all the benefits of the ORM while still batching inserts. >>> >>> >>> This would be an enormous undertaking to implement, test, and release. >>> It would have to exist as an entirely additional series of codepaths >>> within persistence.py as multi-valued INSERT is not available on most >>> databases as well as on earlier versions of the databases that do >>> support it. For all of that complexity, the approach would work on >>> exactly: Postgresql only - SQLite and MySQL don't support RETURNING, >>> Oracle, SQL Server and others don't support INSERT..VALUES with >>> multiple sets. >>> >>> If you are using Postgresql, you can instead pre-fetch the sequence up >>> front. This can even be integrated into a before_flush() event. >>> >>> So the complexity of adding multi-values insert with sequences would >>> benefit an extremely narrow set of use cases, would be very labor >>> intensive to implement and maintain, and is unnecessary for the single >>> target platform in which this case would work. >>> >>> >>> > >>> > I'm sure this is something that was thought about already, so I'd mostly >>> > love to hear any thoughts about what makes this hard. We have a lot of >>> > complex relationships and joined-table inheritance, so working with >>> > bulk_save_objects has been a bit challenging. From the comments in >>> > sqlalchemy/lib/sqlalchemy/sql/dml.py:return_defaults(), it seems like >>> > SQLAlchemy is already aware that RETURNING for insert statements with >>> > multi-values VALUES clauses is supported, so it's possible there is a >>> > reason >>> > that this is hard or can't be done. >>> > >>> > This is also something that, if it just a “missing feature”, I would love >>> > to >>> > learn more and see if I could contribute. >>> > >>> > -- >>> > SQLAlchemy - >>> > The Python SQL Toolkit and Object Relational Mapper >>> > >>> > http://www.sqlalchemy.org/ >>> > >>> > To post example code, please provide an MCVE: Minimal, Complete, and >>> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full >>> > description. >>> > --- >>> > You received this message because you are subscribed to the Google Groups >>> > "sqlalchemy" group. >>> > To unsubscribe from this group and stop receiving emails from it, send an >>> > email to sqlalchemy+...@googlegroups.com. >>> > To post to this group, send email to sqlal...@googlegroups.com. >>> > Visit this group at https://groups.google.com/group/sqlalchemy. >>> > For more options, visit https://groups.google.com/d/optout. >> >> -- >> SQLAlchemy - >> The Python SQL Toolkit and Object Relational Mapper >> >> http://www.sqlalchemy.org/ >> >> To post example code, please provide an MCVE: Minimal, Complete, and >> Verifiable Example. See http://stackoverflow.com/help/mcve for a full >> description. >> --- >> You received this message because you are subscribed to a topic in the >> Google Groups "sqlalchemy" group. >> To unsubscribe from this topic, visit >> https://groups.google.com/d/topic/sqlalchemy/GyAZTThJi2I/unsubscribe. >> To unsubscribe from this group and all its topics, send an email to >> sqlalchemy+unsubscr...@googlegroups.com. >> To post to this group, send email to sqlalchemy@googlegroups.com. >> Visit this group at https://groups.google.com/group/sqlalchemy. >> For more options, visit https://groups.google.com/d/optout. > > -- > SQLAlchemy - > The Python SQL Toolkit and Object Relational Mapper > > http://www.sqlalchemy.org/ > > To post example code, please provide an MCVE: Minimal, Complete, and > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > description. > --- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to sqlalchemy+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/sqlalchemy/CAJYnotsAJ39FuiVYS_UD6338AEktOZQmw1ChuQK2HOXPkMCwLw%40mail.gmail.com > > <https://groups.google.com/d/msgid/sqlalchemy/CAJYnotsAJ39FuiVYS_UD6338AEktOZQmw1ChuQK2HOXPkMCwLw%40mail.gmail.com?utm_medium=email&utm_source=footer>. > > > *Attachments:* > * BatchInsertsBlogPost.pdf -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/8fc1d56e-dc34-473b-8452-4359af640a6d%40www.fastmail.com.