Hi Vineet, Mike,

@Vineet, thank you for the interesting blog post on bulk insert with 
SQLAlchemy ORM: 
https://benchling.engineering/sqlalchemy-batch-inserts-a-module-for-when-youre-inserting-thousands-of-rows-and-it-s-slow-16ece0ef5bf7

A few questions:

1. Do we need to get the incremented IDs from Postgresql itself, or can we 
just fetch the current max ID on a table and increment IDs in Python 
without querying the DB for the incremented values?

2. I was intrigued by the following phrase:

> * P.S. execute_values in psycopg2 v2.8 supports getting returned values 
back, so it’s possible that SQLAlchemy may support batch inserting these 
models (with an auto-incrementing primary key) in the future. 

@Mike @Vineet, do you know if this is the case, ie if bulk insert now works 
out of the box (without pre-assigning incremented IDs)?

3. Does this imply any change in case of bulk insert of multiple models 
with foreign keys referring to each other? This answer 
<https://stackoverflow.com/a/36387887/11750716> seems to suggest 
pre-assigning IDs for it to work.
On Friday, February 21, 2020 at 3:49:54 PM UTC+1 Mike Bayer wrote:

> Hi Vineet -
>
> glad that worked!   I'll have to find some time to recall what we worked 
> out here and how it came out for you, I wonder where on the site this kind 
> of thing could be mentioned.    we have 3rd party dialects listed out in 
> the docs but not yet a place for extensions.
>
> On Wed, Feb 19, 2020, at 9:28 PM, Vineet Gopal wrote:
>
> Hi Mike,
>
> Thanks for all of your help getting this working again. We've used this 
> solution in production for two years now, and it's helped our performance 
> significantly.
>
> We just open-sourced the solution that we built so others can use it, and 
> are also writing a blog post to cover some of the details. I'm attaching a 
> copy of the blog post here. Obviously not expected, but if you are 
> interested in taking a look, we are happy to incorporate any comments that 
> you may have before publishing.
>
> Here's a link to the repo as well: 
> https://github.com/benchling/sqlalchemy_batch_inserts
>
> Best,
> Vineet
>
> On Mon, Oct 9, 2017 at 10:27 PM <vin...@benchling.com> wrote:
>
> if you're using Postgresql, there's a vastly easier technique to use 
> which is just to pre-fetch from the sequence: 
> identities = [ 
>     val for val, in session.execute( 
>              "select nextval('mytable_seq') from " 
>              "generate_series(1,%s)" % len(my_objects)) 
>     ) 
> ] 
> for ident, obj in zip(identities, my_objects): 
>     obj.pk = ident 
>
> Wow, that's a great idea! I got it working for most of our models. I have 
> some questions about how inserts for joined-table inheritance tables are 
> batched together, but I'll ask them in a separate post since they're 
> somewhat unrelated to this.
>
> So the complexity of adding multi-values insert with sequences would 
> benefit an extremely narrow set of use cases, would be very labor 
> intensive to implement and maintain, and is unnecessary for the single 
> target platform in which this case would work. 
>
> That makes sense, thanks for the explanation!
>
>  
>
> On Monday, October 9, 2017 at 8:44:51 AM UTC-7, Mike Bayer wrote:
>
> On Mon, Oct 9, 2017 at 4:15 AM,  <vin...@benchling.com> wrote: 
> > Hello! I've spent some time looking at SQLAlchemy's ability to batch 
> > inserts, and have a few questions about bulk_save_objects (and flushing 
> in 
> > general). 
> > 
> > Two statements that I think are true: 
> > 
> > Right now, bulk_save_objects does not fetch primary keys for inserted 
> rows 
> > (unless return_defaults is passed in, which mitigates the performance 
> gains 
> > by performing one insert statement per model). 
>
> the point of bulk save is that the objects passed to it are considered 
> as throwaway after the operation, to avoid the overhead of maintaining 
> the objects' persistence state in memory.   Using it with 
> return_defaults is nearly pointless as this will blow away the 
> efficiency gains you might get from the bulk operation, and is there 
> mostly to allow the joined-table inheritance use case to be usable. 
>
>
> > When running db.session.flush(), it looks like SQLAlchemy can batch 
> model 
> > creation into a single multi-value insert statement *only if all of the 
> > primary keys are already defined on the model*. (Verified by looking at 
> > sqlalchemy/lib/sqlalchemy/orm/persistence.py:_emit_insert_statements) 
> > 
> > 
> > Questions: 
> > 
> > Would it be possible for bulk_save_objects to fetch primary keys while 
> still 
> > batching INSERTs into a single query? 
>
> What do you mean "single query", do you mean, executemany()?  the 
> answer is no because DBAPI drivers don't return result sets with 
> executemany().  Do you mean, a single INSERT..VALUES with all the 
> parameters in one statement ?  The answer is no because SQLAlchemy 
> doesn't include multi-values as a transparent option; the DBAPI 
> drivers instead make use of the multi-values syntax within their 
> executemany() implementations, where again, they don't return result 
> sets. 
>
> Right now, only the MySQL drivers do this by default, the psycopg2 
> driver does it with a recently added option that SQLAlchemy does not 
> directly support, however you can set up via a connection event.   For 
> psycopg2 it can vastly speed up inserts as psycopg2's normal 
> executemany() implementation has some performance issues. 
>
>
> (This 
> > would help with e.g. inserting a bunch of rows into a table with an 
> > auto-incrementing primary key). 
>
> if you're using Postgresql, there's a vastly easier technique to use 
> which is just to pre-fetch from the sequence: 
>
> identities = [ 
>     val for val, in session.execute( 
>              "select nextval('mytable_seq') from " 
>              "generate_series(1,%s)" % len(my_objects)) 
>     ) 
> ] 
> for ident, obj in zip(identities, my_objects): 
>     obj.pk = ident 
>
>
> Now you don't need to do RETURNING or anything and the inserts can be 
> at their top efficiency. 
>
>
>
> > 
> > At least in Postgres (haven't verified for other databases), it looks 
> like 
> > one can use RETURNING for inserts with multi-value VALUES clauses. At 
> the 
> > surface, it seems like one could extend the current behavior for a 
> single 
> > row INSERT: 
> > 
> > INSERT INTO table ... VALUES (1) RETURNING id 
> > 
> > to multi-value INSERTS: 
> > 
> > INSERT INTO table ... VALUES (1), (2), ... RETURNING id 
> > 
> > and get all the benefits of the ORM while still batching inserts. 
>
>
> This would be an enormous undertaking to implement, test, and release. 
>   It would have to exist as an entirely additional series of codepaths 
> within persistence.py as multi-valued INSERT is not available on most 
> databases as well as on earlier versions of the databases that do 
> support it.     For all of that complexity, the approach would work on 
> exactly:  Postgresql only - SQLite and MySQL don't support RETURNING, 
> Oracle, SQL Server and others don't support INSERT..VALUES with 
> multiple sets. 
>
> If you are using Postgresql, you can instead pre-fetch the sequence up 
> front.  This can even be integrated into a before_flush() event. 
>
> So the complexity of adding multi-values insert with sequences would 
> benefit an extremely narrow set of use cases, would be very labor 
> intensive to implement and maintain, and is unnecessary for the single 
> target platform in which this case would work. 
>
>
> > 
> > I'm sure this is something that was thought about already, so I'd mostly 
> > love to hear any thoughts about what makes this hard. We have a lot of 
> > complex relationships and joined-table inheritance, so working with 
> > bulk_save_objects has been a bit challenging. From the comments in 
> > sqlalchemy/lib/sqlalchemy/sql/dml.py:return_defaults(), it seems like 
> > SQLAlchemy is already aware that RETURNING for insert statements with 
> > multi-values VALUES clauses is supported, so it's possible there is a 
> reason 
> > that this is hard or can't be done. 
> > 
> > This is also something that, if it just a “missing feature”, I would 
> love to 
> > learn more and see if I could contribute. 
> > 
> > -- 
> > SQLAlchemy - 
> > The Python SQL Toolkit and Object Relational Mapper 
> > 
> > http://www.sqlalchemy.org/ 
> > 
> > To post example code, please provide an MCVE: Minimal, Complete, and 
> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> > description. 
> > --- 
> > You received this message because you are subscribed to the Google 
> Groups 
> > "sqlalchemy" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> an 
> > email to sqlalchemy+...@googlegroups.com. 
> > To post to this group, send email to sqlal...@googlegroups.com. 
> > Visit this group at https://groups.google.com/group/sqlalchemy. 
> > For more options, visit https://groups.google.com/d/optout. 
>
>
> --
> SQLAlchemy - 
> The Python SQL Toolkit and Object Relational Mapper
>  
> http://www.sqlalchemy.org/
>  
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> --- 
> You received this message because you are subscribed to a topic in the 
> Google Groups "sqlalchemy" group.
> To unsubscribe from this topic, visit 
> https://groups.google.com/d/topic/sqlalchemy/GyAZTThJi2I/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to 
> sqlalchemy+...@googlegroups.com.
> To post to this group, send email to sqlal...@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
>
>
> --
> SQLAlchemy - 
> The Python SQL Toolkit and Object Relational Mapper
>  
> http://www.sqlalchemy.org/
>  
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> --- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+...@googlegroups.com.
>
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/CAJYnotsAJ39FuiVYS_UD6338AEktOZQmw1ChuQK2HOXPkMCwLw%40mail.gmail.com
>  
> <https://groups.google.com/d/msgid/sqlalchemy/CAJYnotsAJ39FuiVYS_UD6338AEktOZQmw1ChuQK2HOXPkMCwLw%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>
>
> *Attachments:*
>
>    - BatchInsertsBlogPost.pdf
>    
>
>

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/8a216e1d-b421-48df-9739-9f261a745571n%40googlegroups.com.

Reply via email to