Hi Vineet -

glad that worked! I'll have to find some time to recall what we worked out here 
and how it came out for you, I wonder where on the site this kind of thing 
could be mentioned. we have 3rd party dialects listed out in the docs but not 
yet a place for extensions.

On Wed, Feb 19, 2020, at 9:28 PM, Vineet Gopal wrote:
> Hi Mike,
> 
> Thanks for all of your help getting this working again. We've used this 
> solution in production for two years now, and it's helped our performance 
> significantly.
> 
> We just open-sourced the solution that we built so others can use it, and are 
> also writing a blog post to cover some of the details. I'm attaching a copy 
> of the blog post here. Obviously not expected, but if you are interested in 
> taking a look, we are happy to incorporate any comments that you may have 
> before publishing.
> 
> Here's a link to the repo as well: 
> https://github.com/benchling/sqlalchemy_batch_inserts
> 
> Best,
> Vineet
> 
> On Mon, Oct 9, 2017 at 10:27 PM <vin...@benchling.com> wrote:
>>> if you're using Postgresql, there's a vastly easier technique to use 
>>> which is just to pre-fetch from the sequence: 
>>> identities = [ 
>>>  val for val, in session.execute( 
>>>  "select nextval('mytable_seq') from " 
>>>  "generate_series(1,%s)" % len(my_objects)) 
>>>  ) 
>>> ] 
>>> for ident, obj in zip(identities, my_objects): 
>>> obj.pk = ident 
>> Wow, that's a great idea! I got it working for most of our models. I have 
>> some questions about how inserts for joined-table inheritance tables are 
>> batched together, but I'll ask them in a separate post since they're 
>> somewhat unrelated to this.
>> 
>>> So the complexity of adding multi-values insert with sequences would 
>>> benefit an extremely narrow set of use cases, would be very labor 
>>> intensive to implement and maintain, and is unnecessary for the single 
>>> target platform in which this case would work. 
>> That makes sense, thanks for the explanation!
>> 
>> 
>> 
>> On Monday, October 9, 2017 at 8:44:51 AM UTC-7, Mike Bayer wrote:
>>> On Mon, Oct 9, 2017 at 4:15 AM, <vin...@benchling.com> wrote: 
>>> > Hello! I've spent some time looking at SQLAlchemy's ability to batch 
>>> > inserts, and have a few questions about bulk_save_objects (and flushing 
>>> > in 
>>> > general). 
>>> > 
>>> > Two statements that I think are true: 
>>> > 
>>> > Right now, bulk_save_objects does not fetch primary keys for inserted 
>>> > rows 
>>> > (unless return_defaults is passed in, which mitigates the performance 
>>> > gains 
>>> > by performing one insert statement per model). 
>>> 
>>> the point of bulk save is that the objects passed to it are considered 
>>> as throwaway after the operation, to avoid the overhead of maintaining 
>>> the objects' persistence state in memory. Using it with 
>>> return_defaults is nearly pointless as this will blow away the 
>>> efficiency gains you might get from the bulk operation, and is there 
>>> mostly to allow the joined-table inheritance use case to be usable. 
>>> 
>>> 
>>> > When running db.session.flush(), it looks like SQLAlchemy can batch model 
>>> > creation into a single multi-value insert statement *only if all of the 
>>> > primary keys are already defined on the model*. (Verified by looking at 
>>> > sqlalchemy/lib/sqlalchemy/orm/persistence.py:_emit_insert_statements) 
>>> > 
>>> > 
>>> > Questions: 
>>> > 
>>> > Would it be possible for bulk_save_objects to fetch primary keys while 
>>> > still 
>>> > batching INSERTs into a single query? 
>>> 
>>> What do you mean "single query", do you mean, executemany()? the 
>>> answer is no because DBAPI drivers don't return result sets with 
>>> executemany(). Do you mean, a single INSERT..VALUES with all the 
>>> parameters in one statement ? The answer is no because SQLAlchemy 
>>> doesn't include multi-values as a transparent option; the DBAPI 
>>> drivers instead make use of the multi-values syntax within their 
>>> executemany() implementations, where again, they don't return result 
>>> sets. 
>>> 
>>> Right now, only the MySQL drivers do this by default, the psycopg2 
>>> driver does it with a recently added option that SQLAlchemy does not 
>>> directly support, however you can set up via a connection event. For 
>>> psycopg2 it can vastly speed up inserts as psycopg2's normal 
>>> executemany() implementation has some performance issues. 
>>> 
>>> 
>>> (This 
>>> > would help with e.g. inserting a bunch of rows into a table with an 
>>> > auto-incrementing primary key). 
>>> 
>>> if you're using Postgresql, there's a vastly easier technique to use 
>>> which is just to pre-fetch from the sequence: 
>>> 
>>> identities = [ 
>>>  val for val, in session.execute( 
>>>  "select nextval('mytable_seq') from " 
>>>  "generate_series(1,%s)" % len(my_objects)) 
>>>  ) 
>>> ] 
>>> for ident, obj in zip(identities, my_objects): 
>>> obj.pk = ident 
>>> 
>>> 
>>> Now you don't need to do RETURNING or anything and the inserts can be 
>>> at their top efficiency. 
>>> 
>>> 
>>> 
>>> > 
>>> > At least in Postgres (haven't verified for other databases), it looks 
>>> > like 
>>> > one can use RETURNING for inserts with multi-value VALUES clauses. At the 
>>> > surface, it seems like one could extend the current behavior for a single 
>>> > row INSERT: 
>>> > 
>>> > INSERT INTO table ... VALUES (1) RETURNING id 
>>> > 
>>> > to multi-value INSERTS: 
>>> > 
>>> > INSERT INTO table ... VALUES (1), (2), ... RETURNING id 
>>> > 
>>> > and get all the benefits of the ORM while still batching inserts. 
>>> 
>>> 
>>> This would be an enormous undertaking to implement, test, and release. 
>>>  It would have to exist as an entirely additional series of codepaths 
>>> within persistence.py as multi-valued INSERT is not available on most 
>>> databases as well as on earlier versions of the databases that do 
>>> support it. For all of that complexity, the approach would work on 
>>> exactly: Postgresql only - SQLite and MySQL don't support RETURNING, 
>>> Oracle, SQL Server and others don't support INSERT..VALUES with 
>>> multiple sets. 
>>> 
>>> If you are using Postgresql, you can instead pre-fetch the sequence up 
>>> front. This can even be integrated into a before_flush() event. 
>>> 
>>> So the complexity of adding multi-values insert with sequences would 
>>> benefit an extremely narrow set of use cases, would be very labor 
>>> intensive to implement and maintain, and is unnecessary for the single 
>>> target platform in which this case would work. 
>>> 
>>> 
>>> > 
>>> > I'm sure this is something that was thought about already, so I'd mostly 
>>> > love to hear any thoughts about what makes this hard. We have a lot of 
>>> > complex relationships and joined-table inheritance, so working with 
>>> > bulk_save_objects has been a bit challenging. From the comments in 
>>> > sqlalchemy/lib/sqlalchemy/sql/dml.py:return_defaults(), it seems like 
>>> > SQLAlchemy is already aware that RETURNING for insert statements with 
>>> > multi-values VALUES clauses is supported, so it's possible there is a 
>>> > reason 
>>> > that this is hard or can't be done. 
>>> > 
>>> > This is also something that, if it just a “missing feature”, I would love 
>>> > to 
>>> > learn more and see if I could contribute. 
>>> > 
>>> > -- 
>>> > SQLAlchemy - 
>>> > The Python SQL Toolkit and Object Relational Mapper 
>>> > 
>>> > http://www.sqlalchemy.org/ 
>>> > 
>>> > To post example code, please provide an MCVE: Minimal, Complete, and 
>>> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
>>> > description. 
>>> > --- 
>>> > You received this message because you are subscribed to the Google Groups 
>>> > "sqlalchemy" group. 
>>> > To unsubscribe from this group and stop receiving emails from it, send an 
>>> > email to sqlalchemy+...@googlegroups.com. 
>>> > To post to this group, send email to sqlal...@googlegroups.com. 
>>> > Visit this group at https://groups.google.com/group/sqlalchemy. 
>>> > For more options, visit https://groups.google.com/d/optout. 
>> 

>> --
>>  SQLAlchemy - 
>>  The Python SQL Toolkit and Object Relational Mapper
>> 
>> http://www.sqlalchemy.org/
>> 
>>  To post example code, please provide an MCVE: Minimal, Complete, and 
>> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
>> description.
>>  --- 
>>  You received this message because you are subscribed to a topic in the 
>> Google Groups "sqlalchemy" group.
>>  To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/sqlalchemy/GyAZTThJi2I/unsubscribe.
>>  To unsubscribe from this group and all its topics, send an email to 
>> sqlalchemy+unsubscr...@googlegroups.com.
>>  To post to this group, send email to sqlalchemy@googlegroups.com.
>>  Visit this group at https://groups.google.com/group/sqlalchemy.
>>  For more options, visit https://groups.google.com/d/optout.
> 

> --
>  SQLAlchemy - 
>  The Python SQL Toolkit and Object Relational Mapper
> 
> http://www.sqlalchemy.org/
> 
>  To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
>  --- 
>  You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
>  To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
>  To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/CAJYnotsAJ39FuiVYS_UD6338AEktOZQmw1ChuQK2HOXPkMCwLw%40mail.gmail.com
>  
> <https://groups.google.com/d/msgid/sqlalchemy/CAJYnotsAJ39FuiVYS_UD6338AEktOZQmw1ChuQK2HOXPkMCwLw%40mail.gmail.com?utm_medium=email&utm_source=footer>.
> 
> 
> *Attachments:*
>  * BatchInsertsBlogPost.pdf

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/8fc1d56e-dc34-473b-8452-4359af640a6d%40www.fastmail.com.

Reply via email to