Re: [sqlalchemy] More efficient Insert mechanism

Mati Skiva Tue, 02 Feb 2010 08:11:22 -0800

Michael Bayer wrote:

Mati Skiva wrote:

Thank you for your feedback.


In the case I mentioned, I cannot just dump the 30K items, because I
need their generated IDs for other inserted rows (I have relations).
In more detail, I am inserting "resource" items and also mission items;
each mission connects to a resource via its id.
So dumping all the data without connecting each Python object to its id
is not an option.
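A minimal sketch of why the generated ids matter. It uses Python's built-in sqlite3 module rather than SQLAlchemy, and the `resource`/`mission` schema and the helper name `insert_resource_with_missions` are hypothetical. Each parent row needs its own INSERT so that its generated key can be read back (here via `cursor.lastrowid`) before the child rows can reference it:

```python
import sqlite3

# Hypothetical schema mirroring the resource/mission case: every mission
# row points at a resource row through resource_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resource (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE mission (
        id INTEGER PRIMARY KEY,
        resource_id INTEGER NOT NULL REFERENCES resource(id),
        title TEXT NOT NULL
    );
""")

def insert_resource_with_missions(conn, name, mission_titles):
    """Insert one resource, read back its generated id, then link missions to it."""
    cur = conn.execute("INSERT INTO resource (name) VALUES (?)", (name,))
    resource_id = cur.lastrowid  # generated key, known only after this INSERT runs
    # The children all share the same parent id, so they can go in one batch.
    conn.executemany(
        "INSERT INTO mission (resource_id, title) VALUES (?, ?)",
        [(resource_id, title) for title in mission_titles],
    )
    return resource_id

rid = insert_resource_with_missions(conn, "crane", ["lift", "lower"])
print(conn.execute("SELECT resource_id, title FROM mission ORDER BY id").fetchall())
# [(1, 'lift'), (1, 'lower')]
```

With 30K resources this means 30K single-row round trips for the parents, which is the cost this thread is about.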


About the strategy for resolving the matter: I need to make some
assumptions, such as table locking.
Obviously this is not a good general approach; however, in many cases
bulk-insert-and-acquire-generated-ids functionality is desired. If the
functionality requires special configuration and error handling, then
we can guarantee that developers are not surprised by failures.
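A sketch of the "lock the table and assume sequential ids" strategy, again using stdlib sqlite3 with a hypothetical `resource` table and helper name. It rests on two assumptions, also stated in the code: the exclusive lock keeps other writers out, and SQLite gives each new INTEGER PRIMARY KEY row the current maximum plus one. Other databases (sequences, identity caches, auto-increment steps) give no such guarantee.

```python
import sqlite3

def bulk_insert_with_guessed_ids(conn, names):
    """Insert many rows with one executemany() and infer their ids.

    Assumptions (hypothetical for this sketch): nobody else can insert into
    `resource` while the exclusive lock is held, and SQLite assigns
    max(id) + 1 to each new row (its usual rule for an INTEGER PRIMARY KEY
    without AUTOINCREMENT).
    """
    conn.execute("BEGIN EXCLUSIVE")  # take the lock before reading max(id)
    try:
        (start,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM resource").fetchone()
        conn.executemany("INSERT INTO resource (name) VALUES (?)", [(n,) for n in names])
        ids = list(range(start + 1, start + 1 + len(names)))
        # Cheap sanity check: the last guessed id must be the new maximum.
        (end,) = conn.execute("SELECT MAX(id) FROM resource").fetchone()
        if names and end != ids[-1]:
            raise RuntimeError("ids were not sequential; rolling back")
        conn.execute("COMMIT")
        return ids
    except Exception:
        conn.execute("ROLLBACK")
        raise

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("CREATE TABLE resource (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO resource (name) VALUES ('existing')")
ids = bulk_insert_with_guessed_ids(conn, ["a", "b", "c"])
print(ids)  # [2, 3, 4]
```

The final check only confirms that the last id lines up; it is a cheap guard, not a proof that every guessed id is correct.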

So, I would like to implement it. Hopefully with your guidance.

As I see it, I need to perform the following steps:
* group insert actions by destination table (and order them without
  breaking dependencies); this will allow me to perform bulk operations,
  as I am working on a list of items rather than one item at a time
* perform configuration- and environment-dependent SQL operations, or
  fall back to the original insert operation
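The first step, grouping by table while keeping parent tables ahead of child tables, could look roughly like this. The `(table, params)` tuples and the `depends_on` mapping are hypothetical stand-ins for whatever the ORM tracks internally; `graphlib` requires Python 3.9 or later.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+

def batches_in_dependency_order(pending, depends_on):
    """Group pending (table, params) pairs by table, parent tables first.

    pending:    list of (table_name, params_dict), in the order objects were added
    depends_on: table_name -> set of tables it has foreign keys to
    """
    by_table = defaultdict(list)
    for table, params in pending:
        by_table[table].append(params)  # preserves add() order within a table
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    graph = {table: depends_on.get(table, set()) for table in by_table}
    order = TopologicalSorter(graph).static_order()
    return [(table, by_table[table]) for table in order if table in by_table]

pending = [
    ("mission", {"title": "lift"}),
    ("resource", {"name": "crane"}),
    ("mission", {"title": "lower"}),
    ("resource", {"name": "truck"}),
]
batches = batches_in_dependency_order(pending, {"mission": {"resource"}})
for table, rows in batches:
    print(table, rows)
# resource [{'name': 'crane'}, {'name': 'truck'}]
# mission [{'title': 'lift'}, {'title': 'lower'}]
```

The mission batch still cannot carry `resource_id` values until the resource batch has produced ids, and circular dependencies between tables would make `TopologicalSorter` raise `graphlib.CycleError`, so grouping alone does not solve the id problem.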

I need your help with the following:
* some description of how session.add results in insert commands
* a pointer to the code that deals with the inserts


I think if you familiarize yourself with the workings of the unit of work,
you'll see that inserts are already grouped about as much as they can be.

Your series of steps does not take into account the main issue I raised,
that the insert statements in the list are not all of the same structure, thus
making insertmany impossible regardless of primary key fetching unless
each statement were carefully grouped by which parameters or embedded SQL
expressions are present - a procedure that will usually just add needless
overhead, since executemany() can almost never be used except in this very
rare "lock the tables and assume sequential ids" scenario.
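The point about statement structure can be illustrated with a small grouping helper (a hypothetical name, using sqlite3-style `?` placeholders). It splits one table's rows by the set of columns they supply. It does not look at embedded SQL expressions, which would fragment the batches further.

```python
from collections import defaultdict

def executemany_batches(table, rows):
    """Split one table's rows into batches that share the same column set.

    executemany() takes a single SQL string, so rows that supply different
    columns (for example, some relying on a column default) cannot share a batch.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(sorted(row))].append(row)  # key: the columns present
    batches = []
    for cols, group in groups.items():
        # Identifiers are interpolated directly: trusted names only.
        sql = "INSERT INTO {} ({}) VALUES ({})".format(
            table, ", ".join(cols), ", ".join("?" for _ in cols))
        batches.append((sql, [tuple(r[c] for c in cols) for r in group]))
    return batches

rows = [{"name": "a"}, {"name": "b", "priority": 2}, {"name": "c"}]
batches = executemany_batches("resource", rows)
for sql, params in batches:
    print(sql, params)
# INSERT INTO resource (name) VALUES (?) [('a',), ('c',)]
# INSERT INTO resource (name, priority) VALUES (?, ?) [('b', 2)]
```

Regrouping also reorders the rows ('c' is now sent before 'b'), so any scheme that guesses ids from insertion order has to track which row landed in which batch.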

The biggest reason to keep exotic edge cases out of the core ORM is that
everything you want to do is already possible outside of the ORM.   You
can apply your "guess the generated IDs" scheme on top of an executemany
yourself.   I can show you the public API that would allow you to mark
your inserted objects as "persistent"/"clean" as well after the insert so
that it would look like a flush() just occurred.
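A sketch of applying the id guessing yourself on top of executemany(), outside the ORM, for the resource/mission case. Plain dataclasses stand in for mapped objects and all names are hypothetical; it assumes an exclusive lock and SQLite's usual max-plus-one rowid behaviour. Registering the objects with a Session afterwards (the "persistent"/"clean" step) is left out; recent SQLAlchemy releases provide `make_transient_to_detached()` for a related purpose, but whether it fits this case is not established here.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Mission:
    title: str
    id: Optional[int] = None
    resource_id: Optional[int] = None

@dataclass
class Resource:
    name: str
    id: Optional[int] = None
    missions: List[Mission] = field(default_factory=list)

def bulk_insert(conn, table, cols, objs):
    """One executemany() for objs, then write the guessed ids back onto them.

    Assumes the caller holds an exclusive lock and that the database assigns
    max(id) + 1, max(id) + 2, ... in insertion order (SQLite's usual rowid rule).
    """
    (start,) = conn.execute(f"SELECT COALESCE(MAX(id), 0) FROM {table}").fetchone()
    marks = ", ".join("?" for _ in cols)
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({marks})",
        [tuple(getattr(o, c) for c in cols) for o in objs],
    )
    for offset, obj in enumerate(objs, start=1):
        obj.id = start + offset  # guessed, not read back from the database

def save_all(conn, resources):
    conn.execute("BEGIN EXCLUSIVE")
    try:
        bulk_insert(conn, "resource", ["name"], resources)
        missions = []
        for r in resources:
            for m in r.missions:
                m.resource_id = r.id  # foreign key taken from the guessed parent id
                missions.append(m)
        bulk_insert(conn, "mission", ["resource_id", "title"], missions)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT
conn.executescript("""
    CREATE TABLE resource (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE mission (id INTEGER PRIMARY KEY,
                          resource_id INTEGER NOT NULL REFERENCES resource(id),
                          title TEXT NOT NULL);
""")
crane = Resource("crane", missions=[Mission("lift"), Mission("lower")])
truck = Resource("truck", missions=[Mission("haul")])
save_all(conn, [crane, truck])
rows = conn.execute(
    "SELECT m.title, r.name FROM mission m JOIN resource r ON r.id = m.resource_id"
    " ORDER BY m.id").fetchall()
print(rows)  # [('lift', 'crane'), ('lower', 'crane'), ('haul', 'truck')]
```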

Regards,
Mati

This mail was sent via Mobileye Mail-SeCure system.


--
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/sqlalchemy?hl=en.

I believe I took these issues into account.
But just to be sure, maybe you can clarify a few things for me.

I assume the following about the process of session.add:

* after session.add is called, the objects are placed in a to-do pool (maybe)
* for objects with a self-generating id, a special process is used, one that fetches the generated id after the insert
* for objects without a self-generating id, the regular insert process is used

I came to this conclusion because otherwise, after each insert, all the data of the row would have to be retrieved and placed inside the object, which would include the newly generated id.
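The assumptions above can be modelled with a toy session. This is not SQLAlchemy's actual unit of work; `ToySession`, its dict-based objects and the sqlite3 backend are all hypothetical. It keeps a pending pool, sends objects whose id is already set through one executemany(), and inserts objects that rely on a database-generated id one by one so that `lastrowid` can be copied back:

```python
import sqlite3

class ToySession:
    """Hypothetical 'to-do pool' modelling the assumptions above."""

    def __init__(self, conn, table, cols):
        self.conn, self.table, self.cols = conn, table, cols
        self.pending = []  # objects added but not yet sent to the database

    def add(self, obj):
        self.pending.append(obj)  # no SQL is issued here

    def flush(self):
        cols_sql = ", ".join(self.cols)  # trusted identifiers only
        marks = ", ".join("?" for _ in self.cols)
        with_id = [o for o in self.pending if o.get("id") is not None]
        without_id = [o for o in self.pending if o.get("id") is None]
        # Path 1: ids known up front, so a single executemany() is enough.
        if with_id:
            self.conn.executemany(
                f"INSERT INTO {self.table} (id, {cols_sql}) VALUES (?, {marks})",
                [(o["id"], *(o[c] for c in self.cols)) for o in with_id],
            )
        # Path 2: ids generated by the database, so insert row by row and
        # copy each generated key back onto its object.
        for o in without_id:
            cur = self.conn.execute(
                f"INSERT INTO {self.table} ({cols_sql}) VALUES ({marks})",
                tuple(o[c] for c in self.cols),
            )
            o["id"] = cur.lastrowid
        self.pending.clear()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resource (id INTEGER PRIMARY KEY, name TEXT)")
session = ToySession(conn, "resource", ["name"])
fixed, generated = {"id": 10, "name": "fixed"}, {"name": "generated"}
session.add(fixed)
session.add(generated)
session.flush()
print(generated["id"])  # 11: SQLite picked max(id) + 1
```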



I'll also try to express these ideas in other words (pseudocode).
The simplest flow is:

session.add(obj) -> add_object_to_pool(obj) .... -> for every obj in pool, create an insert and add it to the inserts list (by grouping) -> execute(inserts list)

The problem here is that no code handles the generated id unless execute does that. It can do that either by being aware of the self-generating-id column or simply by updating all the fields of the object.

If the knowledge is not within execute, it must be somewhere else, so another piece of code needs to be called after the insert. My assumption, then, is that either a pointer to a function is passed with the insert, or the execute call is invoked from a function that is aware of the self-generating-id column.
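The first option in the paragraph above (a function passed along with the insert) might look like the following sketch. `PendingInsert`, `execute_all` and `insert_for` are hypothetical names, and sqlite3 stands in for a real DBAPI connection. The executor stays generic, while each insert carries a callback that knows which attribute receives the generated key.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class PendingInsert:
    sql: str
    params: Tuple
    # Hypothetical hook run after execution; it receives the cursor so it
    # can read the generated key.
    after: Optional[Callable[[sqlite3.Cursor], None]] = None

def execute_all(conn, inserts):
    """A generic executor that knows nothing about primary keys."""
    for ins in inserts:
        cur = conn.execute(ins.sql, ins.params)
        if ins.after is not None:
            ins.after(cur)

class Resource:
    def __init__(self, name):
        self.name, self.id = name, None

def insert_for(obj):
    """Build the insert plus a callback that writes the new id onto obj."""
    def set_id(cur):
        obj.id = cur.lastrowid
    return PendingInsert("INSERT INTO resource (name) VALUES (?)", (obj.name,), set_id)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resource (id INTEGER PRIMARY KEY, name TEXT)")
objs = [Resource("a"), Resource("b")]
execute_all(conn, [insert_for(o) for o in objs])
print([o.id for o in objs])  # [1, 2]
```

Because each callback needs the cursor of its own statement, the executor has to run statements one at a time, which is why this design does not combine naturally with executemany().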



Please correct me, or point out what I am missing.

Thanks in advance,
Mati
