On Aug 4, 2011, at 9:22 AM, Michael Bayer wrote:

> 
> On Aug 3, 2011, at 8:38 PM, Mark Erbaugh wrote:
> 
>> I'm using SA (with SQLite) with a schema like:
>> 
>> A -< B -< C -< D
>> 
>> where -< means that the tables have a one to many relationship
>> 
>> I'm populating  a sample data set where there are 25 rows in A, 25 rows in B 
>> for each row in A, 25 rows in C for each row in B and 25 rows in D for each 
>> row in C.  This results in about 390k rows in D.  The database itself is 
>> only about 12 MB, but it takes a long time (several minutes) to write the 
>> data to the file.
>> 
>> I'm taking the approach of appending items to the table's relationship 
>> column.
>> 
>> for i in range(25):
>>      x = A()
>>      session.add(A)
>>              for j in range(25):
>>                      y = B()
>>                      x.b.append(y)
>>                      for k in range(25):
>>                              z = C()
>>                              y.c.append(z)
>>                              for  l in range(25):
>>                                      xx = D()
>>                                      z.d.append(xx)
>> session.flush()


Thanks again for the help.  I decided to time the various approaches.  My 
original approach took 4:23 (minutes: seconds). Note: all my times included 
data generation and insertion into a SQLite on-disk database.

> The biggest speed variable in a mass INSERT operation is whether or not 
> individual cursor.execute() calls, or a small handful of cursor.executemany() 
> calls each with thousands of rows, are used.
> 
> With the ORM, a mass executemany() is used for INSERT in the case that 
> primary key values are already present in the given objects.   If not, the 
> ORM has to INSERT each A() row one at a time, get the new primary key value, 
> and then later populate 25*25 B() object's foreign key with the A.id value; 
> then this goes down to the B()->C() area, etc.
> 
> So if A(), B(), C(), D() can be created with "id=5", "id=6", etc., assuming 
> "id" is the primary key, the ORM should be able to bunch lots of rows 
> together into one cursor.executemany() call and you'll see an immediate, 
> dramatic speedup.


This took 3:36

> 
> The next level would be if you populated the "a_id", "b_id", 
> foreign-key-to-parent columns directly instead of using append().    If you 
> did a profile on your script you'd see lots of time taken in many places, but 
> all those append() operations would be one of them, as well as lots of event 
> firing and bookkeeping that SQLAlchemy has to do when they occur, both at 
> append() time as well as within the flush() (populating the primary key 
> values to foreign key attributes).


This took 2:28

> By far the fastest way to do this would be to use 
> session.execute(a_table.insert(), [<rows>]), 
> session.execute(b_table.insert(), [<rows>]), etc.   That is, assemble the 
> whole set of A, B, C, D, directly in terms of the mapped table,  or better 
> yet do it in chunks, perhaps drill down through B, C, D for a single A then 
> insert everything, etc.    That way you optimize how these rows are 
> constructed in Python exactly to the pattern that corresponds directly to the 
> database structure, instead of having SQLAlchemy decode the database 
> structure from an object hierarchy.  An insertmany is documented at 
> http://www.sqlalchemy.org/docs/core/tutorial.html#executing-multiple-statements.
>    The ORM Session has an execute() method just like Connection does.

I did this with one session.execute for each table, rather than chunks. It took 
0:46

> The range of speedups here would be between 30% and 80%, with direct usage of 
> connection/session .execute() with Table metadata giving you the 80%.

Mark


-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Reply via email to