Thank you for the excellent and comprehensive answer! I didn't realize exactly how much work the session object does.
As per your advice, I have taken steps to reduce the size of my session object, and things run much more quickly now.

On Wed, Apr 8, 2009 at 12:50 PM, Michael Bayer <mike...@zzzcomputing.com> wrote:
>
> Gregg Lind wrote:
>>
>> Good morning!
>>
>> I hope I'm not tipping any sacred cows here, but for simple SqlA
>> objects, it seems to be a heckuva lot faster to just insert into the
>> table directly (using the SqlA expression language) than to insert the
>> objects via session.flush(). In the attached tests, I'm observing a
>> 10x-20x speedup. I'm still new to SqlA, so I was hoping the list
>> here would be kind enough to verify my observation.
>
> verified
>
>> My questions:
>>
>> 1. If so, why? I assume it's because session.flush() does separate
>> insert statements (as verified when echo=True is on).
>
> session.flush() performs a topological sort of all dirty/pending/deleted
> objects based on foreign key dependencies between tables as well as
> between rows; checks all modified attributes and collections (the
> collections part sometimes requires a load of the collection, unless
> certain options are set) for a net change in value; issues
> INSERT/UPDATE/DELETE statements in an exact order based on dependencies;
> and tailors individual INSERT and UPDATE statements based on the values
> which are present in memory vs. server-side defaults (for inserts) or the
> values which have a net change (for updates). It then issues all of these
> statements individually (by necessity, since they all have different
> argument lists and sometimes inter-row dependencies; you also cannot fetch
> the "last inserted id" from an executemany()), which from a DBAPI point of
> view is slower in any case, since you are calling execute() many times.
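[Not part of the original thread — a minimal sketch of the executemany() point above. The table and column names are illustrative; it uses an in-memory SQLite database and the Core expression language, so one execute() call with a list of identically-structured parameter dicts becomes a single prepared statement run N times by the DBAPI.]

```python
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

engine = create_engine("sqlite://")
metadata = MetaData()
things = Table(
    "things", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)
metadata.create_all(engine)

# Pre-built list of identically-structured parameter dicts: this is what
# lets the DBAPI prepare the statement once and run it for every row
# (executemany), instead of one execute() round-trip per row.
rows = [{"name": "thing-%d" % i} for i in range(1000)]

with engine.begin() as conn:
    conn.execute(things.insert(), rows)

with engine.connect() as conn:
    fetched = conn.execute(things.select()).fetchall()
print(len(fetched))  # 1000
```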
> Newly inserted rows often require extra statements to fetch newly
> generated primary keys, which are then distributed to all the
> foreign-key-holding attributes which require them (and which are then
> potentially inserted or updated in subsequent statements). After all SQL
> is emitted, it then refreshes the bookkeeping status on all entities which
> were changed, and expires attributes whose values were generated within
> the DB but don't need to be fetched until needed.
>
> OTOH an executemany() call receives a pre-made list of parameters for any
> number of bind parameter sets; the DBAPI then prepares a single statement
> and runs it N times, usually within C code, and you can modify or insert
> tens of thousands of rows in a few seconds (the trick is that you've
> generated this huge dict of data beforehand, and that your parameters are
> all of identical structure).
>
>> 2. In test 3, is this a reasonable way to "convert" from "session"
>> to direct table insert? Is there a simpler way than the Thing.to_dict
>> method I hacked together?
>
> For simple table mappings, it's easy enough to deal with your rows as
> dicts and use execute() to change things. For more complexity, with
> relations to other tables in various ways, it becomes less trivial. There
> are always tradeoffs to be navigated according to your specific needs.
>
>> 3. Are these valid tests? I don't want to have all the embarrassment
>> of some others who have 'slammed' SqlA without a proper grounding.
>> I'm no expert, and I want to make sure what I have is something
>> approximating idiomatic SqlA. I tried to be generous about what to
>> include in the timed section of each test. I do have autoflush off,
>> and I'm using SQLite (in memory), which might affect things.
>
> I didn't look closely, but the general observation of "expressions are
> faster than ORM" is valid. Your orders of magnitude might be off.
>
>> 4. If there is a faster way to flush out a session, I'm all ears!
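[Not part of the original thread — a hypothetical sketch of the "rows as dicts" conversion discussed in question 2. The `Thing` class and `to_dict` helper here are illustrative, not the ones from the attached tests: each mapped column's value is pulled off the object so the dicts can be fed to one executemany()-style insert.]

```python
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

engine = create_engine("sqlite://")
metadata = MetaData()
thing_table = Table(
    "thing", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)
metadata.create_all(engine)

class Thing(object):
    def __init__(self, name):
        self.name = name

def to_dict(obj, table):
    # Only include attributes that are actually set, so server-side
    # defaults (like the autoincrementing primary key) still apply.
    return {
        c.name: getattr(obj, c.name)
        for c in table.columns
        if getattr(obj, c.name, None) is not None
    }

objs = [Thing("t%d" % i) for i in range(5)]
with engine.begin() as conn:
    conn.execute(thing_table.insert(), [to_dict(o, thing_table) for o in objs])

with engine.connect() as conn:
    result = conn.execute(thing_table.select().order_by(thing_table.c.id))
    names = [row.name for row in result]
print(names)  # ['t0', 't1', 't2', 't3', 't4']
```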
>> I understand the Big Win (tm) of the ORM is programmer simplicity and
>> power, but if I can get that without major hits to performance, I'd
>> like to be able to Be Greedy (tm) and have it all.
>
> Try keeping the size of the session small, and look into options like
> "passive_deletes" and "passive_updates", which prevent rows from being
> loaded in order to accommodate cascades that can be established in the
> database directly. In any case, flushing tens of thousands of objects is
> unlikely to be performant.
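[Not part of the original thread — a hedged sketch of the passive_deletes suggestion. The `Parent`/`Child` names are illustrative. With ON DELETE CASCADE declared on the foreign key, passive_deletes=True tells the ORM not to load the child rows just to delete them; the database performs the cascade itself. Note that SQLite only enforces foreign keys when the pragma below is enabled.]

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, event
try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # older versions
from sqlalchemy.orm import Session, relationship

Base = declarative_base()

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    # passive_deletes=True: don't load the children in order to delete
    # them; rely on the database-side ON DELETE CASCADE instead.
    children = relationship(
        "Child", cascade="all, delete", passive_deletes=True
    )

class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(
        Integer, ForeignKey("parent.id", ondelete="CASCADE"), nullable=False
    )

engine = create_engine("sqlite://")

# SQLite only honors foreign keys (and their cascades) with this pragma on.
@event.listens_for(engine, "connect")
def _enable_fks(dbapi_conn, record):
    dbapi_conn.execute("PRAGMA foreign_keys=ON")

Base.metadata.create_all(engine)

session = Session(engine)
session.add(Parent(children=[Child(), Child()]))
session.commit()

parent = session.query(Parent).first()
session.delete(parent)  # children removed by the database, not loaded
session.commit()

remaining = session.query(Child).count()
print(remaining)  # 0
```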