Thank you for the excellent and comprehensive answer! I didn't realize exactly how much work the session object does.
As per your advice, I have taken steps to reduce the size of my session object, and things run much more quickly now.

On Wed, Apr 8, 2009 at 12:50 PM, Michael Bayer <mike...@zzzcomputing.com> wrote:
>
> Gregg Lind wrote:
>>
>> Good morning!
>>
>> I hope I'm not tipping any sacred cows here, but for simple SqlA
>> objects, it seems to be a heckuva lot faster to just insert into the
>> table directly (using the SqlA expression language) than to insert the
>> objects via session.flush(). In the attached tests, I'm observing a
>> 10x-20x speedup. I'm still new to SqlA, so I was hoping the list
>> here would be kind enough to verify my observation.
>
> verified
>
>> My questions:
>>
>> 1. If so, why? I assume it's because session.flush() does separate
>> insert statements (as verified when echo=True is on).
>
> session.flush() performs a topological sort of all dirty/pending/deleted
> objects based on foreign key dependencies between tables as well as
> between rows; checks all modified attributes and collections (the
> collections part sometimes requires a load of the collection, unless
> certain options are set) for a net change in value; issues
> INSERT/UPDATE/DELETE statements in an exact order based on dependencies;
> and tailors individual INSERT and UPDATE statements based on the values
> which are present in memory vs. server-side defaults (for inserts) or the
> values which have a net change (for updates). It then issues all of these
> statements individually (by necessity, since they all have different
> argument lists and sometimes inter-row dependencies; you also cannot fetch
> the "last inserted id" from an executemany()), which from a DBAPI point of
> view is slower in any case, since you are calling execute() many times.
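[Not part of the original thread — a minimal sketch of the executemany() point above. The table and column names are illustrative; it uses an in-memory SQLite database and the Core expression language, so one execute() call with a list of identically-structured parameter dicts becomes a single prepared statement run N times by the DBAPI.]

```python
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

engine = create_engine("sqlite://")
metadata = MetaData()
things = Table(
    "things", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)
metadata.create_all(engine)

# Pre-built list of identically-structured parameter dicts: this is what
# lets the DBAPI prepare the statement once and run it for every row
# (executemany), instead of one execute() round-trip per row.
rows = [{"name": "thing-%d" % i} for i in range(1000)]

with engine.begin() as conn:
    conn.execute(things.insert(), rows)

with engine.connect() as conn:
    fetched = conn.execute(things.select()).fetchall()
print(len(fetched))  # 1000
```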
> Newly inserted rows often require extra statements to fetch newly
> generated primary keys, which are then distributed to all the
> foreign-key-holding attributes which require them (and which are then
> potentially inserted or updated in subsequent statements). After all SQL
> is emitted, it then refreshes the bookkeeping status on all entities which
> were changed, and expires attributes whose values were generated within
> the DB but don't need to be fetched until needed.
>
> OTOH an executemany() call receives a pre-made list of parameters for any
> number of bind parameter sets; the DBAPI then prepares a single statement
> and runs it N times, usually within C code, and you can modify or insert
> tens of thousands of rows in a few seconds (the trick is that you've
> generated this huge dict of data beforehand, and that your parameters are
> all of identical structure).
>
>> 2. In test 3, is this a reasonable way to "convert" from "session"
>> to direct table insert? Is there a simpler way than the Thing.to_dict
>> method I hacked together?
>
> For simple table mappings, it's easy enough to deal with your rows as
> dicts and use execute() to change things. For more complexity, with
> relations to other tables in various ways, it becomes less trivial. There
> are always tradeoffs to be navigated according to your specific needs.
>
>> 3. Are these valid tests? I don't want to have all the embarrassment
>> of some others who have 'slammed' SqlA without a proper grounding.
>> I'm no expert, and I want to make sure what I have is something
>> approximating idiomatic SqlA. I tried to be generous about what to
>> include in the timed section of each test. I do have autoflush off,
>> and I'm using SQLite (in memory), which might affect things.
>
> I didn't look closely, but the general observation of "expressions are
> faster than ORM" is valid. Your orders of magnitude might be off.
>
>> 4. If there is a faster way to flush out a session, I'm all ears!
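[Not part of the original thread — a hypothetical sketch of the "rows as dicts" conversion discussed in question 2. The `Thing` class and `to_dict` helper here are illustrative, not the ones from the attached tests: each mapped column's value is pulled off the object so the dicts can be fed to one executemany()-style insert.]

```python
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

engine = create_engine("sqlite://")
metadata = MetaData()
thing_table = Table(
    "thing", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)
metadata.create_all(engine)

class Thing(object):
    def __init__(self, name):
        self.name = name

def to_dict(obj, table):
    # Only include attributes that are actually set, so server-side
    # defaults (like the autoincrementing primary key) still apply.
    return {
        c.name: getattr(obj, c.name)
        for c in table.columns
        if getattr(obj, c.name, None) is not None
    }

objs = [Thing("t%d" % i) for i in range(5)]
with engine.begin() as conn:
    conn.execute(thing_table.insert(), [to_dict(o, thing_table) for o in objs])

with engine.connect() as conn:
    result = conn.execute(thing_table.select().order_by(thing_table.c.id))
    names = [row.name for row in result]
print(names)  # ['t0', 't1', 't2', 't3', 't4']
```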
>> I understand the Big Win (tm) of the ORM is programmer simplicity and
>> power, but if I can get that without major hits to performance, I'd
>> like to be able to Be Greedy (tm) and have it all.
>
> Try keeping the size of the session small, and look into options like
> "passive_deletes" and "passive_updates", which prevent rows from being
> loaded in order to accommodate cascades that can be established in the
> database directly. In any case, flushing tens of thousands of objects is
> unlikely to be performant.
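[Not part of the original thread — a hedged sketch of the passive_deletes suggestion. The `Parent`/`Child` names are illustrative. With ON DELETE CASCADE declared on the foreign key, passive_deletes=True tells the ORM not to load the child rows just to delete them; the database performs the cascade itself. Note that SQLite only enforces foreign keys when the pragma below is enabled.]

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, event
try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # older versions
from sqlalchemy.orm import Session, relationship

Base = declarative_base()

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    # passive_deletes=True: don't load the children in order to delete
    # them; rely on the database-side ON DELETE CASCADE instead.
    children = relationship(
        "Child", cascade="all, delete", passive_deletes=True
    )

class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(
        Integer, ForeignKey("parent.id", ondelete="CASCADE"), nullable=False
    )

engine = create_engine("sqlite://")

# SQLite only honors foreign keys (and their cascades) with this pragma on.
@event.listens_for(engine, "connect")
def _enable_fks(dbapi_conn, record):
    dbapi_conn.execute("PRAGMA foreign_keys=ON")

Base.metadata.create_all(engine)

session = Session(engine)
session.add(Parent(children=[Child(), Child()]))
session.commit()

parent = session.query(Parent).first()
session.delete(parent)  # children removed by the database, not loaded
session.commit()

remaining = session.query(Child).count()
print(remaining)  # 0
```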