Re: [sqlalchemy] ORM performance

Michael Bayer Wed, 13 Apr 2011 18:07:36 -0700

On Apr 13, 2011, at 8:48 PM, Yang Zhang wrote:

> We've been finding the SA ORM to be a large component in our
> application execution time.  Our application makes frequent queries,
> and most of them simply query whole rows from a single table with some
> filters, but no joins (therefore, very basic queries).  We've found
> that an alarming amount of CPU time is being spent on three parts:
> 
> 1. Query compilation.  Even though the compiled_cache execution option
> exists, the way Query objects are commonly constructed makes compiled
> statements non-cacheable.  For example, the statements created often
> have the argument values baked in, and in any case, compiled_cache is
> keyed by instance-equality statements, so unless we pass in the exact
> same statement, we have to compile again.  We hacked around this by
> manually managing statement instances for some subset of our most
> common queries so that they can be looked up by compiled_cache.  This
> resulted in ~2x improvement.


that seems very strange - 2x improvement in....the overall speed of your 
application?  I've done an enormous amount of profiling - SQL compilation is 
miniscule compared to the SQL statement's execution itself and the fetching of 
rows.   To work around the "instance arguments being baked in", create the 
query like this:

query.filter(SomeClass.somerecord ==bindparam("somerecord"))

The params are then added using query.params(somerecord=x).

> 2. Instance mapping.  For these same relatively simple queries that
> don't need fancy mapping but are run over tables with a lot of
> columns, it takes a long time to create the ORM instances from rows.

that is very true.  You should query individual columns instead:

session.query(MyObject.id, MyObject.name)

you'll get named tuples back that will work very well for simple displays.   
Also you shouldn't be querying over large sets of columns even when you get 
objects back - use deferred() and/or exclude_properties/include_properties with 
your mappings. 

> 
> 3. ResultMetaData creation.  Since our tables have a lot of columns, a
> significant amount of time is actually spent in creating processors
> that read data off the cursor.  

query for individual columns as above, use deferred() in your mappings so that 
all those unneeded columns don't come back by default, don't map columns you 
don't need at all, using include_properties/exclude_properties.  Also you 
should be using the C extensions for those processors.   Also if you can 
upgrade to 0.7, the result processors are now cached globally per type/dialect.

> This work is sadly not reused in
> between queries.  

should be fixed in 0.7 - sad becomes happy !

> We hacked around this by building our own dialect
> execution_ctx_cls that caches ResultMetaData._keymap/keys for certain
> queries.  This resulted in ~3x performance improvement after applying
> 1 and 2.

you must have hundreds of columns in your tables for such a change to cause a 
3x improvement.   The large number of columns seems like the key factor here 
and you should use one of the techniques above to cut down on that.


> 
> All this feels like a lot of work, and we can't help but feel that we
> might be doing something wrong or missing something obvious.  We're
> wondering if there are any tips or anything built in to SA that could
> help improve the performance situation.

If your use case is really about querying individual rows for display, not 
dealing with relationships, related objects or collections, you will of course 
get vastly better performance using the C extensions with plain select() 
constructs, receiving ResultProxy objects.  Like the named tuple returned by 
query(X, Y, Z), the rows present in ResultProxy are also named tuples.    A 
view that does something simple like "row.x, row.y, row.z" should be able to 
receive results from query(MyObject), query(MyObject.x, MyObject.y, 
MyObject.z), and Session.execute(select([mytable.c.x, mytable.c.y, 
mytable.c.z])) equally.

For a quick introduction to the performance improvements in 0.7, see:

http://techspot.zzzeek.org/2010/12/12/a-tale-of-three-profiles/


-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] ORM performance

Reply via email to