Can you elaborate more on the C-level programming? What are the bottlenecks that could be redone in C? Maybe I can help there.
On Friday 11 July 2008 18:26:59 Michael Bayer wrote:
> On Jul 11, 2008, at 5:19 AM, Henk wrote:
> > What we see here is that of the total of 5.965 seconds, the test
> > spent only about 0.585 seconds performing the actual SQL queries,
> > or about 10% of the total time. The other 90% of the time we are
> > spending on SA overhead (except for the times.py line which is
> > somewhere in the MySQLdb driver).
> >
> > If I look at the cumulative output of the profiler (omitted for
> > brevity) I see, for instance, that SA spends 1.3 seconds compiling
> > the SQL expressions (it apparently recompiles the same SQL
> > expression on each iteration).
> >
> > My question to the SA community would be how much ORM overhead do
> > you find acceptable? And how does SA compare in this regard to,
> > for instance, Java's Hibernate or other ORM solutions?
>
> As far as Hibernate, I work quite a bit with Hibernate and am
> currently creating an almost exact port of a Java/Hibernate
> application using Pylons and SQLAlchemy. My observation is that
> SQLAlchemy is probably a little slower on execution overhead, but
> is faster when fetching large numbers of rows. Your test is very
> much focused on single execution speed.
>
> The funny thing is, the SQLAlchemy version of the application
> overall is *much* faster than the Hibernate version, simply because
> SQLAlchemy knows how to issue a lot fewer queries to get something
> done. The main example is loading a list of A, each of which has a
> relation to B and then a relation to C - you need to display a
> slice of the "A" table (using LIMIT/OFFSET) as well as all the B's
> and C's.
>
> This is probably the most common operation one needs to do with an
> ORM, and Hibernate quite simply cannot do this efficiently - see
> http://www.hibernate.org/117.html#A12 :
>
> "It should be also obvious why resultset row-based "limit"
> operations, such as setFirstResult(5) and setMaxResults(10) do not
> work with these kind of eager fetch queries. If you limit the
> resultset to a certain number of rows, you cut off data randomly.
> One day Hibernate might be smart enough to know that if you call
> setFirstResult() or setMaxResults() it should not use a join, but a
> second SQL SELECT. Try it, your version of Hibernate might already
> be smart enough. If not, write two queries, one for limiting stuff,
> the other for eager fetching."
>
> As it turns out, SQLAlchemy was "smart enough" to do this correctly
> from day one (though I'm not currently aware of any other ORM that
> does this). When querying for rows with a limit/offset in
> conjunction with eager joins, SQLA wraps the limited results in a
> subquery, and applies the eager joins to that subquery. The
> efficiency of the query is much greater than what you'd get with
> Hibernate, since the outer joins are against only a small slice of
> the total rows, and of course you get the correct results back, and
> there's no need to fall back to a second per-child-object query,
> which I've found nearly impossible to prevent Hibernate from doing
> in many cases. SQLA is very good at not "dropping back" into lazy
> loading when eager loads are requested. Because of behavior like
> this, most Hibernate applications speed up significantly when
> converted to SQLAlchemy, since you'll almost certainly see at least
> a few areas where Hibernate issued a few hundred queries and SQLA
> issues only one.
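For concreteness, here is a minimal sketch of the paging pattern described above, written against a 0.4/0.5-era API (declarative_base(), relation(), and eagerload(), the latter later renamed joinedload()); the A/B/C model itself is made up for illustration:

    # A sketch only: the A/B/C model is hypothetical, and eagerload() /
    # relation() are the 0.4/0.5-era names.
    from sqlalchemy import create_engine, Column, Integer, ForeignKey
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import relation, sessionmaker, eagerload

    Base = declarative_base()

    class B(Base):
        __tablename__ = 'b'
        id = Column(Integer, primary_key=True)
        a_id = Column(Integer, ForeignKey('a.id'))

    class C(Base):
        __tablename__ = 'c'
        id = Column(Integer, primary_key=True)

    class A(Base):
        __tablename__ = 'a'
        id = Column(Integer, primary_key=True)
        c_id = Column(Integer, ForeignKey('c.id'))
        bs = relation(B)   # one-to-many: each A has many B's
        c = relation(C)    # many-to-one: each A refers to one C

    engine = create_engine('sqlite://', echo=True)
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    # One round trip: the limited/offset A rows go into a subquery, and
    # the eager OUTER JOINs to B and C are applied against that subquery,
    # so the LIMIT/OFFSET counts A rows, not joined rows.
    page = (session.query(A)
                   .options(eagerload('bs'), eagerload('c'))
                   .order_by(A.id)
                   .limit(10)
                   .offset(20)
                   .all())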
> SQLA's very dynamic generation of queries is also a reason we don't
> rely upon caching of the "compiled" form of a SQL query by default -
> the complexity and memory overhead of doing so would negate any
> speed improvements. While statement compilation adds some overhead,
> it's not really the primary overhead during normal operation;
> fetching rows and converting them to object instances, along with
> the associated bookkeeping, is where most time gets spent. In your
> own application, you can cache the "compiled" form of a SQL query
> using statement.compile() and use it in conjunction with an ORM
> query (via instances()), if there's a particular critical section
> you're trying to optimize; but I think you'll find the aggregate
> time savings (versus just issuing one less query) are marginal.
>
> If you're still skeptical of the overhead of statement compilation,
> see one very high capacity website which uses it regularly,
> http://www.reddit.com . In their recently released source code, you
> can see at
> http://code.reddit.com/browser/r2/r2/lib/db/tdb_sql.py#L458 that
> they are using a SQLAlchemy "select()" construct, built fresh on
> each request and compiled, at the base of their SQL database
> library. A few hundred function calls here are not the issue; it's
> the fact that *any* SQL at all is issued which places a burden on
> the application. So Reddit scales the way any website has to scale:
> by adding processing power horizontally both at the database level
> and at the webserver level, and by applying a second level caching
> solution (more on this below).
>
> > I would love to help out trying to reduce this overhead, but I
> > have no idea what the SA devs have in mind for the future of SA
> > with regard to speed (e.g. speed vs. feature completeness). In
> > the tutorials and documentation there is currently a strong
> > emphasis on use cases highlighting the very dynamic nature of
> > building complex queries using SA's ORM tools. Not so much on how
> > to perform simple (and static) queries fast using the ORM.
> > Maybe there is a way to memoize the results of the compile step
> > so that this does not need to be redone all the time while the
> > Query remains the same?
>
> Well, in your test, you are clearing the Session on each iteration.
> If you weren't doing that, then the object you loaded would be
> pulled straight from the Session each time, so in that case the
> "memoization" is already complete. Similarly, a Query issued for
> any number of rows runs much faster when the resulting objects are
> already present in the Session, since they do not need to be
> re-created or populated again.
>
> For a more general form of "memoization" of queries and their
> results, what you're looking for is a second level cache. Hibernate
> supplies a second level caching solution but SQLAlchemy currently
> does not, although simple ones are fairly straightforward to create
> based on a product such as Memcached, or even just well-maintained
> dicts. Any database-driven application which needs very high
> performing display of data requires a second level cache of some
> kind, whether or not an ORM is used; while SQLA does not have a
> built-in solution to the second level caching problem, it also
> didn't create the issue.
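A dict-based second level cache of the kind mentioned above can be as small as the following sketch; the key scheme and helper names are made up, and it reuses the hypothetical A model and session from the earlier sketch (a real deployment would likely put Memcached behind the same two functions):

    # A sketch only: cache keys, helper names, and the A model are
    # illustrative assumptions, not part of SQLAlchemy itself.
    _cache = {}   # maps cache key -> cached result

    def cached_a_ids_for_c(session, c_id):
        """Return the ids of the A rows that refer to the given C,
        consulting the cache before touching the database."""
        key = ('a_ids_by_c', c_id)
        try:
            return _cache[key]
        except KeyError:
            # Cache miss: run the query once and remember the result.
            ids = [a.id for a in session.query(A).filter(A.c_id == c_id)]
            _cache[key] = ids
            return ids

    def invalidate_a_ids_for_c(c_id):
        """Call whenever the set of A rows for a given C changes."""
        _cache.pop(('a_ids_by_c', c_id), None)

Caching plain ids (or other detached values) rather than live mapped instances keeps the cached data independent of any particular Session.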
> Overall, the issue of performance within SQLAlchemy has been
> addressed exhaustively and continues to be on a daily basis; at
> this point, the path to further speedups lies along two avenues.
> One is that we constantly review the code and find small
> incremental gains to be made; in particular we've squeezed out
> 10-20% more speed in SQLA 0.5 versus 0.4. We have performance tests
> in our test suite which run through a full battery of SQL and ORM
> operations, ensuring that function call counts fall within an
> expected range (see
> http://www.sqlalchemy.org/trac/browser/sqlalchemy/trunk/test/profiling/zoomark_orm.py
> for example).
>
> The other avenue in which we are very interested is applying native
> extensions to SQLA at key performance areas. Two that we have in
> mind are result set processing and attribute instrumentation. This
> is a longer road to travel (in particular since we are in need of a
> C developer with some time to kill) but hopefully we'll get a
> resource for this. There is another ORM called Storm which is in
> the process of creating C extensions for their own product (the
> development of which is fully funded by Storm's parent company
> Canonical, a luxury SQLAlchemy unfortunately does not have), so
> that's something else you might want to check out. A comparison of
> SQLAlchemy's and Storm's speeds (with the pre-C-extension version
> of Storm) is at http://techspot.zzzeek.org/?p=17 , where you'll see
> that Storm is generally a little faster than SQLA but SQLA loads
> large numbers of rows more quickly (note that 0.5 is even faster
> than the 0.4 tested in that post). I'm very satisfied with those
> results since SQLA is philosophically more feature rich than Storm,
> yet the speeds of both products (sans any native extensions) are
> not very different.
>
> In general, as long as speed is "decent", our users don't have any
> issues with it. For this reason, while the C-based extension route
> would be helpful, it's not a super-burning issue either. Any web
> application, no matter what platform it's running on, is scaled up
> to higher capacity through second level caching and via the
> addition of more processing support horizontally. There are plenty
> of applications running SQLAlchemy, SQLObject, Storm, etc. at this
> point, all of which introduce significant overhead versus raw SQL,
> and in general they don't seem to be falling over.
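To illustrate the "call counts fall within an expected range" style of test mentioned above, here is a rough sketch; it is not the actual zoomark test, and the query, the call budget, and the reuse of the hypothetical session and A model from the earlier sketches are made up:

    # A sketch only: the 5000-call budget and the query are invented to
    # show the approach of bounding Python-level work with a profiler.
    import cProfile
    import pstats

    def call_count(fn):
        """Run fn under cProfile and return the total function call count."""
        prof = cProfile.Profile()
        prof.enable()
        try:
            fn()
        finally:
            prof.disable()
        return pstats.Stats(prof).total_calls

    def test_simple_select_call_count():
        calls = call_count(lambda: session.query(A).limit(10).all())
        # Fail loudly if the ORM suddenly does much more Python-level work.
        assert calls < 5000, "expected fewer than 5000 calls, got %d" % calls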