Great, cheers. Let's update SQLAlchemy to 0.5.4 for the performance enhancements.

On Mon, May 18, 2009 at 9:20 AM, Michael Bayer <mike...@zzzcomputing.com> wrote:
>
> Hello list -
>
> SQLAlchemy 0.5.4 is released, and this release is *highly* recommended
> for all users. For an indication of how high, let's just say, higher
> than 0.5.3, 0.5.2, and 0.5.1 combined.
>
> Not to worry, there are no security holes or memory leaks in previous
> versions. But we have neutralized some major, major speed bumps in
> the flush() process, as well as made significant improvements to
> memory usage. Due to the removal of these bugs, large dataset
> scenarios that were more or less impossible to work with for any
> version of SQLAlchemy now run at top speed with the same rate of
> performance regardless of how large the Session grows. Other
> performance issues that were proportional to the number of
> interconnected mapped classes, memory/speed issues related to the
> number of relation()s set up on mappers during a load, or just
> wasteful overhead during the flush have been mitigated. The
> improvements are of a magnitude such that some applications that had
> abandoned the ORM due to latency related to large sets of objects may
> be able to come back to it and regain all its advantages.
>
> The key to all these improvements is that I finally have a job using
> SQLAlchemy full time where I've gotten the opportunity to use the ORM
> with somewhat large amounts of data. None of these issues were very
> deep and just required that I spend some more time with the profiler
> and bigger datasets. My own use case here is a 6500 row spreadsheet
> of interconnected objects, representing about 25K rows - the process
> of ingesting that data, which requires that all of the objects
> stay present in the session, has gone from 33 minutes to 8. The
> key is that the number of method calls to flush X number of objects is
> now the same for a session regardless of how many other non-dirty
> items are present.
> Similarly, a mapping setup that has 30 mappers
> configured will not be slowed down by unnecessary traversal of all the
> mapper relations.
>
> The Session itself, which for some time now has been "weak
> referencing" with regard to its contents, has been repaired such that
> the weak referencing behavior is now fully operational. Previously,
> objects which were related via mutual backrefs would not get cleared
> from the session when all external references were lost until you
> expunged them. That is no longer necessary - the Session now has
> no strong references whatsoever to its contents, as long as no changes
> are pending on those objects. Pending changes as always are strongly
> referenced until flushed. So now you can iterate through as many
> tens of thousands of objects as you like (keeping in mind an
> individual Query still loads each individual result fully in unless
> yield_per is enabled) and there's no need to expunge the session in
> between chunks.
>
> The loading of objects has also been sped up and reduced in memory
> overhead by killing a wasteful structure of callables that was
> generated on a per-relation()/per-object basis whenever
> query.options() was used.
>
> In other news I've backported a convenient extension from the 0.6
> series which allows you to create custom SQL expression elements with
> compiler functions. This is the "compiler" extension and is
> described in the documentation.
>
> Download SQLAlchemy 0.5.4 (right now!! get rid of whatever buggy old
> version you're using) at:
>
> http://www.sqlalchemy.org/download.html
>
>
> 0.5.4
> =====
>
> - orm
>     - Significant performance enhancements regarding Sessions/flush()
>       in conjunction with large mapper graphs, large numbers of
>       objects:
>
>       - Removed all* O(N) scanning behavior from the flush() process,
>         i.e.
>         operations that were scanning the full session,
>         including an extremely expensive one that was erroneously
>         assuming primary key values were changing when this
>         was not the case.
>
>         * one edge case remains which may invoke a full scan,
>           if an existing primary key attribute is modified
>           to a new value.
>
>       - The Session's "weak referencing" behavior is now *full* -
>         no strong references whatsoever are made to a mapped object
>         or related items/collections in its __dict__. Backrefs and
>         other cycles in objects no longer affect the Session's ability
>         to lose all references to unmodified objects. Objects with
>         pending changes still are maintained strongly until flush.
>         [ticket:1398]
>
>         The implementation also improves performance by moving
>         the "resurrection" process of garbage collected items
>         to only be relevant for mappings that map "mutable"
>         attributes (i.e. PickleType, composite attrs). This removes
>         overhead from the gc process and simplifies internal
>         behavior.
>
>         If a "mutable" attribute change is the sole change on an object
>         which is then dereferenced, the mapper will not have access to
>         other attribute state when the UPDATE is issued. This may
>         present itself differently to some MapperExtensions.
>
>         The change also affects the internal attribute API, but not
>         the AttributeExtension interface nor any of the publicly
>         documented attribute functions.
>
>       - The unit of work no longer generates a graph of "dependency"
>         processors for the full graph of mappers during flush(),
>         instead creating such processors only for those mappers which
>         represent objects with pending changes. This saves a
>         tremendous number of method calls in the context of a large
>         interconnected graph of mappers.
>
>       - Cached a wasteful "table sort" operation that previously
>         occurred multiple times per flush, also removing significant
>         method call count from flush().
>
>       - Other redundant behaviors have been simplified in
>         mapper._save_obj().
>
>     - Modified query_cls on DynamicAttributeImpl to accept a full
>       mixin version of the AppenderQuery, which allows subclassing
>       the AppenderMixin.
>
>     - The "polymorphic discriminator" column may be part of a
>       primary key, and it will be populated with the correct
>       discriminator value. [ticket:1300]
>
>     - Fixed the evaluator not being able to evaluate IS NULL clauses.
>
>     - Fixed the "set collection" function on "dynamic" relations to
>       initiate events correctly. Previously a collection could only
>       be assigned to a pending parent instance, otherwise modified
>       events would not be fired correctly. Set collection is now
>       compatible with merge(), fixes [ticket:1352].
>
>     - Allowed pickling of PropertyOption objects constructed with
>       instrumented descriptors; previously, pickle errors would occur
>       when pickling an object which was loaded with a descriptor-based
>       option, such as query.options(eagerload(MyClass.foo)).
>
>     - Lazy loader will not use get() if the "lazy load" SQL clause
>       matches the clause used by get(), but contains some parameters
>       hardcoded. Previously the lazy strategy would fail with the
>       get(). Ideally get() would be used with the hardcoded
>       parameters but this would require further development.
>       [ticket:1357]
>
>     - MapperOptions and other state associated with query.options()
>       is no longer bundled within callables associated with each
>       lazy/deferred-loading attribute during a load.
>       The options are now associated with the instance's
>       state object just once when it's populated. This removes
>       the need in most cases for per-instance/attribute loader
>       objects, improving load speed and memory overhead for
>       individual instances. [ticket:1391]
>
>     - Fixed another location where autoflush was interfering
>       with session.merge(). autoflush is disabled completely
>       for the duration of merge() now. [ticket:1360]
>
>     - Fixed bug which prevented "mutable primary key" dependency
>       logic from functioning properly on a one-to-one
>       relation().
>       [ticket:1406]
>
>     - Fixed bug in relation(), introduced in 0.5.3,
>       whereby a self referential relation
>       from a base class to a joined-table subclass would
>       not configure correctly.
>
>     - Fixed obscure mapper compilation issue when inheriting
>       mappers are used which would result in un-initialized
>       attributes.
>
>     - Fixed documentation for session weak_identity_map -
>       the default value is True, indicating a weak
>       referencing map in use.
>
>     - Fixed a unit of work issue whereby the foreign
>       key attribute on an item contained within a collection
>       owned by an object being deleted would not be set to
>       None if the relation() was self-referential. [ticket:1376]
>
>     - Fixed Query.update() and Query.delete() failures with eagerloaded
>       relations. [ticket:1378]
>
>     - It is now an error to specify both columns of a binary
>       primaryjoin condition in the foreign_keys or remote_side
>       collection. Previously this was just nonsensical, but would
>       succeed in a non-deterministic way.
>
> - schema
>     - Added a quote_schema() method to the IdentifierPreparer class
>       so that dialects can override how schemas get handled. This
>       enables the MSSQL dialect to treat schemas as multipart
>       identifiers, such as 'database.owner'. [ticket: 594, 1341]
>
> - sql
>     - Back-ported the "compiler" extension from SQLA 0.6. This
>       is a standardized interface which allows the creation of custom
>       ClauseElement subclasses and compilers. In particular it's
>       handy as an alternative to text() when you'd like to
>       build a construct that has database-specific compilations.
>       See the extension docs for details.
>
>     - Exception messages are truncated when the list of bound
>       parameters is larger than 10, preventing enormous
>       multi-page exceptions from filling up screens and logfiles
>       for large executemany() statements.
>       [ticket:1413]
>
>     - ``sqlalchemy.extract()`` is now dialect sensitive and can
>       extract components of timestamps idiomatically across the
>       supported databases, including SQLite.
>
>     - Fixed __repr__() and other _get_colspec() methods on
>       ForeignKey constructed from __clause_element__() style
>       construct (i.e. declarative columns). [ticket:1353]
>
> - mysql
>     - Reflecting a FOREIGN KEY construct will take into account
>       a dotted schema.tablename combination, if the foreign key
>       references a table in a remote schema. [ticket:1405]
>
> - mssql
>     - Modified how savepoint logic works to prevent it from
>       stepping on non-savepoint oriented routines. Savepoint
>       support is still very experimental.
>
>     - Added reserved words for MSSQL covering version 2008
>       and all prior versions. [ticket:1310]
>
>     - Corrected problem with information schema not working with a
>       binary collation based database. Cleaned up information schema
>       since it is only used by mssql now. [ticket:1343]
>
> - sqlite
>     - Corrected the SLBoolean type so that it properly treats only 1
>       as True. [ticket:1402]
>
>     - Corrected the float type so that it correctly maps to a
>       SLFloat type when being reflected. [ticket:1273]
>
> - extensions
>
>     - Fixed adding of deferred or other column properties to a
>       declarative class. [ticket:1379]
>
>     - Added "compiler" extension from 0.6
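
For anyone skimming the notes, the two Session changes described in the announcement (unmodified objects are only weakly referenced, and flush work depends only on the objects with pending changes) can be pictured with a toy model. This is a rough sketch using only the Python standard library and is not how SQLAlchemy implements it; `Record`, `ToySession`, and `demo` are made-up names for illustration.

```python
import gc
import weakref


class Record:
    """Stand-in for a mapped object (hypothetical, not a SQLAlchemy class)."""

    def __init__(self, pk, value):
        self.pk = pk
        self.value = value


class ToySession:
    """Toy model only: weak identity map plus a strong 'dirty' collection."""

    def __init__(self):
        # Weak references: an entry vanishes once nothing else holds the object.
        self._identity_map = weakref.WeakValueDictionary()
        # Strong references, kept only for objects with pending changes.
        self._dirty = {}

    def add(self, obj):
        self._identity_map[obj.pk] = obj

    def set_value(self, obj, value):
        obj.value = value
        self._dirty[obj.pk] = obj  # pinned in memory until flush()

    def flush(self):
        # Work is proportional to the number of dirty objects,
        # not to the total size of the identity map.
        flushed = sorted(self._dirty)
        self._dirty.clear()
        return flushed

    def __len__(self):
        return len(self._identity_map)


def demo():
    session = ToySession()
    objects = [Record(pk, "v%d" % pk) for pk in range(1000)]
    for obj in objects:
        session.add(obj)
    session.set_value(objects[7], "changed")

    # Drop every external reference; only the dirty object stays alive.
    del obj, objects
    gc.collect()
    size_before_flush = len(session)

    flushed = session.flush()
    gc.collect()
    size_after_flush = len(session)
    return size_before_flush, flushed, size_after_flush
```

Calling `demo()` shows the idea: once the caller lets go of the 1000 objects, the toy session retains only the one with a pending change, and it releases that one too after the flush. No expunging is involved, which mirrors the behaviour the announcement describes for iterating over large result sets.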