Thanks Michael, I really appreciate the response. This broke something of a mental log-jam I was having.
The original use I had in mind for this was for my web framework. I was using it to tag "Session" objects for caching and then attach user specific data and frequently accessed data to the session to minimize database requests for common data across all a users requests. The goal was to have the caching be fairly transparent to the application so that I wouldn’t have to track invalidations manually. This object cache pattern has been in the framework for a while but I haven't had lots of time to really make it work properly (kind of a side project). I'd looked at using the cascade but I was fixated (for some reason) on checking only the child items with changes (session.dirty/new/deleted) in the before_flush and after_commit hooks. I was looking for some way to annotate children with relationships with the top level parent container so that on a child modification I could operate on the container (A.B.C, C is modified by someone, mark A for invalidation) or to somehow walk the relationship backwards from child to parent. That last bit was where I got stuck and abandoned this approach. I had some implementations where I was trying to pass parent references down the chain but it got complicated when multiple ‘containers’ were referencing the same child and I never got the passing to work properly. (Any tips on the proper event to listen for for attribute attach and append?) Your comment gave me a forehead slap moment. I realized I could just check *all* of the objects tagged as cache container objects in the current session and cascade down from each and mark the container for write through or invalidation if any related objects were modified. The pattern I'm using only has a few of these 'container' cached objects per web request (like session above) so checking them all isn't crazy-painful. I did some quick timeit experiments and it looks like the overhead of these extra checks is pretty negligible and it will potentially save me a lot of complexity and db lookups in my current application. Another optimization I'm thinking of is to actually record the related children object list on fetch from cache and then compare to the new related object list on flush. That way new loaded relationships would get written through to the cache to be re-used on subsequent requests as well. The current implementation can’t see when an object has been lazy-loaded (but not modified) on the container. Any thoughts on this approach? Just for reference, here's a (slightly modified) version of how I'm doing the invalidation / modification check now. In my quick and dirty testing it seems to cover all of the stale data cases I was encountering before. The only other pieces to make this work is a cache_key attribute and a __cached__ tag on objects that will be cached and act as containers. I then have a cache.get method that I use to request these special containers. The get function checks memcached first, merges on hit, and requests from the database on miss. session = cache.get(Session, session_id="XXXXX") The item gets written through to the cache on successful commit if it's new or modified (and now, has related children that are new or modified). Putting it in after_commit also guarantees that invalid data that's rolled back doesn't accidentally get written to the cache. from sqlalchemy import event from sqlalchemy.orm.attributes import instance_state def is_modified(obj): '''Check if SQLAlchemy believes this instance is modified.''' return instance_state(obj).modified def child_relationships(obj): '''Get instances with child relationships to this obj''' state = instance_state(obj) mapper = obj.__mapper__ for related_obj, mapper, state, data in mapper.cascade_iterator("save-update", state): yield related_obj # Caching / invalidation event handlers and functions. def cached_objects(iter): '''Return objects from a session/iterator that are tagged for caching''' for obj in iter: # check if the sqlalchemy object is tagged for caching if hasattr(obj, '__cached__'): yield obj def cached_objects_with_updates(iter): '''Return objects from a session/iterator that are tagged for caching and that need an update.''' for obj in iter: if getattr(obj, "__needs_update__", False): yield obj def check_needs_update(session, flush_context, instances): '''Check for objects in the SQLAlchemy session that are modified and tag them for cache update.''' for obj in cached_objects(session): if obj in (session.new or session.dirty): obj.__needs_update__ = True for related_obj in child_relationships(obj): if (related_obj in (session.new or session.dirty or session.deleted) and is_modified(related_obj)): obj.__needs_update__ = True def update_cached_objects(session): '''Write out (or invalidate) cache entries for modified entitites''' for obj in cached_objects_with_updates(session): obj.__needs_update__ = False cache.write(obj.cache_key, obj) event.listen(models.session, "before_flush", check_needs_update) event.listen(models.session, "after_commit", update_cached_objects) -- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To view this discussion on the web visit https://groups.google.com/d/msg/sqlalchemy/-/W2InlwgOpj0J. To post to this group, send email to sqlalchemy@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.