Thanks Michael, I really appreciate the response. This broke something of a 
mental log-jam I was having.

The original use I had in mind for this was for my web framework. I was 
using it to tag "Session" objects for caching and then attach user specific 
data and frequently accessed data to the session to minimize database 
requests for common data across all a users requests. The goal was to have 
the caching be fairly transparent to the application so that I wouldn’t 
have to track invalidations manually. This object cache pattern has been in 
the framework for a while but I haven't had lots of time to really make it 
work properly (kind of a side project).

I'd looked at using the cascade but I was fixated (for some reason) on 
checking only the child items with changes (session.dirty/new/deleted) in 
the before_flush and after_commit hooks. I was looking for some way to 
annotate children with relationships with the top level parent container so 
that on a child modification I could operate on the container (A.B.C, C is 
modified by someone, mark A for invalidation) or to somehow walk the 
relationship backwards from child to parent. That last bit was where I got 
stuck and abandoned this approach. I had some implementations where I was 
trying to pass parent references down the chain but it got complicated when 
multiple ‘containers’ were referencing the same child and I never got the 
passing to work properly. (Any tips on the proper event to listen for for 
attribute attach and append?)

Your comment gave me a forehead slap moment. I realized I could just check 
*all* of the objects tagged as cache container objects in the current 
session and cascade down from each and mark the container for write through 
or invalidation if any related objects were modified. The pattern I'm using 
only has a few of these 'container' cached objects per web request (like 
session above) so checking them all isn't crazy-painful. I did some quick 
timeit experiments and it looks like the overhead of these extra checks is 
pretty negligible and it will potentially save me a lot of complexity and 
db lookups in my current application.

Another optimization I'm thinking of is to actually record the related 
children object list on fetch from cache and then compare to the new 
related object list on flush. That way new loaded relationships would get 
written through to the cache to be re-used on subsequent requests as well. 
The current implementation can’t see when an object has been lazy-loaded 
(but not modified) on the container. Any thoughts on this approach?

Just for reference, here's a (slightly modified) version of how I'm doing 
the invalidation / modification check now. In my quick and dirty testing it 
seems to cover all of the stale data cases I was encountering before.

The only other pieces to make this work is a cache_key attribute and a 
__cached__ tag on objects that will be cached and act as containers. I then 
have a cache.get method that I use to request these special containers. The 
get function checks memcached first, merges on hit, and requests from the 
database on miss.


session = cache.get(Session, session_id="XXXXX")

The item gets written through to the cache on successful commit if it's new 
or modified (and now, has related children that are new or modified). 
Putting it in after_commit also guarantees that invalid data that's rolled 
back doesn't accidentally get written to the cache.


from sqlalchemy import event
from sqlalchemy.orm.attributes import instance_state

def is_modified(obj):
    '''Check if SQLAlchemy believes this instance is modified.'''
    return instance_state(obj).modified

def child_relationships(obj):
    '''Get instances with child relationships to this obj'''
    state = instance_state(obj)
    mapper = obj.__mapper__
    for related_obj, mapper, state, data in 
mapper.cascade_iterator("save-update",
                                                                        
 state):
       yield related_obj

# Caching / invalidation event handlers and functions.
def cached_objects(iter):
    '''Return objects from a session/iterator that are tagged for caching'''
    for obj in iter:
        # check if the sqlalchemy object is tagged for caching
        if hasattr(obj, '__cached__'):
            yield obj

def cached_objects_with_updates(iter):
    '''Return objects from a session/iterator that are tagged for caching 
and
    that need an update.'''
    for obj in iter:
        if getattr(obj, "__needs_update__", False):
            yield obj

def check_needs_update(session, flush_context, instances):
    '''Check for objects in the SQLAlchemy session that are modified and
    tag them for cache update.'''
    for obj in cached_objects(session):
        if obj in (session.new or session.dirty):
            obj.__needs_update__ = True
        for related_obj in child_relationships(obj):
            if (related_obj in (session.new or session.dirty or 
session.deleted)
                                                 and 
is_modified(related_obj)):
                obj.__needs_update__ = True

def update_cached_objects(session):
    '''Write out (or invalidate) cache entries for modified entitites'''
    for obj in cached_objects_with_updates(session):
        obj.__needs_update__ = False
        cache.write(obj.cache_key, obj)

event.listen(models.session, "before_flush", check_needs_update)
event.listen(models.session, "after_commit", update_cached_objects)

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/sqlalchemy/-/W2InlwgOpj0J.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Reply via email to