[sqlalchemy] Re: using Events to reduce memory footprint?

Victor Olex Mon, 07 Nov 2011 12:29:18 -0800

Thanks. I didn't know about yield_per. If I understand correctly,
"angles" 1 and 3 share the same weakness in that object identities may
be different between chunks. For situations where object state will
not be changed that is not terribly important. Also it seems that in
the first scenario as long as the containing object (User) remains in
session it will retain its correct identity while the collection of
contained objects (addresses) will be inconsistent between chunks.
Again, fair enough for chunked output. Now if I can generalize this to
n-level of relationship traversal we could be onto something good.


On Nov 7, 1:36 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
> On Nov 7, 2011, at 7:14 AM, Victor Olex wrote:
>
>
>
> > I am curious if Events (new feature as of 0.7) could be used to reduce
> > memory footprint in certain situations or is it better achieved with
> > other features such as lazy loading.
>
> > For example, consider a complex mapped object (containing at least one
> > related table). Using joinedload_all option, SQLAlchemy conveniently
> > produces complex object from the query but the entire object resides
> > in session for the duration of the processing, which could take a lot
> > of memory.
>
> > In situations where complex mapped objects can be output as soon as
> > they are retrieved from the database one could begin streaming the
> > output as soon as the containing object gets loaded. The full list of
> > contained objects also need not be kept in memory -- once a contained
> > object gets output it could be removed from the list.
>
> > I would appreciate your thoughts on this.
>
> Well there's a few angles to this so far.   The Query can be instructed to 
> "yield" rows in chunks as they come in using yield_per().   Though when using 
> joinedload() for collections, it's pretty much guaranteed to produce 
> incorrect results:
>
> Rows 1-3, user / address:
>
> userid=1,  name='user',  address_id=1, email='a1'
> userid=1,  name='user',  address_id=2, email='a2'
> userid=1,  name='user',  address_id=3, email='a3'
>
> now suppose it yields.  You get back a User() object, with an "addresses" 
> collection of three elements.
>
> Next iteration, there's two more addresses.   So the collection you just got, 
> and potentially operated upon, was wrong.
>
> userid=1,  name='user',  address_id=4, email='a4'
> userid=1,  name='user',  address_id=5, email='a5'
>
> So two more User rows - this means you'd get back either the same User object 
> or another one, depending on if the previous one is still hanging around, and 
> a different addresses collection - also blowing away any changes you might 
> have made to .addresses.   So the data consistency here is totally broken.  
> This is why yield_per() says it can't be used safely with joinedload().
>
> In the "subqueryload" use case, yield_per() is mostly useless, as all related 
> collections are loaded at once.   If Users 1, 2, and 3 are yielded, the 
> "addresses" collection will proceed to load all the addresses for all users 
> in the result, so you still spend time loading a record set at least as large 
> as the whole thing before getting the first batch.
>
> Next angle, use events.    The use case here would be, I don't really care 
> about getting the result, I'd like to just intercept objects as they come in. 
>  You can do that using a handful of different events, including 
> append_result, populate_instance, and probably most usefully 
> load():http://www.sqlalchemy.org/docs/orm/events.html#sqlalchemy.orm.events.....
>     If you then turn on yield_per() the results will be chunked and not 
> stored in memory, but you'd still need to pretty much not access any 
> collections (or at least, not rely on them remaining consistent and not 
> mutating them) since they will be inconsistently populated.
>
>
>
> > --
> > You received this message because you are subscribed to the Google Groups 
> > "sqlalchemy" group.
> > To post to this group, send email to sqlalchemy@googlegroups.com.
> > To unsubscribe from this group, send email to 
> > sqlalchemy+unsubscr...@googlegroups.com.
> > For more options, visit this group 
> > athttp://groups.google.com/group/sqlalchemy?hl=en.

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

[sqlalchemy] Re: using Events to reduce memory footprint?

Reply via email to