Re: [sqlalchemy] Reducing instrumentation overhead for read-only entities

Andrey Popp Sun, 19 Feb 2012 02:25:14 -0800

On Sat, Feb 18, 2012 at 10:48:30PM -0500, Michael Bayer wrote:
> On Feb 18, 2012, at 8:05 PM, Andrey Popp wrote:
> > On Sat, Feb 18, 2012 at 07:57:14PM -0500, Michael Bayer wrote:
> > > I don't know that customizing instrumentationmanager is all you'd need.
> > > There's lots of bookkeeping occurring with instancestate that takes time
> > > and is related to state tracking and persistence.  The whole need for
> > > __setstate__ is due to the instancestate object.   Not including it
> > > suggests an entirely new system that at most would consume rows from the
> > > orm.Query and route them into this alternate system.   
> > 
> > Well, I didn't want to implement entirely new system, but just to separate
> > part of ORM which manages persistence from part of ORM which just map
> > database columns to instance attrs.
>     
> As it is now, you can get named tuples out of Query that should be
> serializable and don't use InstanceState.   Rows from an execute() call also
> behave like named tuples and are serializable.  So I assume here you're
> looking for objects that also have collections and related instances,
> including lazy and eager loading of those, minus the usage of an
> InstanceState which you're saying is too expensive to deserialize, or a
> simplified InstanceState that is somehow not quite as expensive to
> deserialize, though really you'd be talking about shaving off maybe 10-20% of
> function calls if it were only a simplification of InstanceState.


Yeah, named tuples are not enough for me: my read-only models have methods
attached to them, and I also use relationships.
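For illustration, here is a minimal stdlib-only sketch of that difference. It does not use SQLAlchemy; `sqlite3` stands in for the database, and the `users` table, the `User` class and the `load_user` helper are hypothetical. A named-tuple row is a bare record, while a plain class whose column values sit directly in `__dict__` can carry model methods without any per-instance state-tracking object.

```python
import sqlite3
from collections import namedtuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, first TEXT, last TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'Lovelace')")
columns = ("id", "first", "last")
values = conn.execute("SELECT id, first, last FROM users").fetchone()

# Named-tuple style result: cheap, but just a record with no behavior.
UserRow = namedtuple("UserRow", columns)
row = UserRow(*values)


class User:
    """Hypothetical read-only model: plain attributes plus domain methods."""

    def full_name(self):
        return f"{self.first} {self.last}"


def load_user(values):
    # Hypothetical loader: skip __init__ and write the column values straight
    # into the instance __dict__; no state-tracking object is attached.
    obj = User.__new__(User)
    obj.__dict__.update(zip(columns, values))
    return obj


user = load_user(values)
print(row.first)                   # Ada
print(user.full_name())            # Ada Lovelace
print(hasattr(row, "full_name"))   # False
```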

> It would be a major new feature add requiring architectural changes, tests,
> and most importantly a clear documentation story that makes the rationale for
> this alternate mode very clear and makes it totally unambiguous when this
> mode might be used - else the entire project is diluted by echoes of "too many
> ways to do it", "too confusing", "too complicated", etc.   It's complicating
> the internals and API for a use case that may very well be completely
> obsolete in a year or two due to Pypy and other performance techniques.   The
> actual performance savings may be marginal in any case.   

Separating persistence from mapping seems like it would be a good design
decision to me (unless I misunderstood something regarding SQLAlchemy usage
patterns).

Regarding the rationale, let me describe where I think this feature would be useful:

  * Some of your tables are read-only, so you only query data from them.

    SQLAlchemy doesn't have to track changes to objects of classes mapped to
    these tables. According to this Stack Overflow post [1], we can get a "not
    so negligible" performance gain here.

  * We can map the same table to the same class using different mappers (one in
    "read-only mode" and the other in "persistence mode").

    That way we can use the "read-only mapper" to query data when we don't want
    to change it, and the "persistence mapper" otherwise. I believe this will
    also improve the correctness of the application itself.

While the latter use case might be a bit "advanced", the first one is fairly
common, especially in apps with a strong separation into frontend and backend
parts (where the backend updates the read-only tables and might be implemented
completely differently from the frontend application).

So the documentation story is: "if you only read from a table, you can use a
read-only mapper to map that table to a class; this improves performance and
protects you from accidental data corruption in that table".
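A minimal stdlib-only sketch of the "one class, two modes" idea, assuming a hypothetical `load()` helper that plays the role of the two mappers. It is not SQLAlchemy's API and not how its mappers work internally. The same `Article` class is populated either in a "tracked" mode, which records which attributes were changed (a crude stand-in for the bookkeeping a persistence mapper does), or in a "readonly" mode, which rejects writes outright.

```python
class Article:
    """Hypothetical model class shared by both loading modes."""

    def summary(self):
        return self.title.upper()

    def __setattr__(self, name, value):
        # Read-only instances refuse writes; tracked instances record them.
        mode = self.__dict__.get("_mode")
        if mode == "readonly":
            raise AttributeError(
                f"{type(self).__name__} was loaded read-only; cannot set {name!r}"
            )
        if mode == "tracked":
            self.__dict__["_dirty"].add(name)
        object.__setattr__(self, name, value)


def load(row, mode):
    # Hypothetical loader: fill __dict__ directly, bypassing __setattr__.
    obj = Article.__new__(Article)
    obj.__dict__.update(row)
    obj.__dict__["_mode"] = mode
    if mode == "tracked":
        obj.__dict__["_dirty"] = set()  # only tracked objects pay for this
    return obj


row = {"id": 7, "title": "hello"}
ro = load(row, "readonly")
rw = load(row, "tracked")

rw.title = "changed"
print(rw._dirty)      # {'title'}
try:
    ro.title = "oops"
except AttributeError as exc:
    print(exc)        # Article was loaded read-only; cannot set 'title'
print(ro.summary())   # HELLO
```

Reads on either kind of instance are plain `__dict__` lookups; the extra cost only shows up on writes, which the read-only mode forbids anyway.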

I will try to hack on this and see if I can get something working. My goals are:

  * disable state tracking

  * do not install descriptors for columns; store values directly in
    __dict__

  * continue to use relationships as before
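A rough stdlib-only sketch of those three goals together, assuming `sqlite3` in place of a real engine; `load_all`, `Author` and `Book` are hypothetical names, and this is not how SQLAlchemy's loaders or instrumentation work. Column values go straight into `__dict__` with no tracking object, and a one-to-many relationship is loaded lazily with `functools.cached_property` (Python 3.8+).

```python
import sqlite3
from functools import cached_property

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada');
    INSERT INTO books VALUES (10, 1, 'Notes'), (11, 1, 'Sketches');
""")


def load_all(cls, sql, *params):
    """Build instances without __init__, column descriptors or state objects."""
    cur = conn.execute(sql, params)
    names = [d[0] for d in cur.description]
    out = []
    for values in cur:
        obj = cls.__new__(cls)
        obj.__dict__.update(zip(names, values))  # plain __dict__ storage
        out.append(obj)
    return out


class Book:
    pass


class Author:
    @cached_property
    def books(self):
        # Lazily loaded one-to-many relationship: the first access runs the
        # query, and cached_property stores the list in the instance __dict__.
        return load_all(
            Book, "SELECT * FROM books WHERE author_id = ? ORDER BY id", self.id
        )


author = load_all(Author, "SELECT * FROM authors")[0]
print("books" in vars(author))           # False: not loaded yet
print([b.title for b in author.books])   # ['Notes', 'Sketches']
print("books" in vars(author))           # True: cached after first access
```

Only the relationship attribute gets a descriptor here, and since `cached_property` is a non-data descriptor, later lookups of `books` hit the instance `__dict__` directly.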

In the worst case I hope I'll just get a better understanding of SQLAlchemy
internals :-).

Thank you!

[1]: http://stackoverflow.com/questions/2322437/sqlalchemy-optimizations-for-read-only-object-models/2323267#2323267
