Well, I finally had the time to try this out, and after a little
fiddling, I cut my time in half. I still think there must be further
optimization options (what David suggested probably won't work for our
purposes, but I may look into it): when I load only the first level
of data, which is 6K, it takes 0.15 seconds, but loading the complete
hierarchy, which is just two more 2K levels, takes 2 seconds (note that
this was a slightly different data set than the one I used before, where
it was taking 9 seconds).

But it's nice to actually see some results, and given that these
measurements are from my laptop, which is very slow at disk I/O,
I think that's probably all the improvement I need for the moment.

Thanks!
Alec

On Sep 5, 9:55 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
> On Sep 5, 2010, at 7:41 PM, Alec Munro wrote:
>
> > Hi list,
> > I've got a tree-type of data, representing somewhat of a method call
> > history. When this is loaded, in an instance where it represents about
> > 60 entries total, up to 2 deep, representing 14k of data, it takes
> > about 9 seconds (up from ~0). Now, my laptop HDD where I am testing
> > now is pretty terrible, but there's probably also some kind of
> > optimization I could do.
> > When I retrieve the data, I convert it to JSON, with a call like this
> > (where json_test_case_method is a recursive function I defined):
>
> > "methods":[json_test_case_method(method) for method in
> >                   test_case.methods]}
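A sketch of what a recursive serializer along the lines of the poster's json_test_case_method might look like. The Method stand-in class, its field names, and the sample data are assumptions for illustration, not the poster's actual code:

```python
from dataclasses import dataclass, field


@dataclass
class Method:
    """Stand-in for an ORM-mapped TestCaseMethod node (hypothetical fields)."""
    id: int
    name: str
    children: list = field(default_factory=list)


def json_test_case_method(method):
    # Serialize this node, then recurse into its children; each lazy-loaded
    # "children" access is where an un-optimized ORM emits another query.
    return {
        "id": method.id,
        "name": method.name,
        "methods": [json_test_case_method(m) for m in method.children],
    }


root = Method(1, "test_login", [Method(2, "test_login_bad_password"),
                                Method(3, "test_login_expired")])
```

Note that with a lazy-loading ORM, every recursive step that touches `.children` can trigger a fresh round trip, which is exactly the pattern the eager-loading advice below the thread addresses.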
>
> > The hierarchical relationship is defined thusly:
>
> > TestCaseMethod.children = relation(
> >     TestCaseMethod,
> >     cascade="all",
> >     backref=backref("parent", remote_side=[TestCaseMethod.id]),
> >     order_by=TestCaseMethod.id)
>
> > I've done zero SQLAlchemy optimization in my career thus far, and very
> > little ORM optimization (a bit with Hibernate several years ago), so
> > any advice is appreciated. :)
>
> You want to use "subqueryload()" or alternatively "joinedload()".   The 
> former is probably best as it will emit one query per level of hierarchy, for 
> the whole load, fetching only as many rows as there are nodes.   joinedload() 
> can load the whole thing in one query but it will see many duplicate parent 
> rows go by in result sets, which most DBAPIs pull over the wire 
> unconditionally even if skipped.
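The subqueryload() suggestion can be sketched against a minimal self-referential model. The table name, columns, and sample data below are assumptions based on the thread, and this uses the modern relationship()/declarative API rather than the era's relation():

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import (Session, backref, declarative_base,
                            relationship, subqueryload)

Base = declarative_base()


class TestCaseMethod(Base):
    __tablename__ = "test_case_method"  # table/column names are assumptions
    id = Column(Integer, primary_key=True)
    name = Column(String)
    parent_id = Column(Integer, ForeignKey("test_case_method.id"))
    # Adjacency-list pattern: remote_side marks the "parent" end of the
    # self-referential one-to-many, mirroring the thread's relation() setup.
    children = relationship(
        "TestCaseMethod",
        cascade="all",
        backref=backref("parent", remote_side=[id]),
        order_by=id,
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(TestCaseMethod(
        name="root",
        children=[TestCaseMethod(name="a"), TestCaseMethod(name="b")]))
    session.commit()

with Session(engine) as session:
    # subqueryload emits one extra SELECT per level of the hierarchy,
    # regardless of how many nodes that level contains.
    roots = (
        session.query(TestCaseMethod)
        .filter(TestCaseMethod.parent_id.is_(None))
        .options(subqueryload(TestCaseMethod.children))
        .all()
    )
    session.close()  # collections already loaded; no lazy loads fire later
```

Chaining another `.subqueryload(TestCaseMethod.children)` onto the option extends the eager load one level deeper, which fits the two-level tree described in the thread.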
>
> That said, even if you were emitting a distinct query for every node (which 
> would be the case if your tree were one root node 60 levels deep), 9 seconds 
> sounds very slow.  On my macbookpro, I just ran a test that emits 60 distinct 
> ORM queries for unloaded objects, each with a string field 1000 characters in 
> length, and it takes .03 seconds.   So you might want to poke around at your 
> network or some other logic that's taking way too much time in there 
> (profiling would be a good start - here's a big post I wrote about it 
> - http://stackoverflow.com/questions/1171166/how-can-i-profile-a-sqlalc...).
>
> > Thanks,
> > Alec
>
> > --
> > You received this message because you are subscribed to the Google Groups 
> > "sqlalchemy" group.
> > To post to this group, send email to sqlalch...@googlegroups.com.
> > To unsubscribe from this group, send email to 
> > sqlalchemy+unsubscr...@googlegroups.com.
> > For more options, visit this group 
> > at http://groups.google.com/group/sqlalchemy?hl=en.

