Hi,
I've got a problem that takes a bit of explaining, but it's relatively
simple when you get down to it. This is another django-related thing,
but the issue itself is pure python.
I made a custom class, called an EventEngine, which represents a span
of time. You initialize it with a queryset/list of django model
instances, and the resulting EventEngine object has an 'events'
attribute, which is a queryset/list of model instances that fall
within that span of time, according to a datetime attribute on the
django model.
Next come EventEngine subclasses, representing commonly used spans of
time – Month, Week, Day, etc. The EventEngine __iter__ method is
overridden so that each of these subclasses, when iterated over,
produces child instances of the next smallest EventEngine subclass.
Iterating on a Month produces Weeks, iterating on a Week produces Days
and so on.
As each parent produces its children, the events in the 'events'
attribute get partitioned out to the children as appropriate to their
date ranges. Right now I'm doing this as stupidly as possible: for
each child 'c' in the parent, I loop over all the parent events, and
append them to c.events if their date attribute falls within the
child's date range:
for e in self.events:
if c.start <= getattr(e,self.start_attr) < c.stop:
c.events.append(e)
This is stupid for (at least) two reasons: 1) it evaluates the entire
queryset and sticks it in memory with the first child instance,
obviating the whole point of an iterable, and 2) it loops over the
entire event list for every child, even though the list is already
properly sorted by date, and we could break from the loop with the
first event that falls outside the child's date range. Performance is
important, as some madman could initialize a Year with a huge
queryset, and do nested iterations all the way down to Minute.
So I've got a few considerations:
1. The parent's events queryset should only be evaluated as much as is
needed to yield one child at a time.
2. The parent's events queryset should not be duplicated as it is
distributed among the children, ie it should make references and not
copies (I don't know how to make sure this happens).
3. I don't want to empty the parent's 'events' list as it is
distributed among the children, as other methods depend on the list
being there.
4. The events attribute of the 'top level' instance is a django
queryset, but it will be a list for all descendent instances (they can
be nested arbitrarily deeply). This limits the number of functions
available – ie you can't pop() a django queryset.
At first I thought the bisect module was the way to go, but it is too
tightly tied to integer list indices, and works very awkwardly when
bisecting on datetime attributes. List slices produce copies, unless
I'm mistaken. Something tells me that a plain old while loop with a
break might do the trick, but I can't quite see it clear. Ideally this
would be something that starts iterating where the date attribute
falls into the right range, and then stops iterating as soon as it
passes out of that range. Probably I'll have to settle for just one or
the other, but one can dream.
Thanks for reading all this. I'm hoping that all the restrictions and
conditions will make the solution more obvious, rather than less
obvious, to those who know more than I...
--Eric
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor