Hi Poul and all,

Nils Goroll wrote:
>>> What I would really like to see is that the waitinglist gets rescheduled
>>> when the busy object actually becomes available in the cache. I am
>>> suspecting this has to do with calling HSH_Deref(&Parent) in HSH_Unbusy
>>> and/or the fact that HSH_Drop calls both Unbusy and Deref, but I don't
>>> understand this yet.
>> That is how it is supposed to work, and I believe, how it works.
>
> Good. Then I am either messing up this behavior with my config, or I've hit a
> corner case. I need to have a break now, but I will definitely get back to
> you on this when I have gained new insights.

I'm trying to sort my thoughts on this in public:

- A fundamental issue seems to be that the waitinglist is attached to the
  object head, and if no proper match is found in the cache, we wait for
  whatever is to come, even if this is not what we are going to need.

  On the other hand, while the object is busy, not all selection criteria
  will be known a priori (in particular not the Vary header), so this design
  might just be as good as it can be.

- The only way a session can get onto the waiting list is when there is a
  busy object being waited for - but hsh_rush is not only called when an
  object gets unbusied (HSH_Unbusy), but also whenever it is dereferenced
  (HSH_Deref).

  Call trees are:

    cnt_fetch -> HSH_Unbusy->hsh_rush
                  ^    |
                 /     |
        HSH_Drop    (parent)
                 \     |
                  V    V
                 HSH_Deref->hsh_rush

  HSH_Deref is called from cache_expire EXP_NukeOne and exp_timer, as well
  as cache_center cnt_hit (if not delivering), cnt_lookup (if it's a pass)
  and cnt_deliver.

  HSH_Drop is called from various functions in cache_center.

So basically there are two different scenarios when hsh_rush is called:

  * Trigger delivery of an object which just got unbusied
  * and trigger delivery of more sessions which did not fire in the first
    round

The point is that when many sessions are waiting on a busy object, there are
many reasons for those to be rescheduled even if the object they are waiting
for has not yet become available - in particular as many different objects may
live under the object head.

I think we need to change that.

The only reason why we need to call hsh_rush outside the
cnt_fetch->HSH_Unbusy case is that we have the rush_exponent and limit the
number of sessions to be rescheduled with each hsh_rush, so one option would
be to do away with the rush_exponent and let the waiters loose all at once.

This would also solve the case where, once a session gets its thread, the
cached content has become invalidated so it would itself fetch again.

I am not sure about an alternative solution.

When we unbusy an object, we have a good chance that it's actually worth
rescheduling waiting sessions, but for the other cases, we can't easily tell
if the session would wait again or not.

What if we noted in the object head the number of busy objects so hsh_rush
would only actually schedule sessions if there aren't any or when called from
cnt_fetch?

Any better ideas?

Thank you for reading,

Nils
_______________________________________________
varnish-dev mailing list
varnish-dev@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-dev