2012/5/3 Attilio Rao <atti...@freebsd.org>: > 2012/5/3 Konstantin Belousov <kostik...@gmail.com>: >> On Thu, May 03, 2012 at 10:06:53PM +0100, Attilio Rao wrote: >>> 2012/5/3 Konstantin Belousov <kostik...@gmail.com>: >>> > On Thu, May 03, 2012 at 02:14:20PM +0100, Attilio Rao wrote: >>> >> 2012/5/3, Konstantin Belousov <kostik...@gmail.com>: >>> >> > On Thu, May 03, 2012 at 12:02:08PM +0100, Attilio Rao wrote: >>> >> >> 2012/5/3, Konstantin Belousov <k...@freebsd.org>: >>> >> >> > Author: kib >>> >> >> > Date: Thu May 3 10:38:02 2012 >>> >> >> > New Revision: 234952 >>> >> >> > URL: http://svn.freebsd.org/changeset/base/234952 >>> >> >> > >>> >> >> > Log: >>> >> >> > When callout_reset_on() cannot immediately migrate a callout >>> >> >> > since it >>> >> >> > is running on other cpu, the CALLOUT_PENDING flag is temporarily >>> >> >> > cleared. Then, callout_stop() on this, in fact active, callout >>> >> >> > fails >>> >> >> > because CALLOUT_PENDING is not set, and callout_stop() returns 0. >>> >> >> > >>> >> >> > Now, in sleepq_check_timeout(), the failed callout_stop() causes >>> >> >> > the >>> >> >> > sleepq code to execute mi_switch() without even setting the wmesg, >>> >> >> > since the switch-out is supposed to be transient. In fact, the >>> >> >> > thread >>> >> >> > is put off the CPU for full timeout interval, instead of being >>> >> >> > put on >>> >> >> > runq immediately. Until timeout fires, the process is unkillable >>> >> >> > for >>> >> >> > obvious reasons. >>> >> >> > >>> >> >> > Fix this by marking the migrating callouts with >>> >> >> > CALLOUT_DFRMIGRATION >>> >> >> > flag. The flag is cleared by callout_stop_safe() when the function >>> >> >> > detects a migration, besides returning the success. The >>> >> >> > softclock() >>> >> >> > rechecks the flag for migrating callout and cancels its execution >>> >> >> > if >>> >> >> > the flag was cleared meantime. >>> >> >> >>> >> >> Can you please clarify why you cannot simply drop the deferred >>> >> >> migration in the case !CALLOUT_PENDING in callout_stop_safe()? >>> >> > >>> >> > I probably can, I think I went with the route of committed patch >>> >> > because it is slightly less work. Also, the comment in the while() >>> >> > loop suggested me to rely on softclock. >>> >> >>> >> I don't think this is more work at all, the attached patch >>> >> (pre-r234952, untested) should address it properly in few than 10 >>> >> lines: >>> >> http://www.freebsd.org/~attilio/callout_cancel_mig_stop.patch >>> >> >>> >> without the need to add further flags and re-using existing mechanisms. >>> > >>> > (cc->cc_curr != c) is not the case which caused the issue. It might be >>> > needed to treatened this way, but the reported case is opposite. >>> >>> Yes, of course, because the migration handover happens in the same >>> critical context of cc->cc_curr == c, but now I wonder if this fix is >>> really right. >>> >>> It seems to me that in the case you describe callout_stop() must >>> return 0 and the migration must not be cancelled because the callout >>> is not stopped. It is not stopped not because of the deferred >>> migration but because cc->cc_curr == c. It seems a perfectly valid >>> situation to me. >> Yes, and my patch makes the callout to be indeed stopped right after >> migration is finished. Did you looked at the patch itself ? >> >> What is the valid situation ? callout_stop returning 0 but not stopping >> a pending callout ? I have to disagree. > > The function callout_stop() cancels a callout if it is currently pending. > If the callout is pending, then callout_stop() will return a non-zero > value. If the callout is not set, has already been serviced or is cur‐ > rently being serviced, then zero will be returned. If the callout has an > associated mutex, then that mutex must be held when this function is > called. > > [ From the callout manpage ] > > If the "callout is currently being serviced" means cc->cc_curr == c > and it must return 0.
Elaborating some more, I see a discrepancy here in the callout interface, which is also present pre-your patch and pre-migration delay. Basically, immagine a callout rearmed during its callback (pretty typical) and a callout_stop() running just after the callout has been rearmed and it is *still* in the callback. What we find is that CALLOUT_PENDING is on and that cc_curr == cc. I don't think that the callout should stop successfully in this case. However, because of how _callout_stop_safe() is written, CALLOUT_PENDING check has precedence and wins, returning 1 and removing the CALLOUT_PENDING flag, but please note that the callback is still running (even if only for little time). I think this generally works ok because most of the callout callbacks rearm the callout as last thing in their operation. But I think this is highly fragile and we cannot really rely on this feature. You are seeing a problem in the deferred migration case because it does the other way around, it prefers the check over cc_curr == cc to the "pending" (callout migration in this case). I think this is only a problem, also, with callouts which don't have a lock associated with them, like the sleepqueue case, because otherwise the interlock would leave the state consistent. I think we need to think carefully about a pattern for this case that deals with all the races, I need to give this more thinking, but definitively it seems to me we need a patch at the callout policy level. Attilio -- Peace can only be achieved by understanding - A. Einstein _______________________________________________ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"