Hi Hans, On 26/08/15 10:12, Hans Petter Selasky wrote: > On 08/26/15 09:25, Hans Petter Selasky wrote: >> On 08/18/15 12:15, Julien Charbon wrote: >>> Author: jch >>> Date: Tue Aug 18 10:15:09 2015 >>> New Revision: 286880 >>> URL: https://svnweb.freebsd.org/changeset/base/286880 >>> >>> Log: >>> callout_stop() should return 0 (fail) when the callout is currently >>> being serviced and indeed unstoppable. >>> >>> A scenario to reproduce this case is: >>> >>> - the callout is being serviced and at same time, >>> - callout_reset() is called on this callout that sets >>> the CALLOUT_PENDING flag and at same time, >>> - callout_stop() is called on this callout and returns 1 (success) >>> even if the callout is indeed currently running and unstoppable. >>> >>> This issue was caught up while making r284245 (D2763) workaround, and >>> was discussed at BSDCan 2015. Once applied the r284245 workaround >>> is not needed anymore and will be reverted. >>> >>> Differential Revision: https://reviews.freebsd.org/D3078 >>> Reviewed by: jhb >>> Sponsored by: Verisign, Inc. >>> >>> Modified: >>> head/sys/kern/kern_timeout.c >>> >>> Modified: head/sys/kern/kern_timeout.c >>> ============================================================================== >>> >>> >>> --- head/sys/kern/kern_timeout.c Tue Aug 18 10:07:03 2015 >>> (r286879) >>> +++ head/sys/kern/kern_timeout.c Tue Aug 18 10:15:09 2015 >>> (r286880) >>> @@ -1150,7 +1150,7 @@ _callout_stop_safe(struct callout *c, in >>> struct callout_cpu *cc, *old_cc; >>> struct lock_class *class; >>> int direct, sq_locked, use_lock; >>> - int not_on_a_list; >>> + int not_on_a_list, not_running; >>> >>> if (safe) >>> WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, c->c_lock, >>> @@ -1378,8 +1378,15 @@ again: >>> } >>> } >>> callout_cc_del(c, cc); >>> + >>> + /* >>> + * If we are asked to stop a callout which is currently in progress >>> + * and indeed impossible to stop then return 0. >>> + */ >>> + not_running = !(cc_exec_curr(cc, direct) == c); >>> + >>> CC_UNLOCK(cc); >>> - return (1); >>> + return (not_running); >>> } >>> >>> void > > I'm sorry to say, but I think this change should be backed out as it > changes the behaviour of the callout API. The old code was correct. The > issues should be fixed in the TCP stack instead, like suggested by D1563 > and also r284245. > > This patch introduces new behaviour to the callout_stop() function, > which is contradicting "man 9 callout" and _not_ documented anywhere! > > Reading "man 9 callout" in 11-current: > >> The callout_reset() and callout_schedule() function families >> schedule a >> future function invocation for callout c. If c already has a >> pending >> callout, it is cancelled before the new invocation is scheduled. > > callout_reset() causes a _new_ callout invocation and it is logical that > the return value of callout_stop() reflect operations on the _new_ > callout invocation and not the previous one. > >> The function callout_stop() cancels a callout c if it is >> currently pend- >> ing. If the callout is pending, then callout_stop() returns a >> non-zero >> value. If the callout is not set, has already been serviced, or >> is cur- >> rently being serviced, then zero will be returned. If the >> callout has an >> associated lock, then that lock must be held when this function is >> called. > > If there are two callout invocations that callout_stop() needs to > handle, it should be called twice. > > The correct way is: > > callout_reset(); > callout_stop(); > callout_drain(); > > For the TCP stack's matter, it should be: > > callout_reset(); > callout_stop(); /* cancel _new_ callout invocation */ > callout_stop(); /* check for any pending callout invokation */
First thank for your time. I indeed agree with your analysis and I am not opposed to back out this change. The border between bug or feature can indeed be thin; below why I was more on "bug" side: o Pro "It is a feature": - This behavior is here since the beginning and nobody ever complains (Big one) o Pro "It is a bug": - Having callout_stop() returning 1 (success) while having the callout currently being serviced is counter intuitive. - You cannot call callout_stop() twice to address this case: callout_reset(); callout_stop(); /* cancel _new_ callout invocation */ callout_stop(); /* check for any pending callout invokation */ Because the second callout_stop() will always return 0 (Fail). In details: If the callout is currently being serviced (without r286880 i.e. the classical behavior): callout_reset() returns 0 (Fail) (PENDING flag set) callout_stop() returns 1 (Success) (PENDING flag removed) callout_stop() returns 0 (Always fail because not PENDING flag) In mpsafe mode, the only way a callout_stop() can return 1 (Success) is with the PENDING flags set. The way I found to address this case is with: fail_reset = !callout_reset(); fail_stop = !callout_stop(); if (fail_reset || fail_stop) { /* Callout not stopped */ } This is what I did in r284245: https://svnweb.freebsd.org/base/head/sys/netinet/tcp_timer.c?r1=284245&r2=284244&pathrev=284245 And it felt wrong to me because callout_reset() and callout_stop() calls can be far away. o Pro "it is 'neither'" (i.e. API unclear by design): In this case the same callout is indeed _both_ pending and currently being serviced. According to man page callout(9): (As you already noticed) If the callout is pending, then callout_stop() returns a non-zero ^^^^^^^^^^^^^^^^^^ value. Conclusion: It is a feature! And just after: If the callout is not set, has already been serviced, or is currently being serviced, then zero will be returned. ^^^^^^^^^^^^^^^^^^^^^^^^ Conclusion: It is a bug! Obviously, no indication about the case where the callout is indeed _both_ pending and currently being serviced. > Remember there are other OS'es out there using our TCP/IP stack which > do not have an identical callout subsystem. I was not aware of that. As I said, I am not opposed to back out this change, callout(9) API in mpsafe mode is a already complex/subtle API, it won't change too much the current complexity. Let say that if nobody screams until Friday 8/28, I will put back r284245 and revert this change _and_ I will make this case clear in the man page. -- Julien
signature.asc
Description: OpenPGP digital signature