Re: [Xen-devel] [PATCH 1/3] xen: timers: don't miss a timer event because of stop_timer()

2017-09-28 Thread Dario Faggioli
On Wed, 2017-09-27 at 15:39 +0200, Dario Faggioli wrote:
> On Wed, 2017-09-27 at 04:30 -0600, Jan Beulich wrote:
> >  Or wait -
> > wouldn't all you need be to avoid calling stop_timer() in the
> > call tree above, if the timer's expiry has passed (suitably
> > explained in a comment)?
> > 
> Yes. For the reason stated above, I addressed the problem at the
> generic code level. If that doesn't fly, I'll do like this. I had
> thought about that, and although I haven't tried, I think it works
> for
> this case.
> 
Anyway, given the timing, I'll send a v2 of this series, where I do
things as suggested above.

I may want to try again to persuade you (and others) that this needs to
be changed in timer's code. But that will be for another time. :-)

Thanks and Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen: timers: don't miss a timer event because of stop_timer()

2017-09-27 Thread Dario Faggioli
On Wed, 2017-09-27 at 04:30 -0600, Jan Beulich wrote:
> > > > On 27.09.17 at 12:18,  wrote:
> 
> > And that is because the following happens:
> > - the CPU wants to go idle
> > - sched_tick_suspend()
> > rcu_idle_timer_start()
> >   set_timer(RCU_idle_timer)
> > - the CPU goes idle
> >   ... ... ...
> > - RCU_idle_timer's IRQ arrives
> > - the CPU wakes up
> > - raise_softirq(TIMER_SOFTIRQ)
> > - sched_tick_resume()
> > rcu_idle_timer_stop()
> >   stop_timer(RCU_idle_timer)
> > deactivate_timer(RCU_idle_timer)
> >   remove_entry(RCU_idle_timer) // timer out of heap/list
> > - do_softirq() (we are inside idle_loop())
> > - softirq_handlers[TIMER_SOFTIRQ]()
> > - timer_softirq_action()
> > // but the timer is not in the heap/list!
> 
> But this is an extremely special case, not something likely to
> happen anywhere else. Hence I wonder whether it wouldn't
> be better to handle the special case in a special way, rather
> than making generic code fit the special case.
>
Well, yes. As said, this "new" timer is the first, and for now the
only, that follow this pattern. And I also agree that this is not
something we must expect to see to happen much more (if at all).

Still, I continue to think that with a timer already expired, its IRQ
already delivered and handled and the relative TIMER_SOFTIRQ already
risen, we should arrange for the timer handler to run, in the general
case.

>  Or wait -
> wouldn't all you need be to avoid calling stop_timer() in the
> call tree above, if the timer's expiry has passed (suitably
> explained in a comment)?
> 
Yes. For the reason stated above, I addressed the problem at the
generic code level. If that doesn't fly, I'll do like this. I had
thought about that, and although I haven't tried, I think it works for
this case.

Thanks and Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen: timers: don't miss a timer event because of stop_timer()

2017-09-27 Thread Jan Beulich
>>> On 27.09.17 at 12:18,  wrote:
> On Wed, 2017-09-27 at 02:20 -0600, Jan Beulich wrote:
>> In the end what I think I'm missing is a clear description of an
>> actual
>> case where the current behavior causes breakage (plus of course
>> an explanation why the new behavior is unlikely to cause issues with
>> existing users).
>> 
> So, the problem is that the handler of the RCU idle_timer, introduced
> by 2b936ea7b716dc1a13c ("xen: RCU: avoid busy waiting until the end of
> grace period."), never runs.
> 
> And that is because the following happens:
> - the CPU wants to go idle
> - sched_tick_suspend()
> rcu_idle_timer_start()
>   set_timer(RCU_idle_timer)
> - the CPU goes idle
>   ... ... ...
> - RCU_idle_timer's IRQ arrives
> - the CPU wakes up
> - raise_softirq(TIMER_SOFTIRQ)
> - sched_tick_resume()
> rcu_idle_timer_stop()
>   stop_timer(RCU_idle_timer)
> deactivate_timer(RCU_idle_timer)
>   remove_entry(RCU_idle_timer) // timer out of heap/list
> - do_softirq() (we are inside idle_loop())
> - softirq_handlers[TIMER_SOFTIRQ]()
> - timer_softirq_action()
> // but the timer is not in the heap/list!

But this is an extremely special case, not something likely to
happen anywhere else. Hence I wonder whether it wouldn't
be better to handle the special case in a special way, rather
than making generic code fit the special case. Or wait -
wouldn't all you need be to avoid calling stop_timer() in the
call tree above, if the timer's expiry has passed (suitably
explained in a comment)?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen: timers: don't miss a timer event because of stop_timer()

2017-09-27 Thread Dario Faggioli
On Wed, 2017-09-27 at 02:20 -0600, Jan Beulich wrote:
> > > > On 26.09.17 at 20:11,  wrote:
> > it means there is an event that *is* in progress right now (i.e.,
> > we're
> > stopping the timer on the path that goes from the interrupt that
> > raised
> > TIMER_SOFTIRQ, and the timer softirq handler).
> > 
> > So, basically, I'm trying to achieve exactly what you call
> > reasonable:
> > servicing an event which is in progress. :-)
> 
> I'm afraid I don't understand - if the processing of the timer is
> already in progress, why would you need to raise another
> softirq? Wouldn't it suffice to simply skip deactivate_timer() then?
>
Ah, yes! In the use case I'm trying to fix, that's indeed not
necessary, as it has already been risen.

Basically, as it's harmless to re-raise it, I thought to do so, with
the aim of making it easier to understand what the code was trying to
achieve. But now I actually agree with you that the effect is quite the
opposite. :-(

I will get rid of the re-raising, and explain the situation better in
the both the comment and the changelog.

> This raising of the softirq is what made me imply that, perhaps
> due to some minimal skew e.g. resulting from rounding, you
> assume the interrupt did not fire yet, which I'd then call the
> timer event not being in progress (yet).
> 
Sure, I totally see it now, and I also totally agree.

> In the end what I think I'm missing is a clear description of an
> actual
> case where the current behavior causes breakage (plus of course
> an explanation why the new behavior is unlikely to cause issues with
> existing users).
> 
So, the problem is that the handler of the RCU idle_timer, introduced
by 2b936ea7b716dc1a13c ("xen: RCU: avoid busy waiting until the end of
grace period."), never runs.

And that is because the following happens:
- the CPU wants to go idle
- sched_tick_suspend()
rcu_idle_timer_start()
  set_timer(RCU_idle_timer)
- the CPU goes idle
  ... ... ...
- RCU_idle_timer's IRQ arrives
- the CPU wakes up
- raise_softirq(TIMER_SOFTIRQ)
- sched_tick_resume()
rcu_idle_timer_stop()
  stop_timer(RCU_idle_timer)
deactivate_timer(RCU_idle_timer)
  remove_entry(RCU_idle_timer) // timer out of heap/list
- do_softirq() (we are inside idle_loop())
- softirq_handlers[TIMER_SOFTIRQ]()
- timer_softirq_action()
// but the timer is not in the heap/list!

Now, as far as the purpose of that patch goes, we're fine. In fact,
there, we "only" needed to avoid that a certain CPU (because of its
role in the grace period) would sleep too long/indefinitely. And the
fact that the CPU does wake up, because of the timer interrupt, is
enough.

However, it's been asked to try to make the logic a bit more clever,
basically for preventing RCU_idle_timer from firing too often. And
that's actually what this series is doing. But now, in order to achieve
that, I do need the timer handler to run.


About the (lack of) breakage of existing use cases. Well, hard to tell
like this, but I'd say that, if, right now, we are not missing any
timer event handling, it means that this situation --where you stop the
timer in between raise_softirq(TIMER_SOFTIRQ) and
softirq_handler[TIMER_SOFTIRQ]()-- never happens.

Inspecting the code, in fact, I can't spot any stop_timer() happening
within that window, which I think it means we're fine (IOW,
RCU_idle_timer is the first, and for now only, timer that is stopped on
the CPU wake-up-from-idle path).

It is important (in the new version of this patch) for deactivation to
be skipped only in stop_timer(), and not, e.g., in kill_timer(). As, if
someone, in future, will want to kill and free the timer during the
window, then in that case the handler *must* not run.

Makes sense?

Thanks and Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen: timers: don't miss a timer event because of stop_timer()

2017-09-27 Thread Jan Beulich
>>> On 26.09.17 at 20:11,  wrote:
> On Tue, 2017-09-26 at 08:59 -0600, Jan Beulich wrote:
>> > > > On 15.09.17 at 20:01,  wrote:
>> > @@ -326,7 +326,17 @@ void stop_timer(struct timer *timer)
>> >  return;
>> >  
>> >  if ( active_timer(timer) )
>> > -deactivate_timer(timer);
>> > +{
>> > +/*
>> > + * If the timer is expired already, 'call' the softirq
>> > handler to
>> > + * execute it (it will leave it inactive after that). If
>> > not, just
>> > + * deactivate it.
>> > + */
>> > +if ( timer->expires <= NOW() )
>> > +cpu_raise_softirq(timer->cpu, TIMER_SOFTIRQ);
>> > +else
>> > +deactivate_timer(timer);
>> > +}
>> 
>> Isn't it reasonable to expect current behavior, i.e. the timer not
>> raising any events other than those already in progress?
>> 
> Well, if we managed to get to here, with the timer that is both:
> - active,
> - with its expiry time in the past,
> 
> it means there is an event that *is* in progress right now (i.e., we're
> stopping the timer on the path that goes from the interrupt that raised
> TIMER_SOFTIRQ, and the timer softirq handler).
> 
> So, basically, I'm trying to achieve exactly what you call reasonable:
> servicing an event which is in progress. :-)

I'm afraid I don't understand - if the processing of the timer is
already in progress, why would you need to raise another
softirq? Wouldn't it suffice to simply skip deactivate_timer() then?
This raising of the softirq is what made me imply that, perhaps
due to some minimal skew e.g. resulting from rounding, you
assume the interrupt did not fire yet, which I'd then call the
timer event not being in progress (yet).

> The alternative is that the event happened, but we somehow managed to
> miss running the timer handler for it, and we realize this only now,
> potentially a long time after the miss. This scenario, as far as I can
> tell, can't happen, but if it could/does, I'd still say running the
> handler late is better than not running it at all.

Well, as said - what's better may very well depend on the particular
use case. As long as it's not clearly written down what the intended
behavior is, both behaviors are acceptable imo.

In the end what I think I'm missing is a clear description of an actual
case where the current behavior causes breakage (plus of course
an explanation why the new behavior is unlikely to cause issues with
existing users).

>> Additionally I'm not convinced the changed actually does what you
>> want: What if NOW() becomes equal to timer->expires immediately
>> after you've managed to obtain its value, before execution makes
>> it into deactivate_timer()? IOW you're shrinking a time window which
>> (I think) you really mean to eliminate.
>> 
> Well, I certainly don't like the window to be there, and closing it
> would be ideal, IMO.
> 
> However, in this patch, I'm addressing the specific situation of when
> we are stopping a timer for which an interrupt has already fired, the
> interrupt's ISR has already raised TIMER_SOFTIRQ, and yet we don't run
> the handler.
> 
> Yes, if an interrupt is about to be raised, and/or it arrives _while_
> we are inside this function (after the check), or already in
> deactivate_timer(), we also end up not running the handler.
> 
> I guess these can be seen as two aspects of the same problem, or as
> conceptually different issues, but whatever you consider them, getting
> rid of the former is something I consider an improvement.

It may be improvement, yes, but if there is
- no actual case breaking with the current code, I don't see
  the need for the change,
- an actual case breaking with the current code, it'll still break
  with the change as the window was merely shrunk.

>> Furthermore, taking both changes together: What if the same
>> you try to address in stop_timer() happens in set_timer()? Wouldn't
>> it then be only consistent to also require a timer even to fire for
>> the
>> old expiry value, before the new one is being set?
>> 
> Yes, personally, I think that, whenever it is that we figure out that
> we missed handling a timer interrupt, we should run it then.
> 
> Unfortunately, for achieving this, e.g., in the set_timer() case, I
> don't see much alternatives to call execute_timer() right in there. But
> that would violate the current invariant that execute_timer() only run
> from the TIMER_SOFTIRQ handler... which is probably doable, but is at
> the same time unrelated to the problem I'm tackling here, and I would
> rather do it in a dedicated series.

Indeed forcing the handler to run in the set_timer() case is more
difficult, yet as said - if there's a window to be concerned about,
the concern should be regarding all such windows, or none of
them imo.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen: timers: don't miss a timer event because of stop_timer()

2017-09-26 Thread Dario Faggioli
On Tue, 2017-09-26 at 08:59 -0600, Jan Beulich wrote:
> > > > On 15.09.17 at 20:01,  wrote:
> > --- a/xen/common/timer.c
> > +++ b/xen/common/timer.c
> > @@ -217,7 +217,7 @@ static inline void activate_timer(struct timer
> > *timer)
> >  timer->status = TIMER_STATUS_invalid;
> >  list_del(&timer->inactive);
> >  
> > -if ( add_entry(timer) )
> > +if ( add_entry(timer) || timer->expires <= NOW() )
> >  cpu_raise_softirq(timer->cpu, TIMER_SOFTIRQ);
> >  }
> 
> I'm not convinced of this change - it's unrelated to what title and
> description say, 
>
You're right. This should either be mentioned, or dropped (or live in
another patch).

> > @@ -326,7 +326,17 @@ void stop_timer(struct timer *timer)
> >  return;
> >  
> >  if ( active_timer(timer) )
> > -deactivate_timer(timer);
> > +{
> > +/*
> > + * If the timer is expired already, 'call' the softirq
> > handler to
> > + * execute it (it will leave it inactive after that). If
> > not, just
> > + * deactivate it.
> > + */
> > +if ( timer->expires <= NOW() )
> > +cpu_raise_softirq(timer->cpu, TIMER_SOFTIRQ);
> > +else
> > +deactivate_timer(timer);
> > +}
> 
> Isn't it reasonable to expect current behavior, i.e. the timer not
> raising any events other than those already in progress?
> 
Well, if we managed to get to here, with the timer that is both:
- active,
- with its expiry time in the past,

it means there is an event that *is* in progress right now (i.e., we're
stopping the timer on the path that goes from the interrupt that raised
TIMER_SOFTIRQ, and the timer softirq handler).

So, basically, I'm trying to achieve exactly what you call reasonable:
servicing an event which is in progress. :-)

The alternative is that the event happened, but we somehow managed to
miss running the timer handler for it, and we realize this only now,
potentially a long time after the miss. This scenario, as far as I can
tell, can't happen, but if it could/does, I'd still say running the
handler late is better than not running it at all.

> Additionally I'm not convinced the changed actually does what you
> want: What if NOW() becomes equal to timer->expires immediately
> after you've managed to obtain its value, before execution makes
> it into deactivate_timer()? IOW you're shrinking a time window which
> (I think) you really mean to eliminate.
> 
Well, I certainly don't like the window to be there, and closing it
would be ideal, IMO.

However, in this patch, I'm addressing the specific situation of when
we are stopping a timer for which an interrupt has already fired, the
interrupt's ISR has already raised TIMER_SOFTIRQ, and yet we don't run
the handler.

Yes, if an interrupt is about to be raised, and/or it arrives _while_
we are inside this function (after the check), or already in
deactivate_timer(), we also end up not running the handler.

I guess these can be seen as two aspects of the same problem, or as
conceptually different issues, but whatever you consider them, getting
rid of the former is something I consider an improvement.

I certainly can try to state the problem better, and describe the
situation more clearly in the changelog.

> Furthermore, taking both changes together: What if the same
> you try to address in stop_timer() happens in set_timer()? Wouldn't
> it then be only consistent to also require a timer even to fire for
> the
> old expiry value, before the new one is being set?
> 
Yes, personally, I think that, whenever it is that we figure out that
we missed handling a timer interrupt, we should run it then.

Unfortunately, for achieving this, e.g., in the set_timer() case, I
don't see much alternatives to call execute_timer() right in there. But
that would violate the current invariant that execute_timer() only run
from the TIMER_SOFTIRQ handler... which is probably doable, but is at
the same time unrelated to the problem I'm tackling here, and I would
rather do it in a dedicated series.

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen: timers: don't miss a timer event because of stop_timer()

2017-09-26 Thread Jan Beulich
>>> On 15.09.17 at 20:01,  wrote:
> --- a/xen/common/timer.c
> +++ b/xen/common/timer.c
> @@ -217,7 +217,7 @@ static inline void activate_timer(struct timer *timer)
>  timer->status = TIMER_STATUS_invalid;
>  list_del(&timer->inactive);
>  
> -if ( add_entry(timer) )
> +if ( add_entry(timer) || timer->expires <= NOW() )
>  cpu_raise_softirq(timer->cpu, TIMER_SOFTIRQ);
>  }

I'm not convinced of this change - it's unrelated to what title and
description say, and I'm not sure a timer being activated in a way
possibly clashing with its expiry is actually intended to fire.

> @@ -326,7 +326,17 @@ void stop_timer(struct timer *timer)
>  return;
>  
>  if ( active_timer(timer) )
> -deactivate_timer(timer);
> +{
> +/*
> + * If the timer is expired already, 'call' the softirq handler to
> + * execute it (it will leave it inactive after that). If not, just
> + * deactivate it.
> + */
> +if ( timer->expires <= NOW() )
> +cpu_raise_softirq(timer->cpu, TIMER_SOFTIRQ);
> +else
> +deactivate_timer(timer);
> +}

Isn't it reasonable to expect current behavior, i.e. the timer not
raising any events other than those already in progress?

Additionally I'm not convinced the changed actually does what you
want: What if NOW() becomes equal to timer->expires immediately
after you've managed to obtain its value, before execution makes
it into deactivate_timer()? IOW you're shrinking a time window which
(I think) you really mean to eliminate.

Furthermore, taking both changes together: What if the same
you try to address in stop_timer() happens in set_timer()? Wouldn't
it then be only consistent to also require a timer even to fire for the
old expiry value, before the new one is being set?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel