[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

jhrozek Tue, 13 Dec 2016 08:25:22 -0800

  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler


jhrozek commented:
"""
On Tue, Dec 13, 2016 at 08:16:33AM -0800, Simo Sorce wrote:
> On Tue, 2016-12-13 at 08:02 -0800, Jakub Hrozek wrote:
> > On Tue, Dec 13, 2016 at 07:06:58AM -0800, Simo Sorce wrote:
> > > On Tue, 2016-12-13 at 05:59 -0800, Jakub Hrozek wrote:
> > > > On Tue, Dec 13, 2016 at 02:44:44AM -0800, Simo Sorce wrote:
> > > > > On Tue, 2016-12-13 at 02:25 -0800, fidencio wrote:
> > > > > > Pavel,
> > > > > > 
> > > > > > On Tue, Dec 13, 2016 at 11:20 AM, Pavel Březina 
> > > > > > <notificati...@github.com>
> > > > > > wrote:
> > > > > > 
> > > > > > > There are two scenarios:
> > > > > > >
> > > > > > >    1. timeshift during system boot -- it is very common to be 
> > > > > > > several
> > > > > > >    hours
> > > > > > >    2. timeshift due to an ntp update when booted up -- usually 
> > > > > > > only few
> > > > > > >    seconds, not a big deal
> > > > > > >
> > > > > > > The problem with tevent timer is that if we shift backwards the 
> > > > > > > timer
> > > > > > > remains too far in the future. This applies to all timers, not 
> > > > > > > only for the
> > > > > > > watchdog. Forward shift is not a problem, it just executes the 
> > > > > > > timers
> > > > > > > immediately. Resetting the watchdog helps in a way that sssd is 
> > > > > > > not killed,
> > > > > > > we don't have any capability to reschedule all timed event and we 
> > > > > > > actually
> > > > > > > can not tell that sssd will be functioning properly (dyndns, sudo 
> > > > > > > refresh,
> > > > > > > enumeration, domain refresh, even idle timer on socket 
> > > > > > > activation)... all
> > > > > > > those operations that depends on time() would become unreliable.
> > > > > > >
> > > > > > > I think the best thing to do would be restart the process 
> > > > > > > (although the
> > > > > > > question is how would this affect the boot up) and patch tevent 
> > > > > > > to deal
> > > > > > > with timeshift either by using monotonic clock or by detecting 
> > > > > > > them and
> > > > > > > altering timers accordingly.
> > > > > > >
> > > > > > 
> > > > > > In the latest version of patch I've just called _exit(1) when the 
> > > > > > timeshift
> > > > > > is detected.
> > > > > > About patching tevent, I've seen some old discussions happening and 
> > > > > > it
> > > > > > doesn't seem a trivial thing to do. Would the patch, as it is right 
> > > > > > now, be
> > > > > > acceptable and then a work on tevent could be done later (yes, I'd 
> > > > > > add it
> > > > > > to my queue and do it as soon as we have an agreement on doing 
> > > > > > this)?
> > > > > 
> > > > > This is really a blunt tool (calling exit()), but until tevent can be
> > > > > fixed the only other option would be to use some wrapper to keep track
> > > > > of all existing timed events and cancel and restart them all if the
> > > > > clock changes abruptly.
> > > > 
> > > > that's why I suggested signaling self to a tevent-driven signal handler
> > > > from where we can just set up the timer anew. 
> > > > 
> > > > If there is any other way to 'break out' of the POSIX signal handler
> > > > into somewhere where we can call tevent/talloc (or in general unsafe
> > > > calls) I'm all ears.
> > > 
> > > I guess I need to understand better what exactly you want to do to be
> > > able to advice on something. I can think of a coulpe of options, none of
> > > them particularly elegant :)
> > 
> > OK, let me try to explain better.
> > 
> > A machine drifts time. Then an SSSD process receives SIGRT in
> > watchdog_handler() and detects the time has drifted, so it avoids
> > increasing the watchdog ticks counter -- this is done in
> > watchdog_detect_timeshift() at the moment.
> > 
> > At that point, in the current master, we call teardown_watchdog() and
> > setup_watchdog() to set a new watchdog (the part that is based on tevent
> > timers). This is unsafe to do in a signal handler because it involves
> > malloc and free among others called from tevent.
> > 
> > What I'm trying to figure out is how to reset the watchdog when I detect
> > in watchdog_detect_timeshift() the time is out of sync and the tevent
> > timer that resets the ticks will not arrive until the sssd process
> > receives enough SIGRT signals to get itself killed.
> > 
> > Does the question make sense now?
> 
> Yes,
> I see a few ways, a file descriptor (like tevent does for handling
> signals) used to fire a tevent_fd event that will perform these actions.
> Or a global variable, that is checked in the main loop at idle times and
> resets the watchdog, finally a mutex on which a dedicated thread waits,
> the thread only job is to run the reset if the mutex is released (but
> then the mutex needs to be re-acquired, probably on the next tick), but
> then again not sure if the code run in the setup_watchdog() is thread
> safe.

Thank you.

Without doing any coding, I like the idea of the fd the best.

"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266785347

_______________________________________________
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org

[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

Reply via email to