[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

simo5 Tue, 13 Dec 2016 08:17:25 -0800

  URL: https://github.com/SSSD/sssd/pull/107
Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler


simo5 commented:
"""
On Tue, 2016-12-13 at 08:02 -0800, Jakub Hrozek wrote:
> On Tue, Dec 13, 2016 at 07:06:58AM -0800, Simo Sorce wrote:
> > On Tue, 2016-12-13 at 05:59 -0800, Jakub Hrozek wrote:
> > > On Tue, Dec 13, 2016 at 02:44:44AM -0800, Simo Sorce wrote:
> > > > On Tue, 2016-12-13 at 02:25 -0800, fidencio wrote:
> > > > > Pavel,
> > > > > 
> > > > > On Tue, Dec 13, 2016 at 11:20 AM, Pavel Březina 
> > > > > <notificati...@github.com>
> > > > > wrote:
> > > > > 
> > > > > > There are two scenarios:
> > > > > >
> > > > > >    1. timeshift during system boot -- it is very common to be 
> > > > > > several
> > > > > >    hours
> > > > > >    2. timeshift due to an ntp update when booted up -- usually only 
> > > > > > few
> > > > > >    seconds, not a big deal
> > > > > >
> > > > > > The problem with tevent timer is that if we shift backwards the 
> > > > > > timer
> > > > > > remains too far in the future. This applies to all timers, not only 
> > > > > > for the
> > > > > > watchdog. Forward shift is not a problem, it just executes the 
> > > > > > timers
> > > > > > immediately. Resetting the watchdog helps in a way that sssd is not 
> > > > > > killed,
> > > > > > we don't have any capability to reschedule all timed event and we 
> > > > > > actually
> > > > > > can not tell that sssd will be functioning properly (dyndns, sudo 
> > > > > > refresh,
> > > > > > enumeration, domain refresh, even idle timer on socket 
> > > > > > activation)... all
> > > > > > those operations that depends on time() would become unreliable.
> > > > > >
> > > > > > I think the best thing to do would be restart the process (although 
> > > > > > the
> > > > > > question is how would this affect the boot up) and patch tevent to 
> > > > > > deal
> > > > > > with timeshift either by using monotonic clock or by detecting them 
> > > > > > and
> > > > > > altering timers accordingly.
> > > > > >
> > > > > 
> > > > > In the latest version of patch I've just called _exit(1) when the 
> > > > > timeshift
> > > > > is detected.
> > > > > About patching tevent, I've seen some old discussions happening and it
> > > > > doesn't seem a trivial thing to do. Would the patch, as it is right 
> > > > > now, be
> > > > > acceptable and then a work on tevent could be done later (yes, I'd 
> > > > > add it
> > > > > to my queue and do it as soon as we have an agreement on doing this)?
> > > > 
> > > > This is really a blunt tool (calling exit()), but until tevent can be
> > > > fixed the only other option would be to use some wrapper to keep track
> > > > of all existing timed events and cancel and restart them all if the
> > > > clock changes abruptly.
> > > 
> > > that's why I suggested signaling self to a tevent-driven signal handler
> > > from where we can just set up the timer anew. 
> > > 
> > > If there is any other way to 'break out' of the POSIX signal handler
> > > into somewhere where we can call tevent/talloc (or in general unsafe
> > > calls) I'm all ears.
> > 
> > I guess I need to understand better what exactly you want to do to be
> > able to advice on something. I can think of a coulpe of options, none of
> > them particularly elegant :)
> 
> OK, let me try to explain better.
> 
> A machine drifts time. Then an SSSD process receives SIGRT in
> watchdog_handler() and detects the time has drifted, so it avoids
> increasing the watchdog ticks counter -- this is done in
> watchdog_detect_timeshift() at the moment.
> 
> At that point, in the current master, we call teardown_watchdog() and
> setup_watchdog() to set a new watchdog (the part that is based on tevent
> timers). This is unsafe to do in a signal handler because it involves
> malloc and free among others called from tevent.
> 
> What I'm trying to figure out is how to reset the watchdog when I detect
> in watchdog_detect_timeshift() the time is out of sync and the tevent
> timer that resets the ticks will not arrive until the sssd process
> receives enough SIGRT signals to get itself killed.
> 
> Does the question make sense now?

Yes,
I see a few ways, a file descriptor (like tevent does for handling
signals) used to fire a tevent_fd event that will perform these actions.
Or a global variable, that is checked in the main loop at idle times and
resets the watchdog, finally a mutex on which a dedicated thread waits,
the thread only job is to run the reset if the mutex is released (but
then the mutex needs to be re-acquired, probably on the next tick), but
then again not sure if the code run in the setup_watchdog() is thread
safe.

Simo.

-- 
Simo Sorce * Red Hat, Inc * New York


"""

See the full comment at 
https://github.com/SSSD/sssd/pull/107#issuecomment-266782930

_______________________________________________
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org

[SSSD] [sssd PR#107][comment] WATCHDOG: Avoid non async-signal-safe from the signal_handler

Reply via email to