URL: https://github.com/SSSD/sssd/pull/107 Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler
simo5 commented: """ On Tue, 2016-12-13 at 08:02 -0800, Jakub Hrozek wrote: > On Tue, Dec 13, 2016 at 07:06:58AM -0800, Simo Sorce wrote: > > On Tue, 2016-12-13 at 05:59 -0800, Jakub Hrozek wrote: > > > On Tue, Dec 13, 2016 at 02:44:44AM -0800, Simo Sorce wrote: > > > > On Tue, 2016-12-13 at 02:25 -0800, fidencio wrote: > > > > > Pavel, > > > > > > > > > > On Tue, Dec 13, 2016 at 11:20 AM, Pavel Březina > > > > > <notificati...@github.com> > > > > > wrote: > > > > > > > > > > > There are two scenarios: > > > > > > > > > > > > 1. timeshift during system boot -- it is very common to be > > > > > > several > > > > > > hours > > > > > > 2. timeshift due to an ntp update when booted up -- usually only > > > > > > few > > > > > > seconds, not a big deal > > > > > > > > > > > > The problem with tevent timer is that if we shift backwards the > > > > > > timer > > > > > > remains too far in the future. This applies to all timers, not only > > > > > > for the > > > > > > watchdog. Forward shift is not a problem, it just executes the > > > > > > timers > > > > > > immediately. Resetting the watchdog helps in a way that sssd is not > > > > > > killed, > > > > > > we don't have any capability to reschedule all timed event and we > > > > > > actually > > > > > > can not tell that sssd will be functioning properly (dyndns, sudo > > > > > > refresh, > > > > > > enumeration, domain refresh, even idle timer on socket > > > > > > activation)... all > > > > > > those operations that depends on time() would become unreliable. > > > > > > > > > > > > I think the best thing to do would be restart the process (although > > > > > > the > > > > > > question is how would this affect the boot up) and patch tevent to > > > > > > deal > > > > > > with timeshift either by using monotonic clock or by detecting them > > > > > > and > > > > > > altering timers accordingly. > > > > > > > > > > > > > > > > In the latest version of patch I've just called _exit(1) when the > > > > > timeshift > > > > > is detected. > > > > > About patching tevent, I've seen some old discussions happening and it > > > > > doesn't seem a trivial thing to do. Would the patch, as it is right > > > > > now, be > > > > > acceptable and then a work on tevent could be done later (yes, I'd > > > > > add it > > > > > to my queue and do it as soon as we have an agreement on doing this)? > > > > > > > > This is really a blunt tool (calling exit()), but until tevent can be > > > > fixed the only other option would be to use some wrapper to keep track > > > > of all existing timed events and cancel and restart them all if the > > > > clock changes abruptly. > > > > > > that's why I suggested signaling self to a tevent-driven signal handler > > > from where we can just set up the timer anew. > > > > > > If there is any other way to 'break out' of the POSIX signal handler > > > into somewhere where we can call tevent/talloc (or in general unsafe > > > calls) I'm all ears. > > > > I guess I need to understand better what exactly you want to do to be > > able to advice on something. I can think of a coulpe of options, none of > > them particularly elegant :) > > OK, let me try to explain better. > > A machine drifts time. Then an SSSD process receives SIGRT in > watchdog_handler() and detects the time has drifted, so it avoids > increasing the watchdog ticks counter -- this is done in > watchdog_detect_timeshift() at the moment. > > At that point, in the current master, we call teardown_watchdog() and > setup_watchdog() to set a new watchdog (the part that is based on tevent > timers). This is unsafe to do in a signal handler because it involves > malloc and free among others called from tevent. > > What I'm trying to figure out is how to reset the watchdog when I detect > in watchdog_detect_timeshift() the time is out of sync and the tevent > timer that resets the ticks will not arrive until the sssd process > receives enough SIGRT signals to get itself killed. > > Does the question make sense now? Yes, I see a few ways, a file descriptor (like tevent does for handling signals) used to fire a tevent_fd event that will perform these actions. Or a global variable, that is checked in the main loop at idle times and resets the watchdog, finally a mutex on which a dedicated thread waits, the thread only job is to run the reset if the mutex is released (but then the mutex needs to be re-acquired, probably on the next tick), but then again not sure if the code run in the setup_watchdog() is thread safe. Simo. -- Simo Sorce * Red Hat, Inc * New York """ See the full comment at https://github.com/SSSD/sssd/pull/107#issuecomment-266782930
_______________________________________________ sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org