URL: https://github.com/SSSD/sssd/pull/107 Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler
jhrozek commented: """ On Tue, Dec 13, 2016 at 08:16:33AM -0800, Simo Sorce wrote: > On Tue, 2016-12-13 at 08:02 -0800, Jakub Hrozek wrote: > > On Tue, Dec 13, 2016 at 07:06:58AM -0800, Simo Sorce wrote: > > > On Tue, 2016-12-13 at 05:59 -0800, Jakub Hrozek wrote: > > > > On Tue, Dec 13, 2016 at 02:44:44AM -0800, Simo Sorce wrote: > > > > > On Tue, 2016-12-13 at 02:25 -0800, fidencio wrote: > > > > > > Pavel, > > > > > > > > > > > > On Tue, Dec 13, 2016 at 11:20 AM, Pavel Březina > > > > > > <notificati...@github.com> > > > > > > wrote: > > > > > > > > > > > > > There are two scenarios: > > > > > > > > > > > > > > 1. timeshift during system boot -- it is very common to be > > > > > > > several > > > > > > > hours > > > > > > > 2. timeshift due to an ntp update when booted up -- usually > > > > > > > only few > > > > > > > seconds, not a big deal > > > > > > > > > > > > > > The problem with tevent timer is that if we shift backwards the > > > > > > > timer > > > > > > > remains too far in the future. This applies to all timers, not > > > > > > > only for the > > > > > > > watchdog. Forward shift is not a problem, it just executes the > > > > > > > timers > > > > > > > immediately. Resetting the watchdog helps in a way that sssd is > > > > > > > not killed, > > > > > > > we don't have any capability to reschedule all timed event and we > > > > > > > actually > > > > > > > can not tell that sssd will be functioning properly (dyndns, sudo > > > > > > > refresh, > > > > > > > enumeration, domain refresh, even idle timer on socket > > > > > > > activation)... all > > > > > > > those operations that depends on time() would become unreliable. > > > > > > > > > > > > > > I think the best thing to do would be restart the process > > > > > > > (although the > > > > > > > question is how would this affect the boot up) and patch tevent > > > > > > > to deal > > > > > > > with timeshift either by using monotonic clock or by detecting > > > > > > > them and > > > > > > > altering timers accordingly. > > > > > > > > > > > > > > > > > > > In the latest version of patch I've just called _exit(1) when the > > > > > > timeshift > > > > > > is detected. > > > > > > About patching tevent, I've seen some old discussions happening and > > > > > > it > > > > > > doesn't seem a trivial thing to do. Would the patch, as it is right > > > > > > now, be > > > > > > acceptable and then a work on tevent could be done later (yes, I'd > > > > > > add it > > > > > > to my queue and do it as soon as we have an agreement on doing > > > > > > this)? > > > > > > > > > > This is really a blunt tool (calling exit()), but until tevent can be > > > > > fixed the only other option would be to use some wrapper to keep track > > > > > of all existing timed events and cancel and restart them all if the > > > > > clock changes abruptly. > > > > > > > > that's why I suggested signaling self to a tevent-driven signal handler > > > > from where we can just set up the timer anew. > > > > > > > > If there is any other way to 'break out' of the POSIX signal handler > > > > into somewhere where we can call tevent/talloc (or in general unsafe > > > > calls) I'm all ears. > > > > > > I guess I need to understand better what exactly you want to do to be > > > able to advice on something. I can think of a coulpe of options, none of > > > them particularly elegant :) > > > > OK, let me try to explain better. > > > > A machine drifts time. Then an SSSD process receives SIGRT in > > watchdog_handler() and detects the time has drifted, so it avoids > > increasing the watchdog ticks counter -- this is done in > > watchdog_detect_timeshift() at the moment. > > > > At that point, in the current master, we call teardown_watchdog() and > > setup_watchdog() to set a new watchdog (the part that is based on tevent > > timers). This is unsafe to do in a signal handler because it involves > > malloc and free among others called from tevent. > > > > What I'm trying to figure out is how to reset the watchdog when I detect > > in watchdog_detect_timeshift() the time is out of sync and the tevent > > timer that resets the ticks will not arrive until the sssd process > > receives enough SIGRT signals to get itself killed. > > > > Does the question make sense now? > > Yes, > I see a few ways, a file descriptor (like tevent does for handling > signals) used to fire a tevent_fd event that will perform these actions. > Or a global variable, that is checked in the main loop at idle times and > resets the watchdog, finally a mutex on which a dedicated thread waits, > the thread only job is to run the reset if the mutex is released (but > then the mutex needs to be re-acquired, probably on the next tick), but > then again not sure if the code run in the setup_watchdog() is thread > safe. Thank you. Without doing any coding, I like the idea of the fd the best. """ See the full comment at https://github.com/SSSD/sssd/pull/107#issuecomment-266785347
_______________________________________________ sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org