On Sun, Aug 23, 2020 at 11:59 PM Rob Landley <r...@landley.net> wrote:
> On 8/22/20 3:05 PM, enh via Toybox wrote: > > On Sat, Aug 22, 2020 at 11:22 AM Chris Sarra via Toybox > > <toybox@lists.landley.net <mailto:toybox@lists.landley.net>> wrote: > > > > This patch introduces a simple watchdog implementation for toybox. We > > send the appropriate ioctls to set the relevant timeouts, and > intercept > > signals to safely shut down if required. > > --- > > toys/pending/watchdog.c | 128 > ++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 128 insertions(+) > > create mode 100644 toys/pending/watchdog.c > > > > diff --git a/toys/pending/watchdog.c b/toys/pending/watchdog.c > > new file mode 100644 > > index 00000000..0813fe69 > > --- /dev/null > > +++ b/toys/pending/watchdog.c > > @@ -0,0 +1,128 @@ > > +/* watchdog - start a watchdog timer with configurable kick > frequencies > > + > > + Author: Chris Sarra, chrissa...@google.com <mailto: > chrissa...@google.com> > > > > not sure why we're both working on a Saturday, but... > > Only reason I didn't get to it yesterday is I took a day off like you > suggested. :) > > > + Date: 25 July 2019 > > + Ref: kernel.org/doc/Documentation/watchdog/watchdog-api.txt > > <http://kernel.org/doc/Documentation/watchdog/watchdog-api.txt> > > > > use the " * " indent from the other source files? > > I already cleaned this up before spotting this email. :) > > > also mention that there's precedent in a busybox "watchdog", which this > is > > command-line compatible with? > > Not a clue. Last system I dealt with using watchdog stuff had a 5 line > function > in the main loop of the app poking the watchdog so if the app went down the > system rebooted. I've always been mildly confused by standalone watchdog > binaries: what do they prove exactly? > > > +USE_WATCHDOG(NEWTOY(watchdog, "Ft#T#", > TOYFLAG_NEEDROOT|TOYFLAG_BIN)) > > > > huh. i didn't realize that USE_ worked _inside_ the comment! probably > best to > > move it below the comment as is the convention though. > > It's a grep pulling NEWTOY()/OLDTOY() lines with no leading whitespace out > of > toys/*/*.c and running them through sort to make generated/newtoys.h. > > > +config WATCHDOG > > + bool "watchdog" > > + default y > > + help > > + usage: watchdog [-F] [-t SW_TIMER_S] [-T HW_TIMER_S] DEV > > + > > + Start the watchdog timer at DEV with optional timeout > parameters. > > > > (blank line here.) > > > > + -F run in the foreground (do not daemonize) > > + -t software timer (in seconds) > > + -T hardware timer (in seconds) > > > > > > say what the defaults are? > > Ooh, good suggestion. > > > the busybox help implies that it's possible to use more precision than > seconds? > > afaik from the kernel source and kernel docs, though, that's a lie? > > You can for the sleep, not for this watchdog ioctl. Might be a newer ioctl > with > more precision though... Not spotting an obvious candidate in the current > kernel > source though. (What's a PRETIMEOUT?) > strace says they're using the same ioctl, so maybe just over-enthusiastic copy & paste in the help text... > > i think it would help to explain that -T is what you set the hardware > watchdog > > to, and -t is how often you kick it. i didn't get that from either your > --help > > or the busybox --help. > > > > so something like: > > > > -T N Set hardware watchdog to N seconds (default 60). > > -t N Kick watchdog every N seconds (default 30). > > Is kick the official term for this? > it's the one i'm familiar with, and wikipedia says it's not just me, but `busybox watchdog --help` says "reset" and wikipedia calls the section "restart" but then _don't_ use the verb "restart" but do use both "kick" (the most) and "reset": https://en.wikipedia.org/wiki/Watchdog_timer#Watchdog_restart (they reserve "restart" for what the device does if the watchdog _isn't_ kicked, despite naming the section "Watchdog restart". i think that was also why it wasn't obvious to me what "reset" meant in the busybox help. Maybe "Kick/reset/restart the timer every N seconds (default 30)"? actually, if busybox had just included "the timer", any of the three verbs would have been fine.) > > ? (note that it should be a tab between the "-T N" part and the "Set > ..." part.) > > > > > > +*/ > > +#define FOR_watchdog > > +#include "toys.h" > > +#include "linux/watchdog.h" > > + > > +// Make sure no DEBUG variable is set; change this if you need > debug prints! > > +#undef WATCHDOG_DEBUG > > > > > > delete this or turn it into a -v flag, depending on whether it's still > > useful now the toy has been written, or was just useful during > development? > > (rtcwake, for example, has this kind of thing in -v, but sleep doesn't.) > > rtcwake -v is in the man page. :) > > > + } > > + } > > + > > + // Intercept terminating signals so we can call our shutdown > routine first. > > + if ( intercept_signals(safe_shutdown) ) { > > + perror_exit("failed to intercept desired signals: %d", rc); > > + } > > +#if defined(WATCHDOG_DEBUG) > > + printf("Successfully intercepted signals.\n"); > > +#endif > > > > > > sigatexit() > > six of one... > > > + TT.fd = open(watchdog_dev, O_WRONLY); > > + if ( TT.fd == -1 ) { > > + perror_exit("failed to open '%s'", watchdog_dev); > > + } > > > > > > xopen() > > > > > > +#if defined(WDIOC_SETTIMEOUT) > > > > > > as far as i can tell, this has existed since 2.6 kernels, so you don't > need the #if? > > I was wondering if maybe it was something like the way uclibc used to be > able to > config out support for stuff, but in that case the header would go away...? > > > > > + // SETTIMEOUT takes time value in seconds: s = ms / (1000 ms/s) > > + hw_timer_sec = TT.hw_timer_s; > > + xioctl(TT.fd, WDIOC_SETTIMEOUT, (void *)&hw_timer_sec); > > > > > > (no need for this local if you let USE_TOY do the default for you. even > when you > > can't, you can just assign to TT.whatever.) > > > > > > +#endif // WDIOC_SETTIMEOUT > > + > > + // Now that we've got the watchdog device open, kick it > periodically. > > + while (1) { > > + write(TT.fd, "\0", 1); > > > > > > xwrite? > > Exiting on failure to write here seems... unhelpful. :) > yeah, good point :-) > You could do writeall() which retries short writes, but you're only going > to get > a 0 length EAGAIN write from a signal without SA_RESTART (which I believe > all > SIGDFL does in modern Linux) so the only culprit would be signals we set, > which > in this case is SIGTERM which doesn't return. > > That said, SIG_IGN on SIGSTOP might be useful. (Unless somebody WANTS to > crash > the system by stopping the watchdog?) > > But the main defense here is having more than one "wake up and poke" period > before the hardware reboot triggers. If something did go wrong (which it > shouldn't), you'd get another bite at the apple before getting thrown out > of the > garden. > > I do note that this doesn't seem to have a SOFTWARE reboot mechanism. But > there's no app sending you periodic proof-of-life photos of the kidnapee > with > today's newspaper, so how would it know when to? (Again, I'm used to the > embedded system's main app doing this. All _this_ process poking the > watchdog > proves is that the process scheduler is still running, not that the system > is > capable doing anything useful. The logical process to be doing this in > android > would probably be... I dunno, the zygote spawner? What responds to UI touch > screen events, and what launches new apps? I watched several hours of > tutorial > on this years ago and then you rewrote it all like 8 months later.) > > Also, I thought you guys quiesce the HELL out of the system to keep battery > usage low when the screen's off, and you're going to wake up a process to > poke > the watchdog? (Baseband checks for incoming cell data, that's not even the > same > chip? Maybe it's different now...) > (this isn't for Android.) > Rob >
_______________________________________________ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net