On Mon, 10.08.15 08:03, Rich Freeman (r-syst...@thefreemanclan.net) wrote:

> Occasionally I'll have nspawn containers that freeze up when they're
> loading. What is the best way to troubleshoot these and get useful
> info to devs?

Well, not sure what "freeze" means here... I'd always start by getting
a stack trace of the processes that hang. Try the "pstack" tool on
those processes to get a backtrace.

> This is on systemd-218, on Gentoo.

Upstream we try to focus on very recent systemd only...

> Also, is there any way to detect these freezes, perhaps getting the
> service launching it to at least fail? Short of installing nagios/etc
> something like this is hard to spot right now.

We have watchdog support (see the WatchdogSec= documentation in
systemd.service(5)) in all our long-running daemons, and PID 1 will
kill the service and generate a backtrace for it if it doesn't send a
watchdog message often enough. So actually we should be pretty good
here...

> Example of a frozen container:
>
> systemctl status mariadb-contain
> ● mariadb-contain.service - mariadb container
>    Loaded: loaded (/etc/systemd/system/mariadb-contain.service; enabled; vendor preset: enabled)
>    Active: active (running) since Mon 2015-08-10 07:21:48 EDT; 37min ago
>      Docs: man:systemd-nspawn(1)
>  Main PID: 1033 (systemd-nspawn)
>    Status: "Container running."
>    CGroup: /system.slice/mariadb-contain.service
>            ├─1033 /usr/bin/systemd-nspawn --quiet --keep-unit --boot --link-journal=guest --directory=/sstorage3/cont...
>            ├─1044 /usr/lib/systemd/systemd
>            └─system.slice
>              ├─systemd-journald.service
>              │ └─1407 /usr/lib/systemd/systemd-journald
>              └─systemd-journal-flush.service
>                └─1340 /usr/bin/journalctl --flush

Hmm, this is really weird... It would be good to get a backtrace of
both journald and journalctl here. Note that journald has a much higher
PID than journalctl, though, which indicates that it might have been
restarted by systemd already...

journalctl --flush actually does little more than send SIGUSR1 to
journald, but it does this through PID 1's bus APIs... It then waits
for /run/systemd/journal/flushed to appear... For some reason that
doesn't work here... Weird...

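On the watchdog point: the daemon side of WatchdogSec= is a small datagram protocol. Below is a minimal C sketch, assuming the documented sd_notify conventions: the service manager passes $WATCHDOG_USEC (and optionally $WATCHDOG_PID) plus $NOTIFY_SOCKET, and the daemon sends "WATCHDOG=1" datagrams to that socket, ideally every half of the timeout. The helper names watchdog_ping_interval_usec() and notify_watchdog() are hypothetical; a real daemon would normally just call sd_watchdog_enabled() and sd_notify() from libsystemd instead.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: recommended ping interval in microseconds (half of
 * $WATCHDOG_USEC), or 0 if the watchdog is not enabled for this process. */
uint64_t watchdog_ping_interval_usec(void) {
    const char *s = getenv("WATCHDOG_USEC");
    char *end = NULL;
    if (s == NULL || *s == '\0')
        return 0;
    errno = 0;
    unsigned long long usec = strtoull(s, &end, 10);
    if (errno != 0 || *end != '\0' || usec == 0)
        return 0;

    /* If $WATCHDOG_PID is set, the watchdog is meant for that PID only. */
    const char *p = getenv("WATCHDOG_PID");
    if (p != NULL && *p != '\0') {
        long pid = strtol(p, &end, 10);
        if (*end != '\0' || pid != (long) getpid())
            return 0;
    }
    return (uint64_t) usec / 2;
}

/* Hypothetical helper: send "WATCHDOG=1" to the socket named by
 * $NOTIFY_SOCKET. Returns 1 if sent, 0 if not supervised, or -errno. */
int notify_watchdog(void) {
    const char *path = getenv("NOTIFY_SOCKET");
    struct sockaddr_un sa;
    if (path == NULL || *path == '\0')
        return 0;
    size_t len = strlen(path);
    if (len >= sizeof(sa.sun_path))
        return -ENAMETOOLONG;

    memset(&sa, 0, sizeof(sa));
    sa.sun_family = AF_UNIX;
    memcpy(sa.sun_path, path, len);
    if (sa.sun_path[0] == '@')
        sa.sun_path[0] = '\0'; /* leading '@' means abstract namespace */

    int fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -errno;
    static const char msg[] = "WATCHDOG=1";
    ssize_t n = sendto(fd, msg, sizeof(msg) - 1, 0,
                       (const struct sockaddr *) &sa,
                       (socklen_t) (offsetof(struct sockaddr_un, sun_path) + len));
    int r = (n < 0) ? -errno : 1; /* capture errno before close() */
    close(fd);
    return r;
}
```

A daemon would call notify_watchdog() from its main loop at least once per watchdog_ping_interval_usec() microseconds; if the loop wedges, the pings stop and PID 1 can act on the WatchdogSec= timeout.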
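The flush handshake described above ends with journalctl waiting for /run/systemd/journal/flushed to appear. To check by hand whether that file ever shows up, a simple polling loop is enough. This is a sketch only: wait_for_file() is a hypothetical helper, not a systemd API, and it polls with stat() every 100 ms rather than using inotify, so it is a rough diagnostic and not meant to mirror journalctl's own implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: poll for `path` roughly every 100 ms until it exists
 * or about `timeout_ms` has elapsed. Returns true if the file appeared. */
bool wait_for_file(const char *path, long timeout_ms) {
    struct stat st;
    const struct timespec step = { .tv_sec = 0, .tv_nsec = 100L * 1000 * 1000 };
    for (long waited = 0;; waited += 100) {
        if (stat(path, &st) == 0)
            return true;
        if (errno != ENOENT)
            return false; /* e.g. permission problem: give up */
        if (waited >= timeout_ms)
            return false;
        nanosleep(&step, NULL);
    }
}
```

For example, if you can get a shell in the container, a small program calling wait_for_file("/run/systemd/journal/flushed", 5000) after journalctl --flush would show whether journald signalled completion of the flush within roughly five seconds.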
Anyway, before tracking this down further, could you update to a more
recent systemd version?

Lennart

-- 
Lennart Poettering, Red Hat
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel