Hi On Sat, Sep 13, 2014 at 1:44 AM, Michael Biebl <mbi...@gmail.com> wrote: > Hi, > > we are currently trying to debug an upgrade failure from 208 to 215 > [1] in Debian related to the sd_notify/watchdog feature. > > This bug is not reliably reproducibly, we suspect a race somewhere > when systemd is re-exec'ed and a daemon currently tries to talk to > systemd via sd_notify. > > In my attempt to trigger the issue, I changed the bluetooth.service > (which uses the watchdog feature) and set WatchdogSec=1, i.e. it > notifies systemd every second. > > As a result, systemd is using 25% of my CPU time, and bluetoothd 12%: > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > COMMAND > 1 root 20 0 110716 5256 3224 R 24,6 0,1 1:42.23 > systemd > 13503 root 20 0 23556 3876 3568 S 12,0 0,0 0:48.52 > bluetoothd > > > This seems highly inefficient and looks like a codepath which isn't > particularly optimized.
I can reproduce this with bluetoothd, but not with any systemd service. I looked at some perf-traces and there's nothing particularly surprising there. I mean, sd_notify() is quite heavy. We create a new socket for each call. It's not meant to be used in tight loops. I also created some flame-graphs: It's all layed out even. 30% bluetoothd, 50% systemd, 20% swapper (mostly CPU idle). And for bluetoothd and systemd it's split between socket(), sendmsg/recvmsg and close() quite evenly again (plus some VFS cgroup reads on systemd side). So, lets look at the bluetoothd code: watchdog_usec = getenv("WATCHDOG_USEC"); seconds = atoi(watchdog_usec) / (1000 * 1000); watchdog = g_timeout_add_seconds_full(G_PRIORITY_HIGH, seconds / 2, watchdog_callback, NULL, NULL); If WATCHDOG_USEC is smaller than 2,000,000 (2s), then "seconds" is <=1 and thus "seconds / 2" is 0. Thus, bluetoothd sends watchdog notifications without any timeout. Two solutions: 1) Don't set such watchdog values for bluetoothd. 2) Fix bluetoothd to use "usec" instead of seconds. We could ignore excessive watchdog reports on the systemd side, but that seams like working around problems at the wrong end. And in this case, it would just cause systemd to spend less time on processing the requests and bluetoothd sending more requests instead. Regardless of the bluez behavior, I don't think there's any need to optimize sd_notify(). But comments welcome! Thanks David _______________________________________________ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel