On Wed, Apr 29, 2015 at 11:46:50AM +0200, Lennart Poettering wrote:
> On Tue, 28.04.15 19:25, John Morrissey (j...@horde.net) wrote:
> > On 18 Feb 2015, at 18:47, Lennart Poettering <lennart at poettering.net>
> > wrote:
> > > Hmm, this appears to be caused by a timer that is not reset. First the
> > > timer fd is set to the earliest possible trigger, then epoll_wait() is
> > > entered, which immediately quits. Then the timerfd elapse counter is
> > > read, which is 1.
> > >
> > > It would be interesting to figure out which timer this is.
> > >
> > > To make this work, can you reproduce the issue, then use gdb:
> > >
> > > 1. Type "gdb" to start it
> > > 2. Type "attach 1" to attach to PID 1
> > > 3. Type "b source_dispatch" to set a break point on the source_dispatch
> > >    function
> > > 4. Type "c" to continue execution
> > > 5. This should then break on the next execution of the source_dispatch
> > >    function
> > > 6. This should happen immediately; after all, PID 1 is busy looping
> > >    around a timer. Use "p s->description" to get a short description
> > >    string for the event that is being dispatched. In fact, please use
> > >    "p *s" to get all data about the event, and paste it here.
> >
> > I noticed this behavior recently on a Debian jessie system running systemd
> > 215-17. systemd got itself in a loop like the previous reporter's:
> >
> > [snip]
> > --
> > recvmsg(42, 0x7fff3ea64d00, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
> > timerfd_settime(3, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={0, 1}}, NULL) = 0
> > epoll_wait(4, {{EPOLLIN, {u32=3, u64=3}}}, 36, 0) = 1
> > clock_gettime(CLOCK_BOOTTIME, {1959524, 957887776}) = 0
> > read(3, "\1\0\0\0\0\0\0\0", 8) = 8
> > recvmsg(42, 0x7fff3ea64d00, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
>
> Any chance you can check what fd 42 refers to? See /proc/1/fd/42 and lsof?
Not easily. The system was basically unusable when it got into this state,
since it was a production machine and being able to start and stop services
is in the critical path. Thankfully, a reboot fixed it and it hasn't recurred
in the couple of days since, but I thought I'd follow up with the struct
output since someone else reported exactly the same behavior a couple of
months ago.

-john
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel