On 28.04.2014 13:33, Jimmy Assarsson wrote:
> Hi,
>
> We stumbled upon a freeze/block in systemd.
> The problem occurs when an rshd (socket activated) execution has completed,
> the network connection is down and systemd is closing the socket.
> This causes a long (60 seconds) freeze during which it's not possible to
> communicate with systemd.
> Do you have any idea what is causing this or how we can investigate it
> further?
>
>
> To reproduce the problem:
> 1) Get the latest Arch Linux
> 2) On the remote machine, execute
>      rsh $target_ip -l root 'sleep 40'
> 3) On the systemd machine, set the link down on the interface that
>    $target_ip is assigned to
>      ip link set down dev $if
> 4) On the systemd machine, wait for 'sleep 40' to complete. Then execute any
>    systemd command
>      systemctl list-jobs
> 5) After 60 seconds systemd responds again
>
>
> Looking at the stack trace (see below), one can see that we are trying to
> close a socket and are waiting in the close() system call. So it's probably
> not a systemd problem; however, systemd is affected by it.
>
> We've successfully reproduced the problem on different hardware architectures
> (x86_64, arm, cris), systemd versions (208, 210, 212) and rshd
> implementations (netkit-rsh-0.17, inetutils 1.9.2-1). The problem occurs not
> only when the interface's link is set down, but also when the IP address is
> removed or the ethernet cable is unplugged. ssh does not seem to be affected
> by the problem.
>
>
> We generated a core dump:
>      kill -SIGABRT 1
>
> Here is the stack trace (the machine is running systemd 210).
> (gdb) bt
> #0  0xb6f4d830 in raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:46
> #1  0x000527e8 in crash.4282 (sig=6) at apps/systemd/systemd/src/core/main.c:156
> #2  <signal handler called>
> #3  0xb6f4c28c in close () from target/armv6-axis-linux-gnueabi/lib/libpthread.so.0
> #4  0x0009417c in close_nointr (fd=<optimized out>) at apps/systemd/systemd/src/shared/util.c:167
> #5  0x00094250 in close_nointr_nofail (fd=<optimized out>) at apps/systemd/systemd/src/shared/util.c:191
> #6  0x00073e0c in service_close_socket_fd.9824 (s=s@entry=0x1b6f918) at apps/systemd/systemd/src/core/service.c:229
> #7  0x00079728 in service_set_state.9835 (s=s@entry=0x1b6f918, state=SERVICE_DEAD) at apps/systemd/systemd/src/core/service.c:1496
> #8  0x00079b70 in service_enter_dead.9847 (s=0x1b6f918, f=<optimized out>, allow_restart=<optimized out>) at apps/systemd/systemd/src/core/service.c:1852
> #9  0x00065470 in service_sigchld_event (u=0x1b6f918, pid=<optimized out>, code=1, status=0) at apps/systemd/systemd/src/core/service.c:3037
> #10 0x00073490 in invoke_sigchld_event.5410 (m=m@entry=0x1ad7360, u=0x1b6f918, si=0xbe862670, si@entry=0xbe862668) at apps/systemd/systemd/src/core/manager.c:1430
> #11 0x00054084 in manager_dispatch_sigchld.5415 (m=m@entry=0x1ad7360) at apps/systemd/systemd/src/core/manager.c:1477
> #12 0x000629b0 in manager_dispatch_signal_fd.part.32 (userdata=<optimized out>) at apps/systemd/systemd/src/core/manager.c:1723
> #13 manager_dispatch_signal_fd.5363 (source=<optimized out>, fd=<optimized out>, revents=<optimized out>, userdata=0x1ad7360) at apps/systemd/systemd/src/core/manager.c:1508
> #14 0x0003e880 in source_dispatch (s=0x1ad7758) at apps/systemd/systemd/src/libsystemd/sd-event/sd-event.c:1861
> #15 0x00041288 in sd_event_run (e=0x1ad61d8, timeout=<optimized out>) at apps/systemd/systemd/src/libsystemd/sd-event/sd-event.c:2117
> #16 0x000103c8 in manager_loop (m=0x1ad7360) at apps/systemd/systemd/src/core/manager.c:1844
> #17 main (argc=1, argv=0xbe862ee4) at apps/systemd/systemd/src/core/main.c:1704
>
> Thanks,
> Jimmy
Hmm, reminds me of:

http://stackoverflow.com/questions/3757289/tcp-option-so-linger-zero-when-its-required
http://oroboro.com/dealing-with-network-port-abuse-in-sockets-in-c/
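
Both links are about SO_LINGER. With l_onoff set and a non-zero l_linger, close() on a blocking socket can wait up to l_linger seconds for unsent data to be acknowledged, which would look like the stall in frame #3 of the backtrace. Nothing in this thread confirms that rshd sets this option, so treat that as an assumption to check. Below is a minimal Python sketch for inspecting the option and timing close(); the helper names (get_linger, set_linger, timed_close) are made up for illustration, and the two-int layout of struct linger is assumed to be the Linux one.

```python
import socket
import struct
import time

# Assumption: struct linger is two C ints (l_onoff, l_linger), as on Linux.
_LINGER_FMT = "ii"


def get_linger(sock):
    """Return (l_onoff, l_linger) as currently set on the socket."""
    raw = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_LINGER, struct.calcsize(_LINGER_FMT)
    )
    return struct.unpack(_LINGER_FMT, raw)


def set_linger(sock, enabled, seconds):
    """Enable or disable SO_LINGER with the given timeout in seconds."""
    value = struct.pack(_LINGER_FMT, 1 if enabled else 0, seconds)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, value)


def timed_close(sock):
    """Close the socket and return how long close() took, in seconds."""
    start = time.monotonic()
    sock.close()
    return time.monotonic() - start


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("linger before:", get_linger(s))
    # l_onoff=1, l_linger=0: close() returns at once and a connected
    # socket is reset instead of being shut down gracefully.
    set_linger(s, True, 0)
    print("linger after: ", get_linger(s))
    print("close() took %.3f s" % timed_close(s))
```

Calling get_linger() on the accepted connection inside the service, right before it exits, would show whether a 60 second linger is in effect there. Whether a zero linger is an acceptable workaround is a separate question, since it discards unsent data.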