On Fri, Jan 22, 2016 at 07:57:52PM +0300, Vladislav Bogdanov wrote:
> Tried reverting this one and a51b2bb ("If an error occurs unlink the
> lock file and exit with status 1") one-by-one and both together, the
> same result.
>
> So problem seems to be somewhere deeper.

I've got the same fencing problem with dlm-4.0.4 on Debian. Looking at
the strace of the dlm_controld process, it exits right after returning
from the poll call because of a SIGCHLD signal:

wait4(2279, 0x7ffd2f468afc, WNOHANG, NULL) = 0
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=14, events=POLLIN}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, events=POLLIN}], 10, 1000) = 0 (Timeout)
wait4(2279, 0x7ffd2f468afc, WNOHANG, NULL) = 0
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=14, events=POLLIN}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, events=POLLIN}], 10, 1000) = 0 (Timeout)
wait4(2279, 0x7ffd2f468afc, WNOHANG, NULL) = 0
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=14, events=POLLIN}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, events=POLLIN}], 10, 1000) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2279, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
close(11)                               = 0
sendto(10, "\240", 1, MSG_NOSIGNAL, NULL, 0) = 1
sendto(17, "\20", 1, MSG_NOSIGNAL, NULL, 0) = 1
poll([{fd=17, events=POLLIN}], 1, 0)    = 0 (Timeout)
shutdown(17, SHUT_RDWR)                 = 0
close(17)                               = 0
munmap(0x7f5f45c26000, 2105344)         = 0
munmap(0x7f5f4aeea000, 8248)            = 0
munmap(0x7f5f45a24000, 2105344)         = 0
munmap(0x7f5f4aee7000, 8248)            = 0
munmap(0x7f5f45822000, 2105344)         = 0

And in fact there is a recent change in 4.0.4 that modifies that part of
the code:

If an error occurs unlink the lock file and exit with status 1
https://git.fedorahosted.org/cgit/dlm.git/commit/?id=a51b2bbe413222829778698e62af88a73ebec233

The bug is caused by the missing braces in the expanded if statement:
without them, the "goto out" is executed on every EINTR, not only when
daemon_quit is set and there are no lockspaces.

Do you think we can get a new version out with this patch? Fencing in
4.0.4 does not work properly because of this issue.

-- 
Valentin

Index: dlm-4.0.4/dlm_controld/main.c
===================================================================
--- dlm-4.0.4.orig/dlm_controld/main.c
+++ dlm-4.0.4/dlm_controld/main.c
@@ -1028,9 +1028,10 @@ static int loop(void)
 	for (;;) {
 		rv = poll(pollfd, client_maxi + 1, poll_timeout);
 		if (rv == -1 && errno == EINTR) {
-			if (daemon_quit && list_empty(&lockspaces))
+			if (daemon_quit && list_empty(&lockspaces)) {
 				rv = 0;
 				goto out;
+			}
 			if (daemon_quit) {
 				log_error("shutdown ignored, active lockspaces");
 				daemon_quit = 0;
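
For illustration only, here is a minimal standalone sketch of the control
flow before and after the patch. It is a simplified model, not code from
dlm_controld: enum action, on_eintr_buggy() and on_eintr_fixed() are
hypothetical names, and the two bool parameters stand in for daemon_quit
and list_empty(&lockspaces). The "shutdown ignored" branch that follows in
main.c is left out.

```c
#include <stdbool.h>

/* What the main loop does after poll() was interrupted by a signal. */
enum action { KEEP_RUNNING, EXIT_LOOP };

/* Model of the 4.0.4 code: without braces the if guards only "rv = 0". */
enum action on_eintr_buggy(bool daemon_quit, bool lockspaces_empty)
{
	int rv = -1;

	if (daemon_quit && lockspaces_empty)
		rv = 0;
	/* In main.c this line is indented as if it belonged to the if,
	 * but it runs unconditionally, so any EINTR leaves the loop. */
	goto out;

	return KEEP_RUNNING;	/* never reached */
out:
	(void)rv;
	return EXIT_LOOP;
}

/* Model of the patched code: both statements are inside the braces. */
enum action on_eintr_fixed(bool daemon_quit, bool lockspaces_empty)
{
	int rv = -1;

	if (daemon_quit && lockspaces_empty) {
		rv = 0;
		goto out;
	}
	return KEEP_RUNNING;
out:
	(void)rv;
	return EXIT_LOOP;
}
```

With daemon_quit unset, as in the SIGCHLD case from the strace above,
on_eintr_buggy(false, false) yields EXIT_LOOP while
on_eintr_fixed(false, false) yields KEEP_RUNNING.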

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org