On Fri, Jan 22, 2016 at 07:57:52PM +0300, Vladislav Bogdanov wrote:
> Tried reverting this one and a51b2bb ("If an error occurs unlink the 
> lock file and exit with status 1") one-by-one and both together, the 
> same result.
> 
> So problem seems to be somewhere deeper.

I've got the same fencing problem with dlm-4.0.4 on Debian.  The
strace of the dlm_controld process shows that it exits right after
the poll call is interrupted by a SIGCHLD signal:

wait4(2279, 0x7ffd2f468afc, WNOHANG, NULL) = 0
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, 
{fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=14, 
events=POLLIN}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, 
events=POLLIN}], 10, 1000) = 0 (Timeout)
wait4(2279, 0x7ffd2f468afc, WNOHANG, NULL) = 0
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, 
{fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=14, 
events=POLLIN}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, 
events=POLLIN}], 10, 1000) = 0 (Timeout)
wait4(2279, 0x7ffd2f468afc, WNOHANG, NULL) = 0
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, 
{fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=14, 
events=POLLIN}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, 
events=POLLIN}], 10, 1000) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2279, si_uid=0, 
si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
close(11)                               = 0
sendto(10, "\240", 1, MSG_NOSIGNAL, NULL, 0) = 1
sendto(17, "\20", 1, MSG_NOSIGNAL, NULL, 0) = 1
poll([{fd=17, events=POLLIN}], 1, 0)    = 0 (Timeout)
shutdown(17, SHUT_RDWR)                 = 0
close(17)                               = 0
munmap(0x7f5f45c26000, 2105344)         = 0
munmap(0x7f5f4aeea000, 8248)            = 0
munmap(0x7f5f45a24000, 2105344)         = 0
munmap(0x7f5f4aee7000, 8248)            = 0
munmap(0x7f5f45822000, 2105344)         = 0

and in fact there is a recent change in 4.0.4 that modifies that part
of the code:

  If an error occurs unlink the lock file and exit with status 1
  
https://git.fedorahosted.org/cgit/dlm.git/commit/?id=a51b2bbe413222829778698e62af88a73ebec233

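The bug is caused by the missing braces in the expanded if
statement: without them only "rv = 0" is conditional, so "goto out"
runs on every EINTR and the daemon leaves its main loop as soon as
poll is interrupted by a signal such as SIGCHLD.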
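
To make the effect concrete, here is a minimal stand-alone sketch of
the two variants.  It is not the dlm_controld code: the function names,
the max_polls parameter and the integer flags are made up for the
example, and each loop iteration simply stands in for one poll() call
that returned -1 with errno set to EINTR.

```c
/* Hypothetical model of the EINTR branch as the compiler reads it in
 * 4.0.4.  Returns how many simulated poll() calls ran before the loop
 * was left. */
static int polls_unbraced(int daemon_quit, int lockspaces_empty,
			  int max_polls, int *rv)
{
	int n = 0;

	while (n < max_polls) {
		n++;			/* poll() interrupted by a signal */
		if (daemon_quit && lockspaces_empty)
			*rv = 0;	/* the only conditional statement */
		goto out;		/* not covered by the if: always taken */
	}
out:
	return n;
}

/* Same branch with the braces restored, as in the patch below. */
static int polls_braced(int daemon_quit, int lockspaces_empty,
			int max_polls, int *rv)
{
	int n = 0;

	while (n < max_polls) {
		n++;			/* poll() interrupted by a signal */
		if (daemon_quit && lockspaces_empty) {
			*rv = 0;
			goto out;	/* leave only on a real shutdown */
		}
	}
out:
	return n;
}
```

With daemon_quit set to 0, the unbraced variant leaves the loop after
the first interrupted poll, while the braced one keeps looping until
max_polls is reached.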

Do you think we can get a new version out with this patch, since
fencing in 4.0.4 does not work properly because of this issue?

-- 
Valentin
Index: dlm-4.0.4/dlm_controld/main.c
===================================================================
--- dlm-4.0.4.orig/dlm_controld/main.c
+++ dlm-4.0.4/dlm_controld/main.c
@@ -1028,9 +1028,10 @@ static int loop(void)
 	for (;;) {
 		rv = poll(pollfd, client_maxi + 1, poll_timeout);
 		if (rv == -1 && errno == EINTR) {
-			if (daemon_quit && list_empty(&lockspaces))
+			if (daemon_quit && list_empty(&lockspaces)) {
 				rv = 0;
 				goto out;
+			}
 			if (daemon_quit) {
 				log_error("shutdown ignored, active lockspaces");
 				daemon_quit = 0;