On 27/06/18 08:35, Salvatore D'angelo wrote:
> Hi,
>
> Thanks for the reply and the detailed explanation. I am not using the
> --network=host option.
> I have a Docker image based on Ubuntu 14.04 where I only deploy this
> additional software:
>
> RUN apt-get update && apt-get install -y wget git xz-utils openssh-server \
>     systemd-services make gcc pkg-config psmisc fuse libpython2.7 libopenipmi0 \
>     libdbus-glib-1-2 libsnmp30 libtimedate-perl libpcap0.8
>
> and configure ssh with key pairs to communicate easily. The containers are
> created with these simple commands:
>
> docker create -it --cap-add=MKNOD --cap-add SYS_ADMIN \
>     --device /dev/loop0 --device /dev/fuse \
>     --net ${PUBLIC_NETWORK_NAME} --publish ${PG1_SSH_PORT}:22 \
>     --ip ${PG1_PUBLIC_IP} --name ${PG1_PRIVATE_NAME} \
>     --hostname ${PG1_PRIVATE_NAME} -v ${MOUNT_FOLDER}:/Users ngha /bin/bash
>
> docker create -it --cap-add=MKNOD --cap-add SYS_ADMIN \
>     --device /dev/loop1 --device /dev/fuse \
>     --net ${PUBLIC_NETWORK_NAME} --publish ${PG2_SSH_PORT}:22 \
>     --ip ${PG2_PUBLIC_IP} --name ${PG2_PRIVATE_NAME} \
>     --hostname ${PG2_PRIVATE_NAME} -v ${MOUNT_FOLDER}:/Users ngha /bin/bash
>
> docker create -it --cap-add=MKNOD --cap-add SYS_ADMIN \
>     --device /dev/loop2 --device /dev/fuse \
>     --net ${PUBLIC_NETWORK_NAME} --publish ${PG3_SSH_PORT}:22 \
>     --ip ${PG3_PUBLIC_IP} --name ${PG3_PRIVATE_NAME} \
>     --hostname ${PG3_PRIVATE_NAME} -v ${MOUNT_FOLDER}:/Users ngha /bin/bash
>
> /dev/fuse is used to configure glusterfs on two other nodes, and
> /dev/loopX is there just to better simulate my bare metal env.
>
> One thing that I do not understand is that I tried to compare corosync
> 2.3.5 (the old version that worked fine) and 2.4.4 to understand the
> differences, but I haven't found anything related to the piece of code
> that affects the issue. The quorum tool.c and cfg.c are almost the same.
> Probably the issue is somewhere else.
>
This might be asking a bit much, but would it be possible to try this
using virtual machines rather than Docker images? That would at least
eliminate a lot of complex variables.

Chrissie

>
>> On 27 Jun 2018, at 08:34, Jan Pokorný <jpoko...@redhat.com
>> <mailto:jpoko...@redhat.com>> wrote:
>>
>> On 26/06/18 17:56 +0200, Salvatore D'angelo wrote:
>>> I did another test. I modified the docker container in order to be
>>> able to run strace.
>>> Running strace corosync-quorumtool -ps I got the following:
>>
>>> [snipped]
>>> connect(5, {sa_family=AF_LOCAL, sun_path=@"cfg"}, 110) = 0
>>> setsockopt(5, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
>>> sendto(5, "\377\377\377\377\0\0\0\0\30\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0", 24, MSG_NOSIGNAL, NULL, 0) = 24
>>> setsockopt(5, SOL_SOCKET, SO_PASSCRED, [0], 4) = 0
>>> recvfrom(5, 0x7ffd73bd7ac0, 12328, 16640, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
>>> poll([{fd=5, events=POLLIN}], 1, 4294967295) = 1 ([{fd=5, revents=POLLIN}])
>>> recvfrom(5, "\377\377\377\377\0\0\0\0(0\0\0\0\0\0\0\365\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0"..., 12328, MSG_WAITALL|MSG_NOSIGNAL, NULL, NULL) = 12328
>>> shutdown(5, SHUT_RDWR) = 0
>>> close(5) = 0
>>> write(2, "Cannot initialise CFG service\n", 30Cannot initialise CFG service) = 30
>>> [snipped]
>>
>> This just demonstrates, on the client side, the effect of the
>> server-side error already detailed: the client communicates with the
>> server just fine, but as soon as the server hits the mmap-based
>> problem, it bails out in the observed way, leaving the client
>> unsatisfied.
>>
>> Note one thing: abstract Unix sockets are being used for the
>> communication here (observe the first line in the strace output
>> excerpt above), and if you happen to run the container via a docker
>> command with --network=host, you may also be affected by issues
>> arising from abstract sockets not being isolated but rather
>> sharing the same namespace.
>> At least that was the case some years back, and it is what called
>> for a switch in the underlying libqb library to use strictly
>> file-backed sockets, where the isolation semantics match the
>> intuition:
>>
>> https://lists.clusterlabs.org/pipermail/users/2017-May/013003.html
>>
>> + the way to enable it (presumably only for container environments;
>> note that there is no straightforward per-process granularity):
>>
>> https://clusterlabs.github.io/libqb/1.0.2/doxygen/qb_ipc_overview.html
>> (scroll down to "IPC sockets (Linux only)")
>>
>> You may test that if you are using said --network=host switch.
>>
>>> I tried to understand what happens behind the scenes, but it is not
>>> easy for me.
>>> Hoping someone on this list can help.
>>
>> Containers are tricky, just as Ansible (as shown earlier on the list)
>> can be, when encumbered with false beliefs and/or misunderstandings.
>> Virtual machines may serve better wrt. insights for the later bare
>> metal deployments.
>>
>> --
>> Jan (Poki)
>> _______________________________________________
>> Users mailing list: Users@clusterlabs.org
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
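
As an aside on the abstract-socket point quoted above, here is a minimal
Python sketch (Linux only, and not corosync or libqb code) of why the two
socket flavours isolate differently. An abstract name lives in the network
namespace, so a second bind to the same name fails for any process sharing
that namespace, whereas a file-backed socket is just a path and is scoped by
whatever filesystem the process sees. The helper names and the "demo-cfg"
socket name are made up for illustration.

```python
import os
import socket
import tempfile


def bind_abstract(name):
    """Bind a Unix socket in the Linux abstract namespace.

    The leading NUL byte is what makes the address abstract; no file
    is created anywhere on disk.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind("\0" + name)
    return sock


def bind_file_backed(path):
    """Bind a Unix socket to a filesystem path (creates a socket file)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    return sock


def abstract_name_is_taken(name):
    """Return True if the abstract name is already bound in this
    network namespace (the probe bind fails with an OSError)."""
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.bind("\0" + name)
    except OSError:
        return True
    finally:
        probe.close()
    return False


if __name__ == "__main__":
    # Hypothetical name, made unique per process to avoid clashes.
    name = "demo-cfg-%d" % os.getpid()
    first = bind_abstract(name)
    print("abstract name taken:", abstract_name_is_taken(name))
    first.close()

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cfg.sock")
        second = bind_file_backed(path)
        print("socket file exists:", os.path.exists(path))
        second.close()
```

Two containers started with --network=host share the host's network
namespace, so they would compete for the same abstract names in the way the
probe above detects; with file-backed sockets each container only sees the
paths in its own mount namespace.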