Hi, I have tried with 0.16.0.real-1ubuntu4 and 0.16.0.real-1ubuntu5;
which version should I try?

> On 26 Jun 2018, at 12:03, Christine Caulfield <ccaul...@redhat.com> wrote:
>
> On 26/06/18 11:00, Salvatore D'angelo wrote:
>> Consider that the container is the same one where corosync 2.3.5 ran.
>> If it is something related to the container, probably 2.4.4 introduced
>> a feature that has an impact on containers.
>> It should be something related to libqb, according to the code.
>> Can anyone help?
>
> Have you tried downgrading libqb to the previous version to see if it
> still happens?
>
> Chrissie
>
>>> On 26 Jun 2018, at 11:56, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>
>>> On 26/06/18 10:35, Salvatore D'angelo wrote:
>>>> Sorry, after the command:
>>>>
>>>> corosync-quorumtool -ps
>>>>
>>>> the errors in the log are still visible. Looking at the source code, it
>>>> seems the problem is at these lines:
>>>> https://github.com/corosync/corosync/blob/master/tools/corosync-quorumtool.c
>>>>
>>>> if (quorum_initialize(&q_handle, &q_callbacks, &q_type) != CS_OK) {
>>>>     fprintf(stderr, "Cannot initialize QUORUM service\n");
>>>>     q_handle = 0;
>>>>     goto out;
>>>> }
>>>>
>>>> if (corosync_cfg_initialize(&c_handle, &c_callbacks) != CS_OK) {
>>>>     fprintf(stderr, "Cannot initialise CFG service\n");
>>>>     c_handle = 0;
>>>>     goto out;
>>>> }
>>>>
>>>> The quorum_initialize function is defined here:
>>>> https://github.com/corosync/corosync/blob/master/lib/quorum.c
>>>>
>>>> It seems to interact with libqb to allocate space on /dev/shm, but
>>>> something fails. I tried to update libqb with apt-get install, but with
>>>> no success.
>>>>
>>>> The same happens for the second function:
>>>> https://github.com/corosync/corosync/blob/master/lib/cfg.c
>>>>
>>>> Now, I am not an expert on libqb. I have version 0.16.0.real-1ubuntu5.
>>>>
>>>> The folder /dev/shm has 777 permissions, like on the other nodes with
>>>> older corosync and pacemaker that work fine. The only difference is that
>>>> I only see files created by root, none created by hacluster as on the
>>>> other two nodes (probably because pacemaker didn't start correctly).
>>>>
>>>> This is the analysis I have done so far.
>>>> Any suggestion?
>>>
>>> Hmm. It seems very likely something to do with the way the container is
>>> set up then - and I know nothing about containers. Sorry :/
>>>
>>> Can anyone else help here?
>>>
>>> Chrissie
>>>
>>>>> On 26 Jun 2018, at 11:03, Salvatore D'angelo <sasadang...@gmail.com> wrote:
>>>>>
>>>>> Yes, sorry, you're right, I could have found it by myself.
>>>>> However, I did the following:
>>>>>
>>>>> 1. Added the line you suggested to /etc/fstab
>>>>> 2. mount -o remount /dev/shm
>>>>> 3. Now I correctly see /dev/shm of 512M with df -h
>>>>>    Filesystem  Size  Used Avail Use% Mounted on
>>>>>    overlay      63G   11G   49G  19% /
>>>>>    tmpfs        64M  4.0K   64M   1% /dev
>>>>>    tmpfs      1000M     0 1000M   0% /sys/fs/cgroup
>>>>>    osxfs       466G  158G  305G  35% /Users
>>>>>    /dev/sda1    63G   11G   49G  19% /etc/hosts
>>>>>    shm         512M   15M  498M   3% /dev/shm
>>>>>    tmpfs      1000M     0 1000M   0% /sys/firmware
>>>>>    tmpfs       128M     0  128M   0% /tmp
>>>>>
>>>>> The errors in the log went away. Consider that I removed the log file
>>>>> before starting corosync, so it does not contain lines from previous
>>>>> executions.
>>>>>
>>>>> <corosync.log>
>>>>>
>>>>> But the command:
>>>>>
>>>>> corosync-quorumtool -ps
>>>>>
>>>>> still gives:
>>>>>
>>>>> Cannot initialize QUORUM service
>>>>>
>>>>> Consider that a few minutes before it gave me the message:
>>>>>
>>>>> Cannot initialize CFG service
>>>>>
>>>>> I do not know the difference between CFG and QUORUM in this case.
>>>>>
>>>>> If I try to start pacemaker, the service is OK, but I see only pacemaker,
>>>>> and the transport does not work if I try to run a crm command.
>>>>> Any suggestion?
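For completeness, this is the change I applied inside the container: the /etc/fstab line suggested further down in the thread, plus the remount (512m is just the size used in this test):

   # /etc/fstab entry giving /dev/shm a larger size
   tmpfs  /dev/shm  tmpfs  defaults,size=512m  0  0

   # apply it without recreating the container, then verify
   mount -o remount /dev/shm
   df -h /dev/shm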
>>>>>> On 26 Jun 2018, at 10:49, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>>>>
>>>>>> On 26/06/18 09:40, Salvatore D'angelo wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Yes, I am reproducing only the required part for the test. I think the
>>>>>>> original system has a larger shm. The problem is that I do not know
>>>>>>> exactly how to change it.
>>>>>>> I tried the following steps, but I have the impression I didn't
>>>>>>> perform the right ones:
>>>>>>>
>>>>>>> 1. Removed everything under /tmp
>>>>>>> 2. Added the following line to /etc/fstab:
>>>>>>>    tmpfs  /tmp  tmpfs  defaults,nodev,nosuid,mode=1777,size=128M  0  0
>>>>>>> 3. mount /tmp
>>>>>>> 4. df -h
>>>>>>>    Filesystem  Size  Used Avail Use% Mounted on
>>>>>>>    overlay      63G   11G   49G  19% /
>>>>>>>    tmpfs        64M  4.0K   64M   1% /dev
>>>>>>>    tmpfs      1000M     0 1000M   0% /sys/fs/cgroup
>>>>>>>    osxfs       466G  158G  305G  35% /Users
>>>>>>>    /dev/sda1    63G   11G   49G  19% /etc/hosts
>>>>>>>    shm          64M   11M   54M  16% /dev/shm
>>>>>>>    tmpfs      1000M     0 1000M   0% /sys/firmware
>>>>>>>    tmpfs       128M     0  128M   0% /tmp
>>>>>>>
>>>>>>> The errors are exactly the same.
>>>>>>> I have the impression that I changed the wrong parameter. Probably I
>>>>>>> have to change:
>>>>>>>    shm          64M   11M   54M  16% /dev/shm
>>>>>>> but I do not know how to do that. Any suggestion?
>>>>>>
>>>>>> According to Google, you just add a new line to /etc/fstab for /dev/shm:
>>>>>>
>>>>>> tmpfs /dev/shm tmpfs defaults,size=512m 0 0
>>>>>>
>>>>>> Chrissie
>>>>>>
>>>>>>>> On 26 Jun 2018, at 09:48, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>>>>>>
>>>>>>>> On 25/06/18 20:41, Salvatore D'angelo wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Let me add one important detail here. I use Docker for my tests, with
>>>>>>>>> 5 containers deployed on my Mac.
>>>>>>>>> Basically, the team that worked on this project installed the cluster
>>>>>>>>> on SoftLayer bare metal.
>>>>>>>>> The PostgreSQL cluster was hard to test, and if a misconfiguration
>>>>>>>>> occurred, recreating the cluster from scratch was not easy.
>>>>>>>>> Testing it was cumbersome if you consider that we access the machines
>>>>>>>>> through a complex system that is hard to describe here.
>>>>>>>>> For this reason I ported the cluster to Docker for test purposes. I am
>>>>>>>>> not interested in having it work for months, I just need a proof of
>>>>>>>>> concept.
>>>>>>>>>
>>>>>>>>> When the migration works I'll port everything to bare metal, where
>>>>>>>>> resources are abundant.
>>>>>>>>>
>>>>>>>>> Now I have enough RAM and disk space on my Mac, so if you tell me what
>>>>>>>>> an acceptable size would be for several days of running, that is fine
>>>>>>>>> for me. It is also fine to have commands to clean the shm when
>>>>>>>>> required.
>>>>>>>>> I know I can find them on Google, but if you can suggest this info
>>>>>>>>> I'll appreciate it. I have the OS knowledge to do that, but I would
>>>>>>>>> like to avoid days of guesswork and trial and error if possible.
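As a side note on the Docker part: my understanding is that the shm size can also be set when a container is (re)created, and that leftover libqb ring-buffer files can be deleted once corosync and pacemaker are stopped. The image name, options and size below are only placeholders, so take this as a rough sketch rather than an exact procedure:

   # assumption: recreate the container with a larger /dev/shm from the start
   docker run --shm-size=512m <other options> <image>

   # clean leftover libqb ring buffers, only while corosync/pacemaker are stopped
   ls /dev/shm/qb-*
   rm -f /dev/shm/qb-*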
>>>>>>>> I would recommend at least 128MB of space on /dev/shm, 256MB if you
>>>>>>>> can spare it. My 'standard' system uses 75MB under normal running,
>>>>>>>> allowing for one command-line query to run.
>>>>>>>>
>>>>>>>> If I read this right, then you're reproducing a bare-metal system in
>>>>>>>> containers now? So the original systems will have a default /dev/shm
>>>>>>>> size which is probably much larger than your containers'?
>>>>>>>>
>>>>>>>> I'm just checking here that we don't have a regression in memory
>>>>>>>> usage, as Poki suggested.
>>>>>>>>
>>>>>>>> Chrissie
>>>>>>>>
>>>>>>>>>> On 25 Jun 2018, at 21:18, Jan Pokorný <jpoko...@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 25/06/18 19:06 +0200, Salvatore D'angelo wrote:
>>>>>>>>>>> Thanks for the reply. I scratched my cluster and created it again,
>>>>>>>>>>> then migrated as before. This time I uninstalled pacemaker,
>>>>>>>>>>> corosync, crmsh and the resource agents with make uninstall,
>>>>>>>>>>> then I installed the new packages. The problem is the same; when
>>>>>>>>>>> I launch:
>>>>>>>>>>>
>>>>>>>>>>> corosync-quorumtool -ps
>>>>>>>>>>>
>>>>>>>>>>> I get: Cannot initialize QUORUM service
>>>>>>>>>>>
>>>>>>>>>>> Here is the log with debug enabled:
>>>>>>>>>>>
>>>>>>>>>>> [18019] pg3 corosyncerror [QB ] couldn't create circular mmap on /dev/shm/qb-cfg-event-18020-18028-23-data
>>>>>>>>>>> [18019] pg3 corosyncerror [QB ] qb_rb_open:cfg-event-18020-18028-23: Resource temporarily unavailable (11)
>>>>>>>>>>> [18019] pg3 corosyncdebug [QB ] Free'ing ringbuffer: /dev/shm/qb-cfg-request-18020-18028-23-header
>>>>>>>>>>> [18019] pg3 corosyncdebug [QB ] Free'ing ringbuffer: /dev/shm/qb-cfg-response-18020-18028-23-header
>>>>>>>>>>> [18019] pg3 corosyncerror [QB ] shm connection FAILED: Resource temporarily unavailable (11)
>>>>>>>>>>> [18019] pg3 corosyncerror [QB ] Error in connection setup (18020-18028-23): Resource temporarily unavailable (11)
>>>>>>>>>>>
>>>>>>>>>>> I tried to check /dev/shm, and I am not sure these are the right
>>>>>>>>>>> commands, however:
>>>>>>>>>>>
>>>>>>>>>>> df -h /dev/shm
>>>>>>>>>>> Filesystem  Size  Used Avail Use% Mounted on
>>>>>>>>>>> shm          64M   16M   49M  24% /dev/shm
>>>>>>>>>>>
>>>>>>>>>>> ls /dev/shm
>>>>>>>>>>> qb-cmap-request-18020-18036-25-data    qb-corosync-blackbox-data    qb-quorum-request-18020-18095-32-data
>>>>>>>>>>> qb-cmap-request-18020-18036-25-header  qb-corosync-blackbox-header  qb-quorum-request-18020-18095-32-header
>>>>>>>>>>>
>>>>>>>>>>> Is 64 MB enough for /dev/shm? If not, why did it work with the
>>>>>>>>>>> previous corosync release?
>>>>>>>>>>
>>>>>>>>>> For a start, can you try configuring corosync with the
>>>>>>>>>> --enable-small-memory-footprint switch?
>>>>>>>>>>
>>>>>>>>>> Hard to say why the space provisioned to /dev/shm is the direct
>>>>>>>>>> opposite of generous (per today's standards), but it may be the
>>>>>>>>>> result of automatic HW adaptation, and if RAM is so scarce in your
>>>>>>>>>> case, the above build-time toggle might help.
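Just to make sure I understand the build-time toggle: I assume it goes into the usual autotools configure step of the source build I already do, roughly like this (the steps are my local ones, not an official recipe):

   # rebuild corosync from the source tree with the smaller-footprint option
   ./autogen.sh
   ./configure --enable-small-memory-footprint
   make
   make install    # after a 'make uninstall' of the previous build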
>>>>>>>>>> If not, then exponentially increasing the size of the /dev/shm space
>>>>>>>>>> is likely your best bet (I don't recommend fiddling with mlockall()
>>>>>>>>>> and similar measures in corosync).
>>>>>>>>>>
>>>>>>>>>> Of course, feel free to raise a regression if you have a reproducible
>>>>>>>>>> comparison between two corosync (plus possibly different libraries
>>>>>>>>>> like libqb) versions, one that works and one that won't, in
>>>>>>>>>> reproducible conditions (like this small /dev/shm, VM image, etc.).
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Jan (Poki)
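Coming back to the libqb downgrade question at the top: what I mean by trying different libqb versions via apt is roughly the following, assuming the Ubuntu binary package is libqb0 and the older build is still available in the configured repositories (the version string is one of the two I tried):

   # see what is installed and what the archive offers
   dpkg -l | grep libqb
   apt-cache policy libqb0

   # pin one specific version for the comparison
   apt-get install libqb0=0.16.0.real-1ubuntu4
   ldconfig -p | grep libqb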
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org