Consider that the container is the same one where corosync 2.3.5 runs fine. If the problem is related to the container, then 2.4.4 probably introduced something that has an impact on containers. According to the code, it should be something related to libqb. Can anyone help?
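One way to narrow this down is to compare the working and failing environments directly. A minimal sketch (assuming the 2.3.5 container is still available, and Ubuntu package naming as used later in this thread):

    # run in both the working (2.3.5) and failing (2.4.4) containers, then compare
    df -h /dev/shm      # tmpfs size and current usage
    corosync -v         # exact corosync version in this container
    dpkg -l libqb0      # installed libqb version (Ubuntu package name; an assumption)

If the /dev/shm sizes match and only the corosync/libqb versions differ, that points at the packages rather than the container setup.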
> On 26 Jun 2018, at 11:56, Christine Caulfield <ccaul...@redhat.com> wrote:
>
> On 26/06/18 10:35, Salvatore D'angelo wrote:
>> Sorry, after the command:
>>
>> corosync-quorumtool -ps
>>
>> the errors in the log are still visible. Looking at the source code, it seems
>> the problem is at these lines:
>> https://github.com/corosync/corosync/blob/master/tools/corosync-quorumtool.c
>>
>> if (quorum_initialize(&q_handle, &q_callbacks, &q_type) != CS_OK) {
>>     fprintf(stderr, "Cannot initialize QUORUM service\n");
>>     q_handle = 0;
>>     goto out;
>> }
>>
>> if (corosync_cfg_initialize(&c_handle, &c_callbacks) != CS_OK) {
>>     fprintf(stderr, "Cannot initialise CFG service\n");
>>     c_handle = 0;
>>     goto out;
>> }
>>
>> The quorum_initialize function is defined here:
>> https://github.com/corosync/corosync/blob/master/lib/quorum.c
>>
>> It seems to interact with libqb to allocate space on /dev/shm, but
>> something fails. I tried to update libqb with apt-get install, but with
>> no success.
>>
>> The same goes for the second function:
>> https://github.com/corosync/corosync/blob/master/lib/cfg.c
>>
>> Now, I am not an expert on libqb. I have version 0.16.0.real-1ubuntu5.
>>
>> The folder /dev/shm has 777 permissions like the other nodes with older
>> corosync and pacemaker that work fine. The only difference is that I
>> only see files created by root, none created by hacluster like on the
>> other two nodes (probably because pacemaker didn't start correctly).
>>
>> This is the analysis I have done so far.
>> Any suggestion?
>>
>
> Hmm. It seems very likely something to do with the way the container is
> set up then - and I know nothing about containers. Sorry :/
>
> Can anyone else help here?
>
> Chrissie
>
>>> On 26 Jun 2018, at 11:03, Salvatore D'angelo <sasadang...@gmail.com> wrote:
>>>
>>> Yes, sorry, you're right, I could have found it by myself.
>>> However, I did the following:
>>>
>>> 1. Added the line you suggested to /etc/fstab
>>> 2. mount -o remount /dev/shm
>>> 3. Now I correctly see a /dev/shm of 512M with df -h
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> overlay          63G   11G   49G  19% /
>>> tmpfs            64M  4.0K   64M   1% /dev
>>> tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>>> osxfs           466G  158G  305G  35% /Users
>>> /dev/sda1        63G   11G   49G  19% /etc/hosts
>>> shm             512M   15M  498M   3% /dev/shm
>>> tmpfs          1000M     0 1000M   0% /sys/firmware
>>> tmpfs           128M     0  128M   0% /tmp
>>>
>>> The errors in the log went away. Consider that I removed the log file
>>> before starting corosync, so it does not contain lines from previous
>>> executions.
>>> <corosync.log>
>>>
>>> But the command:
>>> corosync-quorumtool -ps
>>>
>>> still gives:
>>> Cannot initialize QUORUM service
>>>
>>> Consider that a few minutes before it gave me the message:
>>> Cannot initialize CFG service
>>>
>>> I do not know the difference between CFG and QUORUM in this case.
>>>
>>> If I try to start pacemaker, the service is OK, but I see only pacemaker,
>>> and the transport does not work if I try to run a crm command.
>>> Any suggestion?
>>>
>>>
>>>> On 26 Jun 2018, at 10:49, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>>
>>>> On 26/06/18 09:40, Salvatore D'angelo wrote:
>>>>> Hi,
>>>>>
>>>>> Yes,
>>>>>
>>>>> I am reproducing only the required part for test. I think the original
>>>>> system has a larger shm. The problem is that I do not know exactly how
>>>>> to change it.
>>>>> I tried the following steps, but I have the impression I didn't
>>>>> perform the right one:
>>>>>
>>>>> 1. Remove everything under /tmp
>>>>> 2. Added the following line to /etc/fstab
>>>>>    tmpfs /tmp tmpfs defaults,nodev,nosuid,mode=1777,size=128M 0 0
>>>>> 3. mount /tmp
>>>>> 4. df -h
>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>> overlay          63G   11G   49G  19% /
>>>>> tmpfs            64M  4.0K   64M   1% /dev
>>>>> tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>>>>> osxfs           466G  158G  305G  35% /Users
>>>>> /dev/sda1        63G   11G   49G  19% /etc/hosts
>>>>> shm              64M   11M   54M  16% /dev/shm
>>>>> tmpfs          1000M     0 1000M   0% /sys/firmware
>>>>> tmpfs           128M     0  128M   0% /tmp
>>>>>
>>>>> The errors are exactly the same.
>>>>> I have the impression that I changed the wrong parameter. Probably I
>>>>> have to change:
>>>>> shm              64M   11M   54M  16% /dev/shm
>>>>>
>>>>> but I do not know how to do that. Any suggestion?
>>>>>
>>>>
>>>> According to Google, you just add a new line to /etc/fstab for /dev/shm:
>>>>
>>>> tmpfs /dev/shm tmpfs defaults,size=512m 0 0
>>>>
>>>> Chrissie
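Since the nodes here are Docker containers, one more option, not mentioned in the thread, is to size /dev/shm when the container is created instead of editing /etc/fstab inside it (this assumes a Docker version recent enough to support --shm-size):

    # give the container a 512 MB /dev/shm at creation time (image name is a placeholder)
    docker run --shm-size=512m <image>

    # or resize the tmpfs in an already-running, sufficiently privileged container
    mount -o remount,size=512m /dev/shm

The effect is the same as the fstab line; --shm-size just makes the setting part of the container definition, so it survives recreating the container.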
>>>>>> On 26 Jun 2018, at 09:48, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>>>>
>>>>>> On 25/06/18 20:41, Salvatore D'angelo wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Let me add one important detail here. I use Docker for my tests, with 5
>>>>>>> containers deployed on my Mac.
>>>>>>> Basically, the team that worked on this project installed the cluster
>>>>>>> on SoftLayer bare metal.
>>>>>>> The PostgreSQL cluster was hard to test, and if a misconfiguration
>>>>>>> occurred, recreating the cluster from scratch was not easy.
>>>>>>> Testing was cumbersome, if you consider that we access the
>>>>>>> machines through a complex system that is hard to describe here.
>>>>>>> For this reason I ported the cluster to Docker for test purposes. I am
>>>>>>> not interested in having it working for months; I just need a proof of
>>>>>>> concept.
>>>>>>>
>>>>>>> When the migration works, I'll port everything to bare metal, where
>>>>>>> resources are abundant.
>>>>>>>
>>>>>>> Now, I have enough RAM and disk space on my Mac, so if you tell me what
>>>>>>> an acceptable size for several days of running would be, that is fine
>>>>>>> with me.
>>>>>>> It is also fine to have commands to clean the shm when required.
>>>>>>> I know I can find them on Google, but if you can suggest this info
>>>>>>> I'll appreciate it. I have the OS knowledge to do it, but I would like
>>>>>>> to avoid days of guesswork and trial and error if possible.
>>>>>>
>>>>>>
>>>>>> I would recommend at least 128MB of space on /dev/shm, 256MB if you can
>>>>>> spare it. My 'standard' system uses 75MB under normal running, allowing
>>>>>> for one command-line query to run.
>>>>>>
>>>>>> If I read this right, then you're reproducing a bare-metal system in
>>>>>> containers now? So the original systems will have a default /dev/shm
>>>>>> size which is probably much larger than your containers?
>>>>>>
>>>>>> I'm just checking here that we don't have a regression in memory usage,
>>>>>> as Poki suggested.
>>>>>>
>>>>>> Chrissie
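To check a node against these numbers, the ringbuffer files that libqb creates under /dev/shm can be measured directly while the cluster runs (a sketch; the qb-* name pattern matches the directory listing quoted below):

    # overall tmpfs usage, then a combined total for just the libqb ringbuffers
    df -h /dev/shm
    du -ch /dev/shm/qb-* | tail -n 1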
>>>>>>>> On 25 Jun 2018, at 21:18, Jan Pokorný <jpoko...@redhat.com> wrote:
>>>>>>>>
>>>>>>>> On 25/06/18 19:06 +0200, Salvatore D'angelo wrote:
>>>>>>>>> Thanks for the reply. I scratched my cluster, created it again, and
>>>>>>>>> then migrated as before. This time I uninstalled pacemaker,
>>>>>>>>> corosync, crmsh and the resource agents with make uninstall,
>>>>>>>>> then I installed the new packages. The problem is the same: when
>>>>>>>>> I launch:
>>>>>>>>> corosync-quorumtool -ps
>>>>>>>>>
>>>>>>>>> I get: Cannot initialize QUORUM service
>>>>>>>>>
>>>>>>>>> Here is the log with debug enabled:
>>>>>>>>>
>>>>>>>>> [18019] pg3 corosyncerror [QB    ] couldn't create circular mmap on /dev/shm/qb-cfg-event-18020-18028-23-data
>>>>>>>>> [18019] pg3 corosyncerror [QB    ] qb_rb_open:cfg-event-18020-18028-23: Resource temporarily unavailable (11)
>>>>>>>>> [18019] pg3 corosyncdebug [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-request-18020-18028-23-header
>>>>>>>>> [18019] pg3 corosyncdebug [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-response-18020-18028-23-header
>>>>>>>>> [18019] pg3 corosyncerror [QB    ] shm connection FAILED: Resource temporarily unavailable (11)
>>>>>>>>> [18019] pg3 corosyncerror [QB    ] Error in connection setup (18020-18028-23): Resource temporarily unavailable (11)
>>>>>>>>>
>>>>>>>>> I tried to check /dev/shm, and while I am not sure these are the
>>>>>>>>> right commands, here is what I got:
>>>>>>>>>
>>>>>>>>> df -h /dev/shm
>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>>> shm              64M   16M   49M  24% /dev/shm
>>>>>>>>>
>>>>>>>>> ls /dev/shm
>>>>>>>>> qb-cmap-request-18020-18036-25-data    qb-corosync-blackbox-data    qb-quorum-request-18020-18095-32-data
>>>>>>>>> qb-cmap-request-18020-18036-25-header  qb-corosync-blackbox-header  qb-quorum-request-18020-18095-32-header
>>>>>>>>>
>>>>>>>>> Is 64 MB enough for /dev/shm? If not, why did it work with the
>>>>>>>>> previous corosync release?
>>>>>>>>
>>>>>>>> For a start, can you try configuring corosync with the
>>>>>>>> --enable-small-memory-footprint switch?
>>>>>>>>
>>>>>>>> Hard to say why the space provisioned to /dev/shm is the direct
>>>>>>>> opposite of generous (per today's standards), but it may be the result
>>>>>>>> of automatic HW adaptation, and if RAM is so scarce in your case,
>>>>>>>> the above build-time toggle might help.
>>>>>>>>
>>>>>>>> If not, then exponentially increasing the size of the /dev/shm space
>>>>>>>> is likely your best bet (I don't recommend fiddling with mlockall()
>>>>>>>> and similar measures in corosync).
>>>>>>>>
>>>>>>>> Of course, feel free to raise a regression if you have a reproducible
>>>>>>>> comparison between two corosync versions (plus possibly different
>>>>>>>> libraries like libqb), one that works and one that won't, in
>>>>>>>> reproducible conditions (like this small /dev/shm, VM image, etc.).
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jan (Poki)
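For reference, since corosync is built from source in this setup (make uninstall is mentioned above), Poki's suggestion corresponds to a rebuild along these lines (a sketch; ./autogen.sh applies to a git checkout, while a release tarball already ships configure):

    # rebuild corosync with the reduced shared-memory footprint
    ./autogen.sh                                # git checkouts only
    ./configure --enable-small-memory-footprint
    make
    make install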
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org