Sorry, after running the command:

    corosync-quorumtool -ps

the errors in the log are still visible. Looking at the source code, the problem
seems to be at these lines in tools/corosync-quorumtool.c
(https://github.com/corosync/corosync/blob/master/tools/corosync-quorumtool.c):

    if (quorum_initialize(&q_handle, &q_callbacks, &q_type) != CS_OK) {
        fprintf(stderr, "Cannot initialize QUORUM service\n");
        q_handle = 0;
        goto out;
    }

    if (corosync_cfg_initialize(&c_handle, &c_callbacks) != CS_OK) {
        fprintf(stderr, "Cannot initialise CFG service\n");
        c_handle = 0;
        goto out;
    }
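To see which error code is actually hit (the tool only prints the generic
message), I am thinking of calling the two functions in isolation with a small,
untested sketch like the one below. The file name and the gcc line are my own
guesses; the error names come from corosync/corotypes.h (CS_OK,
CS_ERR_TRY_AGAIN, CS_ERR_LIBRARY, ...):

    /* init_probe.c - untested sketch: call the same two init functions as
     * corosync-quorumtool, but print the raw cs_error_t values.
     * Assumed build line: gcc init_probe.c -o init_probe -lquorum -lcfg */
    #include <stdio.h>
    #include <string.h>
    #include <corosync/corotypes.h>
    #include <corosync/quorum.h>
    #include <corosync/cfg.h>

    int main(void)
    {
        quorum_handle_t q_handle = 0;
        corosync_cfg_handle_t c_handle = 0;
        quorum_callbacks_t q_callbacks;
        corosync_cfg_callbacks_t c_callbacks;
        uint32_t q_type = 0;
        cs_error_t err;

        /* No notifications are needed for this probe, so leave the callbacks empty. */
        memset(&q_callbacks, 0, sizeof(q_callbacks));
        memset(&c_callbacks, 0, sizeof(c_callbacks));

        err = quorum_initialize(&q_handle, &q_callbacks, &q_type);
        printf("quorum_initialize returned %d (CS_OK is %d)\n", err, CS_OK);
        if (err == CS_OK) {
            quorum_finalize(q_handle);
        }

        err = corosync_cfg_initialize(&c_handle, &c_callbacks);
        printf("corosync_cfg_initialize returned %d\n", err);
        if (err == CS_OK) {
            corosync_cfg_finalize(c_handle);
        }

        return 0;
    }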
The quorum_initialize function is defined in
https://github.com/corosync/corosync/blob/master/lib/quorum.c
It seems to interact with libqb to allocate space on /dev/shm, but something
fails there. I tried to update libqb with apt-get install, but with no success.
The same happens for the second function, defined in
https://github.com/corosync/corosync/blob/master/lib/cfg.c

Now, I am not an expert on libqb. I have version 0.16.0.real-1ubuntu5. The
/dev/shm folder has 777 permissions, like on the other nodes with older
corosync and pacemaker that work fine. The only difference is that I see only
files created by root, none created by hacluster as on the other two nodes
(probably because pacemaker did not start correctly).

This is the analysis I have done so far. Any suggestion?
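In case it is useful, this is the kind of standalone libqb check I had in mind
to see whether a ring buffer can be created under /dev/shm at all. It is only
an untested sketch: the "sketch-test" name, the 1 MB size and the pkg-config
build line are arbitrary assumptions of mine; the API (qb_rb_open/qb_rb_close
from <qb/qbrb.h>) is what the corosync IPC errors in the log appear to use:

    /* shm_probe.c - untested sketch: ask libqb to create a ring buffer, which
     * should show up as /dev/shm/qb-sketch-test-{header,data}, and report any
     * failure. Assumed build line:
     *   gcc shm_probe.c -o shm_probe $(pkg-config --cflags --libs libqb) */
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <qb/qbrb.h>

    int main(void)
    {
        /* QB_RB_FLAG_CREATE makes qb_rb_open() create the backing files itself. */
        qb_ringbuffer_t *rb = qb_rb_open("sketch-test", 1024 * 1024,
                                         QB_RB_FLAG_CREATE, 0);
        if (rb == NULL) {
            fprintf(stderr, "qb_rb_open failed: %s (%d)\n",
                    strerror(errno), errno);
            return 1;
        }
        printf("ring buffer created fine, so /dev/shm itself looks usable\n");
        qb_rb_close(rb);
        return 0;
    }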
> On 26 Jun 2018, at 11:03, Salvatore D'angelo <sasadang...@gmail.com> wrote:
> 
> Yes, sorry, you're right, I could have found it by myself.
> However, I did the following:
> 
> 1. Added the line you suggested to /etc/fstab
> 2. mount -o remount /dev/shm
> 3. Now I correctly see /dev/shm of 512M with df -h
> 
>    Filesystem      Size  Used Avail Use% Mounted on
>    overlay          63G   11G   49G  19% /
>    tmpfs            64M  4.0K   64M   1% /dev
>    tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>    osxfs           466G  158G  305G  35% /Users
>    /dev/sda1        63G   11G   49G  19% /etc/hosts
>    shm             512M   15M  498M   3% /dev/shm
>    tmpfs          1000M     0 1000M   0% /sys/firmware
>    tmpfs           128M     0  128M   0% /tmp
> 
> The errors in the log went away. Consider that I removed the log file before
> starting corosync, so it does not contain lines from previous executions.
> <corosync.log>
> 
> But the command:
> 
>    corosync-quorumtool -ps
> 
> still gives:
> 
>    Cannot initialize QUORUM service
> 
> Consider that a few minutes before it gave me the message:
> 
>    Cannot initialize CFG service
> 
> I do not know the difference between CFG and QUORUM in this case.
> 
> If I try to start pacemaker, the service is OK, but I see only pacemaker and
> the transport does not work if I try to run a crm command.
> Any suggestion?
> 
>> On 26 Jun 2018, at 10:49, Christine Caulfield <ccaul...@redhat.com> wrote:
>> 
>> On 26/06/18 09:40, Salvatore D'angelo wrote:
>>> Hi,
>>> 
>>> Yes,
>>> 
>>> I am reproducing only the required part for the test. I think the original
>>> system has a larger shm. The problem is that I do not know exactly how
>>> to change it.
>>> I tried the following steps, but I have the impression I did not
>>> perform the right ones:
>>> 
>>> 1. remove everything under /tmp
>>> 2. Added the following line to /etc/fstab
>>>    tmpfs /tmp tmpfs defaults,nodev,nosuid,mode=1777,size=128M 0 0
>>> 3. mount /tmp
>>> 4. df -h
>>>    Filesystem      Size  Used Avail Use% Mounted on
>>>    overlay          63G   11G   49G  19% /
>>>    tmpfs            64M  4.0K   64M   1% /dev
>>>    tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>>>    osxfs           466G  158G  305G  35% /Users
>>>    /dev/sda1        63G   11G   49G  19% /etc/hosts
>>>    shm              64M   11M   54M  16% /dev/shm
>>>    tmpfs          1000M     0 1000M   0% /sys/firmware
>>>    *tmpfs           128M     0  128M   0% /tmp*
>>> 
>>> The errors are exactly the same.
>>> I have the impression that I changed the wrong parameter. Probably I
>>> have to change:
>>> 
>>>    shm              64M   11M   54M  16% /dev/shm
>>> 
>>> but I do not know how to do that. Any suggestion?
>>> 
>> 
>> According to Google, you just add a new line to /etc/fstab for /dev/shm:
>> 
>> tmpfs /dev/shm tmpfs defaults,size=512m 0 0
>> 
>> Chrissie
>> 
>>>> On 26 Jun 2018, at 09:48, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>> 
>>>> On 25/06/18 20:41, Salvatore D'angelo wrote:
>>>>> Hi,
>>>>> 
>>>>> Let me add here one important detail. I use Docker for my tests, with 5
>>>>> containers deployed on my Mac.
>>>>> Basically, the team that worked on this project installed the cluster
>>>>> on SoftLayer bare metal.
>>>>> The PostgreSQL cluster was hard to test, and if a misconfiguration
>>>>> occurred, recreating the cluster from scratch was not easy.
>>>>> Testing it was cumbersome, considering that we access the machines
>>>>> through a complex system that is hard to describe here.
>>>>> For this reason I ported the cluster to Docker for test purposes. I am
>>>>> not interested in having it working for months, I just need a proof of
>>>>> concept.
>>>>> 
>>>>> When the migration works, I'll port everything to bare metal, where
>>>>> resources are abundant.
>>>>> 
>>>>> Now I have enough RAM and disk space on my Mac, so if you tell me what
>>>>> would be an acceptable size for several days of running, that is OK for me.
>>>>> It is also OK to have commands to clean the shm when required.
>>>>> I know I can find them on Google, but if you can suggest this info
>>>>> I would appreciate it. I have the OS knowledge to do that, but I would
>>>>> like to avoid days of guesswork and trial and error if possible.
>>>> 
>>>> 
>>>> I would recommend at least 128MB of space on /dev/shm, 256MB if you can
>>>> spare it. My 'standard' system uses 75MB under normal running, allowing
>>>> for one command-line query to run.
>>>> 
>>>> If I read this right, then you're reproducing a bare-metal system in
>>>> containers now? So the original systems will have a default /dev/shm
>>>> size which is probably much larger than your containers?
>>>> 
>>>> I'm just checking here that we don't have a regression in memory usage,
>>>> as Poki suggested.
>>>> 
>>>> Chrissie
>>>> 
>>>>>> On 25 Jun 2018, at 21:18, Jan Pokorný <jpoko...@redhat.com> wrote:
>>>>>> 
>>>>>> On 25/06/18 19:06 +0200, Salvatore D'angelo wrote:
>>>>>>> Thanks for the reply. I scratched my cluster, created it again and
>>>>>>> then migrated as before. This time I uninstalled pacemaker,
>>>>>>> corosync, crmsh and resource agents with make uninstall,
>>>>>>> then I installed the new packages.
>>>>>>> The problem is the same: when I launch
>>>>>>> 
>>>>>>>    corosync-quorumtool -ps
>>>>>>> 
>>>>>>> I got: Cannot initialize QUORUM service
>>>>>>> 
>>>>>>> Here is the log with debug enabled:
>>>>>>> 
>>>>>>> [18019] pg3 corosyncerror  [QB    ] couldn't create circular mmap on /dev/shm/qb-cfg-event-18020-18028-23-data
>>>>>>> [18019] pg3 corosyncerror  [QB    ] qb_rb_open:cfg-event-18020-18028-23: Resource temporarily unavailable (11)
>>>>>>> [18019] pg3 corosyncdebug  [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-request-18020-18028-23-header
>>>>>>> [18019] pg3 corosyncdebug  [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-response-18020-18028-23-header
>>>>>>> [18019] pg3 corosyncerror  [QB    ] shm connection FAILED: Resource temporarily unavailable (11)
>>>>>>> [18019] pg3 corosyncerror  [QB    ] Error in connection setup (18020-18028-23): Resource temporarily unavailable (11)
>>>>>>> 
>>>>>>> I tried to check /dev/shm, and I am not sure these are the right
>>>>>>> commands, however:
>>>>>>> 
>>>>>>> df -h /dev/shm
>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>> shm              64M   16M   49M  24% /dev/shm
>>>>>>> 
>>>>>>> ls /dev/shm
>>>>>>> qb-cmap-request-18020-18036-25-data    qb-corosync-blackbox-data    qb-quorum-request-18020-18095-32-data
>>>>>>> qb-cmap-request-18020-18036-25-header  qb-corosync-blackbox-header  qb-quorum-request-18020-18095-32-header
>>>>>>> 
>>>>>>> Is 64 MB enough for /dev/shm? If not, why did it work with the previous
>>>>>>> corosync release?
>>>>>> 
>>>>>> For a start, can you try configuring corosync with the
>>>>>> --enable-small-memory-footprint switch?
>>>>>> 
>>>>>> Hard to say why the space provisioned to /dev/shm is the direct
>>>>>> opposite of generous (per today's standards), but it may be the result
>>>>>> of automatic HW adaptation, and if RAM is so scarce in your case,
>>>>>> the above build-time toggle might help.
>>>>>> 
>>>>>> If not, then exponentially increasing the size of /dev/shm is
>>>>>> likely your best bet (I don't recommend fiddling with mlockall()
>>>>>> and similar measures in corosync).
>>>>>> 
>>>>>> Of course, feel free to raise a regression if you have a reproducible
>>>>>> comparison between two corosync versions (plus possibly different
>>>>>> libraries like libqb), one that works and one that won't, in
>>>>>> reproducible conditions (like this small /dev/shm, VM image, etc.).
>>>>>> 
>>>>>> --
>>>>>> Jan (Poki)
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org