On 26/06/18 10:35, Salvatore D'angelo wrote:
> Sorry, after the command:
>
>     corosync-quorumtool -ps
>
> the errors in the log are still visible. Looking at the source code, it seems the problem is at these lines:
> https://github.com/corosync/corosync/blob/master/tools/corosync-quorumtool.c
>
>     if (quorum_initialize(&q_handle, &q_callbacks, &q_type) != CS_OK) {
>         fprintf(stderr, "Cannot initialize QUORUM service\n");
>         q_handle = 0;
>         goto out;
>     }
>
>     if (corosync_cfg_initialize(&c_handle, &c_callbacks) != CS_OK) {
>         fprintf(stderr, "Cannot initialise CFG service\n");
>         c_handle = 0;
>         goto out;
>     }
>
> The quorum_initialize function is defined here:
> https://github.com/corosync/corosync/blob/master/lib/quorum.c
>
> It seems to interact with libqb to allocate space on /dev/shm, but something fails. I tried to update libqb with apt-get install, but with no success.
>
> The same goes for the second function:
> https://github.com/corosync/corosync/blob/master/lib/cfg.c
>
> Now, I am not an expert on libqb. I have version 0.16.0.real-1ubuntu5.
>
> The folder /dev/shm has 777 permissions, like the other nodes with older corosync and pacemaker that work fine. The only difference is that I only see files created by root, none created by hacluster as on the other two nodes (probably because pacemaker didn't start correctly).
>
> This is the analysis I have done so far.
> Any suggestions?
>
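As a quick sanity check of the situation described above, the commands below show the size of /dev/shm and the ownership of the libqb ring-buffer files that corosync creates there. This is a minimal sketch, assuming corosync runs as root and pacemaker's IPC segments would be owned by hacluster; the qb-* file names match the ones listed later in this thread.

    # size and current usage of the shared-memory filesystem libqb allocates from
    df -h /dev/shm

    # ring-buffer segments and their owners; only root-owned qb-* files here
    # suggests pacemaker's own IPC never came up
    ls -lh /dev/shm/qb-*

    # rough total consumed by the qb-* segments
    du -sch /dev/shm/qb-* 2>/dev/null | tail -1

If the free space reported by df is close to zero, the "Resource temporarily unavailable" errors from qb_rb_open quoted later in this thread are the expected symptom.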
Hmm. It seems very likely to be something to do with the way the container is set up then - and I know nothing about containers. Sorry :/

Can anyone else help here?

Chrissie

>> On 26 Jun 2018, at 11:03, Salvatore D'angelo <sasadang...@gmail.com> wrote:
>>
>> Yes, sorry, you're right, I could have found it by myself.
>> However, I did the following:
>>
>> 1. Added the line you suggested to /etc/fstab
>> 2. mount -o remount /dev/shm
>> 3. Now I correctly see /dev/shm of 512M with df -h
>>    Filesystem      Size  Used Avail Use% Mounted on
>>    overlay          63G   11G   49G  19% /
>>    tmpfs            64M  4.0K   64M   1% /dev
>>    tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>>    osxfs           466G  158G  305G  35% /Users
>>    /dev/sda1        63G   11G   49G  19% /etc/hosts
>>    shm             512M   15M  498M   3% /dev/shm
>>    tmpfs          1000M     0 1000M   0% /sys/firmware
>>    tmpfs           128M     0  128M   0% /tmp
>>
>> The errors in the log went away. Consider that I removed the log file before starting corosync, so it does not contain lines from previous executions.
>> <corosync.log>
>>
>> But the command:
>>     corosync-quorumtool -ps
>> still gives:
>>     Cannot initialize QUORUM service
>>
>> Consider that a few minutes before it gave me the message:
>>     Cannot initialize CFG service
>>
>> I do not know the difference between CFG and QUORUM in this case.
>>
>> If I try to start pacemaker the service is OK, but I see only pacemaker, and the transport does not work if I try to run a crm command.
>> Any suggestion?
>>
>>> On 26 Jun 2018, at 10:49, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>
>>> On 26/06/18 09:40, Salvatore D'angelo wrote:
>>>> Hi,
>>>>
>>>> Yes,
>>>>
>>>> I am reproducing only the required part for test. I think the original system has a larger shm. The problem is that I do not know exactly how to change it.
>>>> I tried the following steps, but I have the impression I didn't perform the right one:
>>>>
>>>> 1. remove everything under /tmp
>>>> 2. Added the following line to /etc/fstab
>>>>    tmpfs /tmp tmpfs defaults,nodev,nosuid,mode=1777,size=128M 0 0
>>>> 3. mount /tmp
>>>> 4. df -h
>>>>    Filesystem      Size  Used Avail Use% Mounted on
>>>>    overlay          63G   11G   49G  19% /
>>>>    tmpfs            64M  4.0K   64M   1% /dev
>>>>    tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>>>>    osxfs           466G  158G  305G  35% /Users
>>>>    /dev/sda1        63G   11G   49G  19% /etc/hosts
>>>>    shm              64M   11M   54M  16% /dev/shm
>>>>    tmpfs          1000M     0 1000M   0% /sys/firmware
>>>>    tmpfs           128M     0  128M   0% /tmp
>>>>
>>>> The errors are exactly the same.
>>>> I have the impression that I changed the wrong parameter. Probably I have to change:
>>>>    shm              64M   11M   54M  16% /dev/shm
>>>> but I do not know how to do that. Any suggestion?
>>>
>>> According to google, you just add a new line to /etc/fstab for /dev/shm
>>>
>>>     tmpfs /dev/shm tmpfs defaults,size=512m 0 0
>>>
>>> Chrissie
>>>
>>>>> On 26 Jun 2018, at 09:48, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>>>
>>>>> On 25/06/18 20:41, Salvatore D'angelo wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Let me add here one important detail. I use Docker for my test, with 5 containers deployed on my Mac.
>>>>>> Basically, the team that worked on this project installed the cluster on SoftLayer bare metal.
>>>>>> The PostgreSQL cluster was hard to test, and if a misconfiguration occurred, recreating the cluster from scratch was not easy.
>>>>>> Testing it was cumbersome if you consider that we access the machines through a complex system that is hard to describe here.
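Since the cluster nodes here are Docker containers, the 64M /dev/shm seen above is just Docker's default size for a container, not something corosync or the host chose; it can also be raised when the container is created, instead of remounting it from inside. A hedged sketch, assuming the containers are started directly with docker run (the container name and image below are placeholders):

    # give the container a 512 MB /dev/shm at creation time
    docker run -d --name pg3 --shm-size=512m my-cluster-image

    # or the equivalent per-service setting in docker-compose.yml:
    #   services:
    #     pg3:
    #       image: my-cluster-image
    #       shm_size: "512m"

Either form avoids having to carry the /etc/fstab remount workaround inside the image.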
>>>>>> For this reason I ported the cluster to Docker for test purposes. I am not interested in having it work for months, I just need a proof of concept.
>>>>>>
>>>>>> When the migration works I'll port everything to bare metal, where resources are abundant.
>>>>>>
>>>>>> Now I have enough RAM and disk space on my Mac, so if you tell me what would be an acceptable size for several days of running, that is OK for me.
>>>>>> It is also OK to have commands to clean the shm when required.
>>>>>> I know I can find them on Google, but if you can suggest them I'll appreciate it. I have the OS knowledge to do that, but I would like to avoid days of guesswork and trial and error if possible.
>>>>>
>>>>> I would recommend at least 128MB of space on /dev/shm, 256MB if you can spare it. My 'standard' system uses 75MB under normal running, allowing for one command-line query to run.
>>>>>
>>>>> If I read this right, then you're reproducing a bare-metal system in containers now? So the original systems will have a default /dev/shm size which is probably much larger than your containers'?
>>>>>
>>>>> I'm just checking here that we don't have a regression in memory usage, as Poki suggested.
>>>>>
>>>>> Chrissie
>>>>>
>>>>>>> On 25 Jun 2018, at 21:18, Jan Pokorný <jpoko...@redhat.com> wrote:
>>>>>>>
>>>>>>> On 25/06/18 19:06 +0200, Salvatore D'angelo wrote:
>>>>>>>> Thanks for the reply. I scratched my cluster and created it again and then migrated as before. This time I uninstalled pacemaker, corosync, crmsh and resource agents with make uninstall,
>>>>>>>> then I installed the new packages. The problem is the same: when I launch
>>>>>>>>     corosync-quorumtool -ps
>>>>>>>> I got: Cannot initialize QUORUM service
>>>>>>>>
>>>>>>>> Here is the log with debug enabled:
>>>>>>>>
>>>>>>>> [18019] pg3 corosyncerror [QB ] couldn't create circular mmap on /dev/shm/qb-cfg-event-18020-18028-23-data
>>>>>>>> [18019] pg3 corosyncerror [QB ] qb_rb_open:cfg-event-18020-18028-23: Resource temporarily unavailable (11)
>>>>>>>> [18019] pg3 corosyncdebug [QB ] Free'ing ringbuffer: /dev/shm/qb-cfg-request-18020-18028-23-header
>>>>>>>> [18019] pg3 corosyncdebug [QB ] Free'ing ringbuffer: /dev/shm/qb-cfg-response-18020-18028-23-header
>>>>>>>> [18019] pg3 corosyncerror [QB ] shm connection FAILED: Resource temporarily unavailable (11)
>>>>>>>> [18019] pg3 corosyncerror [QB ] Error in connection setup (18020-18028-23): Resource temporarily unavailable (11)
>>>>>>>>
>>>>>>>> I tried to check /dev/shm, and I am not sure these are the right commands, however:
>>>>>>>>
>>>>>>>> df -h /dev/shm
>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>> shm              64M   16M   49M  24% /dev/shm
>>>>>>>>
>>>>>>>> ls /dev/shm
>>>>>>>> qb-cmap-request-18020-18036-25-data    qb-corosync-blackbox-data    qb-quorum-request-18020-18095-32-data
>>>>>>>> qb-cmap-request-18020-18036-25-header  qb-corosync-blackbox-header  qb-quorum-request-18020-18095-32-header
>>>>>>>>
>>>>>>>> Is 64 MB enough for /dev/shm? If not, why did it work with the previous corosync release?
>>>>>>>
>>>>>>> For a start, can you try configuring corosync with the --enable-small-memory-footprint switch?
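For reference, the build-time switch mentioned here is an option to corosync's configure script, so using it means rebuilding corosync from source. A minimal sketch, assuming a git checkout (a release tarball already ships the configure script, so autogen.sh would not be needed there):

    cd corosync
    ./autogen.sh
    ./configure --enable-small-memory-footprint
    make
    make install

After reinstalling, the corosync daemon needs to be restarted for the reduced footprint to take effect.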
>>>>>>>
>>>>>>> Hard to say why the space provisioned to /dev/shm is the direct opposite of generous (per today's standards), but it may be the result of automatic HW adaptation, and if RAM is so scarce in your case, the above build-time toggle might help.
>>>>>>>
>>>>>>> If not, then exponentially increasing the size of /dev/shm is likely your best bet (I don't recommend fiddling with mlockall() and similar measures in corosync).
>>>>>>>
>>>>>>> Of course, feel free to raise a regression if you have a reproducible comparison between two corosync (plus possibly different libraries like libqb) versions, one that works and one that won't, in reproducible conditions (like this small /dev/shm, VM image, etc.).
>>>>>>>
>>>>>>> --
>>>>>>> Jan (Poki)
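On the earlier request for commands to clean the shm when required: the qb-* files are libqb ring buffers that corosync and pacemaker normally remove on their own during a clean shutdown, so manual cleanup should only be needed for leftovers after a crash. A cautious sketch, assuming the daemons are stopped first (the file names match the ls output quoted above):

    # stop the cluster stack so nothing is still mapping the segments
    service pacemaker stop
    service corosync stop

    # remove leftover libqb ring-buffer files (only safe while the daemons are down)
    rm -f /dev/shm/qb-*

    # confirm the space has been released
    df -h /dev/shm

Removing these files while corosync is still running would break its IPC, so this is strictly a stopped-cluster cleanup.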
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org