On 26/06/18 11:24, Salvatore D'angelo wrote:
> Hi,
>
> I have tried with:
> 0.16.0.real-1ubuntu4
> 0.16.0.real-1ubuntu5
>
> Which version should I try?
Hmm, both of those are actually quite old! Maybe a newer one?

Chrissie

>
>> On 26 Jun 2018, at 12:03, Christine Caulfield <ccaul...@redhat.com> wrote:
>>
>> On 26/06/18 11:00, Salvatore D'angelo wrote:
>>> Consider that the container is the same one where corosync 2.3.5 runs.
>>> If it is something related to the container, then probably 2.4.4
>>> introduced a feature that has an impact on containers.
>>> It should be something related to libqb, according to the code.
>>> Can anyone help?
>>>
>>
>> Have you tried downgrading libqb to the previous version to see if it
>> still happens?
>>
>> Chrissie
>>
>>>> On 26 Jun 2018, at 11:56, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>>
>>>> On 26/06/18 10:35, Salvatore D'angelo wrote:
>>>>> Sorry, after the command:
>>>>>
>>>>> corosync-quorumtool -ps
>>>>>
>>>>> the errors in the log are still visible. Looking at the source code,
>>>>> it seems the problem is at these lines:
>>>>> https://github.com/corosync/corosync/blob/master/tools/corosync-quorumtool.c
>>>>>
>>>>> if (quorum_initialize(&q_handle, &q_callbacks, &q_type) != CS_OK) {
>>>>>     fprintf(stderr, "Cannot initialize QUORUM service\n");
>>>>>     q_handle = 0;
>>>>>     goto out;
>>>>> }
>>>>>
>>>>> if (corosync_cfg_initialize(&c_handle, &c_callbacks) != CS_OK) {
>>>>>     fprintf(stderr, "Cannot initialise CFG service\n");
>>>>>     c_handle = 0;
>>>>>     goto out;
>>>>> }
>>>>>
>>>>> The quorum_initialize function is defined here:
>>>>> https://github.com/corosync/corosync/blob/master/lib/quorum.c
>>>>>
>>>>> It seems to interact with libqb to allocate space on /dev/shm, but
>>>>> something fails. I tried to update libqb with apt-get install, but
>>>>> with no success.
>>>>>
>>>>> The same applies to the second function:
>>>>> https://github.com/corosync/corosync/blob/master/lib/cfg.c
>>>>>
>>>>> Now, I am not an expert on libqb. I have version 0.16.0.real-1ubuntu5.
>>>>>
>>>>> The folder /dev/shm has 777 permissions, like on the other nodes with
>>>>> the older corosync and pacemaker that work fine. The only difference is
>>>>> that I only see files created by root, and none created by hacluster
>>>>> like on the other two nodes (probably because pacemaker didn't start
>>>>> correctly).
>>>>>
>>>>> This is the analysis I have done so far.
>>>>> Any suggestion?
>>>>>
>>>>
>>>> Hmm. It seems very likely to be something to do with the way the
>>>> container is set up then - and I know nothing about containers. Sorry :/
>>>>
>>>> Can anyone else help here?
>>>>
>>>> Chrissie
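A useful way to narrow down failures like this, independent of the container
question, is to watch the system calls libqb makes against /dev/shm. A
minimal sketch, assuming strace is available inside the container (the exact
syscall list may vary by platform):

    # trace the shm-backed IPC setup behind "Cannot initialize QUORUM service"
    strace -f -e trace=openat,ftruncate,mmap \
        corosync-quorumtool -ps 2>&1 | grep -i -e shm -e qb-

If ftruncate() or mmap() on a /dev/shm/qb-* file fails with ENOSPC or EAGAIN,
the tmpfs size, not corosync itself, is the first suspect.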
>>>>>> On 26 Jun 2018, at 11:03, Salvatore D'angelo <sasadang...@gmail.com> wrote:
>>>>>>
>>>>>> Yes, sorry, you're right, I could have found it by myself.
>>>>>> However, I did the following:
>>>>>>
>>>>>> 1. Added the line you suggested to /etc/fstab
>>>>>> 2. mount -o remount /dev/shm
>>>>>> 3. Now I correctly see a /dev/shm of 512M with df -h:
>>>>>>    Filesystem      Size  Used Avail Use% Mounted on
>>>>>>    overlay          63G   11G   49G  19% /
>>>>>>    tmpfs            64M  4.0K   64M   1% /dev
>>>>>>    tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>>>>>>    osxfs           466G  158G  305G  35% /Users
>>>>>>    /dev/sda1        63G   11G   49G  19% /etc/hosts
>>>>>>    shm             512M   15M  498M   3% /dev/shm
>>>>>>    tmpfs          1000M     0 1000M   0% /sys/firmware
>>>>>>    tmpfs           128M     0  128M   0% /tmp
>>>>>>
>>>>>> The errors in the log went away. Consider that I removed the log file
>>>>>> before starting corosync, so it does not contain lines from previous
>>>>>> executions.
>>>>>>
>>>>>> <corosync.log>
>>>>>>
>>>>>> But the command:
>>>>>> corosync-quorumtool -ps
>>>>>>
>>>>>> still gives:
>>>>>> Cannot initialize QUORUM service
>>>>>>
>>>>>> Consider that a few minutes before it gave me the message:
>>>>>> Cannot initialize CFG service
>>>>>>
>>>>>> I do not know the difference between CFG and QUORUM in this case.
>>>>>>
>>>>>> If I try to start pacemaker, the service starts OK, but I see only
>>>>>> pacemaker, and the transport does not work if I try to run a crm
>>>>>> command.
>>>>>> Any suggestion?
>>>>>>
>>>>>>> On 26 Jun 2018, at 10:49, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>>>>>
>>>>>>> On 26/06/18 09:40, Salvatore D'angelo wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Yes, I am reproducing only the required parts for the test. I think
>>>>>>>> the original system has a larger shm. The problem is that I do not
>>>>>>>> know exactly how to change it.
>>>>>>>> I tried the following steps, but I have the impression I didn't
>>>>>>>> perform the right one:
>>>>>>>>
>>>>>>>> 1. Removed everything under /tmp
>>>>>>>> 2. Added the following line to /etc/fstab:
>>>>>>>>    tmpfs /tmp tmpfs defaults,nodev,nosuid,mode=1777,size=128M 0 0
>>>>>>>> 3. mount /tmp
>>>>>>>> 4. df -h
>>>>>>>>    Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>>    overlay          63G   11G   49G  19% /
>>>>>>>>    tmpfs            64M  4.0K   64M   1% /dev
>>>>>>>>    tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>>>>>>>>    osxfs           466G  158G  305G  35% /Users
>>>>>>>>    /dev/sda1        63G   11G   49G  19% /etc/hosts
>>>>>>>>    shm              64M   11M   54M  16% /dev/shm
>>>>>>>>    tmpfs          1000M     0 1000M   0% /sys/firmware
>>>>>>>>    tmpfs           128M     0  128M   0% /tmp
>>>>>>>>
>>>>>>>> The errors are exactly the same.
>>>>>>>> I have the impression that I changed the wrong parameter. Probably
>>>>>>>> I have to change:
>>>>>>>>    shm              64M   11M   54M  16% /dev/shm
>>>>>>>> but I do not know how to do that. Any suggestion?
>>>>>>>>
>>>>>>>
>>>>>>> According to Google, you just add a new line to /etc/fstab for /dev/shm:
>>>>>>>
>>>>>>> tmpfs /dev/shm tmpfs defaults,size=512m 0 0
>>>>>>>
>>>>>>> Chrissie
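Inside a Docker container the /etc/fstab route may not survive a container
restart; the shm size can also be set when the container is created. A
sketch, assuming plain docker run containers and a hypothetical image and
container name:

    # one-off resize of the live tmpfs, from inside the container
    mount -o remount,size=512m /dev/shm

    # or set it at creation time, from the host
    docker run -d --shm-size=512m --name pg1 my-cluster-image

The --shm-size flag sizes the container's /dev/shm once at creation, so it
persists across restarts, unlike a manual remount.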
>>>>>>>>> On 26 Jun 2018, at 09:48, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> On 25/06/18 20:41, Salvatore D'angelo wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Let me add one important detail here. I use Docker for my tests,
>>>>>>>>>> with 5 containers deployed on my Mac.
>>>>>>>>>> Basically, the team that worked on this project installed the
>>>>>>>>>> cluster on SoftLayer bare metal.
>>>>>>>>>> The PostgreSQL cluster was hard to test, and if a misconfiguration
>>>>>>>>>> occurred, recreating the cluster from scratch was not easy.
>>>>>>>>>> Testing it was cumbersome, if you consider that we access the
>>>>>>>>>> machines through a complex system that is hard to describe here.
>>>>>>>>>> For this reason I ported the cluster to Docker for test purposes.
>>>>>>>>>> I am not interested in having it work for months; I just need a
>>>>>>>>>> proof of concept.
>>>>>>>>>>
>>>>>>>>>> When the migration works, I'll port everything to bare metal,
>>>>>>>>>> where resources are abundant.
>>>>>>>>>>
>>>>>>>>>> Now I have enough RAM and disk space on my Mac, so if you tell me
>>>>>>>>>> what an acceptable size would be for several days of running, that
>>>>>>>>>> is fine for me.
>>>>>>>>>> It would also be good to have commands to clean the shm when
>>>>>>>>>> required.
>>>>>>>>>> I know I can find them on Google, but if you can suggest this info
>>>>>>>>>> I'll appreciate it. I have the OS knowledge to do it, but I would
>>>>>>>>>> like to avoid days of guesswork and trial and error if possible.
>>>>>>>>>
>>>>>>>>> I would recommend at least 128MB of space on /dev/shm, 256MB if you
>>>>>>>>> can spare it. My 'standard' system uses 75MB under normal running,
>>>>>>>>> allowing for one command-line query to run.
>>>>>>>>>
>>>>>>>>> If I read this right, you're reproducing a bare-metal system in
>>>>>>>>> containers now, so the original systems will have a default
>>>>>>>>> /dev/shm size which is probably much larger than your containers'?
>>>>>>>>>
>>>>>>>>> I'm just checking here that we don't have a regression in memory
>>>>>>>>> usage, as Poki suggested.
>>>>>>>>>
>>>>>>>>> Chrissie
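On the "commands to clean the shm" question: libqb names its ring-buffer
files /dev/shm/qb-*, so stale buffers left behind by crashed or killed
daemons can be reclaimed by hand. A sketch, assuming corosync and pacemaker
are fully stopped first:

    # see what is consuming the tmpfs
    df -h /dev/shm
    ls -lh /dev/shm/qb-*

    # remove leftover libqb ring buffers (only with the daemons stopped!)
    rm -f /dev/shm/qb-*

Removing these files while the daemons are running would break their IPC, so
this is only for clearing leftovers between test runs.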
>>>>>>>>>>> On 25 Jun 2018, at 21:18, Jan Pokorný <jpoko...@redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 25/06/18 19:06 +0200, Salvatore D'angelo wrote:
>>>>>>>>>>>> Thanks for the reply. I scratched my cluster, created it again,
>>>>>>>>>>>> and then migrated as before. This time I uninstalled pacemaker,
>>>>>>>>>>>> corosync, crmsh and the resource agents with make uninstall,
>>>>>>>>>>>> then I installed the new packages. The problem is the same: when
>>>>>>>>>>>> I launch
>>>>>>>>>>>> corosync-quorumtool -ps
>>>>>>>>>>>> I get: Cannot initialize QUORUM service
>>>>>>>>>>>>
>>>>>>>>>>>> Here is the log with debug enabled:
>>>>>>>>>>>>
>>>>>>>>>>>> [18019] pg3 corosyncerror [QB ] couldn't create circular mmap
>>>>>>>>>>>>     on /dev/shm/qb-cfg-event-18020-18028-23-data
>>>>>>>>>>>> [18019] pg3 corosyncerror [QB ] qb_rb_open:cfg-event-18020-18028-23:
>>>>>>>>>>>>     Resource temporarily unavailable (11)
>>>>>>>>>>>> [18019] pg3 corosyncdebug [QB ] Free'ing ringbuffer:
>>>>>>>>>>>>     /dev/shm/qb-cfg-request-18020-18028-23-header
>>>>>>>>>>>> [18019] pg3 corosyncdebug [QB ] Free'ing ringbuffer:
>>>>>>>>>>>>     /dev/shm/qb-cfg-response-18020-18028-23-header
>>>>>>>>>>>> [18019] pg3 corosyncerror [QB ] shm connection FAILED:
>>>>>>>>>>>>     Resource temporarily unavailable (11)
>>>>>>>>>>>> [18019] pg3 corosyncerror [QB ] Error in connection setup
>>>>>>>>>>>>     (18020-18028-23): Resource temporarily unavailable (11)
>>>>>>>>>>>>
>>>>>>>>>>>> I tried to check /dev/shm, and I am not sure these are the right
>>>>>>>>>>>> commands, however:
>>>>>>>>>>>>
>>>>>>>>>>>> df -h /dev/shm
>>>>>>>>>>>> Filesystem  Size  Used Avail Use% Mounted on
>>>>>>>>>>>> shm          64M   16M   49M  24% /dev/shm
>>>>>>>>>>>>
>>>>>>>>>>>> ls /dev/shm
>>>>>>>>>>>> qb-cmap-request-18020-18036-25-data    qb-corosync-blackbox-data
>>>>>>>>>>>> qb-cmap-request-18020-18036-25-header  qb-corosync-blackbox-header
>>>>>>>>>>>> qb-quorum-request-18020-18095-32-data
>>>>>>>>>>>> qb-quorum-request-18020-18095-32-header
>>>>>>>>>>>>
>>>>>>>>>>>> Is 64MB enough for /dev/shm? If not, why did it work with the
>>>>>>>>>>>> previous corosync release?
>>>>>>>>>>>
>>>>>>>>>>> For a start, can you try configuring corosync with the
>>>>>>>>>>> --enable-small-memory-footprint switch?
>>>>>>>>>>>
>>>>>>>>>>> It is hard to say why the space provisioned to /dev/shm is the
>>>>>>>>>>> direct opposite of generous (by today's standards), but it may be
>>>>>>>>>>> the result of automatic HW adaptation, and if RAM is so scarce in
>>>>>>>>>>> your case, the above build-time toggle might help.
>>>>>>>>>>>
>>>>>>>>>>> If not, then exponentially increasing the size of the /dev/shm
>>>>>>>>>>> space is likely your best bet (I don't recommend fiddling with
>>>>>>>>>>> mlockall() and similar measures in corosync).
>>>>>>>>>>>
>>>>>>>>>>> Of course, feel free to raise a regression if you have a
>>>>>>>>>>> reproducible comparison between two corosync versions (plus
>>>>>>>>>>> possibly different libraries like libqb), one that works and one
>>>>>>>>>>> that won't, in reproducible conditions (like this small /dev/shm,
>>>>>>>>>>> VM image, etc.).
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Jan (Poki)
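For reference, the build-time toggle Poki mentions maps to a rebuild roughly
like this; a sketch, assuming corosync is built from a source checkout as
described earlier in the thread:

    # rebuild corosync with a reduced shared-memory footprint
    ./autogen.sh
    ./configure --enable-small-memory-footprint
    make && make install

The switch is meant to shrink the buffers corosync and libqb allocate in
/dev/shm (the allocation failing in the log above), at the cost of smaller
internal queues.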
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org