On 26/06/18 11:00, Salvatore D'angelo wrote:
> Consider that the container is the same one where corosync 2.3.5 ran.
> If it is something related to the container, then probably 2.4.4
> introduced a feature that has an impact on containers.
> According to the code, it should be something related to libqb.
> Can anyone help?

Have you tried downgrading libqb to the previous version to see if it
still happens?

Chrissie
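(For reference, a hedged sketch of such a downgrade on Ubuntu. The
package name libqb0 and the version string are assumptions based on the
version Salvatore reported, not verified against his repositories.)

    # Install a specific libqb version and keep apt from upgrading it again.
    apt-get install libqb0=0.16.0.real-1ubuntu5
    apt-mark hold libqb0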
>> On 26 Jun 2018, at 11:56, Christine Caulfield <[email protected]> wrote:
>>
>> On 26/06/18 10:35, Salvatore D'angelo wrote:
>>> Sorry, after the command:
>>>
>>> corosync-quorumtool -ps
>>>
>>> the errors in the log are still visible. Looking at the source code, it
>>> seems the problem is at these lines:
>>> https://github.com/corosync/corosync/blob/master/tools/corosync-quorumtool.c
>>>
>>> if (quorum_initialize(&q_handle, &q_callbacks, &q_type) != CS_OK) {
>>>     fprintf(stderr, "Cannot initialize QUORUM service\n");
>>>     q_handle = 0;
>>>     goto out;
>>> }
>>>
>>> if (corosync_cfg_initialize(&c_handle, &c_callbacks) != CS_OK) {
>>>     fprintf(stderr, "Cannot initialise CFG service\n");
>>>     c_handle = 0;
>>>     goto out;
>>> }
>>>
>>> The quorum_initialize function is defined here:
>>> https://github.com/corosync/corosync/blob/master/lib/quorum.c
>>>
>>> It seems to interact with libqb to allocate space on /dev/shm, but
>>> something fails. I tried to update libqb with apt-get install, but
>>> with no success.
>>>
>>> The same goes for the second function:
>>> https://github.com/corosync/corosync/blob/master/lib/cfg.c
>>>
>>> Now, I am not an expert on libqb. I have version 0.16.0.real-1ubuntu5.
>>>
>>> The folder /dev/shm has 777 permissions, like the other nodes with
>>> older corosync and pacemaker that work fine. The only difference is
>>> that I only see files created by root, none created by hacluster like
>>> on the other two nodes (probably because pacemaker didn't start
>>> correctly).
>>>
>>> This is the analysis I have done so far.
>>> Any suggestion?
>>
>> Hmm. It seems very likely something to do with the way the container is
>> set up then - and I know nothing about containers. Sorry :/
>>
>> Can anyone else help here?
>>
>> Chrissie
>>
>>>> On 26 Jun 2018, at 11:03, Salvatore D'angelo <[email protected]> wrote:
>>>>
>>>> Yes, sorry, you're right, I could have found it by myself.
>>>> However, I did the following:
>>>>
>>>> 1. Added the line you suggested to /etc/fstab
>>>> 2. mount -o remount /dev/shm
>>>> 3. Now I correctly see a 512M /dev/shm with df -h
>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>> overlay          63G   11G   49G  19% /
>>>> tmpfs            64M  4.0K   64M   1% /dev
>>>> tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>>>> osxfs           466G  158G  305G  35% /Users
>>>> /dev/sda1        63G   11G   49G  19% /etc/hosts
>>>> *shm             512M   15M  498M   3% /dev/shm*
>>>> tmpfs          1000M     0 1000M   0% /sys/firmware
>>>> tmpfs           128M     0  128M   0% /tmp
>>>>
>>>> The errors in the log went away. Note that I removed the log file
>>>> before starting corosync, so it does not contain lines from previous
>>>> runs.
>>>> <corosync.log>
>>>>
>>>> But the command:
>>>> corosync-quorumtool -ps
>>>>
>>>> still gives:
>>>> Cannot initialize QUORUM service
>>>>
>>>> Note that a few minutes earlier it gave me the message:
>>>> Cannot initialize CFG service
>>>>
>>>> I do not know the difference between CFG and QUORUM in this case.
>>>>
>>>> If I try to start pacemaker, the service is OK, but I see only
>>>> pacemaker, and the transport does not work if I try to run a crm
>>>> command.
>>>> Any suggestion?
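(A side note, assuming the nodes are plain Docker containers: the small
/dev/shm can also be sized at container-creation time instead of
remounting from /etc/fstab inside the container. --shm-size is a
standard docker run flag; the container and image names below are
placeholders.)

    # Hypothetical example: start a node container with a 512 MB /dev/shm.
    docker run -d --name pg3 --shm-size=512m <your-cluster-image>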
>>>>> On 26 Jun 2018, at 10:49, Christine Caulfield <[email protected]> wrote:
>>>>>
>>>>> On 26/06/18 09:40, Salvatore D'angelo wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Yes, I am reproducing only the part required for the test. I think
>>>>>> the original system has a larger shm. The problem is that I do not
>>>>>> know exactly how to change it.
>>>>>> I tried the following steps, but I have the impression I didn't
>>>>>> perform the right ones:
>>>>>>
>>>>>> 1. Removed everything under /tmp
>>>>>> 2. Added the following line to /etc/fstab
>>>>>>    tmpfs /tmp tmpfs defaults,nodev,nosuid,mode=1777,size=128M 0 0
>>>>>> 3. mount /tmp
>>>>>> 4. df -h
>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>> overlay          63G   11G   49G  19% /
>>>>>> tmpfs            64M  4.0K   64M   1% /dev
>>>>>> tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
>>>>>> osxfs           466G  158G  305G  35% /Users
>>>>>> /dev/sda1        63G   11G   49G  19% /etc/hosts
>>>>>> shm              64M   11M   54M  16% /dev/shm
>>>>>> tmpfs          1000M     0 1000M   0% /sys/firmware
>>>>>> *tmpfs           128M     0  128M   0% /tmp*
>>>>>>
>>>>>> The errors are exactly the same.
>>>>>> I have the impression that I changed the wrong parameter. Probably
>>>>>> I have to change:
>>>>>> shm              64M   11M   54M  16% /dev/shm
>>>>>>
>>>>>> but I do not know how to do that. Any suggestion?
>>>>>
>>>>> According to Google, you just add a new line to /etc/fstab for /dev/shm:
>>>>>
>>>>> tmpfs /dev/shm tmpfs defaults,size=512m 0 0
>>>>>
>>>>> Chrissie
>>>>>
>>>>>>> On 26 Jun 2018, at 09:48, Christine Caulfield <[email protected]> wrote:
>>>>>>>
>>>>>>> On 25/06/18 20:41, Salvatore D'angelo wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Let me add one important detail here. I use Docker for my tests,
>>>>>>>> with 5 containers deployed on my Mac.
>>>>>>>> Basically, the team that worked on this project installed the
>>>>>>>> cluster on SoftLayer bare metal.
>>>>>>>> The PostgreSQL cluster was hard to test, and if a misconfiguration
>>>>>>>> occurred, recreating the cluster from scratch was not easy.
>>>>>>>> Testing it was cumbersome, considering that we access the machines
>>>>>>>> through a complex system that is hard to describe here.
>>>>>>>> For this reason I ported the cluster to Docker for test purposes.
>>>>>>>> I am not interested in having it work for months; I just need a
>>>>>>>> proof of concept.
>>>>>>>>
>>>>>>>> When the migration works, I'll port everything to bare metal,
>>>>>>>> where resources are abundant.
>>>>>>>>
>>>>>>>> Now, I have enough RAM and disk space on my Mac, so if you tell me
>>>>>>>> what an acceptable size would be for several days of running, that
>>>>>>>> is OK for me.
>>>>>>>> It would also be OK to have commands to clean the shm when
>>>>>>>> required.
>>>>>>>> I know I can find them on Google, but if you can suggest this
>>>>>>>> info I'll appreciate it. I have the OS knowledge to do that, but
>>>>>>>> I would like to avoid days of guesswork and trial and error if
>>>>>>>> possible.
>>>>>>>
>>>>>>> I would recommend at least 128MB of space on /dev/shm, 256MB if
>>>>>>> you can spare it. My 'standard' system uses 75MB under normal
>>>>>>> running, allowing for one command-line query to run.
>>>>>>>
>>>>>>> If I read this right, you're reproducing a bare-metal system in
>>>>>>> containers now? So the original systems will have a default
>>>>>>> /dev/shm size which is probably much larger than your containers?
>>>>>>>
>>>>>>> I'm just checking here that we don't have a regression in memory
>>>>>>> usage, as Poki suggested.
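(On the cleanup question above: a hedged sketch. libqb keeps its
ringbuffers as qb-* files in /dev/shm, as the listings in this thread
show; with the cluster stack stopped, the stale ones can simply be
removed. Service names below assume a SysV-style init; verify first
that nothing still maps the files.)

    # Clean stale libqb ringbuffers while corosync/pacemaker are stopped.
    # Removing files still mapped by a live process will break that process.
    service pacemaker stop   # or your init system's equivalent
    service corosync stop
    lsof +D /dev/shm         # optional check: should list no qb-* users
    rm -f /dev/shm/qb-*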
>>>>>>>
>>>>>>> Chrissie
>>>>>>>
>>>>>>>>> On 25 Jun 2018, at 21:18, Jan Pokorný <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> On 25/06/18 19:06 +0200, Salvatore D'angelo wrote:
>>>>>>>>>> Thanks for the reply. I scratched my cluster, created it again,
>>>>>>>>>> and then migrated as before. This time I uninstalled pacemaker,
>>>>>>>>>> corosync, crmsh and the resource agents with make uninstall,
>>>>>>>>>> then I installed the new packages. The problem is the same; when
>>>>>>>>>> I launch:
>>>>>>>>>> corosync-quorumtool -ps
>>>>>>>>>>
>>>>>>>>>> I get: Cannot initialize QUORUM service
>>>>>>>>>>
>>>>>>>>>> Here is the log with debug enabled:
>>>>>>>>>>
>>>>>>>>>> [18019] pg3 corosyncerror [QB ] couldn't create circular mmap on /dev/shm/qb-cfg-event-18020-18028-23-data
>>>>>>>>>> [18019] pg3 corosyncerror [QB ] qb_rb_open:cfg-event-18020-18028-23: Resource temporarily unavailable (11)
>>>>>>>>>> [18019] pg3 corosyncdebug [QB ] Free'ing ringbuffer: /dev/shm/qb-cfg-request-18020-18028-23-header
>>>>>>>>>> [18019] pg3 corosyncdebug [QB ] Free'ing ringbuffer: /dev/shm/qb-cfg-response-18020-18028-23-header
>>>>>>>>>> [18019] pg3 corosyncerror [QB ] shm connection FAILED: Resource temporarily unavailable (11)
>>>>>>>>>> [18019] pg3 corosyncerror [QB ] Error in connection setup (18020-18028-23): Resource temporarily unavailable (11)
>>>>>>>>>>
>>>>>>>>>> I tried to check /dev/shm, and I am not sure these are the
>>>>>>>>>> right commands, however:
>>>>>>>>>>
>>>>>>>>>> df -h /dev/shm
>>>>>>>>>> Filesystem  Size  Used Avail Use% Mounted on
>>>>>>>>>> shm          64M   16M   49M  24% /dev/shm
>>>>>>>>>>
>>>>>>>>>> ls /dev/shm
>>>>>>>>>> qb-cmap-request-18020-18036-25-data    qb-corosync-blackbox-data    qb-quorum-request-18020-18095-32-data
>>>>>>>>>> qb-cmap-request-18020-18036-25-header  qb-corosync-blackbox-header  qb-quorum-request-18020-18095-32-header
>>>>>>>>>>
>>>>>>>>>> Is 64 MB enough for /dev/shm? If not, why did it work with the
>>>>>>>>>> previous corosync release?
>>>>>>>>>
>>>>>>>>> For a start, can you try configuring corosync with the
>>>>>>>>> --enable-small-memory-footprint switch?
>>>>>>>>>
>>>>>>>>> Hard to say why the space provisioned to /dev/shm is the direct
>>>>>>>>> opposite of generous (per today's standards), but it may be the
>>>>>>>>> result of automatic HW adaptation, and if RAM is so scarce in
>>>>>>>>> your case, the above build-time toggle might help.
>>>>>>>>>
>>>>>>>>> If not, then exponentially increasing the size of the /dev/shm
>>>>>>>>> space is likely your best bet (I don't recommend fiddling with
>>>>>>>>> mlockall() and similar measures in corosync).
>>>>>>>>>
>>>>>>>>> Of course, feel free to raise a regression if you have a
>>>>>>>>> reproducible comparison between two corosync versions (plus
>>>>>>>>> possibly different libraries like libqb), one that works and one
>>>>>>>>> that won't, in reproducible conditions (like this small /dev/shm,
>>>>>>>>> VM image, etc.).
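(The configure switch Poki mentions is taken from the thread itself;
the surrounding steps below are the usual autotools routine for a
source checkout and may differ for your tree.)

    # Rebuild corosync with the smaller IPC buffer sizes.
    ./autogen.sh
    ./configure --enable-small-memory-footprint
    make
    make install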
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jan (Poki)
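(A closing aside: the failing call can be reproduced outside
corosync-quorumtool with a few lines against the quorum API linked
above. A minimal sketch, assuming the corosync development headers and
libquorum are installed; whether NULL callbacks are accepted is an
assumption based on lib/quorum.c, not something confirmed in this
thread.)

    /* repro.c - check whether the QUORUM IPC setup (the libqb
     * ringbuffers in /dev/shm) succeeds.
     * Assumed build line: gcc repro.c -o repro -lquorum
     */
    #include <stdio.h>
    #include <corosync/quorum.h>

    int main(void)
    {
        quorum_handle_t handle;
        uint32_t quorum_type;

        /* NULL callbacks: we only care about connection setup. */
        cs_error_t err = quorum_initialize(&handle, NULL, &quorum_type);
        if (err != CS_OK) {
            fprintf(stderr, "Cannot initialize QUORUM service (%d)\n", err);
            return 1;
        }
        printf("QUORUM service initialized OK\n");
        quorum_finalize(handle);
        return 0;
    }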
_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
