corosync 2.3.5 and libqb 0.16.0

> On 26 Jun 2018, at 14:08, Christine Caulfield <ccaul...@redhat.com> wrote:
>
> On 26/06/18 12:16, Salvatore D'angelo wrote:
>> I updated libqb to 1.0.3 but same issue.
>>
>> I know corosync also has these dependencies: nspr and nss3. I updated
>> them using apt-get install; here are the versions installed:
>>
>> libnspr4, libnspr4-dev              2:4.13.1-0ubuntu0.14.04.1
>> libnss3, libnss3-dev, libnss3-nssb  2:3.28.4-0ubuntu0.14.04.3
>>
>> but same problem.
>>
>> I am working on an Ubuntu 14.04 image and I know that packages could be
>> quite old here. Are there newer versions of these libraries?
>> Where can I download them? I tried to search on Google but the results
>> were quite confusing.
>
> It's pretty unlikely to be the crypto libraries. It's almost certainly
> in libqb, with a small possibility of corosync. Which versions did you
> have that worked (libqb and corosync)?
>
> Chrissie
>
>>> On 26 Jun 2018, at 12:27, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>
>>> On 26/06/18 11:24, Salvatore D'angelo wrote:
>>>> Hi,
>>>>
>>>> I have tried with:
>>>> 0.16.0.real-1ubuntu4
>>>> 0.16.0.real-1ubuntu5
>>>>
>>>> Which version should I try?
>>>
>>> Hmm, both of those are actually quite old! Maybe a newer one?
>>>
>>> Chrissie
>>>
>>>>> On 26 Jun 2018, at 12:03, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>>>
>>>>> On 26/06/18 11:00, Salvatore D'angelo wrote:
>>>>>> Consider that the container is the same one where corosync 2.3.5 ran.
>>>>>> If it is something related to the container, then probably 2.4.4
>>>>>> introduced a feature that has an impact on containers.
>>>>>> It should be something related to libqb, according to the code.
>>>>>> Can anyone help?
>>>>>
>>>>> Have you tried downgrading libqb to the previous version to see if it
>>>>> still happens?
>>>>>
>>>>> Chrissie
>>>>>
>>>>>>> On 26 Jun 2018, at 11:56, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>>>>>
>>>>>>> On 26/06/18 10:35, Salvatore D'angelo wrote:
>>>>>>>> Sorry, after the command:
>>>>>>>>
>>>>>>>> corosync-quorumtool -ps
>>>>>>>>
>>>>>>>> the errors in the log are still visible. Looking at the source code,
>>>>>>>> it seems the problem is at these lines:
>>>>>>>> https://github.com/corosync/corosync/blob/master/tools/corosync-quorumtool.c
>>>>>>>>
>>>>>>>> if (quorum_initialize(&q_handle, &q_callbacks, &q_type) != CS_OK) {
>>>>>>>>     fprintf(stderr, "Cannot initialize QUORUM service\n");
>>>>>>>>     q_handle = 0;
>>>>>>>>     goto out;
>>>>>>>> }
>>>>>>>>
>>>>>>>> if (corosync_cfg_initialize(&c_handle, &c_callbacks) != CS_OK) {
>>>>>>>>     fprintf(stderr, "Cannot initialise CFG service\n");
>>>>>>>>     c_handle = 0;
>>>>>>>>     goto out;
>>>>>>>> }
>>>>>>>>
>>>>>>>> The quorum_initialize function is defined here:
>>>>>>>> https://github.com/corosync/corosync/blob/master/lib/quorum.c
>>>>>>>>
>>>>>>>> It seems to interact with libqb to allocate space on /dev/shm, but
>>>>>>>> something fails. I tried to update libqb with apt-get install, but
>>>>>>>> with no success.
>>>>>>>>
>>>>>>>> The same goes for the second function:
>>>>>>>> https://github.com/corosync/corosync/blob/master/lib/cfg.c
>>>>>>>>
>>>>>>>> Now, I am not an expert on libqb.
>>>>>>>> I have version 0.16.0.real-1ubuntu5.
>>>>>>>>
>>>>>>>> The folder /dev/shm has 777 permissions, like on the other nodes
>>>>>>>> with older corosync and pacemaker that work fine. The only
>>>>>>>> difference is that I only see files created by root, none created
>>>>>>>> by hacluster like on the other two nodes (probably because
>>>>>>>> pacemaker didn't start correctly).
>>>>>>>>
>>>>>>>> This is the analysis I have done so far.
>>>>>>>> Any suggestion?
>>>>>>>
>>>>>>> Hmm. It seems very likely to be something to do with the way the
>>>>>>> container is set up then - and I know nothing about containers.
>>>>>>> Sorry :/
>>>>>>>
>>>>>>> Can anyone else help here?
>>>>>>>
>>>>>>> Chrissie
>>>>>>>
>>>>>>>>> On 26 Jun 2018, at 11:03, Salvatore D'angelo <sasadang...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Yes, sorry, you're right, I could have found it by myself.
>>>>>>>>> However, I did the following:
>>>>>>>>>
>>>>>>>>> 1. Added the line you suggested to /etc/fstab
>>>>>>>>> 2. mount -o remount /dev/shm
>>>>>>>>> 3. Now I correctly see a /dev/shm of 512M with df -h:
>>>>>>>>>
>>>>>>>>> Filesystem   Size  Used Avail Use% Mounted on
>>>>>>>>> overlay       63G   11G   49G  19% /
>>>>>>>>> tmpfs         64M  4.0K   64M   1% /dev
>>>>>>>>> tmpfs       1000M     0 1000M   0% /sys/fs/cgroup
>>>>>>>>> osxfs        466G  158G  305G  35% /Users
>>>>>>>>> /dev/sda1     63G   11G   49G  19% /etc/hosts
>>>>>>>>> *shm          512M   15M  498M   3% /dev/shm*
>>>>>>>>> tmpfs       1000M     0 1000M   0% /sys/firmware
>>>>>>>>> tmpfs        128M     0  128M   0% /tmp
>>>>>>>>>
>>>>>>>>> The errors in the log went away. Consider that I removed the log
>>>>>>>>> file before starting corosync, so it does not contain lines from
>>>>>>>>> previous executions.
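[Editor's note: the fstab-plus-remount steps above can be sketched as follows. This is a minimal sketch, assuming a root shell inside the container and the 512M size suggested later in the thread; run it inside each affected node.]

```shell
# Sketch of the resize steps above; assumes root inside the container
# and a 512M target size for /dev/shm.

# 1. Persist the size across remounts via /etc/fstab (add only once):
grep -q '/dev/shm' /etc/fstab ||
  echo 'tmpfs /dev/shm tmpfs defaults,size=512m 0 0' >> /etc/fstab

# 2. Apply immediately, without restarting the container:
mount -o remount,size=512m /dev/shm

# 3. Verify the new size:
df -h /dev/shm
```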
>>>>>>>>> <corosync.log>
>>>>>>>>>
>>>>>>>>> But the command:
>>>>>>>>> corosync-quorumtool -ps
>>>>>>>>>
>>>>>>>>> still gives:
>>>>>>>>> Cannot initialize QUORUM service
>>>>>>>>>
>>>>>>>>> Consider that a few minutes before, it gave me the message:
>>>>>>>>> Cannot initialize CFG service
>>>>>>>>>
>>>>>>>>> I do not know the difference between CFG and QUORUM in this case.
>>>>>>>>>
>>>>>>>>> If I try to start pacemaker, the service is OK, but I see only
>>>>>>>>> pacemaker, and the transport does not work if I try to run a crm
>>>>>>>>> command.
>>>>>>>>> Any suggestion?
>>>>>>>>>
>>>>>>>>>> On 26 Jun 2018, at 10:49, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 26/06/18 09:40, Salvatore D'angelo wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Yes,
>>>>>>>>>>>
>>>>>>>>>>> I am reproducing only the required part for the test. I think
>>>>>>>>>>> the original system has a larger shm. The problem is that I do
>>>>>>>>>>> not know exactly how to change it.
>>>>>>>>>>> I tried the following steps, but I have the impression I didn't
>>>>>>>>>>> perform the right one:
>>>>>>>>>>>
>>>>>>>>>>> 1. Removed everything under /tmp
>>>>>>>>>>> 2. Added the following line to /etc/fstab:
>>>>>>>>>>>    tmpfs /tmp tmpfs defaults,nodev,nosuid,mode=1777,size=128M 0 0
>>>>>>>>>>> 3. mount /tmp
>>>>>>>>>>> 4.
>>>>>>>>>>> df -h
>>>>>>>>>>>
>>>>>>>>>>> Filesystem   Size  Used Avail Use% Mounted on
>>>>>>>>>>> overlay       63G   11G   49G  19% /
>>>>>>>>>>> tmpfs         64M  4.0K   64M   1% /dev
>>>>>>>>>>> tmpfs       1000M     0 1000M   0% /sys/fs/cgroup
>>>>>>>>>>> osxfs        466G  158G  305G  35% /Users
>>>>>>>>>>> /dev/sda1     63G   11G   49G  19% /etc/hosts
>>>>>>>>>>> shm           64M   11M   54M  16% /dev/shm
>>>>>>>>>>> tmpfs       1000M     0 1000M   0% /sys/firmware
>>>>>>>>>>> *tmpfs        128M     0  128M   0% /tmp*
>>>>>>>>>>>
>>>>>>>>>>> The errors are exactly the same.
>>>>>>>>>>> I have the impression that I changed the wrong parameter.
>>>>>>>>>>> Probably I have to change:
>>>>>>>>>>> shm 64M 11M 54M 16% /dev/shm
>>>>>>>>>>>
>>>>>>>>>>> but I do not know how to do that. Any suggestion?
>>>>>>>>>>
>>>>>>>>>> According to Google, you just add a new line to /etc/fstab for
>>>>>>>>>> /dev/shm:
>>>>>>>>>>
>>>>>>>>>> tmpfs /dev/shm tmpfs defaults,size=512m 0 0
>>>>>>>>>>
>>>>>>>>>> Chrissie
>>>>>>>>>>
>>>>>>>>>>>> On 26 Jun 2018, at 09:48, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 25/06/18 20:41, Salvatore D'angelo wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let me add one important detail here. I use Docker for my
>>>>>>>>>>>>> test, with 5 containers deployed on my Mac.
>>>>>>>>>>>>> Basically, the team that worked on this project installed the
>>>>>>>>>>>>> cluster on SoftLayer bare metal.
>>>>>>>>>>>>> The PostgreSQL cluster was hard to test, and if a
>>>>>>>>>>>>> misconfiguration occurred, recreating the cluster from
>>>>>>>>>>>>> scratch was not easy.
>>>>>>>>>>>>> Testing it was cumbersome, considering that we access the
>>>>>>>>>>>>> machines through a complex system that is hard to describe
>>>>>>>>>>>>> here.
>>>>>>>>>>>>> For this reason I ported the cluster to Docker for test
>>>>>>>>>>>>> purposes. I am not interested in having it work for months; I
>>>>>>>>>>>>> just need a proof of concept.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When the migration works, I'll port everything to bare metal,
>>>>>>>>>>>>> where resources are abundant.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now, I have enough RAM and disk space on my Mac, so if you
>>>>>>>>>>>>> tell me what an acceptable size for several days of running
>>>>>>>>>>>>> would be, that is OK for me.
>>>>>>>>>>>>> It is also OK to have commands to clean the shm when required.
>>>>>>>>>>>>> I know I can find them on Google, but if you can suggest this
>>>>>>>>>>>>> info I'll appreciate it. I have the OS knowledge to do that,
>>>>>>>>>>>>> but I would like to avoid days of guesswork and trial and
>>>>>>>>>>>>> error if possible.
>>>>>>>>>>>>
>>>>>>>>>>>> I would recommend at least 128MB of space on /dev/shm, 256MB
>>>>>>>>>>>> if you can spare it. My 'standard' system uses 75MB under
>>>>>>>>>>>> normal running, allowing for one command-line query to run.
>>>>>>>>>>>>
>>>>>>>>>>>> If I read this right, then you're reproducing a bare-metal
>>>>>>>>>>>> system in containers now? So the original systems will have a
>>>>>>>>>>>> default /dev/shm size which is probably much larger than your
>>>>>>>>>>>> containers?
>>>>>>>>>>>>
>>>>>>>>>>>> I'm just checking here that we don't have a regression in
>>>>>>>>>>>> memory usage, as Poki suggested.
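[Editor's note: since the nodes here are Docker containers, the /dev/shm ceiling can also be raised from the outside when the container is created; Docker's default shm size is 64MB, which matches the df output in this thread. A sketch, where the container and image names are illustrative assumptions:]

```shell
# Recreate a node container with a larger /dev/shm instead of
# remounting inside it. --shm-size is a standard `docker run` flag;
# the container name (pg3) and image name are illustrative only.
docker run -d --name pg3 --shm-size=512m my-cluster-image

# docker-compose equivalent (config fragment):
#   services:
#     pg3:
#       shm_size: "512m"
```

A size set this way survives restarts of the container, unlike an in-container remount.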
>>>>>>>>>>>>
>>>>>>>>>>>> Chrissie
>>>>>>>>>>>>
>>>>>>>>>>>>>> On 25 Jun 2018, at 21:18, Jan Pokorný <jpoko...@redhat.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 25/06/18 19:06 +0200, Salvatore D'angelo wrote:
>>>>>>>>>>>>>>> Thanks for the reply. I scratched my cluster, created it
>>>>>>>>>>>>>>> again, and then migrated as before. This time I uninstalled
>>>>>>>>>>>>>>> pacemaker, corosync, crmsh, and the resource agents with
>>>>>>>>>>>>>>> make uninstall, then I installed the new packages. The
>>>>>>>>>>>>>>> problem is the same; when I launch:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> corosync-quorumtool -ps
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I get: Cannot initialize QUORUM service
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is the log with debug enabled:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [18019] pg3 corosync error [QB] couldn't create circular mmap on /dev/shm/qb-cfg-event-18020-18028-23-data
>>>>>>>>>>>>>>> [18019] pg3 corosync error [QB] qb_rb_open:cfg-event-18020-18028-23: Resource temporarily unavailable (11)
>>>>>>>>>>>>>>> [18019] pg3 corosync debug [QB] Free'ing ringbuffer: /dev/shm/qb-cfg-request-18020-18028-23-header
>>>>>>>>>>>>>>> [18019] pg3 corosync debug [QB] Free'ing ringbuffer: /dev/shm/qb-cfg-response-18020-18028-23-header
>>>>>>>>>>>>>>> [18019] pg3 corosync error [QB] shm connection FAILED: Resource temporarily unavailable (11)
>>>>>>>>>>>>>>> [18019] pg3 corosync error [QB] Error in connection setup (18020-18028-23):
>>>>>>>>>>>>>>> Resource temporarily unavailable (11)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I tried to check /dev/shm, and I am not sure these are the
>>>>>>>>>>>>>>> right commands, however:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> df -h /dev/shm
>>>>>>>>>>>>>>> Filesystem  Size  Used Avail Use% Mounted on
>>>>>>>>>>>>>>> shm          64M   16M   49M  24% /dev/shm
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ls /dev/shm
>>>>>>>>>>>>>>> qb-cmap-request-18020-18036-25-data
>>>>>>>>>>>>>>> qb-corosync-blackbox-data
>>>>>>>>>>>>>>> qb-quorum-request-18020-18095-32-data
>>>>>>>>>>>>>>> qb-cmap-request-18020-18036-25-header
>>>>>>>>>>>>>>> qb-corosync-blackbox-header
>>>>>>>>>>>>>>> qb-quorum-request-18020-18095-32-header
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is 64MB enough for /dev/shm? If not, why did it work with
>>>>>>>>>>>>>>> the previous corosync release?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For a start, can you try configuring corosync with the
>>>>>>>>>>>>>> --enable-small-memory-footprint switch?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It's hard to say why the space provisioned to /dev/shm is
>>>>>>>>>>>>>> the direct opposite of generous (by today's standards), but
>>>>>>>>>>>>>> it may be the result of automatic HW adaptation, and if RAM
>>>>>>>>>>>>>> is so scarce in your case, the above build-time toggle might
>>>>>>>>>>>>>> help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If not, then exponentially increasing the size of the
>>>>>>>>>>>>>> /dev/shm space is likely your best bet (I don't recommend
>>>>>>>>>>>>>> fiddling with mlockall() and similar measures in corosync).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Of course, feel free to raise a regression if you have a
>>>>>>>>>>>>>> reproducible comparison between two corosync versions (plus
>>>>>>>>>>>>>> possibly different libraries like libqb), one that works and
>>>>>>>>>>>>>> one that won't, in reproducible conditions (like this small
>>>>>>>>>>>>>> /dev/shm, VM image, etc.).
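[Editor's note: the build-time toggle suggested above would be applied roughly like this, from a corosync source checkout; the exact autotools invocation is an assumption and may vary between releases:]

```shell
# Rebuild corosync with the reduced-footprint option (run from a
# corosync source tree; steps are the usual autotools flow and are
# assumptions, not taken verbatim from this thread):
./autogen.sh
./configure --enable-small-memory-footprint
make
make install   # may need sudo, depending on the configured prefix

corosync -v    # confirm the rebuilt version is the one on PATH
```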
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Jan (Poki)
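[Editor's note: the ad-hoc /dev/shm checks used throughout this thread can be combined into one quick inspection. Each corosync IPC connection creates qb-*-data/qb-*-header ring-buffer file pairs in /dev/shm, and the pacemaker daemons (running as hacluster) add their own once they connect, which is why missing hacluster-owned files was a useful symptom above.]

```shell
# Quick /dev/shm health check for a corosync/pacemaker node.
df -h /dev/shm                             # tmpfs size, used, available
ls -lh /dev/shm/qb-* 2>/dev/null || true   # libqb ring buffers and their owners
du -sh /dev/shm 2>/dev/null || true        # total space consumed by all of them
```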
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org