On 26/06/18 12:16, Salvatore D'angelo wrote:
> libqb updated to 1.0.3, but same issue.
>
> I know corosync also has the dependencies nspr and nss3. I updated
> them using apt-get install; here are the versions installed:
>
> libnspr4, libnspr4-dev                2:4.13.1-0ubuntu0.14.04.1
> libnss3, libnss3-dev, libnss3-nssdb   2:3.28.4-0ubuntu0.14.04.3
>
> but same problem.
>
> I am working on an Ubuntu 14.04 image and I know that packages could
> be quite old here. Are there newer versions of these libraries?
> Where can I download them? I tried searching on Google but the
> results were quite confusing.
>
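For reference: one way to confirm exactly which versions of the
suspect libraries are installed is dpkg/apt-cache; the package name
libqb0 below assumes the stock Ubuntu 14.04 archive:

    dpkg -l | grep -E 'libqb|libnspr4|libnss3'  # installed versions
    apt-cache policy libqb0                     # installed vs. candidate

Newer upstream releases of these libraries are generally not in the
14.04 archive, so they would have to be built from source.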
It's pretty unlikely to be the crypto libraries. It's almost certainly
libqb, with a small possibility of corosync. Which versions did you
have that worked (libqb and corosync)?

Chrissie

>
>> On 26 Jun 2018, at 12:27, Christine Caulfield <ccaul...@redhat.com> wrote:
>>
>> On 26/06/18 11:24, Salvatore D'angelo wrote:
>>> Hi,
>>>
>>> I have tried with:
>>> 0.16.0.real-1ubuntu4
>>> 0.16.0.real-1ubuntu5
>>>
>>> Which version should I try?
>>
>> Hmm, both of those are actually quite old! Maybe a newer one?
>>
>> Chrissie
>>
>>>
>>>> On 26 Jun 2018, at 12:03, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>>
>>>> On 26/06/18 11:00, Salvatore D'angelo wrote:
>>>>> Consider that the container is the same one where corosync 2.3.5
>>>>> ran. If it is something related to the container, then probably
>>>>> 2.4.4 introduced a feature that has an impact on containers.
>>>>> It should be something related to libqb, according to the code.
>>>>> Can anyone help?
>>>>>
>>>>
>>>> Have you tried downgrading libqb to the previous version to see if
>>>> it still happens?
>>>>
>>>> Chrissie
>>>>
>>>>>> On 26 Jun 2018, at 11:56, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>>>>
>>>>>> On 26/06/18 10:35, Salvatore D'angelo wrote:
>>>>>>> Sorry, after the command:
>>>>>>>
>>>>>>> corosync-quorumtool -ps
>>>>>>>
>>>>>>> the errors in the log are still visible. Looking at the source
>>>>>>> code, it seems the problem is at these lines:
>>>>>>> https://github.com/corosync/corosync/blob/master/tools/corosync-quorumtool.c
>>>>>>>
>>>>>>> if (quorum_initialize(&q_handle, &q_callbacks, &q_type) != CS_OK) {
>>>>>>>     fprintf(stderr, "Cannot initialize QUORUM service\n");
>>>>>>>     q_handle = 0;
>>>>>>>     goto out;
>>>>>>> }
>>>>>>>
>>>>>>> if (corosync_cfg_initialize(&c_handle, &c_callbacks) != CS_OK) {
>>>>>>>     fprintf(stderr, "Cannot initialise CFG service\n");
>>>>>>>     c_handle = 0;
>>>>>>>     goto out;
>>>>>>> }
>>>>>>>
>>>>>>> The quorum_initialize function is defined here:
>>>>>>> https://github.com/corosync/corosync/blob/master/lib/quorum.c
>>>>>>>
>>>>>>> It seems to interact with libqb to allocate space on /dev/shm,
>>>>>>> but something fails. I tried to update libqb with apt-get
>>>>>>> install, but with no success.
>>>>>>>
>>>>>>> The same goes for the second function:
>>>>>>> https://github.com/corosync/corosync/blob/master/lib/cfg.c
>>>>>>>
>>>>>>> Now, I am not an expert on libqb. I have
>>>>>>> version 0.16.0.real-1ubuntu5.
>>>>>>>
>>>>>>> The folder /dev/shm has 777 permissions like the other nodes
>>>>>>> with older corosync and pacemaker that work fine. The only
>>>>>>> difference is that I only see files created by root, none
>>>>>>> created by hacluster like on the other two nodes (probably
>>>>>>> because pacemaker didn't start correctly).
>>>>>>>
>>>>>>> This is the analysis I have done so far.
>>>>>>> Any suggestion?
>>>>>>>
>>>>>>
>>>>>> Hmm. It seems very likely to be something to do with the way the
>>>>>> container is set up then - and I know nothing about containers.
>>>>>> Sorry :/
>>>>>>
>>>>>> Can anyone else help here?
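A generic diagnostic (not something suggested in the thread) for
pinpointing which /dev/shm operation libqb fails on is to run the
failing client under strace and watch the file and mmap calls:

    strace -f -e trace=open,openat,ftruncate,mmap \
        corosync-quorumtool -ps 2>&1 | grep /dev/shm

This distinguishes a full tmpfs (ENOSPC) from a permissions problem
(EACCES); libqb simply reports the errno it got, which in the logs
quoted later in this thread is 11 (EAGAIN, "Resource temporarily
unavailable").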
>>>>>>
>>>>>> Chrissie
>>>>>>
>>>>>>>> On 26 Jun 2018, at 11:03, Salvatore D'angelo <sasadang...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Yes, sorry, you're right, I could have found it by myself.
>>>>>>>> However, I did the following:
>>>>>>>>
>>>>>>>> 1. Added the line you suggested to /etc/fstab
>>>>>>>> 2. mount -o remount /dev/shm
>>>>>>>> 3. Now I correctly see a /dev/shm of 512M with df -h
>>>>>>>> Filesystem   Size  Used Avail Use% Mounted on
>>>>>>>> overlay       63G   11G   49G  19% /
>>>>>>>> tmpfs         64M  4.0K   64M   1% /dev
>>>>>>>> tmpfs       1000M     0 1000M   0% /sys/fs/cgroup
>>>>>>>> osxfs        466G  158G  305G  35% /Users
>>>>>>>> /dev/sda1     63G   11G   49G  19% /etc/hosts
>>>>>>>> *shm          512M   15M  498M   3% /dev/shm*
>>>>>>>> tmpfs       1000M     0 1000M   0% /sys/firmware
>>>>>>>> tmpfs        128M     0  128M   0% /tmp
>>>>>>>>
>>>>>>>> The errors in the log went away. Consider that I removed the
>>>>>>>> log file before starting corosync, so it does not contain lines
>>>>>>>> from previous executions.
>>>>>>>> <corosync.log>
>>>>>>>>
>>>>>>>> But the command:
>>>>>>>> corosync-quorumtool -ps
>>>>>>>>
>>>>>>>> still gives:
>>>>>>>> Cannot initialize QUORUM service
>>>>>>>>
>>>>>>>> Consider that a few minutes before it gave me the message:
>>>>>>>> Cannot initialize CFG service
>>>>>>>>
>>>>>>>> I do not know the difference between CFG and QUORUM in this case.
>>>>>>>>
>>>>>>>> If I try to start pacemaker, the service is OK, but I see only
>>>>>>>> pacemaker, and the transport does not work if I try to run a
>>>>>>>> crm command.
>>>>>>>> Any suggestion?
>>>>>>>>
>>>>>>>>> On 26 Jun 2018, at 10:49, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> On 26/06/18 09:40, Salvatore D'angelo wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Yes,
>>>>>>>>>>
>>>>>>>>>> I am reproducing only the part required for testing. I think
>>>>>>>>>> the original system has a larger shm. The problem is that I
>>>>>>>>>> do not know exactly how to change it.
>>>>>>>>>> I tried the following steps, but I have the impression I
>>>>>>>>>> didn't perform the right one:
>>>>>>>>>>
>>>>>>>>>> 1. Removed everything under /tmp
>>>>>>>>>> 2. Added the following line to /etc/fstab
>>>>>>>>>>    tmpfs /tmp tmpfs defaults,nodev,nosuid,mode=1777,size=128M 0 0
>>>>>>>>>> 3. mount /tmp
>>>>>>>>>> 4. df -h
>>>>>>>>>> Filesystem   Size  Used Avail Use% Mounted on
>>>>>>>>>> overlay       63G   11G   49G  19% /
>>>>>>>>>> tmpfs         64M  4.0K   64M   1% /dev
>>>>>>>>>> tmpfs       1000M     0 1000M   0% /sys/fs/cgroup
>>>>>>>>>> osxfs        466G  158G  305G  35% /Users
>>>>>>>>>> /dev/sda1     63G   11G   49G  19% /etc/hosts
>>>>>>>>>> shm           64M   11M   54M  16% /dev/shm
>>>>>>>>>> tmpfs       1000M     0 1000M   0% /sys/firmware
>>>>>>>>>> *tmpfs        128M     0  128M   0% /tmp*
>>>>>>>>>>
>>>>>>>>>> The errors are exactly the same.
>>>>>>>>>> I have the impression that I changed the wrong parameter.
>>>>>>>>>> Probably I have to change:
>>>>>>>>>> shm           64M   11M   54M  16% /dev/shm
>>>>>>>>>>
>>>>>>>>>> but I do not know how to do that. Any suggestion?
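Note that the steps above resized the wrong tmpfs: the qb-* files in
the error logs live on the tmpfs mounted at /dev/shm, not /tmp. As
the next reply shows, the persistent fix is an fstab entry for
/dev/shm; on a running system the same change can be made in place
with a remount:

    mount -o remount,size=512m /dev/shm
    df -h /dev/shm   # verify the new size took effect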
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> According to Google, you just add a new line to /etc/fstab for
>>>>>>>>> /dev/shm:
>>>>>>>>>
>>>>>>>>> tmpfs /dev/shm tmpfs defaults,size=512m 0 0
>>>>>>>>>
>>>>>>>>> Chrissie
>>>>>>>>>
>>>>>>>>>>> On 26 Jun 2018, at 09:48, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 25/06/18 20:41, Salvatore D'angelo wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Let me add one important detail here. I use Docker for my
>>>>>>>>>>>> test, with 5 containers deployed on my Mac.
>>>>>>>>>>>> Basically, the team that worked on this project installed
>>>>>>>>>>>> the cluster on SoftLayer bare metal.
>>>>>>>>>>>> The PostgreSQL cluster was hard to test, and if a
>>>>>>>>>>>> misconfiguration occurred, recreating the cluster from
>>>>>>>>>>>> scratch was not easy.
>>>>>>>>>>>> Testing it was cumbersome, considering that we access the
>>>>>>>>>>>> machines through a complex system that is hard to describe
>>>>>>>>>>>> here.
>>>>>>>>>>>> For this reason I ported the cluster to Docker for test
>>>>>>>>>>>> purposes. I am not interested in having it work for months;
>>>>>>>>>>>> I just need a proof of concept.
>>>>>>>>>>>>
>>>>>>>>>>>> When the migration works, I'll port everything to bare
>>>>>>>>>>>> metal, where resources are abundant.
>>>>>>>>>>>>
>>>>>>>>>>>> Now, I have enough RAM and disk space on my Mac, so if you
>>>>>>>>>>>> tell me what an acceptable size for several days of running
>>>>>>>>>>>> would be, that is OK for me.
>>>>>>>>>>>> It is also OK to have commands to clean the shm when
>>>>>>>>>>>> required.
>>>>>>>>>>>> I know I can find them on Google, but if you can suggest
>>>>>>>>>>>> this info I'll appreciate it. I have the OS knowledge to do
>>>>>>>>>>>> that, but I would like to avoid days of guesswork and trial
>>>>>>>>>>>> and error if possible.
>>>>>>>>>>>
>>>>>>>>>>> I would recommend at least 128MB of space on /dev/shm, 256MB
>>>>>>>>>>> if you can spare it. My 'standard' system uses 75MB under
>>>>>>>>>>> normal running, allowing for one command-line query to run.
>>>>>>>>>>>
>>>>>>>>>>> If I read this right, then you're reproducing a bare-metal
>>>>>>>>>>> system in containers now? So the original systems will have
>>>>>>>>>>> a default /dev/shm size which is probably much larger than
>>>>>>>>>>> your containers?
>>>>>>>>>>>
>>>>>>>>>>> I'm just checking here that we don't have a regression in
>>>>>>>>>>> memory usage, as Poki suggested.
>>>>>>>>>>>
>>>>>>>>>>> Chrissie
>>>>>>>>>>>
>>>>>>>>>>>>> On 25 Jun 2018, at 21:18, Jan Pokorný <jpoko...@redhat.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 25/06/18 19:06 +0200, Salvatore D'angelo wrote:
>>>>>>>>>>>>>> Thanks for the reply. I scratched my cluster and created
>>>>>>>>>>>>>> it again, then migrated as before. This time I
>>>>>>>>>>>>>> uninstalled pacemaker, corosync, crmsh and the resource
>>>>>>>>>>>>>> agents with make uninstall
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> then I installed the new packages.
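Since the nodes here are Docker containers, /dev/shm can also be
sized per container at creation time instead of via fstab. A sketch,
with placeholder container and image names:

    docker run -d --shm-size=256m --name pg1 my-cluster-image

And, per the request above for cleanup commands: stale libqb
ringbuffer files left behind by a crashed corosync can be removed,
with corosync stopped:

    rm -f /dev/shm/qb-*   # only while corosync is NOT running

Do not run that while corosync is up; those files are live IPC
buffers.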
>>>>>>>>>>>>>> The problem is the same. When I launch:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> corosync-quorumtool -ps
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I got: Cannot initialize QUORUM service
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is the log with debug enabled:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [18019] pg3 corosyncerror [QB    ] couldn't create circular mmap on /dev/shm/qb-cfg-event-18020-18028-23-data
>>>>>>>>>>>>>> [18019] pg3 corosyncerror [QB    ] qb_rb_open:cfg-event-18020-18028-23: Resource temporarily unavailable (11)
>>>>>>>>>>>>>> [18019] pg3 corosyncdebug [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-request-18020-18028-23-header
>>>>>>>>>>>>>> [18019] pg3 corosyncdebug [QB    ] Free'ing ringbuffer: /dev/shm/qb-cfg-response-18020-18028-23-header
>>>>>>>>>>>>>> [18019] pg3 corosyncerror [QB    ] shm connection FAILED: Resource temporarily unavailable (11)
>>>>>>>>>>>>>> [18019] pg3 corosyncerror [QB    ] Error in connection setup (18020-18028-23): Resource temporarily unavailable (11)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I tried to check /dev/shm, and I am not sure these are
>>>>>>>>>>>>>> the right commands, however:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> df -h /dev/shm
>>>>>>>>>>>>>> Filesystem  Size  Used Avail Use% Mounted on
>>>>>>>>>>>>>> shm          64M   16M   49M  24% /dev/shm
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ls /dev/shm
>>>>>>>>>>>>>> qb-cmap-request-18020-18036-25-data    qb-corosync-blackbox-data    qb-quorum-request-18020-18095-32-data
>>>>>>>>>>>>>> qb-cmap-request-18020-18036-25-header  qb-corosync-blackbox-header  qb-quorum-request-18020-18095-32-header
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is 64MB enough for /dev/shm? If not, why did it work with
>>>>>>>>>>>>>> the previous corosync release?
>>>>>>>>>>>>>
>>>>>>>>>>>>> For a start, can you try configuring corosync with the
>>>>>>>>>>>>> --enable-small-memory-footprint switch?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hard to say why the space provisioned to /dev/shm is the
>>>>>>>>>>>>> direct opposite of generous (per today's standards), but
>>>>>>>>>>>>> it may be the result of automatic HW adaptation, and if
>>>>>>>>>>>>> RAM is so scarce in your case, the above build-time toggle
>>>>>>>>>>>>> might help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If not, then exponentially increasing the size of the
>>>>>>>>>>>>> /dev/shm space is likely your best bet (I don't recommend
>>>>>>>>>>>>> fiddling with mlockall() and similar measures in
>>>>>>>>>>>>> corosync).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Of course, feel free to raise a regression if you have a
>>>>>>>>>>>>> reproducible comparison between two corosync (plus
>>>>>>>>>>>>> possibly different libraries like libqb) versions, one
>>>>>>>>>>>>> that works and one that won't, in reproducible conditions
>>>>>>>>>>>>> (like this small /dev/shm, VM image, etc.).
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Jan (Poki)
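For completeness, the build-time toggle mentioned above is a switch
to corosync's autotools configure script. A sketch of a from-source
rebuild, assuming the usual autogen/configure flow of the corosync
tree:

    ./autogen.sh
    ./configure --enable-small-memory-footprint
    make && make install

As the name suggests, it reduces corosync's memory footprint (smaller
internal buffers), which should also shrink what corosync asks libqb
to allocate in /dev/shm.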