On 05/02/2016 03:45 PM, Jan Pokorný wrote:
> Hello Radoslaw,
>
> On 02/05/16 11:47 -0500, Radoslaw Garbacz wrote:
>> When testing pacemaker I encountered a start error, which seems to be
>> related to reported libqb segmentation fault.
>> - cluster started and acquired quorum
>> - some nodes failed to connect to CIB, and lost membership as a result
>> - restart solved the problem
>>
>> Segmentation fault reports libqb library in version 0.17.1, a standard
>> package provided for CentOS.6.
>
> Chances are that you are running into this nasty bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1114852
>
>> Please let me know if the problem is known, and if there is a remedy (e.g.
>> using the latest libqb).
>
> Try libqb >= 0.17.2.
>
> [...]
>
>> Logs from /var/log/messages:
>>
>> Apr 22 15:46:41 (...) pacemakerd[111190]: notice: Additional logging
>> available in /var/log/pacemaker.log
>> Apr 22 15:46:41 (...) pacemakerd[111190]: notice: Configured corosync to
>> accept connections from group 498: Library error (2)
>
> IIRC, that last line ^ was one of the symptoms.
Yes, that does look like the culprit. The root cause is that libqb cannot handle 6-digit PIDs, which we can see in the logs above: "pacemakerd[111190]". As a workaround, if you don't want to install a newer libqb before CentOS 6.8 (which will include the fix) is released, you can lower /proc/sys/kernel/pid_max (the kernel.pid_max sysctl variable) so that new PIDs have at most five digits.

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
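For anyone weighing the pid_max workaround against upgrading libqb, here is a minimal sketch of the exposure check. It assumes the thread's description is complete: libqb older than 0.17.2 mishandles PIDs of six or more digits, and the kernel never allocates a PID equal to or above kernel.pid_max, so the largest possible PID is pid_max - 1. The helper names (parse_version, libqb_has_fix, pid_max_is_safe, at_risk, read_pid_max) are hypothetical and not part of libqb or pacemaker; version strings are assumed to be plain dotted numbers such as "0.17.1".

```python
"""Hypothetical check for exposure to the libqb 6-digit-PID problem.

Assumptions (from the mailing-list thread, not from libqb itself):
- libqb releases older than 0.17.2 break on PIDs with 6+ digits.
- The kernel never allocates a PID >= kernel.pid_max.
"""
from pathlib import Path
from typing import Tuple

PID_MAX_PATH = Path("/proc/sys/kernel/pid_max")
FIXED_LIBQB = (0, 17, 2)  # first version suggested in the thread
MAX_SAFE_DIGITS = 5       # 6-digit PIDs are what trigger the bug


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string like '0.17.1' into (0, 17, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def libqb_has_fix(version: str) -> bool:
    """Tuple comparison orders versions numerically, e.g. (0, 17, 1) < (0, 17, 2)."""
    return parse_version(version) >= FIXED_LIBQB


def largest_pid(pid_max: int) -> int:
    """PIDs are allocated strictly below pid_max."""
    return pid_max - 1


def pid_max_is_safe(pid_max: int) -> bool:
    """True if every PID the kernel can allocate has at most 5 digits."""
    return len(str(largest_pid(pid_max))) <= MAX_SAFE_DIGITS


def at_risk(libqb_version: str, pid_max: int) -> bool:
    """Exposed only when libqb is old *and* 6-digit PIDs are possible."""
    return not libqb_has_fix(libqb_version) and not pid_max_is_safe(pid_max)


def read_pid_max(path: Path = PID_MAX_PATH) -> int:
    """Read the current kernel.pid_max value (Linux only)."""
    return int(path.read_text().strip())
```

On a Linux host, at_risk("0.17.1", read_pid_max()) reports whether the workaround is still needed. Any pid_max up to 100000 keeps the largest PID at 99999, which is why lowering it sidesteps the bug. Note that lowering pid_max only affects PIDs allocated afterwards, so daemons already running with 6-digit PIDs keep them until they are restarted.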