On 05/02/2016 03:45 PM, Jan Pokorný wrote:
> Hello Radoslaw,
> 
> On 02/05/16 11:47 -0500, Radoslaw Garbacz wrote:
>> When testing pacemaker I encountered a start error, which seems to be
>> related to reported libqb segmentation fault.
>> - cluster started and acquired quorum
>> - some nodes failed to connect to CIB, and lost membership as a result
>> - restart solved the problem
>>
>> Segmentation fault reports libqb library in version 0.17.1, a standard
>> package provided for CentOS.6.
> 
> Chances are that you are running into this nasty bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1114852
> 
>> Please let me know if the problem is known, and if there is a remedy (e.g.
>> using the latest libqb).
> 
> Try libqb >= 0.17.2.
> 
> [...]
> 
>> Logs from /var/log/messages:
>>
>> Apr 22 15:46:41 (...) pacemakerd[111190]:   notice: Additional logging
>> available in /var/log/pacemaker.log
>> Apr 22 15:46:41 (...) pacemakerd[111190]:   notice: Configured corosync to
>> accept connections from group 498: Library error (2)
> 
> IIRC, that last line ^ was one of the symptoms.

Yes, that does look like the culprit. The root cause is libqb being
unable to handle 6-digit PIDs, which we can see in the above logs --
"[111190]".

As a workaround until CentOS 6.8 is released with the fix, you can lower
/proc/sys/kernel/pid_max (aka the kernel.pid_max sysctl variable) so that
PIDs stay below six digits, if you don't want to install a newer libqb
yourself.
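
To check whether a given node can run into this at all, something like
the following rough sketch may help. It only reads the current pid_max
and tells you whether 6-digit PIDs are possible; the helper names and
the 100000 threshold are my own illustration (derived from "6-digit"),
not anything taken from libqb or pacemaker.

```python
from pathlib import Path

# Standard procfs location of the kernel.pid_max sysctl on Linux.
PID_MAX_PATH = Path("/proc/sys/kernel/pid_max")

# Smallest PID that needs six decimal digits (illustrative threshold).
FIRST_SIX_DIGIT_PID = 100_000


def read_pid_max(path=PID_MAX_PATH):
    """Return the current pid_max as an integer."""
    return int(Path(path).read_text().strip())


def six_digit_pids_possible(pid_max):
    """True if the kernel may hand out a PID with six or more digits.

    PIDs are allocated strictly below pid_max, so the largest possible
    PID is pid_max - 1.
    """
    return pid_max > FIRST_SIX_DIGIT_PID


if __name__ == "__main__":
    if PID_MAX_PATH.exists():
        current = read_pid_max()
        if six_digit_pids_possible(current):
            print(f"pid_max={current}: 6-digit PIDs are possible")
        else:
            print(f"pid_max={current}: PIDs stay below six digits")
    else:
        print(f"{PID_MAX_PATH} not found (not a Linux procfs system?)")
```

If it reports that 6-digit PIDs are possible, the value can be lowered
at runtime with "sysctl -w kernel.pid_max=<value>" and made persistent
in /etc/sysctl.conf. Processes that already have a 6-digit PID keep it
until they are restarted.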

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
