Hi all..

Over the past few days, I've noticed that the cib and pcsd (ruby)
processes are pegged at ~99% CPU, and commands such as 'pcs status pcsd'
take up to 5 minutes to complete.  On all active cluster nodes, top shows:

  PID   USER      PR  NI  VIRT     RES     SHR    S  %CPU  %MEM  TIME+      COMMAND
27225   haclust+  20   0  116324   91600   23136  R  99.3   0.1  1943:40    cib
23277   root      20   0  12.868g  8.176g  8460   S  99.7  13.0  407:44.18  ruby
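
For reference, a point-in-time snapshot of the live CIB can be grabbed
with cibadmin (a query only; it doesn't modify anything), which also
shows how large the CIB has grown:

[root@zs95kj ~]# cibadmin -Q > /tmp/cib-snapshot.xml
[root@zs95kj ~]# wc -c /tmp/cib-snapshot.xml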

The system log shows "High CIB load detected" messages spiking over the past 2 days:

[root@zs95kj ~]# grep "High CIB load detected" /var/log/messages |grep "Feb 3" |wc -l
1655
[root@zs95kj ~]# grep "High CIB load detected" /var/log/messages |grep "Feb 2" |wc -l
1658
[root@zs95kj ~]# grep "High CIB load detected" /var/log/messages |grep "Feb 1" |wc -l
147
[root@zs95kj ~]# grep "High CIB load detected" /var/log/messages |grep "Jan 31" |wc -l
444
[root@zs95kj ~]# grep "High CIB load detected" /var/log/messages |grep "Jan 30" |wc -l
352
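
For what it's worth, the same per-day breakdown can be pulled in one
pass rather than one grep per day:

[root@zs95kj ~]# grep "High CIB load detected" /var/log/messages | awk '{print $1, $2}' | sort | uniq -c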


The first entries on Feb 2 were logged around 8:42am ...

Feb  2 08:42:12 zs95kj crmd[27233]:  notice: High CIB load detected: 0.974333

This happens to coincide with the time I caused a node fence (off)
action by creating an iface-bridge resource that specified a
non-existent vlan slave interface (reported to the group yesterday in
a separate email thread).  That event also cost me quorum in the
cluster, because 2 of my 5 cluster nodes were already offline.

My cluster currently has just over 200 VirtualDomain resources to
manage, plus one iface-bridge resource and one iface-vlan resource,
both of which are now configured properly and operational.
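
Those counts can be re-checked with something like the following
(assuming pcs 0.9.x, where 'pcs resource show' prints one line per
resource including the agent name):

[root@zs95kj ~]# pcs resource show | grep -c VirtualDomain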

I would appreciate some guidance on how to proceed with debugging this
issue.  I have not taken any recovery actions yet.  I've considered
stopping the cluster, recycling pcsd.service on all nodes, and
restarting the cluster... and also rebooting the nodes, if necessary.
But I didn't want to clear the condition yet, in case there's anything
I can capture while the nodes are in this state.
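
For instance, before any restarts I was thinking of grabbing a
crm_report covering the window above, something like this (the year is
my assumption; adjust --from to wherever the Jan 30 entries actually
begin):

[root@zs95kj ~]# crm_report --from "2017-01-30 00:00:00" /tmp/cib-load-report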

Thanks..

Scott Greenlese ... KVM on System Z -  Solutions Test,  Poughkeepsie, N.Y.
  INTERNET:  swgre...@us.ibm.com