I have two two-node clusters set up using corosync/pacemaker on CentOS 6.8. One cluster is simply sharing an IP, while the other has numerous services and IPs set up between the two machines in the cluster. Both appear to be working fine. However, while poking around today I noticed that on the single-IP cluster, corosync, stonithd, and fenced were using "significant" amounts of processing power - 25% for corosync on the current primary node, with fenced and stonithd often showing 1-2% (not horrible, but more than any other process).

Looking at my logs, I see that they are dumping messages like the following to the messages log every second or two:

Sep 27 08:51:50 fai-dbs1 stonith-ng[4851]: warning: get_xpath_object: No match for //@st_delegate in /st-reply
Sep 27 08:51:50 fai-dbs1 stonith-ng[4851]: notice: remote_op_done: Operation reboot of fai-dbs1 by fai-dbs2 for stonith_admin.cman.15835@fai-dbs2.c5161517: No such device
Sep 27 08:51:50 fai-dbs1 crmd[4855]: notice: tengine_stonith_notify: Peer fai-dbs1 was not terminated (reboot) by fai-dbs2 for fai-dbs2: No such device (ref=c5161517-c0cc-42e5-ac11-1d55f7749b05) by client stonith_admin.cman.15835
Sep 27 08:51:50 fai-dbs1 fence_pcmk[15393]: Requesting Pacemaker fence fai-dbs2 (reset)
Sep 27 08:51:50 fai-dbs1 stonith_admin[15394]: notice: crm_log_args: Invoked: stonith_admin --reboot fai-dbs2 --tolerance 5s --tag cman
Sep 27 08:51:50 fai-dbs1 stonith-ng[4851]: notice: handle_request: Client stonith_admin.cman.15394.2a97d89d wants to fence (reboot) 'fai-dbs2' with device '(any)'
Sep 27 08:51:50 fai-dbs1 stonith-ng[4851]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for fai-dbs2: bc3f5d73-57bd-4aff-a94c-f9978aa5c3ae (0)
Sep 27 08:51:50 fai-dbs1 stonith-ng[4851]: notice: stonith_choose_peer: Couldn't find anyone to fence fai-dbs2 with <any>
Sep 27 08:51:50 fai-dbs1 stonith-ng[4851]: warning: get_xpath_object: No match for //@st_delegate in /st-reply
Sep 27 08:51:50 fai-dbs1 stonith-ng[4851]: error: remote_op_done: Operation reboot of fai-dbs2 by fai-dbs1 for stonith_admin.cman.15394@fai-dbs1.bc3f5d73: No such device
Sep 27 08:51:50 fai-dbs1 crmd[4855]: notice: tengine_stonith_notify: Peer fai-dbs2 was not terminated (reboot) by fai-dbs1 for fai-dbs1: No such device (ref=bc3f5d73-57bd-4aff-a94c-f9978aa5c3ae) by client stonith_admin.cman.15394
Sep 27 08:51:50 fai-dbs1 fence_pcmk[15393]: Call to fence fai-dbs2 (reset) failed with rc=237

After seeing this on the one cluster, I checked the logs on the other, and sure enough I'm seeing the same thing there. As I mentioned, both nodes in both clusters *appear* to be operating correctly. For example, the output of "pcs status" on the small cluster is this:

[root@fai-dbs1 ~]# pcs status
Cluster name: dbs_cluster
Last updated: Tue Sep 27 08:59:44 2016
Last change: Thu Mar 3 06:11:00 2016
Stack: cman
Current DC: fai-dbs1 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
1 Resources configured

Online: [ fai-dbs1 fai-dbs2 ]

Full list of resources:

 virtual_ip    (ocf::heartbeat:IPaddr2):    Started fai-dbs1

The larger cluster, meanwhile, has services running across both of its nodes, and I've been able to move them back and forth without issue. Both clusters have the stonith-enabled property set to false and no-quorum-policy set to ignore (since there are only two nodes in each cluster).

What could be causing the log messages? Is the CPU usage normal, or might there be something I can do about that as well? Thanks.
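P.S. For reference, here is roughly how those two properties were set on each cluster, plus the commands I've been using to check what (if any) stonith devices Pacemaker knows about. I'm typing these from memory, so treat the exact invocations as approximate:

# Cluster-wide properties, set the same way on both clusters:
[root@fai-dbs1 ~]# pcs property set stonith-enabled=false
[root@fai-dbs1 ~]# pcs property set no-quorum-policy=ignore

# Checking the current configuration:
[root@fai-dbs1 ~]# pcs property                      # show current cluster properties
[root@fai-dbs1 ~]# pcs stonith show                  # list fence devices defined in the CIB
[root@fai-dbs1 ~]# stonith_admin --list-registered   # ask stonith-ng which devices it has registered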
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------