I think I may be on to something.  It seems that every time my boxes start 
showing increased host load, the preceding change that takes place is:

 crmd:     info: throttle_send_command: New throttle mode: 0100 (was 0000)

I'm attaching the last 50-odd lines from the corosync.log.  It just happens 
that - at the moment - our host load on this box is coming back down.  There 
was no host load issue (0.00 load) immediately preceding this part of the log.

I know the log shows them in the reverse order, but it shows them as the same 
log entry, printed at the same time.  I'm assuming the throttle change takes 
place first and that increases the load, not the other way around....

So - what is the throttle mode?
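
My rough guess - and this is only a sketch pieced together from the log
messages, not the actual crmd code; the mode values, the 0.80 per-core target,
and the 1.2/1.6/2.0 multipliers below are my assumptions - is that crmd
periodically samples the load average and, once it crosses multiples of a
configurable per-core target, bumps itself into a higher "throttle mode" that
limits how many cluster actions it will run at once.  Something along these
lines:

/*
 * Sketch only: not the real crmd code.  The mode values, the 0.80
 * per-core target, and the 1.2/1.6/2.0 multipliers are assumptions
 * made for illustration.
 */
#include <stdio.h>
#include <unistd.h>

enum throttle_mode {
    THROTTLE_NONE    = 0x0000,
    THROTTLE_LOW     = 0x0001,
    THROTTLE_MED     = 0x0010,
    THROTTLE_HIGH    = 0x0100,
    THROTTLE_EXTREME = 0x1000,
};

static const double load_target = 0.80;   /* assumed per-core target */

static enum throttle_mode
classify_load(double load1min, long cores)
{
    /* Scale the 1-minute load average by the core count so the
     * thresholds mean "fraction of total CPU capacity". */
    double per_core = (cores > 0) ? load1min / (double) cores : load1min;

    if (per_core > 2.0 * load_target) {
        return THROTTLE_EXTREME;
    } else if (per_core > 1.6 * load_target) {
        return THROTTLE_HIGH;
    } else if (per_core > 1.2 * load_target) {
        return THROTTLE_MED;
    } else if (per_core > load_target) {
        return THROTTLE_LOW;
    }
    return THROTTLE_NONE;
}

int main(void)
{
    double load1min = 0.0;
    FILE *f = fopen("/proc/loadavg", "r");

    if (f == NULL || fscanf(f, "%lf", &load1min) != 1) {
        perror("loadavg");
        if (f != NULL) {
            fclose(f);
        }
        return 1;
    }
    fclose(f);

    long cores = sysconf(_SC_NPROCESSORS_ONLN);

    printf("load %.2f on %ld cores -> throttle mode %04x\n",
           load1min, cores, (unsigned int) classify_load(load1min, cores));
    return 0;
}

Under those assumed values, "0100 (was 0000)" would just mean the node went 
from no throttling straight to the high level.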

--
Jeff Westgate
DIS UNIX/Linux System Administrator

------------------------------
Message: 3
Date: Mon, 27 Feb 2017 13:26:30 +0000
From: Jeffrey Westgate <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: [ClusterLabs] Never join a list without a problem...

Thanks, Ken.

Our late guru was the admin who set all this up, and it had been rock solid 
until the recent oddities started cropping up.  The boxes still function fine - 
they've just developed some... quirks.

I found the solution before I got your reply, and it was essentially what we 
did: update everything but pacemaker, reboot, stop pacemaker, update pacemaker, 
reboot.  That process was necessary because they had been running so long that 
pacemaker would not stop.  It would try, then seemingly stall after several 
minutes.

We're good now, up-to-date-wise, and stuck only with the initial issue we were 
hoping to eliminate by updating/patching EVERYthing.  And we honestly don't 
know what may be causing it.

We use Nagios to monitor, and once every 20 to 40 hours - sometimes longer; we 
cannot set a clock by it - while the machine is 95% idle or more (according to 
'top'), the host load shoots up to 50 or 60.  It takes about 20 minutes to 
peak, and another 30 to 45 minutes to come back down to baseline, which is 
mostly 0.00 (see the attached hostload.pdf).  This happens to both machines, at 
random, and it is concerning; we'd like to find what's causing it and resolve it.

We were hoping it was the "uptime kernel bug", but patching has not helped.  
There seems to be no increase in the number of processes running, and the 
processes that are running do not take any more CPU time.  The boxes are DNS 
forwarding resolvers, but there is no correlation between DNS requests and the 
load increase - sometimes (like this morning) it rises around 1 AM, when the 
DNS load is minimal.

The oddity is that these are the only two boxes with this issue, and we have a 
couple dozen at the same OS and patch level.  Only these two, with this role 
and this particular package set, have the issue.

--
Jeff

-------------- next part --------------
A non-text attachment was scrubbed...
Name: hostload.pdf
Type: application/pdf
Size: 32748 bytes
Desc: hostload.pdf
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20170227/5f468b78/attachment.pdf>

------------------------------

End of Users Digest, Vol 25, Issue 75
*************************************

Feb 27 13:24:35 [2611] resolver-lb3.state.ar.us       crmd:     info: 
crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (900000ms)
Feb 27 13:24:35 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
do_state_transition:      State transition S_IDLE -> S_POLICY_ENGINE [ 
input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
Feb 27 13:24:35 [2611] resolver-lb3.state.ar.us       crmd:     info: 
do_state_transition:      Progressed to state S_POLICY_ENGINE after 
C_TIMER_POPPED
Feb 27 13:24:35 [2610] resolver-lb3.state.ar.us    pengine:     info: 
update_validation:        Transformation upgrade-1.3.xsl successful
Feb 27 13:24:35 [2610] resolver-lb3.state.ar.us    pengine:     info: 
update_validation:        Transformed the configuration from pacemaker-1.2 to 
pacemaker-2.4
Feb 27 13:24:35 [2610] resolver-lb3.state.ar.us    pengine:     info: 
cli_config_update:        Your configuration was internally updated to the 
latest version (pacemaker-2.4)
Feb 27 13:24:35 [2610] resolver-lb3.state.ar.us    pengine:     info: 
process_pe_message:       Input has not changed since last time, not saving to 
disk
Feb 27 13:24:35 [2610] resolver-lb3.state.ar.us    pengine:   notice: 
unpack_config:    On loss of CCM Quorum: Ignore
Feb 27 13:24:35 [2610] resolver-lb3.state.ar.us    pengine:     info: 
determine_online_status:  Node resolver-lb3 is online
Feb 27 13:24:35 [2610] resolver-lb3.state.ar.us    pengine:     info: 
determine_online_status:  Node resolver-lb4 is online
Feb 27 13:24:35 [2610] resolver-lb3.state.ar.us    pengine:     info: 
native_print:     testIP  (ocf::heartbeat:IPaddr2):       Started resolver-lb3
Feb 27 13:24:35 [2610] resolver-lb3.state.ar.us    pengine:     info: 
native_print:     resolver2IP     (ocf::heartbeat:IPaddr2):       Started 
resolver-lb4
Feb 27 13:24:35 [2610] resolver-lb3.state.ar.us    pengine:     info: 
native_print:     resolver1IP     (ocf::heartbeat:IPaddr2):       Started 
resolver-lb3
Feb 27 13:24:35 [2610] resolver-lb3.state.ar.us    pengine:     info: 
LogActions:       Leave   testIP  (Started resolver-lb3)
Feb 27 13:24:35 [2610] resolver-lb3.state.ar.us    pengine:     info: 
LogActions:       Leave   resolver2IP     (Started resolver-lb4)
Feb 27 13:24:35 [2610] resolver-lb3.state.ar.us    pengine:     info: 
LogActions:       Leave   resolver1IP     (Started resolver-lb3)
Feb 27 13:24:35 [2611] resolver-lb3.state.ar.us       crmd:     info: 
do_state_transition:      State transition S_POLICY_ENGINE -> 
S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE 
origin=handle_response ]
Feb 27 13:24:35 [2611] resolver-lb3.state.ar.us       crmd:     info: 
do_te_invoke:     Processing graph 298 (ref=pe_calc-dc-1488223475-407) derived 
from /var/lib/pacemaker/pengine/pe-input-84.bz2
Feb 27 13:24:35 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
run_graph:        Transition 298 (Complete=0, Pending=0, Fired=0, Skipped=0, 
Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-84.bz2): Complete
Feb 27 13:24:35 [2611] resolver-lb3.state.ar.us       crmd:     info: do_log:   
FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Feb 27 13:24:35 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
do_state_transition:      State transition S_TRANSITION_ENGINE -> S_IDLE [ 
input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Feb 27 13:24:35 [2610] resolver-lb3.state.ar.us    pengine:   notice: 
process_pe_message:       Calculated Transition 298: 
/var/lib/pacemaker/pengine/pe-input-84.bz2
Feb 27 13:25:12 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 1.920000
Feb 27 13:25:12 [2611] resolver-lb3.state.ar.us       crmd:     info: 
throttle_send_command:    New throttle mode: 0100 (was 0000)
Feb 27 13:25:42 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 6.860000
Feb 27 13:26:12 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 10.070000
Feb 27 13:26:42 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 12.400000
Feb 27 13:27:12 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 14.600000
Feb 27 13:27:42 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 16.049999
Feb 27 13:28:12 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 16.870001
Feb 27 13:28:42 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 16.770000
Feb 27 13:29:12 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 17.629999
Feb 27 13:29:42 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 20.320000
Feb 27 13:30:12 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 23.540001
Feb 27 13:30:42 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 25.410000
Feb 27 13:31:12 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 26.480000
Feb 27 13:31:42 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 28.410000
Feb 27 13:32:12 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 33.369999
Feb 27 13:32:42 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 35.020000
Feb 27 13:33:12 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 36.090000
Feb 27 13:33:42 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 36.810001
Feb 27 13:34:12 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 39.369999
Feb 27 13:34:42 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 41.840000
Feb 27 13:35:12 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 42.799999
Feb 27 13:35:42 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 43.220001
Feb 27 13:36:12 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 43.029999
Feb 27 13:36:42 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 44.619999
Feb 27 13:37:12 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 45.529999
Feb 27 13:37:42 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 46.189999
Feb 27 13:38:12 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 47.779999
Feb 27 13:38:42 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 46.630001
Feb 27 13:39:12 [2611] resolver-lb3.state.ar.us       crmd:   notice: 
throttle_handle_load:     High CPU load detected: 45.520000

_______________________________________________
Users mailing list: [email protected]
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
