>>> "Lentes, Bernd" <bernd.len...@helmholtz-muenchen.de> wrote on 26.07.2022 at 21:36 in message <1994685463.141245271.1658864186207.javamail.zim...@helmholtz-muenchen.de>:
>
> ----- On 26 Jul, 2022, at 20:06, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
>
>> Hi Bernd!
>>
>> I think the answer may be some time before the timeout was reported; maybe a
>> network issue? Or a very high load. It's hard to say from the logs...
>
> Yes, I had a high load before:
> Jul 20 00:17:42 [32512] ha-idg-1 crmd: notice: throttle_check_thresholds: High CPU load detected: 90.080002
> Jul 20 00:18:12 [32512] ha-idg-1 crmd: notice: throttle_check_thresholds: High CPU load detected: 76.169998
> Jul 20 00:18:42 [32512] ha-idg-1 crmd: notice: throttle_check_thresholds: High CPU load detected: 85.629997
> Jul 20 00:19:12 [32512] ha-idg-1 crmd: notice: throttle_check_thresholds: High CPU load detected: 70.660004
> Jul 20 00:19:42 [32512] ha-idg-1 crmd: notice: throttle_check_thresholds: High CPU load detected: 58.340000
> Jul 20 00:20:12 [32512] ha-idg-1 crmd: info: throttle_check_thresholds: Moderate CPU load detected: 48.740002
> Jul 20 00:20:12 [32512] ha-idg-1 crmd: info: throttle_send_command: New throttle mode: 0010 (was 0100)
> Jul 20 00:20:42 [32512] ha-idg-1 crmd: info: throttle_check_thresholds: Moderate CPU load detected: 41.889999
> Jul 20 00:21:12 [32512] ha-idg-1 crmd: info: throttle_send_command: New throttle mode: 0001 (was 0010)
> Jul 20 00:21:56 [12204] ha-idg-1 lrmd: warning: child_timeout_callback: dlm_monitor_30000 process (PID 11816) timed out
> Jul 20 00:21:56 [12204] ha-idg-1 lrmd: warning: operation_finished: dlm_monitor_30000:11816 - timed out after 20000ms
> Jul 20 00:21:56 [32512] ha-idg-1 crmd: error: process_lrm_event: Result of monitor operation for dlm on ha-idg-1: Timed Out | call=1255 key=dlm_monitor_30000 timeout=20000ms
> Jul 20 00:21:56 [32512] ha-idg-1 crmd: info: exec_alert_list: Sending resource alert via smtp_alert to informatic....@helmholtz-muenchen.de
> Jul 20 00:21:56 [12204] ha-idg-1 lrmd: info: process_lrmd_alert_exec: Executing alert smtp_alert for 8f934e90-12f5-4bad-b4f4-55ac933f01c6
>
> Can that interfere with DLM ?

It depends ;-)
If the CPU load is mostly user load, then (also depending on the number of CPUs
you have) probably not, but if the load is I/O or system load, it could affect
any pacemaker process in a bad way.
I think you'll have to analyze your load, and maybe adjust the timeouts (the
dlm monitor in your log timed out after 20000ms).

You could use monit to examine your system load (this is just some idle VM):

  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  load average                 [0.00] [0.00] [0.00]
  cpu                          0.2%usr 0.1%sys 0.0%nice 0.0%iowait 0.0%hardirq 0.0%softirq 0.0%steal 0.0%guest 0.0%guestnice
  memory usage                 442.1 MB [22.3%]
  swap usage                   20.5 MB [1.0%]
  uptime                       13d 17h 41m
  boot time                    Thu, 14 Jul 2022 17:40:58
  filedescriptors              1376 [0.7% of 198048 limit]
  data collected               Thu, 28 Jul 2022 11:20:41

You could configure action scripts like this:

  if loadavg (1min) per core > 4 then exec "/var/lib/monit/log-top.sh"
  if loadavg (5min) per core > 2 then exec "/var/lib/monit/log-top.sh"
  if loadavg (15min) per core > 1 then exec "/var/lib/monit/log-top.sh"
  if memory usage > 90% for 2 cycles then exec "/var/lib/monit/log-top.sh"
  if swap usage > 25% for 2 cycles then exec "/var/lib/monit/log-top.sh"
  if swap usage > 50% then exec "/var/lib/monit/log-top.sh"
  if cpu usage (system) > 20% for 3 cycles then exec "/var/lib/monit/log-top.sh"
  if cpu usage (wait) > 80% then exec "/var/lib/monit/log-top.sh"

A possible script could be (this mess was created by myself):

  #!/bin/sh
  # Print a section header, then run the remaining arguments as a command.
  sect()
  {
      echo "--- $1 ---"
      shift
      eval "$@"
  }

  # Append one timestamped snapshot to the log file.
  {
      echo "========== $(/bin/date) =========="
      # The MONIT_* variables that monit exports to the executed program
      sect 'MONIT env' 'env | grep ^MONIT_'
      sect 'mpstat' /usr/bin/mpstat
      sect 'vmstat' /usr/bin/vmstat
      # One batch-mode iteration, showing threads and hiding idle tasks
      sect 'top' /usr/bin/top -b -n 1 -Hi
  } >> /var/log/monit/top.log

Regards,
Ulrich

>
> Bernd
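
P.S. To get a feeling for what the "loadavg ... per core" rules above compare against, here is a minimal sketch that does the same arithmetic by hand. It rests on assumptions: Linux with /proc/loadavg, "per core" meaning the load average divided by the number of online CPUs, and the thresholds 4, 2 and 1 taken from the rules above. The helper names per_core_load and exceeds are made up for this illustration and are not part of monit or pacemaker.

```shell
#!/bin/sh
# Hypothetical helper: divide a load average by the number of cores.
per_core_load() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Hypothetical helper: exit status 0 if the first number is greater than the second.
exceeds() {
    awk -v value="$1" -v limit="$2" 'BEGIN { exit !(value > limit) }'
}

# Assumption: Linux, where /proc/loadavg starts with the 1, 5 and 15 minute averages.
if [ -r /proc/loadavg ]; then
    read -r load1 load5 load15 rest < /proc/loadavg
    # Number of online CPUs; fall back to 1 if getconf cannot tell us.
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    # Each entry is "label load threshold"; thresholds mirror the rules in the mail.
    for entry in "1min $load1 4" "5min $load5 2" "15min $load15 1"; do
        set -- $entry
        per_core=$(per_core_load "$2" "$cores")
        if exceeds "$per_core" "$3"; then
            state="above"
        else
            state="within"
        fi
        echo "loadavg ($1) per core: $per_core ($state threshold $3)"
    done
fi
```

Running it a few times while the node is busy should show whether the thresholds fit the machine before they go into the monit configuration; a load of 90 only means trouble relative to the number of CPUs it is spread over.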