Hi all

I have a set of servers running asterisk and some java apps which have (so far) unexplained spikes in load average.

A typical spike, which occurs at "random" times, sees the 1-minute load average go from around 4 to upwards of 50, sometimes approaching 200, within one second.

From the proc manpage, the 1-minute load average is the "number of jobs in the run queue (state R) or waiting for disk I/O (state D) averaged over 1 minute".

I am collecting many different stats from /proc every second, but nothing I have found correlates with the spikes in load average. The process counts from /proc/stat and /proc/loadavg do not match up with the sudden spike. I have looked at memory paging, IRQs, number of threads, CPU states (intr/iowait/etc.), network traffic, disk I/O, etc., but no metric I have found so far changes behaviour at the same time the load average spikes.

As I am writing this, I have realized that I am not actually tracking the numbers which are the direct input to the load average. To do that, I would loop through all processes, extract the process state from /proc/<pid>/stat, and add up the counts per state. This would (hopefully) let me confirm that the load average numbers are "correct", and may also point to a cause (many processes waiting on I/O, or lots of the same process (asterisk or java) becoming runnable at the same time).
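
Something like the following is what I have in mind: a minimal sketch in Python (assuming python3 is available on the boxes). It walks /proc/<pid>/task/<tid>/stat rather than just /proc/<pid>/stat, since as far as I understand the kernel counts individual tasks (threads) in states R and D, not just top-level processes.

#!/usr/bin/env python3
# Count tasks in state R (runnable) and D (uninterruptible sleep, usually
# waiting on I/O) by walking /proc/<pid>/task/<tid>/stat, and print the
# counts alongside /proc/loadavg once per second for comparison.
import glob
import time

def count_states():
    counts = {"R": 0, "D": 0}
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                data = f.read()
        except OSError:
            continue  # task exited between listing and reading
        # The state field follows the command name, which is wrapped in
        # parentheses and may itself contain spaces, so split on the
        # last ')' before taking the first whitespace-separated field.
        state = data.rsplit(")", 1)[1].split()[0]
        if state in counts:
            counts[state] += 1
    return counts

if __name__ == "__main__":
    while True:
        c = count_states()
        with open("/proc/loadavg") as f:
            loadavg = f.read().strip()
        print(time.strftime("%H:%M:%S"),
              "R=%d D=%d" % (c["R"], c["D"]),
              "loadavg:", loadavg)
        time.sleep(1)

Running that alongside the existing per-second collection should at least show whether it is the R count or the D count that jumps when the load average does.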

While I do that, does anyone have other ideas on how to troubleshoot the cause of these very high load spikes?

Regards

Chris
