These are running directly on hardware.

On 2013/11/13 10:05 AM, matthewhall wrote:
Running virtualised or direct on hardware?


On 13.11.2013 06:36, Chris Picton wrote:
I have looked for IO spikes, but not found anything to correlate.

Java garbage collection does seem a possible cause.

I will let the list know how things go.

Chris

On 11/13/13 07:51, David Lang wrote:
The other thing I would look for is I/O spikes. Try running iostat for a bit and see if you have a disk getting hammered. If so, you can tweak kernel settings so that less data gets cached before it starts being written out.

David Lang

On Tue, 12 Nov 2013, Leon Towns-von Stauber wrote:

Random-seeming resource usage spikes on a Java application server? I'd suspect garbage collection.

- Leon

On Nov 12, 2013, at 7:57 PM, Chris Picton <[email protected]> wrote:

Hi all

I have a set of servers running Asterisk and some Java apps which have (so far) unexplained spikes in load average.

A typical spike, which occurs at "random" times, sees the 1 minute load average go from around 4 to upwards of 50, sometimes approaching 200, within one second.

From the proc manpage, the 1 minute load average is "number of jobs in the run queue (state R) or waiting for disk I/O (state D) averaged over 1 minute".
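For scale, here is a rough sketch of the arithmetic behind that figure. It assumes the usual Linux behaviour of refreshing the load average roughly every 5 seconds as an exponentially damped moving average; the interval and the function names are assumptions for illustration, not taken from the manpage.

```python
import math

SAMPLE_INTERVAL = 5.0   # assumed seconds between kernel updates (typical for Linux)
WINDOW = 60.0           # averaging window for the 1 minute figure

# Weight given to the previous average at each update.
DECAY = math.exp(-SAMPLE_INTERVAL / WINDOW)


def next_load(current, active):
    """One update step: blend the old average with the current count of R + D tasks."""
    return current * DECAY + active * (1.0 - DECAY)


def active_needed(current, target):
    """Number of R + D tasks a single sample must see to move current to target."""
    return (target - current * DECAY) / (1.0 - DECAY)


if __name__ == "__main__":
    # How many tasks would one sample need to see to jump from 4 to 50?
    print(round(active_needed(4.0, 50.0)))
```

If those assumptions hold, a single update would need to see several hundred tasks in state R or D to move the 1 minute figure from 4 to 50. That would point at a very short burst, which once-a-second sampling of other counters could easily miss.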

I am collecting many different stats from proc every second, but nothing I have found correlates with the spike in load average. The process counts from /proc/stat and /proc/loadavg do not match up with the sudden spike. I have looked at memory paging, IRQs, number of threads, CPU states (intr/iowait/etc), network traffic, disk I/O, etc., but no metric I have found so far changes behaviour at the same time as the load average spikes.

As I am writing this, I have realized that I am not actually tracking the numbers which would be the direct cause of the load average. To do that, I would loop through all processes, extract the process state from /proc/<pid>/stat, and add up the count for each state. This would (hopefully) provide a match, so I could see that the load average numbers are "correct". It may also indicate a cause: many processes waiting for I/O, or lots of the same process (asterisk or java) being scheduled to run at the same time.
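A minimal sketch of that loop, assuming Python 3 on Linux. The function names are made up for illustration; the only format assumption is that the state letter is the first field after the parenthesised command name in /proc/<pid>/stat.

```python
import collections
import os


def state_from_stat(stat_text):
    """Return the one-letter state from the contents of a /proc/<pid>/stat file."""
    # The command name (field 2) is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the LAST ')' rather than on whitespace.
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_states(proc_root="/proc"):
    """Tally process states (R, S, D, ...) across every PID directory."""
    counts = collections.Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        path = os.path.join(proc_root, entry, "stat")
        try:
            with open(path, errors="replace") as f:
                counts[state_from_stat(f.read())] += 1
        except (OSError, IndexError):
            # The process exited between listdir() and open(), or the line was malformed.
            continue
    return counts


if __name__ == "__main__":
    c = count_states()
    print("R=%d D=%d all=%s" % (c["R"], c["D"], dict(c)))
```

This walks processes only. The load average counts individual threads, which appear under /proc/<pid>/task/<tid>/stat, so a heavily threaded java or asterisk process would probably need that extra level of iteration to make the numbers line up.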

While I do that, would anyone have some other idea of how to troubleshoot the cause of very high load spikes?

Regards

Chris

_______________________________________________
Tech mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/