Hi Pete,

Well, I eliminated WeightGate for the time being, just to do my "due
diligence".

Also, since there is a fixed-size buffer, I assume that actually LOWERING the
3rd number (the allocation for each non-interactive process) would allow
MORE parallel processes to run (as long as the value is still large enough
to support each of the applications that rely on it).
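
To illustrate the arithmetic, here is a rough sketch. The total buffer size
and the per-process values are made-up numbers, not settings read from this
server, and max_allocations is just a name chosen for the example.

```python
def max_allocations(total_kb: int, per_process_kb: int) -> int:
    """Return how many fixed-size allocations fit in a fixed-size buffer."""
    if per_process_kb <= 0:
        raise ValueError("per-process allocation must be positive")
    return total_kb // per_process_kb


# Hypothetical figures for illustration only (assumed, not measured).
TOTAL_KB = 48 * 1024

for per_process_kb in (512, 256, 128):
    # Halving the per-process allocation doubles how many fit.
    print(per_process_kb, "KB each ->", max_allocations(TOTAL_KB, per_process_kb))
```

The smaller value only helps if every application still fits inside its
reduced allocation.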

Of course, I assume the "heap" issue is in reality a SECONDARY problem (a
symptom of too many non-interactive tasks being launched and not
completing). Since the 'heap' space is finite, there is a hard limit on
how many processes can be in a wait state at the same time. The problem to
focus on is not the known, limited heap, but rather the reason why these
processes were unable to complete, which eventually left too many processes
active.
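
A back-of-the-envelope sketch of why non-completing processes are the real
issue: the number of processes alive at once is roughly the launch rate times
how long each one lives. The rates, lifetimes, and limit below are
hypothetical, and both function names are invented for the example.

```python
def steady_state_processes(arrivals_per_min: float, lifetime_sec: float) -> float:
    """Average number of processes alive at once (rate times lifetime)."""
    return arrivals_per_min / 60.0 * lifetime_sec


def minutes_until_limit(limit: int, arrivals_per_min: float) -> float:
    """Time to reach a hard limit if launched processes never complete."""
    return limit / arrivals_per_min


# Hypothetical load: 60 launches per minute.
print(steady_state_processes(60, 2))    # each process finishes in 2 seconds
print(steady_state_processes(60, 30))   # each process waits 30 seconds
print(minutes_until_limit(96, 60))      # nothing completes, limit of 96
```

As long as processes finish, the count levels off. Once they stop
completing, any finite limit is reached no matter how it is tuned.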

Best Regards,
Andy

From: Pete McNeil [mailto:[EMAIL PROTECTED] 
Sent: Saturday, October 04, 2008 10:07 PM
To: Andy Schmidt
Cc: [EMAIL PROTECTED]
Subject: Re: FW: [sniffer] Re: Sniffer 3.0 Froze Mail Server

 

Hello Andy,

 

Saturday, October 4, 2008, 9:22:39 PM, you wrote:

Hi Pete,

Here are the log files.

I can't tell you WHEN the problem was triggered. I was off site and was
alerted around noon that the SMTP service had become unresponsive. I assumed
it had crashed, but found it running. Thus I tried restarting the SMTP
service, but after shutting down, it wouldn't allow me to restart. That's
when I started looking a bit more closely.

Once I realized that I had all these SNFclient processes running, I checked
the event log to see if it would give me any clue. But since the errors had
been occurring for a while, my system event log had wrapped around, so I
couldn't tell when it actually started or how long it took between the
actual problem and the SMTP service becoming unresponsive.

This Imail server is a PowerEdge 2950, Quad CPU, 3GHz.

It has 2 GB of RAM and normally uses about 1.5 GB of virtual RAM; on
weekends, CPU load is usually below 10%.

When this was going on, I didn't pay close attention because I wasn't quite
sure yet what was going on and was trying to figure out how to get out of
it. But, based on the memory use graph, I would guess it had maxed out 4 GB
of virtual RAM, which eventually starved the SMTP service and prevented it
from accepting more connections. As soon as I flushed the command line
programs, the memory curve dropped very sharply by easily half.

Sorry - don't have anything more specific.

 

 

I've been watching your telemetry and I don't think the problem was
triggered by an ordinary overload. Your message rate is not high enough to
cause that -- SNFClients will only wait about 30 seconds or so at most if
they are unable to make contact -- even on the busiest systems.

 

The other thing that strikes me is that you had to kill a lot of
imailsrv.exe instances as well -- this is new and very different.

 

Once the "mystery heap" was exhausted I would expect SNFClient instances to
build up in a broken state (0x0000142) but there is no good reason for
imailsrv instances to build up that I can think of -- except maybe some kind
of list processing event? (IIRC, imailsrv is called to handle list
processing requests through an alias -- it's been a while).

 

I will check the SNF log to see if I can identify anything useful.

 

Thanks,

 

_M

 

-- 

Pete McNeil

Chief Scientist,

Arm Research Labs, LLC.
