[sniffer] Re: FW: [sniffer] Re: Sniffer 3.0 Froze Mail Server

Pete McNeil Sat, 04 Oct 2008 19:51:13 -0700

Hello Andy,

Saturday, October 4, 2008, 10:21:31 PM, you wrote:

Hi Pete,

Well, I eliminated WeightGate for the time being, just to do my “due diligence”.

Also, since there is a fixed-size buffer, I assume that actually LOWERING the 3rd number (the allocation for each non-interactive process) would allow MORE parallel processes to run (as long as the value is still large enough to support each of the applications that rely on it).

Of course, I assume the “heap” issue is in reality a SECONDARY problem (a symptom of too many non-interactive tasks being launched and not completing). Since the ‘heap’ space is finite, there is a hard limit on how many processes can be in a wait state at the same time. The problem to focus on is not the known, limited heap, but rather the reason why these processes were unable to complete, which eventually left too many processes active.

Indeed. Eliminating WeightGate might help with this because it means one fewer process launched per message.

I just did a search of errors in the SNF logs and didn't find anything unusual.

I was unable to pinpoint the time of the problem -- that will require a deeper analysis of the data. Indications are that SNFServer didn't see any significant issues during the period covered by the two logs you sent. When clients talked to it they were served (according to the logs).

You're showing about 40 msg/minute on average.

According to a spot check of log entries, SNFServer finishes processing these in an unmeasurably short time (0 indicates < 15 ms across setup, read, scan, and response). Most of the log's performance metrics (the <p/> element) show s='0' and t='0' -- setup time in ms and scan time in ms, respectively.

On occasion I see some nonzero t values - but nothing unusual (16, 47, 63, etc).

You probably don't need a lot of threads active on your system. If you have provided for a high number then you might consider reducing that number... Processing 1 message per second would exceed your average handily and doesn't take a lot of threads.
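As a rough back-of-the-envelope check on that (a sketch, not a measurement): by Little's law, the average number of messages in work at once is the arrival rate multiplied by the time each message spends in processing. The 40 msg/minute figure is the average from your logs; the per-message times below are assumed values for illustration only, since the end-to-end time a message spends in your whole mail pipeline isn't something the SNF logs show.

```python
def avg_in_flight(msgs_per_minute: float, seconds_per_msg: float) -> float:
    """Little's law: average concurrency = arrival rate * time in system."""
    return (msgs_per_minute / 60.0) * seconds_per_msg


RATE = 40.0  # msg/minute, the average seen in the logs

if __name__ == "__main__":
    # Hypothetical end-to-end processing times per message (assumptions).
    for seconds in (0.5, 1.0, 5.0):
        n = avg_in_flight(RATE, seconds)
        print(f"{seconds:>4} s/msg -> {n:.2f} messages in work on average")
```

Even with a generous assumed per-message time, the average number of messages in work stays small, which is why a modest thread count should be enough for the average load. Bursts are a separate matter.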

If for some reason you were hit with a large number of messages and they were all put in work in parallel, then that might have exhausted the heap.

The new SNF is much more efficient than the old one and so it would have more easily allowed this... Sometimes introducing a more efficient component into a system exposes problems that were hidden by the previous less efficient component -- the less efficient component may have masked the problem by artificially reducing or shaping throughput. When we see this kind of thing we call it a "lens effect" -- the newer component reshapes the dynamics of the system and brings previously unknown problems "into focus".

It's possible the heap problem you experienced was caused by a "lens effect" since the new SNF engine is more efficient and would naturally allow for more messages to be handled concurrently in a burst than the previous version would have allowed.

A theory -- the previous version would naturally be constrained by I/O contention since it would need to create, scan, modify, and remove job control files. This would naturally couple performance to other I/O intensive operations such as writing new messages to the spool etc. The new version does not have any of this overhead and so would allow for an unconstrained ramp-up of new instances -- that might lead to a higher number of concurrent tasks and cause heap exhaustion. Once the heap is exhausted, additional tasks build up in a failed, partially initialized state. This typically continues until the failed tasks are manually removed -- since none of them is ever properly initialized, none of them can time out, fail, or shut down on its own.
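To illustrate the general idea of putting an explicit cap back where the old I/O contention used to act as an accidental one, here is a minimal Python sketch. It is not how SNF or your mail server launches tasks; the function name run_burst, the worker pool size, and the sleep that stands in for per-message work are all hypothetical. It only shows that a counting semaphore keeps the peak number of tasks in work at or below a chosen limit, even when a burst arrives all at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_burst(n_messages: int, max_in_work: int, work_seconds: float = 0.01) -> int:
    """Process a burst of messages; return the peak number in work at once."""
    gate = threading.BoundedSemaphore(max_in_work)  # the explicit cap
    lock = threading.Lock()
    in_work = 0
    peak = 0

    def handle(_msg: int) -> None:
        nonlocal in_work, peak
        with gate:  # wait here rather than piling up unbounded work
            with lock:
                in_work += 1
                peak = max(peak, in_work)
            time.sleep(work_seconds)  # stand-in for the real per-message work
            with lock:
                in_work -= 1

    # The pool is deliberately larger than the cap, so the semaphore
    # (not the pool size) is what limits concurrency in this sketch.
    with ThreadPoolExecutor(max_workers=32) as pool:
        list(pool.map(handle, range(n_messages)))
    return peak


if __name__ == "__main__":
    print(run_burst(n_messages=200, max_in_work=4))
```

In practice the equivalent knob is whatever thread or process limit your mail server exposes; the point is only that some deliberate limit should exist now that the old incidental one is gone.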

Hope this helps,

Pete McNeil

Chief Scientist,

Arm Research Labs, LLC.
