Hi Chuck,

On 10.09.2012 at 21:35, Chuck Hill <[email protected]> wrote:
>>> "WorkerThread207" that many worker threads indicates two things to me:
>>> 1. Your app configuration is too high.  I'd use a max of 6-10 and a listen 
>>> queue size of around 4 (adjusted to suit your specific needs).  A WO app is 
>>> very, very unlikely to recover from a 200 worker thread backlog in any way 
>>> that is useful to the users.
>> 
>> You may be right; they were at 16/512/8/128. I just set them to 4/8/8/6 and 
>> am eager to watch the behaviour tomorrow.
> 
> You should at least know when there is a problem sooner.  Then as quickly as 
> you can, get a thread dump with jstack.
> 
>> 
>> There are up to 100 users concurrently (it's a backoffice app), although 
>> concurrently running requests are typically not more than 2-3, plus 1-2 
>> DirectActions, plus possibly 1-2 long response pages running statistics 
>> stuff.
> 
> OK, the 4/8/8/6 numbers you have seem reasonable for that load.
> 
> 
>>> 2. You have a thread that is taking a long time to return a result.  If you 
>>> are dispatching requests concurrently, then this is most likely stuck in 
>>> EOControl/EOAccess (e.g. waiting for a slow query result) or connecting to 
>>> some external process.  You could also have a deadlock.  If you are not 
>>> dispatching requests concurrently, then this delay could be in other code.
>> 
>> When that situation occurs, the app is not using CPU any more, neither is 
>> the database. It often doesn't respond to SIGTERM any more and needs SIGKILL 
>> to terminate so we can restart.
> 
> That sounds like what a blocked non-daemon thread would cause.
> 
> 
>>> The traces below do not show the problem.  If you want to send a full dump, 
>>> I am willing to look at it.  It is possible that the problem had resolved 
>>> itself by the time you took this dump.  What you show below is normal for a 
>>> lot of worker threads.  WorkerThread206 is waiting for a new request; 
>>> WorkerThread207 is idle, waiting for something to do in the future.
>> 
>> Thanks for the offer; here is the full jstack output:
>> http://akaihi.selbstdenker.com/~maik/jstack_powerd_20120910.txt
> 
> Other than having a large number of idle worker threads, there is nothing in 
> that trace that indicates the problem.  In my experience, that means that 
> the problem has resolved itself and the application recovered.  You will 
> need to run jstack closer to the start of the problem to capture what is 
> going wrong.

When I took that jstack, the app was in a state where no login was possible 
and users' requests would not return, ultimately running into "no instance" 
responses after the timeout elapsed.

If the problem persists, I think I'll set up a cronjob to record jstacks every 
couple of minutes or so.
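
As an alternative to an external cronjob running jstack, the JVM can sample its own threads through java.lang.management. The following is only a minimal sketch, not anything from Wonder or WebObjects: the class name ThreadDumpSampler and the two-minute interval are hypothetical, and it assumes a HotSpot-style JVM. It prints a full stack for every live thread, including what lock each thread is waiting on and who holds it, plus the ids of any deadlocked threads, which would help with the deadlock case Chuck mentions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Date;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: periodically records jstack-like thread dumps from inside the JVM.
public class ThreadDumpSampler {
    private static final ThreadMXBean MX = ManagementFactory.getThreadMXBean();

    // Builds one snapshot of all live threads, with full stack traces and lock info.
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        sb.append("=== Thread dump at ").append(new Date()).append(" ===\n");
        for (ThreadInfo info : MX.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ").append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" waiting on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                sb.append(" held by \"").append(info.getLockOwnerName()).append('"');
            }
            sb.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        // findDeadlockedThreads also covers java.util.concurrent locks; fall back to
        // monitor-only detection if the JVM does not support synchronizer monitoring.
        long[] deadlocked = MX.isSynchronizerUsageSupported()
                ? MX.findDeadlockedThreads()
                : MX.findMonitorDeadlockedThreads();
        if (deadlocked != null) { // null means no deadlock was found
            sb.append("!!! Deadlocked thread ids:");
            for (long id : deadlocked) {
                sb.append(' ').append(id);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Prints a snapshot to stderr every intervalMinutes on a daemon thread,
    // so the sampler itself never keeps the JVM from exiting.
    static ScheduledExecutorService start(long intervalMinutes) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(new ThreadFactory() {
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "ThreadDumpSampler");
                t.setDaemon(true);
                return t;
            }
        });
        ses.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try {
                    System.err.print(snapshot());
                } catch (RuntimeException e) {
                    // An uncaught exception would silently cancel all future runs.
                    e.printStackTrace();
                }
            }
        }, 0, intervalMinutes, TimeUnit.MINUTES);
        return ses;
    }

    public static void main(String[] args) {
        // In the application you would call start(2) once at startup; here we print one snapshot.
        System.out.print(snapshot());
    }
}
```

The code avoids lambdas and java.time so it should also compile on the older JVMs WebObjects apps often run on. One caveat: an in-process sampler shares the fate of the JVM, so if the whole process is wedged it may stop reporting too. In that case the external jstack cronjob is the more robust option.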

Note that I recently switched to Wonder for this project (using all the Wonder 
base classes), and since then this problem has occurred more frequently: it is 
now almost once a day, compared to about once a week before. I switched from 
MultiECLockManager to ERXEC with autolocking in the process.

Maik


 _______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list      ([email protected])
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com
