Hi Chuck,

On 10.09.2012 at 21:35, Chuck Hill <[email protected]> wrote:

>>> "WorkerThread207"
>>> That many worker threads indicates two things to me:
>>> 1. Your app configuration is too high. I'd use a max of 6-10 and a listen
>>> queue size of around 4 (adjusted to suit your specific needs). A WO app is
>>> very, very unlikely to recover from a 200 worker thread backlog in any way
>>> that is useful to the users.
>>
>> You may be right, they were at 16/512/8/128. I just set them to 4/8/8/6 and
>> am eager to watch the behaviour tomorrow.
>
> You should at least know when there is a problem sooner. Then, as quickly as
> you can, get a thread dump with jstack.
>
>> There are up to 100 users concurrently (it's a backoffice app), although
>> concurrently running requests are typically not more than 2-3, plus 1-2
>> DirectActions, plus possibly 1-2 long response pages running statistics
>> stuff.
>
> OK, the 4/8/8/6 numbers you have seem reasonable for that load.
>
>>> 2. You have a thread that is taking a long time to return a result. If you
>>> are dispatching requests concurrently, then this is most likely stuck in
>>> EOControl/EOAccess (e.g. waiting for a slow query result) or connecting to
>>> some external process. You could also have a deadlock. If you are not
>>> dispatching requests concurrently, then this delay could be in other code.
>>
>> When that situation occurs, the app is not using CPU any more, neither is
>> the database. It often doesn't respond to SIGTERM any more and needs SIGKILL
>> to terminate so we can restart.
>
> That sounds like what a blocked non-daemon thread would cause.
>
>>> The traces below do not show the problem. If you want to send a full dump,
>>> I am willing to look at it. It is possible that the problem had resolved
>>> by the time you took this dump. What you show below is normal for a lot of
>>> worker threads. WorkerThread206 is waiting for a new request,
>>> WorkerThread207 is idle waiting for something to do in the future.
>>
>> Thanks for the offer; here is the full jstack output:
>> http://akaihi.selbstdenker.com/~maik/jstack_powerd_20120910.txt
>
> Other than having a large number of idle worker threads, there is nothing in
> that trace that indicates the problem. In my experience, that means that
> the problem has resolved itself and the application recovered. You will
> need to run jstack closer to the start of the problem event to capture what
> is going wrong.
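Chuck's points about deadlocks, blocked threads and idle worker threads can also be checked from inside the JVM, in addition to jstack. The following is a minimal sketch, not WebObjects or Wonder API: it uses only the standard `java.lang.management.ThreadMXBean` to count live threads per state for a given name prefix (for example, how many `WorkerThread…` threads are WAITING versus BLOCKED) and to report monitor or `java.util.concurrent` lock deadlocks. The class `ThreadDiagnostics` and its method names are hypothetical, chosen for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper: in-process thread diagnostics using only the JDK.
public class ThreadDiagnostics {

    /** Counts live threads per Thread.State whose name starts with namePrefix. */
    public static Map<Thread.State, Integer> countStates(String namePrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // maxDepth 0: no stack traces needed just to read the state.
        // Entries can be null for threads that exited between the two calls.
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds(), 0)) {
            if (info == null || !info.getThreadName().startsWith(namePrefix)) {
                continue;
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Returns a description of threads deadlocked on monitors or ownable
     * synchronizers (e.g. ReentrantLock), or null if none are found.
     */
    public static String describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        // Include held monitors and synchronizers so the lock cycle is visible.
        for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
            if (info != null) {
                sb.append(info);
            }
        }
        return sb.toString();
    }
}
```

For example, `ThreadDiagnostics.countStates("WorkerThread")` would summarize the worker threads, assuming they keep the `WorkerThread` name prefix seen in the dump. Note that `findDeadlockedThreads()` only detects lock cycles; a thread waiting on a slow database query typically shows up as RUNNABLE inside a socket read, and depending on how a given lock is implemented, a thread waiting on it may appear as WAITING without being reported. A full stack dump is still needed for those cases.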
When I took that jstack, the app was in a state where no login was possible and users' requests would not return, ultimately running into "no instance" responses after the timeout elapsed.

If the problem persists, I think I'll set up a cronjob to record jstacks every couple of minutes or so.

Note that I recently switched to Wonder for this project (using all the Wonder base classes), and since then this problem has occurred more frequently: it now happens almost once a day, compared with about once a week before. In the process I also switched from MultiECLockManager to ERXEC with autolocking.

Maik

_______________________________________________
Webobjects-dev mailing list ([email protected])
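Maik's plan of a cronjob running jstack every few minutes is the more robust option, because it runs outside the affected JVM. As an in-process complement, here is a minimal sketch that writes a timestamped dump of all threads, with full stack traces and lock owners, into a directory on a fixed schedule. `PeriodicThreadDumper`, its method names, the output directory and the interval are all assumptions for illustration, not part of WebObjects or Wonder. If the JVM is wedged badly enough (for example, out of memory), this thread may not get to run, which is exactly the situation where an external jstack still works.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: records jstack-like thread dumps from inside the app.
public class PeriodicThreadDumper {

    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    /** Formats all live threads with full stacks (ThreadInfo.toString() truncates them). */
    public static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringWriter out = new StringWriter();
        PrintWriter pw = new PrintWriter(out);
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            pw.printf("\"%s\" id=%d %s", info.getThreadName(), info.getThreadId(), info.getThreadState());
            if (info.getLockName() != null) {
                pw.print(" on " + info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                pw.print(" owned by \"" + info.getLockOwnerName() + "\"");
            }
            pw.println();
            for (StackTraceElement frame : info.getStackTrace()) {
                pw.println("\tat " + frame);
            }
            pw.println();
        }
        pw.flush();
        return out.toString();
    }

    /** Writes one dump to dir/threads-<timestamp>.txt and returns the file path. */
    public static Path writeDump(Path dir) throws IOException {
        Files.createDirectories(dir);
        Path file = dir.resolve("threads-" + LocalDateTime.now().format(STAMP) + ".txt");
        Files.write(file, dumpAllThreads().getBytes(StandardCharsets.UTF_8));
        return file;
    }

    /** Starts a daemon scheduler that writes a dump now and then every intervalMinutes. */
    public static ScheduledExecutorService start(Path dir, long intervalMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "ThreadDumper");
            t.setDaemon(true); // must not keep the JVM alive on shutdown
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                writeDump(dir);
            } catch (IOException | RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and continue.
                e.printStackTrace();
            }
        }, 0, intervalMinutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

For example, calling `PeriodicThreadDumper.start(Paths.get("/tmp/threaddumps"), 2)` once at application startup (the path is a placeholder) would record a dump every two minutes, so a file from shortly before the hang should be available afterwards.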
