Hi Ingo,
On Nov 14, 2007, at 12:46 AM, [EMAIL PROTECTED] wrote:
Hi Chuck,
many thanks for your suggestions. There are enough free file
handles, so that can't be the problem. We have 15-20 instances per
application server, each on a different port, so that shouldn't be
a problem either. And it is possible to continue working after the
session timeout. We are using a frameset with a menu in the top
frame. If the menu is still visible and the connection timeout is
displayed in the lower frame (which is what normally happens), we
can just click on any item in the menu and continue working as if
nothing had happened.
We will try measuring response times as well, but we don't expect
much from it. We get the session timeout immediately after
clicking somewhere; there is no waiting time.
That sounds odd to me. I am not sure why that would be. Is it
possible that the previous request for that instance timed out and
caused wotaskd to mark it as unresponsive? If so, the next request
would result in the session timeout message appearing immediately.
The only other thing I can think of is that it might be network
related. When the app does this, take a look in top. How many
threads does each instance have?
And it happens in times without much traffic as well.
That is what makes me think it might be something on those machines
external to your applications.
But of course we will give it a try. I think it really is some
problem with the server configuration. We didn't have these problems
for about two years, but for the last few months we have been getting
hundreds of "false" session timeouts a day.
Has anything at all changed on the machines or network just before it
started to happen?
Chuck
On Nov 7, 2007, at 1:47 AM, [EMAIL PROTECTED] wrote:
Hi,
we have had a problem for some months now that we can’t find a
solution for.
The situation: We have an application that is running on four
different application servers (with quite a few instances on each
server; the servers run Linux), controlled by monitors running
on two of those servers (each monitor is responsible for 2
servers). wotaskd is running on each server as well. Finally,
we have two web servers (Apache 2.0.49). We use Java 1.4.2 and
WebObjects 5.2.3.
The problem: Several times a day on each of the instances we get
session timeouts (SessionRestorationErrors). But the sessions
have not actually timed out; the requests are being routed to the
wrong instances. Of course, the session IDs are not known on those
instances, so the SessionRestorationErrors occur.
What we have done so far: we tried setting send timeout, receive
timeout and connect timeout in “Load Balancing and Adaptor
Settings” to values of one minute and above without any success.
That is the classic solution for this type of problem. I can
think of two explanations why it might not be working. The first
is that your instances are stalling for longer than one minute.
The other is that the problem is at a level below WebObjects.
For the first situation, we can use the apps to diagnose it. Add
this to your Application class:

    // Times every request and logs the URI and elapsed seconds as
    // comma-separated fields so the lines are easy to grep and sort.
    public WOResponse dispatchRequest(WORequest request)
    {
        NSTimestamp startTime = new NSTimestamp();
        WOResponse response = super.dispatchRequest(request);
        NSTimestamp stopTime = new NSTimestamp();
        long milliseconds = stopTime.getTime() - startTime.getTime();
        NSLog.debug.appendln("," + request.uri() + ", - elapsed time: ,"
                             + (milliseconds / 1000.0));
        return response;
    }

You can easily grep this out of the log, separate it by commas,
and sort by the time to see what the longest lag in returning a
response is. If it is over a minute, I would look at:
1. Slow queries / DB contention
2. Excessive garbage collection due to memory starvation
3. Other processes on the machine (a cron job?) taking too many
resources
If it is not over a minute, see below.
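
If grep and sort are awkward on your setup, the same report can be sketched in Java. This is only an illustration: the class name SlowRequestReport and its methods are made up, and it assumes you run it with Java 8 or later on whatever machine you analyse the logs on (not the 1.4.2 the apps use). The only thing it takes from the logging code above is the ", - elapsed time: ," marker; it also assumes the request URI itself contains no commas.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SlowRequestReport {

    /** One timed request parsed from the log. */
    public static final class Entry {
        public final String uri;
        public final double seconds;

        Entry(String uri, double seconds) {
            this.uri = uri;
            this.seconds = seconds;
        }
    }

    // Marker written by the dispatchRequest logging line.
    private static final String MARKER = ", - elapsed time: ,";

    /** Parses one log line; returns null if it is not a timing line. */
    public static Entry parse(String line) {
        int marker = line.indexOf(MARKER);
        if (marker < 0) {
            return null;
        }
        // The URI sits between the comma that precedes it and the marker.
        int uriStart = line.lastIndexOf(',', marker - 1) + 1;
        String uri = line.substring(uriStart, marker).trim();
        try {
            String number = line.substring(marker + MARKER.length()).trim();
            return new Entry(uri, Double.parseDouble(number));
        } catch (NumberFormatException e) {
            return null; // malformed or truncated line
        }
    }

    /** Returns the timing entries sorted slowest first, at most limit of them. */
    public static List<Entry> slowest(List<String> logLines, int limit) {
        List<Entry> entries = new ArrayList<>();
        for (String line : logLines) {
            Entry entry = parse(line);
            if (entry != null) {
                entries.add(entry);
            }
        }
        entries.sort(Comparator.comparingDouble((Entry e) -> e.seconds).reversed());
        return entries.subList(0, Math.min(limit, entries.size()));
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java SlowRequestReport <logfile>");
            return;
        }
        // ISO-8859-1 accepts any byte, so odd characters in the log cannot abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        for (Entry e : slowest(lines, 20)) {
            System.out.println(e.seconds + " s  " + e.uri);
        }
    }
}
```

Run it against one instance's log file at a time; any entry near or above your adaptor timeouts is a candidate for the three causes listed above.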
We are logging the WebObjects adaptor now. It seems we have some
kind of connection trouble:
Error: couldn't connect to 10.0.0.40 (1085): Operation now in
progress
Error: Error connecting to server 10.0.0.40
Warn: Unable to find instance 55. Attempting to select another.
Warn: Unable to find instance 55. Attempting to select another.
Warn: Unable to find instance 60. Attempting to select another.
But 10.0.0.40:1085 is up and running. This error message is only
thrown about every 10 or 20 minutes, not all the time.
We found some similar problems in mailing lists but none was
helpful so far. Any suggestions on how we can get rid of this
problem? Thanks in advance.
The only other thing I can think of is that you have problems in
your network or the app servers are running out of ports / file
handles or some similar problem below the level of WebObjects. I
have no idea how to debug that.
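
One crude starting point, though, might be a small probe run from the web server machine that repeatedly opens a plain TCP connection to the instance port named in the adaptor log and reports any attempt that fails or is slow. "Operation now in progress" is the system's message for EINPROGRESS, which I read (as an assumption, not something I have verified in the adaptor source) as a connect that did not complete within the connect timeout. The class name ConnectProbe, the 3 second timeout, the 500 ms threshold and the one second interval are all arbitrary choices for illustration, and it assumes Java 7 or later on the machine running it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Date;

public class ConnectProbe {

    /**
     * Tries one TCP connect and returns the elapsed milliseconds,
     * or -1 if the connect failed or timed out.
     */
    public static long probe(String host, int port, int timeoutMillis) {
        long start = System.currentTimeMillis();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return System.currentTimeMillis() - start;
        } catch (IOException e) {
            return -1; // refused, unreachable, or timed out
        }
    }

    public static void main(String[] args) throws InterruptedException {
        if (args.length < 2) {
            System.err.println("usage: java ConnectProbe <host> <port> [attempts]");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        int attempts = args.length > 2 ? Integer.parseInt(args[2]) : 3600;

        for (int i = 0; i < attempts; i++) {
            long elapsed = probe(host, port, 3000);
            // Only report failures and connects slower than 500 ms.
            if (elapsed < 0 || elapsed > 500) {
                String result = elapsed < 0 ? "FAILED" : elapsed + " ms";
                System.out.println(new Date() + " " + host + ":" + port + " -> " + result);
            }
            Thread.sleep(1000);
        }
    }
}
```

For example, "java ConnectProbe 10.0.0.40 1085" would watch the instance from the log above for about an hour. If the probe fails at the same moments the adaptor logs its errors, the problem is more likely in the network or the operating system than in WebObjects; if it never fails, that points back at the adaptor or wotaskd. Note that each probe opens and immediately closes a connection to the instance, so keep the interval modest.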
Chuck
--
Practical WebObjects - for developers who want to increase their
overall knowledge of WebObjects or who are trying to solve specific
problems.
http://www.global-village.net/products/practical_webobjects