Are you running as root or as _appserver? The number of file handles is much lower for unprivileged accounts, and you may very well be running out. Try looking at the open connections with lsof.

Pierre
--
Pierre Frisch
[EMAIL PROTECTED]


On Nov 14, 2007, at 0:46, [EMAIL PROTECTED] wrote:

Hi Chuck,

many thanks for your suggestions. There are enough free file handles, so that can't be the problem. We have 15-20 instances per application server on different ports, so that shouldn't be a problem either. And it is possible to continue working after the session timeout. We are using a frameset with a menu in the top frame. If the menu is still visible and the session timeout is displayed in the lower frame (which is what normally happens), we can just click on any item in the menu and continue working as if nothing had happened.

We will try measuring the response times as well, but we don't expect much from it. We get the session timeout immediately after clicking somewhere; there is no waiting time. And it also happens at times without much traffic. But of course we will give it a try. I think it really is some problem with the server configuration. We didn't have these problems for two years or so, but for the last few months we have been getting hundreds of "false" session timeouts a day.

Ingo

On Nov 7, 2007, at 1:47 AM, [EMAIL PROTECTED] wrote:

Hi,

we have had a problem for some months now that we can’t find a solution for.

The situation: We have an application running on four application servers (Linux, with quite a few instances on each), controlled by monitors running on two of those servers (each monitor is responsible for two servers). wotaskd is running on each server as well. Finally, we have two web servers (Apache 2.0.49). We use Java 1.4.2 and WebObjects 5.2.3.

The problem: Several times a day, on each of the instances, we get session timeouts (SessionRestorationErrors). But the sessions don’t actually time out; the requests are sent to the wrong instances. Of course, the session IDs are not known on those instances, so the SessionRestorationErrors occur. What we have done so far: we tried setting the send timeout, receive timeout and connect timeout in “Load Balancing and Adaptor Settings” to values of one minute and above, without any success.

That is the classic solution for this type of problem. I can think of two explanations why it might not be working. The first is that your instances are stalling for longer than one minute. The other is that the problem is at a level below WebObjects.

For the first situation, we can use the apps themselves to diagnose it. Add this to your Application class:

   public WOResponse dispatchRequest(WORequest request)
   {
       // Time each request so that slow responses show up in the log.
       NSTimestamp startTime = new NSTimestamp();
       WOResponse response = super.dispatchRequest(request);
       NSTimestamp stopTime = new NSTimestamp();
       long milliseconds = stopTime.getTime() - startTime.getTime();

       // Comma-separated so the URI and elapsed seconds are easy to extract.
       NSLog.debug.appendln("," + request.uri() + ", - elapsed time: ," + (milliseconds / 1000.0));

       return response;
   }


You can easily grep this out of the log, split it on the commas, and sort by the time to see what the longest lag in returning a response is. If it is over a minute, I would look at:

1. Slow queries / DB contention
2. Excessive garbage collection due to memory starvation
3. Other processes on the machine (a cron job?) taking too many resources

If it is not over a minute, see below.
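
If grep and sort are not handy, the same split-and-sort step could be sketched in Java along these lines. This is only a rough sketch: the class name SlowRequestReport and its methods are made up for illustration, it assumes the log lines look exactly like what the dispatchRequest logging above writes (with no other commas in front of the URI), and it uses generics, so it is meant to be run offline on a copy of the log with a newer JDK, not inside the 1.4.2 app.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of WebObjects: lists the slowest requests in a log.
class SlowRequestReport {
    // Text that the dispatchRequest logging puts between the URI and the time.
    private static final String MARKER = ", - elapsed time: ,";

    /** One parsed timing line: request URI and elapsed seconds. */
    static final class Entry {
        public final String uri;
        public final double seconds;
        Entry(String uri, double seconds) { this.uri = uri; this.seconds = seconds; }
    }

    /** Returns the parsed entry, or null if the line is not a timing line. */
    static Entry parse(String line) {
        int marker = line.indexOf(MARKER);
        int uriStart = line.indexOf(',');   // assumed: first comma precedes the URI
        if (marker < 0 || uriStart >= marker) return null;
        try {
            double seconds = Double.parseDouble(line.substring(marker + MARKER.length()).trim());
            return new Entry(line.substring(uriStart + 1, marker), seconds);
        } catch (NumberFormatException e) {
            return null;                    // time field was not a number
        }
    }

    /** Returns up to n entries, slowest first; non-timing lines are skipped. */
    static List<Entry> slowest(List<String> lines, int n) {
        List<Entry> entries = new ArrayList<Entry>();
        for (String line : lines) {
            Entry e = parse(line);
            if (e != null) entries.add(e);
        }
        Collections.sort(entries, new Comparator<Entry>() {
            public int compare(Entry a, Entry b) { return Double.compare(b.seconds, a.seconds); }
        });
        int count = Math.max(0, Math.min(n, entries.size()));
        return new ArrayList<Entry>(entries.subList(0, count));
    }

    /** Usage: java SlowRequestReport /path/to/instance.log */
    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        try {
            for (String line = in.readLine(); line != null; line = in.readLine()) lines.add(line);
        } finally {
            in.close();
        }
        for (Entry e : slowest(lines, 20)) System.out.println(e.seconds + "s  " + e.uri);
    }
}
```

Anything near the top of that list with a time over 60 seconds would point at the first explanation (instances stalling for longer than the adaptor timeouts).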

We are logging the woadaptor now. It seems we have got some kind of connection trouble:

Error: couldn't connect to 10.0.0.40 (1085): Operation now in progress
Error: Error connecting to server 10.0.0.40
Warn: Unable to find instance 55. Attempting to select another.
Warn: Unable to find instance 55. Attempting to select another.
Warn: Unable to find instance 60. Attempting to select another.

But 10.0.0.40:1085 is up and running. This error message is only thrown about every 10 or 20 minutes, not all the time. We found some similar problems in mailing lists, but none of them was helpful so far. Any suggestions on how we can get rid of this problem? Thanks in advance.

The only other thing I can think of is that you have problems in your network or the app servers are running out of ports / file handles or some similar problem below the level of WebObjects. I have no idea how to debug that.

Chuck


_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-deploy mailing list (Webobjects- [EMAIL PROTECTED])
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/webobjects-deploy/pierre%40apple.com

This email sent to [EMAIL PROTECTED]
