Hi.

Thank you for all the very detailed information provided.

From what I can see in the logs, my impression at this point is that this problem is buried fairly deep in the TCP/IP stack, and that both Apache+mod_proxy_ajp and Tomcat may just be suffering the consequences of an underlying TCP/IP issue (or of a Windows NLB "feature").

In the logs, you have messages like:

java.net.SocketException: Software caused connection abort: socket write error

which comes from the JVM running Tomcat (and probably even from native code inside the JVM).

Similarly, messages in Apache httpd's logs like

[Tue May 29 15:29:43 2012] [error] (OS 10060)A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. : ajp_ilink_receive() can't receive header
[Tue May 29 15:29:43 2012] [error] ajp_read_header: ajp_ilink_receive failed
[Tue May 29 15:29:43 2012] [error] (70007)The timeout specified has expired: proxy: dialog to 10.11.102.223:9109 (10.11.102.223) failed

look to me like OS-level error conditions that Apache simply passes on to its log (at least the "(OS 10060)" prefix looks like a Windows error code).
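To make the two kinds of numeric prefix easier to tell apart, here is a small sketch that classifies them. It assumes httpd's usual convention of printing OS-level errors as "(OS n)" and APR's own status codes as a plain "(n)", with APR status codes starting at 70000 (so 70007 would be APR_TIMEUP); the class name and the short Winsock table are purely illustrative. Incidentally, Winsock error 10053 (WSAECONNABORTED) is the Windows error whose standard text is "Software caused connection abort", the same wording as in the Java exception quoted earlier.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: explains the "(...)" code prefix in an httpd error log line.
public final class HttpdErrorPrefix {
    // Assumption (APR's apr_errno.h): APR-specific status codes occupy 70000..119999.
    private static final int APR_OS_START_STATUS = 70000;
    private static final int APR_OS_ERRSPACE_SIZE = 50000;
    private static final int APR_TIMEUP = APR_OS_START_STATUS + 7;

    // A few Winsock error codes relevant to these logs (WSABASEERR is 10000).
    private static final Map<Integer, String> WINSOCK = Map.of(
            10053, "WSAECONNABORTED (software caused connection abort)",
            10054, "WSAECONNRESET (connection reset by peer)",
            10060, "WSAETIMEDOUT (connection timed out)");

    // Matches "(OS 10060)" or "(70007)"; "(10.11.102.223)" does not match because of the dots.
    private static final Pattern PREFIX = Pattern.compile("\\((OS )?(\\d+)\\)");

    public static String classify(String logLine) {
        Matcher m = PREFIX.matcher(logLine);
        if (!m.find()) {
            return "no numeric error prefix";
        }
        int code = Integer.parseInt(m.group(2));
        if (m.group(1) != null) {
            // "(OS n)": an operating-system error code, here looked up as a Winsock error
            return "OS error " + code + ": " + WINSOCK.getOrDefault(code, "not in this table");
        }
        if (code >= APR_OS_START_STATUS && code < APR_OS_START_STATUS + APR_OS_ERRSPACE_SIZE) {
            // plain "(n)" in the APR status range
            return "APR status " + code + (code == APR_TIMEUP ? ": APR_TIMEUP" : "");
        }
        return "other status code " + code;
    }

    public static void main(String[] args) {
        // prints "OS error 10060: WSAETIMEDOUT (connection timed out)"
        System.out.println(classify("[error] (OS 10060)A connection attempt failed ..."));
        // prints "APR status 70007: APR_TIMEUP"
        System.out.println(classify("[error] (70007)The timeout specified has expired: proxy: dialog to ..."));
        // prints "no numeric error prefix"
        System.out.println(classify("[error] ajp_read_header: ajp_ilink_receive failed"));
    }
}
```

Read this way, both httpd messages are timeouts: one reported by Windows itself, one by APR while waiting on the AJP socket.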

I've read a bit about Windows NLB (just now, to find out what it is), and it seems to me that there is at least /a possibility/ that combining it with another kind of load balancing (as you do with mod_proxy_ajp) may not be the most stable configuration. From the logs, it really looks as if both Apache and Tomcat occasionally find themselves with a connection that has suddenly disappeared: ping packets are not answered, and/or a socket suddenly becomes unresponsive on read or write.
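If you want to check whether AJP pings really go unanswered independently of mod_proxy_ajp, one option is to send an AJP13 CPing packet by hand and time the CPong reply, for instance in a loop against 10.11.102.223:9109 or against 127.0.0.1. This is a minimal sketch only: the class name, timeout and default port are hypothetical, and the packet bytes are my reading of the AJP13 protocol (packets from the web server start with 0x12 0x34, replies from the container start with 'A' 'B'; type 10 is CPing, type 9 is CPong).

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.Arrays;

// Hypothetical diagnostic: one AJP13 CPing/CPong round trip with bounded waits.
public final class AjpPing {
    // Web server -> container: magic 0x12 0x34, length 1, type 10 (CPing).
    private static final byte[] CPING = {0x12, 0x34, 0x00, 0x01, 0x0A};
    // Container -> web server: magic 'A' 'B', length 1, type 9 (CPong).
    private static final byte[] CPONG = {0x41, 0x42, 0x00, 0x01, 0x09};

    /** Returns the round-trip time in ms, or -1 if no valid CPong arrived within timeoutMs. */
    public static long ping(String host, int port, int timeoutMs) throws IOException {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            s.setSoTimeout(timeoutMs); // bound each read so a stalled peer cannot block forever
            long start = System.nanoTime();
            OutputStream out = s.getOutputStream();
            out.write(CPING);
            out.flush();
            byte[] reply = new byte[CPONG.length];
            new DataInputStream(s.getInputStream()).readFully(reply);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            return Arrays.equals(reply, CPONG) ? elapsedMs : -1;
        } catch (SocketTimeoutException e) {
            return -1; // connect or read timed out
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        // 8009 is Tomcat's usual default AJP port; pass the real one as the second argument.
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8009;
        long rtt = ping(host, port, 5000);
        System.out.println(rtt >= 0 ? "CPong in " + rtt + " ms" : "no CPong within 5000 ms");
    }
}
```

For example, `java AjpPing 10.11.102.223 9109` run repeatedly from a scheduled task would show whether CPongs occasionally fail to come back even when neither httpd nor mod_proxy_ajp is involved.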

I know that you mentioned that these httpd/Tomcat connections are made on the respective hosts' "private addresses", and I can see in the logs that the problems happen even on the host's loopback address 127.0.0.1. On the other hand, setting up NLB seems to involve a shared IP-stack driver buried fairly deep in the protocol stack of each host (plus "affinity" parameters), and who knows what that thing is doing, or not doing, even to local traffic.

Just to give an idea (and I realise that this article may have no direct relevance whatsoever to the present issue), see: <http://support.microsoft.com/kb/905179>. In that case, installing a certain software package indirectly shortens the packet MTU, which in turn causes problems with some web server functions. My point is only that you may be facing a similarly deep issue caused by the NLB implementation.
