Hi.
Thank you for all the very detailed information provided.
From what I can see in the logs, my impression at this point is that this is a problem
buried fairly deep in the TCP/IP stack, and that both Apache+mod_proxy_ajp and Tomcat may
just be suffering the consequences of an underlying TCP/IP issue (or of a Windows NLB
"feature").
In the logs, you have messages like:
java.net.SocketException: Software caused connection abort: socket write error
which is something that comes from the JVM running Tomcat (and quite probably from native
code inside the JVM).
Similarly, messages in Apache httpd's logs like
[Tue May 29 15:29:43 2012] [error] (OS 10060)A connection attempt failed because the
connected party did not properly respond after a period of time, or established connection
failed because connected host has failed to respond. : ajp_ilink_receive() can't receive
header
[Tue May 29 15:29:43 2012] [error] ajp_read_header: ajp_ilink_receive failed
[Tue May 29 15:29:43 2012] [error] (70007)The timeout specified has expired: proxy: dialog
to 10.11.102.223:9109 (10.11.102.223) failed
look to me like OS-level error conditions, just forwarded by Apache to the logs (at least
the (OS 10060) prefix is a Windows Sockets error code, WSAETIMEDOUT).
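
To make it easier to see which OS-level conditions show up in the httpd error log, here is
a minimal sketch that pulls the "(OS nnnnn)" code out of a log line and maps it to its
Windows Sockets name. The function name and the small table are my own illustration, not
something from httpd; the table only covers a few common codes, so extend it as needed.

```python
import re

# A small, deliberately incomplete subset of Windows Sockets error codes.
WINSOCK_ERRORS = {
    10053: "WSAECONNABORTED",  # "Software caused connection abort"
    10054: "WSAECONNRESET",
    10060: "WSAETIMEDOUT",     # connected party did not respond in time
    10061: "WSAECONNREFUSED",
}

# Matches the "(OS 10060)" prefix that httpd puts in front of the OS message.
OS_CODE = re.compile(r"\(OS (\d+)\)")


def os_error_name(log_line):
    """Return the Winsock name for the OS code in log_line, or None if there is no code."""
    match = OS_CODE.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return WINSOCK_ERRORS.get(code, "unknown (%d)" % code)
```

Running this over the whole error log and counting the names would show whether the
failures are mostly timeouts, aborts or resets.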
I've read a bit about Windows NLB (just now, to find out what it is), and it seems to me
that there is at least /a possibility/ that combining this with another kind of
load-balancing (as you do with mod_proxy_ajp) may not be the most stable configuration.
From the logs, it really looks as if both Apache and Tomcat occasionally find themselves
with a connection that has suddenly ceased to exist, where ping packets are not being
returned, and/or a read or write on the socket suddenly gets no response.
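
One way to check this independently of both httpd and Tomcat would be a small stand-alone
probe that repeatedly tries to open a plain TCP connection to the AJP port and records how
long it takes and whether it fails. The sketch below is only that: the function name and
the default timeout are my own assumptions, and it only tests that a new connection can be
opened. It does not reproduce an already established connection going silent.

```python
import socket
import time


def probe(host, port, timeout=2.0):
    """Try to open a TCP connection; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, None
    except OSError as exc:
        return False, time.monotonic() - start, str(exc)
```

Calling probe("10.11.102.223", 9109) once a second from the httpd host and logging every
failure with a timestamp would show whether the failures line up with the errors in the
httpd and Tomcat logs. The same can be done against 127.0.0.1.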
I know that you mentioned that these httpd/Tomcat connections are made on the respective
hosts' "private addresses", and I can see in the logs that the problems happen even on the
host's loopback address 127.0.0.1. But on the other hand, setting up NLB seems to involve
a common IP stack driver buried fairly deep in the protocol stack of each host (and
"affinity" parameters), and who knows what that thing is doing, or not doing.
Just to give an idea - and I realise that this article may have no direct relevance
whatsoever to the present issue - see: http://support.microsoft.com/kb/905179
In that case, they are talking about the installation of some software package indirectly
shortening the packet MTU, which in turn causes problems with some webserver functions.
My point is only that you may be facing a similarly deep issue, because of the NLB
implementation.