On 19.11.2011 06:07, Jeremy wrote:
OK, we figured it out. It's a case of too many timeout settings and not having a real DevOps person on hand. There was an obvious error message in Apache's mod_jk.log that I failed to correlate with the problem because I misread the timestamp on one of the many log entries. Doh!

[info] ajp_connection_tcp_get_message::jk_ajp_common.c (1150): (node5) can't receive the response header message from tomcat, network problems or tomcat (10.xx.xx.xx:8009) is down (errno=11)
[error] ajp_get_reply::jk_ajp_common.c (1962): (node5) Tomcat is down or refused connection. No response has been sent to the client (yet)
[info] ajp_service::jk_ajp_common.c (2447): (node5) sending request to tomcat failed (recoverable), (attempt=1)

There is, I now see, a socket_timeout and a socket_connect_timeout that do not show up in the jkmanager status page, in addition to connection_pool_timeout, connect_timeout, prepost_timeout, and reply_timeout which are listed by jkmanager.

We had socket_timeout set to 10 seconds and I didn't know it. Our transactions only take longer than 10 seconds a few times a week, so that's why we weren't seeing it that often.

We'll fix by setting:

socket_timeout=90
socket_connection_timeout=5000
retry_options=25

unless someone has a better idea.
Have a look at the example configuration contained in the mod_jk source download tarball. It provides a pretty decent default configuration.
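
For orientation, a workers.properties fragment along the following lines would set the timeouts named in this thread on a single AJP worker. This is only a sketch, not the file shipped in the tarball: the worker name node5 and the elided host are taken from the log excerpt, the 5000 and 90-second figures echo the values proposed above, and every other number is an illustrative assumption to be tuned per application.

```properties
# Sketch only: values are illustrative assumptions, not recommendations.
worker.list=node5

worker.node5.type=ajp13
# Host left elided, exactly as it appears in the log excerpt
worker.node5.host=10.xx.xx.xx
worker.node5.port=8009

# TCP connect timeout, in milliseconds
worker.node5.socket_connect_timeout=5000
# CPing/CPong probe after connecting, in milliseconds (assumed value)
worker.node5.connect_timeout=10000
# CPing/CPong probe before forwarding each request, in milliseconds (assumed value)
worker.node5.prepost_timeout=10000
# Longest wait for response data from Tomcat, in milliseconds (90 seconds)
worker.node5.reply_timeout=90000
# Close idle pooled connections after this many seconds (assumed value)
worker.node5.connection_pool_timeout=600
# Bit mask: 1 (no retry once Tomcat has received the request)
#         + 8 (always retry HEAD) + 16 (always retry GET)
worker.node5.recovery_options=25
```

Note the mixed units: connection_pool_timeout is given in seconds, while the other timeouts shown here are in milliseconds.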
It is "recovery_options", not "retry_options". Furthermore, I personally do not recommend the general "socket_timeout", but I *do* recommend using all the other timeouts with appropriate values. There's more info on timeouts at
http://tomcat.apache.org/connectors-doc/generic_howto/timeouts.html

Regards,

Rainer
