Here is the setup. We have one Apache httpd (2.0.43) + mod_jk (v1.2.5) load-balancing to two JBoss-Tomcat workers (Tomcat version 4.1.24).
Everything works fine under normal conditions. However, when we simulate a "network unreachable" scenario for the first worker, we see long pauses in the web application before it fails over to the second worker.

After some research, we found that mod_jk loops over 4 blocking connect() calls, trying to establish the connection to the first worker, before it fails over. Since each connect() takes 15 seconds to time out (on Solaris), it takes about 1 minute (4 x 15 seconds) for mod_jk to fail over. That is clearly not acceptable in a real deployment.

So we modified mod_jk to use a non-blocking connect(), with the select() timeout set to a low value (say, 3 seconds), and changed the loop to try only once. That solved our problem.

I really think this fix (a configurable connect() timeout) should be checked in to handle the network unreachable problem.

Opinions? Suggestions?

cheers,
-joe
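
For reference, the non-blocking connect() plus select() timeout described above can be sketched roughly as follows. This is a minimal illustration, not the actual mod_jk change: the function names connect_with_timeout and connect_ipv4_with_timeout are hypothetical, and the timeout is simply passed in by the caller rather than read from any real mod_jk setting.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a mod_jk function): connect to addr, giving up
 * after timeout_sec seconds. Returns a connected socket in blocking mode,
 * or -1 on failure or timeout. */
int connect_with_timeout(const struct sockaddr *addr, socklen_t addrlen,
                         int timeout_sec)
{
    assert(addr != NULL);

    int fd = socket(addr->sa_family, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* select() can only watch descriptors below FD_SETSIZE. */
    if (fd >= FD_SETSIZE) {
        close(fd);
        return -1;
    }

    /* Switch to non-blocking mode so connect() returns immediately. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    int rc = connect(fd, addr, addrlen);
    if (rc < 0 && errno != EINPROGRESS) {
        close(fd);               /* immediate failure, e.g. refused */
        return -1;
    }

    if (rc < 0) {
        /* Connection in progress: wait until the socket becomes writable
         * or the timeout expires. */
        fd_set wfds;
        struct timeval tv;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        rc = select(fd + 1, NULL, &wfds, NULL, &tv);
        if (rc <= 0) {           /* 0 = timed out, < 0 = error (incl. EINTR) */
            close(fd);
            return -1;
        }

        /* Writable does not imply success: check the pending socket error. */
        int err = 0;
        socklen_t errlen = sizeof(err);
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0 ||
            err != 0) {
            close(fd);
            return -1;
        }
    }

    /* Restore the original (blocking) mode for the rest of the session. */
    if (fcntl(fd, F_SETFL, flags) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical convenience wrapper for a dotted-quad IPv4 address. */
int connect_ipv4_with_timeout(const char *ip, unsigned short port,
                              int timeout_sec)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -1;               /* not a valid IPv4 address */
    return connect_with_timeout((const struct sockaddr *)&sa, sizeof(sa),
                                timeout_sec);
}
```

With a single attempt and a 3-second timeout, an unreachable worker would cost about 3 seconds before failover instead of about 1 minute. For simplicity, the sketch treats an interrupted select() as a failure rather than retrying.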