Here is the setup. We have one Apache httpd (2.0.43) + mod_jk (v1.2.5) load-balancing 
to two JBoss-Tomcat workers (Tomcat version 4.1.24).

Everything works fine under normal conditions. However, when we simulate a 
"network unreachable" scenario for the first worker, we see long pauses in the 
web application before it fails over to the second worker. After some research, we 
found that mod_jk makes 4 blocking connect() calls in a loop, trying to establish 
the connection to the first worker, before it fails over. Since each connect() takes 
15 seconds to time out (on Solaris), it takes about 1 minute (4 x 15 seconds) for 
mod_jk to fail over. Clearly that is not acceptable in a real deployment.

So we modified mod_jk to use a non-blocking connect(), set the select() timeout 
to a low value (say, 3 seconds), and reduced the loop to a single attempt. 
That solved our problem.
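
To make the idea concrete, here is a minimal sketch of the technique. It is not the 
actual patch and not mod_jk source: the names connect_with_timeout() and try_worker() 
are made up for illustration, and it assumes IPv4 and plain POSIX sockets.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a mod_jk function): connect() with an upper
 * bound on the wait. Returns 0 on success, -1 on failure. On timeout,
 * errno is set to ETIMEDOUT. The caller closes the socket on failure. */
int connect_with_timeout(int sock, const struct sockaddr *addr,
                         socklen_t addrlen, int timeout_sec)
{
    int flags, rc, so_error = 0;
    socklen_t len = sizeof(so_error);
    fd_set wfds;
    struct timeval tv;

    assert(timeout_sec >= 0);

    /* Switch the socket to non-blocking mode so connect() returns at once. */
    flags = fcntl(sock, F_GETFL, 0);
    if (flags < 0 || fcntl(sock, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    rc = connect(sock, addr, addrlen);
    if (rc < 0 && errno != EINPROGRESS)
        return -1;                      /* immediate failure */

    if (rc < 0) {
        /* Connection in progress: wait until writable or until timeout. */
        FD_ZERO(&wfds);
        FD_SET(sock, &wfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        rc = select(sock + 1, NULL, &wfds, NULL, &tv);
        if (rc == 0) {
            errno = ETIMEDOUT;          /* worker did not answer in time */
            return -1;
        }
        if (rc < 0)
            return -1;                  /* select() itself failed */

        /* Writable does not mean connected: check the pending error. */
        if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0)
            return -1;
        if (so_error != 0) {
            errno = so_error;
            return -1;
        }
    }

    /* Restore the original (blocking) mode for normal I/O. */
    if (fcntl(sock, F_SETFL, flags) < 0)
        return -1;
    return 0;
}

/* Hypothetical single-attempt connect to one worker (no retry loop).
 * Returns a connected socket, or -1 so the caller can fail over. */
int try_worker(const char *ip, int port, int timeout_sec)
{
    struct sockaddr_in sa;
    int sock;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -1;

    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
        return -1;

    if (connect_with_timeout(sock, (struct sockaddr *)&sa,
                             sizeof(sa), timeout_sec) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}
```

With something like try_worker(ip, port, 3), an unreachable worker costs at most about 
3 seconds before the load balancer can move on to the next one. A real patch would 
also want to retry select() on EINTR and read the timeout from configuration.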

I really think this fix (a configurable connect() timeout) should be checked in to 
handle the network-unreachable problem.

opinions? suggestions?

cheers,

-joe

