Here's my problem:

A Tomcat server participating in a farm-deploy scheme goes off-line...

For this particular sticky situation, let's say the connection to the rest
of the cluster was interrupted when a network cable was knocked loose.
While the Tomcat server is off-line, a parallel deployment takes place. The
off-line server doesn't get the new artifact. When it comes back on-line,
the front-end load balancer (in my case, HAProxy) doesn't know it's running
an older webapp and happily begins passing traffic to it.

In a farm deployment scenario, the master node announces to the cluster
that a new artifact is available, and the clustered Tomcats then retrieve
and deploy it. The problem is that the Tomcat server that went off-line
never heard the announcement.
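
To make the failure mode concrete, here is a toy model of it as I understand
it. This is only a sketch: the Node and Master classes, the version strings,
and the stale_nodes helper are all made up for illustration and are not
Tomcat's FarmWarDeployer API. It assumes the announcement is sent once and
never replayed.

```python
# Toy model of a one-shot farm deployment announcement.
# All names here are hypothetical; none of this is Tomcat's actual API.

class Node:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.version = "v1"

    def on_announce(self, version):
        # A node only hears an announcement while it is reachable.
        if self.online:
            self.version = version


class Master:
    def __init__(self, nodes):
        self.nodes = nodes
        self.version = "v1"

    def deploy(self, version):
        # Assumption: the announcement goes out exactly once, no replay.
        self.version = version
        for node in self.nodes:
            node.on_announce(version)


def stale_nodes(master):
    # Nodes that are on-line again but still run an older artifact.
    return [n.name for n in master.nodes
            if n.online and n.version != master.version]


nodes = [Node("tc1"), Node("tc2"), Node("tc3")]
master = Master(nodes)

nodes[2].online = False   # network cable knocked loose
master.deploy("v2")       # deployment happens while tc3 is unreachable
nodes[2].online = True    # cable plugged back in

print(stale_nodes(master))  # prints ['tc3']
```

In this model nothing ever tells tc3 about v2 after it returns, and nothing
tells the load balancer that tc3 is behind, which is the situation I'm in.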

There doesn't seem to be a mechanism to re-announce, or to announce at
regular intervals. This looks like a real weakness in the scheme, which
makes me think I'm missing something obvious. Apache's docs on the subject
amount to barely a paragraph, and the other howtos out there don't address
what happens when a server goes off-line temporarily.

My question is directed to other Tomcat admins out there who are handling
this scenario gracefully. What are you doing to deal with it?
