Hi Dominik,

Dominik Pospisil wrote:
Hello,
I am having the following problem with a failover test scenario.

Cluster setup:
- 1 apache load balancer
- 2 nodes with equal LB factor
- sticky session turned on
- Apache/2.0.52, mod_jk/1.2.26

Test scenario:
1. start 1st node
2. start load driver
3. start 2nd node
4. wait for state transfer (2 minutes)
5. kill 1st node

My experience is that after stages 1 and 2, all clients are handled correctly by the 1st node and the second node is correctly set to ERR state. After a while, the second node switches to ERR/REC state.

However, at stage 4 (after starting the 2nd node) the second node never comes up to OK state. I have set both the worker maintain period and the LB recover_time to 30s, so I guess that within 2 minutes the second node should have been re-checked. When I manually press the "Reset worker state" button, it comes up immediately, but that never happened automatically during the maintenance phase.

I would expect that your load driver only sends sticky requests, i.e. requests carrying either a cookie or URL encoding for node cluster01. At least that would fit your observation.
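
For illustration (a sketch; this assumes your Tomcat jvmRoute is set to the worker name cluster01, and A1B2C3 is just a placeholder session id), a sticky request carries the route either in a cookie or in the URL:

  Cookie: JSESSIONID=A1B2C3.cluster01
  GET /myapp/page;jsessionid=A1B2C3.cluster01 HTTP/1.1

mod_jk takes the part after the last "." as the route and pins the request to that worker.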

During maintenance, mod_jk detects whether a worker has been in error state long enough to try it again. This happens in your setup, as you can see from the ERR/REC state. The next request that comes in *and does not contain a session id of another node* will be routed to the REC node. Under load you won't see this state often, because most of the time it turns into ERR or OK very quickly.
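
With your settings (worker.maintain=30, recover_time=30) the expected sequence is roughly this sketch, assuming the maintenance run fires on schedule:

  t = 0s    cluster02 unreachable                  -> ERR
  t >= 30s  maintenance run, recover_time elapsed  -> ERR/REC
  later     first request without a foreign
            session id probes cluster02            -> OK (or back to ERR)

So if every request carries a session id for cluster01, the probe never happens and cluster02 stays in ERR/REC until you reset it by hand.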

Maybe your app sets a cookie and the load driver always presents that cookie. That way all further requests would be handled as sticky and routed to the first node.

You can find out by logging %{Cookie}i in your httpd access log. If you include this in your LogFormat, you can see the incoming Cookie header for each request.
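
For example (a sketch based on the standard common log format; adjust the log name and path to your installation):

  LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Cookie}i\"" cookie_log
  CustomLog logs/access_log cookie_log

If the load driver replays the session cookie, every request will show something like JSESSIONID=<id>.cluster01 in the last field, where <id> stands for the actual session id.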

Eventually, after killing the 1st node, and after returning a couple of "503 Service Temporarily Unavailable" responses, mod_jk finally rechecks the 2nd node's status, reroutes requests to the 2nd node and resumes correct operation.

My question is: why is the second node not recognized before the failover? Did I miss something? Or is it a bug?
Thanks,

- Dominik

Attaching worker.properties
---------
worker.list=loadbalancer,status
worker.maintain=30

# modify the host as your host IP or DNS name.
worker.cluster01.port=8009
worker.cluster01.host=172.17.0.39
worker.cluster01.type=ajp13
worker.cluster01.lbfactor=1
#worker.cluster01.redirect=cluster02
# modify the host as your host IP or DNS name.
worker.cluster02.port=8009
worker.cluster02.host=172.17.1.39
worker.cluster02.type=ajp13
worker.cluster02.lbfactor=1
#worker.cluster02.redirect=cluster01

# Load-balancing behaviour
worker.loadbalancer.type=lb
worker.loadbalancer.method=Session
worker.loadbalancer.balance_workers=cluster01,cluster02
worker.loadbalancer.sticky_session=1
worker.loadbalancer.recover_time=30

#worker.list=loadbalancer
# Status worker for managing load balancer
worker.status.type=status

Regards,

Rainer
