Re: mod_jk error detection

Rainer Jung Wed, 25 Jul 2007 12:59:14 -0700

Hi Scott,

I thoroughly enjoyed the updated docs.  It is just what I needed.  I
just want to mention a few inferences I have now from reading it.


Thanks.

In a load balanced setup using connect_timeout and prepost_timeout, this
will protect me from sending either newly established connections (rare
event due to persistence) as well as each and every individual request
from being sent to a failed tomcat node based on CPING/CPONG messages.
These messages only detect whether or not the container (I'm using
tomcat) is healthy enough to respond to such a message but not
necessarily anything more, correct?  Basically, its ajp listener is
responsive.  Plus, if I need more high speed error detection I can use


That's correct.

reply_timeout.  Sound correct?

That one, reply_timeout, is not really meant for high speed detection.
Usually you've got an app that every now and then needs 10 or 20 seconds
for an answer, and you don't want to disable a worker automatically
because of those rare events. So normally one sets reply_timeout to 1, 2
or 3 minutes.

Now with the new max_reply_timeouts one can experiment with lower
values. It's new, so there is not enough experience yet for good
suggestions.
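
As a rough sketch of how these timeouts might sit together in
workers.properties: the worker names "lb" and "node1", the host and all
values below are made up for illustration. The timeouts are given in
milliseconds. I believe max_reply_timeouts belongs on the load balancer
worker, but please check the reference for your version.

```properties
# Hypothetical names and values, for illustration only.
worker.list=lb

worker.node1.type=ajp13
worker.node1.host=tomcat1.example.com
worker.node1.port=8009
# CPING/CPONG after opening a new connection: wait up to 10 s for the CPONG
worker.node1.connect_timeout=10000
# CPING/CPONG before each request: wait up to 10 s for the CPONG
worker.node1.prepost_timeout=10000
# Wait at most 2 minutes for response data from Tomcat
worker.node1.reply_timeout=120000

worker.lb.type=lb
worker.lb.balance_workers=node1
# Tolerate some reply timeouts before the member goes into error state
# (placement on the lb worker is an assumption, see above)
worker.lb.max_reply_timeouts=10
```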

I get confused on the recovery_options section.  How does it work in a
load balanced environment?  If tomcat receives a request and processes
some of it followed by a catastrophic failure before completing the
response, what exactly does a repeated request from the client do?
Assuming recovery_options is set to 0.

Value "0" means, if you don't get any part of the answer and an error
occurs (network, reply_timeout, ...) then send the same request again to
another member of the load balancer (if a working member is remaining).

That's why you usually don't want to use value "0" if your app has
data changing use cases. Most apps have.

If you use REST principles and HEAD and GET are always idempotent for
your app, the new (version 1.2.24) bits 8 and 16 are your friends!
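
A minimal sketch of such a setting, assuming the bit meanings as I
recall them from the workers.properties reference (1: no recovery once
Tomcat has received the request, 2: no recovery once headers have been
sent to the client, 8: always recover HEAD, 16: always recover GET). The
worker name "node1" is hypothetical.

```properties
# Hypothetical worker name, for illustration only.
# 1 + 2 + 8 + 16 = 27:
#   do not resend requests that Tomcat may already have started to process,
#   except HEAD and GET, which this app treats as idempotent.
worker.node1.recovery_options=27
```

With this value a failed POST would not be sent again to another
member, while a failed GET or HEAD still could be.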

Also, I get confused with the section describing the retries directive.
In a load balanced environment, would the connector retry no matter the
state (tcp state here) of the connection whether it be established
already?  Would it retry against the same backend tomcat server?  The
reason I ask is because the docs say "If the load balancer can not get a
free connection for a member worker from the pool, it will try again a
number of times given by retries." I highlighted the words that confuse
me.

We have to distinguish clearly between retries of a non-lb worker and
of a load balancer worker. A normal worker has a simple retry procedure,
independent of whether it is used directly or as part of an lb. If it
detects an error, it uses another pool connection and by default tries
once more.

An lb has a different idea of retries. It uses retries if all
connections to a backend are busy. For Apache with the default config,
this should never happen, because we allow as many connections as
threads per process. So any request should be able to get a connection
without waiting (maybe it needs to open a new one). For the other web
servers we don't have a good way to detect the "correct" pool size. In
some cases, even for Apache, it might be interesting to use a smaller
pool size, e.g. if the backend is only used occasionally and/or you want
to prevent it from getting flooded in case of congestion. Then you might
run out of available connections and requests will have to wait. LB
retries configure this waiting.
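
To make the two kinds of retries concrete, here is a small sketch. The
worker names "lb" and "node1" and all values are assumptions for
illustration, not recommendations.

```properties
# Hypothetical names and values, for illustration only.

# Member worker: on an error, try again over another pool connection.
# As I understand it the first attempt is counted, so 2 means one retry.
worker.node1.retries=2

# Deliberately small pool to protect an occasionally used backend.
# With fewer connections than threads, requests may have to wait.
worker.node1.connection_pool_size=10

# Load balancer: how many times to try again to get a free connection
# when all connections to the member are busy.
worker.lb.retries=3
```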

Every 60 seconds would we expect the connector to attempt to send a
valid request to a backend tomcat and fail or once a worker goes into
error state do we only check with CPING/CPONG requests during the
maintenance cycle?

The maintenance uses a real request and handles it as if the backend
hadn't failed. If you enabled CPing/CPong, this means that it would
detect a still broken backend early and transparently send the request
to another member. Because no part of the request (the CPing doesn't
count) has been sent yet, the failover to another member happens
independently of recovery_options (i.e. even with recovery_options 3).
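
A sketch of the settings involved, with hypothetical worker names "lb"
and "node1". I am assuming here that 60 seconds is the default for both
the maintenance interval and the recovery wait; please verify that
against the reference for your version.

```properties
# Hypothetical names and values, for illustration only.

# Global maintenance interval in seconds (assumed default)
worker.maintain=60

# Wait this many seconds before trying a member in error state again
# (assumed default)
worker.lb.recover_time=60

# CPING/CPONG on the member: a still broken backend is noticed before
# any part of the real request has been sent, so failover is possible.
worker.node1.connect_timeout=10000
worker.node1.prepost_timeout=10000
```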

If you would like to improve the page about load balancing or the
timeouts page, or you want to add some parts about retries and recovery:
contributions are welcome.


Regards,

Rainer

