Re: mod_jk error detection

Scott McClanahan Wed, 25 Jul 2007 13:25:43 -0700

Thanks so much! I'd like to continue this thread a bit more because of
how helpful I think it will be for everyone using mod_jk.


On Wed, 2007-07-25 at 22:00 +0200, Rainer Jung wrote:
> Hi Scott,
> 
> > I thoroughly enjoyed the updated docs.  It is just what I needed.  I
> > just want to mention a few inferences I have now from reading it.
> 
> Thanks.
> 
> > In a load balanced setup using connect_timeout and prepost_timeout, this
> > will protect me from sending both newly established connections (a rare
> > event due to persistence) and each individual request to a failed
> > tomcat node, based on CPING/CPONG messages.
> > These messages only detect whether or not the container (I'm using
> > tomcat) is healthy enough to respond to such a message but not
> > necessarily anything more, correct?  Basically, its ajp listener is
> > responsive.  Plus, if I need more high speed error detection I can use
> 
> That's correct.
> 
> > reply_timeout.  Sound correct?
> 
> That one, reply_timeout, is not really meant for high speed detection. 
> Usually you've got an app that every now and then needs 10 or 20 seconds 
> for an answer and you don't like to disable a worker automatically 
> because of those rare events. So normally one sets reply_timeout to 1, 2 
> or 3 minutes.

I don't understand what besides a timed out CPING/CPONG message would
render a backend tomcat disabled, especially in a default config since
reply_timeout is 0.
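
To make concrete how little a CPING/CPONG check proves, here is a rough
Python sketch of the exchange as I understand the AJP13 framing. The
function names, the default port 8009 and the 10 second timeout are my
own assumptions for illustration; this is not how mod_jk implements it.

```python
import socket

# AJP13 framing: web server -> container packets start with 0x12 0x34,
# container -> web server packets start with "AB" (0x41 0x42).
CPING = bytes([0x12, 0x34, 0x00, 0x01, 0x0A])  # length 1, type 10 (CPing)
CPONG = bytes([0x41, 0x42, 0x00, 0x01, 0x09])  # length 1, type 9 (CPong)


def is_cpong(reply: bytes) -> bool:
    """Return True if the reply is exactly one CPong packet."""
    return reply == CPONG


def ajp_listener_alive(host: str, port: int = 8009, timeout_s: float = 10.0) -> bool:
    """Send one CPing and wait up to timeout_s for the CPong.

    A True result only says the AJP listener answered. It says nothing
    about whether the webapp behind it can serve real requests.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            sock.sendall(CPING)
            reply = b""
            while len(reply) < len(CPONG):
                chunk = sock.recv(len(CPONG) - len(reply))
                if not chunk:  # peer closed the connection early
                    break
                reply += chunk
            return is_cpong(reply)
    except OSError:  # refused, unreachable or timed out
        return False
```

A timeout or a refused connection both come back as False here, which
is roughly the only signal connect_timeout and prepost_timeout have to
work with.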

> 
> Now with the new max_reply_timeouts one can experiment with lower 
> values. It's new, so not enough experience for good suggestions.
> 
> > I get confused on the recovery_options section.  How does it work in a
> > load balanced environment?  If tomcat receives a request and processes
> > some of it followed by a catastrophic failure before completing the
> > response, what exactly does a repeated request from the client do?
> > Assuming recovery_options is set to 0.
> 
> Value "0" means, if you don't get any part of the answer and an error 
> occurs (network, reply_timeout, ...) then send the same request again to 
> another member of the load balancer (if a working member is remaining).
> 
> That's why you usually do not want to use value "0" in case your app 
> has data changing use cases. Most apps have.
> 
> If you use REST principles and HEAD and GET are always idempotent for 
> your app, the new (version 1.2.24) bits 8 and 16 are your friend!
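
To check my understanding of the bits, here is a simplified Python model
of the failover decision, based on my reading of the recovery_options
documentation (1 = no recovery once the request was sent, 2 = no
recovery once response headers were sent, 8 and 16 = always recover HEAD
and GET). The function name and parameters are hypothetical, and the
real logic in mod_jk may differ in details.

```python
# Bit values as described in the recovery_options documentation.
NO_RECOVER_AFTER_REQUEST_SENT = 1
NO_RECOVER_AFTER_HEADERS_SENT = 2
ALWAYS_RECOVER_HEAD = 8
ALWAYS_RECOVER_GET = 16


def may_fail_over(options: int, method: str,
                  request_sent: bool, headers_sent: bool) -> bool:
    """Simplified model: may the lb resend this request to another member?"""
    if not request_sent:
        # Nothing reached the backend yet, so resending is always safe.
        return True
    if method == "HEAD" and options & ALWAYS_RECOVER_HEAD:
        return True
    if method == "GET" and options & ALWAYS_RECOVER_GET:
        return True
    if headers_sent and options & NO_RECOVER_AFTER_HEADERS_SENT:
        return False
    if options & NO_RECOVER_AFTER_REQUEST_SENT:
        return False
    return True


# With value 3 a POST that already reached tomcat is not repeated,
# while 3 + 16 = 19 still lets an idempotent GET fail over.
print(may_fail_over(3, "POST", True, False))   # False
print(may_fail_over(19, "GET", True, False))   # True
```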
> 
> > Also, I get confused with the section describing the retries directive.
> > In a load balanced environment, would the connector retry no matter the
> > state (tcp state here) of the connection whether it be established
> > already?  Would it retry against the same backend tomcat server?  The
> > reason I ask is because the docs say "If the load balancer can not get a
> > free connection for a member worker from the pool, it will try again a
> > number of times given by retries." I highlighted the words that confuse
> > me.
> 
> We have to distinguish clearly between retries of a non-lb worker 
> and of a load balancer worker. A normal worker has a simple retry 
> procedure, independent of whether it is used directly or as part of 
> an lb. If it detects an error it uses another pool connection and by 
> default tries once more.

If that happens does the real worker officially change to an error state
which would subsequently kick off the retry logic of the load balancer
worker?

> 
> An lb has another idea of retries. It uses retries if all connections to 
> a backend are busy. For Apache with default config, this should never 
> happen, because we allow as many connections as threads per process. So 
> any request should be able to get a connection without waiting (maybe it 
> needs to start a new one). For the other web servers we don't have a 
> good way to detect the "correct" pool size. In some cases even for 
> Apache it might be interesting to use a smaller pool size, in case the 
> backend is only used occasionally and/or you want to prevent it from 
> getting flooded in case of congestion. Then you might run out of 
> available connections and requests will have to wait. LB retries 
> configure this waiting.
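
So, if I read this right, the lb retries are about waiting for a free
pool connection rather than about resending a failed request. A toy
Python sketch of that idea follows. The ConnectionPool class, the
function name and the 0.1 second pause are all made up for
illustration, and whether retries counts the first attempt is an
assumption on my part.

```python
import time


class ConnectionPool:
    """Hypothetical fixed-size pool that only counts free connections."""

    def __init__(self, size: int) -> None:
        self.free = size

    def try_acquire(self) -> bool:
        """Take a connection if one is free, without blocking."""
        if self.free > 0:
            self.free -= 1
            return True
        return False

    def release(self) -> None:
        """Hand a connection back to the pool."""
        self.free += 1


def acquire_with_retries(pool: ConnectionPool, retries: int,
                         pause_s: float = 0.1) -> bool:
    """Try once, then up to `retries` more times, pausing between attempts."""
    for attempt in range(retries + 1):
        if pool.try_acquire():
            return True
        if attempt < retries:
            time.sleep(pause_s)  # wait for another request to release one
    return False
```

With a pool as large as the number of threads per process the first
attempt always succeeds, which would match the statement that this
should never happen for Apache with the default config.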
> 
> > Every 60 seconds would we expect the connector to attempt to send a
> > valid request to a backend tomcat and fail or once a worker goes into
> > error state do we only check with CPING/CPONG requests during the
> > maintenance cycle?
> 
> The maintenance uses a real request and handles it as if the backend 
> had not failed. If you enabled CPing/CPong this means that it 
> would detect a still broken backend early and transparently send the 
> request to another member. Because no part of the request (the CPing 
> doesn't count) has been sent yet, the failover to another member 
> happens independently of recovery_options (i.e. even with 
> recovery_options 3).

Is the request used to test the health of the backend tomcat whichever
one comes first after a global maintenance run even if it has been
previously serviced by another healthy tomcat?  Is this request attempt
to a once errant worker only to test its healthiness and not to actually
have it fulfill the request?  I would hope it is only to test the health
of the backend tomcat and even if it is now willing to accept
connections, the request goes to whatever tomcat has been previously and
successfully responding to the session.

> 
> If you like to improve the page about load balancing or the timeouts 
> page, or you want to add some parts about retries and recovery: 
> contributions are welcome.

After we are done discussing, I might have some recommendations.  Again,
you've been great.

> 
> Regards,
> 
> Rainer
> 
> ---------------------------------------------------------------------
> To start a new topic, e-mail: users@tomcat.apache.org
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


