Re: mod_jk error detection

Rainer Jung Wed, 25 Jul 2007 13:39:43 -0700

Scott McClanahan wrote:

Thanks so much! I'd like to continue this thread a bit more because of
how helpful I think it will be for everyone using mod_jk.
That one, reply_timeout, is not really meant for high-speed detection. Usually you've got an app that every now and then needs 10 or 20 seconds for an answer, and you don't want to disable a worker automatically because of those rare events. So normally one sets reply_timeout to 1, 2 or 3 minutes.
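
For illustration, a minimal workers.properties sketch of that advice. The worker name node1, the host and the port are hypothetical. reply_timeout is given in milliseconds, so 120000 corresponds to 2 minutes.

```properties
# Hypothetical worker "node1"; adjust name, host and port to your setup.
worker.list=node1
worker.node1.type=ajp13
worker.node1.host=localhost
worker.node1.port=8009
# Wait at most 2 minutes (value in milliseconds) for a reply from the backend.
worker.node1.reply_timeout=120000
```
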
I don't understand what besides a timed out CPING/CPONG message would
render a backend tomcat disabled, especially in a default config since
reply_timeout is 0.

Default config: no CPing/CPong. But after some time the TCP stack will give up when there is a network problem or the backend is no longer listening. So this case will be handled even in a default config, but depending on the exact network situation, the error detection might take a long time.

In case your backend simply eats your requests but doesn't produce answers, you will very quickly use up all connections and threads, and the whole system will hang, unless timeouts are configured.
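
A sketch of enabling CPing/CPong so that a dead or hanging backend is noticed sooner than with TCP timeouts alone. It assumes a hypothetical worker named node1. The values are illustrative, not recommendations. Both timeouts are in milliseconds.

```properties
# CPing/CPong right after connecting to the backend; wait up to 10 s for the CPong.
worker.node1.connect_timeout=10000
# CPing/CPong before forwarding each request; wait up to 10 s for the CPong.
worker.node1.prepost_timeout=10000
```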

BTW: there is also a non-default config option to make a worker fail on certain received HTTP status codes, "fail_on_status".
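
As a rough example, assuming a hypothetical worker named node1 and a mod_jk version that supports this attribute, the following would treat the worker as failed whenever the backend answers with HTTP 503. The status code is only an example.

```properties
# Mark the worker as failed when the backend returns HTTP status 503.
worker.node1.fail_on_status=503
```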

We have to distinguish clearly between retries of a non-lb worker and of a load balancer worker. A normal worker has a simple retry procedure, independent of whether it is used directly or as part of an lb. If it detects an error, it uses another pool connection and by default tries once more.
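
A sketch of how the per-worker retry count could be set explicitly, assuming a hypothetical worker named node1. The value 2 matches the "tries once more" behaviour described above: the original attempt plus one retry.

```properties
# Maximum number of attempts per request on this worker:
# the original try plus one retry on another pool connection.
worker.node1.retries=2
```
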
If that happens does the real worker officially change to an error state
which would subsequently kick off the retry logic of the load balancer
worker?

Without an lb, a worker does not have an error state. It will be continuously reused. Only an lb uses error states and temporarily disables a failed worker. Even an lb will continuously reuse a worker if there is no other worker to fail over to.
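
To make the distinction concrete, here is a minimal sketch of a load balancer worker with two members. All names (balancer, node1, node2) and hosts are hypothetical. Only the lb worker is listed in worker.list, so requests reach the members through it.

```properties
# Only the lb worker is exposed to the web server mappings.
worker.list=balancer

# Two hypothetical backend members.
worker.node1.type=ajp13
worker.node1.host=tomcat1.example.com
worker.node1.port=8009

worker.node2.type=ajp13
worker.node2.host=tomcat2.example.com
worker.node2.port=8009

# The lb worker tracks error states and fails over between its members.
worker.balancer.type=lb
worker.balancer.balance_workers=node1,node2
```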

The maintenance uses a real request and handles it as if the backend hadn't failed. If you enabled CPing/CPong, this means that it would detect a still-broken backend early and transparently send the request to another member. Because no part of the request has been sent yet (the CPing doesn't count), the failover to another member happens independently of recovery_options (i.e. even with recovery_options 3).
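
A sketch of the settings involved, assuming the hypothetical names balancer and node1. recovery_options is a bit mask, and to my understanding it is set on the member worker. The recover_time value is illustrative.

```properties
# Seconds a failed member stays in the error state before the lb
# tries it again with a real request.
worker.balancer.recover_time=60

# 3 = 1 + 2: do not fail over once the request has been sent to the
# backend (1) or once response headers have gone to the client (2).
worker.node1.recovery_options=3
```
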
Is the request used to test the health of the backend tomcat whichever
one comes first after a global maintenance run even if it has been
previously serviced by another healthy tomcat?  Is this request attempt
to a once errant worker only to test its healthiness and not to actually
have it fulfill the request?  I would hope it is only to test the health
of the backend tomcat and even if it is now willing to accept
connections, the request goes to whatever tomcat has been previously and
successfully responding to the session.

No, the first new request accepted by the web server and mapped to the lb will be used, at least if it is free to be routed to any worker. (If the request belongs to a session located on another backend and the default config with sticky sessions is active, it will of course be sent to its correct backend.) It is a real user request. If the backend works, OK. If it doesn't accept the request, we can still send it to some other worker. If the backend accepts the request but processing fails, then depending on recovery_options the user gets an error.
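
For reference, a sketch of the sticky session setting mentioned above, assuming a hypothetical lb worker named balancer. Sticky sessions are already the default, so this line only makes the behaviour explicit.

```properties
# Requests that belong to an existing session go to the member that owns it.
worker.balancer.sticky_session=true
```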

If you'd like to improve the page about load balancing or the timeouts page, or you want to add some parts about retries and recovery: contributions are welcome.
After we are done discussing, I might have some recommendations.  Again,
you've been great.


Thanks. At least we improve the knowledge inside the mailing list archive.

Regards,

Rainer

