Rainer,
Thanks very much for the clarification! Since I have been playing with
the load balancing strategy set to session ("worker.router.method=S" on
my load balancer), is there a way to tell roughly how many sessions have
been pinned to each worker/tomcat? In this case, would the load balancer
value be (something like) the number of new sessions sent to a
particular worker, divided by two some number of times? If that is true,
you still would not know the number of sessions pinned to a worker,
because the factors of two have been divided out. I just got an HTTP
JMX adapter wired up in Tomcat, so I'll see if I can get session info
that way...
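One way to read session counts over the HTTP JMX adapter is to query the Manager MBeans (for example via Tomcat's JMXProxyServlet with a query like `Catalina:type=Manager,*`) and pull out each context's `activeSessions` attribute. A minimal sketch, assuming the plain-text "Name:/attribute:" response format that the jmxproxy servlet produces; the sample response below is illustrative:

```python
import re

def parse_active_sessions(jmxproxy_text):
    """Parse per-context activeSessions counts from plain-text
    jmxproxy-style output (one 'Name: ...' header per MBean followed
    by 'attribute: value' lines).  Returns {mbean_name: count}."""
    sessions = {}
    current = None
    for line in jmxproxy_text.splitlines():
        line = line.strip()
        if line.startswith("Name:"):
            current = line[len("Name:"):].strip()
        elif current and line.startswith("activeSessions:"):
            sessions[current] = int(line.split(":", 1)[1].strip())
    return sessions

# Illustrative sample of what a Manager MBean query might return:
sample = """\
Name: Catalina:type=Manager,path=/shop,host=localhost
modelerType: org.apache.catalina.session.StandardManager
activeSessions: 17

Name: Catalina:type=Manager,path=/admin,host=localhost
activeSessions: 3
"""
print(parse_active_sessions(sample))
```

Summing the counts per Tomcat node would give the rough "sessions pinned per worker" number asked about above, without depending on the balancer's decayed value.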
Thanks again,
Brian
-->-----Original Message-----
-->From: Rainer Jung [mailto:[EMAIL PROTECTED]
-->Sent: Thursday, August 23, 2007 11:22 AM
-->To: Tomcat Users List
-->Subject: Re: JK Loadbalancer not balancing fairly
-->
-->[EMAIL PROTECTED] schrieb:
-->> Ben,
-->>
-->> So I assume you have two web servers fronting two app servers - or
-->> there are two servers both of which have a web server and an app
-->> server? For the restart you talk about - did you restart both web
-->> servers? Do you have a good load balancer (local director, content
-->> director like an F5) in front of the two web servers?
-->>
-->> If I am reading your JKStatus text correctly I noticed the
-->> following:
-->>
-->>   Load balancer value on web server 2
-->>   ----------------------------------- = ~0.56
-->>   Load balancer value on web server 1
-->>
-->> but
-->>
-->>   Number of requests on web server 2
-->>   ----------------------------------- = ~0.91
-->>   Number of requests on web server 1
-->>
-->>
-->> Now, if I am interpreting the meaning of "load balancer value" and
-->> "number of requests" correctly, that would imply that the number of
-->> sessions stuck to each app server from web server 1 is very roughly
-->> twice as high as from web server 2, but the total number of requests
-->> sent to each app server from both web servers is very roughly the
-->> same. (Can someone confirm I'm interpreting those numbers correctly?)
-->
-->The number of requests is the total since the last jk/apache
-->restart. So if the last restart was shortly before, the
-->numbers will not help. If they were not reset after the
-->tests, we would know that Apache 1 handled slightly more
-->requests than Apache 2, but both of them sent exactly the
-->same number of requests to the two Tomcat nodes (delta = 1 request).
-->
-->The V column is the balancing value used to decide where
-->the next request goes. It is the number of requests sent
-->to the Tomcat, divided by two once a minute, so it is
-->multiplied by a decay curve. The big difference between the
-->V values of Apache 1 and Apache 2 does not matter. It could
-->simply mean that the one with the bigger V value did its
-->division more recently. The V values for the two
-->Tomcats are again very similar on the same Apache, another
-->indication of good balancing.
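The halve-once-a-minute behavior described above can be simulated to see why two snapshots can differ by roughly a factor of two without any real imbalance. A minimal sketch (the function name and steady-traffic assumption are illustrative, not mod_jk's actual code):

```python
def lb_value(requests_per_minute, minutes):
    """Simulate the balancing value: each minute the running value
    is halved ('divided by two once a minute') and the minute's new
    requests are added.  Old traffic decays geometrically, so recent
    traffic dominates the value."""
    v = 0
    for _ in range(minutes):
        v = v // 2 + requests_per_minute
    return v

# Two workers seeing identical steady traffic converge to the same
# value regardless of uptime:
print(lb_value(100, 10), lb_value(100, 1000))

# But a snapshot taken just after a halving (before new requests
# arrive) reads roughly half of one taken just before it -- which is
# why the two Apaches can legitimately show V values ~2x apart:
print(lb_value(100, 10) // 2)
```

This matches the observation in the fractions above: a ~0.56 ratio of V values is consistent with well-balanced traffic whose halving timers are simply out of phase.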
-->
-->All this is true for the default balancing method "Requests".
-->
-->I would suggest first following CPU usage per Tomcat process over
-->the test period (not per system and not simply as one
-->number, but as a graph over time).
-->
-->> According to the docs, each connector by default tries to keep the
-->> number of requests sent to each worker the same, which looks to be
-->> happening reasonably well. (I'm playing with trying to keep the
-->> number of sessions balanced, since our apps tend to be more of a
-->> memory issue than a CPU issue. There is a setting on the connector
-->> for this.)
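The connector setting referred to here is presumably mod_jk's `method` property on the load balancer worker. A sketch of what that might look like in `workers.properties`, using a balancer named "router" to match the config quoted at the top of this message (worker names are illustrative):

```properties
# Load balancer worker distributing over two Tomcat nodes
worker.list=router
worker.router.type=lb
worker.router.balance_workers=tomcat1,tomcat2

# S (Session) balances on sessions rather than requests;
# the default method is R (Request)
worker.router.method=S
worker.router.sticky_session=true
```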
-->>
-->> With some info on your setup we can try to figure out the load
-->> imbalance.
-->>
-->> As a note, I am playing with the jk 1.2.x connector, but our
-->> production systems use the old jk2 connector. With that, I've seen
-->> a load imbalance on the app servers when one of the app servers has
-->> gone down for a while and then come back up. If the connectors are
-->> not reset, they will try to "catch up" the restarted app server in
-->> terms of the number of requests it has handled, thus loading it
-->> more heavily than servers that have been up the whole time.
-->
-->The catchup problem should be fixed. A recovered or
-->reactivated worker is assigned the biggest "work done" value of
-->all the other workers, so it should start out normally loaded, or
-->even a little less loaded.
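The recovery behavior described above can be illustrated with a toy sketch (function and worker names are hypothetical, not mod_jk internals): instead of resuming at zero "work done" and being flooded until it catches up, the returning worker inherits the largest value among its peers.

```python
def reactivate(workers, name):
    """On recovery, give the returning worker the largest 'work done'
    value among the other workers, so a balancer choosing the
    least-loaded worker does not flood it with catch-up traffic."""
    peers = [v for w, v in workers.items() if w != name]
    workers[name] = max(peers) if peers else 0
    return workers

# tomcat3 was down and would otherwise restart at 0 work done:
state = {"tomcat1": 5000, "tomcat2": 5200, "tomcat3": 0}
print(reactivate(state, "tomcat3"))
```

With the old catch-up behavior, tomcat3's counter of 0 would make it the preferred target until it reached its peers' totals; seeding it at 5200 avoids that burst.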
-->
-->>
-->> Brian
-->
-->Regards,
-->
-->Rainer
-->
---------------------------------------------------------------------
To start a new topic, e-mail: [email protected]
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]