Bernd,

At this moment I believe we should add flags and stop using the '0' value
in the config file. Internally ( in the code ) it doesn't matter: we can keep 0
or use the flag ( I prefer the second ).

I'm waiting for your patch. It seems there is another bug that must be fixed
before we can tag, but I hope we can finish all changes in the next few days.

Costin

On Mon, 6 May 2002, Bernd Koecke wrote:

> thanks for committing my patch :). After thinking about it, I found the same
> problem as Mathias. It's a problem for my environment too. We have the same
> problem with shutdown and recovering here. I'm now looking into jk2. The
> question for jk1 is: what do we want to do if the main worker fails because
> of an error?
>
> The normal intention of lb is to switch to another worker in such a case.
> But for the special use of a main worker we don't want that (at least it is an
> error in my environment here :) ). My suggestion is to add an additional flag to
> the lb_worker struct where we hold the information that we have a main worker,
> e.g. main_worker_mode. Because of this flag we send only requests with a session
> id to one of the other workers. And we could change the behavior after an error
> of another worker and check its state only if we get a request with its session
> route. This would be easy if we set the main worker at the beginning of the
> worker list and/or use the flag. But we need the flag if we want to use more than
> one main worker.
>
> But what should happen if the main worker is in error state? In my patch some
> weeks ago I added an additional flag which causes the module to reject a request
> if it comes in without a session id and the main worker is down. If this flag
> wasn't set, or was not set to reject, the module chooses one of the other workers.
> For our environment here rejecting the request is ok, because if a request
> without a session comes to a switched off node, we have a problem with our
> separate load balancer. This should never happen.
> We could make rejecting the standard if we have a main worker, but with a
> separate flag it would be more flexible.
>
> I will build a patch against cvs to make my intention clearer.
>
> Bernd
>
> [EMAIL PROTECTED] wrote:
> > Hi Mathias,
> >
> > I think we understand your use case, it is not very uncommon.
> > In fact, as I mentioned a few times, it is the 'main' use
> > case for Apache ( multi-process ) when using the JNI worker.
> > In this case Apache acts as a 'natural' load-balancer, with
> > requests going to various processes ( more or less randomly ).
> > As in your case, requests without a session should always go
> > to the worker that is in the same process.
> >
> > The main reason for using '0' for the "local" worker is that
> > in jk2 I want to switch from float to int - there is no reason
> > ( AFAIK ) to do all the float computation, even a short int
> > will be enough for the purpose of implementing a round-robin
> > with weights.
> >
> > BTW, one extension I'm trying to make is support for multiple
> > local workers - I'm still thinking about how to do that. This will
> > cover the case of a few big boxes, each with several tomcat
> > instances ( if you have many G of RAM and many processors, sometimes
> > it is better to run more VMs instead of a single large process ).
> > In this case you still want some remote tomcats, for failover,
> > but most load should go to the local workers.
> >
> > For jk2 I already fixed the selection of the 'recovering' worker,
> > after timeout the worker will go through normal selection instead
> > of being automatically chosen.
> >
> > For jk1 - I'm waiting for patches :-) I wouldn't do a big change -
> > the current fix seemed like a good one.
> >
> > I agree that changing the meaning of 0 may be confusing ( is it
> > documented ? my workers.properties says it should never be used ).
> > We can fix that by using an additional flag - and not using
> > special values.
> >
> > Another special note - Jk2 will also support 'graceful shutdown',
> > that means your case ( replacing a webapp ) will be handled
> > in a different way. You should be able to add/remove workers
> > without restarting apache ( and I hope mostly automated ).
> >
> > Let me know what you think - with patches if possible :-)
> >
> > Costin
> >
> >>The setup I use is the following: a load balancer (Alteon) is in front
> >>of several Apache servers, each hosted on a machine which also hosts a
> >>Tomcat.
> >>Let's call those Apache servers A1, A2 and A3 and the associated Tomcat
> >>servers T1, T2 and T3.
> >>
> >>I have been using Paul's patch which I modified so the lb_value field of
> >>fault tolerant workers would not be changed to a value other than INF.
> >>
> >>The basic setup is that Ai can talk to all Tj, but for requests not
> >>associated with a session, Ti will be used unless it is unavailable.
> >>Sessions belonging to Tk will be correctly routed. The load balancing
> >>worker definition is different for all three Ai, the lbfactor is set to
> >>0 for workers connecting to Tk for all k != i and set to 1.0 for the
> >>worker connecting to Ti.
> >>
> >>This setup allows sticky sessions independently of the Apache
> >>handling the request, which is a good thing since the Alteon cannot
> >>extract the ';jsessionid=.....' part from the URL in a way which allows
> >>the dispatching of the requests to the proper Ai (the cookie is dealt
> >>with correctly though).
> >>
> >>This works perfectly except when we roll out a new release of our
> >>webapps. In this case it would be ideal to be able to make the load
> >>balancer ignore one Apache server, deploy the new version of the webapp
> >>on this server, and switch this server back on and the other two off so
> >>the service interruption would be as short as possible for the
> >>customers.
> >>The immediate idea, if Ai/Ti is to be the first server to
> >>have the new webapp, is to stop Ti so Ai will not be selected by the
> >>load balancer. This does not work: with Paul's patch Ti is the
> >>preferred server BUT if Ti fails then another Tk will be selected by Ai,
> >>therefore the load balancer will never declare Ai failed (even though we
> >>managed to make it behave like this by specifying a test URL which
> >>includes a jvmroute to Ti, but this uses lots of slb groups on the
> >>Alteon) and it will continue to send requests to it.
> >>
> >>Bernd's patch allows Ai to reject requests if Ti is stopped, the load
> >>balancer will therefore quickly declare Ai inactive and will stop sending
> >>it requests, thus allowing us to roll out the new webapp very easily: just
> >>set up the new webapp, restart Ti, restart Ai, and as soon as the load
> >>balancer sees Ai, shut down the other two Ak. The current sessions will
> >>still be routed to the old webapp, and the new sessions will see the new
> >>version. When there are no more sessions on the old version, shut down
> >>Tk (k != i) and deploy the new webapp.
> >>
> >>My remark concerning the possible selection of recovering workers prior
> >>to the local worker (one with lb_value set to 0) deals with the load
> >>balancer not being able in this case to declare Ai inactive.
> >>
> >>I hope I have been clear enough, and that everybody got the point, if
> >>not I'd be glad to explain more thoroughly.
> >>
> >>Mathias.
> >>
> >>Paul Frieden wrote:
> >>
> >>>Hello,
> >>>
> >>>I'm afraid that I am no longer subscribed to the devel list. I would be
> >>>happy to add my advice for this issue, but I don't have time to keep up
> >>>with the entire devel list. If there is anything I can do, please just
> >>>mail me directly.
> >>>
> >>>I chose to use the value 0 for a worker because it used the inverse of
> >>>the value specified. The value 0 then resulted in essentially infinite
> >>>preference.
> >>>I used that approach purely because it was the smallest
> >>>change possible, and the least likely to change the expected behavior
> >>>for anybody else. The path of least astonishment and whatnot. I would
> >>>be concerned about changing the current behavior now, because people
> >>>probably want a drop-in replacement. If there is going to be a change
> >>>in the algorithm and behavior, a different approach may be better.
> >>>
> >>>I would also like to make a note of how we were using this code. In our
> >>>environment, we have an external dedicated load balancer, and three web
> >>>servers. The main problem that we ran into was with AOL users. AOL
> >>>uses a proxy that randomizes the source IP of requests. That means that
> >>>you can no longer count on the source IP to tell the load balancer which
> >>>server to send future requests to. We used this code to allow sessions
> >>>that arrive on the wrong web server to be redirected to the tomcat on the
> >>>correct server. This neatly side-steps the whole issue of changing IPs,
> >>>because apache is able to make the decision based on the session ID.
> >>>
> >>>The reliability issue was a nice side effect for us in that it caught a
> >>>failed server more quickly than the load balancer did, and prevented the
> >>>user from having a connection time out or seeing an error message.
> >>>
> >>>I hope this provides some insight into why I changed the code that I
> >>>did, and why that behavior worked well for us.
> >>>
> >>>Paul
> >>>
> >>>[EMAIL PROTECTED] wrote:
> >>>
> >>>>Hi Mathias,
> >>>>
> >>>>I think it would be better to discuss this on tomcat-dev.
> >>>>
> >>>>The 'error' worker will not be chosen unless the
> >>>>timeout expires. When the timeout expires, we'll indeed
> >>>>select it ( in preference to the default ) - this is easy to fix
> >>>>if it creates problems, but I don't see why it would be a
> >>>>problem.
> >>>>
> >>>>If it is working, the next request will be served normally by
> >>>>the default. If not, it'll go back to error state.
> >>>>
> >>>>In jk2 I removed that - error workers are no longer
> >>>>selected. But for jk1 I would rather leave the old
> >>>>behavior intact.
> >>>>
> >>>>Note that the reason for choosing 0 ( in jk2 ) as
> >>>>default is that I want to switch from float to ints,
> >>>>I'm not convinced floats are good for performance
> >>>>( or needed ).
> >>>>
> >>>>Again - I'm just learning and trying, if you have
> >>>>any ideas I would be happy to hear them, patches
> >>>>are more than welcome.
> >>>>
> >>>>Costin
> >>>>
> >>>>On Sat, 4 May 2002, Mathias Herberts wrote:
> >>>>
> >>>>>Hi, I just joined the Tomcat-dev list and saw your patch to
> >>>>>jk_lb_worker.c (making it version 1.9).
> >>>>>
> >>>>>If I understand your patch correctly, it offers the same behavior as Paul's
> >>>>>patch but with the opposite semantics for a lbfactor of 0.0 in the
> >>>>>worker's definition, i.e. a value of 0.0 now means ALWAYS USE THIS
> >>>>>WORKER FOR REQUESTS WITH NO SESSIONS instead of NEVER USE THIS WORKER
> >>>>>FOR REQUESTS WITH NO SESSIONS. This seems fine to me.
> >>>>>
> >>>>>What disturbs me is what is happening when one worker is in error
> >>>>>state and not yet recovering. In get_most_suitable_worker, such a
> >>>>>worker will be selected whatever its lb_value, meaning a recovering
> >>>>>worker will have priority over one with a lb_value of 0.0 and this
> >>>>>seems to break the behavior we had achieved with your patch.
> >>>>>
> >>>>>Did I miss something or is this really a problem?
> >>>>>
> >>>>>Mathias.

--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
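
For reference, the selection rules discussed in this thread could be sketched
in C roughly as follows. This is only an illustration under stated assumptions,
not the code in jk_lb_worker.c: the types sketch_worker and sketch_lb, the
function sketch_select and the field reject_if_main_down are hypothetical names
made up for this sketch; only main_worker_mode is a name proposed in the thread.
The sketch also assumes that a request whose session route points to a worker
in error state is treated like a request without a session, which the thread
does not settle.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real jk structures. */
typedef struct {
    const char *name;   /* also used as the session route */
    int is_main;        /* 1 if this is a "main" (local) worker */
    int in_error;       /* 1 if the worker is currently in error state */
} sketch_worker;

typedef struct {
    sketch_worker *workers;
    size_t num_workers;
    int main_worker_mode;     /* flag name suggested in the thread */
    int reject_if_main_down;  /* hypothetical name for the reject flag */
} sketch_lb;

/* Returns the selected worker, or NULL if the request should be rejected. */
sketch_worker *sketch_select(sketch_lb *lb, const char *session_route)
{
    size_t i;

    /* 1. A request with a session route sticks to that worker if healthy. */
    if (session_route != NULL) {
        for (i = 0; i < lb->num_workers; i++) {
            sketch_worker *w = &lb->workers[i];
            if (strcmp(w->name, session_route) == 0 && !w->in_error)
                return w;
        }
    }

    /* 2. In main worker mode, prefer a healthy main worker. */
    if (lb->main_worker_mode) {
        for (i = 0; i < lb->num_workers; i++) {
            sketch_worker *w = &lb->workers[i];
            if (w->is_main && !w->in_error)
                return w;
        }
        /* No main worker available: reject so that an external
         * load balancer can mark this Apache as inactive. */
        if (lb->reject_if_main_down)
            return NULL;
    }

    /* 3. Otherwise fall back to the first healthy worker. */
    for (i = 0; i < lb->num_workers; i++) {
        if (!lb->workers[i].in_error)
            return &lb->workers[i];
    }
    return NULL;
}
```

With the reject flag set, a request without a session is refused as soon as the
main worker is down, while requests carrying a session route to another healthy
worker are still served. That matches the roll-out procedure Mathias describes.
Using explicit flags also avoids giving lbfactor 0 a special meaning.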