Hi Costin,

At the moment I'm testing my patch and it seems to work. I'll send it soon
with an explanation of what I did. It's not the same as what Mathias did: I
used the 0 value plus one additional config flag, which results in two flags
in the lb_worker struct.
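A rough C sketch of the two-flag idea follows. It uses a simplified worker table, not the real jk1 `lb_worker` structures. `main_worker_mode` is the name Bernd proposes further down. `reject_if_main_down`, `worker_rec_t`, `lb_model_t` and `lb_choose_sessionless` are hypothetical names. A request without a session goes to a healthy main worker. If no main worker is available, the request is either rejected or falls back to another worker, depending on the flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the real jk1 lb_worker structs. */
typedef struct {
    const char *name;
    int is_main;   /* set when lbfactor was 0 (the "local" worker) */
    int in_error;  /* worker currently marked as failed */
} worker_rec_t;

typedef struct {
    worker_rec_t *workers;
    size_t num_workers;
    int main_worker_mode;     /* derived from a 0 lbfactor: a main worker exists */
    int reject_if_main_down;  /* hypothetical extra config flag */
} lb_model_t;

/* Pick a worker for a request that carries no session id.
 * Returns NULL when the request should be rejected. */
worker_rec_t *lb_choose_sessionless(lb_model_t *lb)
{
    size_t i;
    assert(lb != NULL);

    if (lb->main_worker_mode) {
        for (i = 0; i < lb->num_workers; i++) {
            worker_rec_t *w = &lb->workers[i];
            if (w->is_main && !w->in_error)
                return w;          /* a healthy main worker always wins */
        }
        if (lb->reject_if_main_down)
            return NULL;           /* fail visibly so the external balancer notices */
    }
    /* Fallback: first healthy worker, standing in for normal lb selection. */
    for (i = 0; i < lb->num_workers; i++) {
        worker_rec_t *w = &lb->workers[i];
        if (!w->in_error)
            return w;
    }
    return NULL;
}
```

As Bernd notes, putting the main worker first in the list makes the first loop cheap. The flag is still needed once more than one main worker exists.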
But I think your suggestion for jk2 is the better way, without magic 0 values
etc. Once the lb_worker is stable, it would be better to build the new stuff
in jk2. With the structure of jk1 I think it is a little difficult to build
the desired behavior.

Bernd

[EMAIL PROTECTED] wrote:
> Bernd,
>
> At this moment I believe we should add flags and stop using the '0' value
> in the config file.
>
> Internally ( in the code ) - it doesn't matter, we can keep 0 or
> use the flag ( I prefer the second ).
>
> I'm waiting for your patch - it seems there is another bug that must
> be fixed before we can tag - but I hope we can finish all changes in
> the next few days.
>
> Costin
>
> On Mon, 6 May 2002, Bernd Koecke wrote:
>
>>thanks for committing my patch :). After thinking about it, I found the same
>>problem as Mathias. It's a problem for my environment too. We have the same
>>problem with shutdown and recovery here. I'm in the process of looking into
>>jk2. The question for jk1 is: what do we want to do if the main worker fails
>>because of an error?
>>
>>The normal intention of lb is to switch to another worker in such a case.
>>But for the special use of a main worker we don't want that (at least it is
>>an error in my environment here :) ). My suggestion is to add an additional
>>flag to the lb_worker struct that records that we have a main worker,
>>e.g. main_worker_mode. Because of this flag we send only requests with a
>>session id to one of the other workers. We could also change the behavior
>>after an error of another worker and check its state only if we get a
>>request with its session route. This would be easy if we put the main worker
>>at the beginning of the worker list and/or use the flag. But we need the
>>flag if we want to use more than one main worker.
>>
>>But what should happen if the main worker is in error state?
>>In my patch some weeks ago I added an additional flag which causes the
>>module to reject a request if it comes in without a session id and the main
>>worker is down. If this flag isn't set, or isn't set to reject, the module
>>chooses one of the other workers. For our environment here, rejecting the
>>request is ok, because if a request without a session comes to a
>>switched-off node, we have a problem with our separate load balancer. This
>>should never happen. We could make rejection the default whenever we have a
>>main worker, but a separate flag would be more flexible.
>>
>>I will build a patch against CVS to make my intention clearer.
>>
>>Bernd
>>
>>[EMAIL PROTECTED] wrote:
>>
>>>Hi Mathias,
>>>
>>>I think we understand your use case, it is not uncommon.
>>>In fact, as I mentioned a few times, it is the 'main' use
>>>case for Apache ( multi-process ) when using the JNI worker.
>>>In this case Apache acts as a 'natural' load-balancer, with
>>>requests going to various processes ( more or less randomly ).
>>>As in your case, requests without a session should always go
>>>to the worker that is in the same process.
>>>
>>>The main reason for using '0' for the "local" worker is that
>>>in jk2 I want to switch from float to int - there is no reason
>>>( AFAIK ) to do all the float computation, even a short int
>>>will be enough for the purpose of implementing a round-robin
>>>with weights.
>>>
>>>BTW, one extension I'm trying to make is support for multiple
>>>local workers - I'm still thinking about how to do that. This will
>>>cover the case of a few big boxes, each with several tomcat
>>>instances ( if you have many GB of RAM and many processors, it is
>>>sometimes better to run more VMs instead of a single large process ).
>>>In this case you still want some remote tomcats, for failover,
>>>but most load should go to the local workers.
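Costin's point that integers are enough for weighted round-robin can be shown with smooth weighted round-robin. This is the integer-only scheme used by nginx's upstream module. It is a swapped-in technique for illustration, not the jk1 or jk2 algorithm. `wrr_worker_t` and `wrr_next` are hypothetical names. Workers in error are simply skipped, matching Costin's remark that jk2 no longer selects error workers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical worker record for an integer-only weighted round-robin. */
typedef struct {
    const char *name;
    int weight;    /* configured weight, e.g. an lbfactor scaled to an int */
    int current;   /* running score, starts at 0 */
    int in_error;  /* workers in error are never selected */
} wrr_worker_t;

/* Smooth weighted round-robin: each healthy worker gains its weight,
 * the highest score wins, and the winner pays back the total weight.
 * Workers with non-positive weight are skipped.
 * Returns the index of the chosen worker, or -1 if none is eligible. */
int wrr_next(wrr_worker_t *w, size_t n)
{
    int total = 0;
    int best = -1;
    size_t i;

    assert(w != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (w[i].in_error || w[i].weight <= 0)
            continue;
        w[i].current += w[i].weight;
        total += w[i].weight;
        if (best < 0 || w[i].current > w[best].current)
            best = (int)i;
    }
    if (best >= 0)
        w[best].current -= total;
    return best;
}
```

With weights 5, 1 and 1, seven calls return the indices 0, 0, 1, 0, 2, 0, 0. Each worker gets its share, and the heavy worker's picks are spread out instead of coming in one burst. Only int additions and comparisons are involved.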
>>>
>>>For jk2 I already fixed the selection of the 'recovering' worker:
>>>after the timeout, the worker will go through normal selection instead
>>>of being automatically chosen.
>>>
>>>For jk1 - I'm waiting for patches :-) I wouldn't do a big change -
>>>the current fix seemed like a good one.
>>>
>>>I agree that changing the meaning of 0 may be confusing ( is it
>>>documented ? my workers.properties says it should never be used ).
>>>We can fix that by using an additional flag - and not using
>>>special values.
>>>
>>>Another special note - Jk2 will also support 'graceful shutdown',
>>>which means your case ( replacing a webapp ) will be handled
>>>in a different way. You should be able to add/remove workers
>>>without restarting apache ( and I hope mostly automated ).
>>>
>>>Let me know what you think - with patches if possible :-)
>>>
>>>Costin
>>>
>>>>The setup I use is the following: a load balancer (Alteon) is in front
>>>>of several Apache servers, each hosted on a machine which also hosts a
>>>>Tomcat.
>>>>Let's call those Apache servers A1, A2 and A3 and the associated Tomcat
>>>>servers T1, T2 and T3.
>>>>
>>>>I have been using Paul's patch, which I modified so the lb_value field of
>>>>fault-tolerant workers would not be changed to a value other than INF.
>>>>
>>>>The basic setup is that Ai can talk to all Tj, but for requests not
>>>>associated with a session, Ti will be used unless it is unavailable.
>>>>Sessions belonging to Tk will be correctly routed. The load balancing
>>>>worker definition is different for each of the three Ai: the lbfactor is
>>>>set to 0 for workers connecting to Tk for all k != i, and set to 1.0 for
>>>>the worker connecting to Ti.
>>>>
>>>>This setup allows sticky sessions independently of the Apache
>>>>handling the request, which is a good thing since the Alteon cannot
>>>>extract the ';jsessionid=.....'
>>>>part from the URL in a way which allows the dispatching of the requests
>>>>to the proper Ai (the cookie is dealt with correctly though).
>>>>
>>>>This works perfectly except when we roll out a new release of our
>>>>webapps. In this case it would be ideal to be able to make the load
>>>>balancer ignore one Apache server, deploy the new version of the webapp
>>>>on this server, and switch this server back on and the other two off so
>>>>the service interruption would be as short as possible for the
>>>>customers. The immediate idea, if Ai/Ti is to be the first server to
>>>>have the new webapp, is to stop Ti so Ai will not be selected by the
>>>>load balancer. This does not work: with Paul's patch Ti is the
>>>>preferred server, BUT if Ti fails then another Tk will be selected by Ai,
>>>>so the load balancer will never declare Ai failed and will continue to
>>>>send requests to it. (We did manage to make it behave like this by
>>>>specifying a test URL which includes a jvmroute to Ti, but this uses lots
>>>>of slb groups on the Alteon.)
>>>>
>>>>Bernd's patch allows Ai to reject requests if Ti is stopped, so the load
>>>>balancer will quickly declare Ai inactive and stop sending it requests.
>>>>This makes rolling out the new webapp very easy: set up the new webapp,
>>>>restart Ti, restart Ai, and as soon as the load balancer sees Ai, shut
>>>>down the other two Ak. The current sessions will still be routed to the
>>>>old webapp, and the new sessions will see the new version. When there
>>>>are no more sessions on the old version, shut down Tk (k != i) and
>>>>deploy the new webapp.
>>>>
>>>>My remark concerning the possible selection of recovering workers prior
>>>>to the local worker (the one with lb_value set to 0) is about the load
>>>>balancer not being able, in this case, to declare Ai inactive.
>>>>
>>>>I hope I have been clear enough and that everybody got the point. If
>>>>not, I'd be glad to explain more thoroughly.
>>>>
>>>>Mathias.
>>>>
>>>>Paul Frieden wrote:
>>>>
>>>>>Hello,
>>>>>
>>>>>I'm afraid that I am no longer subscribed to the devel list. I would be
>>>>>happy to add my advice for this issue, but I don't have time to keep up
>>>>>with the entire devel list. If there is anything I can do, please just
>>>>>mail me directly.
>>>>>
>>>>>I chose to use the value 0 for a worker because it used the inverse of
>>>>>the value specified. The value 0 then resulted in essentially infinite
>>>>>preference. I used that approach purely because it was the smallest
>>>>>change possible, and the least likely to change the expected behavior
>>>>>for anybody else. The path of least astonishment and whatnot. I would
>>>>>be concerned about changing the current behavior now, because people
>>>>>probably want a drop-in replacement. If there is going to be a change
>>>>>in the algorithm and behavior, a different approach may be better.
>>>>>
>>>>>I would also like to make a note of how we were using this code. In our
>>>>>environment, we have an external dedicated load balancer and three web
>>>>>servers. The main problem that we ran into was with AOL users. AOL
>>>>>uses a proxy that randomizes the source IP of requests. That means that
>>>>>you can no longer count on the source IP to tell the load balancer which
>>>>>server to send future requests to. We used this code to allow sessions
>>>>>that arrive on the wrong web server to be redirected to the tomcat on the
>>>>>correct server. This neatly side-steps the whole issue of changing IPs,
>>>>>because apache is able to make the decision based on the session ID.
>>>>>
>>>>>The reliability issue was a nice side effect for us, in that it caught a
>>>>>failed server more quickly than the load balancer did, and prevented the
>>>>>user from having a connection time out or seeing an error message.
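Paul and Mathias both rely on session-based routing. This works because Tomcat appends its jvmRoute to the session id (for example `;jsessionid=ABC123.t2`), so Apache can send a request back to the Tomcat that owns the session. Below is a minimal sketch of extracting that route from a URL path parameter. It assumes the route is everything after the last '.' of the session id. `extract_route` is a hypothetical helper, not the jk1 function, and cookie handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the route suffix of a ";jsessionid=<id>.<route>" path parameter
 * into out (capacity outlen). Returns 1 on success, 0 if no route. */
int extract_route(const char *uri, char *out, size_t outlen)
{
    static const char key[] = ";jsessionid=";
    const char *id, *end, *dot, *p;
    size_t len;

    assert(uri != NULL && out != NULL);
    id = strstr(uri, key);
    if (id == NULL)
        return 0;
    id += sizeof(key) - 1;

    /* The session id ends at a query string, another path param, or the end. */
    end = id + strcspn(id, "?;#");

    dot = NULL;
    for (p = id; p < end; p++)
        if (*p == '.')
            dot = p;               /* remember the last '.' inside the id */
    if (dot == NULL || dot + 1 == end)
        return 0;

    len = (size_t)(end - (dot + 1));
    if (len >= outlen)
        return 0;                  /* route does not fit in the buffer */
    memcpy(out, dot + 1, len);
    out[len] = '\0';
    return 1;
}
```

A balancer would then look up the worker whose name matches the extracted route. Requests without a route fall through to the sessionless selection.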
>>>>>
>>>>>I hope this provides some insight into why I changed the code that I
>>>>>did, and why that behavior worked well for us.
>>>>>
>>>>>Paul
>>>>>
>>>>>[EMAIL PROTECTED] wrote:
>>>>>
>>>>>>Hi Mathias,
>>>>>>
>>>>>>I think it would be better to discuss this on tomcat-dev.
>>>>>>
>>>>>>The 'error' worker will not be chosen unless the
>>>>>>timeout expires. When the timeout expires, we'll indeed
>>>>>>select it ( in preference to the default ) - this is easy to fix
>>>>>>if it creates problems, but I don't see why it would be a
>>>>>>problem.
>>>>>>
>>>>>>If it is working, the next request will be served normally by
>>>>>>the default. If not, it'll go back to error state.
>>>>>>
>>>>>>In jk2 I removed that - error workers are no longer
>>>>>>selected. But for jk1 I would rather leave the old
>>>>>>behavior intact.
>>>>>>
>>>>>>Note that the reason for choosing 0 ( in jk2 ) as
>>>>>>default is that I want to switch from float to ints,
>>>>>>I'm not convinced floats are good for performance
>>>>>>( or needed ).
>>>>>>
>>>>>>Again - I'm just learning and trying, if you have
>>>>>>any ideas I would be happy to hear them, patches
>>>>>>are more than welcome.
>>>>>>
>>>>>>Costin
>>>>>>
>>>>>>On Sat, 4 May 2002, Mathias Herberts wrote:
>>>>>>
>>>>>>>Hi, I just joined the Tomcat-dev list and saw your patch to
>>>>>>>jk_lb_worker.c (making it version 1.9).
>>>>>>>
>>>>>>>If I understand your patch correctly, it offers the same behaviors as
>>>>>>>Paul's patch but with the opposite semantics for an lbfactor of 0.0 in
>>>>>>>the worker's definition, i.e. a value of 0.0 now means ALWAYS USE THIS
>>>>>>>WORKER FOR REQUESTS WITH NO SESSIONS instead of NEVER USE THIS WORKER
>>>>>>>FOR REQUESTS WITH NO SESSIONS. This seems fine to me.
>>>>>>>
>>>>>>>What disturbs me is what happens when one worker is in error
>>>>>>>state and not yet recovering.
>>>>>>>In get_most_suitable_worker, such a
>>>>>>>worker will be selected whatever its lb_value, meaning a recovering
>>>>>>>worker will have priority over one with an lb_value of 0.0, and this
>>>>>>>seems to break the behavior we had achieved with your patch.
>>>>>>>
>>>>>>>Did I miss something or is this really a problem?
>>>>>>>
>>>>>>>Mathias.
>>>
>>>-- 
>>>To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
>>>For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
>>>
>>
>

-- 
Dipl.-Inform. Bernd Koecke
UNIX Development
Schlund+Partner AG
Phone: +49-721-91374-0
E-Mail: [EMAIL PROTECTED]
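The small model below contrasts Mathias's concern with the jk2 change Costin describes. In jk2, a worker whose error timeout has expired goes through normal selection instead of being chosen automatically. The model is hypothetical and simplified, not the code of `get_most_suitable_worker`. `rec_worker_t`, `pick`, the `recover_first` switch and the rule that the lowest `lb_value` wins are all assumptions for illustration. The local worker is modeled with `lb_value` 0.0, as in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified worker state. */
typedef struct {
    const char *name;
    double lb_value;   /* lower is preferred; 0.0 marks the local worker */
    int in_error;      /* marked failed */
    long error_time;   /* time at which it was marked failed */
} rec_worker_t;

/* Choose a worker for a sessionless request at time `now`.
 * recover_first != 0: a failed worker whose timeout has expired is
 *                     returned at once, whatever its lb_value.
 * recover_first == 0: such a worker only re-enters normal selection.
 * Returns the index of the chosen worker, or -1 if none is eligible. */
int pick(rec_worker_t *w, size_t n, long now, long retry_timeout,
         int recover_first)
{
    int best = -1;
    size_t i;

    assert(w != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (w[i].in_error) {
            if (now - w[i].error_time < retry_timeout)
                continue;              /* still inside its error timeout */
            if (recover_first)
                return (int)i;         /* recovering worker jumps the queue */
            /* otherwise it competes like any healthy worker */
        }
        if (best < 0 || w[i].lb_value < w[best].lb_value)
            best = (int)i;
    }
    return best;
}
```

In the model, a healthy local worker sits at index 0 and a remote worker failed at time 0, with a 60-unit timeout. At time 100, `recover_first` set returns the remote worker even though the local one is healthy. With it cleared, the remote worker only competes on `lb_value` and the local worker is chosen.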