Sorry Glenn: by looking deeper into the mod_jk.log after changing the worker names, I realized that I was actually restarting Apache with the same worker.properties every time. There was a link earlier in the configuration chain which made my switching useless :( We should definitely reduce our linking!
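For anyone double-checking their own setup, the pitfall looks roughly like this. Apache picks up the workers file via a single directive in httpd.conf (the path is an example, not our real layout):

  JkWorkersFile /web/apache/conf/worker.properties

If that path is a symlink, possibly pointing at yet another symlink, a graceful restart rereads whatever the whole chain resolves to at that moment. Re-pointing a link further down the chain changes nothing if an earlier link does not go through it, which is exactly what happened here.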
Thanks very much.

P.S. If anybody else is interested in our LB/failover setup, I am glad to provide some info.

Best regards,
Hans

> -----Original Message-----
> From: Hans Schmid [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, July 16, 2003 15:15
> To: Tomcat Developers List
> Subject: RE: jk 1.2.4 LB bug?
>
> Thanks for your reply; comments are inline.
>
> > -----Original Message-----
> > From: Glenn Nielsen [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, July 16, 2003 12:26
> > To: Tomcat Developers List
> > Subject: Re: jk 1.2.4 LB bug?
> >
> > mod_jk will print out information about the lb config if you set
> > the JkLogLevel to debug.
>
> Done.
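> For reference, the relevant httpd.conf fragment now looks roughly like
> this (the paths are examples):
>
>   JkWorkersFile /web/apache/conf/worker.properties
>   JkLogFile     /web/apache/logs/mod_jk.log
>   JkLogLevel    debug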
> > I would suggest setting up a test system where you can test
> > the below with JkLogLevel debug configured. Then grep the
> > log for lines which have jk_lb_worker.c in them.
>
> OK.
>
> > This will tell you several things.
> >
> > 1. Whether the worker.properties are getting reread when you
> > do an apache restart. (They should be.)
>
> Yes, they were reread. Initial:
>
> [Wed Jul 16 14:11:14 2003] [jk_worker.c (118)]: Into wc_close
> [Wed Jul 16 14:11:14 2003] [jk_worker.c (199)]: close_workers got 6 workers to destroy
> [Wed Jul 16 14:11:14 2003] [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance
> [Wed Jul 16 14:11:14 2003] [jk_lb_worker.c (561)]: Into jk_worker_t::destroy
> [Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (1461)]: Into jk_worker_t::destroy
> [Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (1468)]: Into jk_worker_t::destroy up to 1 endpoint to close
> [Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (605)]: In jk_endpoint_t::ajp_close_endpoint
> [Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (612)]: In jk_endpoint_t::ajp_close_endpoint, closed sd = 12
> [Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (1461)]: Into jk_worker_t::destroy
> [Wed Jul 16 14:11:14 2003] [jk_worker.c (118)]: Into wc_close
> [Wed Jul 16 14:11:14 2003] [jk_worker.c (118)]: Into wc_close
> [Wed Jul 16 14:11:14 2003] [jk_worker.c (118)]: Into wc_close
> [Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (1468)]: Into jk_worker_t::destroy up to 1 endpoint to close
> [Wed Jul 16 14:11:14 2003] [jk_worker.c (199)]: close_workers got 6 workers to destroy
> [Wed Jul 16 14:11:14 2003] [jk_worker.c (199)]: close_workers got 6 workers to destroy
> [Wed Jul 16 14:11:14 2003] [jk_worker.c (199)]: close_workers got 6 workers to destroy
> [Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (1461)]: Into jk_worker_t::destroy
> [Wed Jul 16 14:11:14 2003] [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance
> [Wed Jul 16 14:11:14 2003] [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance
> [Wed Jul 16 14:11:14 2003] [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance
> [Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (1468)]: Into jk_worker_t::destroy up to 1 endpoint to close
> [Wed Jul 16 14:11:14 2003] [jk_lb_worker.c (561)]: Into jk_worker_t::destroy
> [Wed Jul 16 14:11:14 2003] [jk_lb_worker.c (561)]: Into jk_worker_t::destroy
> [Wed Jul 16 14:11:14 2003] [jk_lb_worker.c (561)]: Into jk_worker_t::destroy
>
> ... closing of other, unrelated workers ...
>
> [Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (172)]: Into jk_uri_worker_map_t::uri_worker_map_alloc
> [Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (375)]: Into jk_uri_worker_map_t::uri_worker_map_open
> [Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (396)]: jk_uri_worker_map_t::uri_worker_map_open, rule map size is 12
> [Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (321)]: Into jk_uri_worker_map_t::uri_worker_map_open, match rule /einsurance/=lb-einsurance was added
> [Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (345)]: Into jk_uri_worker_map_t::uri_worker_map_open, exact rule /einsurance=lb-einsurance was added
>
> ... 5 other workers (including other lb workers and normal workers) added ...
>
> [Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (408)]: Into jk_uri_worker_map_t::uri_worker_map_open, there are 12 rules
> [Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (422)]: jk_uri_worker_map_t::uri_worker_map_open, done
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (88)]: Into wc_open
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (222)]: Into build_worker_map, creating 6 workers
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (228)]: build_worker_map, creating worker lb-einsurance
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (148)]: Into wc_create_worker
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (162)]: wc_create_worker, about to create instance lb-einsurance of lb
> [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (586)]: Into lb_worker_factory
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (171)]: wc_create_worker, about to validate and init lb-einsurance
> [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (420)]: Into jk_worker_t::validate
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (148)]: Into wc_create_worker
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (162)]: wc_create_worker, about to create instance ajp13-01 of ajp13
> [Wed Jul 16 14:11:16 2003] [jk_ajp13_worker.c (108)]: Into ajp13_worker_factory
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (171)]: wc_create_worker, about to validate and init ajp13-01
> [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1343)]: Into jk_worker_t::validate
> [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1364)]: In jk_worker_t::validate for worker ajp13-01 contact is tomcathost-ei:11009
> [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1397)]: Into jk_worker_t::init
> [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1421)]: In jk_worker_t::init, setting socket timeout to 0
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (187)]: wc_create_worker, done
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (148)]: Into wc_create_worker
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (162)]: wc_create_worker, about to create instance ajp13-02 of ajp13
> [Wed Jul 16 14:11:16 2003] [jk_ajp13_worker.c (108)]: Into ajp13_worker_factory
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (171)]: wc_create_worker, about to validate and init ajp13-02
> [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1343)]: Into jk_worker_t::validate
> [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1364)]: In jk_worker_t::validate for worker ajp13-02 contact is tomcathost-ei:11019
> [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1397)]: Into jk_worker_t::init
> [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1421)]: In jk_worker_t::init, setting socket timeout to 0
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (187)]: wc_create_worker, done
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (148)]: Into wc_create_worker
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (162)]: wc_create_worker, about to create instance ajp13-sb of ajp13
> [Wed Jul 16 14:11:16 2003] [jk_ajp13_worker.c (108)]: Into ajp13_worker_factory
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (171)]: wc_create_worker, about to validate and init ajp13-sb
> [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1343)]: Into jk_worker_t::validate
> [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1364)]: In jk_worker_t::validate for worker ajp13-sb contact is tomcathost-ei-sb:11015
> [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1397)]: Into jk_worker_t::init
> [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1421)]: In jk_worker_t::init, setting socket timeout to 0
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (187)]: wc_create_worker, done
> [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (498)]: Balanced worker 0 has name ajp13-01
> [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (498)]: Balanced worker 1 has name ajp13-sb
> [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (498)]: Balanced worker 2 has name ajp13-02
> [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (502)]: in_local_worker_mode: true
> [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (505)]: local_worker_only: false
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (187)]: wc_create_worker, done
> [Wed Jul 16 14:11:16 2003] [jk_worker.c (238)]: build_worker_map, removing old lb-einsurance worker
>
> This last line looks suspicious to me.
>
> > 2. What the lb worker thinks the config is.
>
> Initial:
>
> [Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (586)]: Into lb_worker_factory
> [Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (420)]: Into jk_worker_t::validate
> [Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (498)]: Balanced worker 0 has name ajp13-01
> [Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (498)]: Balanced worker 1 has name ajp13-sb
> [Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (498)]: Balanced worker 2 has name ajp13-02
> [Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (502)]: in_local_worker_mode: true
> [Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (505)]: local_worker_only: false
>
> But after switching the file and doing a graceful restart it is exactly the same, which is the error:
>
> [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (420)]: Into jk_worker_t::validate
> [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (498)]: Balanced worker 0 has name ajp13-01
> [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (498)]: Balanced worker 1 has name ajp13-sb
> [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (498)]: Balanced worker 2 has name ajp13-02
> [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (502)]: in_local_worker_mode: true
> [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (505)]: local_worker_only: false
>
> This explains the observed (wrong) failover behavior; the order should be ajp13-02, ajp13-sb, ajp13-01.
>
> Original worker.properties:
>
> worker.ajp13-01.lbfactor=1
> worker.ajp13-01.local_worker=1
>
> worker.ajp13-02.lbfactor=1
> worker.ajp13-02.local_worker=0
>
> worker.ajp13-sb.lbfactor=0
> worker.ajp13-sb.local_worker=1
>
> plus local_worker_only=0 for the lb worker.
>
> Changed to the following before the graceful restart (by linking a different worker.properties):
>
> worker.ajp13-01.lbfactor=1
> worker.ajp13-01.local_worker=0
>
> worker.ajp13-02.lbfactor=1
> worker.ajp13-02.local_worker=1
>
> worker.ajp13-sb.lbfactor=0
> worker.ajp13-sb.local_worker=1
>
> again with local_worker_only=0 for the lb worker.
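> In other words, the only difference between the two files is that the
> local_worker flags of the two Server A workers swap:
>
>   worker.ajp13-01.local_worker: 1 -> 0
>   worker.ajp13-02.local_worker: 0 -> 1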
> So it *seems* there might be something wrong with the reinitialization of the worker order?
>
> If you need further information, I can mail you the complete logs offline.
>
> Thanks for looking into this,
> Hans
>
> > Then post those log lines here.
> >
> > Thanks,
> >
> > Glenn
> >
> > Hans Schmid wrote:
> > > Hi,
> > >
> > > I noticed the following with mod_jk 1.2.4, Apache 1.3.26 and
> > > Tomcat 3.3.1a on Solaris 8, JDK 1.4.1_03.
> > >
> > > Maybe an LB bug (load factors do not recover after startup of a new
> > > Tomcat / graceful Apache restart).
> > >
> > > Let me explain my scenario first:
> > >
> > > I want to gracefully upgrade our webapp without losing sessions, and
> > > have a failover scenario. Therefore we have sticky sessions enabled.
> > >
> > > Setup:
> > > Tomcat 01 on Server A, running
> > > Tomcat 02 on Server A, currently shut down
> > > Tomcat SB on Server B, running
> > >
> > > Tomcat 01 on Server A runs the application; Tomcat SB on Server B is
> > > standby (fallback); Tomcat 02 on Server A is shut down at the moment.
> > >
> > > All three Tomcats are in the same lb worker:
> > >
> > > worker.list=lb-worker
> > >
> > > worker.ajp13-01.port=11009
> > > worker.ajp13-01.host=A
> > > worker.ajp13-01.type=ajp13
> > > worker.ajp13-01.lbfactor=1
> > > worker.ajp13-01.local_worker=1
> > >
> > > worker.ajp13-02.port=11019
> > > worker.ajp13-02.host=A
> > > worker.ajp13-02.type=ajp13
> > > worker.ajp13-02.lbfactor=1
> > > worker.ajp13-02.local_worker=0
> > >
> > > worker.ajp13-sb.port=11015
> > > worker.ajp13-sb.host=B
> > > worker.ajp13-sb.type=ajp13
> > > worker.ajp13-sb.lbfactor=0
> > > worker.ajp13-sb.local_worker=1
> > >
> > > worker.lb-worker.type=lb
> > > worker.lb-worker.balanced_workers=ajp13-01, ajp13-02, ajp13-sb
> > > worker.lb-worker.local_worker_only=0
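> > > (For completeness: on the Apache side the webapp is mapped onto the
> > > lb worker with JkMount rules roughly like the following; the context
> > > path here is only illustrative, not our real one:
> > >
> > >   JkMount /mywebapp lb-worker
> > >   JkMount /mywebapp/* lb-worker
> > >
> > > so every request for that context goes through the balanced worker
> > > list above.)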
> > > The worker list order should now be:
> > > 1. worker.ajp13-01 (lbfactor=1, local_worker=1) TC 01
> > > 2. worker.ajp13-sb (lbfactor=0, local_worker=1) TC SB
> > > 3. worker.ajp13-02 (lbfactor=1, local_worker=0) TC 02 (not running)
> > >
> > > Now all requests go to worker.ajp13-01 (TC 01), none to
> > > worker.ajp13-sb (TC SB, lbfactor=0), and none to worker.ajp13-02
> > > (TC 02 not running).
> > >
> > > If Server A (TC 01) crashes, all new requests go to Server B (TC SB,
> > > worker.ajp13-sb), since it is then the only running Tomcat. FINE.
> > > This is our failover solution (running sessions are lost, but OK).
> > >
> > > Now the webapp update scenario:
> > >
> > > 1.) Shut down TC SB on Server B, update the webapp, start TC SB and
> > > test via a separate HTTP connector port, without Apache.
> > > 2.) This does not affect anything on production, since lbfactor=0 on
> > > TC SB -> no sessions arrive on TC SB.
> > > 3.) When the test was successful, our standby Tomcat SB is updated.
> > > 4.) Now upgrade the webapp on Server A TC 02, which is currently not
> > > running.
> > > 5.) Start up TC 02 on Server A with the new version of the webapp,
> > > immediately exchange the worker.properties with a different version,
> > > and gracefully restart Apache (a sketch of this switch follows after
> > > step 10):
> > >
> > > worker.list=lb-worker
> > >
> > > worker.ajp13-01.port=11009
> > > worker.ajp13-01.host=A
> > > worker.ajp13-01.type=ajp13
> > > worker.ajp13-01.lbfactor=1
> > > worker.ajp13-01.local_worker=0   <---- put old webapp on TC 01 on the foreign worker list
> > >
> > > worker.ajp13-02.port=11019
> > > worker.ajp13-02.host=A
> > > worker.ajp13-02.type=ajp13
> > > worker.ajp13-02.lbfactor=1
> > > worker.ajp13-02.local_worker=1   <---- put new webapp on TC 02 in front of the local worker list
> > >
> > > worker.ajp13-sb.port=11015
> > > worker.ajp13-sb.host=B
> > > worker.ajp13-sb.type=ajp13
> > > worker.ajp13-sb.lbfactor=0
> > > worker.ajp13-sb.local_worker=1
> > >
> > > worker.lb-worker.type=lb
> > > worker.lb-worker.balanced_workers=ajp13-01, ajp13-02, ajp13-sb
> > > worker.lb-worker.local_worker_only=0
> > >
> > > Just the two lines marked above with <---- swap (the local_worker
> > > values of TC 01 and TC 02).
> > >
> > > 6.) Now all 3 Tomcats are running. All existing sessions still go to
> > > TC 01 (sticky sessions; we do not lose running sessions).
> > > 7.) What I expect: TC 02 takes a while to start up. The worker list
> > > order should now be:
> > > 1. worker.ajp13-02 (lbfactor=1, local_worker=1) TC 02
> > > 2. worker.ajp13-sb (lbfactor=0, local_worker=1) TC SB
> > > 3. worker.ajp13-01 (lbfactor=1, local_worker=0) TC 01 (old webapp)
> > >
> > > Since TC 02 needs 3 minutes to start up (filling caches etc.), it is
> > > not immediately available. During this time new sessions arrive at
> > > TC SB, since it is the next one in the worker list. OK, fine, this
> > > works. Since these sessions are sticky as well, all users connecting
> > > during this time stay on TC SB for their whole session life. FINE.
> > >
> > > 8.) As soon as TC 02 is up and running (finished all load-on-startup
> > > servlet initialization), I would expect TC 02 to get all new sessions
> > > (number 1 in the worker list).
> > >
> > > This is not the case! All new sessions still arrive at TC SB.
> > >
> > > 9.) After a while (one hour) we shut down TC 01. Since no new
> > > sessions arrived there after our graceful restart of Apache, all old
> > > sessions should have expired.
> > >
> > > 10.) Even now (only TC 02 and TC SB running), and even after another
> > > graceful restart, new sessions arrive at TC SB.
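> > > (The sketch promised in step 5, with hypothetical file names: we
> > > keep two variants of the workers file and only re-point a link
> > > before the graceful restart.
> > >
> > >   # worker.properties.01-local (old webapp on TC 01 is local):
> > >   worker.ajp13-01.local_worker=1
> > >   worker.ajp13-02.local_worker=0
> > >
> > >   # worker.properties.02-local (new webapp on TC 02 is local):
> > >   worker.ajp13-01.local_worker=0
> > >   worker.ajp13-02.local_worker=1
> > >
> > > The switch re-points the worker.properties link from the first file
> > > to the second and then runs "apachectl graceful" so mod_jk rereads
> > > it; everything else in the two files is identical.)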
> > > Conclusion:
> > > Do I misunderstand the supposed behaviour of the lbfactor and
> > > local_worker flags? I think the behaviour in 8.) is wrong, and 10.)
> > > is strange too.
> > >
> > > Thanks for any suggestion, whether I am completely wrong here or
> > > this is worth looking into further.
> > >
> > > Hans
> > >
> > >> -----Original Message-----
> > >> From: Glenn Nielsen [mailto:[EMAIL PROTECTED]
> > >> Sent: Wednesday, July 9, 2003 15:56
> > >> To: Tomcat Developers List
> > >> Subject: Re: jk 1.2.25 release?
> > >>
> > >> I was hoping to get it released this week.
> > >>
> > >> But I just noticed that under Apache 2 mod_jk piped logs there are
> > >> two instances of the piped log program running for the same log
> > >> file. I want to track this down.
> > >>
> > >> I also just implemented load balancing this morning on a production
> > >> server. I noticed that when none of the workers for the load
> > >> balancer were available, an HTTP status code of 200 was being
> > >> logged in mod_jk.log when request logging was enabled. So I want to
> > >> look into this also.
> > >>
> > >> Hopefully, now that I have load balancing in place with 2 Tomcat
> > >> servers instead of 1, the Missouri Lottery web site I administer
> > >> will scale to handle the big spike in load tonight for the $240
> > >> million PowerBall jackpot. :-)
> > >>
> > >> Regards,
> > >>
> > >> Glenn
> > >>
> > >> Henri Gomez wrote:
> > >>
> > >>> Any date?

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]