Thanks for your reply, comments see inline > -----Ursprungliche Nachricht----- > Von: Glenn Nielsen [mailto:[EMAIL PROTECTED] > Gesendet: Mittwoch, 16. Juli 2003 12:26 > An: Tomcat Developers List > Betreff: Re: jk 1.2.4 LB bug? > > > mod_jk will print out information about the lb config if you set > the JkLogLevel to debug. >
done > I would suggest setting up a test system where you can test > the below with JkLogLevel debug configured. Then grep the > log for lines which have jk_lb_worker.c in them. > OK > This will tell you several things. > > 1. Whether the worker.properties are getting reread when you > do an apache restart. (They should be) > Yes they were reread: Initial: [Wed Jul 16 14:11:14 2003] [jk_worker.c (118)]: Into wc_close [Wed Jul 16 14:11:14 2003] [jk_worker.c (199)]: close_workers got 6 workers to destroy [Wed Jul 16 14:11:14 2003] [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance [Wed Jul 16 14:11:14 2003] [jk_lb_worker.c (561)]: Into jk_worker_t::destroy [Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (1461)]: Into jk_worker_t::destroy [Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (1468)]: Into jk_worker_t::destroy up to 1 endpoint to close [Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (605)]: In jk_endpoint_t::ajp_close_endpoint [Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (612)]: In jk_endpoint_t::ajp_close_endpoint, closed sd = 12 [Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (1461)]: Into jk_worker_t::destroy [Wed Jul 16 14:11:14 2003] [jk_worker.c (118)]: Into wc_close [Wed Jul 16 14:11:14 2003] [jk_worker.c (118)]: Into wc_close [Wed Jul 16 14:11:14 2003] [jk_worker.c (118)]: Into wc_close [Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (1468)]: Into jk_worker_t::destroy up to 1 endpoint to close [Wed Jul 16 14:11:14 2003] [jk_worker.c (199)]: close_workers got 6 workers to destroy [Wed Jul 16 14:11:14 2003] [jk_worker.c (199)]: close_workers got 6 workers to destroy [Wed Jul 16 14:11:14 2003] [jk_worker.c (199)]: close_workers got 6 workers to destroy [Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (1461)]: Into jk_worker_t::destroy [Wed Jul 16 14:11:14 2003] [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance [Wed Jul 16 14:11:14 2003] [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance [Wed Jul 16 14:11:14 2003] [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance [Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (1468)]: Into jk_worker_t::destroy up to 1 endpoint to close [Wed Jul 16 14:11:14 2003] [jk_lb_worker.c (561)]: Into jk_worker_t::destroy [Wed Jul 16 14:11:14 2003] [jk_lb_worker.c (561)]: Into jk_worker_t::destroy [Wed Jul 16 14:11:14 2003] [jk_lb_worker.c (561)]: Into jk_worker_t::destroy ... closing other not related worker [Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (172)]: Into jk_uri_worker_map_t::uri_worker_map_alloc [Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (375)]: Into jk_uri_worker_map_t::uri_worker_map_open [Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (396)]: jk_uri_worker_map_t::uri_worker_map_open, rule map size is 12 [Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (321)]: Into jk_uri_worker_map_t::uri_worker_map_open, match rule /einsurance/=lb-einsurance was added [Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (345)]: Into jk_uri_worker_map_t::uri_worker_map_open, exact rule /einsurance=lb-einsurance was added ... 5 other workers (including other lb-workers and normal workers) added [Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (408)]: Into jk_uri_worker_map_t::uri_worker_map_open, there are 12 rules [Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (422)]: jk_uri_worker_map_t::uri_worker_map_open, done [Wed Jul 16 14:11:16 2003] [jk_worker.c (88)]: Into wc_open [Wed Jul 16 14:11:16 2003] [jk_worker.c (222)]: Into build_worker_map, creating 6 workers [Wed Jul 16 14:11:16 2003] [jk_worker.c (228)]: build_worker_map, creating worker lb-einsurance [Wed Jul 16 14:11:16 2003] [jk_worker.c (148)]: Into wc_create_worker [Wed Jul 16 14:11:16 2003] [jk_worker.c (162)]: wc_create_worker, about to create instance lb-einsurance of lb [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (586)]: Into lb_worker_factory [Wed Jul 16 14:11:16 2003] [jk_worker.c (171)]: wc_create_worker, about to validate and init lb-einsurance [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (420)]: Into jk_worker_t::validate [Wed Jul 16 14:11:16 2003] [jk_worker.c (148)]: Into wc_create_worker [Wed Jul 16 14:11:16 2003] [jk_worker.c (162)]: wc_create_worker, about to create instance ajp13-01 of ajp13 [Wed Jul 16 14:11:16 2003] [jk_ajp13_worker.c (108)]: Into ajp13_worker_factory [Wed Jul 16 14:11:16 2003] [jk_worker.c (171)]: wc_create_worker, about to validate and init ajp13-01 [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1343)]: Into jk_worker_t::validate [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1364)]: In jk_worker_t::validate for worker ajp13-01 contact is tomcathost-ei:11009 [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1397)]: Into jk_worker_t::init [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1421)]: In jk_worker_t::init, setting socket timeout to 0 [Wed Jul 16 14:11:16 2003] [jk_worker.c (187)]: wc_create_worker, done [Wed Jul 16 14:11:16 2003] [jk_worker.c (148)]: Into wc_create_worker [Wed Jul 16 14:11:16 2003] [jk_worker.c (162)]: wc_create_worker, about to create instance ajp13-02 of ajp13 [Wed Jul 16 14:11:16 2003] [jk_ajp13_worker.c (108)]: Into ajp13_worker_factory [Wed Jul 16 14:11:16 2003] [jk_worker.c (171)]: wc_create_worker, about to validate and init ajp13-02 [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1343)]: Into jk_worker_t::validate [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1364)]: In jk_worker_t::validate for worker ajp13-02 contact is tomcathost-ei:11019 [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1397)]: Into jk_worker_t::init [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1421)]: In jk_worker_t::init, setting socket timeout to 0 [Wed Jul 16 14:11:16 2003] [jk_worker.c (187)]: wc_create_worker, done [Wed Jul 16 14:11:16 2003] [jk_worker.c (148)]: Into wc_create_worker [Wed Jul 16 14:11:16 2003] [jk_worker.c (162)]: wc_create_worker, about to create instance ajp13-sb of ajp13 [Wed Jul 16 14:11:16 2003] [jk_ajp13_worker.c (108)]: Into ajp13_worker_factory [Wed Jul 16 14:11:16 2003] [jk_worker.c (171)]: wc_create_worker, about to validate and init ajp13-sb [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1343)]: Into jk_worker_t::validate [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1364)]: In jk_worker_t::validate for worker ajp13-sb contact is tomcathost-ei-sb:11015 [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1397)]: Into jk_worker_t::init [Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1421)]: In jk_worker_t::init, setting socket timeout to 0 [Wed Jul 16 14:11:16 2003] [jk_worker.c (187)]: wc_create_worker, done [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (498)]: Balanced worker 0 has name ajp13-01 [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (498)]: Balanced worker 1 has name ajp13-sb [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (498)]: Balanced worker 2 has name ajp13-02 [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (502)]: in_local_worker_mode: true [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (505)]: local_worker_only: false [Wed Jul 16 14:11:16 2003] [jk_worker.c (187)]: wc_create_worker, done [Wed Jul 16 14:11:16 2003] [jk_worker.c (238)]: build_worker_map, removing old lb-einsurance worker this last line looks suspicous to me > 2. What the lb worker thinks the config is. > initial: [Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (586)]: Into lb_worker_factory [Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (420)]: Into jk_worker_t::validate [Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (498)]: Balanced worker 0 has name ajp13-01 [Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (498)]: Balanced worker 1 has name ajp13-sb [Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (498)]: Balanced worker 2 has name ajp13-02 [Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (502)]: in_local_worker_mode: true [Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (505)]: local_worker_only: false but after the switching and graceful restart exactly the same (which is the error) !!!!!!!! [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (420)]: Into jk_worker_t::validate [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (498)]: Balanced worker 0 has name ajp13-01 [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (498)]: Balanced worker 1 has name ajp13-sb [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (498)]: Balanced worker 2 has name ajp13-02 [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (502)]: in_local_worker_mode: true [Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (505)]: local_worker_only: false This explains the observed (wrong) fall-over behavior, (should be ajp13-02, ajp13-sb, ajp13-01) original workers.properties: worker.ajp13-01.lbfactor=1 worker.ajp13-01.local_worker=1 worker.ajp13-02.lbfactor=1 worker.ajp13-02.local_worker=0 worker.ajp13-sb.lbfactor=0 worker.ajp13-sb.local_worker=1 local_worker_only=0 for the lb-worker changed to before graceful restart: (linking a different worker.properties) worker.ajp13-01.lbfactor=1 worker.ajp13-01.local_worker=0 worker.ajp13-02.lbfactor=1 worker.ajp13-02.local_worker=1 worker.ajp13-sb.lbfactor=0 worker.ajp13-sb.local_worker=1 local_worker_only=0 the lb-worker So it *seems* there might be something wrong with the reinitialisation of the worker order ? If you need further information, I can mail you the complete logs offline. Thanks for looking into this, Hans > Then post those log lines here. > > Thanks, > > Glenn > > Hans Schmid wrote: > > Hi, > > > > I noticed the following with mod_jk 1.2.4, Apache 1.3.26 and > > Tomcat 3.3.1a on Solaris 8 JDK 1.4.1_03. > > > > Maybe a LB bug (Loadfactors do not recover after startup of new > > tomcat/graceful Apache restart). > > > > Let me explain my scenario first: > > > > I want to gracefully upgrade our webapp without loosing > sessions + have a > > fail over scenario. > > Therefor we have sticky sessions enabled. > > > > Setup: > > 1 tomcat 01 running on Server A, > > 0 tomcat 02 running on Server A, > > 1 tomcat SB running on Server B > > > > 01 tomcat on Server A runs the application, SB tomcat on server B is > > standby(fallback), > > 02 tomcat is shutdown on Server A at the moment. > > > > All three Tomcats are in the same lb_worker: > > > > > > worker.list=lb-worker > > > > worker.ajp13-01.port=11009 > > worker.ajp13-01.host=A > > worker.ajp13-01.type=ajp13 > > worker.ajp13-01.lbfactor=1 > > worker.ajp13-01.local_worker=1 > > > > worker.ajp13-02.port=11019 > > worker.ajp13-02.host=A > > worker.ajp13-02.type=ajp13 > > worker.ajp13-02.lbfactor=1 > > worker.ajp13-02.local_worker=0 > > > > worker.ajp13-sb.port=11015 > > worker.ajp13-sb.host=B > > worker.ajp13-sb.type=ajp13 > > worker.ajp13-sb.lbfactor=0 > > worker.ajp13-sb.local_worker=1 > > > > worker.lb-worker.type=lb > > worker.lb-worker.balanced_workers=ajp13-01, ajp13-02, ajp13-sb > > worker.lb-worker.local_worker_only=0 > > > > > > The worker List order should now be: > > 1. worker.ajp13-01 lbfactor=1,local_worker=1 TC 01 > > 2. worker.ajp13-sb lbfactor=0,local_worker=1 TC SB > > 3. worker.ajp13-02 lbfactor=1,local_worker=0) TC 02 (not running) > > > > Now all requests go to worker.ajp13-01 (TC 01), none to > worker.ajp13-sb (TC > > SB lbfactor=0), > > none to worker.ajp13-02.port (TC 02 not running). > > > > If Server A crashes (TC 01) all new requests go to Server B (TC SB > > worker.ajp13-sb) > > since this is then the only running Tomcat FINE > > This is our Fail-Over Solution (lost running sessions, but OK). > > > > Now the webapp update Scenario: > > > > 1.) shutdown TC SB on Server B, update the webapp, start tc SB > and test via > > a seperate HTTPConnector port without Apache. > > 2.) this does not affect anything on production, since the > lbfactor=0 on TC > > SB > > -> no sessions arrive on tc SB > > 3.) When the test was successful, our Standby Tomcat SB is updated > > 4.) Now upgrade the webapp on Server A TC 02, which is currently not > > running. > > 5.) Start up TC 02 on Server A with the new version of the webapp, > > immediately exchange the worker.properties with a different version and > > gracefully restart apache: > > > > worker.list=lb-worker > > > > worker.ajp13-01.port=11009 > > worker.ajp13-01.host=A > > worker.ajp13-01.type=ajp13 > > worker.ajp13-01.lbfactor=1 > > worker.ajp13-01.local_worker=0 <---- put old webapp on TC 01 to the > > foreign worker list > > > > worker.ajp13-02.port=11019 > > worker.ajp13-02.host=A > > worker.ajp13-02.type=ajp13 > > worker.ajp13-02.lbfactor=1 > > worker.ajp13-02.local_worker=1 <---- put new webapp on TC > 02 in front of > > the local worker list > > > > worker.ajp13-sb.port=11015 > > worker.ajp13-sb.host=B > > worker.ajp13-sb.type=ajp13 > > worker.ajp13-sb.lbfactor=0 > > worker.ajp13-sb.local_worker=1 > > > > worker.lb-worker.type=lb > > worker.lb-worker.balanced_workers=ajp13-01, ajp13-02, ajp13-sb > > worker.lb-worker.local_worker_only=0 > > > > Just the two lines marked above with <---- swap > > (local_worker values of TC 01 and TC 02) > > > > 6.) now all 3 Tomcats are running. All existing sessions still > go to TC 01 > > (sticky sessions; we do not loose running sessions) > > 7.) What I expect: > > TC 02 takes a while to startup. > > The worker List order should now be: > > 1. worker.ajp13-02 lbfactor=1,local_worker=1 TC 02 > > 2. worker.ajp13-sb lbfactor=0,local_worker=1 TC SB > > 3. worker.ajp13-01 lbfactor=1,local_worker=0) TC 01 (old webapp) > > > > Since TC 02 needs 3 minutes to start up (filling caches etc.) it is not > > immediately availlable. > > During this time new sessions arrive at TC SB, since it is the > next in the > > worker list. OK fine this works. > > Since these sessions are sticky as well, all users connecting > during this > > time stay on TC SB > > during their whole session life. FINE > > > > 8.) As soon as TC 02 is up and running (finished all > load-on-startup servlet > > initialisition stuff) > > I would expect that TC 02 gets all new Sessions (Number 1 in the worker > > List). > > > > This is not the case! All new Sessions still arrive at TC SB. > > > > 9.) After a while (one hour) we shutdown TC 01. Since no new sessions > > arrived there since our > > graceful restart of Apache, all old Sessions should have expired. > > > > 10.) even now (only 2 Tomcats running TC 02 and TC SB) and even after a > > graceful restart new Sessions > > arrive at TC SB > > > > > > Conclusion: > > Now, do I misunderstand the supposed behaviour of lbfactor and > local_worker > > flag ? > > I think that the behaviour in 8.) is wrong. 10.) is starange too. > > > > Thanks for any suggestion if I am completely wrong here > > or further looking into this. > > > > Hans > > > > > > > >>-----Ursprungliche Nachricht----- > >>Von: Glenn Nielsen [mailto:[EMAIL PROTECTED] > >>Gesendet: Mittwoch, 9. Juli 2003 15:56 > >>An: Tomcat Developers List > >>Betreff: Re: jk 1.2.25 release ? > >> > >> > >>I was hoping to get it released this week. > >> > >>But I just noticed that under Apache 2 mod_jk piped logs there > >>are two instances of the piped log program running for the same > >>log file. I want to track this down. > >> > >>I also just implemented load balancing this morning on a production > >>server. I noticed that when none of the workers for the load balancer > >>were available an HTTP status code of 200 was being logged in mod_jk.log > >>when request logging was enabled. So I want to look into this also. > >> > >>Hopefully now that I have load balancing in place with 2 tomcat servers > >>instead of 1 the Missouri Lottery web site I administer will scale to > >>handle the big spike in load tonight for the $240 PowerBall jackpot. :-) > >> > >>Regards, > >> > >>Glenn > >> > >>Henri Gomez wrote: > >> > >>>Any date ? > >>> > >>> > >>>--------------------------------------------------------------------- > >>>To unsubscribe, e-mail: [EMAIL PROTECTED] > >>>For additional commands, e-mail: [EMAIL PROTECTED] > >>> > >> > >> > >> > >>--------------------------------------------------------------------- > >>To unsubscribe, e-mail: [EMAIL PROTECTED] > >>For additional commands, e-mail: [EMAIL PROTECTED] > >> > >> > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]