Sorry Glenn,

by looking deeper into the mod_jk.log when changing worker names, I
realized that I was actually restarting Apache with the same
worker.properties every time.

There was a link earlier in the configuration chain, which made my switching
useless :(

We should definitely reduce our linking!

Thanks very much.

P.S. If anybody else is interested in our LB/failover setup, I am glad to
provide some info.
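Since the pitfall here was a stale link in the configuration chain, a minimal sketch of switching the worker.properties symlink atomically and then verifying it before `apachectl graceful` might help others avoid the same trap. All paths and file names below are made up for illustration (a scratch directory stands in for the real Apache conf dir):

```python
import os
import tempfile

# Illustration only: a temp dir stands in for the Apache conf directory.
conf = tempfile.mkdtemp()
for name in ("worker-old.properties", "worker-new.properties"):
    with open(os.path.join(conf, name), "w") as f:
        f.write("worker.list=lb-worker\n")

link = os.path.join(conf, "worker.properties")
os.symlink(os.path.join(conf, "worker-old.properties"), link)

# The switch: create the new symlink under a temporary name, then rename it
# over the old one. rename() is atomic on POSIX, so there is no window in
# which worker.properties is missing.
tmp = link + ".tmp"
os.symlink(os.path.join(conf, "worker-new.properties"), tmp)
os.replace(tmp, link)

# Verify the link really points where we think it does -- the step that was
# missing in my setup -- before running 'apachectl graceful'.
print(os.readlink(link))
```

The verification step is the whole point: had I checked `readlink` before each graceful restart, the unchanged config would have been obvious immediately.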

Best regards,
Hans

> -----Original Message-----
> From: Hans Schmid [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, July 16, 2003 15:15
> To: Tomcat Developers List
> Subject: Re: jk 1.2.4 LB bug?
>
>
> Thanks for your reply; comments are inline.
>
> > -----Original Message-----
> > From: Glenn Nielsen [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, July 16, 2003 12:26
> > To: Tomcat Developers List
> > Subject: Re: jk 1.2.4 LB bug?
> >
> >
> > mod_jk will print out information about the lb config if you set
> > the JkLogLevel to debug.
> >
>
> done
>
> > I would suggest setting up a test system where you can test
> > the below with JkLogLevel debug configured.  Then grep the
> > log for lines which have jk_lb_worker.c in them.
> >
>
> OK
>
> > This will tell you several things.
> >
> > 1. Whether the worker.properties are getting reread when you
> >    do an apache restart.  (They should be)
> >
>
> Yes they were reread:
> Initial:
> [Wed Jul 16 14:11:14 2003]  [jk_worker.c (118)]: Into wc_close
> [Wed Jul 16 14:11:14 2003]  [jk_worker.c (199)]: close_workers got 6 workers to destroy
> [Wed Jul 16 14:11:14 2003]  [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance
> [Wed Jul 16 14:11:14 2003]  [jk_lb_worker.c (561)]: Into jk_worker_t::destroy
> [Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (1461)]: Into jk_worker_t::destroy
> [Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (1468)]: Into jk_worker_t::destroy up to 1 endpoint to close
> [Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (605)]: In jk_endpoint_t::ajp_close_endpoint
> [Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (612)]: In jk_endpoint_t::ajp_close_endpoint, closed sd = 12
> [Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (1461)]: Into jk_worker_t::destroy
> [Wed Jul 16 14:11:14 2003]  [jk_worker.c (118)]: Into wc_close
> [Wed Jul 16 14:11:14 2003]  [jk_worker.c (118)]: Into wc_close
> [Wed Jul 16 14:11:14 2003]  [jk_worker.c (118)]: Into wc_close
> [Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (1468)]: Into jk_worker_t::destroy up to 1 endpoint to close
> [Wed Jul 16 14:11:14 2003]  [jk_worker.c (199)]: close_workers got 6 workers to destroy
> [Wed Jul 16 14:11:14 2003]  [jk_worker.c (199)]: close_workers got 6 workers to destroy
> [Wed Jul 16 14:11:14 2003]  [jk_worker.c (199)]: close_workers got 6 workers to destroy
> [Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (1461)]: Into jk_worker_t::destroy
> [Wed Jul 16 14:11:14 2003]  [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance
> [Wed Jul 16 14:11:14 2003]  [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance
> [Wed Jul 16 14:11:14 2003]  [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance
> [Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (1468)]: Into jk_worker_t::destroy up to 1 endpoint to close
> [Wed Jul 16 14:11:14 2003]  [jk_lb_worker.c (561)]: Into jk_worker_t::destroy
> [Wed Jul 16 14:11:14 2003]  [jk_lb_worker.c (561)]: Into jk_worker_t::destroy
> [Wed Jul 16 14:11:14 2003]  [jk_lb_worker.c (561)]: Into jk_worker_t::destroy
>
> ... closing other, unrelated workers
>
> [Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (172)]: Into jk_uri_worker_map_t::uri_worker_map_alloc
> [Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (375)]: Into jk_uri_worker_map_t::uri_worker_map_open
> [Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (396)]: jk_uri_worker_map_t::uri_worker_map_open, rule map size is 12
> [Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (321)]: Into jk_uri_worker_map_t::uri_worker_map_open, match rule /einsurance/=lb-einsurance was added
> [Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (345)]: Into jk_uri_worker_map_t::uri_worker_map_open, exact rule /einsurance=lb-einsurance was added
>
> ... 5 other workers (including other lb-workers and normal workers) added
>
> [Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (408)]: Into jk_uri_worker_map_t::uri_worker_map_open, there are 12 rules
> [Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (422)]: jk_uri_worker_map_t::uri_worker_map_open, done
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (88)]: Into wc_open
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (222)]: Into build_worker_map, creating 6 workers
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (228)]: build_worker_map, creating worker lb-einsurance
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (148)]: Into wc_create_worker
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (162)]: wc_create_worker, about to create instance lb-einsurance of lb
> [Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (586)]: Into lb_worker_factory
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (171)]: wc_create_worker, about to validate and init lb-einsurance
> [Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (420)]: Into jk_worker_t::validate
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (148)]: Into wc_create_worker
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (162)]: wc_create_worker, about to create instance ajp13-01 of ajp13
> [Wed Jul 16 14:11:16 2003]  [jk_ajp13_worker.c (108)]: Into ajp13_worker_factory
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (171)]: wc_create_worker, about to validate and init ajp13-01
> [Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1343)]: Into jk_worker_t::validate
> [Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1364)]: In jk_worker_t::validate for worker ajp13-01 contact is tomcathost-ei:11009
> [Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1397)]: Into jk_worker_t::init
> [Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1421)]: In jk_worker_t::init, setting socket timeout to 0
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (187)]: wc_create_worker, done
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (148)]: Into wc_create_worker
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (162)]: wc_create_worker, about to create instance ajp13-02 of ajp13
> [Wed Jul 16 14:11:16 2003]  [jk_ajp13_worker.c (108)]: Into ajp13_worker_factory
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (171)]: wc_create_worker, about to validate and init ajp13-02
> [Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1343)]: Into jk_worker_t::validate
> [Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1364)]: In jk_worker_t::validate for worker ajp13-02 contact is tomcathost-ei:11019
> [Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1397)]: Into jk_worker_t::init
> [Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1421)]: In jk_worker_t::init, setting socket timeout to 0
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (187)]: wc_create_worker, done
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (148)]: Into wc_create_worker
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (162)]: wc_create_worker, about to create instance ajp13-sb of ajp13
> [Wed Jul 16 14:11:16 2003]  [jk_ajp13_worker.c (108)]: Into ajp13_worker_factory
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (171)]: wc_create_worker, about to validate and init ajp13-sb
> [Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1343)]: Into jk_worker_t::validate
> [Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1364)]: In jk_worker_t::validate for worker ajp13-sb contact is tomcathost-ei-sb:11015
> [Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1397)]: Into jk_worker_t::init
> [Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1421)]: In jk_worker_t::init, setting socket timeout to 0
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (187)]: wc_create_worker, done
> [Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 0 has name ajp13-01
> [Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 1 has name ajp13-sb
> [Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 2 has name ajp13-02
> [Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (502)]: in_local_worker_mode: true
> [Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (505)]: local_worker_only: false
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (187)]: wc_create_worker, done
> [Wed Jul 16 14:11:16 2003]  [jk_worker.c (238)]: build_worker_map, removing old lb-einsurance worker
>
> this last line looks suspicious to me
>
> > 2. What the lb worker thinks the config is.
> >
>
> initial:
> [Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (586)]: Into lb_worker_factory
> [Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (420)]: Into jk_worker_t::validate
> [Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (498)]: Balanced worker 0 has name ajp13-01
> [Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (498)]: Balanced worker 1 has name ajp13-sb
> [Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (498)]: Balanced worker 2 has name ajp13-02
> [Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (502)]: in_local_worker_mode: true
> [Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (505)]: local_worker_only: false
>
> But after the switch and the graceful restart, the output is exactly the
> same (which is the error)!
>
> [Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (420)]: Into jk_worker_t::validate
> [Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 0 has name ajp13-01
> [Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 1 has name ajp13-sb
> [Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 2 has name ajp13-02
> [Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (502)]: in_local_worker_mode: true
> [Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (505)]: local_worker_only: false
>
> This explains the observed (wrong) failover behavior; the order should be
> ajp13-02, ajp13-sb, ajp13-01.
>
>
> original workers.properties:
> worker.ajp13-01.lbfactor=1
> worker.ajp13-01.local_worker=1
>
> worker.ajp13-02.lbfactor=1
> worker.ajp13-02.local_worker=0
>
> worker.ajp13-sb.lbfactor=0
> worker.ajp13-sb.local_worker=1
>
> local_worker_only=0 for the lb-worker
>
> Changed to the following before the graceful restart (by linking a
> different worker.properties):
>
> worker.ajp13-01.lbfactor=1
> worker.ajp13-01.local_worker=0
>
> worker.ajp13-02.lbfactor=1
> worker.ajp13-02.local_worker=1
>
> worker.ajp13-sb.lbfactor=0
> worker.ajp13-sb.local_worker=1
>
> local_worker_only=0 for the lb-worker
>
>
>
> So it *seems* there might be something wrong with the reinitialisation of
> the worker order?
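For reference, the ordering I expect after the swap can be sketched like this. This is only my reading of the local_worker semantics expressed in Python, not the actual jk_lb_worker.c logic:

```python
# Sketch of the expected worker ordering (my reading of the local_worker
# semantics, NOT the actual jk_lb_worker.c code): workers flagged
# local_worker=1 keep their config order and come first; foreign
# (local_worker=0) workers follow.
def expected_order(workers):
    local = [name for name, local_flag in workers if local_flag]
    foreign = [name for name, local_flag in workers if not local_flag]
    return local + foreign

# The swapped worker.properties from above, as (name, local_worker) pairs
# in balanced_workers order:
swapped = [("ajp13-01", 0), ("ajp13-02", 1), ("ajp13-sb", 1)]
print(expected_order(swapped))  # ['ajp13-02', 'ajp13-sb', 'ajp13-01']
```

The log above instead still shows the pre-swap order (ajp13-01, ajp13-sb, ajp13-02), which is what made the restart look broken.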
>
>
> If you need further information, I can mail you the complete logs offline.
>
> Thanks for looking into this,
> Hans
>
>
>
> > Then post those log lines here.
> >
> > Thanks,
> >
> > Glenn
> >
> > Hans Schmid wrote:
> > > Hi,
> > >
> > > I noticed the following with mod_jk 1.2.4, Apache 1.3.26 and
> > > Tomcat 3.3.1a on Solaris 8 JDK 1.4.1_03.
> > >
> > > Maybe an LB bug (load factors do not recover after startup of a new
> > > Tomcat / graceful Apache restart).
> > >
> > > Let me explain my scenario first:
> > >
> > > I want to gracefully upgrade our webapp without losing sessions and
> > > have a failover scenario.
> > > Therefore we have sticky sessions enabled.
> > >
> > > Setup:
> > > 1 tomcat 01 running on Server A,
> > > 0 tomcat 02 running on Server A,
> > > 1 tomcat SB running on Server B
> > >
> > > 01 tomcat on Server A runs the application, SB tomcat on Server B is
> > > standby (fallback),
> > > 02 tomcat on Server A is shut down at the moment.
> > >
> > > All three Tomcats are in the same lb_worker:
> > >
> > >
> > > worker.list=lb-worker
> > >
> > > worker.ajp13-01.port=11009
> > > worker.ajp13-01.host=A
> > > worker.ajp13-01.type=ajp13
> > > worker.ajp13-01.lbfactor=1
> > > worker.ajp13-01.local_worker=1
> > >
> > > worker.ajp13-02.port=11019
> > > worker.ajp13-02.host=A
> > > worker.ajp13-02.type=ajp13
> > > worker.ajp13-02.lbfactor=1
> > > worker.ajp13-02.local_worker=0
> > >
> > > worker.ajp13-sb.port=11015
> > > worker.ajp13-sb.host=B
> > > worker.ajp13-sb.type=ajp13
> > > worker.ajp13-sb.lbfactor=0
> > > worker.ajp13-sb.local_worker=1
> > >
> > > worker.lb-worker.type=lb
> > > worker.lb-worker.balanced_workers=ajp13-01, ajp13-02, ajp13-sb
> > > worker.lb-worker.local_worker_only=0
> > >
> > >
> > > The worker List order should now be:
> > >  1. worker.ajp13-01 lbfactor=1,local_worker=1  TC 01
> > >  2. worker.ajp13-sb lbfactor=0,local_worker=1  TC SB
> > >  3. worker.ajp13-02 lbfactor=1,local_worker=0  TC 02  (not running)
> > >
> > > Now all requests go to worker.ajp13-01 (TC 01), none to worker.ajp13-sb
> > > (TC SB, lbfactor=0), and none to worker.ajp13-02 (TC 02, not running).
> > >
> > > If Server A (TC 01) crashes, all new requests go to Server B (TC SB,
> > > worker.ajp13-sb), since it is then the only running Tomcat. FINE.
> > > This is our failover solution (running sessions are lost, but OK).
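The routing rule this setup relies on, as I understand it, can be sketched like this (an illustration of the behaviour described above, not the mod_jk source):

```python
def pick_worker(workers, up):
    """Pick the target for a request with no session, per the behaviour
    described above (illustration, NOT mod_jk source): prefer an available
    worker with lbfactor > 0; fall back to an available lbfactor=0 standby
    only when no weighted worker is up."""
    # First pass: any available worker that actually takes load.
    for name, lbfactor in workers:
        if name in up and lbfactor > 0:
            return name
    # Second pass: standby workers (lbfactor=0) as a last resort.
    for name, lbfactor in workers:
        if name in up:
            return name
    return None

# (name, lbfactor) pairs matching the setup above.
workers = [("ajp13-01", 1), ("ajp13-sb", 0), ("ajp13-02", 1)]
print(pick_worker(workers, up={"ajp13-01", "ajp13-sb"}))  # ajp13-01
print(pick_worker(workers, up={"ajp13-sb"}))              # ajp13-sb
```

With TC 01 up, everything lands on ajp13-01; only when Server A is down does the lbfactor=0 standby ajp13-sb receive new sessions, which is exactly the failover described.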
> > >
> > > Now the webapp update Scenario:
> > >
> > > 1.) Shut down TC SB on Server B, update the webapp, start TC SB and
> > > test via a separate HTTPConnector port without Apache.
> > > 2.) This does not affect anything in production, since lbfactor=0 on
> > > TC SB
> > > -> no sessions arrive on TC SB.
> > > 3.) When the test is successful, our standby Tomcat SB is updated.
> > > 4.) Now upgrade the webapp on Server A TC 02, which is currently not
> > > running.
> > > 5.) Start up TC 02 on Server A with the new version of the webapp,
> > > immediately exchange the worker.properties with a different version,
> > > and gracefully restart Apache:
> > >
> > > worker.list=lb-worker
> > >
> > > worker.ajp13-01.port=11009
> > > worker.ajp13-01.host=A
> > > worker.ajp13-01.type=ajp13
> > > worker.ajp13-01.lbfactor=1
> > > worker.ajp13-01.local_worker=0     <---- put old webapp on TC 01 into
> > > the foreign worker list
> > >
> > > worker.ajp13-02.port=11019
> > > worker.ajp13-02.host=A
> > > worker.ajp13-02.type=ajp13
> > > worker.ajp13-02.lbfactor=1
> > > worker.ajp13-02.local_worker=1     <---- put new webapp on TC 02 in
> > > front of the local worker list
> > >
> > > worker.ajp13-sb.port=11015
> > > worker.ajp13-sb.host=B
> > > worker.ajp13-sb.type=ajp13
> > > worker.ajp13-sb.lbfactor=0
> > > worker.ajp13-sb.local_worker=1
> > >
> > > worker.lb-worker.type=lb
> > > worker.lb-worker.balanced_workers=ajp13-01, ajp13-02, ajp13-sb
> > > worker.lb-worker.local_worker_only=0
> > >
> > > Only the two lines marked above with <---- are swapped
> > > (the local_worker values of TC 01 and TC 02).
> > >
> > > 6.) Now all 3 Tomcats are running. All existing sessions still go to
> > > TC 01 (sticky sessions; we do not lose running sessions).
> > > 7.) What I expect:
> > > TC 02 takes a while to start up.
> > > The worker list order should now be:
> > >  1. worker.ajp13-02 lbfactor=1,local_worker=1  TC 02
> > >  2. worker.ajp13-sb lbfactor=0,local_worker=1  TC SB
> > >  3. worker.ajp13-01 lbfactor=1,local_worker=0  TC 01  (old webapp)
> > >
> > > Since TC 02 needs 3 minutes to start up (filling caches etc.), it is
> > > not immediately available.
> > > During this time new sessions arrive at TC SB, since it is the next in
> > > the worker list. OK, fine, this works.
> > > Since these sessions are sticky as well, all users connecting during
> > > this time stay on TC SB for their whole session life. FINE.
> > >
> > > 8.) As soon as TC 02 is up and running (finished all load-on-startup
> > > servlet initialisation stuff), I would expect TC 02 to get all new
> > > sessions (number 1 in the worker list).
> > >
> > > This is not the case! All new sessions still arrive at TC SB.
> > >
> > > 9.) After a while (one hour) we shut down TC 01. Since no new sessions
> > > arrived there after our graceful restart of Apache, all old sessions
> > > should have expired.
> > >
> > > 10.) Even now (with only 2 Tomcats running, TC 02 and TC SB), and even
> > > after another graceful restart, new sessions arrive at TC SB.
> > >
> > >
> > > Conclusion:
> > > Do I misunderstand the supposed behaviour of the lbfactor and
> > > local_worker flags?
> > > I think the behaviour in 8.) is wrong. 10.) is strange too.
> > >
> > > Thanks for any suggestions if I am completely wrong here,
> > > or for looking further into this.
> > >
> > > Hans
> > >
> > >
> > >
> > >>-----Original Message-----
> > >>From: Glenn Nielsen [mailto:[EMAIL PROTECTED]
> > >>Sent: Wednesday, July 9, 2003 15:56
> > >>To: Tomcat Developers List
> > >>Subject: Re: jk 1.2.25 release ?
> > >>
> > >>
> > >>I was hoping to get it released this week.
> > >>
> > >>But I just noticed that with Apache 2 mod_jk piped logs, there
> > >>are two instances of the piped log program running for the same
> > >>log file.  I want to track this down.
> > >>
> > >>I also just implemented load balancing this morning on a production
> > >>server.  I noticed that when none of the workers for the load balancer
> > >>were available, an HTTP status code of 200 was being logged in
> > >>mod_jk.log when request logging was enabled.  So I want to look into
> > >>this also.
> > >>
> > >>Hopefully, now that I have load balancing in place with 2 Tomcat
> > >>servers instead of 1, the Missouri Lottery web site I administer will
> > >>scale to handle the big spike in load tonight for the $240 PowerBall
> > >>jackpot. :-)
> > >>
> > >>Regards,
> > >>
> > >>Glenn
> > >>
> > >>Henri Gomez wrote:
> > >>
> > >>>Any date ?
> > >>>
> > >>>
> > >>>---------------------------------------------------------------------
> > >>>To unsubscribe, e-mail: [EMAIL PROTECTED]
> > >>>For additional commands, e-mail: [EMAIL PROTECTED]
> > >>>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >
> > >
> > >
> > >
> >
> >
> >
> >
> >
>
>
>
>


