All, apologies, this is unrelated. How do I unsubscribe from this mailing list? I thought it would be useful and small, but it's overwhelming my inbox.
Thanks in advance.

Luke Walshe
BT Operate, HGIPCC Technical Specialist
Telephone: +44 (0)1314483482, Email: [EMAIL PROTECTED]

-----Original Message-----
From: Ahmed Musa [mailto:[EMAIL PROTECTED]
Sent: 21 February 2008 09:25
To: Tomcat Users List
Subject: Re: mod_jk Problems - - worker went to error state and dont recover

Hello Rainer,

Thanks for your information - the situation is getting clearer now.
I will read some of the docs again, following your links, and will run
further tests, also with the improved logging.

Thanks a lot for your time
with best regards
ahmed

-------- Original Message --------
> Date: Wed, 20 Feb 2008 18:59:01 +0100
> From: Rainer Jung <[EMAIL PROTECTED]>
> To: Tomcat Users List <users@tomcat.apache.org>
> Subject: Re: mod_jk Problems - - worker went to error state and dont recover

> Ahmed Musa wrote:
> > Hello,
> > Wow - thank you very much Rainer for your very quick and informative
> > answer.
> > I will go to 1.2.26 and think about some "smoother" values for
> > reply_timeout and max_reply_timeouts.
> > I will search for the requests which cause the problems - because I
> > still log the response time in the way you mentioned - but I am not
> > sure that the user requests are responsible for the situation.
>
> One note: for Apache httpd 2.x %D is microseconds (there is no format
> for milliseconds), for Tomcat %D is milliseconds. As long as you are
> searching for the root cause, it might make sense to have both access
> logs active to check for duration differences.
>
> > So one further question - does mod_jk itself check whether the
> > backend is reachable - without user requests?
>
> No. Everything only works on top of user requests.
>
> > When there are connections to the backend - are they closed after
> > the response or are they held open for further requests?
>
> In general held open.
> There are parameters on how long they are held
> open without more requests before they get shut down, and also how many
> might be kept open even when no requests are coming in. Those are the
> connection pool parameters, which you will find on
>
> http://tomcat.apache.org/connectors-doc/reference/workers.html
>
> Tomcat also has a connectionTimeout on the connector, which will shut
> down a connection from the Tomcat side if it is idle for too long.
>
> If you don't want to reuse connections at all, there's also a setting (a
> JkOption in Apache).
>
> > Is it possible that the Checkpoint firewall in between can be
> > responsible for the connectivity problem?
>
> It can cut a connection that's idle for too long. Since you have
> cping/cpong active via connect_timeout and prepost_timeout, you should
> get a cping error message if the connection was dropped by the firewall
> during idle times and mod_jk tries to use it again. The reply timeout in
> the error log indicates that the backend isn't answering. Of course if
> it takes *very* long to answer, it might be that the firewall dropped
> the connection in between, but then the root cause would still be the
> long response time of the backend.
>
> > Another point is the "not recovering" of the worker. Yes, you are
> > right - in this situation I have many reply_timeouts - but these
> > happen within a period of time - for example 30 minutes - but the
> > worker is still dead even when there are no more reply_timeouts.
> > It remains dead.
> > It was necessary to restart it manually via jkstatus.
>
> I assume you are using stickiness, so when a session started on a node,
> it will stay there. So when a worker is in error for a long time, all
> new sessions will start on other nodes. If the worker is ready for
> recovery, it needs a request that doesn't carry a session to get probed
> with this request.
>
> In jkstatus, the status of an error worker should switch to REC, when
> mod_jk decides that it could send a non-sticky request there (to probe)
> and to PRB, during the time this request is on the node, and finally
> either to OK or back to ERR depending on the result of the request.
>
> You can log the number of errors (and accesses) that happened on the
> node in the httpd access log. If you think that the node simply stays in
> error for a long time, then the error count (and access count) should
> stay constant. I would expect that they do not.
>
> Have a look at how LogFormat in Apache httpd works, and then add some of
> those documented in
>
> http://tomcat.apache.org/connectors-doc/reference/apache.html
>
> like:
>
> JK_LB_LAST_NAME
> JK_LB_LAST_ACCESSED
> JK_LB_LAST_ERRORS
> JK_LB_LAST_BUSY
> JK_LB_LAST_STATE
>
> using the syntax %{JK_LB_LAST_STATE}n etc.
>
> > Another point is the learning - I read the docs - the info on the
> > Apache website; I don't find other ones - are there other ones? - and
> > they do not go into depth - if you read the spec and watch the logs
> > it is - for me - very hard to match things up. Also the many
> > possibilities that mod_jk has to check whether there is a connection
> > to the backend,... - I understand them, but checking the reality in
> > an error situation is very hard. By matching I mean "Which part of
> > the communication sequence failed - why - and which error message
> > does it cause".
> > But I will try - and study the mailing list as well.
>
> It's hard for us too (sometimes).
>
> > Thank you for your time - tomorrow we will have the new version and
> > will see what happens.
> >
> > best
> > ahmed
>
> Regards,
>
> Rainer
>
> > -------- Original Message --------
> >> Date: Wed, 20 Feb 2008 15:56:42 +0100
> >> From: Rainer Jung <[EMAIL PROTECTED]>
> >> To: Tomcat Users List <users@tomcat.apache.org>
> >> Subject: Re: mod_jk Problems - - worker went to error state and dont recover
> >
> >> [EMAIL PROTECTED] wrote:
> >>> See Thread at: http://www.techienuggets.com/Detail?tx=25608 Posted on behalf of a User
> >>> Hello to all, after long unsuccessful research I hope someone can
> >>> give me a hint on the following problems.
> >>>
> >>> Our Apache-mod_jk-Tomcat infrastructure was running without problems
> >>> for about one year - then, for the past two months, mod_jk errors
> >>> have occurred. We upgraded the mod_jk version, made improvements in
> >>> the worker.properties - the problems changed and became fewer, but
> >>> sometimes they still appear.
> >>>
> >>> It seems that the mod_jk workers lose the connection to their
> >>> Tomcat backend server - there are messages in the mod_jk log files
> >>> which point in this direction. Normally this does not seem to be a
> >>> big problem - but under certain conditions (which?) the worker goes
> >>> to an error state and cannot recover by itself - it must be done
> >>> manually.
> >>>
> >>> Problem 1: The Tomcats are reachable - unknown why the workers think the server is dead?
> >>> Problem 2: I have no idea why the worker goes to an error state and cannot recover.
> >>
> >> 2 is a consequence of 1
> >>
> >>> Problem 3: I miss explanations of the logged messages - I read the
> >>> messages - but cannot match them to the situation - when does a
> >>> worker post these messages?
> >>
> >> 1 is a consequence of these messages
> >>
> >>> [Wed Feb 20 10:04:01.889 2008] [19237:3086010048] [info] jk_handler::mod_jk.c (2270): Aborting connection for worker=ajp_ggi
> >>> [Wed Feb 20 10:04:39.799 2008] [19294:3086010048] [error] ajp_get_reply::jk_ajp_common.c (1623): (INETP1011) Timeout with waiting reply from tomcat. Tomcat is down, stopped or network problems (errno=110)
> >>> [Wed Feb 20 10:04:39.799 2008] [19294:3086010048] [error] ajp_service::jk_ajp_common.c (2034): (INETP1011) receiving reply from tomcat failed with out recovery in send loop attempt=0
> >>> [Wed Feb 20 10:04:41.799 2008] [19294:3086010048] [error] service::jk_lb_worker.c (1105): unrecoverable error 504, request failed. Tomcat failed in the middle of request, we can't recover to another instance.
> >>
> >> The second line tells us that your configured reply_timeout fired.
> >> You set it to 120000 (2 minutes), so there are requests taking longer
> >> than 2 minutes on the backend, before the first response packet comes
> >> back from the backend.
> >>
> >> With your configuration mod_jk then doesn't wait any longer on the
> >> reply *and puts the backend into error mode*.
> >>
> >> Up until version 1.2.25, if you use a reply_timeout, you need to set it
> >> to a high number which justifies the reasoning "if it takes that long,
> >> then something is wrong with the backend".
> >>
> >> Reality shows: there is no such number. Often there are a few requests
> >> that take unacceptably long on the backend *although* the backend is
> >> still working.
> >>
> >> So in 1.2.25 we added max_reply_timeouts.
> >> With this set in addition to
> >> reply_timeout, mod_jk will abort waiting for a reply after
> >> reply_timeout, but allow some timeouts before actually deciding to put
> >> the backend into error.
> >>
> >> Unfortunately the implementation of max_reply_timeouts in 1.2.25 was
> >> wrong, so you need to go to 1.2.26 to get it working right.
> >>
> >> See:
> >>
> >> http://issues.apache.org/bugzilla/show_bug.cgi?id=43229
> >>
> >> Caution: this does *not* explain why the backends are not automatically
> >> recovered after a minute of error condition. Maybe you have times where
> >> you get too many of those reply_timeouts (see log file), and although
> >> we recover after a minute the backend almost immediately goes back into
> >> error status.
> >>
> >>> -> Which timeout - how does mod_jk think Tomcat is down? Where can I
> >>> find details on errno=110?...
> >>
> >> reply_timeout, see above and also
> >>
> >> http://tomcat.apache.org/connectors-doc/generic_howto/timeouts.html
> >>
> >> errno: a standard unix feature. The numbers are platform dependent. I
> >> would assume in your case
> >>
> >> ETIMEDOUT 110 /* Connection timed out */
> >>
> >> so no wonder, that's exactly what we expect (and doesn't tell us the
> >> reason, i.e. what's wrong on the *backend* taking that long for a
> >> response).
> >>
> >>> -> receiving reply from tomcat failed with out recovery in send loop
> >>> attempt=0 - ? with out recovery in send loop - means?
> >>
> >> That your configuration doesn't allow us to send the request to another
> >> backend. recovery_options 7 includes: if mod_jk was able to send the
> >> request to a backend, do not try to send it to another backend in case
> >> of an error during the response handling. Even if you allowed
> >> sending to another backend, it would not help with *not* putting the
> >> worker into error state.
> >> More likely, you would put all
> >> workers into error state, because all of them might run into the same
> >> timeout, one after the other.
> >>
> >>> -> unrecoverable error 504 - details on this error?
> >> That's simply how we return the situation back to the client (browser).
> >>
> >>> Ok - I turned the logging level to debug - the course of events gets
> >>> clearer - but more questions also appear - there are socket numbers -
> >>> which sockets - what are these numbers, e.g. "will be shutting down
> >>> socket 35 for worker INETP1021" - what are the sockets good for? -
> >>> how many are there per worker? Can I configure them?
> >> Should not be the problem here. For Apache httpd, if you do *not*
> >> configure anything, we automatically choose the number of httpd threads
> >> as the maximum number of connections. No need to change anything here.
> >>> => Generally - how can I solve such problems - I tried to look into
> >>> the mod_jk code - searching for error codes, error messages - but
> >>> cannot find any relevant information - I am studying the log files -
> >>> but can't find out what really happens.
> >> Post to the list. Improve our docs.
> >>
> >> The error message contains the words "timeout" and "reply" and you
> >> have a "reply_timeout".
> >>
> >> Long running requests are a frequent problem. If you want to get rid of
> >> them, start by adding response times to your httpd and your tomcat
> >> access log format (%D). Then have a look at which URLs are producing
> >> long running requests, during what time of day they are happening etc.
> >> This might give you a clue about the reasons.
> >>
> >> And if they are very frequent: do Java thread dumps of your backends
> >> and analyze them.
> >>
> >>> So - maybe someone has an idea why the worker thinks that the
> >>> corresponding Tomcat is dead, and why it will not recover by itself. !
> >> Tomcat is dead: from the point of view of mod_jk it simply means: we
> >> didn't get an answer when we expected one. Details depend on the
> >> additional log lines (could not connect, reply timeout etc.).
> >>
> >>> And I am also searching for tips on how I can help myself - and where
> >>> to find something about the error codes, messages,.. in mod_jk
> >>>
> >>> thanks for your attention
> >>> Best
> >>> ahmed musa (writing from vienna)
> >>>
> >> Regards,
> >>
> >> Rainer
> >>
> >>> Current infrastructure
> >>> We have 3 Apache web servers (2.2.6) - based on CentOS release 4.3 / kernel version 2.6.9-34
> >>> In front of the web servers there are two (two locations) hardware
> >>> load balancers (but they have no role in this story)
> >>> The web servers are hosted at our ISP.
> >>>
> >>> The web servers balance the requests via mod_jk (version 1.2.25) for
> >>> approx. 10 webapps to 18 backend Tomcat servers (blade servers -
> >>> because of underlying application parts the OS is Windows 2003 Server
> >>> - a long story not worth explaining :-) ). The Tomcat servers get data
> >>> via requests against DB2 servers/DB2 databases on the mainframe. The
> >>> Tomcat servers are in-house - and were rebooted nightly because of
> >>> automated deployment processes.
> >>>
> >>> Between the web servers and the Tomcat servers is a Checkpoint firewall.
> >>> All webapps are deployed on all Tomcats - only mod_jk manages the
> >>> requests to certain Tomcat instances.
> >>> (on one blade server there are two identical Tomcat instances
> >>> running).
> >>>
> >>> Versions: Tomcat - 5.5.17_11, JDK 1.5.0_11-b03. The requests against
> >>> the public website(s) are normal short-lived requests - not many.
> >>> Most webapps (portals) need a login, have a strong focus on business
> >>> logic - so the instances are big (many MBs in RAM), the sessions are
> >>> sticky and the session timeout is 20 minutes. But there are also fewer
> >>> requests.
> >>> In addition to the user requests there are monitoring requests from
> >>> our ISP.
> >>> The problems appear on servers/portals which have very few user requests.
> >>>
> >>> worker.properties
> >>> worker.list=ajp_bam,ajp_ggi,ajp_ad,ajp_svp,.......,jkstatus
> >>>
> >>> worker.template.type=ajp13
> >>> worker.template.lbfactor=5
> >>> worker.template.socket_keepalive=1
> >>> worker.template.connect_timeout=7000
> >>> worker.template.prepost_timeout=5000
> >>> worker.template.reply_timeout=120000
> >>> worker.template.retries=6
> >>> worker.template.activation=Active
> >>> worker.template.recovery_options=7
> >>>
> >>> worker.lbtemplate.type=lb
> >>> worker.lbtemplate.max_reply_timeouts=6
> >>> worker.lbtemplate.method=Session
> >>>
> >>> #Produktions Worker
> >>> # AS-INETP101 - 106 - 6/6 GGI
> >>> worker.INETP1011.host=AS-INETP101.AEAT.ALLIANZ.AT
> >>> worker.INETP1011.port=65001
> >>> worker.INETP1011.reference=worker.template
> >>>
> >>> ....many more of the same
> >>>
> >>> then
> >>>
> >>> worker.ajp_ad.reference=worker.lbtemplate
> >>> worker.ajp_ad.balance_workers=INETP1032,INETP1062
> >>>
> >>> .... many more portals
> >>>
> >>> and finally jkstatus
> >>>
> >>> The JkMount is very simple
> >>> JkMount /* ajp_ad --- for the other portals mostly the same
> >>>
> >>> The portals are virtual hosts on the Apache.
> >>>
> >>> Tomcat - server.xml
> >>> example
> >>> <Connector port="65001" maxThreads="300" protocol="AJP/1.3" />
> >>> <Engine name="Catalina" jvmRoute="INETP5021" defaultHost="default">
> >>> ......
> >>> <Host name="slfinsol.com" appBase="webapps" unpackWARs="true"
> >>> autoDeploy="false" deployOnStartup="false" xmlValidation="false"
> >>> xmlNamespaceAware="false">
> >>> <Alias>www.slfinsol.com</Alias>
> >>> <Alias>web1.slfinsol.com</Alias>
> >>> ...
> >>> <Alias>testweb.slfinsol.com</Alias>
> >>> .....
> >>> <Valve className="org.apache.catalina.valves.AccessLogValve"
> >>> directory="logs" prefix="swl_access_log."
> >>> suffix=".txt" pattern="common"
> >>> resolveHosts="false" />
> >>> <Valve className="at.allianz.tomcat.valve.RequestTimeValve"/>
> >>> <Valve className="at.allianz.tomcat.valve.WebcollaborationWorkaroundValve"/>
> >>> <Context path="" docBase="swl" />
> >>> <Context path="/monitor5" docBase="monitor" />
> >>> <Context path="/swl" docBase="swl" />
> >>> </Host>
>
> ---------------------------------------------------------------------
> To start a new topic, e-mail: users@tomcat.apache.org
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
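
The advice in the thread to add response times (%D) to the httpd access log and then look at which URLs produce long-running requests could be sketched roughly as below. This is only an illustration under stated assumptions, not something posted in the thread: it assumes an httpd LogFormat equal to the common format with %D appended as the last field (microseconds in httpd), and the function name slow_requests, the 10-second threshold and the sample log lines are all made up for the example.

```python
import re
from collections import defaultdict

# Assumed LogFormat: "%h %l %u %t \"%r\" %>s %b %D" (common format plus %D).
# In Apache httpd, %D is the request duration in microseconds.
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ (?P<micros>\d+)$'
)


def slow_requests(lines, threshold_seconds=10.0):
    """Return {url: [duration_seconds, ...]} for requests at or above the threshold."""
    slow = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed format
        seconds = int(match.group("micros")) / 1_000_000
        if seconds >= threshold_seconds:
            # Drop the query string so variants of one URL are grouped together.
            url = match.group("url").split("?", 1)[0]
            slow[url].append(seconds)
    return dict(slow)


# Invented sample lines, only to show the assumed log layout.
SAMPLE = [
    '10.0.0.1 - - [20/Feb/2008:10:02:39 +0100] "GET /swl/report?id=1 HTTP/1.1" 504 512 122000000',
    '10.0.0.2 - - [20/Feb/2008:10:03:00 +0100] "GET /swl/index.jsp HTTP/1.1" 200 2048 35000',
    '10.0.0.1 - - [20/Feb/2008:10:05:10 +0100] "GET /swl/report?id=2 HTTP/1.1" 200 4096 15500000',
]

if __name__ == "__main__":
    for url, durations in sorted(slow_requests(SAMPLE).items()):
        print(f"{url}: {len(durations)} slow, max {max(durations):.1f}s")
```

For a Tomcat access log the same idea would apply, but %D there is in milliseconds, so the divisor would have to change accordingly.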