Thank you for the help.  You spend a great deal of time helping folks on here
(as evidenced by the absurd number of times you have answered questions like
this :)), and it is appreciated.

I can't believe I missed the millisecond thing. :)
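
For anyone who runs into the same thing, here is a minimal sketch (plain Python, not part of mod_jk) of a check that would have caught it. It assumes, going by this thread, that socket_connect_timeout and reply_timeout are given in milliseconds. The function name and the 1000 ms threshold are my own choices for illustration.

```python
# Directives this thread treats as milliseconds: socket_connect_timeout
# (per Rainer's note) and reply_timeout (300000 is described as 5 minutes).
MILLISECOND_DIRECTIVES = {"socket_connect_timeout", "reply_timeout"}


def find_suspicious_timeouts(properties_text, threshold_ms=1000):
    """Return (key, value) pairs whose millisecond value looks like seconds."""
    findings = []
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        directive = key.rsplit(".", 1)[-1]
        if directive not in MILLISECOND_DIRECTIVES:
            continue
        try:
            ms = int(value.strip())
        except ValueError:
            continue
        if 0 < ms < threshold_ms:
            findings.append((key, ms))
    return findings


sample = """\
worker.template.reply_timeout=300000
#worker.template.socket_timeout=10
worker.template.socket_connect_timeout=10
"""

suspicious = find_suspicious_timeouts(sample)
for key, ms in suspicious:
    print(f"{key}={ms} is only {ms} ms; did you mean {ms * 1000}?")
```

The commented-out socket_timeout line is skipped on purpose, since the check only looks at active settings.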

I'll check the log and the dump for the individual problems and post to this
thread with the results (or more questions).
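
To work through the log, I am thinking of something along the lines of the sketch below, which counts [error] entries per worker. It assumes each entry sits on a single line in the real mod_jk log (the excerpts in this thread are wrapped by the mail client) and that the worker name is the first parenthesised token of the message, as in those excerpts. The regular expressions and function name are my own.

```python
import re
from collections import Counter

# Layout taken from the excerpts in this thread:
# [timestamp] [pid:tid] [level] function::file (line): message
ENTRY = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[\d+:\d+\] \[(?P<level>\w+)\] "
    r"\S+::\S+ \(\d+\): (?P<msg>.*)$"
)
# Worker name, e.g. "(tomcat01-instance2) ", at the start of the message.
WORKER = re.compile(r"^\((?P<worker>[^)]+)\) ")


def count_by_worker(log_lines, level="error"):
    """Count log entries of the given level, grouped by worker name."""
    counts = Counter()
    for line in log_lines:
        entry = ENTRY.match(line.strip())
        if entry is None or entry.group("level") != level:
            continue
        worker = WORKER.match(entry.group("msg"))
        if worker is not None:
            counts[worker.group("worker")] += 1
    return counts


# Entries from this thread, re-joined onto single lines.
log_lines = [
    "[Mon May 24 07:19:21 2010] [27432:4045374208] [error] "
    "ajp_get_reply::jk_ajp_common.c (1926): (tomcat01-instance2) Timeout with "
    "waiting reply from tomcat. Tomcat is down, stopped or network problems "
    "(errno=110)",
    "[Mon May 24 07:19:23 2010] [27432:4045374208] [info] "
    "ajp_service::jk_ajp_common.c (2447): (tomcat01-instance2) sending request "
    "to tomcat failed (recoverable), because of reply timeout (attempt=1)",
    "[Mon May 24 07:24:21 2010] [27432:4045374208] [error] "
    "ajp_get_reply::jk_ajp_common.c (1926): (tomcat01-instance2) Timeout with "
    "waiting reply from tomcat. Tomcat is down, stopped or network problems "
    "(errno=110)",
]

error_counts = count_by_worker(log_lines)
for worker, count in error_counts.most_common():
    print(worker, count)
```

Passing level="info" instead would tally the "recoverable" retry messages.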

LES


Rainer Jung-3 wrote:
> 
> On 24.05.2010 23:36, LES wrote:
>>
>> I am having some trouble keeping a mod_jk setup stable.  At this point, I
>> feel like I am too far into trial and error mode and would like some help
>> figuring out how to identify the problem.
>>
>> My current setup involves two Linux (RHEL 5) servers, each running two
>> Tomcat instances (6.0.20).  A third RHEL 5 box is running Apache (2.2.3)
>> with mod_jk (1.2.28).  I am using Terracotta to "cluster" the Tomcat
>> sessions.
>>
>> The problem that I am having is that under small load (and unfortunately,
>> intermittently), I get random nodes that produce errors.  Typically these
>> errors indicate that mod_jk can no longer contact tomcat (see excerpts
>> below).  In most cases, the user request just hangs (never returns).
>> So, it also appears that the errors are not causing a session failover --
>> though I need to confirm that again after my recent round of changes.  In
>> most cases, these nodes that are in error recover on their own.  However,
>> during the failure event, I get a bunch of unhappy users.  I am hoping to
>> find a way to make the nodes more stable and then address the fail-over
>> aspect.
>>
>> I have tried different mod_jk parameters and think I have settled on a
>> decent set of them.  I have all of the garbage collection information
>> logging out and do not seem to have any GC events that take longer than
>> the request timeout.  I am gathering JVM and OS stats and do not see a
>> hardware constraint (memory, CPU, I/O).  So, I am at a bit of a loss on
>> where to look.
>>
>> I am pasting in all of the relevant files/excerpts that I can think of.
>> I appreciate any advice on what additional data to gather to shed light on
>> this problem (outright solutions are welcome too :)).
>>
>> Please let me know if there is any other information that would be
>> helpful.
>>
>> Thanx,
>> LES
>>
>>
>> ************* workers.properties **************
>> # Define 1 real worker using ajp13
>> worker.list=lb,jkstatus,cas
>> # Set properties for worker1 (ajp13)
>> worker.template.type=ajp13
>> worker.template.retries=4
>> worker.template.lbfactor=1
>> worker.template.reply_timeout=300000
>> worker.template.max_reply_timeouts=4
>> worker.template.connection_pool_timeout=60
>> worker.template.ping_mode=A
>> #worker.template.socket_timeout=10
> 
> This is in milliseconds, I guess you want 10000:
> 
>> worker.template.socket_connect_timeout=10
>>
>> worker.tomcat01-instance1.reference=worker.template
>> worker.tomcat01-instance1.host=tomcat01.barnhardt.local
>> worker.tomcat01-instance1.port=8009
>>
>> worker.tomcat01-instance2.reference=worker.template
>> worker.tomcat01-instance2.host=tomcat01.barnhardt.local
>> worker.tomcat01-instance2.port=18009
>>
>> worker.tomcat02-instance1.reference=worker.template
>> worker.tomcat02-instance1.host=tomcat02.barnhardt.local
>> worker.tomcat02-instance1.port=8009
>>
>> worker.tomcat02-instance2.reference=worker.template
>> worker.tomcat02-instance2.host=tomcat02.barnhardt.local
>> worker.tomcat02-instance2.port=18009
>>
>> worker.cas.type=ajp13
>> worker.cas.host=localhost
>> worker.cas.port=8009
>> worker.cas.lbfactor=1
>> worker.cas.connection_pool_timeout=600
>> worker.cas.socket_keepalive=1
> 
> I don't like the raw socket_timeout, but well ...
> 
>> worker.cas.socket_timeout=60
>>
>> # Set properties for lb which use the other workers
>> worker.lb.type=lb
>> #worker.lb.method=B
>> worker.lb.sticky_session=True
>> worker.lb.balance_workers=tomcat01-instance1,tomcat01-instance2,tomcat02-instance1,tomcat02-instance2
>>
>> # Define a 'jkstatus' worker using status
>> worker.jkstatus.type=status
>> ***********************************************
>>
>>
>> ****** Errors from log *******
>>
>> //////This particular error(info) seems to happen constantly - is it a
>> normal operational thing?
> 
> Yes, it is not an error, it is an "info" message. It simply says that 
> all connections from your apache process to tomcat were closed and a 
> fresh one had to be opened.
> 
>> [Mon May 24 10:22:56 2010] [26131:4045374208] [info]
>> ajp_send_request::jk_ajp_common.c (1496): (tomcat02-instance2) all
>> endpoints
>> are disconnected, detected by connect check (1), cping (0), send (0)
>> [Mon May 24 11:55:21 2010] [2711:4045374208] [info]
>> ajp_send_request::jk_ajp_common.c (1496): (tomcat02-instance1) all
>> endpoints
>> are disconnected, detected by connect check (1), cping (0), send (0)
>> [Mon May 24 13:08:25 2010] [27439:4045374208] [info]
>> ajp_send_request::jk_ajp_common.c (1496): (tomcat01-instance1) all
>> endpoints
>> are disconnected, detected by connect check (1), cping (0), send (0)
> 
> So I'd say something gets stuck in your Tomcat (likely: your webapp) 
> and mod_jk detects that by use of the reply timeout. Since you have a 5 
> minute reply timeout, chances are good to find those requests and the 
> cause for their hanging or excessively long response time by use of
> 
> - a Tomcat access log with an improved pattern containing "%D" and if 
> your Tomcat is recent enough also "%I"
> - and regular thread dumps
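
Once the access log carries "%D" and "%I", a sketch like the following could pull out the slow requests and the thread names to look up in the thread dumps. It assumes the access log pattern ends in "%D %I", that "%D" is in milliseconds as on Tomcat 6, and that thread names contain no spaces. The sample lines, paths, and function name are made up for illustration.

```python
import re

# The request line is the first double-quoted field ("%r" in the pattern).
REQUEST = re.compile(r'"([^"]*)"')


def slow_requests(lines, threshold_ms=60_000):
    """Return (elapsed_ms, thread, request) for requests at or over the threshold."""
    hits = []
    for line in lines:
        # Assumed layout: <anything> <elapsed %D> <thread name %I>
        parts = line.rstrip().rsplit(None, 2)
        if len(parts) != 3:
            continue
        head, elapsed, thread = parts
        if not elapsed.isdigit():
            continue
        elapsed_ms = int(elapsed)
        if elapsed_ms >= threshold_ms:
            request = REQUEST.search(head)
            hits.append((elapsed_ms, thread, request.group(1) if request else ""))
    # Slowest first.
    return sorted(hits, reverse=True)


# Hypothetical access log lines, not taken from a real server.
access_log = [
    '192.168.60.10 - - [24/May/2010:07:14:21 -0400] '
    '"GET /app/report HTTP/1.1" 200 5120 301254 ajp-8009-3',
    '192.168.60.10 - - [24/May/2010:07:15:02 -0400] '
    '"GET /app/home HTTP/1.1" 200 2048 87 ajp-8009-7',
]

slow = slow_requests(access_log)
for elapsed_ms, thread, request in slow:
    print(f"{elapsed_ms} ms on {thread}: {request}")
```

The thread name of a slow request can then be searched for in a thread dump taken while that request was still running.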
> 
>> ////This error happens intermittently and seems to cause some of the
>> cluster problems I mentioned above
>> [Mon May 24 07:19:21 2010] [27432:4045374208] [error]
>> ajp_get_reply::jk_ajp_common.c (1926): (tomcat01-instance2) Timeout with
>> waiting reply from tomcat. Tomcat is down, stopped or network problems
>> (errno=110)
>> [Mon May 24 07:19:23 2010] [27432:4045374208] [info]
>> ajp_service::jk_ajp_common.c (2447): (tomcat01-instance2) sending request
>> to
>> tomcat failed (recoverable), because of reply timeout (attempt=1)
>> [Mon May 24 07:24:23 2010] [27432:4045374208] [error]
>> ajp_get_reply::jk_ajp_common.c (1926): (tomcat01-instance2) Timeout with
>> waiting reply from tomcat. Tomcat is down, stopped or network problems
>> (errno=110)
>> [Mon May 24 07:24:25 2010] [27432:4045374208] [info]
>> ajp_service::jk_ajp_common.c (2447): (tomcat01-instance2) sending request
>> to
>> tomcat failed (recoverable), because of reply timeout (attempt=2)
> 
> I guess the next one is due to the socket_connect_timeout set to 10 
> milliseconds instead of 10 seconds:
> 
>> ////I get this error occasionally, too
>> [Sun May 23 03:48:51 2010] [15814:4045374208] [info]
>> jk_open_socket::jk_connect.c (594): connect to 192.168.60.157:8009 failed
>> (errno=115)
>> [Sun May 23 03:48:51 2010] [15814:4045374208] [info]
>> ajp_connect_to_endpoint::jk_ajp_common.c (922): Failed opening socket to
>> (192.168.60.157:8009) (errno=115)
>> [Sun May 23 03:48:51 2010] [15814:4045374208] [error]
>> ajp_send_request::jk_ajp_common.c (1507): (tomcat02-instance1) connecting
>> to
>> backend failed. Tomcat is probably not started or is listening on the
>> wrong
>> port (errno=115)
> 
> Error number 104 (errno=104) is "Connection reset by peer" on RHEL 5:
> 
>> ////Third time is a charm...another error for the hat trick
>> [Sat May 22 21:41:17 2010] [13933:4045374208] [info]
>> ajp_connection_tcp_get_message::jk_ajp_common.c (1150):
>> (tomcat01-instance1)
>> can't receive the response header message from tomcat, network problems
>> or
>> tomcat (192.168.60.156:8009) is down (errno=104)
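
For reference, a small sketch that translates the errno values seen in these excerpts. The table is hard-coded with the Linux numbering (other platforms number these differently), and the helper name is my own. Note that 115 is EINPROGRESS, which is what a non-blocking connect reports while it has not yet completed; that would be consistent with the 10 ms connect timeout explanation above.

```python
import re

# Linux errno values as they appear in the excerpts in this thread.
LINUX_ERRNO = {
    104: ("ECONNRESET", "Connection reset by peer"),
    110: ("ETIMEDOUT", "Connection timed out"),
    115: ("EINPROGRESS", "Operation now in progress"),
}

ERRNO = re.compile(r"\(errno=(\d+)\)")


def explain_errno(log_line):
    """Return (code, name, description) for the errno in a log line, or None."""
    match = ERRNO.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    name, description = LINUX_ERRNO.get(code, ("UNKNOWN", "not in this table"))
    return code, name, description


print(explain_errno("connect to 192.168.60.157:8009 failed (errno=115)"))
print(explain_errno("tomcat (192.168.60.156:8009) is down (errno=104)"))
```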
> 
> Regards,
> 
> Rainer
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/mod_jk-stability-issues-tp28662097p28667920.html
Sent from the Tomcat - User mailing list archive at Nabble.com.

