On 17/11/2011 08:48, Jeremy wrote:
> Mark, thanks for taking a crack at this.  I'm not that familiar with how
> mod_jk and AJP work under the covers, but based on my level of
> understanding, your scenario comes close but does not exactly match the
> logging we see.  My detailed comments below.
> 
> On Wed, Nov 16, 2011 at 11:53 PM, Mark Thomas <ma...@apache.org> wrote:
> 
>> On 17/11/2011 01:29, Jeremy wrote:
>>> OK, I know this seems crazy, but I've looked long and hard and cannot
>>> explain this as other than a Tomcat bug.  I'm more than willing to dig up
>>> extra information where I can, but this is a very rare problem seen in
>>> production and not reproducible at will.  Also, because this is in
>>> production, there is very little I can change, and even if I did change
>> it,
>>> it would take a month or more to verify that it is really a fix.  So what
>>> I'm really looking for is a detailed understanding of the problem so we
>> can
>>> detect it when it happens and clean up the after-effects.
>>>
>>> What we are seeing is one POST request being handled twice by our web
>> app.
>>>  It looks as though somehow the request is duplicated by Tomcat.  Here's
>>> our setup:
>>>
>>> Apache 2.2.14 using APR 1.3.8 and mod_jk 1.2.28 running on Ubuntu 10.04.3
>>> as front end, load balancing.
>>> Tomcat 6.0.29 using built-in AJP/1.3 connector running on Ubuntu 9.04
>> with
>>> 64-bit Sun Java 1.5.0_19.  Tomcat is not using APR.
>>> Tomcat mod_jk connector has keepAliveTimeout=5000 and
>>> connectionTimeout=60000 (60 seconds).
>>>
>>> Here is a simplified timeline of what we see in various logs.  I'm going
>> to
>>> leave some fields out and change others to protect confidentiality and
>> make
>>> it easier to follow, but I'll try to keep from changing anything
>> important.
>>>  Timestamps are just minutes and seconds.
>>>
>>>
>>> 51:48 POST request received by Apache over SSL.  We see the request
>> logged
>>> in Apache's access.log with this timestamp, but the log entry is actually
>>> written at 52:08 (or seems to be, since that is the time stamp of the
>> next
>>> entry in the access.log, though the entries immediately before this one
>>> have timestamps of 52:02).
>>>
>>> 51:48 log4j application logs show processing of the request by the
>>> application in thread TP-Processor18.   Processing continues with updates
>>> to database and external systems until:
>>>
>>> 51:55 last application log entry from TP-Processor18 shows normal
>>> completion of processing, which should then return from handleRequest().
>>>  The database and three external systems the app communicates with all
>> show
>>> normal processing.
>>>
>>> 51:55 log4j application log entries from TP-Processor6 show it beginning
>> to
>>> handle the request.  Processing continues normally, just as it did in
>>> TP-Processor18 until:
>>>
>>> 52:08 last application log entry from TP-Processor6.  Again external
>>> systems show normal processing.
>>>
>>> 52:08 Tomcat access log shows entry for post request stating it was
>> handled
>>> by TP-Processor6 in 7283ms
>>>
>>> 52:08 mod_jk.log: loadbalancer www.site.com 19.400742
>>>
>>> 53:06 Tomcat access log shows entry for post request stating it was
>> handled
>>> by TP-Processor18 in 77440ms
>>>
>>>
>>> How is it possible that Tomcat has 2 threads handling the same request?
>>
>> Because I think there are two requests here, rather than one.
>>
>> My guess as to what is happening is as follows:
>> - httpd sends request to Tomcat
>> - Tomcat processes it
>> - network glitch or similar means httpd never receives the response
>>
> 
> I'm with you up to here.  However, if the response were sent, then I'd
> expect to see  TP-Processor18 log its response in the Tomcat access log at
> 51:55.  Not some 71 seconds later,

It depends. If the response wasn't fully sent you could be seeing a
retry timeout of some form here.

> Also we have mod_jk set up with a reply timeout of 0, which means a retry
> would be triggered by something other than a timeout.

Ah. OK. That probably rules out that option then.

> We do have mod_jk
> retry set to 2 on he Apache servers; not sure how that is logged or what
> protections are in place against creating this problem.

Someone more familiar with the mod_jk code will hopefully suggest an
appropriate log level to get some insight into this.

> 
> - httpd resends request to Tomcat
>> - Tomcat processes it
>> - httpd receives the response
>> - httpd sends response to user
>>
>> Examining the mod_jk logs on httpd should show this although I don't
>> know what level of logging you'll need to see it.
>>
> 
> Me neither.  We're using cping/cpong to test the connectors before sending
> requests.   Those do sometimes fail, but I think that keeps the request
> from being sent in the first place.  The mod_jk log just shows the one
> request taking 19 seconds, doesn't show anything about retrying.
> 
> I'm particularly curious about how TP-Processor6 manages to take over just
> as TP-Processor18 should be sending its response and how TP-Processor18's
> response ends up getting logged over a minute later.
> 
>>
>> The only hitch with the above is that I'd expect httpd to take longer to
>> realise that there is a problem before re-trying the request.
>>
>>
> Yep, me too.
> 
> 
>>> How can we detect this as opposed to the user legitimately sending the
>> same
>>> request twice?
>>
>> Use mod_unique_id with mod_rewrite to add the unique ID as a url
>> parameter and then include this in Tomcat's access log and/or write a
>> filter that checks for duplicates.
>>
>>
> Checking for duplicates is very problematic when the rest of the system is
> not performing as expected.  If the first request failed and is being
> automatically retried as a recover step, then we want to process the
> duplicate.  Or maybe the first request is still in process and has not yet
> persisted the record that it has handled this ID before the second thread
> starts processing it and checks to see if has already been handled.  The
> current implementation has several checks to prevent duplicate processing
> and they are all failing in this instance, which is part of why I'm asking
> for help on this forum.  Even though the logs suggest the first request was
> handled normally, it appears the database changes made by the first request
> processor are not visible to the second request processor at the time it
> starts processing the request.  I have to dig into that some more, but of
> course it's secondary to why we're getting these duplicate requests
> internally generated by Tomcat.
> 
> 
>> Thanks for your help!
>>
>> The situation you describe could also be triggered by a Tomcat bug. You
>> might want to try protocol="org.apache.coyote.ajp.AjpProtocol" on the
>> connector to switch the the Coyote AJP BIO implementation.
>>
>>
> We're currently using protocol="AJP/1.3" and do not have APR installed, so
> I do not things changing to  protocol="org.apache.coyote.ajp.AjpProtocol"
> would actually change anything.

Yes, it will. You are currently using:
org.apache.jk.server.JkCoyoteHandler

I'm suggesting a switch to:
org.apache.coyote.ajp.AjpProtocol

not the APR/native AJP connector which is:
org.apache.coyote.ajp.AjpAprProtocol

An upgrade to the next 6.0.35 when released is also worth considering.
There have been some fixes to the connectors that might help (e.g. the
fixes for CVE-2011-3190).

Something to look for in your logs is what happens with the processors
in the request before the one that looks odd. Is there a pattern of, for
example, failed requests there? Note that there is not a 1 to 1 mapping
between request processor objects and processor threads. That makes this
tricky to be 100% sure but you might see something.

HTH,

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

Reply via email to