Thank you all for your suggestions.

I've been sniffing TCP traffic all day today and noticed the following pattern:
1. Client sends an HTTP post to Tomcat
2. Tomcat receives the request and takes 15 minutes to process it
3. Tomcat eventually completes the request and attempts to write it
back to the client
4. Client never receives the response

So, you are right that the connection is dropped somewhere between the
client and Tomcat. I removed the load balancer from the equation by
hitting the app server directly.

I am troubleshooting this in our customer's environment and have no
idea how to go about tracking down where the connection gets dropped.
I guess we could somehow track network hops? Any ideas on how to track
this down?

Thanks,

Alec

On Thu, Aug 4, 2011 at 1:52 PM, André Warnier <a...@ice-sa.com> wrote:
> Alec Swan wrote:
>>
>> Hello,
>>
>> I agree that this is not a bug in Tomcat code, but I would appreciate
>> any help with troubleshooting this.
>>
>> Let me tell you about our app. We have a thousands of Metro SOAP
>> clients which hit our Tomcat server. Some SOAP HTTP requests take over
>> an hour to process because queries take a very long time to execute in
>> the database. The clients are configured to never timeout waiting on
>> the server response.
>>
>
> And could there be something else, between the client and the Tomcat, which
> could time out if no activity happens on that connection ?
> (I have known of routers/firewalls which did this, to save on connections to
> maintain).
>
>> So, I can see the following possible causes of the Client Abort
>> Exception logged by Tomcat:
>>
>> 1. The client drops its connection to the server - I would have
>> expected the client to get some sort of offline exception and
>> terminate the request. Instead, the client keeps waiting, so maybe
>> this is not the problem.
>> 2. Tomcat times out (our connectionTimeout is set to 600000) and
>> somehow fails to write the response back.
>
> I think that you can rule this one out.  The connectionTimeout, if I
> remember correctly, only applies to the time between
> - when the client establishes the HTTP connection to Tomcat
> - and the time when Tomcat receives the actual request on that connection
> It is essentially meant to avoid attacks where many clients would create
> connections to Tomcat and never send a request (and thus block threads (or
> at least connected sockets) for nothing).
>
> But check the documentation to see if maybe this parameter value is taken as
> a default for other unspecified timeouts.
>
>> 3. Tomcat's DBCP connection pool times out
>>
>> Could anybody help us enable the right kind of logging to verify the
>> last two possible causes? Can you think of any other ways to
>> troubleshoot this?
>>
>
> You can set up an AccessLogValve and measure the time taken by each request.
> And you can take thread dumps, as mentioned previously, to see what these
> Tomcat threads are really doing while the client is waiting.
> You can also trace the real packets exchanged via an external protocol
> analyser such as Wireshark. That will tell you if it is the client dropping
> the connection, or if something in-between is doing that.  Tomcat logs can
> only tell you what happens in Tomcat.
> And for Tomcat, if an intermediate agent is dropping the connection, it will
> also look like a Client Abort.
>
>
> Generally speaking, I don't think that there are many explanations for a
> "Client Abort" exception.  If it was Tomcat doing (or not doing) something
> and aborting the application, the error message would be different.
> So, considering that this is a clue, and that you have no other clue until
> now, I would start by examining that as a possible cause, and not other
> hypothetical scenarios for which you have not seen a clue yet.
>
>
>> Thanks,
>>
>> Alec
>>
>> On Thu, Aug 4, 2011 at 11:05 AM, André Warnier <a...@ice-sa.com> wrote:
>>>
>>> Hi.
>>>
>>> It dos not look to me like the problem is in Tomcat per se, but rather in
>>> the application running inside of Tomcat, and/or the client application.
>>>
>>> Let me explain what leads me to that supposition :
>>>
>>> - according to (1) and (2) below, the SOAP request was sent by the
>>> client,
>>> received by Tomcat, and passed by Tomcat to the corresponding application
>>>
>>> Then according to the log, an exception occurs *within the application
>>> classes* (not inside Tomcat code).  Scrolling down the log, it appears
>>> that
>>> :
>>>
>>>> Caused by: ClientAbortException: java.io.IOException
>>>
>>> That would mean that when the webapp within the Tomcat application is
>>> trying
>>> to send some output to the client, it finds the client socket closed (by
>>> the
>>> client).
>>>
>>> This could happen for instance if the client "got tired of waiting" so to
>>> speak, hit some internal timeout waiting for a server response, and
>>> decided
>>> to give up and close the socket on which it was waiting for a response.
>>> Then some time later the application tries to send some response, but
>>> hits
>>> the closed socket and throws the exception.
>>>
>>> That would explain (3) in your list, because from the point of view of
>>> Tomcat then, this application invocation is finished (with an error, but
>>> finished).
>>>
>>> Now why the application appears in (4) to still be waiting, I can only
>>> then
>>> attribute to some logic error in the application : it gave up waiting and
>>> closed the socket, but still says that it is waiting ?
>>>
>>>
>>> In any case, in a standard configuration, I do not think that Tomcat
>>> would
>>> kill an application by its own decision, even if this application took 30
>>> minutes to return a response.
>>>
>>> Maybe a few thread dumps a couple of seconds apart, when Tomcat has
>>> received
>>> the SOAP request, may explain what the responding thread is doing that
>>> takes
>>> so long before sending a response.  There is information in the Tomcat
>>> FAQ
>>> about how to do that.
>>>
>>> I would also inspect the client code to check if there is a timeout for a
>>> server response, and what it does exactly when that timeout is reached.
>>> A standard browser would have a timeout of approx. 5 minutes, and then it
>>> would display a page saying "the server is taking too long to respond.."
>>> etc..
>>> Biut with other code, it depends on the code.
>>>
>>>
>>> Alec Swan wrote:
>>>>
>>>> Hello,
>>>>
>>>> We are experiencing a problem on our production servers causing our
>>>> HTTP clients to hang. I apologize for cross-posting with Metro forum
>>>> but this is a critical problem for us.
>>>>
>>>> It looks like client SOAP HTTP requests get dropped/terminated by
>>>> Tomcat or metro without notifying the client that request was
>>>> terminated leaving the client waiting indefinitely for response.
>>>>
>>>> Here is what we observed:
>>>> 1. Client makes a SOAP call to the server
>>>> 2. Tomcat receives HTTP request and we can see it listed on Tomcat
>>>> Manager
>>>> page
>>>> 3. After about 5 minutes HTTP request disappears from the list of
>>>> active HTTP request
>>>> 4. The client is still waiting for response
>>>>
>>>> We also noticed the following exception logged in catalina.out. It is
>>>> thrown around the same time when the problem above occurs but we are
>>>> not sure if there is a direct correlation between them.
>>>>
>>>> Aug 1, 2011 12:07:44 AM
>>>> com.sun.xml.ws.transport.http.servlet.WSServletDelegate doGet
>>>>
>>>> SEVERE: caught throwable
>>>>
>>>> javax.xml.ws.WebServiceException: com.ctc.wstx.exc.WstxIOException: null
>>>>
>>>>       at
>>>> com.sun.xml.ws.encoding.StreamSOAPCodec.encode(StreamSOAPCodec.java:112)
>>>>
>>>>       at
>>>>
>>>> com.sun.xml.ws.encoding.SOAPBindingCodec.encode(SOAPBindingCodec.java:278)
>>>>
>>>>       at
>>>>
>>>> com.sun.xml.ws.transport.http.HttpAdapter.encodePacket(HttpAdapter.java:383)
>>>>
>>>>       at
>>>>
>>>> com.sun.xml.ws.transport.http.HttpAdapter.access$100(HttpAdapter.java:93)
>>>>
>>>>       at
>>>>
>>>> com.sun.xml.ws.transport.http.HttpAdapter$HttpToolkit.handle(HttpAdapter.java:529)
>>>>
>>>>       at
>>>> com.sun.xml.ws.transport.http.HttpAdapter.handle(HttpAdapter.java:288)
>>>>
>>>>       at
>>>>
>>>> com.sun.xml.ws.transport.http.servlet.ServletAdapter.handle(ServletAdapter.java:143)
>>>>
>>>>       at
>>>>
>>>> com.sun.xml.ws.transport.http.servlet.WSServletDelegate.doGet(WSServletDelegate.java:155)
>>>>
>>>>       at
>>>>
>>>> com.sun.xml.ws.transport.http.servlet.WSServletDelegate.doPost(WSServletDelegate.java:189)
>>>>
>>>>       at
>>>>
>>>> com.sun.xml.ws.transport.http.servlet.WSServlet.doPost(WSServlet.java:76)
>>>>
>>>>       at javax.servlet.http.HttpServlet.service(HttpServlet.java:637)
>>>>
>>>>       at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>>>>
>>>>       at
>>>>
>>>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>>>>
>>>>       at
>>>>
>>>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>>
>>>>       at
>>>>
>>>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>>
>>>>       at
>>>>
>>>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>>
>>>>       at
>>>>
>>>> org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:470)
>>>>
>>>>       at
>>>>
>>>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>>>
>>>>       at
>>>>
>>>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>>
>>>>       at
>>>>
>>>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>>
>>>>       at
>>>>
>>>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>>>>
>>>>       at
>>>>
>>>> org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:864)
>>>>
>>>>       at
>>>>
>>>> org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
>>>>
>>>>       at
>>>> org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1665)
>>>>
>>>>       at java.lang.Thread.run(Unknown Source)
>>>>
>>>> Caused by: com.ctc.wstx.exc.WstxIOException: null
>>>>
>>>>       at
>>>>
>>>> com.ctc.wstx.sw.BaseStreamWriter.finishDocument(BaseStreamWriter.java:1687)
>>>>
>>>>       at
>>>>
>>>> com.ctc.wstx.sw.BaseStreamWriter.writeEndDocument(BaseStreamWriter.java:585)
>>>>
>>>>       at
>>>> com.sun.xml.ws.message.saaj.SAAJMessage.writeTo(SAAJMessage.java:368)
>>>>
>>>>       at
>>>> com.sun.xml.ws.encoding.StreamSOAPCodec.encode(StreamSOAPCodec.java:109)
>>>>
>>>>       ... 24 more
>>>>
>>>> Caused by: ClientAbortException: java.io.IOException
>>>>
>>>>       at
>>>>
>>>> org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:319)
>>>>
>>>>       at
>>>> org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:288)
>>>>
>>>>       at
>>>>
>>>> org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:98)
>>>>
>>>>       at com.ctc.wstx.io.UTF8Writer.flush(UTF8Writer.java:99)
>>>>
>>>>       at
>>>> com.ctc.wstx.sw.BufferingXmlWriter.flush(BufferingXmlWriter.java:214)
>>>>
>>>>       at
>>>> com.ctc.wstx.sw.BufferingXmlWriter.close(BufferingXmlWriter.java:194)
>>>>
>>>>       at
>>>>
>>>> com.ctc.wstx.sw.BaseStreamWriter.finishDocument(BaseStreamWriter.java:1685)
>>>>
>>>>       ... 27 more
>>>>
>>>> Caused by: java.io.IOException
>>>>
>>>>       at
>>>>
>>>> org.apache.coyote.http11.InternalAprOutputBuffer.flushBuffer(InternalAprOutputBuffer.java:716)
>>>>
>>>>       at
>>>>
>>>> org.apache.coyote.http11.InternalAprOutputBuffer.flush(InternalAprOutputBuffer.java:304)
>>>>
>>>>       at
>>>>
>>>> org.apache.coyote.http11.Http11AprProcessor.action(Http11AprProcessor.java:1021)
>>>>
>>>>       at org.apache.coyote.Response.action(Response.java:183)
>>>>
>>>>       at
>>>>
>>>> org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:314)
>>>>
>>>>       ... 33 more
>>>>
>>>>
>>>> Any help will be greatly appreciated.
>>>>
>>>> Thanks!
>>>>
>>>> Alec
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
>>>> For additional commands, e-mail: users-h...@tomcat.apache.org
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
>>> For additional commands, e-mail: users-h...@tomcat.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
>> For additional commands, e-mail: users-h...@tomcat.apache.org
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

Reply via email to