> -----Original Message-----
> From: Eric Robinson <eric.robin...@psmnv.com>
> Sent: Friday, October 23, 2020 7:09 PM
> To: Tomcat Users List <users@tomcat.apache.org>
> Subject: RE: Weirdest Tomcat Behavior Ever?
>
> Hi Mark --
>
> Thanks tons for digging into this. See my answers below.
>
> > -----Original Message-----
> > From: Mark Thomas <ma...@apache.org>
> > Sent: Friday, October 23, 2020 5:09 PM
> > To: users@tomcat.apache.org
> > Subject: Re: Weirdest Tomcat Behavior Ever?
> >
> > Hi Eric (and those following along),
> >
> > Eric sent me some network captures off-list from multiple points in
> > the network from the system where this is happening. The following is
> > the summary so far:
> >
> > Overview:
> > A small number of requests are receiving a completely empty (no
> > headers, no body) response.
> >
> >
> > Information Gathering:
> > Successful requests that are similar to the failed request:
> > - Take 7ms to 13ms to complete
> > - Have relatively small responses (~1k)
> > - Use HTTP/1.0
> > - Do not use keep-alive (request has Connection: close header)
> > - The request target is a JSP
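> >
> > On the wire such a request looks roughly like this (a sketch only; the
> > target is one of the anonymized JSP paths from the traces, query string
> > elided):
> >
> >   GET /sandhut/jsp/catalog/xml/getWidgets.jsp?... HTTP/1.0
> >   Connection: close
> >   Host: redacted.domain.com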
> >
> > The failed request:
> > - Completes in ~6ms
>
> I think we've seen the failed requests take as much as 50ms.
>
> > - Has no response headers or body
> > - Records a successful entry in the access log
> >
> > System:
> > Tomcat 7.0.76, BIO HTTP connector
> > Java 1.8.0_221,
> > CentOS 7.5 server running in Azure
> >
> > Requests are received from an nginx reverse proxy. It looks like nginx
> > is using
> > HTTP/1.0 without keep-alive to proxy requests to Tomcat. This actually
> > makes things a little easier as we have one TCP/IP connection per request.
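> >
> > (This is nginx's default behaviour: proxy_http_version defaults to 1.0,
> > which implies no upstream keep-alive. A minimal sketch of the relevant
> > nginx config, with the upstream address taken from the traces and
> > everything else illustrative:
> >
> >   location / {
> >       # nginx speaks HTTP/1.0 to upstreams unless told otherwise;
> >       # without keep-alive that means one TCP connection per request
> >       proxy_pass http://10.51.14.46:3016;
> >       # proxy_http_version 1.1;  # would enable upstream keep-alive
> >   }
> >
> > Not suggesting a change here; it just explains the one-connection-per-
> > request pattern seen in the captures.)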
> >
> > After switching to NIO, the issue is still observable (info received
> > off-list along with access to network traces etc.).
> >
> > The issue appeared ~1 month ago after running without error since
> > October 2018. No known changes were made around that time.
> >
> > The TCP sequence numbers show that, as far as the network stack is
> > concerned, Tomcat did not write any data before closing the connection
> > cleanly.
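> >
> > (For anyone re-checking a capture, a filter along these lines lists any
> > payload-bearing segments from the Tomcat side; the file name and port
> > are illustrative:
> >
> >   # empty output = Tomcat sent zero payload bytes before closing
> >   tshark -r trace.pcap -Y "tcp.srcport == 3016 && tcp.len > 0"
> >
> > An empty result for the failed connection is what the sequence numbers
> > already imply.)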
> >
> > There is no other activity on the client port associated with the
> > failed request in the provided trace.
> >
> > The failed request does not appear to violate the HTTP specification.
> >
> > Neither the request nor the response are compressed.
> >
> > No WebSocket or other HTTP upgrade requests present in the network
> > traces.
> >
> > No obviously relevant bugs fixed since 7.0.76.
> >
> >
> > Follow-up questions:
> > - JVM
> >   - Vendor?
> >   - OS package or direct from Vendor?
> >
>
> jdk-8u221-linux-x64.tar.gz downloaded from the Oracle web site.
>
>
> > - Tomcat
> >   - OS package, 3rd-party package or direct from ASF?
> >
>
> tomcat.noarch  7.0.76-6.el7 from CentOS base repository
>

Drat, slight correction. I now recall that although we initially installed 
7.0.76 from the CentOS repo, the application vendor made us lower the version 
to 7.0.72, and I DO NOT know where we got that. However, it has not changed 
since October-ish of 2018.
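
For what it's worth, the jar on disk can be asked directly (the path below is
my guess at our layout, so treat it as illustrative):

  # prints "Server number" / "Server built" for the catalina.jar actually installed
  java -cp /usr/share/tomcat/lib/catalina.jar org.apache.catalina.util.ServerInfo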

> > - Config
> >   - Any changes at all around the time the problems started? I'm
> >     thinking OS updates, VM restarted etc?
> >
>
> server.xml has not changed since 4/20/20, which was well before the
> problem manifested, and all the other files in the conf folder are even older
> than that. We're seeing this symptom on both production servers. One of
> them was rebooted a week ago, but the other has been up continuously for
> 258 days.
>
> >
> > Conclusions:
> > - It isn't timeout related. The request is completing in orders of
> >   magnitude less time than the timeout.
> >
> > - Based on the timings it looks like the JSP is processing the request
> >   and generating the response.
> >
> > - It happens with BIO so sendfile isn't a factor.
> >
> > - No compression so GZIP issues aren't a factor.
> >
> > - Given that the issue occurs with both BIO and NIO that rules out a bug
> >   in any BIO or NIO specific code. Note that while 7.0.x has largely
> >   separate code for BIO and NIO there are still significant sections of
> >   code that are essentially identical so it isn't quite as simple as
> >   just ruling out all the code in the BIO and NIO specific classes.
> >   It also makes a JVM issue seem less likely at this point.
> >
> >
> > Current thinking:
> > - I can think of various ways this might be happening but they all seem
> >   pretty unlikely. The next steps will be to enable existing logging
> >   (and potentially add some custom logging) to try and narrow down where
> >   the response data is disappearing (a logging sketch follows below).
> >
> > - Having reviewed the BIO code, there is a mercifully simple way to see
> >   how many bytes Tomcat thinks it has written to the response. The "%b"
> >   pattern in the access log valve records how many bytes of the response
> >   body Tomcat has written to the network socket without an IOException.
> >   I'd prefer something that recorded header bytes as well, but this is a
> >   good start and doesn't require custom patches.
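> >
> >   As a sketch of the "existing logging" mentioned in the first point,
> >   something like this in conf/logging.properties (the categories are
> >   standard Tomcat package names; FINE is verbose, so enable it briefly):
> >
> >     # HTTP processor / connector debug logging
> >     org.apache.coyote.http11.level = FINE
> >     org.apache.catalina.connector.level = FINE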
> >
> >
> > Recommendations:
> > - Switch back to the BIO connector if you haven't already. It has fewer
> >   moving parts than NIO, so it is simpler to debug.
> > - Add "%b" to the access log pattern for Tomcat's access log valve to
> >   record the number of body (excluding headers) bytes Tomcat believes it
> >   has written to the response.
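> >
> >   Something like the following, as a sketch (keep your existing
> >   attributes and just make sure %b is in the pattern):
> >
> >     <Valve className="org.apache.catalina.valves.AccessLogValve"
> >            directory="logs" prefix="localhost_access." suffix=".log"
> >            pattern="%h %l %u %t %r %s %b" />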
> >
> >
> > Next steps:
> > - Wait for the issue to recur after the recommended changes above and,
> >   depending on what %b records in the access log for a failed request,
> >   shift the focus accordingly.
> > - Answers to the additional questions would be nice but the access log
> >   %b value for a failed request is the key piece of information required
> >   at this point.
> >
>
> Good news! I enabled that parameter a few days ago and we have already
> caught some instances of the problem occurring. Here is the logging format...
>
> <Valve className="org.apache.catalina.valves.AccessLogValve"
>        directory="logs" prefix="localhost_access." suffix=".log"
>        pattern="%h %l %D %u %t %{JSESSIONID}c %{cookie}i %r %s %b %S %q" />
>
> Due to some sensitive content in the HTTP requests below, I have globally
> replaced certain words and addresses with random-ish strings, but I don't
> think I've done anything to impact the issue.
>
> Following is an example from Wednesday.
>
> This is a request being sent from the nginx proxy to the first of 2 upstream
> servers, 10.51.14.46
>
> 2020/10/21 15:51:22 [error] 39268#39268: *842342531 upstream prematurely closed connection while reading response header from upstream,
> client: 99.88.77.66, server: redacted.domain.com,
> request: "GET /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID=0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.5650846&timestamp=21102020155122.472656 HTTP/1.1",
> upstream: "http://10.51.14.46:3016/sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID=0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.5650846&timestamp=21102020155122.472656",
> host: "redacted.domain.com"
>
> Here is the matching localhost_access log entry from that server…
>
> 10.51.14.133 - 144 - [21/Oct/2020:15:51:22 -0400] F405E25E49E3DCB81A36A87DED1FE573
> JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573; srv_id=dea8d61a7d725e980a6093cb78d8ec73; JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573; srv_id=dea8d61a7d725e980a6093cb78d8ec73
> GET /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID=0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.5650846&timestamp=21102020155122.472656 HTTP/1.0 200 40423 F405E25E49E3DCB81A36A87DED1FE573
> ?eDate=2020-10-21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID=0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.5650846&timestamp=21102020155122.472656
>
> Tomcat appears to think it sent 40423 bytes. However, even though it
> logged an HTTP 200 response, Wireshark shows the response never actually
> went out on the wire.
>
> The proxy never got a response, so it sent the request to the next upstream,
> 10.51.14.57.
>
> Here is the matching localhost_access log entry from that server…
>
> 10.51.14.133 - 1 - [21/Oct/2020:15:51:22 -0400] F405E25E49E3DCB81A36A87DED1FE573
> JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573; srv_id=dea8d61a7d725e980a6093cb78d8ec73; JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573; srv_id=dea8d61a7d725e980a6093cb78d8ec73
> GET /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID=0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.5650846&timestamp=21102020155122.472656 HTTP/1.0 200 663 -
> ?eDate=2020-10-21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID=0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.5650846&timestamp=21102020155122.472656
>
> Tomcat on that server did not recognize the JSESSIONID cookie, so it sent
> back a 663-byte message stating, "The session is invalid."
>
> The client was then kicked out of the software.
>
> >
> > HTH,
> >
> > Mark
> >
> > (I'm trimming the previous discussion on this thread as I think all
> > the relevant info is summarized above but do point out if I have
> > missed
> > something.)
> >
>
>

