Hi Eric (and those following along),

Eric sent me some network captures off-list from multiple points in the
network from the system where this is happening. The following is the
summary so far:

Overview:
A small number of requests are receiving a completely empty (no headers,
no body) response.


Information Gathering:
Successful requests that are similar to the failed request:
- Take 7ms to 13ms to complete
- Have relatively small responses (~1k)
- Use HTTP/1.0
- Do not use keep-alive (request has Connection: close header)
- The request target is a JSP

The failed request:
- Completes in ~6ms
- Has no response headers or body
- Records a successful entry in the access log

System:
Tomcat 7.0.76, BIO HTTP connector
Java 1.8.0_221,
CentOS 7.5 server running in Azure

Requests are received from an nginx reverse proxy. It looks like nginx
is using HTTP/1.0 without keep-alive to proxy requests to Tomcat. This
actually makes things a little easier as we have one TCP/IP connection
per request.

After switching to NIO, the issue is still observable (info received
off-list along with access to network traces etc.).

The issue appeared ~1 months ago after running without error since
October 2018. No known changes were made ~1 month ago.

The TCP sequence numbers show that, as far as the network stack is
concerned, Tomcat did not write any data before closing the connection
cleanly.

There is no other activity on the client port associated with the failed
request in the provided trace.

The failed request does not appear to violate the HTTP specification.

Neither the request nor the response are compressed.

No WebSocket or other HTTP upgrade requests present in the network traces.

No obviously relevant bugs fixed since 7.0.76.


Follow-up questions:
- JVM
  - Vendor?
  - OS package or direct from Vendor?

- Tomcat
  - OS package, 3rd-party package or direct from ASF?

- Config
  - Any changes at all around the time the problems started? I'm
    thinking OS updates, VM restarted etc?


Conclusions:
- It isn't timeout related. The request is completing in orders of
  magnitude less time than the timeout.

- Based on the timings it looks like the JSP is processing the request
  and generating the response.

- It happens with BIO so sendfile isn't a factor.

- No compression so GZIP issues aren't a factor.

- Given that the issue occurs with both BIO and NIO that rules out a bug
  in any BIO or NIO specific code. Note that while 7.0.x has largely
  separate code for BIO and NIO there are still significant sections of
  code that are essentially identical so it isn't quite as simple as
  just ruling out all the code in the BIO and NIO specific classes.
  It also makes a JVM issue seem less likely at this point.


Current thinking:
- I can think of various ways this might be happening but they all seem
  pretty unlikely. The next steps will be to enable existing logging
  (and potentially add some custom logging) to try and narrow down where
  the response data is disappearing.

- Having reviewed the BIO code, there is a mercifully simple way to see
  how many bytes Tomcat thinks it has written to the response. The "%b"
  pattern in the access log valve will show how many bytes from the
  request body that Tomcat has written to the network socket without an
  IOException. I'd prefer something that recorded header bytes as well
  but this is a good first start and doesn't require custom patches.


Recommendations:
- Switch back to the BIO connector if you haven't already. It has fewer
  moving parts than NIO so it is simpler debug.
- Add "%b" to the access log pattern for Tomcat's access log valve to
  record the number of body (excluding headers) bytes Tomcat believes it
  has written to the response.


Next steps:
- Wait for the issue to re-occur after the recommended changes above and
  depending on what is recorded in the access log for %b for a failed
  request, shift the focus accordingly.
- Answers to the additional questions would be nice but the access log
  %b value for a failed request is the key piece of information required
  at this point.


HTH,

Mark

(I'm trimming the previous discussion on this thread as I think all the
relevant info is summarized above but do point out if I have missed
something.)

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

Reply via email to