On 21/09/2020 08:18, Martin Grigorov wrote:
> On Fri, Sep 18, 2020 at 6:16 PM Mark Thomas <[email protected]> wrote:
>
>> On 18/09/2020 14:07, Martin Grigorov wrote:
>>
>> <snip/>
>>
>>> What is the difference
>>> between org.apache.coyote.http2.StreamStateMachine.State#CLOSED_RX
>>> and org.apache.coyote.http2.StreamStateMachine.State#CLOSED_TX ?
>>
>> Compare the parameters used to construct the enums.
>>
>>> I read some parts of https://tools.ietf.org/html/rfc7540 but I don't see
>>> anything related to two types of CLOSED state.
>>
>> Section 5.1. Definition of the closed state (page 20) explains the
>> difference between the two.
>>
>
> I still do not understand what RX and TX stand for, but this is not
> important.
TX - transmit
RX - receive
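To illustrate the difference (a simplified sketch only, not the actual
Tomcat source; the real enum takes more constructor parameters than
shown here):

import java.util.Arrays;
import java.util.EnumSet;
import java.util.Set;

public class StateSketch {

    // Minimal stand-in for org.apache.coyote.http2.FrameType
    enum FrameType { DATA, HEADERS, PRIORITY, RST, SETTINGS, WINDOW_UPDATE }

    enum State {
        // Closed because we received the closing frame. The peer knows the
        // stream is closed, so almost nothing further is acceptable on it.
        CLOSED_RX(FrameType.PRIORITY),
        // Closed because we transmitted the closing frame. Per RFC 7540
        // section 5.1, frames from the peer may still be in flight and have
        // to be tolerated for a period.
        CLOSED_TX(FrameType.PRIORITY, FrameType.RST, FrameType.WINDOW_UPDATE);

        private final Set<FrameType> permitted;

        State(FrameType... permitted) {
            this.permitted = EnumSet.noneOf(FrameType.class);
            this.permitted.addAll(Arrays.asList(permitted));
        }

        boolean isFrameTypePermitted(FrameType frameType) {
            return permitted.contains(frameType);
        }
    }
}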
> The following patch fixes the issue for me/Vegeta:
>
> @@ -1570,12 +1571,15 @@ class Http2UpgradeHandler extends AbstractStream implements InternalHttpUpgradeHandler
>
>     @Override
>     public void reset(int streamId, long errorCode) throws Http2Exception {
> -        Stream stream = getStream(streamId, true);
> -        boolean active = stream.isActive();
> -        stream.checkState(FrameType.RST);
> -        stream.receiveReset(errorCode);
> -        if (active) {
> -            activeRemoteStreamCount.decrementAndGet();
> +        boolean unknownIsError = Http2Error.CANCEL.getCode() != errorCode;
> +        Stream stream = getStream(streamId, unknownIsError);
> +        if (stream != null) {
> +            boolean active = stream.isActive();
> +            stream.checkState(FrameType.RST);
> +            stream.receiveReset(errorCode);
> +            if (active) {
> +                activeRemoteStreamCount.decrementAndGet();
> +            }
>         }
>     }
>
> I.e. do not throw an error if the remote peer is trying to cancel an
> already closed stream.
RFC 7540 allows the connection to be closed with a protocol error if the
reset is received "a significant time after sending END_STREAM".
Tomcat retains state for maxConcurrentStreams + 10% streams, with closed
streams being pruned (oldest first) once for every 10 new streams.
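As a sketch of that retention policy (the names and the
maxConcurrentStreams value of 100 are assumptions for illustration, not
the actual Http2UpgradeHandler code):

import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

public class StreamRetentionSketch {

    private static final int MAX_CONCURRENT_STREAMS = 100; // assumed value
    private static final int RETAINED =
            MAX_CONCURRENT_STREAMS + MAX_CONCURRENT_STREAMS / 10; // +10%

    // A TreeMap keeps entries sorted by stream ID, so iteration visits the
    // oldest stream first
    private final Map<Integer, Object> streams = new TreeMap<>();
    private int newStreamsSincePrune = 0;

    void registerStream(int streamId, Object stream) {
        streams.put(streamId, stream);
        // Prune once for every 10 new streams rather than on every frame
        if (++newStreamsSincePrune >= 10) {
            newStreamsSincePrune = 0;
            prune();
        }
    }

    private void prune() {
        // Drop the oldest entries until back within the retention budget.
        // The real implementation only removes streams that are closed.
        Iterator<Map.Entry<Integer, Object>> it = streams.entrySet().iterator();
        while (streams.size() > RETAINED && it.hasNext()) {
            it.next();
            it.remove();
        }
    }
}

The consequence for this thread is that once a closed stream has been
pruned, a late RST_STREAM for it looks exactly like a frame for an
unknown stream, which is why getStream(streamId, true) throws.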
It isn't clear at this point why the client is sending the RST_STREAM;
it normally indicates an error condition, and from the description of
the load test I don't see why the client would be sending one.
If the RST_STREAM is valid and it isn't received a significant time
after the END_STREAM, then Tomcat needs to handle this.
If the RST_STREAM is valid but received a significant amount of time
after the END_STREAM, that would be a client error.
Of course, "significant" isn't defined in the specification.
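One way to picture that choice (a sketch under an arbitrary cut-off; the
threshold and names are assumptions, not Tomcat code):

public class LateResetSketch {

    // RFC 7540 does not define "significant", so any value here is a guess
    private static final long SIGNIFICANT_MILLIS = 10_000;

    private final long endStreamSentAt = System.currentTimeMillis();

    void receiveReset(long errorCode) {
        long elapsed = System.currentTimeMillis() - endStreamSentAt;
        if (elapsed > SIGNIFICANT_MILLIS) {
            // Late reset: the RFC permits treating this as a connection
            // error of type PROTOCOL_ERROR
            throw new IllegalStateException(
                    "RST_STREAM received " + elapsed + "ms after END_STREAM");
        }
        // Otherwise the frame can be quietly discarded: the stream is
        // already closed and there is nothing left to reset
    }
}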
Any special handling for RST_STREAM also needs to be applied to
WINDOW_UPDATE. It also needs to differentiate between a frame received
for a past, closed stream and a frame received for a stream that never
existed.
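That distinction can be made because client-initiated stream IDs are odd
and strictly increasing (RFC 7540 section 5.1.1): any ID at or below the
highest one processed must have existed at some point, while a higher ID
never has. A sketch, with assumed names:

public class UnknownStreamSketch {

    private int maxProcessedStreamId = 0;

    void onStreamCreated(int streamId) {
        maxProcessedStreamId = Math.max(maxProcessedStreamId, streamId);
    }

    void onFrameForUnknownStream(int streamId, String frameType) {
        if (streamId <= maxProcessedStreamId) {
            // A past stream whose state has been pruned: a late RST_STREAM
            // or WINDOW_UPDATE here can be tolerated or ignored
            return;
        }
        // A frame for a stream that never existed is a protocol error
        throw new IllegalStateException(
                frameType + " received for never-created stream " + streamId);
    }
}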
I think the key question here is why the client is sending the
RST_STREAM in the first place. Is Tomcat doing something it shouldn't,
or not doing something it should, to cause this? If so, we need to fix
the root cause rather than tackle the symptom.
There are a couple of options for debugging this to see why the client
is sending the RST_STREAM:
- Enable debug logging for HTTP/2 in Tomcat. This is very verbose and,
if the root cause is threading related, typically changes the timing
enough to mask the issue.
- Client side logging (I don't know what the capabilities are here).
- JVM level TLS debug logging (even more verbose than Tomcat's HTTP/2
debug and harder to work with)
- Wireshark (or similar)
I'd recommend Wireshark. I don't think JSSE supports SSLKEYLOGFILE so
I'd use the APR/Native connector with SSLKEYLOGFILE to get the cleartext
in Wireshark.
> With this change and Vegeta's -max-workers=100 I can get 12 K reqs/sec.
That is odd. If the client is sending RST_STREAM messages then I'd
expect to see requests failing as RST_STREAM indicates something has
gone wrong or the client has aborted the request.
> With more workers it starts failing with:
>
> "status_codes": {
> "0": 1000,
> "200": 1
> },
> "errors": [
> "Get \"https://localhost:18080/testbed/plaintext\": net/http: request
> canceled while waiting for connection (Client.Timeout exceeded while
> awaiting headers)",
> "Get \"https://localhost:18080/testbed/plaintext\": context deadline
> exceeded",
> "Get \"https://localhost:18080/testbed/plaintext\": context deadline
> exceeded (Client.Timeout exceeded while awaiting headers)"
> ]
> i.e. out of 1001 requests only one succeeds and the others fail with a
> timeout. I will try to debug this one and run a profiler too.
I'd probably start by enabling Tomcat's HTTP/2 debug logging for this
one. I'm wondering if something is triggering a connection level error
early in the processing.
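For reference, that logging can be switched on by adding something like
the following to $CATALINA_BASE/conf/logging.properties (FINE maps to
debug in JULI; the exact handler entries depend on the local
configuration):

org.apache.coyote.http2.level = FINE
# The handler must pass FINE records through as well
java.util.logging.ConsoleHandler.level = FINE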
Mark