On Mon, Sep 21, 2020 at 1:36 PM Martin Grigorov <mgrigo...@apache.org> wrote:
> On Mon, Sep 21, 2020 at 12:08 PM Martin Grigorov <mgrigo...@apache.org> wrote:
> >
> > On Mon, Sep 21, 2020 at 11:23 AM Mark Thomas <ma...@apache.org> wrote:
> > >
> > > On 21/09/2020 08:18, Martin Grigorov wrote:
> > > > On Fri, Sep 18, 2020 at 6:16 PM Mark Thomas <ma...@apache.org> wrote:
> > > > >
> > > > > On 18/09/2020 14:07, Martin Grigorov wrote:
> > > > >
> > > > > <snip/>
> > > > >
> > > > > > What is the difference between
> > > > > > org.apache.coyote.http2.StreamStateMachine.State#CLOSED_RX and
> > > > > > org.apache.coyote.http2.StreamStateMachine.State#CLOSED_TX ?
> > > > >
> > > > > Compare the parameters used to construct the enums.
> > > > >
> > > > > > I read some parts of https://tools.ietf.org/html/rfc7540 but I
> > > > > > don't see anything related to two types of CLOSED state.
> > > > >
> > > > > Section 5.1. Definition of the closed state (page 20) explains the
> > > > > difference between the two.
> > > >
> > > > Still I do not understand what RX and TX stand for. But this is not
> > > > important.
> > >
> > > TX - transmit
> > >
> > > RX - receive
> >
> > Thanks!
> >
> > > > The following patch fixes the issue for me/Vegeta:
> > > >
> > > > @@ -1570,12 +1571,15 @@ class Http2UpgradeHandler extends AbstractStream implements InternalHttpUpgradeH
> > > >
> > > >     @Override
> > > >     public void reset(int streamId, long errorCode) throws Http2Exception {
> > > > -        Stream stream = getStream(streamId, true);
> > > > -        boolean active = stream.isActive();
> > > > -        stream.checkState(FrameType.RST);
> > > > -        stream.receiveReset(errorCode);
> > > > -        if (active) {
> > > > -            activeRemoteStreamCount.decrementAndGet();
> > > > +        boolean unknownIsError = Http2Error.CANCEL.getCode() != errorCode;
> > > > +        Stream stream = getStream(streamId, unknownIsError);
> > > > +        if (stream != null) {
> > > > +            boolean active = stream.isActive();
> > > > +            stream.checkState(FrameType.RST);
> > > > +            stream.receiveReset(errorCode);
> > > > +            if (active) {
> > > > +                activeRemoteStreamCount.decrementAndGet();
> > > > +            }
> > > >         }
> > > >     }
> > > >
> > > > I.e. do not throw an error if the remote peer is trying to cancel an
> > > > already closed stream.
> > >
> > > RFC 7540 allows the connection to be closed with a protocol error if
> > > the reset is received "a significant time after sending END_STREAM".
> > >
> > > Tomcat retains state for maxConcurrentStreams + 10%, with closed
> > > streams being pruned (oldest first) every 10 new streams.
> >
> > The spec talks about time but Tomcat uses the number of streams. I
> > understand that under normal usage this could be enough "time", but
> > under heavy load this number is reached quickly and the time is short.
> >
> > I've tried with these values for
> > maxConcurrentStreams/maxConcurrentStreamExecution: 8 (the number of CPU
> > cores), 16, 100 (the default), 1024 and 8196. The throughput stays
> > around 10K req/s only up to 100; above that it drops (as you said
> > earlier).
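To make the time-vs-streams point concrete, here is a rough model of the
retention behaviour Mark describes above. It is a sketch for illustration
only, not Tomcat's actual code, and the class and method names are
invented:

    import java.util.ArrayDeque;
    import java.util.Deque;

    /**
     * Simplified model: state for roughly maxConcurrentStreams + 10%
     * closed streams is retained, pruned oldest-first every 10 new streams.
     */
    public class ClosedStreamRetentionModel {

        private final int retentionLimit;
        private final Deque<Integer> closed = new ArrayDeque<>(); // oldest first
        private int newStreamsSincePrune = 0;

        public ClosedStreamRetentionModel(int maxConcurrentStreams) {
            // maxConcurrentStreams + 10%
            this.retentionLimit = maxConcurrentStreams + maxConcurrentStreams / 10;
        }

        /** Called when the peer opens a new stream. */
        public void onNewStream() {
            if (++newStreamsSincePrune == 10) { // prune every 10 new streams
                newStreamsSincePrune = 0;
                while (closed.size() > retentionLimit) {
                    // A late RST_STREAM for a pruned stream can no longer be
                    // matched; it looks like a frame for an unknown stream.
                    closed.pollFirst();
                }
            }
        }

        /** Called when a stream reaches the CLOSED state. */
        public void onStreamClosed(int streamId) {
            closed.addLast(streamId);
        }

        public static void main(String[] args) {
            ClosedStreamRetentionModel model = new ClosedStreamRetentionModel(100);
            for (int id = 1; id <= 10_000; id += 2) { // client-initiated ids are odd
                model.onNewStream();
                model.onStreamClosed(id);
            }
            System.out.println("retained: " + model.closed.size()); // about 110
        }
    }

With the default maxConcurrentStreams of 100, roughly 110 closed streams
are retained; at ~10K requests/second that window is only about 11 ms,
which is why "a significant time after sending END_STREAM" is reached so
quickly under this load.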
> > > It isn't clear at this point why the client is sending the RST_STREAM.
> >
> > I don't know either. But Vegeta is using the Golang standard libraries,
> > so any Golang application should behave the same way. Also, as I
> > investigated last week, Golang/Node.js/Rust/Netty HTTP/2 servers do not
> > complain about it.
> >
> > > This normally indicates an error condition. From the description of
> > > the load test I don't see why the client would be sending a RST_STREAM.
> > >
> > > If the RST_STREAM is valid and it isn't received a significant time
> > > after the END_STREAM, then Tomcat needs to handle this.
> > >
> > > If the RST_STREAM is valid but received a significant amount of time
> > > after the END_STREAM, that would be a client error.
> > >
> > > Of course, "significant" isn't defined in the specification.
> > >
> > > Any special handling for RST_STREAM also needs to be applied to
> > > WINDOW_UPDATE. It also needs to differentiate between a frame received
> > > for a past closed stream and a frame received for a stream that never
> > > existed.
> > >
> > > I think the key question here is why the client is sending the
> > > RST_STREAM in the first place. Is Tomcat doing something it shouldn't /
> > > not doing something it should to cause this? If so, we need to fix the
> > > root cause rather than tackle the symptom.
> > >
> > > There are a couple of options for debugging this to see why the client
> > > is sending the RST_STREAM:
> > > - Enable debug logging for HTTP/2 in Tomcat. This is very verbose
> > >   and, if the root cause is threading related, typically changes the
> > >   timing enough to mask the issue.
> >
> > Here are the logs of a 20 second load run:
> >
> > https://gist.githubusercontent.com/martin-g/37dacc12149d5cecbfd4e16dc2248cfd/raw/76693cd1dd1f37ba126edb84ebd7631802aa91a6/tomcat-http2.log
>
> Actually, the above were just the last logs. The previous logs were in
> .gz files which I had missed.
>
> https://gist.githubusercontent.com/martin-g/37dacc12149d5cecbfd4e16dc2248cfd/raw/00596b7dc31d30d275384334ff17c5416f5b363c/tomcat-http2.log
> - this file contains the first 7 seconds of the load run. There is 3
> times more information for the last 13 seconds, but it is more or less
> the same.
>
> There are many entries like this:
>
> 2020-09-21 14:25:04.850 DEBUG 232086 --- [https-jsse-nio-18080-exec-8] o.a.coyote.http11.Http11NioProtocol : Found processor [null] for socket [org.apache.tomcat.util.net.SecureNioChannel@2b435926:java.nio.channels.SocketChannel[connected local=/127.0.0.1:18080 remote=/127.0.0.1:42229]]
> 2020-09-21 14:25:04.850 DEBUG 232086 --- [https-jsse-nio-18080-exec-6] o.a.coyote.http2.Http2UpgradeHandler : Connection [2]
>
> java.io.IOException: Unable to unwrap data, invalid status [CLOSED]
>     at org.apache.tomcat.util.net.SecureNioChannel.read(SecureNioChannel.java:766)
>     at org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper$NioOperationState.run(NioEndpoint.java:1468)
>     at org.apache.tomcat.util.net.SocketWrapperBase$OperationState.start(SocketWrapperBase.java:1039)
>     at org.apache.tomcat.util.net.SocketWrapperBase.vectoredOperation(SocketWrapperBase.java:1450)
>     at org.apache.tomcat.util.net.SocketWrapperBase.read(SocketWrapperBase.java:1293)
>     at org.apache.tomcat.util.net.SocketWrapperBase.read(SocketWrapperBase.java:1265)
>     at org.apache.coyote.http2.Http2AsyncParser.readConnectionPreface(Http2AsyncParser.java:55)
>     at org.apache.coyote.http2.Http2UpgradeHandler.init(Http2UpgradeHandler.java:238)
>     at org.apache.coyote.http2.Http2AsyncUpgradeHandler.init(Http2AsyncUpgradeHandler.java:39)
>     at org.apache.coyote.http2.Http2UpgradeHandler.upgradeDispatch(Http2UpgradeHandler.java:313)
>     at org.apache.coyote.http2.Http2AsyncUpgradeHandler.upgradeDispatch(Http2AsyncUpgradeHandler.java:39)
>     at org.apache.coyote.http11.upgrade.UpgradeProcessorInternal.dispatch(UpgradeProcessorInternal.java:54)
>     at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:59)
>     at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:868)
>     at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1590)
>     at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
>     at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>     at java.base/java.lang.Thread.run(Thread.java:832)

When I tested HTTP/2, I used h2c (unencrypted) to get a better idea of
what's going on. Otherwise you don't really know what proportion of TLS or
HTTP/2 impacts performance more (and then you have to test more things,
like OpenSSL, and be careful that the ciphers used are the same and
equally well optimized in each implementation, etc.). Then, once you feel
happy about h2c, you can see how things go with TLS.

Rémy
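For anyone who wants to reproduce that comparison: h2c in Tomcat means
nesting the HTTP/2 upgrade protocol inside a plain, non-TLS connector in
server.xml. A minimal sketch; the port matches the test above and the
other attributes are placeholders:

    <Connector port="18080" protocol="HTTP/1.1"
               maxThreads="200" connectionTimeout="20000">
        <!-- No SSLHostConfig: HTTP/2 is negotiated via the h2c upgrade or
             prior knowledge instead of ALPN -->
        <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol" />
    </Connector>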
> > This is with my patch in #reset(). Without it there are also errors
> > that stream XYZ is already closed.
> >
> > > - Client side logging (I don't know what the capabilities are here).
> > > - JVM level TLS debug logging (even more verbose than Tomcat's HTTP/2
> > >   debug and harder to work with).
> > > - Wireshark (or similar).
> > >
> > > I'd recommend Wireshark. I don't think JSSE supports SSLKEYLOGFILE, so
> > > I'd use the APR/Native connector with SSLKEYLOGFILE to get the
> > > cleartext in Wireshark.
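For the first option in Mark's list, HTTP/2 debug logging is controlled
through JULI; something like the following in conf/logging.properties
should enable it, assuming the handlers from the stock file are still in
place:

    # Very verbose: frame-level detail for every HTTP/2 connection
    org.apache.coyote.http2.level = FINE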
> > > > With this change and Vegeta's -max-workers=100 I can get 12K reqs/sec.
> > >
> > > That is odd. If the client is sending RST_STREAM messages then I'd
> > > expect to see requests failing, as RST_STREAM indicates something has
> > > gone wrong or the client has aborted the request.
> >
> > According to https://tools.ietf.org/html/rfc7540#section-5.1, when the
> > stream is in the HALF_CLOSED_REMOTE state it should send either
> > END_STREAM or RST_STREAM to the remote peer. In the logs I see this
> > sequence of events:
> >
> > ...
> > o.a.coyote.http2.StreamStateMachine : Connection [124], Stream [5,279], State changed from [OPEN] to [HALF_CLOSED_REMOTE]
> > o.a.coyote.http2.Http2UpgradeHandler : Connection [124], Stream [5,279], Writing the headers
> > org.apache.coyote.http2.Stream : Connection [124], Stream [5,279], flushing output with buffer at position [12], writeInProgress [false] and closed [true]
> > org.apache.coyote.http2.AbstractStream : Connection [124], Stream [5,279], reduce flow control window by [12] to [4194292]
> > o.a.coyote.http2.Http2UpgradeHandler : Connection [124], Stream [5,279], Data length [12]
> > o.a.coyote.http2.StreamStateMachine : Connection [124], Stream [5,279], State changed from [HALF_CLOSED_REMOTE] to [CLOSED_TX]
> > org.apache.coyote.http2.Stream : Connection [124], Stream [5,279] has been recycled
> >
> > From the logs it is not clear whether either of these is actually being
> > sent to the client. I will investigate!
> >
> > Note: ", Writing the headers" is added by me. The current log message is
> > "upgradeHandler.writeHeaders=Connection [{0}], Stream [{1}]".
> >
> > > > With more workers it starts failing with:
> > > >
> > > > "status_codes": {
> > > >   "0": 1000,
> > > >   "200": 1
> > > > },
> > > > "errors": [
> > > >   "Get \"https://localhost:18080/testbed/plaintext\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)",
> > > >   "Get \"https://localhost:18080/testbed/plaintext\": context deadline exceeded",
> > > >   "Get \"https://localhost:18080/testbed/plaintext\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
> > > > ]
> > > >
> > > > I.e. out of 1001 requests only one succeeds and the others fail with
> > > > a timeout. I will try to debug this one too, and run a profiler.
> > >
> > > I'd probably start by enabling Tomcat's HTTP/2 debug logging for this
> > > one. I'm wondering if something is triggering a connection level error
> > > early in the processing.
> > >
> > > Mark
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> > > For additional commands, e-mail: users-h...@tomcat.apache.org