On 03/07/2019 15:52, Sanford Liu wrote:
> Hi Mark,
>
> I have updated Tomcat to 9.0.21 (the Docker image tag is
> tomcat:9.0.21-jdk8. Sorry for my word 'official'; it is actually built by
> Docker).
> The Tomcat Native version is 1.2.21. It is built from
> the tomcat-native.tar.gz provided in the Tomcat 9.0.21
> distribution.
>
> I have tested again with the same configuration and the same steps. The
> result is the same as I mentioned above. After a single "keep-alive"
> request was sent, the following "non-keep-alive" request didn't receive
> a response for a long time (the TCP connection was established).
>
> I read the source code of 9.0.21 roughly. It seems that the mechanism has
> not changed. The Poller still has a chance to use a large
> "nextPollerTime" value to poll events.

Agreed. I've applied the fix to 9.0.x, 8.5.x and 7.0.x. It will be in the
next release of each.

Mark

>
>
> Best Regards,
>
> Chang Liu
>
>
>
> Mark Thomas <ma...@apache.org> wrote on Wed, 3 Jul 2019 at 19:39:
>
>> On 03/07/2019 10:59, Sanford Liu wrote:
>>> Hi Team,
>>> My team is facing an unresponsiveness issue in the following
>>> circumstances:
>>>
>>> 1. Env:
>>> ApacheTomcat:8.5.15, JDK: 1.8.0_121
>>
>> That Tomcat version is more than 2 years old.
>>
>>> 2. Tomcat configuration:
>>> enable APR: protocol="org.apache.coyote.http11.Http11AprProtocol"
>>> set maxThreads of the Executor: maxThreads="1200"
>>
>> Which version of Tomcat Native are you using?
>>
>>> 3. This web server was under a massive load.
>>> All requests were HTTP 1.1 requests and were marked with a "Connection:
>>> close" HTTP header.
>>> At this point, the web server showed some latency for the responses, but
>>> it was still running.
>>>
>>> 4. Some "keep-alive" requests came in.
>>> Those requests were marked with a "Connection: keep-alive" HTTP header.
>>>
>>> 5. The subsequent non-keep-alive requests received no response for a long
>>> time.
>>> We took a thread dump with jstack at this time and saw that the Acceptor
>>> thread is WAITING:
>>
>> <snip/>
>>
>>> But the Poller thread is RUNNING:
>>> "http-apr-8080-Poller@5809" daemon prio=5 tid=0x25 nid=NA runnable
>>>   java.lang.Thread.State: RUNNABLE
>>>     at org.apache.tomcat.jni.Poll.poll(Poll.java:-1)
>>
>> <snip/>
>>
>>> All the Executor threads are WAITING like this:
>>> "http-apr-8080-exec-21@6078" daemon prio=5 tid=0x3f nid=NA waiting
>>>   java.lang.Thread.State: WAITING
>>>     at sun.misc.Unsafe.park(Unsafe.java:-1)
>>>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>>
>> <snip/>
>>
>>> We dug into the source code and found some clues:
>>>
>>> 1. The Acceptor thread is parked on the LimitLatch.
>>> It means the LimitLatch is not released by its other holders, which also
>>> means the previous connections do not perform the close operation in our
>>> circumstances.
>>> (AprEndpoint.java#L955)
>>
>> There have been some bugs fixed in this area since 8.5.15.
>>
>>> 2. The close logic takes place in the Poller thread.
>>> (AprEndpoint.java#L1624)
>>>
>>> 3. If the polling logic takes a lot of time, the Poller thread will be
>>> blocked (although it is still running, it could be blocked in the native
>>> method), and the destroySocket method will be suspended.
>>> (AprEndpoint.java#L1680)
>>>
>>> 4. Because the Acceptor processes the new connection directly (it does not
>>> register it with the poller, AprEndpoint.java#L2268), pollerSpace[i]
>>> always equals actualPollerSize in this case (AprEndpoint.java#L1679), and
>>> "nextPollerTime" grows very large. But when some "keep-alive" requests
>>> arrive, the Handler implementation processes those connections and
>>> registers each back to the poller (AbstractProtocol.java#L933), so
>>> pollerSpace changes, and the Poller uses the large "nextPollerTime" value
>>> to poll the socket, so the Poller thread is blocked for a long time.
>>>
>>> To prove that, we set up a similar environment to reproduce this issue:
>>>
>>> 1. Use the official tomcat docker image (tomcat:8.5.15) to run a simple
>>> http server
>>
>> Note: There is no ASF provided official docker image. If 8.5.15 is the
>> latest version provided by Docker I'd strongly recommend that you use a
>> more up to date version of Tomcat.
>>
>>> 2. Change the config:
>>> <Connector
>>>     port="8080"
>>>     protocol="org.apache.coyote.http11.Http11AprProtocol"
>>>     connectionTimeout="20000"
>>>     redirectPort="8443"
>>>     maxConnections="20" /> // makes it easy to reach max connections in testing
>>>
>>> 3.
>>> Keep sending lots of "non-keep-alive" requests for 5 min
>>> $ wrk2 -t8 -c32 -d10m -R128 -s ./closed.lua http://127.0.0.1:8080/hello?latency=10
>>>
>>> 4. Send a single "keep-alive" request and do not close this connection on
>>> the client side
>>>
>>> 5. After that, send another "non-keep-alive" request. We see no
>>> response returned in a reasonable time (we waited 30 sec).
>>>
>>> A workaround:
>>> By setting deferAccept="false" in the connector configuration, we can
>>> force the Acceptor to register the new connection with the
>>> poller (AprEndpoint.java#L2250), and "nextPollerTime" will not grow out of
>>> control.
>>>
>>> So, is that a real issue for Tomcat?
>>
>> It does look like there is an edge case here that isn't handled correctly.
>>
>> The use of multiple pollers stems from a work-around for Windows
>> platforms that are now obsolete. I thought we discussed removing that
>> code. I need to find that discussion and remind myself of the conclusion.
>>
>> Mark
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
>> For additional commands, e-mail: users-h...@tomcat.apache.org
>>
>>
>
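
For anyone following the thread, the timeout growth described in point 4 of the report can be modelled with a few lines of Java. This is only a rough sketch of the behaviour as reported, not Tomcat source code: the class name, the method name and the 2000 microsecond base interval are all assumptions made for illustration.

```java
// Simplified model of the timeout growth described in point 4 of the report.
// Not Tomcat source code: class, method and values are illustrative only.
class PollerTimeoutSketch {

    /**
     * Models a poller loop that skips the poll while the poller is empty and
     * adds the base interval to the next timeout on every skipped pass.
     * Returns the timeout the first real poll would then block with.
     */
    static long timeoutAfterIdlePasses(long pollerTime, int idlePasses) {
        long nextPollerTime = pollerTime;
        for (int i = 0; i < idlePasses; i++) {
            // Poller is empty: no poll happens, but the next timeout grows.
            nextPollerTime += pollerTime;
        }
        return nextPollerTime;
    }

    public static void main(String[] args) {
        long pollerTime = 2000L; // hypothetical base interval, in microseconds
        for (int idlePasses : new int[] {0, 10, 1000, 100000}) {
            System.out.println(idlePasses + " idle passes -> next poll may block for "
                    + timeoutAfterIdlePasses(pollerTime, idlePasses) + " us");
        }
    }
}
```

In this model the timeout grows linearly with the number of passes in which the poller was empty, which matches the report: once a "keep-alive" connection is finally registered, the first real poll can block for the accumulated time. According to the workaround above, deferAccept="false" registers every new connection with the poller, so the empty-poller passes that drive the growth should not accumulate.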