On 03/07/2019 10:59, Sanford Liu wrote:
> Hi Team,
> My team is facing an unresponsiveness issue under the following circumstances:
>
> 1. Env:
> Apache Tomcat: 8.5.15, JDK: 1.8.0_121

That Tomcat version is more than 2 years old.

> 2. Tomcat configuration:
> enable APR: protocol="org.apache.coyote.http11.Http11AprProtocol"
> set maxThreads of the Executor: maxThreads="1200"

Which version of Tomcat Native are you using?

> 3. This web server was under a massive load.
> All requests were HTTP/1.1 requests and were marked with a
> "Connection: close" HTTP header.
> At this point, the web server showed some latency in its responses, but
> it was still running.
>
> 4. Some "keep-alive" requests arrived.
> Those requests were marked with a "Connection: keep-alive" HTTP header.
>
> 5. The subsequent non-keep-alive requests went unanswered for a long time.
> We ran a thread dump with jstack at this point and saw that the Acceptor
> thread was WAITING:

<snip/>

> But the Poller thread was RUNNABLE:
> "http-apr-8080-Poller@5809" daemon prio=5 tid=0x25 nid=NA runnable
> java.lang.Thread.State: RUNNABLE
> at org.apache.tomcat.jni.Poll.poll(Poll.java:-1)

<snip/>

> All the Executor threads were WAITING like this:
> "http-apr-8080-exec-21@6078" daemon prio=5 tid=0x3f nid=NA waiting
> java.lang.Thread.State: WAITING
> at sun.misc.Unsafe.park(Unsafe.java:-1)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

<snip/>

> We dug into the source code and found some clues:
>
> 1. The Acceptor thread is parked on the LimitLatch. This means the
> LimitLatch is not being released by existing connections, which in turn
> means the previous connections are not being closed in our scenario.
> (AprEndpoint.java#L955)

There have been some bugs fixed in this area since 8.5.15.

> 2. The close logic takes place in the Poller thread.
> (AprEndpoint.java#L1624)
>
> 3. If polling takes a long time, the Poller thread is blocked (it still
> shows as running, but it can be blocked inside the native method), and
> the destroySocket method cannot run until the poll returns.
> (AprEndpoint.java#L1680)
>
> 4. Because the Acceptor processes the new connection directly (it does
> not register it with the poller, AprEndpoint.java#L2268), pollerSpace[i]
> always equals actualPollerSize in this case (AprEndpoint.java#L1679), so
> "nextPollerTime" grows very large. When some "keep-alive" requests
> arrive, the Handler implementation processes those connections and
> registers each one back with the poller (AbstractProtocol.java#L933).
> This changes pollerSpace, so the Poller polls the socket using the very
> large "nextPollerTime" value and the Poller thread is blocked for a long
> time.
>
> To prove this, we set up a similar environment to reproduce the issue:
>
> 1. Use the official Tomcat Docker image (tomcat:8.5.15) to run a simple
> HTTP server.

Note: There is no ASF-provided official Docker image. If 8.5.15 is the latest version provided by Docker, I'd strongly recommend that you use a more up-to-date version of Tomcat.

> 2. Change the config:
> <Connector
>     port="8080"
>     protocol="org.apache.coyote.http11.Http11AprProtocol"
>     connectionTimeout="20000"
>     redirectPort="8443"
>     maxConnections="20" /> <!-- low limit so testing easily reaches max connections -->
>
> 3. Keep sending lots of "non-keep-alive" requests for 5 min:
> $ wrk2 -t8 -c32 -d10m -R128 -s ./closed.lua http://127.0.0.1:8080/hello?latency=10
>
> 4. Send a single "keep-alive" request and do not close the connection on
> the client side.
>
> 5. After that, send another "non-keep-alive" request. No response is
> returned in a reasonable time (we waited 30 seconds).
>
> A workaround:
> By setting deferAccept="false" in the connector configuration, we can
> force the Acceptor to register the new connection with the poller
> (AprEndpoint.java#L2250), and "nextPollerTime" no longer grows out of
> control.
>
> So, is this a real issue in Tomcat?

It does look like there is an edge case here that isn't handled correctly.

The use of multiple pollers stems from a work-around for Windows platforms that are now obsolete.
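The jstack observations in step 5 of the report (Acceptor and Executor threads WAITING, Poller RUNNABLE inside Poll.poll) can also be checked from inside the JVM with the standard java.lang.management API. What follows is a hypothetical helper, not part of Tomcat, and ThreadStateCheck and threadsInState are illustrative names. It only sees threads in the JVM it runs in, so to inspect Tomcat it would have to be invoked inside the server process with a prefix such as http-apr-8080-exec-. The main method simply parks a demo thread and then finds it by name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.LockSupport;

// Hypothetical diagnostic helper: lists live threads in this JVM whose names
// start with a prefix and which are currently in the given state.
class ThreadStateCheck {
    static List<String> threadsInState(String namePrefix, Thread.State state) {
        List<String> names = new ArrayList<>();
        // dumpAllThreads(lockedMonitors, lockedSynchronizers): no lock details needed here
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info != null
                    && info.getThreadName().startsWith(namePrefix)
                    && info.getThreadState() == state) {
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        // Demo only: park a daemon thread, then look it up by name prefix.
        Thread parked = new Thread(() -> {
            while (true) {
                LockSupport.park(); // loop guards against spurious wake-ups
            }
        }, "demo-exec-1");
        parked.setDaemon(true);
        parked.start();
        while (parked.getState() != Thread.State.WAITING) {
            Thread.sleep(10);
        }
        System.out.println(threadsInState("demo-exec-", Thread.State.WAITING));
    }
}
```

Inside Tomcat, threadsInState("http-apr-8080-exec-", Thread.State.WAITING) would list the parked Executor threads. Note that a thread parked with parkNanos is normally reported as TIMED_WAITING, so checking both states may be necessary.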
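Clue 1 in the report (the Acceptor parked on the LimitLatch) comes down to a counting limit that is only released when a connection is closed. The model below uses java.util.concurrent.Semaphore as an illustrative stand-in. It is not Tomcat's LimitLatch, and ConnectionLimitModel and its methods are hypothetical names. With maxConnections="20", as in the reproduction config, the 21st accept cannot proceed until some connection is closed. In the reported scenario, that close has to happen on the blocked Poller thread.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a maxConnections limit; not Tomcat's LimitLatch.
class ConnectionLimitModel {
    private final Semaphore permits;

    ConnectionLimitModel(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    // Called by the "acceptor" before accepting a socket. Waits up to the given
    // time for a free slot (the real Acceptor parks until a slot is released).
    boolean tryAccept(long timeout, TimeUnit unit) {
        try {
            return permits.tryAcquire(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    // Called when a connection is closed. In the reported scenario this runs on
    // the Poller thread, so while the Poller is stuck the slot is never freed.
    void onClose() {
        permits.release();
    }

    public static void main(String[] args) {
        ConnectionLimitModel limit = new ConnectionLimitModel(20); // maxConnections="20"
        for (int i = 0; i < 20; i++) {
            limit.tryAccept(0, TimeUnit.MILLISECONDS);
        }
        // No connection has been closed yet, so the next accept times out.
        System.out.println("21st accept: " + limit.tryAccept(100, TimeUnit.MILLISECONDS));
        limit.onClose();
        System.out.println("after one close: " + limit.tryAccept(100, TimeUnit.MILLISECONDS));
    }
}
```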
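The timeout arithmetic in clue 4 of the report can be illustrated with a small simulation. This is a hypothetical model, not Tomcat's AprEndpoint code. The class and method names are illustrative, and the 2000-microsecond pollTime is an assumed value. The model follows the behaviour the report describes: every pass that skips an empty poll set adds pollTime to nextPollerTime, and the accumulated value is used by the next real poll. Resetting nextPollerTime to pollTime after a real poll is an additional assumption of the model.

```java
// Hypothetical model of the poll-timeout bookkeeping described in clue 4.
// Not Tomcat code: names and the pollTime value are illustrative assumptions.
class PollerTimeoutModel {
    private final long pollTime;   // base poll interval in microseconds (assumed)
    private long nextPollerTime;   // accumulated timeout for the next real poll

    PollerTimeoutModel(long pollTime) {
        this.pollTime = pollTime;
        this.nextPollerTime = pollTime;
    }

    // One pass of the poller loop over a single poll set. Returns the timeout a
    // real poll would block for, or 0 if the set is empty and the poll is skipped.
    long iterate(boolean pollSetEmpty) {
        if (!pollSetEmpty) {
            long timeout = nextPollerTime;
            nextPollerTime = pollTime; // assumed: reset after a real poll
            return timeout;
        }
        // Skipped poll: carry the unspent wait time over to the next real poll.
        nextPollerTime += pollTime;
        return 0;
    }

    public static void main(String[] args) {
        PollerTimeoutModel model = new PollerTimeoutModel(2_000); // 2 ms, assumed
        // Connections handled directly by the Acceptor never join a poll set,
        // so each (fast) pass of the loop sees an empty set and skips polling.
        for (int i = 0; i < 150_000; i++) {
            model.iterate(true);
        }
        // A keep-alive connection is now registered; the next real poll
        // blocks for the whole accumulated timeout.
        long timeoutMicros = model.iterate(false);
        System.out.println("timeout used: " + timeoutMicros / 1_000_000 + " s");
    }
}
```

If every pass is fed a non-empty poll set, the timeout stays at pollTime. That is roughly what the deferAccept="false" workaround achieves by registering each new connection with the poller.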
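Steps 4 and 5 of the reproduction can also be driven from plain Java. This is an illustrative sketch, not the reporter's client. It writes raw HTTP/1.1 requests over java.net.Socket, and ReproClient and its helpers are hypothetical names. The host, port, /hello?latency=10 path and 30-second wait come from the report. The sketch assumes the wrk2 load from step 3 is already running against the connector.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

// Hypothetical client for steps 4 and 5 of the reproduction.
class ReproClient {
    static final String HOST = "127.0.0.1";
    static final int PORT = 8080;
    static final String PATH = "/hello?latency=10";

    // Builds a minimal HTTP/1.1 GET request with the given Connection header.
    static String buildRequest(String path, boolean keepAlive) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + HOST + ":" + PORT + "\r\n"
                + "Connection: " + (keepAlive ? "keep-alive" : "close") + "\r\n"
                + "\r\n";
    }

    static void send(Socket socket, String request) throws IOException {
        OutputStream out = socket.getOutputStream();
        out.write(request.getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Step 4: one keep-alive request; the socket is deliberately left open.
        Socket keepAlive = new Socket(HOST, PORT);
        send(keepAlive, buildRequest(PATH, true));

        // Step 5: a "Connection: close" request, waiting at most 30 s for a reply.
        try (Socket close = new Socket(HOST, PORT)) {
            close.setSoTimeout(30_000);
            send(close, buildRequest(PATH, false));
            InputStream in = close.getInputStream();
            long start = System.nanoTime();
            int first = in.read(); // blocks until the first response byte arrives
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(first == -1 ? "connection closed without response"
                    : "first response byte after " + millis + " ms");
        } catch (SocketTimeoutException e) {
            System.out.println("no response within 30 s");
        } finally {
            keepAlive.close();
        }
    }
}
```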
I thought we discussed removing the multiple-poller code. I need to find that discussion and remind myself of the conclusion.

Mark