On 03/07/2019 10:59, Sanford Liu wrote:
> Hi Team,
> My team is facing an unresponsiveness issue under the following circumstances:
> 
> 1. Env:
>  ApacheTomcat:8.5.15, JDK: 1.8.0_121

That Tomcat version is more than 2 years old.

> 2. Tomcat configuration:
>  enable APR: protocol="org.apache.coyote.http11.Http11AprProtocol"
>  set maxThreads of the Executor: maxThreads="1200"

Which version of Tomcat Native are you using?

> 3. This web server was under a massive load.
>  All requests were HTTP 1.1 requests and were marked with a "Connection:
> close" HTTP header.
>  At this point, the web server showed some response latency, but it was
> still running.
> 
> 4. Then some "keep-alive" requests arrived.
>  Those requests were marked with a "Connection: keep-alive" HTTP header.
> 
> 5. The subsequent non-keep-alive requests got no response for a long time.
>  We took a thread dump with jstack at this point and saw that the Acceptor
> thread is WAITING:

<snip/>

>  But the Poller thread is RUNNABLE:
>  "http-apr-8080-Poller@5809" daemon prio=5 tid=0x25 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>  at org.apache.tomcat.jni.Poll.poll(Poll.java:-1)

<snip/>

>  All the Executor threads are WAITING like this:
>  "http-apr-8080-exec-21@6078" daemon prio=5 tid=0x3f nid=NA waiting
>   java.lang.Thread.State: WAITING
>  at sun.misc.Unsafe.park(Unsafe.java:-1)
>  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

<snip/>

> We dug into the source code and found some clues:
> 
> 1. The Acceptor thread is parked on the LimitLatch. This means the LimitLatch
> has not been released by the connections that hold it, which in turn means
> the previous connections were never closed in our circumstances.
> (AprEndpoint.java#L955)

There have been some bugs fixed in this area since 8.5.15.
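
For readers following along, here is a minimal sketch of the behaviour
described in point 1. It is a simplified model built on
java.util.concurrent.Semaphore, not Tomcat's LimitLatch, and the class and
method names (ConnectionLimitSketch, awaitSlot, connectionClosed) are
hypothetical. It only shows why an acceptor stays parked when closed
connections never give their slot back.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Simplified model of a connection limit; NOT Tomcat's LimitLatch. */
public class ConnectionLimitSketch {
    private final Semaphore slots;

    public ConnectionLimitSketch(int maxConnections) {
        this.slots = new Semaphore(maxConnections);
    }

    /**
     * Called by the acceptor before it accepts a connection.
     * Returns false if no slot became free within the timeout.
     */
    public boolean awaitSlot(long timeoutMillis) {
        try {
            return slots.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Called when a connection is closed; frees one slot. */
    public void connectionClosed() {
        slots.release();
    }

    public static void main(String[] args) {
        ConnectionLimitSketch limit = new ConnectionLimitSketch(2);
        System.out.println(limit.awaitSlot(50)); // true, first slot taken
        System.out.println(limit.awaitSlot(50)); // true, second slot taken
        // No connection has been closed, so the acceptor cannot proceed.
        System.out.println(limit.awaitSlot(50)); // false
        limit.connectionClosed();
        System.out.println(limit.awaitSlot(50)); // true again after a close
    }
}
```

In this model, if whatever is responsible for calling connectionClosed() is
stalled, every later awaitSlot() call waits, which matches the WAITING
Acceptor in the thread dump.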

> 2. The close logic takes place in the Poller thread.
> (AprEndpoint.java#L1624)
> 
> 3. If the polling logic takes a lot of time, the Poller thread will be
> blocked (although it is reported as running, it can be blocked inside the
> native method), and the destroySocket method will be delayed.
> (AprEndpoint.java#L1680)
> 
> 4. The Acceptor processes new connections directly (it does not register
> them with the poller, AprEndpoint.java#L2268), so pollerSpace[i] always
> equals actualPollerSize in this case (AprEndpoint.java#L1679) and
> "nextPollerTime" keeps growing until it is very large. When some "keep-alive"
> requests arrive, the Handler implementation processes those connections
> and registers each one back with the poller (AbstractProtocol.java#L933), so
> pollerSpace changes and the Poller uses the large "nextPollerTime" value
> to poll the socket. The Poller thread is therefore blocked for a long
> time.
> 
> To prove that, we set up a similar environment to reproduce this issue:
> 
> 1. Use the official Tomcat Docker image (tomcat:8.5.15) to run a simple HTTP
> server

Note: There is no ASF provided official docker image. If 8.5.15 is the
latest version provided by Docker I'd strongly recommend that you use a
more up to date version of Tomcat.

> 2. Change the config:
>     <Connector
>     port="8080"
>     protocol="org.apache.coyote.http11.Http11AprProtocol"
>     connectionTimeout="20000"
>     redirectPort="8443"
>     maxConnections="20" /> // low value so max connections is easy to reach in testing
> 
> 3. Keep sending lots of "non-keep-alive" requests in 5 min
>  $ wrk2 -t8 -c32 -d10m -R128 -s ./closed.lua
> http://127.0.0.1:8080/hello?latency=10
> 
> 4. Send a single "keep-alive" request and do not close this connection on
> client side
> 
> 5. After that, send another "non-keep-alive" request. No response is
> returned in a reasonable time (we waited 30 sec).
> 
> A workaround:
> By setting deferAccept="false" in the connector configuration, we can force
> the Acceptor to register new connections with the
> poller (AprEndpoint.java#L2250), so "nextPollerTime" does not grow unchecked.
> 
> So, is that a real issue for Tomcat?

It does look like there is an edge case here that isn't handled correctly.
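
To make the arithmetic behind point 4 concrete, here is a rough model of the
timeout growth as it is described in the quoted analysis. It is a sketch
under assumptions, not the AprEndpoint code: the class and method names are
hypothetical, and the 2000 microsecond base value is chosen only for
illustration.

```java
/** Rough model of the timeout growth described in point 4; NOT AprEndpoint code. */
public class PollerTimeoutSketch {

    /**
     * Returns the timeout (in microseconds) that the next real poll would use
     * after the given number of consecutive skipped polls, assuming each skip
     * adds one base interval to the pending timeout.
     */
    static long timeoutAfterSkips(long pollerTimeMicros, long skippedPolls) {
        long nextPollerTime = pollerTimeMicros;
        for (long i = 0; i < skippedPolls; i++) {
            // Poll set is empty: nothing to poll, so the wait is carried over.
            nextPollerTime += pollerTimeMicros;
        }
        return nextPollerTime;
    }

    public static void main(String[] args) {
        long pollerTime = 2000; // microseconds, assumed for illustration
        // No skips: the poll uses the base timeout.
        System.out.println(timeoutAfterSkips(pollerTime, 0));      // 2000
        // Many skips while only non-keep-alive traffic is served.
        System.out.println(timeoutAfterSkips(pollerTime, 15_000)); // 30002000
    }
}
```

Under these assumptions, 15,000 skipped iterations are enough for the next
real poll to be issued with a timeout of about 30 seconds, and that poll
would start as soon as a keep-alive connection is added to the poller.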

The use of multiple pollers stems from a work-around for Windows
platforms that are now obsolete. I thought we discussed removing that
code. I need to find that discussion and remind myself of the conclusion.

Mark
