On 03/07/2019 15:52, Sanford Liu wrote:
> Hi Mark,
> 
> I have updated Tomcat to version 9.0.21 (the Docker image tag is
> tomcat:9.0.21-jdk8. Sorry for my word 'official'; it is actually built by
> Docker).
> The Tomcat Native version is 1.2.21. It is built from
> the tomcat-native.tar.gz that is provided in the Tomcat 9.0.21
> distribution.
> 
> I have tested again with the same configuration and the same steps. The
> result is the same as I reported earlier. After a single "keep-alive"
> request was sent, the following "non-keep-alive" request did not receive
> a response for a long time (the TCP connection was established).
> 
> I read through the 9.0.21 source code briefly. The mechanism does not seem
> to have changed: the Poller can still end up polling with a very large
> "nextPollerTime" value.

Agreed. I've applied a fix to 9.0.x, 8.5.x and 7.0.x that will be in
the next release of each.
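
As a rough illustration of the edge case described in this thread, here is
a minimal Java sketch. It is a simplified model and not Tomcat's actual
Poller code: the class and method names are hypothetical, the base value of
2000 and the iteration counts are arbitrary, and the assumption that the
timeout grows by one base interval per skipped (empty) poll set is a
simplification of the reported behaviour. It shows only that the value
accumulated while the poll set is empty is what the next real poll would
use.

```java
// Simplified model of the timeout growth described in this thread.
// Not Tomcat code: names, values and the growth step are assumptions.
class PollerTimeoutSketch {

    // Returns the timeout that the first non-empty poll would use after
    // the poll set has been skipped as empty 'idleIterations' times.
    static long timeoutAfterIdleIterations(long pollerTime, int idleIterations) {
        long nextPollerTime = pollerTime;
        for (int i = 0; i < idleIterations; i++) {
            // Empty poll set: no poll is made, the timeout just accumulates.
            nextPollerTime += pollerTime;
        }
        // A socket has now been registered, so this value is what the
        // blocking poll call would be given.
        return nextPollerTime;
    }

    public static void main(String[] args) {
        long pollerTime = 2000; // hypothetical base interval
        for (int idle : new int[] {0, 10, 1000, 100000}) {
            System.out.println(idle + " skipped polls -> timeout "
                    + timeoutAfterIdleIterations(pollerTime, idle));
        }
    }
}
```

In this model the timeout stays at the base interval only while the idle
count stays at zero, which matches the reported effect of the
deferAccept="false" workaround.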

Mark


> 
> 
> Best Regards,
> 
> Chang Liu
> 
> 
> 
> Mark Thomas <ma...@apache.org> wrote on Wed, Jul 3, 2019 at 7:39 PM:
> 
>> On 03/07/2019 10:59, Sanford Liu wrote:
>>> Hi Team,
>>> My team is facing an unresponsive-server issue under the following circumstances:
>>>
>>> 1. Env:
>>>  Apache Tomcat: 8.5.15, JDK: 1.8.0_121
>>
>> That Tomcat version is more than 2 years old.
>>
>>> 2. Tomcat configuration:
>>>  enable APR: protocol="org.apache.coyote.http11.Http11AprProtocol"
>>>  set maxThreads of the Executor: maxThreads="1200"
>>
>> Which version of Tomcat Native are you using?
>>
>>> 3. This web server was under a massive load.
>>>  All requests were HTTP 1.1 requests and were marked with a "Connection:
>>> close" HTTP header.
>>>  At this point, the web server showed some latency in its responses, but
>>> it was still running.
>>>
>>> 4. Then some "keep-alive" requests arrived.
>>>  Those requests were marked with a "Connection: keep-alive" HTTP header.
>>>
>>> 5. The subsequent non-keep-alive requests received no response for a long
>>> time.
>>>  We took a thread dump with jstack at this point and saw that the Acceptor
>>> thread was WAITING:
>>
>> <snip/>
>>
>>>  But the Poller thread was RUNNABLE:
>>>  "http-apr-8080-Poller@5809" daemon prio=5 tid=0x25 nid=NA runnable
>>>   java.lang.Thread.State: RUNNABLE
>>>  at org.apache.tomcat.jni.Poll.poll(Poll.java:-1)
>>
>> <snip/>
>>
>>>  All the Executor threads are WAITING like this:
>>>  "http-apr-8080-exec-21@6078" daemon prio=5 tid=0x3f nid=NA waiting
>>>   java.lang.Thread.State: WAITING
>>>  at sun.misc.Unsafe.park(Unsafe.java:-1)
>>>  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>>
>> <snip/>
>>
>>> We dug into the source code and found some clues:
>>>
>>> 1. The Acceptor thread is parked on the LimitLatch. This means the
>>> LimitLatch has not been released by the connections holding it, which in
>>> turn means the previous connections have not been closed in our
>>> circumstances. (AprEndpoint.java#L955)
>>
>> There have been some bugs fixed in this area since 8.5.15.
>>
>>> 2. The close logic takes place in the Poller thread.
>>> (AprEndpoint.java#L1624)
>>>
>>> 3. If the poll call takes a long time, the Poller thread is effectively
>>> blocked (although it is reported as RUNNABLE, it is waiting inside the
>>> native method), and the destroySocket call is delayed.
>>> (AprEndpoint.java#L1680)
>>>
>>> 4. The Acceptor processes new connections directly rather than registering
>>> them with the poller (AprEndpoint.java#L2268), so pollerSpace[i] always
>>> equals actualPollerSize in this case (AprEndpoint.java#L1679) and
>>> "nextPollerTime" keeps growing. When some "keep-alive" requests then
>>> arrive, the Handler implementation processes those connections and
>>> registers each one back with the poller (AbstractProtocol.java#L933).
>>> This changes pollerSpace, so the Poller polls the socket with the large
>>> "nextPollerTime" value and the Poller thread blocks for a long time.
>>>
>>> To prove that, we set up a similar environment to reproduce this issue:
>>>
>>> 1. Use the official Tomcat Docker image (tomcat:8.5.15) to run a simple
>>> HTTP server
>>
>> Note: There is no ASF provided official docker image. If 8.5.15 is the
>> latest version provided by Docker I'd strongly recommend that you use a
>> more up to date version of Tomcat.
>>
>>> 2. Change the config:
>>>     <Connector
>>>     port="8080"
>>>     protocol="org.apache.coyote.http11.Http11AprProtocol"
>>>     connectionTimeout="20000"
>>>     redirectPort="8443"
>>>     maxConnections="20" /> <!-- makes it easy to reach max connections when testing -->
>>>
>>> 3. Keep sending lots of "non-keep-alive" requests for 5 min
>>>  $ wrk2 -t8 -c32 -d10m -R128 -s ./closed.lua http://127.0.0.1:8080/hello?latency=10
>>>
>>> 4. Send a single "keep-alive" request and do not close this connection on
>>> the client side
>>>
>>> 5. After that, send another "non-keep-alive" request. No response is
>>> returned within a reasonable time (we waited 30 sec).
>>>
>>> A workaround:
>>> By setting deferAccept="false" in the connector configuration, we can
>>> force the Acceptor to register new connections with the poller
>>> (AprEndpoint.java#L2250), so "nextPollerTime" no longer grows out of
>>> control.
>>>
>>> So, is that a real issue for Tomcat?
>>
>> It does look like there is an edge case here that isn't handled correctly.
>>
>> The use of multiple pollers stems from a work-around for Windows
>> platforms that are now obsolete. I thought we discussed removing that
>> code. I need to find that discussion and remind myself of the conclusion.
>>
>> Mark
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
>> For additional commands, e-mail: users-h...@tomcat.apache.org
>>
>>
> 


