After some more digging I've been able to further narrow down the problem
somewhat, but still not able to pin point the exact cause;
The issue is not load related, but rather seems to be related to the timing
of the TCP connection being closed.
Depending on the timing the poller and exec appear to get into a loop -
drilled thsi down to an error status returned in AprEndpoint - which might
need to be thrown as an exception rather than return 0 from
the fillReadBuffer in AprEndpoint.
Poller thread - AprEndpoint:1573 - Poller(ID 55) wakes ups
Poller thread - AprEndpoint:1652 - Poller adds socket to timeout
Poller thread - AprEndpoint:1675 - Poller gets rv = 1
Poller thread - AprEndpoint:1694 - Poller gets connection (socket
Id 565911936 )
Poller thread - AprEndpoint:1731 - Poller processesSocket as socket event
OPEN_READ
Poller thread - AbstractEndpoint:903 - executor executes
AprEndpoint$SocketProcessor (id 63) -> No exception thrown.
Exec thread - AprEndpoint:2403 - Socket.recvb result = -20014
Interesting comment ? at AprEndpoint:2445 :
} else if (-result == Status.APR_EGENERAL && isSecure()) {
//Status.APR_EGENERAL=20014
// Not entirely sure why this is necessary. Testing to date
has not
// identified any issues with this but log it so it can be
tracked
// if it is suspected of causing issues in the future.
if (log.isDebugEnabled()) {
log.debug(sm.getString("socket.apr.read.sslGeneralError", getSocket(),
this));
}
return 0;
Does this need to throw an exception to get caught higher up as an error?
Exec thread - HJttp11InputBuffer:708 - Reads from buffer, returns nRead=0
Exec thread - AbstractProtocol:789 - process reuqest ; state returns OPEN
Exec thread - AbstractProtocol:2678 - Exec adds socket (id 565911936)
Exec thread - AprEndpoint:1435 - Calls notify() on AprEndpoint$Poller (id
55)
It seems to just bounce between those two operations, the poller adding the
executor for OPEN_READ, but the execute just does a add of the socket for
APR_POLLIN, the poller then adds it back as OPEN_READ again, and exec adds
back to poller for APR_POLLIN.
desc in AprEndpoint contains;
[48, 565911936, 1, 565903728, 0, 0, 0, 0, 0, ....]
desc[0] looks to indicate Poll.APR_POLLHUP
wrapper.pollerFlags = 1
On Wed, Jan 4, 2017 at 5:54 PM, David Oswell <[email protected]> wrote:
> Ok the issue appears to be related to the state the connection handler
> thinks it is in, after the connection is disconnected the state returned
> by org.apache.coyote.http11.Http11Processor.process remains as OPEN,
> which causes the underlying poll thread to get notified and repeats.
>
> Breakpointing on all of the CLOSED states does not get triggered, unless
> at low requests per second, on my server at around 200 requests/second on
> one connection the disconnect doesn't get triggered, but at 1-2 it does.
> Because it doesn't get flagged as disconnect it just constantly gets
> processed and always shows a state of OPEN_READ.
>
> As a side note, this only seems to occur on the servers (have 2 where the
> same behaviour occurs); running it on my local machine the issue doesn't
> appear. So not sure if somehow related to Win2012R2 &APR interaction??? -
> Yes I know the obvious fix is to not use Win2012 :)
>
>
>
> On Wed, Jan 4, 2017 at 12:29 PM, David Oswell <[email protected]> wrote:
>
>> Hi Mark,
>>
>> Thanks for the response - So far I've narrowed down to this;
>> 0 - ntoskrnl.exe!KeSynchronizeExecution+0x2246
>> 1 - ntoskrnl.exe!KeWaitForMultipleObjects+0x135e
>> 2 - ntoskrnl.exe!KeWaitForMultipleObjects+0xdd9
>> 3 - ntoskrnl.exe!KeWaitForSingleObject+0x373
>> 4 - ntoskrnl.exe!KeStallWhileFrozen+0x1977
>> 5 - ntoskrnl.exe!_misaligned_access+0x13f9
>> 6 - ntoskrnl.exe!KeSynchronizeExecution+0x4133
>> 7 - ntdll.dll!RtlCopyMemory+0x3
>> 8 - WS2_32.dll!WSAPoll+0xb9
>> 9 - tcnative-1.dll!Java_org_apache_tomcat_jni_Proc_detach+0xad10
>> 10 - tcnative-1.dll!Java_org_apache_tomcat_jni_Poll_poll+0xac
>>
>> From what I can see there is one thread which is the main cpu user when
>> idle, it cycles through the above and;
>> jvm.dll!JVM_FindSignal+0x2cba
>> jvm.dll!JVM_ResolveClass+0x371c7
>> jvm.dll!JVM_ResolveClass+0x571d7
>> jvm.dll!JVM_MonitorWait+0xfe
>> jvm.dll!JVM_FindSignal+0x73f
>>
>> Which looks to correspond to https-openssl-apr-8443-poller thread, which
>> I see cycling through RUNNABLE/TIMED_WAITING states (below are from when
>> there are zero requests occuring);
>> Name: https-openssl-apr-8443-Poller
>> State: RUNNABLE
>> Total blocked: 11,222,146 Total waited: 11,222,507
>>
>> Stack trace:
>> org.apache.tomcat.jni.Poll.poll(Native Method)
>> org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpo
>> int.java:1675)
>> java.lang.Thread.run(Thread.java:745)
>>
>> -------------
>>
>> Name: https-openssl-apr-8443-Poller
>> State: RUNNABLE
>> Total blocked: 11,624,292 Total waited: 11,624,681
>>
>> Stack trace:
>> org.apache.tomcat.jni.Poll.add(Native Method)
>> org.apache.tomcat.util.net.AprEndpoint$Poller.addToPoller(
>> AprEndpoint.java:1449)
>> org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpo
>> int.java:1649)
>> java.lang.Thread.run(Thread.java:745)
>>
>> ------------------
>>
>> Name: https-openssl-apr-8443-Poller
>> State: TIMED_WAITING on org.apache.tomcat.util.net.Apr
>> Endpoint$Poller@4f5e5841
>> Total blocked: 12,441,180 Total waited: 12,441,614
>>
>> Stack trace:
>> java.lang.Object.wait(Native Method)
>> org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoin
>> t.java:1573)
>> java.lang.Thread.run(Thread.java:745)
>>
>> ---------------------
>>
>> Name: https-openssl-apr-8443-Poller
>> State: RUNNABLE
>> Total blocked: 13,278,363 Total waited: 13,278,853
>>
>> Stack trace:
>> sun.misc.Unsafe.unpark(Native Method)
>> java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:141)
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.unpark
>> Successor(AbstractQueuedSynchronizer.java:662)
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.releas
>> e(AbstractQueuedSynchronizer.java:1264)
>> java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLoc
>> k.java:457)
>> java.util.concurrent.LinkedBlockingQueue.signalNotEmpty(Link
>> edBlockingQueue.java:176)
>> java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockin
>> gQueue.java:430)
>> org.apache.tomcat.util.threads.TaskQueue.offer(TaskQueue.java:74)
>> org.apache.tomcat.util.threads.TaskQueue.offer(TaskQueue.java:31)
>> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolEx
>> ecutor.java:1361)
>> org.apache.tomcat.util.threads.ThreadPoolExecutor.execute(
>> ThreadPoolExecutor.java:163)
>> org.apache.tomcat.util.threads.ThreadPoolExecutor.execute(
>> ThreadPoolExecutor.java:141)
>> org.apache.tomcat.util.net.AbstractEndpoint.processSocket(Ab
>> stractEndpoint.java:903)
>> org.apache.tomcat.util.net.AprEndpoint.processSocket(AprEndp
>> oint.java:866)
>> org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoin
>> t.java:1731)
>> java.lang.Thread.run(Thread.java:745)
>>
>>
>> If I look at the 8080 poller thread (after load), it's waiting
>> appropriately.
>> I'll try and get the remote debugger connected now and see what time is
>> being passed for waiting.
>>
>>
>>
>> On Wed, Jan 4, 2017 at 11:30 AM, Mark Thomas <[email protected]> wrote:
>>
>>> On 04/01/2017 16:02, David Oswell wrote:
>>> > Hello all,
>>> >
>>> > wondering if anyone has seen similar behaviour - sorry if already
>>> reported,
>>> > googling hasn't found much other than a somewhat similar older issue
>>> on APR
>>> > going to 100% CPU when idle due to a poll loop/timeout issue - however
>>> this
>>> > seems a bit different as this is only occurring with SSL;
>>> >
>>> > Running (ESX Based):
>>> > Server version: Apache Tomcat/8.5.9
>>> > Server built: Dec 5 2016 20:18:12 UTC
>>> > Server number: 8.5.9.0
>>> > OS Name: Windows Server 2012 R2
>>> > OS Version: 6.3
>>> > Architecture: amd64
>>> > JVM Version: 1.8.0_112-b15
>>> > JVM Vendor: Oracle Corporation
>>> > Apache Tomcat Native library 1.2.10 using APR version 1.5.2.
>>> >
>>> > APR capabilities: IPv6 [true], sendfile [true], accept filters
>>> [false],
>>> > random [true].
>>> > APR/OpenSSL configuration: useAprConnector [false], useOpenSSL [true]
>>> > OpenSSL successfully initialized (OpenSSL 1.0.2j 26 Sep 2016)
>>> >
>>> > Server setup as 4 vCPU / 8GB mem
>>> >
>>> > Problem:
>>> > After running load on the server (this is a QA system, so just load
>>> tests)
>>> > Tomcat goes to 70-100% CPU usage when it should be idle.
>>> > This only occurs when putting load on the SSL connectors w/APR. When
>>> > re-applying load (to SSL connector), the CPU usage drops to appropriate
>>> > levels for the load, but then returns to the higher idle CPU usage.
>>> > If applying load back to the non-SSL connector, CPU usage stays high
>>> > (higher than after a restart and sending to non-SSL connector first).
>>> >
>>> > Performing the same load tests on non-SSL (still with APR), or non-APR
>>> > connectors CPU returns to idle after the load run is completed.
>>> >
>>> > The load applied is 8 persistent connections, which send basic POST
>>> > requests, the service does some basic operations and responds,
>>> generally -
>>> > if not maxed out - in 1-2ms.
>>> > Even when using 1 connection, the CPU usage stays constant when the
>>> > requests stop.
>>> > Again this only occurs when using the SSL/APR connector, if using a
>>> non-APR
>>> > connector, or the APR-non-SSL connector it's fine.
>>> >
>>> > It seems to be a problem with just how APR-SSL works or possibly
>>> OpenSSL ?
>>>
>>> An issue with the Tomcat code is more likely than with the OpenSSL code.
>>>
>>> It sounds like one or more threads is entering a fairly tight loop. Use
>>> a combination of top to find out which thread(s) is(are) using the CPU
>>> and then a thread dump to find where in the code it is happening.
>>>
>>> Then add a breakpoint at that point in the code, run your test and once
>>> the high CPU starts, attach your IDE and step through the code to see
>>> where/how the loop is happening.
>>>
>>> Instructions for running Tomcat under a debugger are on the Wiki.
>>>
>>> The more of the above you are able to complete, the simpler it is going
>>> to be to get this fixed.
>>>
>>> Mark
>>>
>>>
>>> > Not sure how to further isolate to see which one is the problem.
>>> >
>>> >
>>> > SSL Connector setup:
>>> > <Connector port="8443"
>>> > protocol="org.apache.coyote.http11.Http11AprProtocol"
>>> > maxThreads="150"
>>> > SSLEnabled="true"
>>> > maxKeepAliveRequests="-1">
>>> > <SSLHostConfig>
>>> > <Certificate
>>> > certificateFile="conf/localhost-rsa-cert.pem"
>>> > certificateKeyFile="conf/loca
>>> lhost-rsa-key.pem"
>>> > certificateKeyPassword="password"
>>> > type="RSA" />
>>> > </SSLHostConfig>
>>> > </Connector>
>>> >
>>> > Non-SSL Connector:
>>> > <Connector port="8080"
>>> > protocol="org.apache.coyote.http11.Http11AprProtocol"
>>> > connectionTimeout="20000"
>>> > maxKeepAliveRequests="-1"
>>> > redirectPort="8443" >
>>> > </Connector>
>>> >
>>> > Initializing ProtocolHandler ["http-apr-8080"]
>>> > Initializing ProtocolHandler ["https-openssl-apr-8443"]
>>> >
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>>
>>
>