Hello Experts,
Several of my production servers were recently upgraded from Tomcat 9.0.14 to
9.0.21; immediately after the upgrade the servers started accumulating memory
and open files (on Linux) at a steady rate that was not observed before.
After a couple of days (without reaching the memory or open-files limit and
without throwing "OutOfMemoryError: Java heap space" or "IOException: Too many
open files") the servers became unresponsive: every HTTPS request timed out while
HTTP requests continued to work correctly.
Restarting the servers resolves the symptoms, but the behavior recurs and a
restart is necessary every couple of days.
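In case it helps anyone compare trends: on Linux the open-files count of a
process can be sampled by counting the entries under /proc/<pid>/fd. Below is a
minimal sketch in Java 11, not the tooling behind the numbers above. The class
name FdCount and its method are made up for illustration, and it assumes it runs
as the same user as Tomcat (or as root) so that /proc/<pid>/fd is readable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of Tomcat: counts a process's open
// file descriptors on Linux by listing /proc/<pid>/fd.
class FdCount {

    // Each entry under /proc/<pid>/fd is one open descriptor
    // (regular file, socket, pipe, ...).
    static long countOpenFds(long pid) throws IOException {
        Path fdDir = Paths.get("/proc", Long.toString(pid), "fd");
        try (Stream<Path> entries = Files.list(fdDir)) {
            return entries.count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Fall back to this JVM's own PID when no PID argument is given.
        long pid = args.length > 0
                ? Long.parseLong(args[0])
                : ProcessHandle.current().pid();
        System.out.println(System.currentTimeMillis()
                + " pid=" + pid + " open_fds=" + countOpenFds(pid));
    }
}
```

Running "java FdCount.java <tomcat-pid>" periodically (for example from cron)
gives one timestamped sample per run, which is enough to see whether the count
keeps climbing between restarts.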
I loaded a heap dump from an unresponsive server into MAT and received the
following Leak Suspect:
105,871 instances of "org.apache.coyote.http2.Stream", loaded by
"java.net.URLClassLoader..." occupy 7,581,549,904 (80.68%) bytes.
These instances are referenced from one instance of
"java.util.concurrent.ConcurrentHashMap$Node[]", loaded by "<system class
loader>"
The HashMap referenced in the report is "connections" inside ConnectionHandler.
I suspect that these objects accumulate because some clients do not close their
connections correctly; regardless, I'd expect Tomcat to close the connections
once the timeout expires.
With keepAliveTimeout="20000" defined on UpgradeProtocol, I tested how long a
single, simple HTTP/2 connection persists, using Chrome's net-internals.
With 9.0.14 I can see the following at 20 seconds (as expected):
...
t=7065701 [st=   64]    HTTP2_SESSION_UPDATE_RECV_WINDOW
                        --> delta = 6894
                        --> window_size = 15728640
t=7085708 [st=20071]    HTTP2_SESSION_PING
                        --> is_ack = false
                        --> type = "received"
                        --> unique_id = 2
t=7085708 [st=20071]    HTTP2_SESSION_PING
                        --> is_ack = true
                        --> type = "sent"
                        --> unique_id = 2
t=7085708 [st=20071]    HTTP2_SESSION_CLOSE
                        --> description = "Connection closed"
                        --> net_error = -100 (ERR_CONNECTION_CLOSED)
t=7085708 [st=20071]    HTTP2_SESSION_POOL_REMOVE_SESSION
t=7085708 [st=20071] -HTTP2_SESSION
With 9.0.21 the connection does not close, even after several minutes.
I believe the change in behavior stems from the following commit:
https://github.com/apache/tomcat/commit/c16d9d810a1f64cd768ff33058936cf8907e3117
and so I may be doing something wrong.
Please let me know whether I have misconfigured, misunderstood, misdiagnosed,
misbehaved or mis-something-else, and whether I should provide additional
information.
Current setup of the production servers:
AdoptOpenJDK (build 11.0.3+7)
Amazon Linux 2
<Connector port="443" protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxHttpHeaderSize="16384"
           maxThreads="500" minSpareThreads="25"
           enableLookups="false" disableUploadTimeout="true"
           connectionTimeout="10000"
           compression="on"
           SSLEnabled="true" scheme="https" secure="true">
    <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol"
                     keepAliveTimeout="20000"/>
    <SSLHostConfig protocols="+TLSv1.2+TLSv1.3">
        <Certificate certificateKeystoreFile="tomcat.keystore"
                     certificateKeyAlias="tomcat"
                     certificateKeystorePassword=""
                     certificateKeystoreType="PKCS12"/>
    </SSLHostConfig>
</Connector>
Thanks
Chen