Hello Experts,

Several of my production servers were recently upgraded from Tomcat 9.0.14 to 9.0.21. Immediately after the upgrade the servers started accumulating memory and open files (on Linux) in a steady trend that was not observed before.

After a couple of days (without reaching the memory or open-files limit, and without throwing "OutOfMemoryError: Java heap space" or "IOException: Too many open files") the servers became unresponsive: every HTTPS request timed out, while HTTP requests continued to work correctly. Restarting the servers resolves the symptoms, but the behavior persists and a restart is necessary every couple of days.

I loaded a heap dump from an unresponsive server into MAT and received the following Leak Suspect:

    105,871 instances of "org.apache.coyote.http2.Stream", loaded by "java.net.URLClassLoader..." occupy 7,581,549,904 (80.68%) bytes. These instances are referenced from one instance of "java.util.concurrent.ConcurrentHashMap$Node[]", loaded by "<system class loader>"

The HashMap referenced in the report is "connections" inside ConnectionHandler. I suspect that these objects accumulate because clients may not close their connections correctly; regardless, I'd expect Tomcat to close the connections upon timeout.

With keepAliveTimeout="20000" defined on UpgradeProtocol, I tested the persistence of one simple HTTP/2 connection using Chrome's net-internals. With 9.0.14 I can see the following at 20 seconds (as expected):

    ...
    t=7065701 [st=   64]  HTTP2_SESSION_UPDATE_RECV_WINDOW
                          --> delta = 6894
                          --> window_size = 15728640
    t=7085708 [st=20071]  HTTP2_SESSION_PING
                          --> is_ack = false
                          --> type = "received"
                          --> unique_id = 2
    t=7085708 [st=20071]  HTTP2_SESSION_PING
                          --> is_ack = true
                          --> type = "sent"
                          --> unique_id = 2
    t=7085708 [st=20071]  HTTP2_SESSION_CLOSE
                          --> description = "Connection closed"
                          --> net_error = -100 (ERR_CONNECTION_CLOSED)
    t=7085708 [st=20071]  HTTP2_SESSION_POOL_REMOVE_SESSION
    t=7085708 [st=20071] -HTTP2_SESSION

With 9.0.21 the connection does not close, even after several minutes.

I believe the change in behavior stems from the following commit:
https://github.com/apache/tomcat/commit/c16d9d810a1f64cd768ff33058936cf8907e3117
and so I may be doing something wrong.
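
The idle-connection check described above could also be reproduced without a browser. The following is only a rough sketch, not something taken from Tomcat or from the test above: the class name Http2IdleProbe, its method names and the command-line arguments are hypothetical. It assumes Java 11 or later, a server certificate trusted by the default trust store, and a server that offers "h2" via ALPN. It opens a TLS connection, sends the HTTP/2 client preface and an empty SETTINGS frame, then stays idle and reports how long it takes for the server to close the connection. It does not acknowledge the server's SETTINGS or answer PING frames, so the timing is indicative only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

class Http2IdleProbe {

    // HTTP/2 client connection preface (RFC 7540, section 3.5), 24 octets.
    static byte[] clientPreface() {
        return "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
    }

    // Empty SETTINGS frame: 24-bit length 0, type 0x4, flags 0, stream id 0.
    static byte[] emptySettingsFrame() {
        return new byte[] {0, 0, 0, 0x4, 0, 0, 0, 0, 0};
    }

    // Returns milliseconds until the server closed the idle connection,
    // or -1 if it was still open after maxWaitMillis.
    static long millisUntilClose(String host, int port, int maxWaitMillis) throws IOException {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port)) {
            // Ask for HTTP/2 via ALPN before the handshake.
            SSLParameters params = socket.getSSLParameters();
            params.setApplicationProtocols(new String[] {"h2"});
            socket.setSSLParameters(params);
            socket.startHandshake();
            if (!"h2".equals(socket.getApplicationProtocol())) {
                throw new IOException("server did not negotiate h2");
            }

            OutputStream out = socket.getOutputStream();
            out.write(clientPreface());
            out.write(emptySettingsFrame());
            out.flush();

            long start = System.nanoTime();
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[4096];
            while (true) {
                long elapsed = (System.nanoTime() - start) / 1_000_000;
                long remaining = maxWaitMillis - elapsed;
                if (remaining <= 0) {
                    return -1;
                }
                socket.setSoTimeout((int) remaining);
                try {
                    // Frames sent by the server are read and discarded.
                    if (in.read(buf) < 0) {
                        return (System.nanoTime() - start) / 1_000_000;
                    }
                } catch (SocketTimeoutException e) {
                    return -1; // still open when the wait expired
                } catch (IOException e) {
                    // Reset or abrupt close is also counted as "closed".
                    return (System.nanoTime() - start) / 1_000_000;
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long ms = millisUntilClose(args[0], Integer.parseInt(args[1]), 120_000);
        System.out.println(ms < 0 ? "still open after 120 s" : "closed after " + ms + " ms");
    }
}
```

It could be run as "java Http2IdleProbe <host> 443" against each Tomcat version, and the reported time compared with the configured keepAliveTimeout.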

Please let me know whether I have misconfigured, misunderstood, misdiagnosed, misbehaved or mis-something-else, and whether I should provide additional information.

Current setup of the production servers:

AdoptOpenJDK (build 11.0.3+7)
Amazon Linux 2

<Connector port="443"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxHttpHeaderSize="16384"
           maxThreads="500"
           minSpareThreads="25"
           enableLookups="false"
           disableUploadTimeout="true"
           connectionTimeout="10000"
           compression="on"
           SSLEnabled="true"
           scheme="https"
           secure="true">
    <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol"
                     keepAliveTimeout="20000"/>
    <SSLHostConfig protocols="+TLSv1.2+TLSv1.3">
        <Certificate certificateKeystoreFile="tomcat.keystore"
                     certificateKeyAlias="tomcat"
                     certificateKeystorePassword=""
                     certificateKeystoreType="PKCS12"/>
    </SSLHostConfig>
</Connector>

Thanks
Chen