Twice in the last two weeks, a single CentOS 7 production server hosting a public web page has become unresponsive. The first time, all 300 available "https-jsse-nio-8443" connector threads were consumed, all in "S" (service) status, with the oldest around 45 minutes. The second time, all 300 were again consumed in "S" status, with the oldest around 16 minutes. On both occasions, restarting Tomcat freed the threads and the website became responsive again. The requests are simple POST/GET calls that shouldn't take long at all.
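For context, here is roughly what I believe the connector definition looks like in server.xml, with the timeout attributes I understand to be in effect spelled out explicitly (the values shown are my understanding of the stock defaults, not something I have verified in our config; port and maxThreads match what I'm seeing):

```xml
<!-- Sketch of the connector as I understand it; timeout values are the
     stock defaults made explicit, not values we have set ourselves. -->
<Connector port="8443"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           SSLEnabled="true" scheme="https" secure="true"
           maxThreads="300"
           connectionTimeout="20000"
           keepAliveTimeout="20000" />
```

My understanding is that keepAliveTimeout falls back to connectionTimeout when it isn't set, so 20 seconds should apply either way, which is why the 16-45 minute thread ages surprise me.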
CPU, memory, and JVM metrics all appear to be within normal operating limits. I haven't had much luck finding articles describing this behavior or any remedies for it. As far as I can tell, both Tomcat and the applications running inside it use the default timeout values.

Hopefully someone has insight into why this is happening: why isn't Tomcat killing these connections? Even if a connection is stuck waiting on an ACK (e.g. after an RST/ACK), shouldn't Tomcat terminate it once the default timeout expires? Is there a graceful way to script the termination of these threads in case Tomcat can't do it for whatever reason? My searches for killing threads turn up system threads or application threads, not Tomcat connector threads, so I'm not sure this is even viable. I'm also looking into ways to terminate these aged sessions at the F5.

At this point I'm open to any suggestion that would automate a resolution and keep the system from experiencing downtime, or any insight into where to look for a root cause.

Thanks in advance for any guidance you can lend,
David
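Edit: on the automation angle, this is the kind of stopgap watchdog I'm considering until I find the root cause. It counts connector threads in a jstack dump and restarts Tomcat if they're all consumed. The service name, thread-name prefix, and threshold are guesses for illustration and would need adjusting for our environment:

```shell
#!/bin/sh
# Stopgap watchdog sketch (assumptions: thread-name prefix, service name,
# and threshold below are illustrative, not verified against our setup).

THREAD_PREFIX="https-jsse-nio-8443-exec-"
THRESHOLD=300

# Count connector threads in a thread dump read from stdin.
count_connector_threads() {
    grep -c "$THREAD_PREFIX"
}

# How the watchdog would use it (left commented out here):
# PID=$(pgrep -f org.apache.catalina.startup.Bootstrap)
# BUSY=$(jstack "$PID" | count_connector_threads)
# if [ "$BUSY" -ge "$THRESHOLD" ]; then
#     systemctl restart tomcat
# fi
```

The obvious downside is that this restarts the whole JVM rather than killing individual connector threads, which as far as I know isn't possible from outside the process, so I'd still much rather understand why the threads are piling up in the first place.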