On 27.03.2020 21:39, Eric Robinson wrote:
FYI, I don't have 1800 tomcat instances on one server. I have about 100 
instances on each of 18 servers.

When one of these (attempted) connections fails, do you not get an error message that gives a clue as to the cause of the failure?
(There should be a log somewhere, no?)

Also, just for info:
In the past, I have run into problems under Linux (no more connections accepted, neither incoming nor outgoing) whenever the actual number of TCP connections went above a certain number (maybe it was 64K). A TCP connection goes through a number of states (which you can see in a netstat display), such as "ESTABLISHED" but also "TIME_WAIT", "CLOSE_WAIT", etc. In some of these states, the connection no longer has any link to any process, but it still counts against the limit (of the OS/TCP stack).

The case I'm talking about was a bit like yours: a webapp running under tomcat was making a connection to a remote host, but this connection was wrapped inside an object of some kind. When the webapp no longer needed the connection, it just discarded the wrapping object, which was then left without any references to it and thus became a candidate for destruction at some point. But the discarded object never explicitly closed the underlying connection.
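
To illustrate the alternative to discarding the wrapper: below is a minimal sketch (in Python, not the Java of the original webapp) of a wrapper object that closes its underlying connection explicitly instead of leaving that to garbage collection. The names RemoteConnection and send_once are hypothetical and exist only for this illustration; in Java the equivalent idea would be an explicit close() call or a try-with-resources block.

```python
import socket


class RemoteConnection:
    """Hypothetical wrapper around a TCP socket to a remote host."""

    def __init__(self, sock):
        self.sock = sock

    def send(self, data):
        self.sock.sendall(data)

    def close(self):
        # Explicit close: the OS-level connection is released right here,
        # not whenever the garbage collector gets around to the wrapper.
        self.sock.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs even if the body of the "with" block raised an exception.
        self.close()
        return False


def send_once(sock, payload):
    """Send one payload, then always close the connection."""
    with RemoteConnection(sock) as conn:
        conn.send(payload)
```

The point of the "with" block is that the close happens at a known place in the code, so no connection is left sitting in CLOSE_WAIT while waiting for the wrapper to be collected.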

Over a period of time, this left an accumulation of (no longer used) connections in the "CLOSE_WAIT" state (closed by the remote host side, but not by the webapp side). They just sat there until a GC happened, at which time the discarded objects were really destroyed and some implicit close was done at the OS level, which eliminated these pending CLOSE_WAIT connections. And since the available heap was quite large, it took a long time before a GC happened, which allowed such CLOSE_WAIT connections to accumulate in the hundreds or thousands before being "recycled". Once a certain number was reached, the host became very slow and all but unreachable. That was a long time ago, and thus a lot of Java versions and Linux versions ago, so maybe something has changed since then to avoid such a situation.
But maybe you are suffering from some similar phenomenon.
You could try to use netstat some more: when you are having the problem, count ALL the TCP connections, including the ones in CLOSE_WAIT, and check that you do not have an obscene number of them in total. There is definitely some limit past which the OS starts acting funny.
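
As a rough sketch of that counting, the following Python script tallies TCP connections per state from the output of "netstat -ant". It assumes the usual Linux netstat column layout (the state in the sixth column of each tcp/tcp6 line); the function names are made up for this example.

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state in `netstat -ant` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


def current_tcp_states():
    """Run netstat on this host and count its TCP connection states."""
    result = subprocess.run(
        ["netstat", "-ant"], capture_output=True, text=True, check=True
    )
    return count_tcp_states(result.stdout)


def format_report(states):
    """One line per state, most frequent first, then the overall total."""
    lines = [f"{state:<12} {count}" for state, count in states.most_common()]
    lines.append(f"{'TOTAL':<12} {sum(states.values())}")
    return "\n".join(lines)
```

Running print(format_report(current_tcp_states())) while the problem is occurring would show whether CLOSE_WAIT (or the total) is unusually high.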

(Note: unlike TIME_WAIT, for example, there is no time limit for a connection in the CLOSE_WAIT state; it stays in that state, in some kind of zombie half-life, for as long as the local side (here the webapp, acting as the client) has not explicitly closed it.) See e.g.: https://users.cs.northwestern.edu/~agupta/cs340/project2/TCPIP_State_Transition_Diagram.pdf


