On 27.03.2020 21:39, Eric Robinson wrote:
> FYI, I don't have 1800 tomcat instances on one server. I have about 100
> instances on each of 18 servers.
When one of these (attempted) connections fails, do you not get some error message which
gives a clue as to the cause of the failure?
(There should be a log somewhere, no?)
Also, just for info:
In the past, I have run into problems under Linux (no more connections accepted, neither
incoming nor outgoing) whenever the total number of TCP connections went above a certain
number (maybe it was 64K).
A TCP connection goes through a number of states (which you can see in a netstat display),
such as "ESTABLISHED", but also "TIME_WAIT", "CLOSE_WAIT", etc. In some of these states,
the connection no longer has any link to any process, but it still counts
against the limit (of the OS/TCP stack).
The case I'm talking about was a bit like yours: a webapp running under tomcat was making
a connection to a remote host, but this connection was wrapped inside an object of some
kind. When the webapp no longer needed the connection, it just discarded the wrapping
object, which was left without any references to it, and was thus a candidate for
destruction at some point. But the discarded object never explicitly closed the underlying
connection.
Over a period of time, this left an accumulation of (no longer used) connections in the
"CLOSE_WAIT" state (closed by the remote host side, but not by the webapp side). They
just sat there until a GC happened, at which time these objects were really destroyed
and some implicit close was done at the OS level, which eliminated the pending
CLOSE_WAIT connections.
And since the available heap was quite large, it took a long time before a GC happened,
which allowed such CLOSE_WAIT connections to accumulate in the hundreds or thousands
before being "recycled".
Once a certain number was reached, the host became very slow and all but unreachable.
That was a long time ago, and thus a lot of Java versions and Linux versions ago, so maybe
something has changed since then to avoid such a situation.
But maybe you are also suffering from a similar phenomenon.
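
To illustrate the general pattern of closing the underlying connection explicitly instead
of leaving it to the garbage collector, here is a minimal sketch. The case described above
was a Java webapp; Python is used here only to show the idea. The RemoteLink class and the
demo() function are hypothetical names, and socketpair() stands in for a real remote host.

```python
import socket


class RemoteLink:
    """Hypothetical wrapper object around a connection to a remote host."""

    def __init__(self, sock):
        self._sock = sock

    def send(self, data):
        self._sock.sendall(data)

    def close(self):
        # Release the OS-level socket explicitly, rather than waiting
        # for the wrapper object to be garbage collected.
        self._sock.close()

    @property
    def closed(self):
        # A closed Python socket reports a file descriptor of -1.
        return self._sock.fileno() == -1

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs when the "with" block ends, even if an exception was raised.
        self.close()


def demo():
    # Two connected sockets in one process; "peer" plays the remote host.
    ours, peer = socket.socketpair()
    with peer:
        with RemoteLink(ours) as link:
            link.send(b"ping")
        # Our side is closed here, deterministically.
        peer.recv(16)        # the payload sent above
        eof = peer.recv(16)  # empty bytes: the peer sees our close
        return link.closed, eof
```

The point of the design is that the close happens at a known place in the code, so the
number of lingering connections does not depend on heap size or on when a GC runs.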
You could try to use netstat some more: when you are having the problem, you should
count ALL the TCP connections, including the ones in CLOSE_WAIT, and check whether you
have an obscene number of them in total. There is definitely some limit
past which the OS starts acting funny.
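
As a rough sketch of that count, the snippet below tallies TCP connections per state from
text in the usual Linux "netstat -ant" layout (state in the sixth column). The function
names count_tcp_states and format_report are made up for illustration, and the column
layout is an assumption; other netstat variants may print things differently.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state from `netstat -ant` style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed line layout:
        # tcp  0  0  10.0.0.5:41000  10.0.0.9:3306  CLOSE_WAIT
        # Header lines do not start with "tcp" and are skipped.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


def format_report(counts):
    """Return one line per state, most frequent first, plus a total."""
    lines = [f"{state:<12} {n}" for state, n in counts.most_common()]
    lines.append(f"{'TOTAL':<12} {sum(counts.values())}")
    return "\n".join(lines)
```

To use it, capture the output of "netstat -ant" while the problem is occurring (for
example by redirecting it to a file), pass the text to count_tcp_states(), and print
format_report() of the result. The total and the CLOSE_WAIT line are the ones to watch.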
(Note: unlike TIME_WAIT, for example, there is no time limit for a connection in the
CLOSE_WAIT state; it will stay in that state as long as the local side (here, the client)
has not explicitly closed it, in some kind of zombie half-life.)
See e.g.:
https://users.cs.northwestern.edu/~agupta/cs340/project2/TCPIP_State_Transition_Diagram.pdf