Hi.
As a follow-up on another thread originally entitled "apache/tomcat
communication issues (502 response)", I'd like to pursue the CLOSE_WAIT
subject.
Sorry if this post is a bit long, I want to make sure that I do provide
all the necessary information.
Like the original poster, I am seeing on my systems a fair number of
sockets apparently stuck for a long time in the CLOSE_WAIT state.
(Sometimes several hundred of them).
They seem to predominantly concern Tomcat and other java processes, but
as Alan pointed out previously and I confirm, my perspective is slanted,
because we use a lot of common java programs and webapps on our servers,
and the ones mostly affected talk to each other and come from the same
vendor.
Unfortunately also, I do not have the sources of these programs/webapps
available, and will not get them, and I can't do without these programs.
It has been previously established that a socket lingering for a long
time in the CLOSE_WAIT state is due to one side of a TCP connection not
properly closing its end of the connection when it is done with it (the
peer has already closed, but the program owning the CLOSE_WAIT socket
has not).
I also surmise (without having a definite proof of this), that this is
essentially "bad", as it ties up some resources that could be otherwise
freed.
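
To make the mechanism concrete, here is a minimal sketch of how a socket
gets into CLOSE_WAIT and how it gets out again. It is written in Python
purely for brevity (the affected programs are Java, where the equivalent
would be calling close() once read() returns -1). The function names
read_until_peer_closes and demo are made up for this illustration, and
the "server" is just a throwaway loopback listener, not the real program.

```python
import socket


def read_until_peer_closes(sock):
    """Read everything the peer sends, then close our own end.

    recv() returning b"" means the peer has closed its side; from that
    moment our socket sits in CLOSE_WAIT until we call close() on it.
    """
    chunks = []
    try:
        while True:
            data = sock.recv(4096)
            if not data:
                break  # peer closed: our socket is now in CLOSE_WAIT
            chunks.append(data)
    finally:
        sock.close()  # this call is what takes the socket out of CLOSE_WAIT
    return b"".join(chunks)


def demo():
    # Throwaway listener on an ephemeral loopback port (illustration only).
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = server.accept()
    conn.sendall(b"hello")
    conn.close()    # the server side finishes first
    server.close()

    payload = read_until_peer_closes(client)
    # A closed Python socket reports a file descriptor of -1.
    return payload, client.fileno()
```

A client that skips the close() in the finally block would leave exactly
the kind of entry described above, for as long as the process keeps the
socket object alive.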
I have also been told or discovered that, our servers being Linux Debian
servers, programs such as "ps", "netstat" and "lsof" can help in
determining precisely how many such lingering sockets there are, and who
the culprit processes are (to some extent).
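
As a rough illustration of the counting part, the sketch below tallies
CLOSE_WAIT sockets per owning process from the text printed by
"netstat -pan". It assumes each socket is on a single, unwrapped line
with the usual seven columns; the function name count_close_wait is
hypothetical.

```python
from collections import Counter


def count_close_wait(netstat_text):
    """Count CLOSE_WAIT sockets per "PID/Program" column of netstat -pan."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns:
        # Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
        if (len(fields) >= 7
                and fields[0].startswith("tcp")
                and fields[5] == "CLOSE_WAIT"):
            counts[fields[6]] += 1
    return counts
```

The input could come, for example, from
subprocess.run(["netstat", "-pan"], capture_output=True, text=True).stdout
run as root, so that the PID column is filled in for all processes.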
In our case, we know which are the programs involved, because we know
which ones open a listening socket and on what fixed port, and we also
know which are the other processes talking to them.
As mentioned previously, we do not have the source of these programs
and will not get it, and we cannot practically do without them for now.
We do, however, have full root control of the Linux servers where these
programs are running.
So my question is: considering the situation above, is there something
I can do locally to free these lingering CLOSE_WAIT sockets, and under
which conditions?
(I must admit that I am a bit lost among the myriad options of lsof)
For example, suppose I start with a "netstat -pan" command and I see the
display below (sorry for the line-wrapping).
I see a number of sockets in the CLOSE_WAIT state, and for those I have
a process-id, which I can associate to a particular process.
For example, I see this line:
tcp6 12 0 ::ffff:127.0.0.1:41764 ::ffff:127.0.0.1:11002 CLOSE_WAIT 29649/java
which tells me that there is a local process 29649/java, with a "local"
socket on port 41764 in the CLOSE_WAIT state, connected to another socket
on port 11002 on the same host.
On the other hand, I see this line :
tcp 0 0 127.0.0.1:11002 127.0.0.1:41764 FIN_WAIT2 -
which shows a "local" socket on port 11002, related to this other local
socket port #41764, with no process-id/program displayed.
What does that tell me ?
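
One way to read such lines together is to pair each CLOSE_WAIT socket
with the entry for the other end of the same connection (local and
foreign address swapped). The sketch below does that on "netstat -pan"
text, assuming unwrapped lines; pair_endpoints and normalize are
hypothetical helper names. A "-" in the owner column of the peer entry
usually means that no process holds that end any more, although it can
also mean netstat was run without enough privileges to see the owner.

```python
def normalize(addr):
    """Strip the IPv4-mapped prefix so tcp and tcp6 entries can be compared."""
    prefix = "::ffff:"
    return addr[len(prefix):] if addr.startswith(prefix) else addr


def pair_endpoints(netstat_text):
    """Match each CLOSE_WAIT socket with the entry for its peer, if listed.

    Returns tuples of
    (local address, owner, peer address, peer state, peer owner).
    """
    rows = {}
    for line in netstat_text.splitlines():
        f = line.split()
        if len(f) >= 7 and f[0].startswith("tcp"):
            rows[(normalize(f[3]), normalize(f[4]))] = (f[5], f[6])

    pairs = []
    for (local, remote), (state, owner) in rows.items():
        # The peer's entry has the same two addresses in the opposite order.
        if state == "CLOSE_WAIT" and (remote, local) in rows:
            peer_state, peer_owner = rows[(remote, local)]
            pairs.append((local, owner, remote, peer_state, peer_owner))
    return pairs
```

Applied to the two lines quoted above, this would pair port 41764 of
29649/java with the ownerless FIN_WAIT2 entry on port 11002.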
I also know that the process-id 29649 corresponds to a local java
process, of the daemon variety, multi-threaded. That program "talks to"
another known server program, written in C, of which instances are
started on an ad-hoc basis by inetd, and which "listens" on port 11002
(in fact it is inetd that listens, and it passes this socket on to the
process it forks, I understand that).
(The link with Tomcat is that I also see frequently the same situation,
where the process "owning" the CLOSE_WAIT socket is Tomcat, more
specifically one webapp running inside it. It's just that in this
particular snapshot it isn't.)
What it looks like to me in this case is that at some point one of the
threads of process #29649 opened a client socket (port 41764) to the local
inetd port 11002; that inetd then started the underlying server process
(the C program); that the underlying C program then at some point
exited; but that process #29649 never closed its side of the
connection with port 11002.
Can I somehow detect this condition, and "force" the offending thread of
process #29649 to close that socket (or just force this thread to exit)?
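
For the detection half of that question only, here is a sketch that
reads the kernel's own socket tables instead of netstat output. It
assumes the Linux /proc/net/tcp and /proc/net/tcp6 layout, where the
fourth column is the state in hex (08 being CLOSE_WAIT) and the fifth
is tx_queue:rx_queue. The function names are hypothetical, and this
says nothing about how to force the owning thread to close the socket.

```python
CLOSE_WAIT = 0x08  # state code used for CLOSE_WAIT in /proc/net/tcp*


def close_wait_entries(proc_text):
    """Return {(local_port, remote_port): unread_bytes} for CLOSE_WAIT rows.

    proc_text is the full content of /proc/net/tcp or /proc/net/tcp6,
    including its header line.
    """
    found = {}
    for line in proc_text.splitlines()[1:]:  # skip the column header
        fields = line.split()
        if len(fields) < 5:
            continue
        if int(fields[3], 16) != CLOSE_WAIT:
            continue
        # Addresses look like HEXIP:HEXPORT; only the port is decoded here.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        rx_queue = int(fields[4].split(":")[1], 16)
        found[(local_port, remote_port)] = rx_queue
    return found


def still_lingering(earlier, later):
    """Connections that were in CLOSE_WAIT in both snapshots."""
    return sorted(set(earlier) & set(later))
```

Taking one snapshot of /proc/net/tcp6, waiting a few minutes, taking a
second one and passing both results to still_lingering would list the
port pairs that stayed in CLOSE_WAIT over that interval.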
I realise this may be a complex question, and that the answers may be
different if it is a Tomcat webapp than a stand-alone process. I would
be content to just have answers for the webapp case.
Full display of "netstat -pan | grep WAIT" :
Proto Recv-Q Send-Q Local Address            Foreign Address          State      PID/Program name
tcp        0      0 127.0.0.1:11002          127.0.0.1:41763          TIME_WAIT  -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41764          FIN_WAIT2  -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41738          TIME_WAIT  -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41739          FIN_WAIT2  -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41741          TIME_WAIT  -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41735          TIME_WAIT  -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41755          TIME_WAIT  -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41752          TIME_WAIT  -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41753          FIN_WAIT2  -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41758          TIME_WAIT  -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41759          FIN_WAIT2  -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41744          TIME_WAIT  -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41749          TIME_WAIT  -
tcp6       0      0 ::ffff:127.0.0.1:11101   ::ffff:127.0.0.1:41762   FIN_WAIT2  -
tcp6       0      0 ::ffff:212.85.38.:11100  ::ffff:212.85.38.:41737  TIME_WAIT  -
tcp6       0      0 ::ffff:127.0.0.1:11101   ::ffff:127.0.0.1:41743   TIME_WAIT  -
tcp6       0      0 ::ffff:127.0.0.1:11101   ::ffff:127.0.0.1:41740   TIME_WAIT  -
tcp6       0      0 ::ffff:127.0.0.1:11101   ::ffff:127.0.0.1:41734   TIME_WAIT  -
tcp6       0      0 ::ffff:127.0.0.1:11101   ::ffff:127.0.0.1:41754   TIME_WAIT  -
tcp6       0      0 ::ffff:127.0.0.1:11101   ::ffff:127.0.0.1:41757   TIME_WAIT  -
tcp6       0      0 ::ffff:212.85.38.:11100  ::ffff:212.85.38.:41751  TIME_WAIT  -
tcp6       0      0 ::ffff:127.0.0.1:11101   ::ffff:127.0.0.1:41748   FIN_WAIT2  -
tcp6      12      0 ::ffff:127.0.0.1:41711   ::ffff:127.0.0.1:11002   CLOSE_WAIT 13333/java
tcp6      12      0 ::ffff:127.0.0.1:41708   ::ffff:127.0.0.1:11002   CLOSE_WAIT 13333/java
tcp6      12      0 ::ffff:127.0.0.1:41764   ::ffff:127.0.0.1:11002   CLOSE_WAIT 29649/java
tcp6      12      0 ::ffff:127.0.0.1:41753   ::ffff:127.0.0.1:11002   CLOSE_WAIT 13333/java
tcp6      12      0 ::ffff:127.0.0.1:41759   ::ffff:127.0.0.1:11002   CLOSE_WAIT 29649/java
tcp6      12      0 ::ffff:127.0.0.1:41739   ::ffff:127.0.0.1:11002   CLOSE_WAIT 13333/java
tcp6      12      0 ::ffff:127.0.0.1:39436   ::ffff:127.0.0.1:11002   CLOSE_WAIT 13333/java
tcp6      12      0 ::ffff:127.0.0.1:38989   ::ffff:127.0.0.1:11002   CLOSE_WAIT 13333/java
tcp6      12      0 ::ffff:127.0.0.1:39364   ::ffff:127.0.0.1:11002   CLOSE_WAIT 13333/java
tcp6      12      0 ::ffff:127.0.0.1:39390   ::ffff:127.0.0.1:11002   CLOSE_WAIT 13333/java
tcp6      12      0 ::ffff:127.0.0.1:40859   ::ffff:127.0.0.1:11002   CLOSE_WAIT 13333/java
tcp6       1      0 ::ffff:127.0.0.1:39412   ::ffff:127.0.0.1:11101   CLOSE_WAIT 2864/java
tcp6       1      0 ::ffff:127.0.0.1:41249   ::ffff:127.0.0.1:11101   CLOSE_WAIT 2864/java
tcp6       1      0 ::ffff:127.0.0.1:41748   ::ffff:127.0.0.1:11101   CLOSE_WAIT 2864/java
tcp6       1      0 ::ffff:127.0.0.1:41731   ::ffff:127.0.0.1:11101   CLOSE_WAIT 2864/java
tcp6       1      0 ::ffff:127.0.0.1:41762   ::ffff:127.0.0.1:11101   CLOSE_WAIT 2864/java
tcp6       0      0 ::ffff:212.85.38.176:80  ::ffff:212.85.38.:56212  TIME_WAIT  -