Hi.
As a follow-up to another thread, originally entitled "apache/tomcat communication issues (502 response)", I'd like to pursue the CLOSE_WAIT subject.

Sorry if this post is a bit long; I want to make sure that I provide all the necessary information.

Like the original poster, I am seeing on my systems a fair number of sockets apparently stuck for a long time in the CLOSE_WAIT state (sometimes several hundred of them).
They seem to predominantly concern Tomcat and other java processes, but as Alan pointed out previously and I confirm, my perspective is slanted, because we use a lot of common java programs and webapps on our servers, and the ones mostly affected talk to each other and come from the same vendor. Unfortunately, I also do not have the sources of these programs/webapps available, and will not get them, and I can't do without these programs.

It has been previously established that a socket lingering for a long time in the CLOSE_WAIT state is the result of one side of a TCP connection not properly closing its end of the connection when it is done with it. I also surmise (without having definite proof of this) that this is essentially "bad", as it ties up resources that could otherwise be freed. I have also been told, or have discovered, that since our servers are Debian Linux servers, programs such as "ps", "netstat" and "lsof" can help in determining precisely how many such lingering sockets there are, and (to some extent) which processes are the culprits.
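
To make the mechanism concrete, here is a minimal sketch in Python (chosen only for illustration; none of the programs discussed here is written in it). It assumes a Linux host: one end of a loopback connection closes, the other end reads end-of-file but keeps its descriptor open, and the kernel is then asked for the state of that second socket. The function names are mine, and the state number 8 is the Linux kernel's value for CLOSE_WAIT.

```python
import socket
import sys

TCP_CLOSE_WAIT = 8  # TCP_CLOSE_WAIT in the Linux kernel's TCP state enum


def tcp_state(sock):
    """Return the kernel's TCP state number for sock, or None if unavailable."""
    if not (sys.platform.startswith("linux") and hasattr(socket, "TCP_INFO")):
        return None
    # The first byte of Linux's struct tcp_info is tcpi_state.
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 1)
    return info[0]


def demo_close_wait():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))    # port 0: let the kernel pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname(), timeout=5)
    server_side, _ = listener.accept()

    server_side.close()                # the "server" is done and sends its FIN
    eof = client.recv(16)              # returns b"" once that FIN has arrived

    # The client has seen end-of-file but has not called close(). On Linux
    # its socket is now reported as CLOSE_WAIT, and it stays in that state
    # for as long as the descriptor is left open.
    state = tcp_state(client)

    client.close()                     # the step a leaking program never reaches
    listener.close()
    return eof, state
```

On a Linux machine I would expect `demo_close_wait()` to return an empty read together with the state number for CLOSE_WAIT; on other platforms the state comes back as `None` because the sketch does not try to query it there.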

In our case, we know which programs are involved, because we know which ones open a listening socket and on which fixed port, and we also know which other processes talk to them. As mentioned above, we do not have the source of these programs and will not get it, yet we cannot practically do without them for now. We do, however, have full root control of the Linux servers where these programs are running.

So my question is: considering the situation above, is there something I can do locally to free these lingering CLOSE_WAIT sockets, and under which conditions?
(I must admit that I am a bit lost among the myriad options of lsof.)
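
As a starting point for the "how many, and whose" part, here is a rough sketch of a counter that works on the text printed by "netstat -pan". It assumes the usual seven-column layout for TCP rows (proto, Recv-Q, Send-Q, local address, foreign address, state, PID/program); the function name is made up for this example.

```python
from collections import Counter


def count_close_wait(netstat_output):
    """Count CLOSE_WAIT sockets per "PID/Program name" in `netstat -pan` text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Only TCP rows carry a state column in sixth position.
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        if fields[5] == "CLOSE_WAIT":
            # The program name may contain spaces, so keep the rest of the row.
            counts[" ".join(fields[6:])] += 1
    return counts
```

It could be fed with something like `count_close_wait(sys.stdin.read())` at the end of a pipe from netstat, and `most_common()` on the result lists the worst offenders first.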

For example, suppose I start with a "netstat -pan" command and I see the display below (sorry for the line-wrapping). I see a number of sockets in the CLOSE_WAIT state, and for those I have a process-id, which I can associate with a particular process.
For example, I see this line:

tcp6 12 0 ::ffff:127.0.0.1:41764 ::ffff:127.0.0.1:11002 CLOSE_WAIT 29649/java

which tells me that there is a local process 29649/java, with a "local" socket on port 41764 in the CLOSE_WAIT state, connected to another socket on port 11002 on the same host.
On the other hand, I see this line:

tcp 0 0 127.0.0.1:11002 127.0.0.1:41764 FIN_WAIT2 -

which shows a "local" socket on port 11002, connected to this other local socket on port 41764, with no process-id/program displayed.
What does that tell me?
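
Since both ends of these connections are on the same host, the two rows can be matched mechanically. The following is only a sketch under the assumption that stripping the "::ffff:" prefix is enough to compare a tcp6 row with its tcp counterpart; the helper names are hypothetical.

```python
def strip_mapped(addr):
    """Drop the IPv4-mapped IPv6 prefix so tcp and tcp6 rows can be compared."""
    prefix = "::ffff:"
    return addr[len(prefix):] if addr.startswith(prefix) else addr


def pair_close_wait(netstat_output):
    """For each CLOSE_WAIT row, look up the row describing the other end."""
    rows = {}
    for line in netstat_output.splitlines():
        f = line.split()
        if len(f) >= 7 and f[0].startswith("tcp"):
            key = (strip_mapped(f[3]), strip_mapped(f[4]))
            rows[key] = (f[5], " ".join(f[6:]))

    pairs = []
    for (local, foreign), (state, owner) in rows.items():
        if state == "CLOSE_WAIT":
            # The peer's row has local and foreign addresses swapped.
            peer_state, peer_owner = rows.get((foreign, local), (None, None))
            pairs.append((local, owner, peer_state, peer_owner))
    return pairs
```

Applied to the two lines quoted above, this pairs the CLOSE_WAIT socket of 29649/java with the FIN_WAIT2 row that has "-" in the PID column, that is, a row for which netstat reports no owning process.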

I also know that the process-id 29649 corresponds to a local java process, of the daemon variety, multi-threaded. That program "talks to" another known server program, written in C, of which instances are started on an ad-hoc basis by inetd, and which "listens" on port 11002 (in fact it is inetd that does, and it passes this socket on to the process it forks; I understand that).

(The link with Tomcat is that I also frequently see the same situation where the process "owning" the CLOSE_WAIT socket is Tomcat, more specifically one webapp running inside it. It's just that in this particular snapshot it isn't.)

What it looks like to me in this case is that at some point one of the threads of process #29649 opened a client socket on port 41764 to the local inetd port 11002; that inetd then started the underlying server process (the C program); that the underlying C program then at some point exited; but that process #29649 never closed its side of the connection with port 11002. Can I somehow detect this condition, and "force" the offending thread of process #29649 to close that socket (or just force this thread to exit)?
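
For the detection half of that question, one approach I can imagine on Linux is to read /proc/net/tcp and /proc/net/tcp6, keep the rows whose state column is 08 (CLOSE_WAIT), and match their inode numbers against the "socket:[inode]" links under /proc/<pid>/fd. The sketch below assumes the standard column layout of those files; the sample row, including its uid and inode, is invented for illustration, and the function names are mine. It only identifies the descriptors; it does not close anything.

```python
import os

CLOSE_WAIT = 0x08  # state code used in /proc/net/tcp and /proc/net/tcp6

# Invented sample in /proc/net/tcp6 layout: local port 0xA324 = 41764,
# remote port 0x2AFA = 11002, state 08, rx_queue 0xC = 12, inode 123456.
SAMPLE = (
    "  sl  local_address remote_address st tx_queue rx_queue tr tm->when "
    "retrnsmt   uid  timeout inode\n"
    "   0: 0000000000000000FFFF00000100007F:A324 "
    "0000000000000000FFFF00000100007F:2AFA 08 00000000:0000000C "
    "00:00000000 00000000  1000        0 123456 1 0000000000000000 20 4 30 10 -1\n"
)


def close_wait_inodes(proc_net_text):
    """Map socket inode -> (local_port, remote_port) for CLOSE_WAIT rows."""
    found = {}
    for line in proc_net_text.splitlines():
        f = line.split()
        # Data rows start with a slot number such as "0:"; the header does not.
        if len(f) < 10 or not f[0].endswith(":"):
            continue
        if int(f[3], 16) != CLOSE_WAIT:
            continue
        local_port = int(f[1].rsplit(":", 1)[1], 16)
        remote_port = int(f[2].rsplit(":", 1)[1], 16)
        found[int(f[9])] = (local_port, remote_port)
    return found


def fds_in_close_wait(pid, inodes):
    """List (fd, ports) for descriptors of `pid` that refer to those inodes."""
    hits = []
    fd_dir = f"/proc/{pid}/fd"
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue                    # descriptor was closed in the meantime
        if target.startswith("socket:[") and target.endswith("]"):
            inode = int(target[len("socket:["):-1])
            if inode in inodes:
                hits.append((int(fd), inodes[inode]))
    return hits
```

Run as root, `fds_in_close_wait(29649, close_wait_inodes(open("/proc/net/tcp6").read()))` would list the descriptor numbers concerned, which at least narrows the problem down to specific file descriptors of the process.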

I realise this may be a complex question, and that the answers may be different for a Tomcat webapp than for a stand-alone process. I would be content to just have answers for the webapp case.


Full display of "netstat -pan | grep WAIT" :

Proto Recv-Q Send-Q Local Address            Foreign Address          State       PID/Program name
tcp        0      0 127.0.0.1:11002          127.0.0.1:41763          TIME_WAIT   -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41764          FIN_WAIT2   -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41738          TIME_WAIT   -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41739          FIN_WAIT2   -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41741          TIME_WAIT   -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41735          TIME_WAIT   -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41755          TIME_WAIT   -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41752          TIME_WAIT   -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41753          FIN_WAIT2   -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41758          TIME_WAIT   -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41759          FIN_WAIT2   -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41744          TIME_WAIT   -
tcp        0      0 127.0.0.1:11002          127.0.0.1:41749          TIME_WAIT   -
tcp6       0      0 ::ffff:127.0.0.1:11101   ::ffff:127.0.0.1:41762   FIN_WAIT2   -
tcp6       0      0 ::ffff:212.85.38.:11100  ::ffff:212.85.38.:41737  TIME_WAIT   -
tcp6       0      0 ::ffff:127.0.0.1:11101   ::ffff:127.0.0.1:41743   TIME_WAIT   -
tcp6       0      0 ::ffff:127.0.0.1:11101   ::ffff:127.0.0.1:41740   TIME_WAIT   -
tcp6       0      0 ::ffff:127.0.0.1:11101   ::ffff:127.0.0.1:41734   TIME_WAIT   -
tcp6       0      0 ::ffff:127.0.0.1:11101   ::ffff:127.0.0.1:41754   TIME_WAIT   -
tcp6       0      0 ::ffff:127.0.0.1:11101   ::ffff:127.0.0.1:41757   TIME_WAIT   -
tcp6       0      0 ::ffff:212.85.38.:11100  ::ffff:212.85.38.:41751  TIME_WAIT   -
tcp6       0      0 ::ffff:127.0.0.1:11101   ::ffff:127.0.0.1:41748   FIN_WAIT2   -
tcp6      12      0 ::ffff:127.0.0.1:41711   ::ffff:127.0.0.1:11002   CLOSE_WAIT  13333/java
tcp6      12      0 ::ffff:127.0.0.1:41708   ::ffff:127.0.0.1:11002   CLOSE_WAIT  13333/java
tcp6      12      0 ::ffff:127.0.0.1:41764   ::ffff:127.0.0.1:11002   CLOSE_WAIT  29649/java
tcp6      12      0 ::ffff:127.0.0.1:41753   ::ffff:127.0.0.1:11002   CLOSE_WAIT  13333/java
tcp6      12      0 ::ffff:127.0.0.1:41759   ::ffff:127.0.0.1:11002   CLOSE_WAIT  29649/java
tcp6      12      0 ::ffff:127.0.0.1:41739   ::ffff:127.0.0.1:11002   CLOSE_WAIT  13333/java
tcp6      12      0 ::ffff:127.0.0.1:39436   ::ffff:127.0.0.1:11002   CLOSE_WAIT  13333/java
tcp6      12      0 ::ffff:127.0.0.1:38989   ::ffff:127.0.0.1:11002   CLOSE_WAIT  13333/java
tcp6      12      0 ::ffff:127.0.0.1:39364   ::ffff:127.0.0.1:11002   CLOSE_WAIT  13333/java
tcp6      12      0 ::ffff:127.0.0.1:39390   ::ffff:127.0.0.1:11002   CLOSE_WAIT  13333/java
tcp6      12      0 ::ffff:127.0.0.1:40859   ::ffff:127.0.0.1:11002   CLOSE_WAIT  13333/java
tcp6       1      0 ::ffff:127.0.0.1:39412   ::ffff:127.0.0.1:11101   CLOSE_WAIT  2864/java
tcp6       1      0 ::ffff:127.0.0.1:41249   ::ffff:127.0.0.1:11101   CLOSE_WAIT  2864/java
tcp6       1      0 ::ffff:127.0.0.1:41748   ::ffff:127.0.0.1:11101   CLOSE_WAIT  2864/java
tcp6       1      0 ::ffff:127.0.0.1:41731   ::ffff:127.0.0.1:11101   CLOSE_WAIT  2864/java
tcp6       1      0 ::ffff:127.0.0.1:41762   ::ffff:127.0.0.1:11101   CLOSE_WAIT  2864/java
tcp6       0      0 ::ffff:212.85.38.176:80  ::ffff:212.85.38.:56212  TIME_WAIT   -




