Skimmed quickly through your post there while working, so forgive me if this is irrelevant.
CLOSE_WAIT is the state a socket enters when the remote end has closed the connection at the TCP/IP level, but the local application (in this case Java) has not yet closed its socket descriptor. As a coincidence, we just fixed this very same issue in our application, which uses the httpclient library. There is a known issue with httpclient where sockets are not closed after the connection ends (issue or feature, you be the judge); we worked around it by explicitly calling close ourselves. If httpclient is used, that could be the culprit.

See http://www.nabble.com/tcp-connections-left-with-CLOSE_WAIT-td13757202.html for a better description.

Rgds,
Taylan

André Warnier wrote:
>
> Hi.
> As a follow-up on another thread originally entitled "apache/tomcat
> communication issues (502 response)", I'd like to pursue the
> CLOSE-WAIT subject.
>
> Sorry if this post is a bit long, I want to make sure that I do
> provide all the necessary information.
>
> Like the original poster, I am seeing on my systems a fair number of
> sockets apparently stuck for a long time in the CLOSE_WAIT state.
> (Sometimes several hundreds of them).
> They seem to predominantly concern Tomcat and other java processes,
> but as Alan pointed out previously and I confirm, my perspective is
> slanted, because we use a lot of common java programs and webapps on
> our servers, and the ones mostly affected talk to each other and come
> from the same vendor.
> Unfortunately also, I do not have the sources of these
> programs/webapps available, and will not get them, and I can't do
> without these programs.
>
> It has been previously established that a socket in a
> long-time-lingering CLOSE-WAIT status, is due to one or the other side
> of a TCP connection not properly closing its side of the connection
> when it is done with it.
> I also surmise (without having a definite proof of this), that this is
> essentially "bad", as it ties up some resources that could be
> otherwise freed.
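
To illustrate the explicit close mentioned at the top of this reply, here is a minimal sketch. It uses plain java.net sockets rather than httpclient, so it shows the general idea and not the actual httpclient workaround. The class and method names (ExplicitClose, readAllAndClose, demo) are made up for illustration, and it assumes Java 8 or later; on older JVMs a finally block that calls close() does the same job.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ExplicitClose {

    // Reads everything the peer sends, then closes our side of the connection.
    static String readAllAndClose(String host, int port) throws IOException {
        // try-with-resources closes the socket even if reading throws.
        try (Socket socket = new Socket(host, port);
             InputStream in = socket.getInputStream()) {
            StringBuilder sb = new StringBuilder();
            byte[] buf = new byte[256];
            int n;
            // read() returns -1 once the peer has closed its side. From that
            // moment our socket sits in CLOSE_WAIT until we close it ourselves.
            while ((n = in.read(buf)) != -1) {
                sb.append(new String(buf, 0, n, StandardCharsets.US_ASCII));
            }
            return sb.toString();
        }
    }

    // Self-contained demo: a throwaway local server sends a few bytes and hangs up.
    static String demo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread peer = new Thread(() -> {
                try (Socket s = server.accept(); OutputStream out = s.getOutputStream()) {
                    out.write("bye".getBytes(StandardCharsets.US_ASCII));
                } catch (IOException ignored) {
                    // Nothing useful to do in a demo peer.
                }
            });
            peer.start();
            String data = readAllAndClose(
                    server.getInetAddress().getHostAddress(), server.getLocalPort());
            peer.join();
            return data;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "bye"
    }
}
```

If the close is skipped after read() returns -1, the socket stays in CLOSE_WAIT for as long as the process keeps the descriptor open, which is the pattern in the netstat output.
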
> I have also been told or discovered that, our servers being Linux
> Debian servers, programs such as "ps", "netstat" and "lsof" can help
> in determining precisely how many such lingering sockets there are,
> and who the culprit processes are (to some extent).
>
> In our case, we know which are the programs involved, because we know
> which ones open a listening socket and on what fixed port, and we also
> know which are the other processes talking to them.
> But, as mentioned previously, we do not have the source of these
> programs and will not get them, but cannot practically do without them
> for now. But we do have full root control of the Linux servers where
> these programs are running.
>
> So my question is : considering the situation above, is there
> something I can do locally to free these lingering CLOSE_WAIT sockets,
> and under which conditions ?
> (I must admit that I am a bit lost among the myriad options of lsof)
>
> For example, suppose I start with a "netstat -pan" command and I see
> the display below (sorry for the line-wrapping).
> I see a number of sockets in the CLOSE_WAIT state, and for those I
> have a process-id, which I can associate to a particular process.
> For example, I see this line :
> tcp6 12 0 ::ffff:127.0.0.1:41764 ::ffff:127.0.0.1:11002 CLOSE_WAIT 29649/java
> which tells me that there is a local process 29649/java, with a
> "local" socket port 41764 in the CLOSE_WAIT state, related to another
> socket #11002 on the same host.
> On the other hand, I see this line :
> tcp 0 0 127.0.0.1:11002 127.0.0.1:41764 FIN_WAIT2 -
> which shows a "local" socket on port 11002, related to this other
> local socket port #41764, with no process-id/program displayed.
> What does that tell me ?
>
> I also know that the process-id 29649 corresponds to a local java
> process, of the daemon variety, multi-threaded.
> That program "talks to" another known server program, written in C,
> of which instances are started on an ad-hoc basis by inetd, and which
> "listens" on port 11002 (in fact it is inetd who does, and it passes
> this socket on to the process it forks, I understand that).
>
> (The link with Tomcat is that I also see frequently the same
> situation, where the process "owning" the CLOSE_WAIT socket is Tomcat,
> more specifically one webapp running inside it. It's just that in
> this particular snapshot it isn't.)
>
> What it looks like to me in this case, is that at some point one of
> the threads of process #29649 opened a client socket #41764 to the
> local inetd port #11002; that inetd then started the underlying server
> process (the C program); that the underlying C program then at some
> point exited; but that process #29649 never closes its side of the
> connection with port #11002.
> Can I somehow detect this condition, and "force" the offending thread
> of process #29649 to close that socket (or just force this thread to
> exit) ?
>
> I realise this may be a complex question, and that the answers may be
> different if it is a Tomcat webapp than a stand-alone process. I
> would be content to just have answers for the webapp case.
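
On the detection half of that question, one rough way to see who holds the lingering sockets is to count CLOSE_WAIT entries per owning process in the "netstat -pan" output. The sketch below only does the counting; it does not close anything. It assumes the column layout shown in this message (the PID/Program name field directly follows the state), and the class and method names (CloseWaitCounter, countByProcess) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CloseWaitCounter {

    // Counts CLOSE_WAIT sockets per "PID/Program name" field.
    // Assumes the field right after the state token names the owner,
    // as in the "netstat -pan" output quoted in this thread.
    static Map<String, Integer> countByProcess(List<String> netstatLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatLines) {
            String[] tokens = line.trim().split("\\s+");
            for (int i = 0; i < tokens.length; i++) {
                if (tokens[i].equals("CLOSE_WAIT")) {
                    // Fall back to "-" when netstat printed no owner.
                    String owner = (i + 1 < tokens.length) ? tokens[i + 1] : "-";
                    counts.merge(owner, 1, Integer::sum);
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample rows copied from the netstat display in this message.
        List<String> sample = Arrays.asList(
            "tcp 0 0 127.0.0.1:11002 127.0.0.1:41764 FIN_WAIT2 -",
            "tcp6 12 0 ::ffff:127.0.0.1:41711 ::ffff:127.0.0.1:11002 CLOSE_WAIT 13333/java",
            "tcp6 12 0 ::ffff:127.0.0.1:41708 ::ffff:127.0.0.1:11002 CLOSE_WAIT 13333/java",
            "tcp6 12 0 ::ffff:127.0.0.1:41764 ::ffff:127.0.0.1:11002 CLOSE_WAIT 29649/java");
        System.out.println(countByProcess(sample)); // prints {13333/java=2, 29649/java=1}
    }
}
```

Because it looks for the state token rather than a fixed column, it also copes with wrapped rows where "CLOSE_WAIT 13333/java" ends up on a line of its own.
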
>
>
> Full display of "netstat -pan | grep WAIT" :
>
> Proto Recv-Q Send-Q Local Address           Foreign Address         State      PID/Program name
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41763         TIME_WAIT  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41764         FIN_WAIT2  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41738         TIME_WAIT  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41739         FIN_WAIT2  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41741         TIME_WAIT  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41735         TIME_WAIT  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41755         TIME_WAIT  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41752         TIME_WAIT  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41753         FIN_WAIT2  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41758         TIME_WAIT  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41759         FIN_WAIT2  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41744         TIME_WAIT  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41749         TIME_WAIT  -
> tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41762  FIN_WAIT2  -
> tcp6       0      0 ::ffff:212.85.38.:11100 ::ffff:212.85.38.:41737 TIME_WAIT  -
> tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41743  TIME_WAIT  -
> tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41740  TIME_WAIT  -
> tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41734  TIME_WAIT  -
> tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41754  TIME_WAIT  -
> tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41757  TIME_WAIT  -
> tcp6       0      0 ::ffff:212.85.38.:11100 ::ffff:212.85.38.:41751 TIME_WAIT  -
> tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41748  FIN_WAIT2  -
> tcp6      12      0 ::ffff:127.0.0.1:41711  ::ffff:127.0.0.1:11002  CLOSE_WAIT 13333/java
> tcp6      12      0 ::ffff:127.0.0.1:41708  ::ffff:127.0.0.1:11002  CLOSE_WAIT 13333/java
> tcp6      12      0 ::ffff:127.0.0.1:41764  ::ffff:127.0.0.1:11002  CLOSE_WAIT 29649/java
> tcp6      12      0 ::ffff:127.0.0.1:41753  ::ffff:127.0.0.1:11002  CLOSE_WAIT 13333/java
> tcp6      12      0 ::ffff:127.0.0.1:41759  ::ffff:127.0.0.1:11002  CLOSE_WAIT 29649/java
> tcp6      12      0 ::ffff:127.0.0.1:41739  ::ffff:127.0.0.1:11002  CLOSE_WAIT 13333/java
> tcp6      12      0 ::ffff:127.0.0.1:39436  ::ffff:127.0.0.1:11002  CLOSE_WAIT 13333/java
> tcp6      12      0 ::ffff:127.0.0.1:38989  ::ffff:127.0.0.1:11002  CLOSE_WAIT 13333/java
> tcp6      12      0 ::ffff:127.0.0.1:39364  ::ffff:127.0.0.1:11002  CLOSE_WAIT 13333/java
> tcp6      12      0 ::ffff:127.0.0.1:39390  ::ffff:127.0.0.1:11002  CLOSE_WAIT 13333/java
> tcp6      12      0 ::ffff:127.0.0.1:40859  ::ffff:127.0.0.1:11002  CLOSE_WAIT 13333/java
> tcp6       1      0 ::ffff:127.0.0.1:39412  ::ffff:127.0.0.1:11101  CLOSE_WAIT 2864/java
> tcp6       1      0 ::ffff:127.0.0.1:41249  ::ffff:127.0.0.1:11101  CLOSE_WAIT 2864/java
> tcp6       1      0 ::ffff:127.0.0.1:41748  ::ffff:127.0.0.1:11101  CLOSE_WAIT 2864/java
> tcp6       1      0 ::ffff:127.0.0.1:41731  ::ffff:127.0.0.1:11101  CLOSE_WAIT 2864/java
> tcp6       1      0 ::ffff:127.0.0.1:41762  ::ffff:127.0.0.1:11101  CLOSE_WAIT 2864/java
> tcp6       0      0 ::ffff:212.85.38.176:80 ::ffff:212.85.38.:56212 TIME_WAIT  -
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org