Skimmed quickly through your post there while working, so forgive me if
this is irrelevant.

CLOSE_WAIT is the state a socket enters when the remote end has closed
its side of the connection at the tcp/ip level, but the local application
(in this case java) has not yet closed its own socket descriptor.
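
To make that concrete, here is a rough sketch using only java.net on the
loopback interface. The class and method names are invented for the
illustration and have nothing to do with your vendor's programs. The
"server" side hangs up first; until the client calls close(), its socket
is the one netstat would show in CLOSE_WAIT.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class CloseWaitDemo {
    /** Returns what read() reports on the client once the peer has hung up. */
    static int readAfterPeerClose() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
            try {
                Socket accepted = server.accept();
                // The peer closes first and sends a FIN. From here on the
                // client socket sits in CLOSE_WAIT until the client closes it.
                accepted.close();
                // End of stream is how the application learns the peer is gone.
                return client.getInputStream().read();
            } finally {
                // Without this close() the socket would linger in CLOSE_WAIT.
                client.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read() after peer close: " + readAfterPeerClose());
    }
}
```

The point is that only the application holding the descriptor can move
the socket out of CLOSE_WAIT, by closing it.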

As a coincidence we just fixed this very same issue in our application,
which uses the httpclient library.

There is a known issue with the httpclient library where sockets are not
closed after the connection ends (issue or feature, you be the judge).
We worked around this by explicitly calling a close ourselves.

If httpclient is used that could be the culprit.
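
For what "calling a close ourselves" amounts to, here is a minimal sketch
with a plain java.net.Socket. It is not the httpclient API and not our
actual fix; the names (ExplicitCloseDemo, readGreeting, lastSocket) are
made up, and the throwaway loopback server is only there so the example
runs on its own. The idea is simply that the socket is closed on every
path, whether the read succeeds or throws.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class ExplicitCloseDemo {
    /** Last socket opened by readGreeting, kept only so the demo can inspect it. */
    static Socket lastSocket;

    /** Reads one line from host:port and always closes the socket afterwards. */
    static String readGreeting(InetAddress host, int port) {
        // try-with-resources closes the reader and the socket on every exit path.
        try (Socket socket = new Socket(host, port);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII))) {
            lastSocket = socket;
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Starts a throwaway loopback server that sends one line and hangs up. */
    static String demo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread peer = new Thread(() -> {
                try (Socket s = server.accept(); OutputStream out = s.getOutputStream()) {
                    out.write("hello\n".getBytes(StandardCharsets.US_ASCII));
                } catch (IOException ignored) {
                    // Demo only: nothing useful to do if the peer side fails.
                }
            });
            peer.start();
            String line = readGreeting(server.getInetAddress(), server.getLocalPort());
            peer.join();
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo() + " / closed=" + lastSocket.isClosed());
    }
}
```

On a JVM older than Java 7 the same thing would be written with a
finally block that calls socket.close().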

See
http://www.nabble.com/tcp-connections-left-with-CLOSE_WAIT-td13757202.html
for a better description

Rgds,

Taylan

André Warnier wrote:

>
> Hi.
> As a follow-up on another thread originally entitled "apache/tomcat
> communication issues (502 response)", I'd like to pursue the
> CLOSE-WAIT subject.
>
> Sorry if this post is a bit long, I want to make sure that I do
> provide all the necessary information.
>
> Like the original poster, I am seeing on my systems a fair number of
> sockets apparently stuck for a long time in the CLOSE_WAIT state.
> (Sometimes several hundreds of them).
> They seem to predominantly concern Tomcat and other java processes,
> but as Alan pointed out previously and I confirm, my perspective is
> slanted, because we use a lot of common java programs and webapps on
> our servers, and the ones mostly affected talk to each other and come
> from the same vendor.
> Unfortunately also, I do not have the sources of these
> programs/webapps available, and will not get them, and I can't do
> without these programs.
>
> It has been previously established that a socket in a
> long-time-lingering CLOSE-WAIT status, is due to one or the other side
> of a TCP connection not properly closing its side of the connection
> when it is done with it.
> I also surmise (without having a definite proof of this), that this is
> essentially "bad", as it ties up some resources that could be
> otherwise freed.
> I have also been told or discovered that, our servers being Linux
> Debian servers, programs such as "ps", "netstat" and "lsof" can help
> in determining precisely how many such lingering sockets there are,
> and who the culprit processes are (to some extent).
>
> In our case, we know which are the programs involved, because we know
> which ones open a listening socket and on what fixed port, and we also
> know which are the other processes talking to them.
> But, as mentioned previously, we do not have the source of these
> programs and will not get them, but cannot practically do without them
> for now. But we do have full root control of the Linux servers where
> these programs are running.
>
> So my question is : considering the situation above, is there
> something I can do locally to free these lingering CLOSE_WAIT sockets,
> and under which conditions ?
> (I must admit that I am a bit lost among the myriad options of lsof)
>
> For example, suppose I start with a "netstat -pan" command and I see
> the display below (sorry for the line-wrapping).
> I see a number of sockets in the CLOSE_WAIT state, and for those I
> have a process-id, which I can associate to a particular process.
> For example, I see this line :
> tcp6      12      0 ::ffff:127.0.0.1:41764  ::ffff:127.0.0.1:11002 CLOSE_WAIT 29649/java
> which tells me that there is a local process 29649/java, with a
> "local" socket port 41764 in the CLOSE_WAIT state, related to another
> socket #11002 on the same host.
> On the other hand, I see this line :
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41764 FIN_WAIT2  -
> which shows a "local" socket on port 11002, related to this other
> local socket port #41764, with no process-id/program displayed.
> What does that tell me ?
>
> I also know that the process-id 29649 corresponds to a local java
> process, of the daemon variety, multi-threaded.  That program "talks
> to" another known server program, written in C, of which instances are
> started on an ad-hoc basis by inetd, and which "listens" on port 11002
> (in fact it is inetd who does, and it passes this socket on to the
> process it forks, I understand that).
>
> (The link with Tomcat is that I also see frequently the same
> situation, where the process "owning" the CLOSE_WAIT socket is Tomcat,
> more specifically one webapp running inside it.  It's just that in
> this particular snapshot it isn't.)
>
> What it looks like to me in this case, is that at some point one of
> the threads of process #29649 opened a client socket #41764 to the
> local inetd port #11002; that inetd then started the underlying server
> process (the C program); that the underlying C program then at some
> point exited; but that process #29649 never closed its side of the
> connection with port #11002.
> Can I somehow detect this condition, and "force" the offending thread
> of process #29649 to close that socket (or just force this thread to
> exit) ?
>
> I realise this may be a complex question, and that the answers may be
> different if it is a Tomcat webapp than a stand-alone process.  I
> would be content to just have answers for the webapp case.
>
>
> Full display of "netstat -pan | grep WAIT" :
>
> Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41763 TIME_WAIT  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41764 FIN_WAIT2  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41738 TIME_WAIT  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41739 FIN_WAIT2  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41741 TIME_WAIT  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41735 TIME_WAIT  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41755 TIME_WAIT  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41752 TIME_WAIT  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41753 FIN_WAIT2  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41758 TIME_WAIT  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41759 FIN_WAIT2  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41744 TIME_WAIT  -
> tcp        0      0 127.0.0.1:11002         127.0.0.1:41749 TIME_WAIT  -
> tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41762 FIN_WAIT2  -
> tcp6       0      0 ::ffff:212.85.38.:11100 ::ffff:212.85.38.:41737 TIME_WAIT  -
> tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41743 TIME_WAIT  -
> tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41740 TIME_WAIT  -
> tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41734 TIME_WAIT  -
> tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41754 TIME_WAIT  -
> tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41757 TIME_WAIT  -
> tcp6       0      0 ::ffff:212.85.38.:11100 ::ffff:212.85.38.:41751 TIME_WAIT  -
> tcp6       0      0 ::ffff:127.0.0.1:11101  ::ffff:127.0.0.1:41748 FIN_WAIT2  -
> tcp6      12      0 ::ffff:127.0.0.1:41711  ::ffff:127.0.0.1:11002 CLOSE_WAIT 13333/java
> tcp6      12      0 ::ffff:127.0.0.1:41708  ::ffff:127.0.0.1:11002 CLOSE_WAIT 13333/java
> tcp6      12      0 ::ffff:127.0.0.1:41764  ::ffff:127.0.0.1:11002 CLOSE_WAIT 29649/java
> tcp6      12      0 ::ffff:127.0.0.1:41753  ::ffff:127.0.0.1:11002 CLOSE_WAIT 13333/java
> tcp6      12      0 ::ffff:127.0.0.1:41759  ::ffff:127.0.0.1:11002 CLOSE_WAIT 29649/java
> tcp6      12      0 ::ffff:127.0.0.1:41739  ::ffff:127.0.0.1:11002 CLOSE_WAIT 13333/java
> tcp6      12      0 ::ffff:127.0.0.1:39436  ::ffff:127.0.0.1:11002 CLOSE_WAIT 13333/java
> tcp6      12      0 ::ffff:127.0.0.1:38989  ::ffff:127.0.0.1:11002 CLOSE_WAIT 13333/java
> tcp6      12      0 ::ffff:127.0.0.1:39364  ::ffff:127.0.0.1:11002 CLOSE_WAIT 13333/java
> tcp6      12      0 ::ffff:127.0.0.1:39390  ::ffff:127.0.0.1:11002 CLOSE_WAIT 13333/java
> tcp6      12      0 ::ffff:127.0.0.1:40859  ::ffff:127.0.0.1:11002 CLOSE_WAIT 13333/java
> tcp6       1      0 ::ffff:127.0.0.1:39412  ::ffff:127.0.0.1:11101 CLOSE_WAIT 2864/java
> tcp6       1      0 ::ffff:127.0.0.1:41249  ::ffff:127.0.0.1:11101 CLOSE_WAIT 2864/java
> tcp6       1      0 ::ffff:127.0.0.1:41748  ::ffff:127.0.0.1:11101 CLOSE_WAIT 2864/java
> tcp6       1      0 ::ffff:127.0.0.1:41731  ::ffff:127.0.0.1:11101 CLOSE_WAIT 2864/java
> tcp6       1      0 ::ffff:127.0.0.1:41762  ::ffff:127.0.0.1:11101 CLOSE_WAIT 2864/java
> tcp6       0      0 ::ffff:212.85.38.176:80 ::ffff:212.85.38.:56212 TIME_WAIT  -
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
>

