Let's circle back to the original mail:

> When I ran lsof I saw that there were a lot of TCP CLOSE_WAIT handles
> open with the regionserver as target.

Is that right? *Regionserver*, not another process (datanode or whatever)?
Or did I miss evidence somewhere in this thread confirming that a datanode
was the remote end?

If you are sure that the stuck connections are to the regionserver process
(could you pastebin the lsof output so we can double-check the port numbers
involved?), then, by definition of what CLOSE_WAIT means, the regionserver
has closed its end of the connection but the master somehow has not: the
master's socket has received the peer's FIN, but the master process never
called close() on it. HDFS settings won't matter if it is the master that is
failing to close a socket; maybe this is an IPC bug.
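To check which process the stuck sockets point at, one could tally the
CLOSE_WAIT entries in the lsof output by remote port. Below is a minimal
Python sketch, assuming output from `lsof -nP -iTCP` (numeric hosts and
ports) in lsof's default column layout; the helper name
`close_wait_by_remote_port` and the sample lines are hypothetical. Assuming
the stock defaults of that era, 50010 would be a datanode's data-transfer
port and 60020 a regionserver's RPC port, but check `dfs.datanode.address`
and `hbase.regionserver.port` in your own configuration, since either may
be overridden.

```python
from collections import Counter


def close_wait_by_remote_port(lsof_output: str) -> Counter:
    """Count CLOSE_WAIT sockets per remote port (hypothetical helper).

    Assumes `lsof -nP -iTCP` output, whose NAME column looks like
    'local_host:port->remote_host:port (STATE)'.
    """
    counts: Counter = Counter()
    for line in lsof_output.splitlines():
        # Only sockets whose state column is CLOSE_WAIT are of interest.
        if not line.rstrip().endswith("(CLOSE_WAIT)"):
            continue
        for token in line.split():
            # The endpoint token looks like 10.0.0.1:41234->10.0.0.2:50010.
            if "->" in token:
                remote = token.split("->", 1)[1]
                # rsplit so IPv6 remotes such as [::1]:50010 still yield the port.
                port = remote.rsplit(":", 1)[1]
                counts[port] += 1
                break
    return counts


# Hypothetical sample output; paste the real lsof output here instead.
sample = """\
COMMAND   PID  USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java    4242 hbase  301u  IPv4 123456      0t0  TCP 10.0.0.1:41234->10.0.0.2:50010 (CLOSE_WAIT)
java    4242 hbase  302u  IPv4 123457      0t0  TCP 10.0.0.1:41240->10.0.0.3:50010 (CLOSE_WAIT)
java    4242 hbase  303u  IPv4 123458      0t0  TCP 10.0.0.1:41300->10.0.0.2:60020 (ESTABLISHED)
"""
print(close_wait_by_remote_port(sample))  # Counter({'50010': 2})
```

If nearly all CLOSE_WAIT entries land on the datanode port rather than the
regionserver port, the remote end is a datanode, which points back at the
HDFS client side rather than HBase IPC.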



On Wed, Apr 30, 2014 at 12:38 AM, Hansi Klose <hansi.kl...@web.de> wrote:

> Hi,
>
> Sorry, I missed that :-(
>
> I tried that parameter in my hbase-site.xml and restarted the HBase master
> and all regionservers.
>
>   <property>
>     <name>dfs.client.socketcache.expiryMsec</name>
>     <value>900</value>
>   </property>
>
> No change: the CLOSE_WAIT sockets from the HBase master to the
> regionserver's datanode still persist after taking snapshots.
>
> Because it was not clear to me where the setting has to go,
> I put it in our hdfs-site.xml too and restarted all datanodes.
> I thought that dfs.client settings might have to go there.
> But this did not change the behavior either.
>
> Regards Hansi
>
> > Sent: Tuesday, 29 April 2014 at 19:21
> > From: Stack <st...@duboce.net>
> > To: Hbase-User <user@hbase.apache.org>
> > Subject: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT
> > handles on the hbase master server
> >
> > On Tue, Apr 29, 2014 at 8:15 AM, Hansi Klose <hansi.kl...@web.de> wrote:
> >
> > > Hi all,
> > >
> > > Sorry for the late answer.
> > >
> > > I configured hbase-site.xml like this:
> > >
> > >   <property>
> > >     <name>dfs.client.socketcache.capacity</name>
> > >     <value>0</value>
> > >   </property>
> > >   <property>
> > >     <name>dfs.datanode.socket.reuse.keepalive</name>
> > >     <value>0</value>
> > >   </property>
> > >
> > > and restarted the HBase master and all regionservers.
> > > I still see the same behavior. Each snapshot creates
> > > new CLOSE_WAIT sockets, which stay there until the HBase master restarts.
> > >
> > > Is there any other setting I can try?
> > >
> >
> > You saw my last suggestion about "...dfs.client.socketcache.expiryMsec to
> > 900 in your HBase client configuration.."?
> >
> > St.Ack
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
