Re: Region server not accept connections intermittently

2014-07-22 Thread Esteban Gutierrez
Thanks for keeping us updated Rural! I'm still curious why changing net.core.somaxconn in the kernel helped here if you didn't change ipc.server.listen.queue.size. Perhaps that property is in hdfs-site.xml or core-site.xml with a higher value? cheers, esteban. -- Cloudera, Inc. On Mon, Jul

Re: Region server not accept connections intermittently

2014-07-22 Thread Rural Hunter
I don't know. I think the other parameter is more important: net.core.somaxconn=1024 (originally 128), net.ipv4.tcp_synack_retries=2 (originally 5). Since I found many connections in SYN_RECV state, my purpose in changing these 2 parameters is: net.ipv4.tcp_synack_retries: Reduce the waiting
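Editor's sketch: the arithmetic below illustrates why lowering net.ipv4.tcp_synack_retries shortens the time a half-open (SYN_RECV) connection lingers. It assumes an initial retransmission timeout of 1 second that doubles after every attempt, which is a simplified model; the function name is made up for illustration and is not part of any tool mentioned in the thread.

```python
def synack_give_up_seconds(retries: int, initial_rto: float = 1.0) -> float:
    """Approximate seconds before a SYN_RECV connection is abandoned.

    Simplified model (an assumption): the first SYN-ACK is followed by
    `retries` retransmissions, and the wait doubles after each attempt.
    """
    return sum(initial_rto * 2 ** attempt for attempt in range(retries + 1))


# Compare the original setting (5) with the changed one (2).
for retries in (5, 2):
    print(retries, synack_give_up_seconds(retries))
```

Under these assumptions the wait drops from about a minute to a few seconds, so stale half-open entries are cleared much sooner.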

Re: Region server not accept connections intermittently

2014-07-21 Thread Rural Hunter
Just to update my result: since HBASE-11277 was applied, I have not seen any connection problem for a week. Before that, the connection problem occurred almost every day.

Re: Region server not accept connections intermittently

2014-07-13 Thread Esteban Gutierrez
Hello Rural, That's interesting. Unless you have changed ipc.server.listen.queue.size in the HBase Region Server (and other Hadoop daemons) to a value higher than 128, you might have worked around the issue by increasing the listen queue (globally) for a service that doesn't explicitly set the queue
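Editor's sketch: on Linux, the backlog a process passes to listen() is silently capped at net.core.somaxconn, so the effective queue length is the smaller of the two values. The helper below only models that rule; its name is hypothetical, and 128 is simply the figure quoted in the message above.

```python
def effective_backlog(requested: int, somaxconn: int) -> int:
    """Model the Linux rule that listen() backlog is capped at somaxconn."""
    return min(requested, somaxconn)


# A server that asks for 128 gains nothing from a larger somaxconn,
# while a server that asks for more is limited by a small somaxconn.
for requested, somaxconn in ((128, 128), (128, 1024), (1024, 128)):
    print(requested, somaxconn, effective_backlog(requested, somaxconn))
```

This is why raising only the kernel value would not be expected to lengthen the queue of a daemon that still requests 128.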

Re: Region server not accept connections intermittently

2014-07-13 Thread Rural Hunter
No. I didn't touch ipc.server.listen.queue.size. Anyway, my change mitigated my problem, as I stated in another thread: From observation over the 2 days since the action was taken, the frequency of the problem has been reduced. The huge improvement is that, even when the problem happens, the RS

Re: Region server not accept connections intermittently

2014-07-11 Thread Rural Hunter
One additional piece of info: I ran 'netstat -an |grep 60020' when the problem happened and saw that many connections from remote hosts to local port 60020 were in the SYN_RECV state. Not sure if that indicates anything.
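Editor's sketch: a small way to summarize 'netstat -an' output by TCP state for one local port, instead of eyeballing the grep result. It assumes the usual Linux column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the sample text and addresses are made up for illustration and are not data from this thread.

```python
from collections import Counter


def count_states(netstat_output: str, port: int) -> Counter:
    """Count TCP states for connections whose local port equals `port`."""
    counts = Counter()
    suffix = f":{port}"
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[3].endswith(suffix):
            counts[fields[5]] += 1
    return counts


# Invented sample lines in netstat's format.
SAMPLE = """\
tcp        0      0 0.0.0.0:60020      0.0.0.0:*          LISTEN
tcp        0      0 10.0.0.5:60020     10.0.0.9:41234     SYN_RECV
tcp        0      0 10.0.0.5:60020     10.0.0.8:51822     SYN_RECV
tcp        0      0 10.0.0.5:60020     10.0.0.7:40110     ESTABLISHED
tcp        0      0 10.0.0.5:22        10.0.0.7:50000     ESTABLISHED
"""

print(count_states(SAMPLE, 60020))
```

Running the same function on snapshots taken a few seconds apart would show whether the SYN_RECV count is growing or draining.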

Re: Region server not accept connections intermittently

2014-07-11 Thread Esteban Gutierrez
For how long did you notice those connections? When you say many, do you mean 1000s? The problem with having too many SYN_RECV connections is that you could end up running out of file descriptors, which makes me wonder what the maximum number of open files is that you have configured for the RS process (see all

Re: Region server not accept connections intermittently

2014-07-11 Thread Rural Hunter
The max number of open files has already been set to 32768 for the user running hbase/hadoop. I think there would be errors in the log if it were a file descriptor problem. The count of connections in SYN_RECV state is about 100. I also checked the source of those connections and they are from the hosts of
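Editor's sketch: a limit configured for a user is not always the limit a running process actually has, so one common Linux check is to read the "Max open files" row of /proc/<pid>/limits. That file is an assumption on my part, since the messages above are cut off before naming a specific check. The parser and sample text below are illustrative only.

```python
import re
from typing import Optional, Tuple


def parse_max_open_files(limits_text: str) -> Optional[Tuple[str, str]]:
    """Return (soft, hard) 'Max open files' values from /proc/<pid>/limits text.

    Values are kept as strings because a limit may read 'unlimited'.
    """
    for line in limits_text.splitlines():
        match = re.match(r"Max open files\s+(\S+)\s+(\S+)", line)
        if match:
            return match.group(1), match.group(2)
    return None


# Invented sample in the layout of /proc/<pid>/limits.
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max open files            32768                32768                files
"""

print(parse_max_open_files(SAMPLE))
```

In practice the text would come from reading the limits file of the region server's PID as a user allowed to see it.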

Re: Region server not accept connections intermittently

2014-07-10 Thread Rural Hunter
I got the dump of the problematic RS from the web UI: http://pastebin.com/4hfhkDUw Output of top -H -p PID: http://pastebin.com/LtzkScYY I also got the output of jstack, but I believe it's already in the dump, so I won't paste it again. This time the hang lasted about 20 minutes. On 2014/7/9 12:48,

Re: Region server not accept connections intermittently

2014-07-10 Thread Ted Yu
I noticed the blockSeek() call in HFileReaderV2. Did you take only one dump during the 20 minute hang ? Cheers On Jul 10, 2014, at 1:54 AM, Rural Hunter ruralhun...@gmail.com wrote: I got the dump of the problematic rs from web ui: http://pastebin.com/4hfhkDUw output of top -H -p PID:

Re: Region server not accept connections intermittently

2014-07-10 Thread Rural Hunter
Yes, I can take more if needed when it happens next time. On 2014/7/10 17:11, Ted Yu wrote: I noticed the blockSeek() call in HFileReaderV2. Did you take only one dump during the 20 minute hang? Cheers

Region server not accept connections intermittently

2014-07-08 Thread Rural Hunter
Hi, I'm using hbase-0.96.2. Sometimes my region servers don't accept connections from clients. This can last from 10 minutes to half an hour. When it happened, I was not able to connect to port 60020 even with the telnet command. After a while, the problem disappeared and the region
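Editor's sketch: the manual telnet test described above can be scripted as a plain TCP connect with a timeout, which makes it easy to log when a port stops accepting connections. The function name and the default timeout are arbitrary choices for illustration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, calling can_connect with a region server's hostname and 60020 from a cron job or a loop would record the start and end of each outage window.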

Reply: Region server not accept connections intermittently

2014-07-08 Thread 谢良
Could you try the -XX:+PrintGCApplicationStoppedTime VM parameter? A hang on the VM side is not always caused by GC. Thanks, From: Rural Hunter [ruralhun...@gmail.com] Sent: July 8, 2014 14:06 To: user@hbase.apache.org Subject: Region server not accept

Re: Reply: Region server not accept connections intermittently

2014-07-08 Thread Rural Hunter
I checked the parameter and it also seems to be a GC parameter that prints the total stop-the-world time. So will it help to find out whether the hang was not caused by GC? On 2014/7/8 14:28, 谢良 wrote: Could you try the -XX:+PrintGCApplicationStoppedTime VM parameter? A hang on the VM side is not always caused
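Editor's sketch: -XX:+PrintGCApplicationStoppedTime reports every pause in which application threads were stopped, including safepoints that are not garbage collections, which is why it can reveal hangs that GC logging alone would miss. The parser below assumes the log line wording used by HotSpot JVMs of that period ("Total time for which application threads were stopped: N seconds"); treat that format, the sample lines, and the threshold as assumptions.

```python
import re

# Assumed log line format; other JVM versions may append extra fields.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def long_pauses(log_text: str, threshold_s: float = 1.0) -> list:
    """Return stopped-time values (in seconds) at or above threshold_s."""
    pauses = [float(m.group(1)) for m in STOPPED.finditer(log_text)]
    return [p for p in pauses if p >= threshold_s]


# Invented sample lines.
SAMPLE = """\
Total time for which application threads were stopped: 0.0004210 seconds
Total time for which application threads were stopped: 12.5031470 seconds
"""

print(long_pauses(SAMPLE))
```

A long stopped time with no matching GC event in the same log would point to a non-GC safepoint as the cause.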

Re: Region server not accept connections intermittently

2014-07-08 Thread Ted Yu
Next time this happens, can you take jstack of the region server and pastebin it ? Thanks On Jul 7, 2014, at 11:06 PM, Rural Hunter ruralhun...@gmail.com wrote: Hi, I'm using hbase-0.96.2. I saw sometimes my region servers don't accept connections from clients. this could last like 10

Re: Region server not accept connections intermittently

2014-07-08 Thread Rural Hunter
OK, I will try to do that when it happens again. Thanks. On 2014/7/8 17:06, Ted Yu wrote: Next time this happens, can you take a jstack of the region server and pastebin it? Thanks

Re: Region server not accept connections intermittently

2014-07-08 Thread Esteban Gutierrez
Hello Rural, It doesn't seem to be a problem with the region server from what I can tell. The RS is not showing any message in the logs about a long pause (unless you have a non-standard log4j.properties file), and if the RS were in a very long pause due to GC or any other issue, then the master

Re: Region server not accept connections intermittently

2014-07-08 Thread Rural Hunter
No. I used the standard log4j file and there isn't any network problem from the client. I checked the web admin UI and the master still treats the slave as working. It's just that its request count is very small (about 10, while the others are several hundred). I SSHed into the slave server and I can see the

Re: Region server not accept connections intermittently

2014-07-08 Thread Esteban Gutierrez
Hi Rural, That's interesting. Since you are passing hbase.zookeeper.property.maxClientCnxns, does that mean ZK is managed by HBase? If you experience the issue again, can you try to obtain a jstack (as the user that started the hbase process, or from the RS UI at rs:port/dump if it is responsive) as

Re: Region server not accept connections intermittently

2014-07-08 Thread Rural Hunter
Hi Esteban, Yes, I use the ZK managed by HBase. I will try to get the jstack and other info when this happens again. On 2014/7/9 12:48, Esteban Gutierrez wrote: Hi Rural, That's interesting. Since you are passing hbase.zookeeper.property.maxClientCnxns, does that mean ZK is managed by HBase?