On Sun, Apr 28, 2013 at 7:37 PM, ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com> wrote:

> So you mean that this happens when the handler count is more than 5k, and
> does not happen when it is lower?  Have you repeated this behaviour?
>

> What I suspect, when you say bouncing around different states, is that maybe
> the ROOT assignment was a problem and something got messed up there.
> If the cause was the handler count, then that needs a different analysis.
>
> I think that if you can repeat the experiment and get the same behaviour,
> you can share the logs so that we can ascertain the exact problem.
>

Yes, I have repeated the behavior. The issue seems to be caused by some
odd pauses in the region server whenever I bump up the region handler
count (logs are below). I doubt the pauses are GC: this happens on startup,
with a 48GB heap and no active clients, so a collection should not take
anywhere near this long.

I can safely say this is due to bumping up the region handler count, because
I started 3 regionservers with 5000 handlers and 3 with 15000 handlers. The
ones with 15000 spun up all IPC handlers and then errored out as shown in
the logs below. These are just the logs around the error; before the error
there were a few more timeouts.

I checked the ZooKeeper servers (I have a 3-node cluster): none of them GCed
around the same time, nor were they under any kind of load.

Thanks,
Viral

Region Server Logs
2013-04-29 08:00:55,512 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=98.34 MB,
free=11.61 GB, max=11.71 GB, blocks=0, accesses=0, hits=0, hitRatio=0,
cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0,
evicted=0, evictedPerRun=NaN
2013-04-29 08:02:35,674 INFO org.apache.zookeeper.ClientCnxn: Client
session timed out, have not heard from server in 40592ms for sessionid
0x703e48a8cfd81be6, closing socket connection and attempting reconnect
2013-04-29 08:02:36,286 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server 10.152.152.84:2181. Will not attempt to
authenticate using SASL (Unable to locate a login configuration)
2013-04-29 08:02:36,287 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to 10.152.152.84:2181, initiating session
2013-04-29 08:02:36,288 INFO org.apache.zookeeper.ClientCnxn: Unable to
reconnect to ZooKeeper service, session 0x703e48a8cfd81be6 has expired,
closing socket connection
2013-04-29 08:03:16,287 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
<hostname>,60020,1367221255417:
regionserver:60020-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6
regionserver:60020-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6
received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired
        at
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:389)
        at
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
        at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
        at
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2013-04-29 08:03:16,288 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
<hostname>,60020,1367221255417: Unhandled exception:
org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
currently processing <hostname>,60020,1367221255417 as dead server
org.apache.hadoop.hbase.YouAreDeadException:
org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
currently processing <hostname>,60020,1367221255417 as dead server
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
        at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
        at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
        at
org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880)
        at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:748)
        at java.lang.Thread.run(Thread.java:662)
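
One way to locate pauses like the ones described above is to scan a region
server log for large gaps between consecutive timestamps. The following is a
minimal sketch, not an HBase tool: the function name find_gaps, the 30 second
threshold, and the shortened sample messages are assumptions made for
illustration. Only the timestamps are taken from the log excerpt above. A gap
in log output does not by itself prove the JVM was paused (the process may
simply have had nothing to log), so treat the result as a pointer for where to
look.

```python
import re
from datetime import datetime

# Matches the leading timestamp of a log line, e.g. "2013-04-29 08:02:35,674".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\b")


def find_gaps(lines, threshold_s=30.0):
    """Return (previous_ts, current_ts, gap_seconds) for each gap above threshold_s.

    find_gaps and threshold_s are hypothetical names chosen for this sketch.
    """
    gaps = []
    prev = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            # Wrapped continuation lines and stack frames carry no timestamp.
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        ts = ts.replace(microsecond=int(m.group(2)) * 1000)
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if gap > threshold_s:
                gaps.append((prev, ts, gap))
        prev = ts
    return gaps


# Timestamps copied from the excerpt above; messages shortened for the sketch.
sample = [
    "2013-04-29 08:00:55,512 DEBUG LruBlockCache: Stats",
    "2013-04-29 08:02:35,674 INFO ClientCnxn: Client session timed out",
    "2013-04-29 08:02:36,286 INFO ClientCnxn: Opening socket connection",
    "2013-04-29 08:02:36,287 INFO ClientCnxn: Socket connection established",
    "2013-04-29 08:02:36,288 INFO ClientCnxn: Unable to reconnect",
    "2013-04-29 08:03:16,287 FATAL HRegionServer: ABORTING region server",
    "        at",
    "2013-04-29 08:03:16,288 FATAL HRegionServer: ABORTING region server",
]

for prev_ts, cur_ts, gap_s in find_gaps(sample):
    print(f"{gap_s:.3f}s of silence between {prev_ts} and {cur_ts}")
```

On the timestamps above this reports two gaps, of roughly 100 seconds and 40
seconds. To run it against a full log, pass an open file object instead of
the sample list, since the function only iterates over lines.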
