Hi All,
  I was just wondering if anyone has had a chance to look at this issue, or
has experienced something similar. I'm stuck on this and would appreciate any
hints.
Many thanks,
   Martin

On Dec 27, 2010, at 11:27 AM, Martin Arnandze wrote:

> Hi,
>   I'm using HBase 0.20.5 and Hadoop 0.20.1. Some region servers are crashing, 
> reporting that a file cannot be found and that a lease has expired (log details 
> below). I searched this mailing list for the exact problem but was not 
> successful. These are the symptoms:
> 
> - Typically I see higher swap usage on these servers than on the rest (the 
> ones that didn't crash).
> - I don't see any xceiver-count-exceeded messages in the DataNode logs.
> - We don't write to WALs.
> - Log snippets and configuration settings are included below.
> 
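A quick way to confirm the swap symptom above on each server; this is a minimal sketch, and the `HRegionServer` process-name pattern is an assumption about how the daemon shows up in the process table:

```shell
# Host-wide swap usage and paging activity on a suspect server.
free -m       # the "Swap:" row shows used vs. total swap
vmstat 5 3    # si/so columns show pages swapped in/out per second

# Per-process swap for the region server JVM. VmSwap is reported in
# /proc/<pid>/status on newer kernels and may be absent on older ones.
for pid in $(pgrep -f HRegionServer); do
  grep VmSwap "/proc/$pid/status"
done
```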
> I would greatly appreciate any hints in this regard.
> 
> Many thanks,
>    Martin
> 
> ======================
> 
> These are some of my settings which I think might be relevant:
> 
> hdfs-site:
>  <property>
>    <name>dfs.datanode.max.xcievers</name>
>    <value>10000</value>
>  </property>
> 
> hbase-site:
> 
>  <property>
>    <name>hbase.regionserver.handler.count</name>
>    <value>50</value>
>  </property>
> 
>  <property>
>    <name>hbase.regionserver.global.memstore.upperLimit</name>
>    <value>0.3</value>
>  </property>
>  <property>
>    <name>hfile.block.cache.size</name>
>    <value>0.3</value>
>  </property>
> 
> 
> ulimit -n
> 100000
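For what it's worth, these limits can be compared against live usage on a node; a minimal sketch (the process-name patterns and the use of jstack are assumptions, adjust to how the daemons are launched on your hosts):

```shell
# Open file descriptors held by the region server, to compare with ulimit -n.
RS_PID=$(pgrep -f HRegionServer | head -n 1)
[ -n "$RS_PID" ] && ls "/proc/$RS_PID/fd" | wc -l

# Active DataXceiver threads in the DataNode JVM, to compare with
# dfs.datanode.max.xcievers (10000 here). jstack ships with the JDK.
DN_PID=$(pgrep -f DataNode | head -n 1)
[ -n "$DN_PID" ] && jstack "$DN_PID" | grep -c DataXceiver
```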
> 
> 
> LOGS:  
> 
> Region Server:
> 
> 2010-12-27 08:35:31,932 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores
> 2010-12-27 08:35:32,426 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://ls-nn05.netw.domain01.net:50001/hbase/usertrails_all/1777872268/d/2971304353461036329, entries=22319, sequenceid=1717292326, memsize=16.1m, filesize=13.5m to usertrails_all,1b005e71-cd19-4d82-87a3-3fa269f4f1d1\x019223370750718307807\x015692970,1292964088406
> 2010-12-27 08:35:32,426 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Caches flushed, doing commit now (which includes update scanners)
> 2010-12-27 08:35:32,426 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~16.1m for region usertrails_all,1b005e71-cd19-4d82-87a3-3fa269f4f1d1\x019223370750718307807\x015692970,1292964088406 in 494ms, sequence id=1717292326, compaction requested=false
> 2010-12-27 08:35:32,427 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region, no outstanding scanners on usertrails_all,1b005e71-cd19-4d82-87a3-3fa269f4f1d1\x019223370750718307807\x015692970,1292964088406
> 2010-12-27 08:35:32,427 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No more row locks outstanding on region usertrails_all,1b005e71-cd19-4d82-87a3-3fa269f4f1d1\x019223370750718307807\x015692970,1292964088406
> 2010-12-27 08:35:32,427 DEBUG org.apache.hadoop.hbase.regionserver.Store: closed d
> 2010-12-27 08:35:32,427 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed usertrails_all,1b005e71-cd19-4d82-87a3-3fa269f4f1d1\x019223370750718307807\x015692970,1292964088406
> 2010-12-27 08:35:32,427 DEBUG org.apache.hadoop.hbase.regionserver.HLog: closing hlog writer in hdfs://ls-nn05.netw.domain01.net:50001/hbase/.logs/ls-hdfs24.netw.domain01.net,60020,1292875777337
> 2010-12-27 08:35:32,497 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase/.logs/ls-hdfs24.netw.domain01.net,60020,1292875777337/hlog.dat.1293458980959 File does not exist. Holder DFSClient_52782511 does not have any open files.
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1328)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1319)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1247)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:434)
>         at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>         at org.apache.hadoop.ipc.Client.call(Client.java:739)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>         at $Proxy1.addBlock(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at $Proxy1.addBlock(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2906)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2788)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2078)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2264)
> 
> 2010-12-27 08:35:32,497 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
> 2010-12-27 08:35:32,497 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/hbase/.logs/ls-hdfs24.netw.domain01.net,60020,1292875777337/hlog.dat.1293458980959" - Aborting...
> 2010-12-27 08:35:32,499 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Close and delete failed
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase/.logs/ls-hdfs24.netw.domain01.net,60020,1292875777337/hlog.dat.1293458980959 File does not exist. Holder DFSClient_52782511 does not have any open files.
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1328)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1319)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1247)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:434)
>         at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> 
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>         at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
>         at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:626)
>         at java.lang.Thread.run(Thread.java:619)
> 2010-12-27 08:35:32,500 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: ls-hdfs24.netw.domain01.net,60020,1292875777337
> 2010-12-27 08:35:32,508 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 
> ls-hdfs24.netw.domain01.net,60020,1292875777337
> 2010-12-27 08:35:32,509 INFO org.apache.zookeeper.ZooKeeper: Closing session: 
> 0x2d05625f9b0008
> 2010-12-27 08:35:32,509 INFO org.apache.zookeeper.ClientCnxn: Closing 
> ClientCnxn for session: 0x2d05625f9b0008
> 2010-12-27 08:35:32,512 INFO org.apache.zookeeper.ClientCnxn: Disconnecting 
> ClientCnxn for session: 0x2d05625f9b0008
> 2010-12-27 08:35:32,512 INFO org.apache.zookeeper.ZooKeeper: Session: 
> 0x2d05625f9b0008 closed
> 2010-12-27 08:35:32,512 DEBUG 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with 
> ZooKeeper
> 2010-12-27 08:35:32,512 INFO org.apache.zookeeper.ClientCnxn: EventThread 
> shut down
> 2010-12-27 08:35:32,614 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> regionserver/10.32.20.70:60020 exiting
> 2010-12-27 08:35:49,448 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 
> -4947199215968218273 lease expired
> 2010-12-27 08:35:49,538 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 
> 3282292946771297386 lease expired
> 2010-12-27 08:35:49,582 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 
> 7882165461457343907 lease expired
> 2010-12-27 08:35:49,582 INFO org.apache.hadoop.hbase.Leases: 
> regionserver/10.32.20.70:60020.leaseChecker closing leases
> 2010-12-27 08:35:49,582 INFO org.apache.hadoop.hbase.Leases: 
> regionserver/10.32.20.70:60020.leaseChecker closed leases
> 2010-12-27 08:35:49,607 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
> 2010-12-27 08:35:49,607 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
> 
> 
> 
> 
> Namenode:
> 
> 
> 2010-12-27 08:35:32,427 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 50001, call addBlock(/hbase/.logs/ls-hdfs24.netw.domain01.net,60020,1292875777337/hlog.dat.1293458980959, DFSClient_52782511) from 10.32.20.70:44967: error: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase/.logs/ls-hdfs24.netw.domain01.net,60020,1292875777337/hlog.dat.1293458980959 File does not exist. Holder DFSClient_52782511 does not have any open files.
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase/.logs/ls-hdfs24.netw.domain01.net,60020,1292875777337/hlog.dat.1293458980959 File does not exist. Holder DFSClient_52782511 does not have any open files.
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1328)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1319)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1247)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:434)
>         at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>         
>         
>         
> Datanode:
> 
> Nothing too unusual; just some entries like the following:
> 
> 2010-12-27 08:35:52,519 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.32.20.70:50010, storageID=DS-2137188581-173.192.229.228-50010-1273698157328, infoPort=50075, ipcPort=50020):DataXceiver
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:168)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>         at java.io.DataInputStream.read(DataInputStream.java:132)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:261)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:308)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>         at java.lang.Thread.run(Thread.java:619)
> 2010-12-27 09:35:52,728 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-4708329089501359494_30620223 received exception java.net.SocketException: Connection reset
> 
> 
