Couple of things:
- Can you check the resources on the region server for which you get the lease exception? It seems like the server is heavily thrashed.
- What are your values for scan.setCaching and scan.setBatch?
The "lease does not exist" exception generally happens when the client goes back to the region server after the lease expires (in your case, after 900000 ms). If your setCaching value is really high, for example, the client gets enough data in one call to scanner.next that it keeps processing it for more than 900000 ms; when it eventually goes back to the region server, the lease on the region server has already expired. Setting your setCaching value lower might help in this case.

Regards,
Dhaval

________________________________
From: Ameya Kanitkar <am...@groupon.com>
To: user@hbase.apache.org
Sent: Wednesday, 28 August 2013 11:00 AM
Subject: Lease Exception Errors When Running Heavy Map Reduce Job

Hi All,

We have a very heavy map reduce job that goes over an entire table with 1TB+ of data in HBase and exports all data (similar to the Export job, but with some additional custom code built in) to HDFS.

However, this job is not very stable, and oftentimes we get the following error and the job fails:

org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4456594242606811626' does not exist
    at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2429)
    at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.
Here are more detailed logs on the RS: http://pastebin.com/xaHF4ksb

We have changed the following settings in HBase to counter this problem, but the issue persists:

<property>
  <!-- Loaded from hbase-site.xml -->
  <name>hbase.regionserver.lease.period</name>
  <value>900000</value>
</property>
<property>
  <!-- Loaded from hbase-site.xml -->
  <name>hbase.rpc.timeout</name>
  <value>900000</value>
</property>

We also reduced the number of mappers per RS to fewer than the available CPUs on the box.

We also observed that once the problem happens, it happens multiple times on the same RS. All other regions are unaffected. But different RSs observe this problem on different days. There is no particular region causing this either.

We are running: 0.94.2 with cdh4.2.0

Any ideas?

Ameya
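
As a rough illustration of the caching/lease relationship described in the reply at the top of this thread, the sketch below checks whether one cached batch can be processed before the scanner lease runs out. It is plain arithmetic and does not call HBase. The class name LeaseBudget, its methods, the per-row processing time (200 ms) and the caching value (10000) are hypothetical; only the 900000 ms lease period comes from the thread. It also simplifies: it ignores RPC time and GC pauses and assumes every row costs the same.

```java
// Hypothetical helper, not part of HBase: estimates whether a scanner's
// cached batch can be processed before the region server lease expires.
final class LeaseBudget {
    private LeaseBudget() {}

    /** Worst-case time the client spends on one cached batch before its next call to the server. */
    static long batchProcessingMillis(int caching, long millisPerRow) {
        return Math.multiplyExact((long) caching, millisPerRow);
    }

    /** True if the client would go back to the region server only after the lease period has passed. */
    static boolean exceedsLease(long leasePeriodMillis, int caching, long millisPerRow) {
        return batchProcessingMillis(caching, millisPerRow) > leasePeriodMillis;
    }

    /** Largest caching value whose batch still fits inside the lease period (never below 1). */
    static int maxSafeCaching(long leasePeriodMillis, long millisPerRow) {
        if (millisPerRow <= 0) {
            throw new IllegalArgumentException("millisPerRow must be positive");
        }
        long rows = leasePeriodMillis / millisPerRow;
        return (int) Math.max(1L, Math.min(rows, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long leasePeriodMillis = 900_000L; // hbase.regionserver.lease.period from the thread
        long millisPerRow = 200L;          // assumed cost of the custom per-row export code
        int caching = 10_000;              // assumed value passed to scan.setCaching

        System.out.println("batch time (ms): " + batchProcessingMillis(caching, millisPerRow));
        System.out.println("exceeds lease:   " + exceedsLease(leasePeriodMillis, caching, millisPerRow));
        System.out.println("max safe caching: " + maxSafeCaching(leasePeriodMillis, millisPerRow));
    }
}
```

In practice the per-row cost would have to be measured in the mapper, and a caching value well below the computed maximum leaves headroom for slow rows and a busy region server.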