Any ideas? Anyone?

On Wed, Aug 28, 2013 at 9:36 AM, Ameya Kanitkar <am...@groupon.com> wrote:

> Thanks for your response.
>
> I checked the namenode logs and found the following:
>
> 2013-08-28 15:25:24,025 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: recoverLease: recover lease [Lease.  Holder: DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25, pendingcreates: 1], src=/hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1377700014053-splitting/smartdeals-hbase14-snc1.snc1%2C60020%2C1377700014053.1377700015413 from client DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25
> 2013-08-28 15:25:24,025 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25, pendingcreates: 1], src=/hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1377700014053-splitting/smartdeals-hbase14-snc1.snc1%2C60020%2C1377700014053.1377700015413
> 2013-08-28 15:25:24,025 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed.
>
> There are LeaseException errors on the namenode as well:
> http://pastebin.com/4feVcL1F
> Not sure why it's happening.
>
> I do not think I am hitting any timeouts, as my jobs fail within a couple
> of minutes, while all my timeouts are 10+ minutes.
> Not sure why the above would
>
> Ameya
>
>
>
> On Wed, Aug 28, 2013 at 9:00 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> From the log you posted on pastebin, I see the following.
>> Can you check the namenode log to see what went wrong?
>>
>>     Caused by: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1376944419197/smartdeals-hbase14-snc1.snc1%2C60020%2C1376944419197.1377699297514 File does not exist. [Lease.  Holder: DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1376944419197_-413917755_25, pendingcreates: 1]
>>
>> On Wed, Aug 28, 2013 at 8:00 AM, Ameya Kanitkar <am...@groupon.com> wrote:
>>
>> > Hi all,
>> >
>> > We have a very heavy MapReduce job that scans an entire HBase table
>> > holding more than 1 TB of data and exports all of it to HDFS (similar to
>> > the Export job, but with some additional custom code built in).
>> >
>> > However, this job is not very stable: we often get the following error,
>> > and the job fails:
>> >
>> > org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4456594242606811626' does not exist
>> >         at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2429)
>> >         at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >         at java.lang.reflect.Method.invoke(Method.java:597)
>> >         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400)
>> >
>> >         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> >         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>> >         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>> >         at java.lang.reflect.Constructor.newInstance(Constructor.
>> >
>> >
>> > Here are more detailed logs on the RS: http://pastebin.com/xaHF4ksb
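The trace above shows the exception being thrown from `Leases.removeLease` inside `HRegionServer.next`. The sketch below is a toy model of that pattern, not HBase's actual `Leases` class: the class `ToyLeases` and its methods are hypothetical. It assumes that `next()` takes the scanner's lease out of the registry while it works and puts it back when it finishes, so any second `removeLease` for the same scanner before the lease is re-added fails with the same "does not exist" message. Whether such an overlapping call (for example, a client retry) actually happens on this cluster is an assumption the logs in this thread do not confirm. Time-based expiry is deliberately left out to keep the sketch small.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a scanner lease registry; NOT HBase's Leases class.
final class ToyLeases {
    // Hypothetical exception carrying the same message text seen in the RS log.
    static final class LeaseException extends RuntimeException {
        LeaseException(String msg) {
            super(msg);
        }
    }

    // Lease name (scanner id) -> expiry timestamp in milliseconds.
    private final Map<String, Long> leases = new HashMap<>();
    private final long periodMs;

    ToyLeases(long periodMs) {
        this.periodMs = periodMs;
    }

    // (Re)registers a lease that would expire periodMs after nowMs.
    // Expiry itself is not enforced in this sketch.
    void addLease(String name, long nowMs) {
        leases.put(name, nowMs + periodMs);
    }

    // Takes the lease out of the registry and returns its expiry time;
    // fails if no lease is currently held under this name.
    long removeLease(String name) {
        Long expiry = leases.remove(name);
        if (expiry == null) {
            throw new LeaseException("lease '" + name + "' does not exist");
        }
        return expiry;
    }

    public static void main(String[] args) {
        ToyLeases leases = new ToyLeases(900_000L); // hbase.regionserver.lease.period from the post
        String scanner = "-4456594242606811626";   // lease name from the RS log

        leases.addLease(scanner, 0L);    // scanner opened
        leases.removeLease(scanner);     // first next() call takes the lease while it works
        try {
            leases.removeLease(scanner); // assumed overlapping next() for the same scanner
        } catch (LeaseException e) {
            System.out.println(e.getMessage()); // prints: lease '-4456594242606811626' does not exist
        }
        leases.addLease(scanner, 1_000L); // first call finishes and puts the lease back
    }
}
```

In this model the lease period plays no part in the failure: the error comes purely from the lease being absent at the moment of the second call, which is one way a job could fail long before any 15-minute limit is reached.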
>> >
>> > We have changed the following settings in HBase to counter this problem,
>> > but the issue persists:
>> >
>> > <property>
>> > <!-- Loaded from hbase-site.xml -->
>> > <name>hbase.regionserver.lease.period</name>
>> > <value>900000</value>
>> > </property>
>> >
>> > <property>
>> > <!-- Loaded from hbase-site.xml -->
>> > <name>hbase.rpc.timeout</name>
>> > <value>900000</value>
>> > </property>
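Both values above are in milliseconds, so 900000 ms is 15 minutes. The small check below compares that against the later observation in this thread that jobs fail within a couple of minutes. The class name `TimeoutCheck` is hypothetical, and "a couple of minutes" is assumed to mean 2 minutes. It only compares wall-clock numbers; it does not verify that the running region servers and map tasks actually loaded these settings.

```java
import java.util.concurrent.TimeUnit;

// Compares the configured timeouts from the post with the observed time to failure.
final class TimeoutCheck {
    static final long LEASE_PERIOD_MS = 900_000L; // hbase.regionserver.lease.period
    static final long RPC_TIMEOUT_MS = 900_000L;  // hbase.rpc.timeout

    // True if elapsedMs has reached the shorter of the two timeouts,
    // i.e. a timeout could plausibly have fired by then.
    static boolean couldBeTimeout(long elapsedMs) {
        return elapsedMs >= Math.min(LEASE_PERIOD_MS, RPC_TIMEOUT_MS);
    }

    public static void main(String[] args) {
        // Assumption: "a couple of minutes" taken as 2 minutes.
        long observedMs = TimeUnit.MINUTES.toMillis(2);
        System.out.println("lease period: " + TimeUnit.MILLISECONDS.toMinutes(LEASE_PERIOD_MS) + " min"); // 15 min
        System.out.println("rpc timeout: " + TimeUnit.MILLISECONDS.toMinutes(RPC_TIMEOUT_MS) + " min");   // 15 min
        System.out.println("could be a timeout: " + couldBeTimeout(observedMs));                          // false
    }
}
```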
>> >
>> >
>> > We also reduced the number of mappers per RS to fewer than the number of
>> > available CPUs on the box.
>> >
>> > We also observed that once the problem happens, it happens multiple times
>> > on the same RS. All other regions are unaffected. However, a different RS
>> > hits this problem on different days, and no particular region is causing
>> > it either.
>> >
>> > We are running 0.94.2 with cdh4.2.0.
>> >
>> > Any ideas?
>> >
>> >
>> > Ameya
>> >
>>
>
>
