Tianying: Have you checked the audit log on the namenode for a deletion event corresponding to the files involved in the LeaseExpiredException?
Cheers

On Wed, Apr 30, 2014 at 10:44 AM, Tianying Chang <[email protected]> wrote:

> This time the re-run passed (although with many failed/retried tasks) with my
> bandwidth throttle set to 200M (although per iftop, it never got close to that
> number). Is there a way to increase the lease expiry time for a low throttle
> bandwidth on an individual export job?
>
> Thanks
> Tian-Ying
>
> On Wed, Apr 30, 2014 at 10:17 AM, Tianying Chang <[email protected]> wrote:
>
>> Yes, I am using the bandwidth throttle feature. The export job for this
>> table actually succeeded on its first run. When I rerun it (for my
>> robustness testing) it never seems to pass. I am wondering if it has some
>> weird state (I did clean up the target cluster and even removed the
>> /hbase/.archive/rich_pin_data_v1 folder).
>>
>> It seems that even if I set the throttle value really large, it still
>> fails. And I think even after I put back the jar without the throttle, the
>> re-run still fails.
>>
>> Is there some way that I can increase the lease to be very large to test
>> it out?
>>
>> On Wed, Apr 30, 2014 at 10:02 AM, Matteo Bertozzi <[email protected]> wrote:
>>
>>> The file is the file being exported, so you are creating that file.
>>> Do you have the bandwidth throttle on?
>>>
>>> I'm thinking that the file is being written slowly: e.g. write(few
>>> bytes), wait, write(few bytes), and during the wait your lease expires.
>>> Something like that can also happen if your MR job is stuck in some way
>>> (slow machine or similar) and is not writing within the lease timeout.
>>>
>>> Matteo
>>>
>>> On Wed, Apr 30, 2014 at 9:53 AM, Tianying Chang <[email protected]> wrote:
>>>
>>>> We are using Hadoop 2.0.0-cdh4.2.0 and HBase 0.94.7. We also backported
>>>> several snapshot-related JIRAs, e.g. HBASE-10111 (verify snapshot) and
>>>> HBASE-11083 (bandwidth throttle in ExportSnapshot).
>>>>
>>>> I found that when the LeaseExpiredException was first reported, the file
>>>> indeed was not there, and the map task retried. I verified a couple of
>>>> minutes later that the HFile does exist under /.archive. But the retried
>>>> map task still complains with the same file-does-not-exist error...
>>>>
>>>> I will check the namenode log for the LeaseExpiredException.
>>>>
>>>> Thanks
>>>>
>>>> Tian-Ying
>>>>
>>>> On Wed, Apr 30, 2014 at 9:33 AM, Ted Yu <[email protected]> wrote:
>>>>
>>>>> Can you give us the hbase and hadoop releases you're using?
>>>>>
>>>>> Can you check the namenode log around the time the
>>>>> LeaseExpiredException was encountered?
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Wed, Apr 30, 2014 at 9:20 AM, Tianying Chang <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> When I export a large table with 460+ regions, I see the
>>>>>> exportSnapshot job fail sometimes (not all the time). The error from
>>>>>> the map task is below. But I verified the file highlighted below, and
>>>>>> it does exist. Smaller tables always seem to pass. Any idea? Is it
>>>>>> because the table is too big and gets a session timeout?
>>>>>>
>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>> No lease on
>>>>>> /hbase/.archive/rich_pin_data_v1/7713d5331180cb610834ba1c4ebbb9b3/d/eef3642f49244547bb6606d4d0f15f1f
>>>>>> File does not exist. Holder DFSClient_NONMAPREDUCE_279781617_1 does
>>>>>> not have any open files.
>>>>>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2396)
>>>>>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2387)
>>>>>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2183)
>>>>>>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:481)
>>>>>>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
>>>>>>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
>>>>>>     at org.apache.hadoop.ipc.ProtobufR
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Tian-Ying
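[Editor's note] The audit-log check suggested at the top of the thread can be scripted roughly as follows. This is a hedged sketch: the audit log path is an assumption (a common CDH4 default), and the HFile name is the one from the exception; adjust both for your deployment.

```shell
# Sketch, assuming the HDFS audit log lives at the path below (adjust for
# your deployment). The HFile name is taken from the LeaseExpiredException.
AUDIT_LOG=${AUDIT_LOG:-/var/log/hadoop-hdfs/hdfs-audit.log}
HFILE=eef3642f49244547bb6606d4d0f15f1f

# Look for delete/rename events touching the missing HFile around the time
# the LeaseExpiredException was reported.
grep -E 'cmd=(delete|rename)' "$AUDIT_LOG" | grep "$HFILE"
```

If a `cmd=delete` line shows up for that path, something (e.g. snapshot cleanup on the target cluster) removed the file out from under the writer, which would explain both the missing file and the dead lease holder.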
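[Editor's note] Matteo's hypothesis (the throttle stretching the gap between writes past the lease period) can be sanity-checked with back-of-the-envelope arithmetic. All numbers below are illustrative assumptions, not values read from this cluster:

```shell
# Illustrative assumptions only: chunk size, throttle, and the HDFS default
# soft lease period. If the gap between successive writes exceeds the lease
# period, another client could reclaim the lease mid-copy.
CHUNK_MB=64          # assume the copier writes in 64 MB chunks
THROTTLE_MBPS=1      # assume a 1 MB/s bandwidth cap
SOFT_LEASE_SEC=60    # HDFS default soft lease period

GAP_SEC=$((CHUNK_MB / THROTTLE_MBPS))
echo "gap between writes: ${GAP_SEC}s vs soft lease: ${SOFT_LEASE_SEC}s"
```

Under these made-up numbers the write gap (64s) already exceeds the soft lease period (60s), which is the shape of failure Matteo describes; a higher throttle shrinks the gap.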
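[Editor's note] For readers reproducing the re-run being discussed, a throttled export invocation looks roughly like the following. The snapshot name, destination, and mapper count are placeholders; `-bandwidth` is the option added by the HBASE-11083 work (MB/s in later upstream releases — the backported version here may differ, so treat the flag and units as assumptions to verify against your jar):

```shell
# Placeholder snapshot name and destination; -bandwidth semantics assumed
# to match the HBASE-11083 throttle (MB/s).
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot rich_pin_data_v1_snapshot \
  -copy-to hdfs://target-cluster:8020/hbase \
  -mappers 16 \
  -bandwidth 200
```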
