Varun:
Thanks for trying out HBASE-8354.

Can you move the text in the Environment section of HBASE-8389 to the Description?

If you have a patch for HBASE-8389, can you upload it?

Cheers

On Sun, Apr 21, 2013 at 10:38 AM, Varun Sharma <va...@pinterest.com> wrote:

> Hi Ted, Nicolas,
>
> Thanks for the comments. We found some issues with lease recovery and I
> patched HBASE-8354 to ensure we don't see data loss. Could you please look
> at HDFS-4721 and HBASE-8389?
>
> Thanks
> Varun
>
>
> On Sat, Apr 20, 2013 at 10:52 AM, Varun Sharma <va...@pinterest.com>
> wrote:
>
> > The important thing to note is that the block for this rogue WAL is in
> > UNDER_RECOVERY state. I have repeatedly asked the HDFS devs whether the stale
> > node logic kicks in correctly for UNDER_RECOVERY blocks, but have not gotten an answer.
> >
> >
> > On Sat, Apr 20, 2013 at 10:47 AM, Varun Sharma <va...@pinterest.com
> >wrote:
> >
> >> Hi Nicolas,
> >>
> >> Regarding the following, I think this is not a recovery - the file below
> >> is an HFile and is being accessed on a get request. On this cluster, I
> >> don't have block locality. I see these exceptions for a while and then they
> >> are gone, which means the stale node handling does kick in.
> >>
> >> 2013-04-19 00:27:28,432 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /10.156.194.94:50010 for file /hbase/feeds/1479495ad2a02dceb41f093ebc29fe4f/home/02f639bb43944d4ba9abcf58287831c0 for block
> >>
> >> This is the real bummer. The stale datanode is still listed first even 90 seconds afterwards.
> >>
> >> *2013-04-19 00:28:35*,777 WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of hdfs://ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141 failed, returning error
> >> java.io.IOException: Cannot obtain block length for LocatedBlock{BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056; getBlockSize()=0; corrupt=false; offset=0; locs=*[10.156.194.94:50010, 10.156.192.106:50010, 10.156.195.38:50010]}*
> >>     at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:238)
> >>     at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:182)
> >>     at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:124)
> >>     at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:117)
> >>     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1080)
> >>     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:245)
> >>     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:78)
> >>     at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1787)
> >>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62)
> >>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1707)
> >>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
> >>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
> >>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:175)
> >>     at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:717)
> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:821)
> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:734)
> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
> >>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:348)
> >>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:111)
> >>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:264)
> >>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:195)
> >>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:163)
> >>     at java.lang.Thread.run(Thread.java:662)
> >>
> >>
> >>
> >> On Sat, Apr 20, 2013 at 1:16 AM, Nicolas Liochon <nkey...@gmail.com
> >wrote:
> >>
> >>> Hi,
> >>>
> >>> I looked at it again with a fresh eye. As Varun was saying, the root cause
> >>> is the wrong order of the block locations.
> >>>
> >>> The root cause of the root cause is actually simple: HBase started the
> >>> recovery while the node was not yet stale from an HDFS pov.
> >>>
> >>> Varun mentioned this timing:
> >>> Lost Beat: 27:30
> >>> Became stale: 27:50 - * this is a guess, reverse engineered (stale timeout 20 seconds)
> >>> Became dead: 37:51
> >>>
> >>> But the recovery started at 27:13 (15 seconds before we have this log
> >>> line):
> >>> 2013-04-19 00:27:28,432 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /10.156.194.94:50010 for file /hbase/feeds/1479495ad2a02dceb41f093ebc29fe4f/home/02f639bb43944d4ba9abcf58287831c0 for block BP-696828882-10.168.7.226-1364886167971:blk_-5977178030490858298_99853:java.net.SocketTimeoutException: 15000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.156.194.94:50010]
> >>>
> >>> So when we took the blocks from the NN, the datanode was not stale, so you
> >>> have the wrong (random) order.
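> >>>
> >>> To make that concrete: the NN only deprioritizes a datanode once it has
> >>> missed heartbeats for the whole stale interval, and only if stale reads are
> >>> enabled. A minimal sketch of the settings I'm assuming are in play here
> >>> (key names as I remember them from HDFS-3703, please double check):
> >>>
> >>>     // Sketch only: the stale-node read settings assumed in this discussion.
> >>>     import org.apache.hadoop.conf.Configuration;
> >>>
> >>>     public class StaleReadSettings {
> >>>       public static void main(String[] args) {
> >>>         Configuration conf = new Configuration();
> >>>         // Ask the NN to order stale datanodes last when returning block locations for reads.
> >>>         conf.setBoolean("dfs.namenode.avoid.read.stale.datanode", true);
> >>>         // A node only counts as stale after this many ms without a heartbeat (20s in this cluster).
> >>>         conf.setLong("dfs.namenode.stale.datanode.interval", 20000L);
> >>>         System.out.println("stale after "
> >>>             + conf.getLong("dfs.namenode.stale.datanode.interval", 30000L)
> >>>             + " ms without a heartbeat");
> >>>       }
> >>>     }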
> >>>
> >>> ZooKeeper can expire a session before the timeout. I don't know why it
> >>> does this in this case, but I don't consider it a ZK bug: if ZK knows that
> >>> a node is dead, it's its role to expire the session. There is something
> >>> more fishy: we started the recovery while the datanode was still responding
> >>> to heartbeats. I don't know why. Maybe the OS was able to kill -15 the RS
> >>> before vanishing away.
> >>>
> >>> Anyway, we then have an exception when we try to connect, because the RS
> >>> does not have a TCP connection to this datanode. And this is retried many
> >>> times.
> >>>
> >>> You would not have this with trunk, because HBASE-6435 reorders the
> >>> blocks inside the client, using information not available to the NN,
> >>> excluding the datanode of the region server under recovery.
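> >>>
> >>> Conceptually the reordering is something like the sketch below (illustrative
> >>> only, not the actual HBASE-6435 code; the class and method names are made up):
> >>>
> >>>     // Illustrative sketch: push the replica hosted on the dead RS's machine
> >>>     // to the end of the location list before reading its WAL.
> >>>     import java.util.ArrayList;
> >>>     import java.util.Arrays;
> >>>     import java.util.List;
> >>>
> >>>     public class ReorderWalLocations {
> >>>       static List<String> reorder(List<String> locations, String deadRsHost) {
> >>>         List<String> preferred = new ArrayList<String>();
> >>>         List<String> last = new ArrayList<String>();
> >>>         for (String loc : locations) {
> >>>           // The WAL replica living on the crashed region server's host is the
> >>>           // one most likely to be unreachable, so try it last.
> >>>           if (loc.startsWith(deadRsHost + ":")) {
> >>>             last.add(loc);
> >>>           } else {
> >>>             preferred.add(loc);
> >>>           }
> >>>         }
> >>>         preferred.addAll(last);
> >>>         return preferred;
> >>>       }
> >>>
> >>>       public static void main(String[] args) {
> >>>         List<String> locs = Arrays.asList(
> >>>             "10.156.194.94:50010", "10.156.192.106:50010", "10.156.195.38:50010");
> >>>         // Prints [10.156.192.106:50010, 10.156.195.38:50010, 10.156.194.94:50010]
> >>>         System.out.println(reorder(locs, "10.156.194.94"));
> >>>       }
> >>>     }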
> >>>
> >>> Some conclusions:
> >>>  - we should likely backport HBASE-6435 to 0.94.
> >>>  - I will revive HDFS-3706 and HDFS-3705 (the non-hacky way to get
> >>> HBASE-6435).
> >>>  - There is some stuff that could be better in HDFS. I will see.
> >>>  - I'm worried by the SocketTimeoutException. We should get NoRouteToHost
> >>> at some point, and we don't. That's also why it takes ages. I think it's an
> >>> AWS thing, but it brings two issues: it's slow, and, in HBase, you don't know
> >>> if the operation could have been executed or not, so it adds complexity to
> >>> some scenarios. If someone with enough network and AWS knowledge could
> >>> clarify this point it would be great.
> >>>
> >>>  Cheers,
> >>>
> >>>  Nicolas
> >>>
> >>>
> >>> On Fri, Apr 19, 2013 at 10:10 PM, Varun Sharma <va...@pinterest.com>
> >>> wrote:
> >>>
> >>> > This is 0.94.3 hbase...
> >>> >
> >>> >
> >>> > On Fri, Apr 19, 2013 at 1:09 PM, Varun Sharma <va...@pinterest.com>
> >>> wrote:
> >>> >
> >>> > > Hi Ted,
> >>> > >
> >>> > > I had a long offline discussion with Nicolas on this. Looks like the
> >>> > > last block, which was still being written to, took an enormously long
> >>> > > time to recover. Here's what happened.
> >>> > > a) Master creates split tasks and region servers process them
> >>> > > b) Region server tries to recover the lease for each WAL log - in most
> >>> > > cases this is a noop since they are already rolled over/finalized
> >>> > > c) Lease recovery for the last file takes some time since the crashing
> >>> > > server was writing to it and held a lease on it - but basically we have
> >>> > > the lease 1 minute after the server was lost (rough sketch of that
> >>> > > recovery loop below)
> >>> > > d) Now we start the recovery for this but we end up hitting the stale
> >>> > > datanode, which is puzzling.
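> >>> > >
> >>> > > For (c), the lease recovery on the RS side boils down to something like
> >>> > > the loop below - a minimal sketch assuming the
> >>> > > DistributedFileSystem.recoverLease(Path) API, not the exact HBase code:
> >>> > >
> >>> > >     // Rough sketch of the lease-recovery polling described in (c).
> >>> > >     import org.apache.hadoop.conf.Configuration;
> >>> > >     import org.apache.hadoop.fs.FileSystem;
> >>> > >     import org.apache.hadoop.fs.Path;
> >>> > >     import org.apache.hadoop.hdfs.DistributedFileSystem;
> >>> > >
> >>> > >     public class RecoverWalLease {
> >>> > >       public static void main(String[] args) throws Exception {
> >>> > >         Path wal = new Path(args[0]);  // path of a WAL in the -splitting directory
> >>> > >         DistributedFileSystem dfs =
> >>> > >             (DistributedFileSystem) FileSystem.get(new Configuration());
> >>> > >         // recoverLease() returns true once the file is closed; keep nudging
> >>> > >         // the NN until then. Finalized WALs return true right away; the last
> >>> > >         // WAL, still under construction, is the one that takes time.
> >>> > >         while (!dfs.recoverLease(wal)) {
> >>> > >           Thread.sleep(1000);
> >>> > >         }
> >>> > >         System.out.println("lease recovered for " + wal);
> >>> > >       }
> >>> > >     }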
> >>> > >
> >>> > > It seems that we did not hit the stale datanode when we were trying to
> >>> > > recover the finalized WAL blocks with trivial lease recovery. However,
> >>> > > for the final block, we hit the stale datanode. Any clue why this might
> >>> > > be happening?
> >>> > >
> >>> > > Varun
> >>> > >
> >>> > >
> >>> > > On Fri, Apr 19, 2013 at 10:40 AM, Ted Yu <yuzhih...@gmail.com>
> >>> wrote:
> >>> > >
> >>> > >> Can you show a snippet from the DN log which mentioned UNDER_RECOVERY?
> >>> > >>
> >>> > >> Here are the criteria for stale node checking to kick in (from
> >>> > >> https://issues.apache.org/jira/secure/attachment/12544897/HDFS-3703-trunk-read-only.patch):
> >>> > >>
> >>> > >> +   * Check if the datanode is in stale state. Here if
> >>> > >> +   * the namenode has not received heartbeat msg from a
> >>> > >> +   * datanode for more than staleInterval (default value is
> >>> > >> +   * {@link DFSConfigKeys#DFS_NAMENODE_STALE_DATANODE_INTERVAL_MILLI_DEFAULT}),
> >>> > >> +   * the datanode will be treated as stale node.
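> >>> > >>
> >>> > >> In code terms the check is roughly the following - my paraphrase of the
> >>> > >> patch above, not a copy of the actual DatanodeInfo code:
> >>> > >>
> >>> > >>     // Paraphrase of the HDFS-3703 staleness check quoted above.
> >>> > >>     public class StaleCheck {
> >>> > >>       static boolean isStale(long lastHeartbeatMillis, long nowMillis, long staleIntervalMillis) {
> >>> > >>         // A datanode is stale once the NN has gone staleInterval without a heartbeat from it.
> >>> > >>         return (nowMillis - lastHeartbeatMillis) > staleIntervalMillis;
> >>> > >>       }
> >>> > >>
> >>> > >>       public static void main(String[] args) {
> >>> > >>         long staleInterval = 20 * 1000L;  // 20s, as configured on this cluster
> >>> > >>         long lastBeat = 0L;               // heartbeat lost at t=0
> >>> > >>         System.out.println(isStale(lastBeat, 15 * 1000L, staleInterval));  // false: recovery started too early
> >>> > >>         System.out.println(isStale(lastBeat, 25 * 1000L, staleInterval));  // true: node now ordered last
> >>> > >>       }
> >>> > >>     }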
> >>> > >>
> >>> > >>
> >>> > >> On Fri, Apr 19, 2013 at 10:28 AM, Varun Sharma <
> va...@pinterest.com
> >>> >
> >>> > >> wrote:
> >>> > >>
> >>> > >> > Is there a place to upload these logs?
> >>> > >> >
> >>> > >> >
> >>> > >> > On Fri, Apr 19, 2013 at 10:25 AM, Varun Sharma <
> >>> va...@pinterest.com>
> >>> > >> > wrote:
> >>> > >> >
> >>> > >> > > Hi Nicolas,
> >>> > >> > >
> >>> > >> > > Attached are the namenode, dn logs (of one of the healthy replicas
> >>> > >> > > of the WAL block) and the rs logs which got stuck doing the log
> >>> > >> > > split. Action begins at 2013-04-19 00:27.
> >>> > >> > >
> >>> > >> > > Also, the rogue block is 5723958680970112840_174056. It's very
> >>> > >> > > interesting to trace this guy through the HDFS logs (dn and nn).
> >>> > >> > >
> >>> > >> > > Btw, do you know what the UNDER_RECOVERY stage is for in HDFS? Also,
> >>> > >> > > does the stale node handling kick in for that state?
> >>> > >> > >
> >>> > >> > > Thanks
> >>> > >> > > Varun
> >>> > >> > >
> >>> > >> > >
> >>> > >> > > On Fri, Apr 19, 2013 at 4:00 AM, Nicolas Liochon <
> >>> nkey...@gmail.com
> >>> > >> > >wrote:
> >>> > >> > >
> >>> > >> > >> Thanks for the detailed scenario and analysis. I'm going to
> >>> have a
> >>> > >> look.
> >>> > >> > >> I can't access the logs (
> >>> ec2-107-20-237-30.compute-1.amazonaws.com
> >>> > >> > >> timeouts), could you please send them directly to me?
> >>> > >> > >>
> >>> > >> > >> Thanks,
> >>> > >> > >>
> >>> > >> > >> Nicolas
> >>> > >> > >>
> >>> > >> > >>
> >>> > >> > >> On Fri, Apr 19, 2013 at 12:46 PM, Varun Sharma <
> >>> > va...@pinterest.com>
> >>> > >> > >> wrote:
> >>> > >> > >>
> >>> > >> > >> > Hi Nicolas,
> >>> > >> > >> >
> >>> > >> > >> > Here is the failure scenario, I have dug up the logs.
> >>> > >> > >> >
> >>> > >> > >> > A machine fails and stops accepting/transmitting traffic. The
> >>> > >> > >> > HMaster starts the distributed split for 13 tasks. There are 12
> >>> > >> > >> > region servers. 12 tasks succeed but the 13th one takes a very
> >>> > >> > >> > long time.
> >>> > >> > >> >
> >>> > >> > >> > ZooKeeper timeout is set to 30 seconds. Stale node timeout is 20
> >>> > >> > >> > seconds. Both patches are there.
> >>> > >> > >> >
> >>> > >> > >> > a) Machine fails around 27:30
> >>> > >> > >> > b) Master starts the split around 27:40 and submits the tasks. The
> >>> > >> > >> > one task which fails seems to be the one which contains the WAL
> >>> > >> > >> > being currently written to:
> >>> > >> > >> >
> >>> > >> > >> > 2013-04-19 00:27:44,325 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog: hdfs://ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141, length=0
> >>> > >> > >> >
> >>> > >> > >> > Basically this region server picks up the task but finds the
> >>> > >> > >> > length of this file to be 0 and drops it. This happens again.
> >>> > >> > >> >
> >>> > >> > >> > c) Finally another region server picks up the task but it ends
> >>> > >> > >> > up going to the bad datanode (which should not happen because of
> >>> > >> > >> > the stale node timeout). Unfortunately it hits the 45 retries and
> >>> > >> > >> > a connect timeout of 20 seconds every time. This delays recovery
> >>> > >> > >> > significantly. Now I guess reducing the number of retries to 1 is
> >>> > >> > >> > one possibility (rough sketch of the knobs below). But then the
> >>> > >> > >> > namenode logs are very interesting.
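> >>> > >> > >> >
> >>> > >> > >> > The client-side knobs I would look at are roughly the following.
> >>> > >> > >> > The key names are what I believe CDH4/Hadoop 2 uses, but treat them
> >>> > >> > >> > as an assumption to verify against hdfs-default.xml, not a recipe:
> >>> > >> > >> >
> >>> > >> > >> >     // Sketch of DFS client read retry/timeout tuning; key names are assumptions.
> >>> > >> > >> >     import org.apache.hadoop.conf.Configuration;
> >>> > >> > >> >
> >>> > >> > >> >     public class DfsClientRetryTuning {
> >>> > >> > >> >       public static void main(String[] args) {
> >>> > >> > >> >         Configuration conf = new Configuration();
> >>> > >> > >> >         // How many times the client re-fetches block locations before giving up.
> >>> > >> > >> >         conf.setInt("dfs.client.max.block.acquire.failures", 1);
> >>> > >> > >> >         // Socket/connect timeout used by the DFS client, in ms.
> >>> > >> > >> >         conf.setInt("dfs.client.socket-timeout", 10000);
> >>> > >> > >> >         System.out.println("block acquire failures allowed: "
> >>> > >> > >> >             + conf.getInt("dfs.client.max.block.acquire.failures", 3));
> >>> > >> > >> >       }
> >>> > >> > >> >     }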
> >>> > >> > >> >
> >>> > >> > >> > d) The namenode seems to be in a cyclic lease recovery loop
> >>> > >> > >> > until the node is marked dead. There is this one last block
> >>> > >> > >> > which exhibits this.
> >>> > >> > >> >
> >>> > >> > >> > 2013-04-19 00:28:09,744 INFO BlockStateChange: BLOCK* blk_-*5723958680970112840_174056*{blockUCState=UNDER_RECOVERY, primaryNodeIndex=1, replicas=[ReplicaUnderConstruction[10.156.194.94:50010|RBW], ReplicaUnderConstruction[10.156.192.106:50010|RBW], ReplicaUnderConstruction[10.156.195.38:50010|RBW]]} recovery started, primary=10.156.192.106:50010
> >>> > >> > >> > 2013-04-19 00:28:09,744 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File /hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141 has not been closed. Lease recovery is in progress. RecoveryId = 174413 for block blk_-5723958680970112840_174056{blockUCState=UNDER_RECOVERY, primaryNodeIndex=1, replicas=[ReplicaUnderConstruction[10.156.194.94:50010|RBW], ReplicaUnderConstruction[10.156.192.106:50010|RBW], ReplicaUnderConstruction[10.156.195.38:50010|RBW]]}
> >>> > >> > >> >
> >>> > >> > >> > I see this over and over again in the logs until the datanode is
> >>> > >> > >> > marked dead. It seems to be cycling through the replicas for this
> >>> > >> > >> > WAL block and trying to add it to the recovery list. I looked at
> >>> > >> > >> > the code and it says:
> >>> > >> > >> >
> >>> > >> > >> >       // Cannot close file right now, since the last block requires recovery.
> >>> > >> > >> >       // This may potentially cause infinite loop in lease recovery
> >>> > >> > >> >       // if there are no valid replicas on data-nodes.
> >>> > >> > >> >       NameNode.stateChangeLog.warn(
> >>> > >> > >> >                 "DIR* NameSystem.internalReleaseLease: " +
> >>> > >> > >> >                 "File " + src + " has not been closed." +
> >>> > >> > >> >                 " Lease recovery is in progress. " +
> >>> > >> > >> >                 "RecoveryId = " + blockRecoveryId + " for block " + lastBlock);
> >>> > >> > >> >       break;
> >>> > >> > >> >
> >>> > >> > >> > Eventually for this block, we get
> >>> > >> > >> >
> >>> > >> > >> > 2013-04-19 00:41:20,736 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-696828882-10.168.7.226-1364886167971:blk_-*5723958680970112840_174056*, newgenerationstamp=174413, newlength=119148648, newtargets=[10.156.192.106:50010, 10.156.195.38:50010], closeFile=true, deleteBlock=false)
> >>> > >> > >> > 2013-04-19 00:41:20,736 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Block (=BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056) not found
> >>> > >> > >> > 2013-04-19 00:41:20,736 INFO org.apache.hadoop.ipc.Server: IPC Server handler 35 on 8020, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.commitBlockSynchronization from 10.156.192.106:53271: error: java.io.IOException: Block (=BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056) not found
> >>> > >> > >> >
> >>> > >> > >> > On the datanode side, I see a call to recover the block - I see
> >>> > >> > >> > that a write pipeline is there, which gets terminated with some
> >>> > >> > >> > socket timeouts...
> >>> > >> > >> >
> >>> > >> > >> > 00:28:11,471 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: NameNode at ec2-107-20-237-30.compute-1.amazonaws.com/10.168.7.226:8020 calls recoverBlock(BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056, targets=[10.156.194.94:50010, 10.156.192.106:50010, 10.156.195.38:50010], newGenerationStamp=174413)
> >>> > >> > >> >
> >>> > >> > >> > Not sure, but this looks like a case where data could be lost?
> >>> > >> > >> >
> >>> > >> > >> > Varun
> >>> > >> > >> >
> >>> > >> > >> >
> >>> > >> > >> > On Fri, Apr 19, 2013 at 12:38 AM, Nicolas Liochon <
> >>> > >> nkey...@gmail.com>
> >>> > >> > >> > wrote:
> >>> > >> > >> >
> >>> > >> > >> > > Hey Varun,
> >>> > >> > >> > >
> >>> > >> > >> > > Could you please share the logs and the configuration (hdfs /
> >>> > >> > >> > > hbase settings + cluster description). What's the failure
> >>> > >> > >> > > scenario? From an HDFS pov, HDFS-3703 does not change the dead
> >>> > >> > >> > > node status. But these nodes will be given the lowest priority
> >>> > >> > >> > > when reading.
> >>> > >> > >> > >
> >>> > >> > >> > >
> >>> > >> > >> > > Cheers,
> >>> > >> > >> > >
> >>> > >> > >> > > Nicolas
> >>> > >> > >> > >
> >>> > >> > >> > >
> >>> > >> > >> > > On Fri, Apr 19, 2013 at 3:01 AM, Varun Sharma <
> >>> > >> va...@pinterest.com>
> >>> > >> > >> > wrote:
> >>> > >> > >> > >
> >>> > >> > >> > > > Hi,
> >>> > >> > >> > > >
> >>> > >> > >> > > > We are facing problems with really slow HBase region server
> >>> > >> > >> > > > recoveries - around 20 minutes. Version is hbase 0.94.3
> >>> > >> > >> > > > compiled with hadoop.profile=2.0.
> >>> > >> > >> > > >
> >>> > >> > >> > > > Hadoop version is CDH 4.2 with HDFS-3703 and HDFS-3912 patched
> >>> > >> > >> > > > and stale node timeouts configured correctly. Time for dead
> >>> > >> > >> > > > node detection is still 10 minutes.
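> >>> > >> > >> > > >
> >>> > >> > >> > > > (The ~10 minutes comes from the usual dead-node formula, as I
> >>> > >> > >> > > > understand it for this Hadoop version - worth verifying:
> >>> > >> > >> > > > 2 * heartbeat recheck interval + 10 * heartbeat interval.)
> >>> > >> > >> > > >
> >>> > >> > >> > > >     // Sketch: why dead-node detection sits around 10 minutes with the defaults
> >>> > >> > >> > > >     // (dfs.namenode.heartbeat.recheck-interval = 5 min, dfs.heartbeat.interval = 3 s).
> >>> > >> > >> > > >     public class DeadNodeInterval {
> >>> > >> > >> > > >       public static void main(String[] args) {
> >>> > >> > >> > > >         long recheckMs = 5 * 60 * 1000L;
> >>> > >> > >> > > >         long heartbeatMs = 3 * 1000L;
> >>> > >> > >> > > >         long deadIntervalMs = 2 * recheckMs + 10 * heartbeatMs;
> >>> > >> > >> > > >         // Prints ~10.5 minutes.
> >>> > >> > >> > > >         System.out.println("dead node after ~" + deadIntervalMs / 60000.0 + " minutes");
> >>> > >> > >> > > >       }
> >>> > >> > >> > > >     }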
> >>> > >> > >> > > >
> >>> > >> > >> > > > We see that our region server, while trying to read an HLog,
> >>> > >> > >> > > > is stuck there for a long time. Logs here:
> >>> > >> > >> > > >
> >>> > >> > >> > > > 2013-04-12 21:14:30,248 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /10.156.194.251:50010 for file /hbase/feeds/fbe25f94ed4fa37fb0781e4a8efae142/home/1d102c5238874a5d82adbcc09bf06599 for block BP-696828882-10.168.7.226-1364886167971:blk_-3289968688911401881_9428:java.net.SocketTimeoutException: 15000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.156.192.173:52818 remote=/10.156.194.251:50010]
> >>> > >> > >> > > >
> >>> > >> > >> > > > I would think that HDFS-3703 would make the server fail fast
> >>> > >> > >> > > > and go to the third datanode. Currently, the recovery seems
> >>> > >> > >> > > > way too slow for production usage...
> >>> > >> > >> > > >
> >>> > >> > >> > > > Varun
> >>> > >> > >> > > >
> >>> > >> > >> > >
> >>> > >> > >> >
> >>> > >> > >>
> >>> > >> > >
> >>> > >> > >
> >>> > >> >
> >>> > >>
> >>> > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>
