For a 'fix' and 'recover' hfile tool at the HBase level, the relatively
easy thing to recover is probably the data (KVs) up to the point where we
hit the first corruption-caused exception.
After that, it will not be as easy. For example, if the current key length
or value length is bad, there is no way to skip to the next KV. We will
probably need to skip the whole current hblock and go to the next block
for KVs, assuming the hblock index is still good.
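
Below is a minimal sketch of that first, easy part (read KVs until the
first exception), written against the 0.94-era HFile API. The class name
and argument handling are only illustrative, and a real tool would append
the salvaged KVs to a new HFile rather than just count them:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileScanner;

public class SalvageHFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(args[0]);  // hfile to salvage
    HFile.Reader reader = HFile.createReader(fs, path, new CacheConfig(conf));
    reader.loadFileInfo();
    HFileScanner scanner = reader.getScanner(false, false);  // no block cache, no pread
    long salvaged = 0;
    KeyValue lastGood = null;
    try {
      if (scanner.seekTo()) {
        do {
          lastGood = scanner.getKeyValue();
          salvaged++;  // a real tool would write the KV to a new HFile here
        } while (scanner.next());
      }
    } catch (Exception e) {
      // First corruption-caused exception: everything read before this point
      // is recoverable; everything after it needs the block-index skip above.
      System.out.println("Stopped after " + salvaged + " KVs, last good key "
          + (lastGood == null ? "<none>" : lastGood.getKeyString()) + ": " + e);
    } finally {
      reader.close();
    }
    System.out.println("Salvageable KVs: " + salvaged);
  }
}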

HBASE-12949 <https://issues.apache.org/jira/browse/HBASE-12949> is an
incremental improvement that makes sure we do get a corruption-caused
exception so that the scan/read will not go into an infinite loop.

Jerry

On Wed, Mar 18, 2015 at 12:03 PM, Mike Dillon <mike.dil...@synctree.com>
wrote:

> I haven't filed one myself, but I can do so if my investigation ends up
> finding something bug-worthy as opposed to just random failures due to
> out-of-disk scenarios.
>
> Unfortunately, I had to prioritize some other work this morning, so I
> haven't made it back to the bad node yet.
>
> I did attempt restarting the datanode to see if I could make hadoop fsck
> happy, but that didn't have any noticeable effect. I'm hoping to have more
> time this afternoon to investigate the other suggestions from this thread.
>
> -md
>
> On Wed, Mar 18, 2015 at 11:41 AM, Andrew Purtell <apurt...@apache.org>
> wrote:
>
> > On Tue, Mar 17, 2015 at 9:47 PM, Stack <st...@duboce.net> wrote:
> > >
> > > > If it's possible to recover all of the file except
> > > > a portion of the affected block, that would be OK too.
> > >
> > > I actually do not see a 'fix' or 'recover' on the hfile tool. We need to
> > > add it so you can recover all but the bad block (we should figure how to
> > > skip the bad section also).
> >
> >
> > I was just getting caught up on this thread and had the same thought. Is
> > there an issue filed for this?
> >
> >
> > On Tue, Mar 17, 2015 at 9:47 PM, Stack <st...@duboce.net> wrote:
> >
> > > On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon <mike.dil...@synctree.com>
> > > wrote:
> > >
> > > > Hi all-
> > > >
> > > > I've got an HFile that's reporting a corrupt block in "hadoop fsck" and
> > > > was hoping to get some advice on recovering as much data as possible.
> > > >
> > > > When I examined the blk-* file on the three data nodes that have a
> > > > replica of the affected block, I saw that the replicas on two of the
> > > > datanodes had the same SHA-1 checksum and that the replica on the other
> > > > datanode was a truncated version of the replica found on the other
> > > > nodes (as reported by a difference at EOF by "cmp"). The size of the
> > > > two identical blocks is 67108864, the same as most of the other blocks
> > > > in the file.
> > > >
> > > > Given that there were two datanodes with the same data and another
> > > > with truncated data, I made a backup of the truncated file and dropped
> > > > the full-length copy of the block in its place directly on the data
> > > > mount, hoping that this would cause HDFS to no longer report the file
> > > > as corrupt. Unfortunately, this didn't seem to have any effect.
> > > >
> > > >
> > > That seems like a reasonable thing to do.
> > >
> > > Did you restart the DN that was serving this block before you ran fsck?
> > > (Fsck asks namenode what blocks are bad; it likely is still reporting off
> > > old info).
> > >
> > >
> > >
> > > > Looking through the Hadoop source code, it looks like there is a
> > > > CorruptReplicasMap internally that tracks which nodes have "corrupt"
> > > > copies of a block. In HDFS-6663
> > > > <https://issues.apache.org/jira/browse/HDFS-6663>, a "-blockId"
> > > > parameter was added to "hadoop fsck" to allow dumping the reason that a
> > > > block id is considered corrupt, but that wasn't added until Hadoop
> > > > 2.7.0 and our client is running 2.0.0-cdh4.6.0.
> > > >
> > > >
> > > Good digging.
> > >
> > >
> > >
> > > > I also had a look at running the "HFile" tool on the affected file
> > > > (cf. section 9.7.5.2.2 at
> > > > http://hbase.apache.org/0.94/book/regions.arch.html). When I did that,
> > > > I was able to see the data up to the corrupted block as far as I could
> > > > tell, but then it started repeatedly looping back to the first row and
> > > > starting over. I believe this is related to the behavior described in
> > > > https://issues.apache.org/jira/browse/HBASE-12949
> > >
> > >
> > >
> > > So, your file is 3G and your blocks are 128M?
> > >
> > > The dfsclient should just pass over the bad replica and move on to the
> > > good one so it would seem to indicate all replicas are bad for you.
> > >
> > > If you enable DFSClient DEBUG level logging it should report which blocks
> > > it is reading from. For example, here I am reading the start of the index
> > > blocks with DFSClient DEBUG enabled but I grep out the DFSClient emissions
> > > only:
> > >
> > > [stack@c2020 ~]$ ./hbase/bin/hbase --config ~/conf_hbase
> > > org.apache.hadoop.hbase.io.hfile.HFile -h -f
> > > /hbase/data/default/tsdb/3f4ea5ea14653cee6006f13c7d06d10b/t/68b00cb158aa4d839f1744639880f362|grep DFSClient
> > > 2015-03-17 21:42:56,950 DEBUG [main] util.ChecksumType:
> > > org.apache.hadoop.util.PureJavaCrc32 available
> > > 2015-03-17 21:42:56,952 DEBUG [main] util.ChecksumType:
> > > org.apache.hadoop.util.PureJavaCrc32C available
> > > SLF4J: Class path contains multiple SLF4J bindings.
> > > SLF4J: Found binding in
> > > [jar:file:/home/stack/hbase-1.0.1-SNAPSHOT/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > SLF4J: Found binding in
> > > [jar:file:/home/stack/hadoop-2.7.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > > explanation.
> > > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> > > 2015-03-17 21:42:58,082 INFO  [main] hfile.CacheConfig:
> > > CacheConfig:disabled
> > > 2015-03-17 21:42:58,126 DEBUG [main] hdfs.DFSClient: newInfo =
> > > LocatedBlocks{
> > >   fileLength=108633903
> > >   underConstruction=false
> > > blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > getBlockSize()=108633903; corrupt=false; offset=0;
> > > locs=[DatanodeInfoWithStorage[10.20.84.27:50011
> > > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > DatanodeInfoWithStorage[10.20.84.31:50011
> > > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > DatanodeInfoWithStorage[10.20.84.30:50011
> > > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> > > lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > getBlockSize()=108633903; corrupt=false; offset=0;
> > > locs=[DatanodeInfoWithStorage[10.20.84.30:50011
> > > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > DatanodeInfoWithStorage[10.20.84.31:50011
> > > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > DatanodeInfoWithStorage[10.20.84.27:50011
> > > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> > >   isLastBlockComplete=true}
> > > 2015-03-17 21:42:58,132 DEBUG [main] hdfs.DFSClient: Connecting to
> > > datanode 10.20.84.27:50011
> > > 2015-03-17 21:42:58,281 DEBUG [main] hdfs.DFSClient: Connecting to
> > > datanode 10.20.84.27:50011
> > > 2015-03-17 21:42:58,375 DEBUG [main] hdfs.DFSClient: newInfo =
> > > LocatedBlocks{
> > >   fileLength=108633903
> > >   underConstruction=false
> > > blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > getBlockSize()=108633903; corrupt=false; offset=0;
> > > locs=[DatanodeInfoWithStorage[10.20.84.30:50011
> > > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > DatanodeInfoWithStorage[10.20.84.31:50011
> > > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > DatanodeInfoWithStorage[10.20.84.27:50011
> > > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> > > lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > getBlockSize()=108633903; corrupt=false; offset=0;
> > > locs=[DatanodeInfoWithStorage[10.20.84.27:50011
> > > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > DatanodeInfoWithStorage[10.20.84.31:50011
> > > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > DatanodeInfoWithStorage[10.20.84.30:50011
> > > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> > >   isLastBlockComplete=true}
> > > 2015-03-17 21:42:58,376 DEBUG [main] hdfs.DFSClient: Connecting to
> > > datanode 10.20.84.30:50011
> > > 2015-03-17 21:42:58,381 DEBUG [main] hdfs.DFSClient: Connecting to
> > > datanode 10.20.84.27:50011
> > >
> > > Do you see it reading from 'good' or 'bad' blocks?
> > >
> > > I added this line to hbase log4j.properties to enable DFSClient DEBUG:
> > >
> > > log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG
> > >
> > > On HBASE-12949, what exception is coming up?  Dump it in here.
> > >
> > >
> > >
> > > > My goal is to determine whether the block in question is actually
> > > > corrupt and, if so, in what way.
> > >
> > >
> > > What happens if you just try to copy the file local or elsewhere in the
> > > filesystem using the dfs shell? Do you get a pure dfs exception unhampered
> > > by hbaseyness?
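> > >
> > > A minimal equivalent through the FileSystem API, in case that is handier
> > > than the shell (the class name and argument handling below are just
> > > placeholders):
> > >
> > > import org.apache.hadoop.conf.Configuration;
> > > import org.apache.hadoop.fs.FileSystem;
> > > import org.apache.hadoop.fs.Path;
> > >
> > > public class CopyOutHFile {
> > >   public static void main(String[] args) throws Exception {
> > >     // Picks up core-site.xml/hdfs-site.xml from the classpath.
> > >     Configuration conf = new Configuration();
> > >     FileSystem fs = FileSystem.get(conf);
> > >     // Pull the suspect hfile out of HDFS; a bad block should surface here
> > >     // as a plain ChecksumException or other IOException, no hbase involved.
> > >     fs.copyToLocalFile(new Path(args[0]), new Path(args[1]));
> > >   }
> > > }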
> > >
> > >
> > >
> > > > If it's possible to recover all of the file except
> > > > a portion of the affected block, that would be OK too.
> > >
> > >
> > > I actually do not see a 'fix' or 'recover' on the hfile tool. We need to
> > > add it so you can recover all but the bad block (we should figure how to
> > > skip the bad section also).
> > >
> > >
> > >
> > > > I just don't want to be in the position of having to lose all 3 gigs
> > > > of data in this particular region, given that most of it appears to be
> > > > intact. I just can't find the right low-level tools to let me diagnose
> > > > the exact state and structure of the block data I have for this file.
> > > >
> > > >
> > > Nod.
> > >
> > >
> > >
> > > > Any help or direction that someone could provide would be much
> > > > appreciated. For reference, I'll repeat that our client is running
> > > > Hadoop 2.0.0-cdh4.6.0 and add that the HBase version is 0.94.15-cdh4.6.0.
> > > >
> > > >
> > > See if any of the above helps. I'll try and dig up some more tools in the
> > > meantime.
> > > St.Ack
> > >
> > >
> > >
> > > > Thanks!
> > > >
> > > > -md
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>
