2015-02-23 20:25 GMT-05:00 Arinto Murdopo <ari...@gmail.com>:

> @JM:
> You mentioned about deleting "the files", are you referring to HDFS files
> or file on HBase?
>

Your HBase files are stored in HDFS, so I think we are referring to the same
thing. Look under /hbase in your HDFS to find the HBase files.
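
For example, something like this should list them (assuming the default
HBase root dir of /hbase; adjust if your hbase.rootdir points elsewhere):

  hadoop fs -ls /hbase

From there you can drill down into the per-table directories.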



>
> Our cluster has 15 nodes. We used 14 of them as DNs. Actually we tried to
> enable the remaining one as a DN (so that we would have 15 DNs), but then
> we disabled it (so now we have 14 again). Our crawlers probably wrote some
> data onto the additional DN without any replication. Maybe I could try to
> enable the DN again.
>

That's a very valid option. If you still have the DataNode's data
directories, just re-enable it and see if you can recover the blocks...
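
A rough sketch of bringing it back (assuming the CDH4 packaged init
scripts; your setup may use hadoop-daemon.sh or different service names):

  # on the node you disabled
  sudo service hadoop-hdfs-datanode start

  # then check that it rejoined and is reporting its blocks
  sudo -u hdfs hdfs dfsadmin -report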



> I don't have the list of the corrupted files yet. I notice that when I try
> to Get some of the files, my HBase client code throws these exceptions:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=2, exceptions:
> Mon Feb 23 17:49:32 SGT 2015,
> org.apache.hadoop.hbase.client.HTable$3@11ff4a1c,
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException: Region is not online:
>
> plr_sg_insta_media_live,\x0177998597896:953:5:a5:58786,1410771627251.6c323832d2dc77c586f1cf6441c7ef6e.
>

FSCK should give you the list of corrupt files. Can you extract it from
there?
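
Something along these lines should do it (just a sketch; run it as the HDFS
superuser and adjust the path if your HBase root dir is not /hbase):

  sudo -u hdfs hdfs fsck /hbase -list-corruptfileblocks

  # or with per-file detail
  sudo -u hdfs hdfs fsck /hbase -files -blocks -locations | grep -i corrupt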



>
> Can I use these exceptions to determine the corrupted files?
> The files are media data (images or videos) obtained from the internet.
>

This exception gives you all the hints: the files for that region are most
probably under
/hbase/plr_sg_insta_media_live/6c323832d2dc77c586f1cf6441c7ef6e

Files under this directory might be corrupted, but you need to find which
ones. If it's an HFile, it's easy. If it's the .regioninfo file, it's a bit
more tricky.
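
To narrow it down you can point fsck at that region directory (same caveats
as above, just a sketch):

  sudo -u hdfs hdfs fsck \
    /hbase/plr_sg_insta_media_live/6c323832d2dc77c586f1cf6441c7ef6e \
    -files -blocks

Anything flagged CORRUPT or MISSING there will be one of the region's
HFiles or its .regioninfo.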

JM



> Arinto
> www.otnira.com
>
> On Tue, Feb 24, 2015 at 8:06 AM, Michael Segel <mse...@segel.com> wrote:
>
> > I’m sorry, but I implied checking the checksums of the blocks.
> > Didn’t think I needed to spell it out.  Next time I’ll be a bit more
> > precise.
> >
> > > On Feb 23, 2015, at 2:34 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
> > >
> > > HBase/HDFS are maintaining block checksums, so presumably a corrupted
> > > block would fail checksum validation. Increasing the number of replicas
> > > increases the odds that you'll still have a valid block. I'm not an HDFS
> > > expert, but I would be very surprised if HDFS is validating a
> > > "questionable block" via byte-wise comparison over the network amongst
> > > the replica peers.
> > >
> > > On Mon, Feb 23, 2015 at 12:25 PM, Michael Segel <mse...@segel.com>
> > wrote:
> > >
> > >>
> > >> On Feb 23, 2015, at 1:47 AM, Arinto Murdopo <ari...@gmail.com> wrote:
> > >>
> > >> We're running HBase (0.94.15-cdh4.6.0) on top of HDFS (Hadoop
> > >> 2.0.0-cdh4.6.0).
> > >> For all of our tables, we set the replication factor to 1
> > >> (dfs.replication = 1 in hbase-site.xml). We set it to 1 because we want
> > >> to minimize the HDFS usage (now we realize we should set this value to
> > >> at least 2, because "failure is a norm" in distributed systems).
> > >>
> > >>
> > >>
> > >> Sorry, but you really want this to be a replication value of at least
> > >> 3 and not 2.
> > >>
> > >> Suppose you have corruption but not a lost block. Which copy of the
> > >> two is right?
> > >> With 3, you can compare the three and hopefully 2 of the 3 will match.
> > >>
> > >>
> >
> >
>
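
If you do raise the replication factor later, keep in mind that
dfs.replication only applies to files written after the change; for the
data you already have, a rough sketch (adjust the path and target factor to
your needs) would be:

  hadoop fs -setrep -R -w 3 /hbase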
