Ok,
I’m still a bit slow this morning … coffee is not helping…. ;-)
Are we talking HFile or just a single block in the HFile?
While it may be too late for Mike Dillon, here’s the question that the HBase
Devs are going to have to think about…
How and when do you check on the correctness of the hdf
I didn't see any problems in my preliminary testing, but I'll let you know
if the team that works with this data reports anything weird. It seemed to
just skip past the missing data from what I saw.
-md
On Fri, Mar 20, 2015 at 12:56 PM, Jerry He wrote:
> Hi, Mike Dillon
>
> Do you see any probl
Hi, Mike Dillon
Do you see any problems after removing the corrupted hfile? HBase region
store keeps an internal list of hfiles for each store.
You can 'close' the region, then 'assign' it again to refresh the internal
list so that you won't see any more annoying exceptions. The command 'move'
wi
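If you want to script the close/assign instead of doing it from the hbase shell, here
is a rough, untested sketch against the 0.98-era client API (the region name is passed
in as an argument; exact method signatures vary a bit between versions):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ReassignRegion {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);
      try {
        // args[0]: the full region name as shown in the master UI / hbase shell.
        byte[] regionName = Bytes.toBytes(args[0]);
        admin.unassign(regionName, true);  // force-close the region
        admin.assign(regionName);          // re-open it; the store re-reads its hfile list
      } finally {
        admin.close();
      }
    }
  }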
I wish it were possible to take that step back and determine the root cause
in this case, but I wasn't asked to look into the situation until a few
weeks after the corruption took place (as far as I can tell). At that
point, the logs that would have said what was happening at the time had
been rota
Sorry,
Can we take a step back? I’m a little slow this evening….
(FYI… today is St. Joseph’s Day and I was kidnapped and forced to drink too
much Bourbon. I take no responsibility and blame my friends who are named Joe.
;-)
What caused the block to be corrupt?
Was it your typical HDFS where
Thank you!
On Thu, Mar 19, 2015 at 1:48 PM, Jerry He wrote:
> It is ok to delete the hfile in question with hadoop file system command.
> No restart of hbase is needed. You may see some error exceptions if there
> are things (user scan, compaction) on the fly. But it will be ok.
>
> Jerry
>
>
It is ok to delete the hfile in question with hadoop file system command.
No restart of hbase is needed. You may see some error exceptions if there
are things (user scan, compaction) on the fly. But it will be ok.
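The "hadoop file system command" here is just hadoop fs -rm on the hfile's path. If
you'd rather do the same thing from code, an untested sketch with the plain
FileSystem API (the path to the corrupt hfile is passed in as an argument):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class DeleteCorruptHFile {
    public static void main(String[] args) throws Exception {
      // args[0]: the corrupt hfile under the region's column family directory.
      Path hfile = new Path(args[0]);
      FileSystem fs = FileSystem.get(new Configuration());
      // Non-recursive delete of a single file; same effect as 'hadoop fs -rm <path>'.
      boolean deleted = fs.delete(hfile, false);
      System.out.println(hfile + (deleted ? " deleted" : " not deleted"));
    }
  }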
Jerry
On Thu, Mar 19, 2015 at 12:27 PM, Mike Dillon wrote:
> So, it turns out t
So, it turns out that the client has an archived data source that can
recreate the HBase data in question if needed, so the need for me to
actually recover this HFile has diminished to the point where it's probably
not worth investing my time in creating a custom tool to extract the data.
Given th
From the HBase perspective, since we don't have a ready tool, the general idea
is that you will need access to the HBase source code and will have to write your own tool.
At a high level, the tool will read/scan the KVs from the hfile, similar
to what the HFile tool does, while opening an HFileWriter to dump the good
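Roughly, the skeleton of such a tool might look like this. A very rough, untested
sketch against the 0.98-era HFile API (the class name and arguments are made up for
illustration, and the exact reader/writer calls differ a bit between HBase versions):

  import java.nio.ByteBuffer;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.io.hfile.CacheConfig;
  import org.apache.hadoop.hbase.io.hfile.HFile;
  import org.apache.hadoop.hbase.io.hfile.HFileContext;
  import org.apache.hadoop.hbase.io.hfile.HFileContextBuilder;
  import org.apache.hadoop.hbase.io.hfile.HFileScanner;
  import org.apache.hadoop.hbase.util.Bytes;

  public class HFileSalvage {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      FileSystem fs = FileSystem.get(conf);
      Path in = new Path(args[0]);    // the damaged hfile
      Path out = new Path(args[1]);   // where to write the salvaged copy

      HFile.Reader reader = HFile.createReader(fs, in, new CacheConfig(conf), conf);
      HFileContext ctx = new HFileContextBuilder().build();
      HFile.Writer writer = HFile.getWriterFactory(conf, new CacheConfig(conf))
          .withPath(fs, out)
          .withFileContext(ctx)
          .create();

      // Walk the KVs in order, the way the HFile tool's -p option does, and
      // append each key/value to the new file.
      HFileScanner scanner = reader.getScanner(false, false);
      long copied = 0;
      if (scanner.seekTo()) {
        do {
          ByteBuffer key = scanner.getKey();
          ByteBuffer value = scanner.getValue();
          writer.append(Bytes.toBytes(key), Bytes.toBytes(value));
          copied++;
        } while (scanner.next());
      }
      writer.close();
      reader.close();
      System.out.println("Copied " + copied + " KVs to " + out);
    }
  }

The salvaged file would not carry over bloom filters or other meta from the original,
but a later compaction rewrites all of that; the usual way to put it back would be a
bulk load (completebulkload) rather than copying it into the store directory by hand.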
I've had a chance to try out the suggestion Stack passed along of running
HADOOP_ROOT_LOGGER="TRACE,console" hdfs dfs -cat and managed to get this:
https://gist.github.com/md5/d42e97ab7a0bd656f09a
After knowing what to look for, I was able to find the same checksum
failures in the logs during the major com
For a 'fix' and 'recover' hfile tool at the HBase level, the relatively easy
thing we can recover is probably the data (KVs) up to the point where we hit
the first corruption-caused exception.
After that, it will not be as easy. For example, if the current key length
or value length is bad, there is n
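In terms of the rough sketch above, that "up to the first corruption-caused
exception" behavior is just a matter of wrapping the copy loop, something like
(untested):

      long copied = 0;
      try {
        if (scanner.seekTo()) {
          do {
            writer.append(Bytes.toBytes(scanner.getKey()), Bytes.toBytes(scanner.getValue()));
            copied++;
          } while (scanner.next());
        }
      } catch (Exception e) {
        // First corruption-caused failure (bad checksum, bad key/value length, ...):
        // stop here and keep whatever has been appended so far.
        System.err.println("Stopped after " + copied + " KVs: " + e);
      } finally {
        writer.close();   // close() still writes a proper trailer for the partial copy
        reader.close();
      }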
I haven't filed one myself, but I can do so if my investigation ends up
finding something bug-worthy as opposed to just random failures due to
out-of-disk scenarios.
Unfortunately, I had to prioritize some other work this morning, so I
haven't made it back to the bad node yet.
I did attempt resta
On Tue, Mar 17, 2015 at 9:47 PM, Stack wrote:
>
> > If it's possible to recover all of the file except
> > a portion of the affected block, that would be OK too.
>
> I actually do not see a 'fix' or 'recover' on the hfile tool. We need to
> add it so you can recover all but the bad block (we sho
On Tue, Mar 17, 2015 at 11:42 PM, Mike Dillon wrote:
> Thanks. I'll look into those suggestions tomorrow. I'm pretty sure that
> short-circuit reads are not turned on, but I'll double check when I follow
> up on this.
>
> The main issue that actually led to me being asked to look into this issue
Thanks. I'll look into those suggestions tomorrow. I'm pretty sure that
short-circuit reads are not turned on, but I'll double check when I follow
up on this.
The main issue that actually led to me being asked to look into this issue
was that the cluster had a datanode running at 100% disk usage o
On Tue, Mar 17, 2015 at 9:47 PM, Stack wrote:
> On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon wrote:
>
>> Hi all-
>>
>> I've got an HFile that's reporting a corrupt block in "hadoop fsck" and
>> was
>> hoping to get some advice on recovering as much data as possible.
>>
>> When I examined the b
On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon wrote:
> Hi all-
>
> I've got an HFile that's reporting a corrupt block in "hadoop fsck" and was
> hoping to get some advice on recovering as much data as possible.
>
> When I examined the blk-* file on the three data nodes that have a replica
> of the
Hi all-
I've got an HFile that's reporting a corrupt block in "hadoop fsck" and was
hoping to get some advice on recovering as much data as possible.
When I examined the blk-* file on the three data nodes that have a replica
of the affected block, I saw that the replicas on two of the datanodes h
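For reference, hadoop fsck <path> -files -blocks -locations reports the block ids and
which datanodes hold each replica; the replica hosts are also reachable from code, as
in this untested sketch with the plain FileSystem API:

  import java.util.Arrays;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.BlockLocation;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ShowReplicaHosts {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      FileStatus stat = fs.getFileStatus(new Path(args[0]));
      // One BlockLocation per block of the file, listing the datanodes holding it.
      for (BlockLocation loc : fs.getFileBlockLocations(stat, 0, stat.getLen())) {
        System.out.println("offset " + loc.getOffset() + " len " + loc.getLength()
            + " hosts " + Arrays.toString(loc.getHosts()));
      }
    }
  }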