Re: Recovering from corrupt blocks in HFile

2015-03-23 Thread Michael Segel
Ok, I’m still a bit slow this morning … coffee is not helping…. ;-) Are we talking HFile or just a single block in the HFile? While it may be too late for Mike Dillon, here’s the question that the HBase devs are going to have to think about… How and when do you check on the correctness of the hdfs …

Re: Recovering from corrupt blocks in HFile

2015-03-20 Thread Mike Dillon
I didn't see any problems in my preliminary testing, but I'll let you know if the team that works with this data reports anything weird. It seemed to just skip past the missing data from what I saw. -md On Fri, Mar 20, 2015 at 12:56 PM, Jerry He wrote: > Hi, Mike Dillon > > Do you see any problems …

Re: Recovering from corrupt blocks in HFile

2015-03-20 Thread Jerry He
Hi, Mike Dillon. Do you see any problems after removing the corrupted hfile? HBase region store keeps an internal list of hfiles for each store. You can 'close' the region, then 'assign' it again to refresh the internal list so that you won't see any more annoying exceptions. The command 'move' will …
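
A minimal sketch of the close-then-assign refresh Jerry describes, written against the HBase 1.x Admin API rather than the shell's 'close_region'/'assign' commands; the region name is taken from the command line, and exact method signatures may differ across HBase versions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ReassignRegion {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          // Full region name as it appears in hbase:meta, passed as args[0].
          byte[] regionName = Bytes.toBytes(args[0]);
          admin.unassign(regionName, false); // 'close' the region
          admin.assign(regionName);          // reopen it; the store file list is re-read
        }
      }
    }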

Re: Recovering from corrupt blocks in HFile

2015-03-20 Thread Mike Dillon
I wish it were possible to take that step back and determine the root cause in this case, but I wasn't asked to look into the situation until a few weeks after the corruption took place (as far as I can tell). At that point, the logs that would have said what was happening at the time had been rotated …

Re: Recovering from corrupt blocks in HFile

2015-03-19 Thread Michael Segel
Sorry, can we take a step back? I’m a little slow this evening…. (FYI… today is St. Joseph’s Day and I was kidnapped and forced to drink too much Bourbon. I take no responsibility and blame my friends who are named Joe. ;-) What caused the block to be corrupt? Was it your typical HDFS where …

Re: Recovering from corrupt blocks in HFile

2015-03-19 Thread Mike Dillon
Thank you! On Thu, Mar 19, 2015 at 1:48 PM, Jerry He wrote: > It is ok to delete the hfile in question with the hadoop file system command. > No restart of hbase is needed. You may see some error exceptions if there > are things (user scan, compaction) on the fly. But it will be ok. > > Jerry

Re: Recovering from corrupt blocks in HFile

2015-03-19 Thread Jerry He
It is ok to delete the hfile in question with the hadoop file system command. No restart of hbase is needed. You may see some error exceptions if there are things (user scan, compaction) on the fly. But it will be ok. Jerry On Thu, Mar 19, 2015 at 12:27 PM, Mike Dillon wrote: > So, it turns out that …
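
For reference, the same deletion via the Hadoop FileSystem API; a minimal sketch equivalent to 'hadoop fs -rm' on the store file path (the path is taken from the command line, and nothing below is HBase-specific):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DeleteHFile {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Path to the corrupt hfile under the region's column family directory.
        Path hfile = new Path(args[0]);
        // Non-recursive delete: refuses to remove a directory by accident.
        boolean deleted = fs.delete(hfile, false);
        System.out.println("deleted=" + deleted);
      }
    }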

Re: Recovering from corrupt blocks in HFile

2015-03-19 Thread Mike Dillon
So, it turns out that the client has an archived data source that can recreate the HBase data in question if needed, so the need for me to actually recover this HFile has diminished to the point where it's probably not worth investing my time in creating a custom tool to extract the data. Given th…

Re: Recovering from corrupt blocks in HFile

2015-03-18 Thread Jerry He
From an HBase perspective, since we don't have a ready tool, the general idea is that you'd need access to the HBase source code and write your own tool. At a high level, the tool will read/scan the KVs from the hfile, similar to what the HFile tool does, while opening an HFileWriter to dump the good KVs …
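
A minimal sketch of such a tool, assuming the HBase 1.x internal hfile classes (org.apache.hadoop.hbase.io.hfile), which are not a stable public API: scan KVs the way the HFile tool does, append each successfully read KV to a new hfile, and stop at the first corruption-caused exception.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.io.hfile.CacheConfig;
    import org.apache.hadoop.hbase.io.hfile.HFile;
    import org.apache.hadoop.hbase.io.hfile.HFileContextBuilder;
    import org.apache.hadoop.hbase.io.hfile.HFileScanner;

    public class SalvageHFile {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        FileSystem fs = FileSystem.get(conf);
        Path in = new Path(args[0]), out = new Path(args[1]);
        CacheConfig cacheConf = new CacheConfig(conf);
        HFile.Reader reader = HFile.createReader(fs, in, cacheConf, conf);
        HFile.Writer writer = HFile.getWriterFactory(conf, cacheConf)
            .withPath(fs, out)
            .withFileContext(new HFileContextBuilder().build())
            .create();
        long salvaged = 0;
        try {
          HFileScanner scanner = reader.getScanner(false, false);
          if (scanner.seekTo()) {
            do {
              Cell kv = scanner.getKeyValue();
              writer.append(kv);   // keep every KV we managed to read
              salvaged++;
            } while (scanner.next());
          }
        } catch (Exception e) {
          // First corruption-caused exception: stop here, keep what we have.
          System.err.println("Stopping at corrupt data after " + salvaged
              + " KVs: " + e);
        } finally {
          writer.close();
          reader.close();
        }
      }
    }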

Re: Recovering from corrupt blocks in HFile

2015-03-18 Thread Mike Dillon
I've had a chance to try out Stack's passed-along suggestion of HADOOP_ROOT_LOGGER="TRACE,console" hdfs dfs -cat and managed to get this: https://gist.github.com/md5/d42e97ab7a0bd656f09a After knowing what to look for, I was able to find the same checksum failures in the logs during the major compaction …

Re: Recovering from corrupt blocks in HFile

2015-03-18 Thread Jerry He
For a 'fix' and 'recover' hfile tool at the HBase level, the relatively easy thing we can recover is probably the data (KVs) up to the point when we hit the first corruption-caused exception. After that, it will not be as easy. For example, if the current key length or value length is bad, there is no …
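
To see why, recall the on-disk cell layout: each cell in a data block is serialized as a 4-byte key length, a 4-byte value length, then the key and value bytes (real blocks may also carry extra per-cell metadata such as an mvcc timestamp, omitted here). The lengths are the only way to find where the next cell starts, so once a length field is garbage there is nothing to resynchronize on. A simplified walking sketch over uncompressed block bytes:

    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.io.IOException;

    public class WalkCells {
      static void walk(byte[] blockBytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(blockBytes));
        while (in.available() > 0) {
          int keyLen = in.readInt();
          int valueLen = in.readInt();
          // A flipped bit here can make a length huge or negative; everything
          // after this point is unparseable because cell boundaries are lost.
          if (keyLen < 0 || valueLen < 0 || keyLen + valueLen > in.available()) {
            throw new IOException("implausible lengths: " + keyLen + "/" + valueLen);
          }
          in.skipBytes(keyLen + valueLen); // skip this cell; the next one starts here
        }
      }
    }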

Re: Recovering from corrupt blocks in HFile

2015-03-18 Thread Mike Dillon
I haven't filed one myself, but I can do so if my investigation ends up finding something bug-worthy as opposed to just random failures due to out-of-disk scenarios. Unfortunately, I had to prioritize some other work this morning, so I haven't made it back to the bad node yet. I did attempt resta…

Re: Recovering from corrupt blocks in HFile

2015-03-18 Thread Andrew Purtell
On Tue, Mar 17, 2015 at 9:47 PM, Stack wrote: > > > If it's possible to recover all of the file except > > a portion of the affected block, that would be OK too. > > I actually do not see a 'fix' or 'recover' on the hfile tool. We need to > add it so you can recover all but the bad block (we should …

Re: Recovering from corrupt blocks in HFile

2015-03-18 Thread Stack
On Tue, Mar 17, 2015 at 11:42 PM, Mike Dillon wrote: > Thanks. I'll look into those suggestions tomorrow. I'm pretty sure that > short-circuit reads are not turned on, but I'll double check when I follow > up on this. > > The main issue that actually led to me being asked to look into this issue …

Re: Recovering from corrupt blocks in HFile

2015-03-17 Thread Mike Dillon
Thanks. I'll look into those suggestions tomorrow. I'm pretty sure that short-circuit reads are not turned on, but I'll double check when I follow up on this. The main issue that actually led to me being asked to look into this issue was that the cluster had a datanode running at 100% disk usage on …

Re: Recovering from corrupt blocks in HFile

2015-03-17 Thread Stack
On Tue, Mar 17, 2015 at 9:47 PM, Stack wrote: > On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon > wrote: > >> Hi all- >> >> I've got an HFile that's reporting a corrupt block in "hadoop fsck" and >> was >> hoping to get some advice on recovering as much data as possible. >> >> When I examined the blk-* …

Re: Recovering from corrupt blocks in HFile

2015-03-17 Thread Stack
On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon wrote: > Hi all- > > I've got an HFile that's reporting a corrupt block in "hadoop fsck" and was > hoping to get some advice on recovering as much data as possible. > > When I examined the blk-* file on the three data nodes that have a replica > of the affected block …

Recovering from corrupt blocks in HFile

2015-03-17 Thread Mike Dillon
Hi all- I've got an HFile that's reporting a corrupt block in "hadoop fsck" and was hoping to get some advice on recovering as much data as possible. When I examined the blk-* file on the three data nodes that have a replica of the affected block, I saw that the replicas on two of the datanodes had …