Thanks for clarifying. Given the region had already been open for a while, I guess those were just empty recovered.edits dirs under the region dir, so my previous assumption does not really apply here. I also looked further into TableSnapshotInputFormat and realised it actually copies the table dir to a temporary *restoreDir* that should be passed as a parameter to the *TableSnapshotInputFormat.setInput* initialisation method:
https://github.com/apache/hbase/blob/branch-1.4/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormat.java#L212

Note the method comment on this *restoreDir* param:

> restoreDir a temporary directory to restore the snapshot into. Current
> user should have write permissions to this directory, and this should
> not be a subdirectory of rootdir. After the job is finished, restoreDir
> can be deleted.

Here's the point where snapshot data gets copied to restoreDir:
https://github.com/apache/hbase/blob/branch-1.4/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java#L509

So as long as we follow the javadoc advice, our concern about potential data loss is not valid. I guess the problem here is that when the table dir is recreated/copied to *restoreDir*, the original ownership/permissions are preserved for the subdirs, such as the regions' recovered.edits.
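
For illustration, here's a minimal sketch of what that wiring looks like (the snapshot name "my_snapshot", the /tmp restore path and the rest of the job setup are placeholders, not your actual job):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat;
    import org.apache.hadoop.mapreduce.Job;

    public class SnapshotReadJob {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "read-from-snapshot");

        // The scan to run over the snapshot; TableSnapshotInputFormat picks it
        // up from the job conf under the same key TableInputFormat uses.
        job.getConfiguration().set(TableInputFormat.SCAN,
            TableMapReduceUtil.convertScanToString(new Scan()));

        // Per the javadoc: writable by the job user and NOT under hbase.rootdir.
        // The snapshot is re-materialised here, and the dir can be deleted once
        // the job finishes.
        Path restoreDir = new Path("/tmp/snapshot-restore-" + System.currentTimeMillis());

        TableSnapshotInputFormat.setInput(job, "my_snapshot", restoreDir);
        job.setInputFormatClass(TableSnapshotInputFormat.class);

        // ...set mapper, output format, etc. as usual, then submit the job...
      }
    }

As long as restoreDir sits outside the HBase root dir and is owned by the job user, any recovered.edits replay/cleanup should only ever touch the restored copy, never the live region dirs.
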
On Tue, Jun 18, 2019 at 01:03, Jacob LeBlanc <[email protected]> wrote:
> First of all, thanks for the reply! I appreciate the time taken addressing
> our issues.
>
> > It seems the mentioned "hiccup" caused RS(es) crash(es), as you got RITs
> > and recovered edits under these regions dirs.
>
> To give more context, I was making changes to increase the snapshot timeout
> on region servers and did a graceful restart, so I didn't mean to crash
> anything, but it seems I did this to too many region servers at once (about
> half the cluster), which resulted in some number of regions getting stuck
> in transition. This was attempted on a live production cluster, so the hope
> was to do this without downtime, but it resulted in an outage to our
> application instead. Unfortunately, the master and region server logs have
> since rolled and aged out, so I don't have them anymore.
>
> > The fact there was a "recovered" dir under some regions dirs means that
> > when the snapshot was taken, crashed RS(es) WAL(s) had been split, but not
> > completely replayed yet.
>
> The snapshot was taken many days later. File timestamps under the
> recovered.edits directory were from June 6th and the snapshot from the
> pastebin was taken on June 14th, but snapshots were actually taken many
> times with the same result (ETL jobs are launched at least daily in Oozie).
> Do you mean that if a snapshot was taken before the region was fully
> recovered, it could result in this state even if the snapshot was
> subsequently deleted?
>
> > Would you know which specific hbase version this is?
>
> It is EMR 5.22, which runs HBase 1.4.9 (with some Amazon-specific edits,
> maybe? I noticed the line numbers for HRegion.java in the stack trace don't
> quite line up with those in the 1.4.9 tag on GitHub).
>
> > Could your job restore the snapshot into a temp table and then read from
> > this temp table using TableInputFormat, instead?
>
> Maybe we could do this, but it would take us some effort to make the
> changes, test, release, etc. Of course we'd rather not jump through hoops
> like this.
>
> > In this case, it's finding the "recovered" folder under the regions dir,
> > so it will replay the edits there. Looks like a problem with
> > TableSnapshotInputFormat; it seems weird that it tries to delete edits in
> > a non-staging dir (your path suggests it's trying to delete the actual
> > edits folder), which could cause data loss if it succeeded in deleting
> > edits before the RSes actually replayed them.
>
> I agree that this "seems weird" to me, given that I am not intimately
> familiar with all of the inner workings of the hbase code.
> The potential data loss is what I'm wondering about - would data loss have
> occurred if we happened to execute our job under a user that had delete
> permissions on the HDFS directories? Or did the edits actually get replayed
> while the regions were stuck in transition and the files just didn't get
> cleaned up? Is this something for which I should file a defect in JIRA?
>
> Thanks again,
>
> --Jacob LeBlanc
>
>
> -----Original Message-----
> From: Wellington Chevreuil [mailto:[email protected]]
> Sent: Monday, June 17, 2019 3:55 PM
> To: [email protected]
> Subject: Re: TableSnapshotInputFormat failing to delete files under
> recovered.edits
>
> It seems the mentioned "hiccup" caused RS(es) crash(es), as you got RITs
> and recovered edits under these regions dirs. The fact there was a
> "recovered" dir under some regions dirs means that when the snapshot was
> taken, crashed RS(es) WAL(s) had been split, but not completely replayed
> yet.
>
> Since you are facing an error when reading from a table snapshot, and the
> stack trace shows TableSnapshotInputFormat is using the
> "HRegion.openHRegion" code path to read snapshotted data, it will basically
> do the same as an RS would when trying to assign a region. In this case,
> it's finding the "recovered" folder under the regions dir, so it will
> replay the edits there. Looks like a problem with TableSnapshotInputFormat;
> it seems weird that it tries to delete edits in a non-staging dir (your
> path suggests it's trying to delete the actual edits folder), which could
> cause data loss if it succeeded in deleting edits before the RSes actually
> replayed them. Would you know which specific hbase version this is? Could
> your job restore the snapshot into a temp table and then read from this
> temp table using TableInputFormat, instead?
>
> On Mon, Jun 17, 2019 at 17:22, Jacob LeBlanc <[email protected]> wrote:
> > Hi,
> >
> > We periodically execute Spark jobs to run ETL from some of our HBase
> > tables to another data repository. The Spark jobs read data by taking
> > a snapshot and then using the TableSnapshotInputFormat class. Lately
> > we've been having some failures: when the jobs try to read the data,
> > they try to delete files under the recovered.edits directory for some
> > regions, and the user under which we run the jobs doesn't have
> > permissions to do that. A pastebin of the error and stack trace from
> > one of our job logs is here:
> > https://pastebin.com/MAhVc9JB
> >
> > This has started happening since upgrading to EMR 5.22, where the
> > recovered.edits directory is collocated with the WALs in HDFS, whereas
> > it used to be in S3-backed EMRFS.
> >
> > I have two questions regarding this:
> >
> > 1) First off, why are these files under the recovered.edits directory?
> > The timestamp of the files coincides with a hiccup we had with our
> > cluster where I had to use "hbase hbck -fixAssignments" to fix regions
> > that were stuck in transition. But that command seemed to work just
> > fine and all regions were assigned, and there have since been no
> > inconsistencies. Does this mean the WALs were not replayed correctly?
> > Does "hbase hbck -fixAssignments" not recover regions properly?
> >
> > 2) Why is our job trying to delete these files?
> > I don't know enough to say for sure, but it seems like using
> > TableSnapshotInputFormat to read snapshot data should not be trying to
> > recover or delete edits.
> >
> > I've fixed the problems by running "assign '<region>'" in the hbase shell
> > for every region that had files under the recovered.edits directory, and
> > those files seemed to be cleaned up when the assignment completed. But
> > I'd like to understand this better, especially if something is
> > interfering with replaying edits from WALs (also, making sure our ETL
> > jobs don't start failing would be nice).
> >
> > Thanks!
> >
> > --Jacob LeBlanc
> >
>
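
PS: on the temp-table alternative I suggested earlier in the thread, below is a rough sketch of what I had in mind (the snapshot/table names, the pass-through mapper and the null output format are placeholders for your actual ETL logic):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class CloneSnapshotAndRead {

      // Placeholder mapper: passes each row through unchanged; a real ETL job
      // would transform the rows and write them to the target repository.
      static class PassThroughMapper extends TableMapper<ImmutableBytesWritable, Result> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
          context.write(row, value);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        TableName tempTable = TableName.valueOf("etl_temp_copy"); // placeholder name

        // Materialise the snapshot as a regular table served by the region servers
        // (fails if the table already exists).
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          admin.cloneSnapshot("my_snapshot", tempTable); // "my_snapshot" is a placeholder
        }

        // initTableMapperJob wires up TableInputFormat against the cloned table.
        Job job = Job.getInstance(conf, "read-from-cloned-table");
        TableMapReduceUtil.initTableMapperJob(
            tempTable.getNameAsString(), new Scan(),
            PassThroughMapper.class, ImmutableBytesWritable.class, Result.class, job);
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);
        job.waitForCompletion(true);

        // Disable and drop the temp table once the job has finished.
      }
    }

The point is that the job then reads through the normal client scan path against a table served by the region servers, so it never opens region dirs (or their recovered.edits) directly; the cost is the extra clone/drop housekeeping around each run.
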
