Re: Missing replication metadata

Josh Elser Mon, 24 Jul 2017 10:56:10 -0700


On 7/24/17 1:44 PM, Adam J. Shook wrote:

We had some corrupt WAL blocks on our stage environment the other day and opted to delete them. We now have some missing metadata and about 3k files pending for replication. I've dug into it a bit and noticed that many of the WALs in the `order` queue of the replication table A) no longer exist in HDFS and B) have no entries in the `repl` section of the replication table.
Based on the code, if there are no entries in the `repl` section, then the work will never be queued for completion via ZooKeeper and therefore never finished -- does this make sense?

Yeah, that sounds about right. I'm lamenting that I never wrote up docs for the user-manual to cover the table-schema. I should ... do that...

I think the order entry is created when the repl entry is. Would have to dig back into code though.


What'd be the suggestion here to proceed? I'm thinking a one-off tool to backfill the `repl` section should do the trick, but I am wondering if this is something that should be changed in Accumulo?

A tool to back-fill makes sense to me. I'm not sure what we could do in Accumulo automatically. Any time there is data-loss (data gone missing or old data coming back), Accumulo really can't do anything on its own. As you described in your scenario, you made the conscious decision to nuke the files with missing blocks. However, providing tools to handle "common" failure scenarios outside of our purview sounds like a good idea.
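
As a rough illustration of the first step such a back-fill tool might take, here is a minimal sketch of the triage logic only. It does not use the Accumulo API: the `OrderEntry` shape, the `(wal_path, table_id)` keying of `repl` entries, and the function name are all assumptions made for the example, and the inputs are plain collections that would have to be filled by scanning the replication table and listing HDFS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderEntry:
    """One WAL referenced by the `order` queue (hypothetical, simplified shape)."""
    wal_path: str
    table_id: str


def triage_order_entries(order_entries, repl_keys, existing_wals):
    """Split `order` entries by what a back-fill tool could do with them.

    order_entries: iterable of OrderEntry read from the `order` section
    repl_keys:     set of (wal_path, table_id) pairs found in the `repl` section
                   (this keying is an assumption, not the documented schema)
    existing_wals: set of WAL paths that still exist in HDFS
    """
    healthy, backfill, orphaned = [], [], []
    for entry in order_entries:
        if (entry.wal_path, entry.table_id) in repl_keys:
            # A `repl` entry exists, so there is nothing to back-fill.
            healthy.append(entry)
        elif entry.wal_path in existing_wals:
            # No `repl` entry, but the file is still there: back-fill candidate.
            backfill.append(entry)
        else:
            # No `repl` entry and no file: needs a human decision.
            orphaned.append(entry)
    return healthy, backfill, orphaned


if __name__ == "__main__":
    entries = [
        OrderEntry("/wal/a", "1"),
        OrderEntry("/wal/b", "1"),
        OrderEntry("/wal/c", "2"),
    ]
    healthy, backfill, orphaned = triage_order_entries(
        entries,
        repl_keys={("/wal/a", "1")},
        existing_wals={"/wal/a", "/wal/b"},
    )
    print(len(healthy), len(backfill), len(orphaned))
```

Keeping the entries whose WAL is already gone in a separate bucket matters here, since those files were deleted on purpose and writing `repl` entries for them would not bring the data back.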

Improving our docs around how to "re-sync" two tables being replicated would also be great. We have the hammer via snapshot+export, just need to be clear with the instructions.

Cheers,
--Adam
