Sounds good.

Just opened ACCUMULO-4684 for docs.

On 7/24/17 2:13 PM, Adam J. Shook wrote:
Thanks, Josh.  As this is our stage cluster, we aren't too worried about the missing data; I just want to clean up the metadata so the queue looks better.  I'll take the back-fill approach and see how that goes.

--Adam

On Mon, Jul 24, 2017 at 1:55 PM, Josh Elser <[email protected]> wrote:



    On 7/24/17 1:44 PM, Adam J. Shook wrote:

        We had some corrupt WAL blocks on our stage environment the
        other day and opted to delete them.  We now have some missing
        metadata and about 3k files pending for replication.  I've dug
        into it a bit and noticed that many of the WALs in the `order`
        queue of the replication table A) no longer exist in HDFS and B)
        have no entries in the `repl` section of the replication table.

        Based on the code, if there are no entries in the `repl`
        section, then the work will never be queued for completion via
        ZooKeeper and therefore never finished -- does this make sense?
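As a rough illustration of the condition described above, here is a minimal Python sketch that models the replication table with plain in-memory collections. It is not the Accumulo API: `OrderEntry`, `find_stuck`, and the dict/set stand-ins for the `order` queue, the `repl` section, and the HDFS listing are all hypothetical. It flags `order` entries with no `repl` entry (the ones that, per the reasoning above, would never be queued) and splits them by whether the WAL still exists in HDFS.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class OrderEntry:
    created: int   # hypothetical: WAL creation time in ms
    wal_path: str  # HDFS path of the WAL


def find_stuck(order: Iterable[OrderEntry],
               repl: Dict[str, Set[str]],
               hdfs_files: Set[str]) -> Tuple[List[OrderEntry], List[OrderEntry]]:
    """Return (missing_in_hdfs, still_present) for `order` entries that
    have no `repl` entry. `repl` maps WAL path -> set of table ids."""
    missing: List[OrderEntry] = []
    present: List[OrderEntry] = []
    # Walk the queue oldest-first, mirroring the order the work would run in.
    for entry in sorted(order, key=lambda x: x.created):
        if repl.get(entry.wal_path):
            continue  # has a status record, so work can be queued normally
        target = present if entry.wal_path in hdfs_files else missing
        target.append(entry)
    return missing, present
```

In the scenario from this thread, most stuck entries would land in `missing_in_hdfs`, since the WALs were deleted along with the corrupt blocks.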


    Yeah, that sounds about right. I'm lamenting that I never wrote up
    docs for the user-manual to cover the table-schema. I should ... do
    that...

    I think the order entry is created when the repl entry is. Would
    have to dig back into code though.

        What'd be the suggestion here to proceed?  I'm thinking a one-off
        tool to backfill the `repl` section should do the trick, but I am
        wondering if this is something that should be changed in Accumulo?


    A tool to back-fill makes sense to me. I'm not sure what we could do
    in Accumulo automatically. Any time there is data-loss (data gone
    missing or old data coming back), Accumulo really can't do anything
    on its own. As you described in your scenario, you made the
    conscious decision to nuke the files with missing blocks. However,
    providing tools to handle "common" failure scenarios outside of our
    purview sounds like a good idea.
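A back-fill tool could start as a dry run that only plans the writes. The Python sketch below is hypothetical throughout: `OrderEntry`, `ReplMutation`, `plan_backfill`, and the `closed`/`created` placeholder fields are illustrative stand-ins, not Accumulo's schema or its serialized status value. Whether a "closed" placeholder is the right status to let the queue drain is an assumption this thread does not settle, so review the plan before writing anything.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class OrderEntry:
    created: int   # hypothetical: WAL creation time in ms
    wal_path: str  # HDFS path of the WAL
    table_id: str  # hypothetical: source table the WAL is queued for


@dataclass(frozen=True)
class ReplMutation:
    row: str        # WAL path
    family: str     # always "repl" in this sketch
    qualifier: str  # table id
    closed: bool    # hypothetical placeholder: treat the WAL as fully written
    created: int    # carried over from the order entry


def plan_backfill(order: List[OrderEntry],
                  repl: Dict[str, Set[str]]) -> List[ReplMutation]:
    """Dry run: one placeholder `repl` record per (WAL, table) pair that is
    queued in `order` but has no matching `repl` entry. Writes nothing."""
    plan: List[ReplMutation] = []
    seen: Set[Tuple[str, str]] = set()
    for e in sorted(order, key=lambda x: (x.created, x.wal_path, x.table_id)):
        key = (e.wal_path, e.table_id)
        # Skip duplicates and pairs that already have a status record.
        if key in seen or e.table_id in repl.get(e.wal_path, set()):
            continue
        seen.add(key)
        plan.append(ReplMutation(e.wal_path, "repl", e.table_id, True, e.created))
    return plan
```

Emitting a plan first lets you eyeball the ~3k entries before touching the table; a real tool would then turn each record into an Accumulo `Mutation` and write it with a `BatchWriter`.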

    Improving our docs around how to "re-sync" two tables being
    replicated would also be great. We have the hammer via
    snapshot+export; we just need to be clear with the instructions.

        Cheers,
        --Adam

