The s3a Hadoop FileSystem isn't robust enough to support the requirements Accumulo has to guarantee no data loss around Write-Ahead Logs.

You can use the ExportSnapshot tool for Accumulo to get an immutable "picture" of a table. The expectation is that you would use DistCp to copy the files referenced by this snapshot to some other "cold" storage.

The downside of this approach is that each snapshot is a full copy. There is no such thing as an incremental snapshot.

Hypothetically, you could build some additional logic to avoid re-copying files that are already in your cold storage. All Accumulo files are immutable, so if Snapshot1 already referenced fileA, you wouldn't need to copy fileA again when Snapshot2 also references it. This is left as an exercise to the user :)
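
A minimal sketch of that bookkeeping, assuming you keep your own record of which file paths have already been copied. The function name and the example paths are hypothetical and not part of Accumulo or Hadoop; the copy itself (e.g. via DistCp) is left out.

```python
def files_to_copy(snapshot_files, already_archived):
    """Return the files referenced by a snapshot that are not yet archived.

    Relies on the files being immutable: the same path always means
    the same contents, so a path seen before never needs re-copying.
    """
    seen = set(already_archived)
    pending = []
    for path in snapshot_files:
        if path not in seen:
            pending.append(path)
            seen.add(path)  # also skips duplicates within one snapshot
    return pending


# Hypothetical file lists for two successive snapshots.
snapshot1 = ["/accumulo/tables/1/t-0001/fileA.rf",
             "/accumulo/tables/1/t-0001/fileB.rf"]
snapshot2 = ["/accumulo/tables/1/t-0001/fileA.rf",
             "/accumulo/tables/1/t-0002/fileC.rf"]

archived = set()

first_batch = files_to_copy(snapshot1, archived)
archived.update(first_batch)   # record only after the copy succeeds

second_batch = files_to_copy(snapshot2, archived)
archived.update(second_batch)
```

With these inputs the first batch contains both files from Snapshot1, and the second batch contains only fileC, since fileA is already in cold storage.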

On 10/3/17 4:40 PM, Christopher wrote:
Hi Mike. This is a great question. Accumulo has several options for backup.

Accumulo is backed by HDFS for persisting its data on disk. It may be possible to use S3 directly at this layer. I'm not sure what the current state is for doing something like this, but a brief Googling for "HDFS on S3" shows a few historical projects which may still be active and mature.

Accumulo also has a replication feature to automatically mirror live ingest to a pluggable external receiver, which could be a backup service you've written to store data in S3. Recovery would depend on how you store the data in S3. You could also implement an ingest system which stores data to a backup as well as to Accumulo, to handle both live and bulk ingest.

Accumulo also has an "exporttable" feature, which exports the metadata for a table, along with a list of files in HDFS for you to back up to S3 (or another file system). Recovery involves using the "importtable" feature which recreates the metadata, and bulk importing the files after you've moved them from your backup location back onto HDFS.

This is just a rough outline of 3 possible solutions. I don't know which (if any) would match your requirements best. There may be many other solutions as well.

On Tue, Oct 3, 2017 at 4:10 PM <[email protected]> wrote:

    Please forgive the newbie question. What options are there for
    backup and recovery of Accumulo data?

    Ideally I would like something that would replicate to S3 in
    realtime.

