All of the components that you need to perform point-in-time recovery of an 
Accumulo instance exist already. I have been working on a tool[1] in my copious 
amounts of free time to integrate them into something usable, but it doesn't 
actually use the files in the trash. My approach is to let you determine your 
MTTR and schedule your backups accordingly; the backup is a fallback in case 
you are not able to recover your database using the techniques in the current 
documentation.

 

[1] https://github.com/dlmarion/raccovery

 

From: James Hughes [mailto:[email protected]] 
Sent: Monday, August 17, 2015 4:28 PM
To: [email protected]
Subject: Re: Accumulo GC and Hadoop trash settings

 

Ok, I can see the benefit of being able to recover data.  Is this process 
documented?  And is there any kind of user-friendly tool for it?

 

On Mon, Aug 17, 2015 at 4:11 PM, <[email protected]> wrote:

 

 It's not temporary files; it's any file that has been compacted away. If you 
keep files around longer than {dfs.namenode.checkpoint.period}, then you have a 
chance to recover in case your most recent checkpoint is corrupt.
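
For reference, the two retention windows involved can be tuned in the standard 
Hadoop config files. This is just a sketch with illustrative values (not a 
recommendation); the idea is to keep the trash interval longer than the 
checkpoint period:

```xml
<!-- core-site.xml: keep trashed files for 24 hours (value is in minutes) -->
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>

<!-- hdfs-site.xml: NameNode checkpoint every hour (value is in seconds) -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
</property>
```

With settings like these, files the Accumulo GC moves to .Trash outlive at 
least one checkpoint cycle, which is what gives you the recovery window 
described above.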

 

  _____  

From: "James Hughes" <[email protected]>
To: [email protected]
Sent: Monday, August 17, 2015 3:57:57 PM
Subject: Accumulo GC and Hadoop trash settings

 

 

Hi all,

 

From reading about the Accumulo GC, it sounds like temporary files are 
routinely deleted during GC cycles.  In a small testing environment, I've seen 
the HDFS Accumulo user's .Trash folder grow to tens of gigabytes of data.


Is there any reason that the default value for gc.trash.ignore is false?  Is 
there any downside to deleting GC'ed files completely?
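
For context, the property in question lives in accumulo-site.xml. A sketch of 
flipping it (based on my reading of the 1.6 manual; verify against your 
version's docs before relying on it):

```xml
<!-- accumulo-site.xml: have the Accumulo GC delete files outright
     instead of moving them to the HDFS user's .Trash folder -->
<property>
  <name>gc.trash.ignore</name>
  <value>true</value>
</property>
```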

 

Thanks in advance,

 

Jim

 

http://accumulo.apache.org/1.6/accumulo_user_manual.html#_gc_trash_ignore

 

 
