Patrick,

In the comments of ACCUMULO-456, I outlined a procedure for doing this for 1.4.
By default cloning a table will flush anything in memory.

Keith

On Thu, Jul 5, 2012 at 10:13 AM, Adam Fuchs <[email protected]> wrote:
> Hi Patrick,
>
> The short answer is yes, but there are a few caveats:
>
> 1. As you said, information that is sitting in the in-memory map and in the
> write-ahead log will not be in those files. You can periodically call flush
> (Connector.getTableOperations().flush(...)) to guarantee that your data has
> made it into the RFiles.
>
> 2. Old data that has been deleted may reappear. RFiles can span multiple
> tablets, which happens when tablets split. Often, one of the tablets
> compacts, getting rid of delete keys. However, the file that holds the
> original data is still in HDFS because it is referenced by another tablet
> (or because it has not yet been garbage collected). If you're using Accumulo
> in an append-only fashion, then this will not be a problem.
>
> 3. For the same reasons as #2, if you're doing any aggregation you might run
> into counts being incorrect.
>
> You might also check out the table cloning feature introduced in 1.4 as a
> means for backing up a table:
> http://accumulo.apache.org/1.4/user_manual/Table_Configuration.html#Cloning_Tables
>
> Cheers,
> Adam
>
> On Thu, Jul 5, 2012 at 9:52 AM, <[email protected]> wrote:
>>
>> users@accumulo,
>>
>> I need help understanding if one could recover or back up tables by taking
>> their files stored in HDFS and reattaching them to tablet servers, even
>> though this would mean the loss of information from recent mutations and
>> write-ahead logs. The documentation on recovery is focused on the failure of
>> a tablet server, but, in the event of a failure of the master or other
>> situation where the tablet servers cannot be utilized, it would be
>> beneficial to know whether the files in HDFS can be used for recovery.
>>
>> Thanks,
>>
>> Patrick Lynch
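
[Editor's note: for readers of the archive, here is a minimal sketch of
the periodic flush Adam mentions, written against the 1.4 client API.
The instance name, ZooKeeper host, credentials, and table name below are
placeholders, not values from this thread:]

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;

    public class FlushExample {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details -- substitute your own.
            Connector conn = new ZooKeeperInstance("myInstance", "zkhost:2181")
                .getConnector("root", "secret".getBytes());

            // Flush the entire table (null start/end rows) and block until
            // everything in the in-memory map has been written to RFiles,
            // so the files in HDFS reflect all mutations written so far.
            conn.tableOperations().flush("mytable", null, null, true);
        }
    }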

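[Editor's note: and a similar sketch of the cloning approach from the
manual section Adam links, reusing the conn from the example above;
"mytable_backup" is an illustrative name. Passing true for the flush
argument makes the clone include whatever is currently in memory, which
matches Keith's note that cloning flushes by default:]

    import java.util.Collections;

    // Clone "mytable" to "mytable_backup". The clone shares the source
    // table's RFiles, so it is cheap to create; clone() throws
    // TableExistsException if the target table already exists.
    conn.tableOperations().clone("mytable", "mytable_backup",
        true,                                   // flush before cloning
        Collections.<String,String>emptyMap(),  // no property overrides
        Collections.<String>emptySet());        // no properties to exclude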