On Thu, Sep 15, 2011 at 4:11 PM, lars hofhansl <[email protected]> wrote:
> A possible answer is to keep all versions with no TTL, and do replication. At a 
> certain size this ceases to be practical though.
>

Discussing point-in-time recovery here at our shop, and trying to
avoid having to keep all versions, is what prompted the issue below:

 HBASE-4071  Data GC: Remove all versions > TTL EXCEPT the last
               written version (Lars Hofhansl)

You want to support being able to restore any version?

Our thought was that the TTL would be the window during which you
could get any version, a month say, and that thereafter, only the last
written would be kept.
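
To make the retention rule concrete, here is a rough sketch of it in
Python. This is only an illustration of the idea, not how HBase
implements it: the function name and the (timestamp, value) tuples are
made up for the example, and "last written" is assumed to mean the
version with the highest timestamp.

```python
from typing import List, Tuple

# Hypothetical representation of one cell version: (timestamp_ms, value).
Version = Tuple[int, bytes]


def retained_versions(versions: List[Version], now_ms: int, ttl_ms: int) -> List[Version]:
    """Keep every version inside the TTL window; if none is, keep only the newest."""
    if not versions:
        return []
    # Newest first, so ordered[0] is the last written version (by timestamp).
    ordered = sorted(versions, key=lambda v: v[0], reverse=True)
    cutoff = now_ms - ttl_ms
    in_window = [v for v in ordered if v[0] >= cutoff]
    # If everything has aged out, the last written version still survives.
    return in_window or [ordered[0]]
```

So inside the window you can read back any version, and once everything
has expired you are left with exactly one version per cell.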


> A typical scenario for relational databases is to take periodic base backups 
> and also archive the log files.
> Would that even work in HBase currently? Say I have distcp copy of all HBase 
> files that was done while HBase was running and I
> also have an archive of all WALs since the time when the distcp started.
>

So, you are thinking that you would replay all WALs from the cluster
from the point in time at which the hfile copy started?

That should work.
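
Roughly, the replay would look like the toy sketch below. It is not
HBase code: the dict stands in for whatever the copied hfiles contain,
the edit tuples stand in for WAL entries, and all the names are
invented for the example. The point is that edits carry their own
timestamps, so applying an edit the copy already contains does no harm.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical shapes, for illustration only.
Cell = Tuple[int, Optional[bytes]]       # (timestamp, value); None marks a delete
Edit = Tuple[int, str, Optional[bytes]]  # (timestamp, row, value)


def replay(base: Dict[str, Cell], edits: Iterable[Edit], target_ts: int) -> Dict[str, Cell]:
    """Apply archived edits on top of a base copy, stopping at target_ts."""
    state = dict(base)  # leave the base copy untouched
    for ts, row, value in sorted(edits, key=lambda e: e[0]):
        if ts > target_ts:
            continue  # past the point in time we are restoring to
        current = state.get(row)
        # Only move a row forward in time; older or duplicate edits are no-ops.
        if current is None or ts >= current[0]:
            state[row] = (ts, value)
    return state
```

Running the same replay twice over its own output gives the same
state, which is what you want when the base copy was taken from a live
cluster and may already include some of the logged edits.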

Would be nice if you could filter out complete WALs by looking at
"metadata", metadata that does not currently exist: e.g. metadata
could include what regions a WAL has edits for and the range of
timestamps it covers.
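
As a sketch of what such a filter could look like if the metadata
existed (it does not today, so the WalMeta record and its fields are
purely hypothetical):

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class WalMeta:
    """Hypothetical per-WAL metadata; nothing like this exists in HBase currently."""
    path: str
    regions: FrozenSet[str]  # regions this WAL has edits for
    min_ts: int              # oldest edit timestamp in the WAL
    max_ts: int              # newest edit timestamp in the WAL


def wals_to_replay(wals: Iterable[WalMeta], wanted_regions: Iterable[str],
                   start_ts: int, end_ts: int) -> List[WalMeta]:
    """Return only the WALs that touch a wanted region and overlap [start_ts, end_ts]."""
    wanted = frozenset(wanted_regions)
    return [
        w for w in wals
        if w.regions & wanted                           # has edits for a region we care about
        and w.min_ts <= end_ts and w.max_ts >= start_ts  # timestamp ranges overlap
    ]
```

Any WAL that fails either check could be skipped without opening it.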

Or, as in HBASE-50, could roll logs first before starting the copy.
That'd narrow the number of WALs to replay for sure.

Would need a WAL to hfile mapreduce job.

I think the PITR would be easier if table-scoped.

Doing it cluster-wide would require our having the meta table in sync
as you say elsewhere.  Or, we just dump the state of meta when doing a
cluster backup at the end of PITR and, when restoring a cluster, the
first thing we'd do is replace .META. (Could be an issue if tables are
deleted between start of PITR and end).

> Could I theoretically restore HBase to a consistent state (at any time after 
> the distcp finished)? Or are there changes that are not
> WAL logged that I would miss (like admin actions)?
>

These are not logged currently but Dhruba just opened this:

 HBASE-4401 Record log region splits and region moves in the HLog

St.Ack
