On Thu, Sep 15, 2011 at 4:11 PM, lars hofhansl <[email protected]> wrote:
> A typical scenario for relational databases is to take periodic base backups
> and also archive the log files.
> Would that even work in HBase currently? Say I have distcp copy of all HBase
> files that was done while HBase was running and I
> also have an archive of all WALs since the time when the distcp started.
>
> Could I theoretically restore HBase to a consistent state (at any time after
> the distcp finished)? Or are there changes that are not
> WAL logged that I would miss (like admin actions)?
>

I'm interested in this topic too. Related work was done in HBASE-50
(snapshotting). Have you seen that, Lars? It would roll the WALs and make a
manifest of all hfiles. A background task could then copy off the hfiles
listed in the manifests, plus the WALs. IIRC, there was a restore-from-snapshot
mechanism too.

We'd need to figure out things like the most recent sequenceid for a region and
then discard all WAL edits that were done before this sequenceid. By reading
the head and tail of a WAL we could figure out what sequenceids it had. (We
should probably put the sequenceid in the name of the file, at least the start
sequenceid, and perhaps even add an accompanying metadata file, or an entry at
the end of the WAL, listing the regions for which the WAL had edits. Maybe this
is more trouble than it's worth, since there will be times when we don't close
the WAL properly.) Sequenceids are kept by the regionserver. hfiles are kept
per region, and regions can move among regionservers.

> If that works, a backup would involve these steps:
> 1. Flush all stores.

A flush would be nice but could take a good while to complete... it could
jeopardize your snapshot.

> 2. copy the files.

I'd dump a manifest and copy in the background. The copy is going to be
heavy-duty too, I'd say, if you let it run full belt.

> 3. roll all logs.
>
>
> #1 and #3 are really optional, #3 is good because it would make all logs
> eligible for archiving right after the backup is done.
>
>
> In any case some hooks to act upon HLog actions would be a good thing anyway.
> For example we could add four new methods to WALObserver (or a new observer
> type):
>
> boolean preLogRoll(Path newFile)
> void postLogRoll(Path newFile)
>
> boolean preLogArchive(Path oldFile)
> void postLogArchive(Path oldFile)
>

Is HBASE-4132 related?

St.Ack
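
To make the sequenceid filtering idea above a bit more concrete, here is a rough sketch in plain Java. It is not HBase code: WalEdit and WalReplayFilter are hypothetical stand-ins, and it assumes the copied hfiles for a region already contain every edit up to and including that region's most recent flushed sequenceid, so only later edits need replaying.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for one WAL entry (not an HBase class).
final class WalEdit {
    final String region;
    final long sequenceId;

    WalEdit(String region, long sequenceId) {
        this.region = region;
        this.sequenceId = sequenceId;
    }
}

// Hypothetical helper: decides which archived WAL edits still need replaying.
final class WalReplayFilter {

    // maxFlushedSeqId maps region name -> highest sequenceid assumed to be
    // persisted in that region's copied hfiles.
    static List<WalEdit> editsToReplay(List<WalEdit> wal,
                                       Map<String, Long> maxFlushedSeqId) {
        List<WalEdit> keep = new ArrayList<>();
        for (WalEdit e : wal) {
            Long flushed = maxFlushedSeqId.get(e.region);
            // No recorded flush for this region: keep the edit to be safe.
            // Otherwise keep only edits newer than what the hfiles hold.
            if (flushed == null || e.sequenceId > flushed) {
                keep.add(e);
            }
        }
        return keep;
    }
}
```

With a flushed sequenceid of 10 for region r1, edits 9 and 10 for r1 would be dropped and edit 11 kept; edits for a region with no recorded sequenceid are all kept.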
