Logged HBASE-19333.

On Wed, Nov 22, 2017 at 1:11 PM, Ted Yu <[email protected]> wrote:

> getSnapshotFiles returns a protobuf class; that is why it is private.
>
> If we create a POJO class for the SnapshotFileInfo it returns, I think
> the method can become public.
>
> Cheers
>
> -------- Original message --------
> From: Timothy Brown <[email protected]>
> Date: 11/22/17 12:52 PM (GMT-08:00)
> To: [email protected]
> Subject: Re: Deleting and cleaning old snapshots exported to S3
>
> Hi Lex,
>
> We had a similar issue with our S3 bucket growing in size, so we wrote
> our own cleaner. The cleaner first collects the HFiles required by the
> current snapshots. We then figure out which snapshots we no longer want
> (for example, snapshots older than a week, or whatever rules you want).
> Then we find the HFiles that are referenced only by these unwanted
> snapshots and delete those HFiles from S3.
>
> The tricky part is finding the HFiles for a given snapshot. There are
> two ways to do this.
>
> 1) Use:
>
>     SnapshotDescription snapshotDesc =
>         SnapshotDescriptionUtils.readSnapshotInfo(fs, snapshotDir);
>     SnapshotReferenceUtil.visitReferencedFiles(conf, fs, snapshotDir,
>         snapshotDesc, snapshotVisitor);
>
> where snapshotVisitor is an implementation of the SnapshotVisitor
> interface found here:
> https://github.com/cloudera/hbase/blob/cdh5-1.2.0_5.11.1/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotReferenceUtil.java#L63
>
> 2) The ExportSnapshot class provides a private method that does this
> for you. We ended up using reflection to call the private
> ExportSnapshot#getSnapshotFiles (see
> https://github.com/cloudera/hbase/blob/cdh5-1.2.0_5.11.1/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java#L539).
> For example:
>
>     Path snapshotPath =
>         SnapshotDescriptionUtils.getCompletedSnapshotDir(snapshotName, rootDir);
>     Method method = ExportSnapshot.class.getDeclaredMethod("getSnapshotFiles",
>         Configuration.class, FileSystem.class, Path.class);
>     method.setAccessible(true);
>     @SuppressWarnings("unchecked")
>     List<Pair<SnapshotFileInfo, Long>> snapshotFiles =
>         (List<Pair<SnapshotFileInfo, Long>>) method.invoke(null, conf, fs, snapshotPath);
>
> I would love to know how other people are tackling this issue as well.
>
> -Tim
>
> On Mon, Nov 20, 2017 at 7:45 PM, Lex Toumbourou <[email protected]> wrote:
>
> > Hi all,
> >
> > Wondering if I could get some help figuring out how to clean out old
> > snapshots that have been exported to S3?
> >
> > We've been exporting snapshots to S3 using the export snapshot command:
> >
> >     bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
> >         -snapshot some-snapshot -copy-to s3a://some-bucket/hbase
> >
> > Now the size of the S3 bucket is getting a little out of control, and
> > I'd like to remove the old snapshots and let HBase garbage collect
> > blocks no longer referenced.
> >
> > One idea I had was to spin up an entirely new cluster that uses the S3
> > bucket as its hbase.rootdir, then just delete the snapshots as normal
> > and maybe use cleaner_run to clean up the old files, but it feels like
> > overkill having to spin up an entire cluster.
> >
> > So my question is: what's the best approach for deleting snapshots
> > exported to an S3 bucket and cleaning up old store files that are no
> > longer referenced? We are using HBase 1.3.1 on EMR.
> >
> > Thanks!
> >
> > Lex Toumbourou
> > CTO at scrunch.com <http://scrunch.com/>
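
Option 1 above can be fleshed out into a small helper. The following is a
minimal sketch against the HBase 1.2/1.3 API, where
SnapshotReferenceUtil.SnapshotVisitor declares only storeFile; the class and
method names (SnapshotHFiles, referencedHFiles) are illustrative, not from
the thread:

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.protobuf.generated.HBaseProtos.SnapshotDescription;
    import org.apache.hadoop.hbase.protobuf.generated.SnapshotProtos.SnapshotRegionManifest;
    import org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils;
    import org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil;

    public class SnapshotHFiles {
      // Collect the names of every store file the snapshot under snapshotDir
      // references, by walking its manifest with a SnapshotVisitor.
      public static Set<String> referencedHFiles(Configuration conf, FileSystem fs,
          Path snapshotDir) throws IOException {
        final Set<String> hfiles = new HashSet<String>();
        SnapshotDescription desc = SnapshotDescriptionUtils.readSnapshotInfo(fs, snapshotDir);
        SnapshotReferenceUtil.visitReferencedFiles(conf, fs, snapshotDir, desc,
            new SnapshotReferenceUtil.SnapshotVisitor() {
              @Override
              public void storeFile(HRegionInfo regionInfo, String familyName,
                  SnapshotRegionManifest.StoreFile storeFile) throws IOException {
                // The name may be a plain HFile, an HFileLink, or a split reference.
                hfiles.add(storeFile.getName());
              }
            });
        return hfiles;
      }
    }

Note that storeFile.getName() can be a link or split reference rather than a
raw HFile name, so you may want to resolve those before comparing file sets
across snapshots.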

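From there, the cleaner Tim describes is essentially set arithmetic over the
per-snapshot file lists. A hedged sketch, reusing the referencedHFiles helper
above; the keep/expired split and the mapping of a deletable HFile name back
to its S3 path are assumptions about your own layout, not anything prescribed
by HBase:

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils;

    public class ExportedSnapshotCleaner {
      // Compute which HFiles are referenced only by expired snapshots, then
      // drop the expired snapshot directories themselves.
      public static Set<String> clean(Configuration conf, FileSystem s3Fs, Path rootDir,
          List<String> keepNames, List<String> expiredNames) throws IOException {
        Set<String> needed = new HashSet<String>();
        for (String name : keepNames) {
          needed.addAll(SnapshotHFiles.referencedHFiles(conf, s3Fs,
              SnapshotDescriptionUtils.getCompletedSnapshotDir(name, rootDir)));
        }
        Set<String> deletable = new HashSet<String>();
        for (String name : expiredNames) {
          deletable.addAll(SnapshotHFiles.referencedHFiles(conf, s3Fs,
              SnapshotDescriptionUtils.getCompletedSnapshotDir(name, rootDir)));
        }
        // An HFile survives if any kept snapshot still references it.
        deletable.removeAll(needed);
        // Remove the expired snapshot metadata. The caller is left to map each
        // name in "deletable" to its path under rootDir (e.g. the archive dir)
        // and delete it, since that layout depends on how the export was done.
        for (String name : expiredNames) {
          s3Fs.delete(SnapshotDescriptionUtils.getCompletedSnapshotDir(name, rootDir), true);
        }
        return deletable;
      }
    }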