bq. the bulk of the work involves deleting the files from the column family
from HDFS

I think the first step when you delete files from a column family is


On Mon, Feb 8, 2016 at 7:53 AM, Cameron, David A <>

> Hi,
> I'm working on a project where we have a strange use case.
> First off, we use bulk loading exclusively.  We never use the put or bulk
> put interface to load data into tables.
> We have drivers that make me want to segregate data by tables and column
> families.  Our data is clearly delineated by the job it came from.  We
> would like to be able to quickly delete or export the data from a given
> data set.  To enable this, I have been considering using column families
> to make deleting data that is no longer needed quick for us and easy on
> HBase.
> It is my understanding that multiple column families bite you in the
> backside via the put interface and the memstore, and that having multiple
> column families with different distributions among the partitions can
> make your partitions lumpy.  I have convinced myself that, because our key
> space is so incredibly consistent, we don't have the lumpiness issue.
> And so I ask: given that we don't use the memstore, are there any other
> drawbacks to using tables and column families to segregate data for easy,
> quick backup and deletion?  If you are wondering about our backup
> strategy, it involves snapshots and clones.  Once a table is cloned, we
> can delete the column families we don't want to export to tape from the
> cloned table.  And deletion becomes quick because the bulk of the work
> involves deleting the files from the column family from HDFS.
> All feedback is greatly appreciated!
> Thanks
> Dave
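The snapshot, clone, and trim workflow described above could look roughly like this with the HBase 2.x Java `Admin` API. This is a minimal sketch under stated assumptions, not Dave's actual code: the table names `jobs` and `jobs_export`, the snapshot name `jobs_snap`, and the kept family `job_42` are all hypothetical, and it assumes an HBase 2.x client on the classpath with an `hbase-site.xml` that points at the cluster. The export-to-tape step itself is left out.

```java
import java.io.IOException;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SnapshotCloneTrim {
    public static void main(String[] args) throws IOException {
        // Hypothetical names; substitute your own table, snapshot and families.
        TableName source = TableName.valueOf("jobs");
        TableName clone = TableName.valueOf("jobs_export");
        String snapshot = "jobs_snap";
        Set<String> keep = Set.of("job_42"); // families that should go to tape

        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Record the source table's current HFiles under a snapshot name.
            admin.snapshot(snapshot, source);
            // Create a new table from the snapshot; it references the
            // snapshot's HFiles instead of copying them.
            admin.cloneSnapshot(snapshot, clone);
            // Drop every family in the clone that is not being exported.
            for (ColumnFamilyDescriptor cf : admin.getDescriptor(clone).getColumnFamilies()) {
                if (!keep.contains(cf.getNameAsString())) {
                    admin.deleteColumnFamily(clone, cf.getName());
                }
            }
        }
    }
}
```

Because the clone only references the snapshot's files, neither the clone nor the family deletions rewrite any data; what remains in `jobs_export` is the subset to export.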
