bq. the bulk of the work involves deleting the files from the column family
from HDFS

I think the first step when you delete files from a column family is
archiving them.

FYI

On Mon, Feb 8, 2016 at 7:53 AM, Cameron, David A <david.a.came...@lmco.com>
wrote:

> Hi,
>
> I'm working on a project where we have a strange use case.
>
> First off, we use bulk loading exclusively.  We never use the put or bulk
> put interface to load data into tables.
>
> We have drivers that make me want to segregate data by tables and column
> families.  Our data is clearly delineated by the job it came from.  We
> would like to quickly either delete or export data from a given data set.
> To enable this I have been considering using column families to
> make it quick for us and easy on HBase to delete data that is no longer
> needed.
>
> It is my understanding that multiple column families bite you in the back
> side via the put interface and memstore, and that having multiple column
> families with different distributions among the partitions can cause
> lumpiness in your partitions.  I have convinced myself that, because our key
> space is so incredibly consistent, we don't have the lumpiness issue.
>
> And so, I ask this, given that we don't use the memstore, are there any
> other drawbacks to using tables and column families to segregate data for
> easy/quick backup and deletion?  If you are wondering about our backup
> strategy it involves using snapshots and clones.  Once a table is cloned we
> can delete the column families from the table we don't want to export to
> tape.  And delete becomes quick because the bulk of the work involves
> deleting the files from the column family from HDFS.
>
> All feedback is greatly appreciated!
>
> Thanks
>
> Dave
