Feng Honghua created HBASE-8753:
-----------------------------------

             Summary: Provide new delete flag which can delete all cells under 
a column-family which have a same designated timestamp
                 Key: HBASE-8753
                 URL: https://issues.apache.org/jira/browse/HBASE-8753
             Project: HBase
          Issue Type: New Feature
          Components: Deletes
            Reporter: Feng Honghua


In one of our production scenarios (Xiaomi message search), multiple cells are 
put in a batch under a specific column-family, using the same timestamp but 
different column names.

After some time, these cells also need to be deleted in a batch, given only 
that timestamp. But the column names are parsed tokens, which can be arbitrary 
words, so such a batch delete is impossible without first retrieving all KVs 
from that CF, collecting the list of columns that have a KV with the given 
timestamp, and then issuing an individual deleteColumn for each column in that 
list.

Though such a batch delete is possible, its performance is poor, and customers 
also find the resulting code clumsy: it must first retrieve and populate the 
column list and then issue a deleteColumn for each column in it.
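
The read-then-delete workaround described above can be sketched with a small 
in-memory model. This is only an illustration, not the HBase client API: the 
class ReadThenDeleteSketch, its methods, and the sample data are hypothetical, 
and a nested map (column name -> timestamp -> value) stands in for one row of 
one column-family.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for one row of one column family:
// column name -> (timestamp -> value). Not the HBase client API.
class ReadThenDeleteSketch {

    // Step 1: read every column in the family to find those that
    // hold a cell at the target timestamp.
    static List<String> columnsWithTimestamp(
            Map<String, TreeMap<Long, String>> family, long ts) {
        List<String> columns = new ArrayList<>();
        for (Map.Entry<String, TreeMap<Long, String>> e : family.entrySet()) {
            if (e.getValue().containsKey(ts)) {
                columns.add(e.getKey());
            }
        }
        return columns;
    }

    // Step 2: issue one delete per discovered column.
    // Returns the number of individual deletes issued.
    static int deleteByTimestamp(
            Map<String, TreeMap<Long, String>> family, long ts) {
        List<String> columns = columnsWithTimestamp(family, ts);
        for (String column : columns) {
            family.get(column).remove(ts);
        }
        return columns.size();
    }

    // Sample data: three token columns written at timestamp 100,
    // plus one later cell at timestamp 200.
    static Map<String, TreeMap<Long, String>> sampleFamily() {
        Map<String, TreeMap<Long, String>> family = new TreeMap<>();
        for (String token : new String[] {"hbase", "delete", "marker"}) {
            family.computeIfAbsent(token, k -> new TreeMap<>()).put(100L, "msg-1");
        }
        family.computeIfAbsent("hbase", k -> new TreeMap<>()).put(200L, "msg-2");
        return family;
    }

    public static void main(String[] args) {
        Map<String, TreeMap<Long, String>> family = sampleFamily();
        int issued = deleteByTimestamp(family, 100L);
        System.out.println("deletes issued: " + issued);
        System.out.println("remaining: " + family);
    }
}
```

In this model the cost of the workaround is visible directly: the whole family 
is read once, and the number of deletes issued grows with the number of 
columns that share the timestamp.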

This feature resolves this problem by introducing a new delete flag: 
DeleteFamilyVersion. 

  1). When you need to delete all KVs under a column-family with a given 
timestamp, just call Delete.deleteFamilyVersion(cfName, timestamp). Only a 
DeleteFamilyVersion type KV is put to HBase (like DeleteFamily / DeleteColumn / 
Delete), without any read operation.

  2). Like other delete types, DeleteFamilyVersion takes effect in 
get/scan/flush/compact operations. The ScanDeleteTracker now parses out 
DeleteFamilyVersion markers and uses them to prevent any KV under the specific 
CF that has the same timestamp as a DeleteFamilyVersion KV from appearing in a 
get/scan result (and likewise in flush/compact).
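
The two points above can be illustrated with a simplified in-memory model. It 
is a sketch under stated assumptions, not the actual ScanDeleteTracker or the 
HBase client: the class FamilyVersionMarkerSketch and all of its members are 
hypothetical, and it models only the masking rule (a cell is hidden when its 
timestamp matches a family-version marker).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of one row of one column family with
// family-version delete markers. Not the real ScanDeleteTracker.
class FamilyVersionMarkerSketch {

    // A stored cell: column name, timestamp, value.
    static final class Cell {
        final String column;
        final long timestamp;
        final String value;

        Cell(String column, long timestamp, String value) {
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final List<Cell> cells = new ArrayList<>();
    // Timestamps covered by a family-version delete marker.
    private final Set<Long> deletedVersions = new HashSet<>();

    void put(String column, long timestamp, String value) {
        cells.add(new Cell(column, timestamp, value));
    }

    // Write path: record a single marker. No cell is read and
    // no column name needs to be known.
    void deleteFamilyVersion(long timestamp) {
        deletedVersions.add(timestamp);
    }

    // Read path: skip every cell whose timestamp matches a marker.
    List<Cell> scan() {
        List<Cell> visible = new ArrayList<>();
        for (Cell c : cells) {
            if (!deletedVersions.contains(c.timestamp)) {
                visible.add(c);
            }
        }
        return visible;
    }

    // Compaction in this model: physically drop the masked cells.
    // Returns how many cells were removed.
    int compact() {
        int before = cells.size();
        cells.removeIf(c -> deletedVersions.contains(c.timestamp));
        return before - cells.size();
    }

    public static void main(String[] args) {
        FamilyVersionMarkerSketch family = new FamilyVersionMarkerSketch();
        family.put("hbase", 100L, "msg-1");
        family.put("delete", 100L, "msg-1");
        family.put("hbase", 200L, "msg-2");
        family.deleteFamilyVersion(100L);
        System.out.println("visible cells: " + family.scan().size());
        System.out.println("cells dropped by compact: " + family.compact());
    }
}
```

Compared with the read-then-delete approach, the write here is a single marker 
regardless of how many columns share the timestamp; the filtering work moves 
to the read and compaction paths.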

Our customers find this feature efficient, clean and easy to use, since it does 
its work without needing to know the exact list of column names to be deleted.

This feature has been running smoothly for a couple of months in our production 
clusters.
