churro morales created HBASE-13031:
--------------------------------------

             Summary: Ability to snapshot based on a key range
                 Key: HBASE-13031
                 URL: https://issues.apache.org/jira/browse/HBASE-13031
             Project: HBase
          Issue Type: Brainstorming
    Affects Versions: 0.94.26, 2.0.0, 1.1.0, 0.98.11
            Reporter: churro morales
            Assignee: churro morales
            Priority: Critical


I posted this on the mailing list and it seems some people are interested.  A 
little background for everyone:

We have a very large table that we would like to snapshot and transfer to 
another cluster (compressed data is always better to ship).  The problem is 
that the transfer could take many weeks, and during that time major 
compactions could double the data stored in DFS (the snapshot keeps the old 
files referenced while compactions write new ones), which would cause us to 
run out of disk space.

So we were thinking about adding the ability to snapshot a specific key 
range.

Ideally, I think the user would specify a start and stop key, each of which 
would have to fall on a region boundary.  If the boundaries change between the 
time the user submits the request and the time the snapshot is taken (due to 
regions merging or splitting), the snapshot should fail.
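
To make the boundary requirement concrete, here is a minimal sketch in plain 
Python, not HBase code.  The Region class and regions_for_range function are 
hypothetical names used only for illustration, and the region list is assumed 
to be sorted and contiguous, with an empty key meaning "start of table" or 
"end of table".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    # Hypothetical stand-in for a region covering [start_key, end_key).
    name: str
    start_key: bytes  # b"" means "from the beginning of the table"
    end_key: bytes    # b"" means "to the end of the table"


def regions_for_range(regions: List[Region], start: bytes, stop: bytes) -> List[Region]:
    """Return the regions that exactly cover [start, stop).

    Raises ValueError if either key does not fall on a region boundary.
    Assumes `regions` is sorted by start key and has no gaps.
    """
    if stop != b"" and start >= stop:
        raise ValueError("start key must sort before stop key")
    if start not in {r.start_key for r in regions}:
        raise ValueError(f"start key {start!r} is not a region start key")
    if stop not in {r.end_key for r in regions}:
        raise ValueError(f"stop key {stop!r} is not a region end key")

    selected = []
    for r in regions:
        at_or_after_start = r.start_key >= start
        # An empty end key is the last region; it only qualifies if the
        # requested range is open-ended as well.
        at_or_before_stop = stop == b"" or (r.end_key != b"" and r.end_key <= stop)
        if at_or_after_start and at_or_before_stop:
            selected.append(r)
    return selected
```

Python compares bytes lexicographically as unsigned values, which is the 
ordering assumed for row keys here.  A request whose keys land inside a region 
is rejected up front rather than silently widened or narrowed.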

That way we would know which regions to snapshot.  If those regions changed 
between when the request was submitted and when the regions were locked, the 
snapshot would simply fail and the user would try again, rather than the user 
potentially getting more or less data than anticipated.  I was planning on 
storing the start and stop keys in the SnapshotDescription; from there it 
looks pretty straightforward: we just have to change the verifier code to 
accommodate the key range.
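
As a rough illustration of the fail-and-retry check, the following plain 
Python sketch records the region boundaries at submit time and compares them 
with what is found once the regions are locked.  It is not the actual verifier 
code; RangeSnapshotRequest, KeyRangeSnapshotError, and verify_key_range are 
hypothetical names, and the request object only stands in for a 
SnapshotDescription that carries a key range.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Boundary = Tuple[bytes, bytes]  # (start_key, end_key) of one region


class KeyRangeSnapshotError(Exception):
    """Raised when a key-range snapshot cannot be taken as requested."""


@dataclass(frozen=True)
class RangeSnapshotRequest:
    # Hypothetical stand-in for a SnapshotDescription carrying a key range.
    name: str
    start_key: bytes
    stop_key: bytes
    expected_regions: Tuple[Boundary, ...]  # boundaries recorded at submit time


def verify_key_range(request: RangeSnapshotRequest,
                     locked_regions: Sequence[Boundary]) -> None:
    """Fail unless the locked regions match the ones recorded at submit time."""
    # A split or merge changes the boundary list, so the snapshot must fail.
    if tuple(locked_regions) != request.expected_regions:
        raise KeyRangeSnapshotError(
            f"regions under snapshot {request.name!r} changed; please retry")
    if not locked_regions:
        raise KeyRangeSnapshotError("no regions cover the requested key range")
    # The locked regions must cover exactly [start_key, stop_key) with no gaps.
    if locked_regions[0][0] != request.start_key:
        raise KeyRangeSnapshotError("first region does not begin at the start key")
    if locked_regions[-1][1] != request.stop_key:
        raise KeyRangeSnapshotError("last region does not end at the stop key")
    for (_, prev_end), (next_start, _) in zip(locked_regions, locked_regions[1:]):
        if prev_end != next_start:
            raise KeyRangeSnapshotError("gap or overlap between adjacent regions")
```

The caller would catch the error and resubmit, so the user never receives a 
snapshot covering more or less than the range they asked for.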

If this design sounds good to anyone, or if I am overlooking anything, please 
let me know.  Once we agree on the design, I'll write and submit the patches.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
