[ https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl resolved HBASE-12363.
-----------------------------------
      Resolution: Fixed
    Hadoop Flags: Reviewed

Committed to 0.98, branch-1, and master.

Thanks for the reviews.

> Improve how KEEP_DELETED_CELLS works with MIN_VERSIONS
> ------------------------------------------------------
>
>                 Key: HBASE-12363
>                 URL: https://issues.apache.org/jira/browse/HBASE-12363
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>              Labels: Phoenix
>             Fix For: 2.0.0, 0.98.8, 0.99.2
>
>         Attachments: 12363-0.98.txt, 12363-1.0.txt, 12363-master.txt, 
> 12363-test.txt, 12363-v2.txt, 12363-v3.txt
>
>
> Brainstorming...
> This morning on the train (of all places) I realized a fundamental issue in 
> how KEEP_DELETED_CELLS is implemented.
> The problem is around knowing when it is safe to remove a delete marker (we 
> cannot remove it until all cells affected by it have been removed by other 
> means).
> This was particularly hard for family markers, since they sort before all 
> cells of a row; scanning forward through an HFile you cannot know whether a 
> family marker is still needed until at least the entire row has been 
> scanned.
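> A minimal illustration of why (0.98-era KeyValue API; the variable names 
> are mine): a DeleteFamily KV has an empty qualifier, so it sorts before 
> every qualified cell of its row:
>
>   import org.apache.hadoop.hbase.KeyValue;
>   import org.apache.hadoop.hbase.util.Bytes;
>
>   byte[] row = Bytes.toBytes("r1"), cf = Bytes.toBytes("f");
>   // family delete marker: empty qualifier, type DeleteFamily
>   KeyValue familyDelete =
>       new KeyValue(row, cf, null, 5L, KeyValue.Type.DeleteFamily);
>   KeyValue cell =
>       new KeyValue(row, cf, Bytes.toBytes("q"), 5L, KeyValue.Type.Put);
>   // the marker sorts first, so a forward scan sees it before any cell
>   assert KeyValue.COMPARATOR.compare(familyDelete, cell) < 0;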
> My solution was to keep the TS of the oldest put in any given HFile, and only 
> remove delete markers older than that TS.
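> As a sketch (names here are illustrative, not from the patch), the 
> compaction-time rule then reduces to a single comparison:
>
>   // A delete marker can only be dropped if it cannot mask any put left
>   // in the files being compacted, i.e. if it is strictly older than the
>   // earliest put timestamp recorded across those files.
>   static boolean canDropDeleteMarker(long markerTs, long earliestPutTs) {
>     return markerTs < earliestPutTs;
>   }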
> That sounds good on the face of it... But now imagine you wrote a version of 
> ROW 1 and then never updated it again. Then later you write a billion other 
> rows and delete them all. Since the TS of the cells in ROW 1 is older than 
> all the delete markers for the other billion rows, those markers will never 
> be collected, at least not in the region that hosts ROW 1, not even after a 
> major compaction.
> Note, in a sense that is what HBase is supposed to do when keeping deleted 
> cells: keep them until they would be removed by some other means (for example 
> TTL, or MAX_VERSIONS when new versions are inserted).
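> For reference, the schema knobs in play (a 0.98-era sketch; the values are 
> only illustrative):
>
>   import org.apache.hadoop.hbase.HColumnDescriptor;
>
>   HColumnDescriptor hcd = new HColumnDescriptor("f");
>   hcd.setKeepDeletedCells(true); // retain deleted cells and their markers
>   hcd.setTimeToLive(86400);      // deleted cells still expire via TTL
>   hcd.setMaxVersions(5);         // or get pruned as new versions arrive
>   hcd.setMinVersions(1);         // but always keep at least one version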
> The specific problem here is that even when all KVs affected by a delete 
> marker have expired this way, the marker would not be removed if there is 
> just one older KV in the HStore.
> I don't see a good way out of this. In the parent issue I outlined these 
> four options:
> # Only allow the new flag to be set on CFs with TTL set. MIN_VERSIONS would 
> not apply to deleted rows or delete marker rows (we wouldn't know how long 
> to keep family deletes in that case). (MAX)VERSIONS would still be enforced 
> on all row types except for family delete markers.
> # Translate family delete markers to column delete markers at (major) 
> compaction time.
> # Change HFileWriterV* to keep track of the earliest put TS in a store and 
> write it to the file metadata. Use that to expire delete markers that are 
> older and hence can't affect any puts in the file.
> # Have Store.java keep track of the earliest put in internalFlushCache and 
> compactStore and then append it to the file metadata. That way HFileWriterV* 
> would not need to know about KVs.
> And I implemented #4.
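> A sketch of what #4 could look like (the method name and the metadata key 
> are hypothetical; the real change lives in Store.java and the store file 
> writer):
>
>   import java.io.IOException;
>   import java.util.List;
>   import org.apache.hadoop.hbase.KeyValue;
>   import org.apache.hadoop.hbase.io.hfile.HFile;
>   import org.apache.hadoop.hbase.util.Bytes;
>
>   // Track the minimum put timestamp while writing a flush/compaction
>   // output, then persist it in the file metadata so later compactions
>   // can decide which delete markers are safe to drop.
>   static void writeWithEarliestPutTs(HFile.Writer writer,
>       List<KeyValue> kvs) throws IOException {
>     long earliestPutTs = Long.MAX_VALUE;
>     for (KeyValue kv : kvs) {
>       if (KeyValue.Type.codeToType(kv.getTypeByte()) == KeyValue.Type.Put) {
>         earliestPutTs = Math.min(earliestPutTs, kv.getTimestamp());
>       }
>       writer.append(kv);
>     }
>     // hypothetical key; only markers older than this value may be dropped
>     writer.appendFileInfo(Bytes.toBytes("EARLIEST_PUT_TS"),
>         Bytes.toBytes(earliestPutTs));
>   }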
> I'd love to get input on these ideas.


