Re: Tombstone removal optimization and question

kurt greaves Tue, 06 Nov 2018 02:59:23 -0800

Yes, it does. Consider what would happen if it didn't: if you kept writing
to the same partition, you'd never be able to remove any tombstones for
that partition.
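
The check asked about in the quoted message below can be sketched roughly
as follows. This is only an illustration of the rules as stated in this
thread, not Cassandra's actual code: the SSTable class, the
can_purge_tombstone function, and every parameter name are hypothetical.
Rule 2 (S1 being selected for compaction) is assumed to hold already,
since the check would only run while S1 is being compacted.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class SSTable:
    """Hypothetical stand-in for SSTable metadata (not Cassandra's API)."""
    name: str
    partition_keys: Set[str]
    min_timestamp: int  # smallest write timestamp held in this SSTable


def can_purge_tombstone(
    partition_key: str,
    tombstone_timestamp: int,
    local_deletion_time: int,
    now: int,
    gc_grace_seconds: int,
    not_compacting: Iterable[SSTable],
) -> bool:
    """Return True if the tombstone may be dropped in this compaction."""
    # Rule 1: gc_grace_seconds must have passed since the deletion.
    if now < local_deletion_time + gc_grace_seconds:
        return False

    # Rule 3, refined with minTimestamp: an SSTable outside this compaction
    # blocks the purge only if it contains the partition AND may hold data
    # that is not strictly newer than the tombstone.
    for sstable in not_compacting:
        if partition_key not in sstable.partition_keys:
            continue
        if sstable.min_timestamp <= tombstone_timestamp:
            return False
    return True


# S2 also contains partition P, but all of its data is newer than t = 100.
s2 = SSTable(name="S2", partition_keys={"P"}, min_timestamp=200)
purgeable = can_purge_tombstone("P", 100, 0, 1000, 10, [s2])
```

With these made-up values, the tombstone is purgeable even though P is
present in S2, because every write in S2 is newer than the tombstone and so
cannot be shadowed by it.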


On Tue., 6 Nov. 2018, 19:40 DuyHai Doan <doanduy...@gmail.com> wrote:

> Hello all
>
> I have tried to sum up all rules related to tombstone removal:
>
>
> ----------------------------------------------------------------------------------
>
> Given a tombstone written at timestamp (t) for a partition key (P) in
> SSTable (S1), this tombstone will be removed only when all of the
> following hold:
>
> 1) after gc_grace_seconds period has passed
> 2) at the next compaction round, if SSTable S1 is selected (not at all
> guaranteed because compaction is not deterministic)
> 3) if the partition key (P) is not present in any other SSTable that is
> NOT picked by the current round of compaction
>
> Rule 3) is quite complex to understand so here is the detailed explanation:
>
> If partition key (P) also exists in another SSTable (S2) that is NOT
> compacted together with SSTable (S1) and we remove the tombstone, some
> data in S2 may be resurrected.
>
> More precisely, at compaction time Cassandra has no details about the
> data for partition (P) that remains in S2, so it cannot remove the
> tombstone right away.
>
> Now, for each SSTable, we have some metadata, namely minTimestamp and
> maxTimestamp.
>
> I wonder whether the current compaction optimization leverages this
> metadata for tombstone removal. Indeed, if we know that the tombstone
> timestamp (t) < minTimestamp of S2, the tombstone can be safely removed.
>
> Does anyone have this information?
>
> Regards
>
>
>
