Thanks for the confirmation, Kurt.

On 6 Nov 2018 at 11:59, "kurt greaves" <k...@instaclustr.com> wrote:
> Yes it does. Consider if it didn't and you kept writing to the same
> partition: you'd never be able to remove any tombstones for that partition.
>
> On Tue., 6 Nov. 2018, 19:40 DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> Hello all
>>
>> I have tried to sum up all the rules related to tombstone removal:
>>
>> ----------------------------------------------------------------------------------
>>
>> Given a tombstone written at timestamp (t) for a partition key (P) in
>> SSTable (S1), this tombstone will be removed:
>>
>> 1) after the gc_grace_seconds period has passed
>> 2) at the next compaction round, if SSTable S1 is selected (not at all
>> guaranteed because compaction is not deterministic)
>> 3) if the partition key (P) is not present in any other SSTable that is
>> NOT picked by the current round of compaction
>>
>> Rule 3) is quite complex to understand, so here is the detailed
>> explanation:
>>
>> If partition key (P) also exists in another SSTable (S2) that is NOT
>> compacted together with SSTable (S1), then removing the tombstone may
>> resurrect some data in S2.
>>
>> Precisely, at compaction time, Cassandra does not have ANY detail about
>> the part of partition (P) that stays in S2, so it cannot remove the
>> tombstone right away.
>>
>> Now, for each SSTable, we have some metadata, namely minTimestamp and
>> maxTimestamp.
>>
>> I wonder whether the current compaction optimization leverages this
>> metadata for tombstone removal. Indeed, if we know that the tombstone
>> timestamp (t) < minTimestamp of S2, the tombstone can be safely removed.
>>
>> Does anyone have the info?
>>
>> Regards
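
For illustration, the rules quoted above, together with the minTimestamp check Kurt confirmed, can be sketched in Python. This is only a simplified model under stated assumptions, not Cassandra's actual code: the SSTable and Tombstone classes, their fields, and the can_purge function are hypothetical names, and a plain set of keys stands in for the per-SSTable partition lookup.

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    """Hypothetical stand-in for an SSTable and its metadata."""
    name: str
    min_timestamp: int  # oldest write timestamp stored in this SSTable
    partition_keys: set = field(default_factory=set)  # stands in for a key lookup


@dataclass
class Tombstone:
    """Hypothetical tombstone found in one of the SSTables being compacted."""
    partition_key: str
    timestamp: int      # write timestamp (t) of the deletion
    deletion_time: int  # wall-clock time at which the delete was issued


def can_purge(tombstone, compacting, all_sstables, now, gc_grace_seconds):
    """Return True if this simplified model would drop the tombstone.

    Rule 2 is implicit: the function is only called for tombstones that
    live in an SSTable selected for the current compaction.
    """
    # Rule 1: the gc_grace_seconds period must have passed.
    if tombstone.deletion_time + gc_grace_seconds > now:
        return False

    compacting_names = {s.name for s in compacting}
    for sstable in all_sstables:
        if sstable.name in compacting_names:
            continue  # its data is merged with the tombstone in this round
        if tombstone.partition_key not in sstable.partition_keys:
            continue  # Rule 3: partition P is absent from this SSTable
        # P exists outside the compaction. The tombstone may still go if it
        # is older than everything in that SSTable (t < minTimestamp),
        # because it cannot shadow any of that data.
        if tombstone.timestamp >= sstable.min_timestamp:
            return False
    return True
```

With S1 being compacted alone and S2 also holding P, a tombstone at t = 80 is kept when S2 has minTimestamp = 50 and dropped when S2 has minTimestamp = 200. That matches Kurt's point: without the timestamp check, a partition that is written to continuously would never lose its tombstones.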