The partition key is made up of a datetime (basically the date truncated to
the hour) and a bucket number. I think your RCA may be correct: since we
delete the partition's rows one by one rather than in a batch, the SSTable
files may be overlapping for that particular partition. A scheduled thread
picks the rows for a partition based on the current datetime and bucket
number and checks whether each row's entry is past due; if so, we trigger
an event and remove the entry.
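
For illustration, here is a minimal CQL sketch of the pattern we have today
versus a partition-level delete (the table and column names here are
assumptions for the example, not our actual schema):

    -- Hypothetical schema matching the partition key described above:
    -- (hour, bucket) partition key, one clustering row per event.
    CREATE TABLE events (
        hour     timestamp,  -- datetime truncated to the hour
        bucket   int,        -- spreads each hour across several partitions
        event_id timeuuid,   -- clustering column, one row per event
        payload  text,
        PRIMARY KEY ((hour, bucket), event_id)
    );

    -- Current pattern: one row-level tombstone per processed event.
    DELETE FROM events WHERE hour = ? AND bucket = ? AND event_id = ?;

    -- Alternative once the whole batch is processed: a single
    -- partition-level tombstone covering every row in the partition.
    DELETE FROM events WHERE hour = ? AND bucket = ?;

If the scheduler issued one delete for the whole (hour, bucket) partition
after the batch completes, we would write a single partition tombstone
instead of ~100k row tombstones, which should also sidestep the overlap
problem described below.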



On Tue 19 Jun, 2018, 7:58 PM Jeff Jirsa, <jji...@gmail.com> wrote:

> The most likely explanation is tombstones in files that won’t be collected
> as they potentially overlap data in other files with a lower timestamp
> (especially true if your partition key doesn’t change and you’re writing
> and deleting data within a partition)
>
> --
> Jeff Jirsa
>
>
> > On Jun 19, 2018, at 3:28 AM, Abhishek Singh <abh23...@gmail.com> wrote:
> >
> > Hi all,
> >            We are using Cassandra to store time-series events for batch
> processing. Once a particular hour-based batch is processed we delete the
> entries, but almost 18% of the deletes remained behind as tombstones.
> >                  I ran compaction on the particular CF, but the tombstone
> count didn't come down.
> >             Can anyone suggest the optimal tuning/recommended practice
> for the compaction strategy and GC grace period, given roughly 100k entries
> written and deleted every hour?
> >
> > Warm Regards
> > Abhishek Singh
>
