When the data expires (after the TTL of 7 days), the next compaction turns
it into tombstones, and those tombstones stay on disk for
gc_grace_seconds. After that, the tombstones are completely removed at the
next compaction, if there is one ...

So, doing the maths: assuming you have left gc_grace_seconds at its default
value of 10 days, you will have tombstones for 10 days' worth of data
before they are eventually removed...
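
If you are not sure which values the table is really using, a query along
these lines should show them. This is only a sketch: it assumes Cassandra
3.0 or later (where the schema lives in system_schema.tables), and
my_keyspace is a placeholder because the original post does not name the
keyspace.

```cql
-- my_keyspace is a hypothetical keyspace name; replace it with your own.
-- gc_grace_seconds = 864000 means the 10-day default is still in place.
-- default_time_to_live = 0 means no table-level TTL is set.
SELECT gc_grace_seconds, default_time_to_live
FROM system_schema.tables
WHERE keyspace_name = 'my_keyspace' AND table_name = 'metrics';
```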

What is your compaction strategy? I strongly suggest:

1) Setting the TTL directly as a table property (ALTER TABLE) instead of
setting it at query level (INSERT INTO ... USING TTL). When the TTL is set
at table level, Cassandra can perform some optimizations and drop some
SSTables entirely without even bothering to compact them

2) Using TimeWindowCompactionStrategy and tuning it properly to accommodate
your workload
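
As a rough sketch of both suggestions against the metrics table from your
message: 604800 is simply 7 days expressed in seconds, and the one-day
compaction window is my assumption for a 7-day retention, not a tuned
value, so adjust it to your workload. TimeWindowCompactionStrategy also
needs a Cassandra version that ships it (3.0.8 / 3.8 or later).

```cql
-- 1) Table-level TTL: 7 days * 86400 seconds = 604800 seconds.
--    This applies to new writes only; existing rows keep their own TTL.
ALTER TABLE metrics WITH default_time_to_live = 604800;

-- 2) Time-window compaction with one-day windows (assumed starting point).
ALTER TABLE metrics WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
};
```

A query-level USING TTL still overrides the table default for that write,
so the INSERT statements should drop it if the table property is meant to
be the single source of truth.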



On Sat, Jan 28, 2017 at 5:30 PM, John Sanda <john.sa...@gmail.com> wrote:

> I have a time series data model that is basically:
>
> CREATE TABLE metrics (
>     id text,
>     time timeuuid,
>     value double,
>     PRIMARY KEY (id, time)
> ) WITH CLUSTERING ORDER BY (time DESC);
>
> I do append-only writes, no deletes, and use a TTL of seven days. Data
> points are written every second. The UI queries data for the past hour,
> two hours, day, or week. The UI refreshes and executes queries every 30
> seconds. In one test environment I am seeing lots of tombstone threshold
> warnings and Cassandra has even OOME'd. Since I am storing data in
> descending order and always query for recent data, I do not understand why
> I am running into this problem.
>
> I know that it is recommended to do some date partitioning in part to
> ensure partitions do not grow too large. I already have some changes in
> place to partition by day. Before I make those changes I want to
> understand why I am scanning so many tombstones so that I can be more
> confident that the date partitioning changes will help.
>
> Thanks
>
> - John
>
