Thanks for a very helpful reply.
Will try to refactor the code accordingly.

On Tue, Jan 16, 2018 at 4:36 PM, Alexander Dejanovski <
a...@thelastpickle.com> wrote:

> I would not plan on deleting data at the row level as you'll end up with a
> lot of tombstones eventually (and you won't even notice them).
> It's not healthy to allow that many tombstones to be read, and while your
> latency may fit your SLA now, it may not in the future.
> Tombstones are going to create a lot of heap pressure and eventually
> trigger long GC pauses, which then tend to affect the whole cluster (a slow
> node is worse than a down node).
>
> You should definitely separate TTLed data and non-TTLed data into
> different tables so that you can adjust compaction strategies,
> gc_grace_seconds and read patterns accordingly. I understand that it
> will complicate your code, but it will prevent severe performance
> issues in Cassandra.
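>
> As a minimal sketch of such a split (table names, columns and option
> values are illustrative only, not tuned recommendations):
>
> cqlsh> create table objects_permanent (
>            x_y text, id timeuuid, data text,
>            primary key (x_y, id));
> cqlsh> create table objects_temporary (
>            x_y text, id timeuuid, data text,
>            primary key (x_y, id))
>        with compaction = {'class': 'TimeWindowCompactionStrategy',
>                           'compaction_window_unit': 'HOURS',
>                           'compaction_window_size': '1'}
>        and default_time_to_live = 36000 -- ~10 hours
>        and gc_grace_seconds = 10800;
>
> That way the TTL-only table can use TWCS and a lower gc_grace_seconds
> while the permanent table keeps its current settings.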
>
> Tombstones won't be a problem for repair: they get repaired just like
> regular cells. They mostly hurt the read path, and they use space on
> disk.
>
> On Tue, Jan 16, 2018 at 2:12 PM Python_Max <python....@gmail.com> wrote:
>
>> Hello.
>>
>> I was planning to remove a row (not a whole partition).
>>
>> Most of the tombstones show up in a geographic-grid use case with X:Y
>> as the partition key and an object id (timeuuid) as the clustering
>> key, where objects can either be temporary with a TTL of about 10
>> hours or fully persistent.
>> When I select all objects in a specific X:Y I can even hit the 100k
>> (default) limit for some X:Y. I have raised this limit to 500k since
>> the 99.9p read latency is < 75ms, so I should not (?) need to care how
>> many tombstones are read while the latency is fine.
>>
>> Splitting entities into temporary and permanent tables and using
>> different compaction strategies is an option, but it will lead to code
>> duplication and twice the read queries.
>>
>> Is my assumption correct that tombstones are not such a big problem as
>> long as read latency and disk usage are okay? Do tombstones affect
>> repair time (using Reaper)?
>>
>> Thanks.
>>
>>
>> On Tue, Jan 16, 2018 at 11:32 AM, Alexander Dejanovski <
>> a...@thelastpickle.com> wrote:
>>
>>> Hi,
>>>
>>> Could you be more specific about the deletes you're planning to
>>> perform?
>>> This will end up moving your problem somewhere else, as you'll be
>>> generating new tombstones (and if you're planning on deleting rows,
>>> be aware that row-level tombstones aren't reported anywhere in the
>>> metrics, logs or query traces).
>>> Currently you can delete your data at the partition level, which will
>>> create a single tombstone that shadows all your expired (and
>>> non-expired) data and is very efficient. The read path is optimized
>>> for such tombstones, and the data won't be fully read from disk nor
>>> exchanged between replicas. But that's of course only if your use
>>> case allows you to delete full partitions.
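>>>
>>> For instance, with the items table from your example below, a single
>>> statement drops the whole partition with one tombstone:
>>>
>>> cqlsh> delete from items where a = 'AAA';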
>>>
>>> We usually model so that we can restrict our reads to live data.
>>> If you're creating time series, your clustering key should include a
>>> timestamp, which you can use to avoid reading expired data. If your TTL is
>>> set to 60 days, you can read only data that is strictly younger than that.
>>> Then you can partition by time ranges, and access exclusively partitions
>>> that have no chance to be expired yet.
>>> Those techniques usually work better with TWCS, but the former could
>>> make you hit a lot of SSTables if your partitions can spread over all time
>>> buckets, so only use TWCS if you can restrict individual reads to up to 4
>>> time windows.
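>>>
>>> A sketch of that kind of model (names, the daily bucket size and the
>>> 60-day TTL are illustrative):
>>>
>>> cqlsh> create table events_by_day (
>>>            day text, ts timeuuid, payload text,
>>>            primary key (day, ts))
>>>        with compaction = {'class': 'TimeWindowCompactionStrategy',
>>>                           'compaction_window_unit': 'DAYS',
>>>                           'compaction_window_size': '1'}
>>>        and default_time_to_live = 5184000; -- 60 days
>>> cqlsh> select * from events_by_day
>>>        where day = '2018-01-16' and ts > maxTimeuuid('2017-11-17');
>>>
>>> The partition key restricts reads to recent buckets, and the
>>> clustering restriction skips anything old enough to have expired.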
>>>
>>> Cheers,
>>>
>>>
>>> On Tue, Jan 16, 2018 at 10:01 AM Python_Max <python....@gmail.com>
>>> wrote:
>>>
>>>> Hi.
>>>>
>>>> Thank you very much for detailed explanation.
>>>> Seems that there is nothing I can do about it except deleting
>>>> records by key instead of letting them expire.
>>>>
>>>>
>>>> On Fri, Jan 12, 2018 at 7:30 PM, Alexander Dejanovski <
>>>> a...@thelastpickle.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> As DuyHai said, different TTLs could theoretically be set for
>>>>> different cells of the same row. And one TTLed cell could be shadowing
>>>>> another cell that has no TTL (say you forgot to set a TTL and set one
>>>>> afterwards by performing an update), or vice versa.
>>>>> One cell could also be missing from a node without Cassandra
>>>>> knowing. So turning an incomplete row that only has expired cells
>>>>> into a tombstone row could lead to wrong results being returned at
>>>>> read time: the tombstone row could potentially shadow a valid live
>>>>> cell from another replica.
>>>>>
>>>>> Cassandra needs to retain each TTLed cell and send it to replicas
>>>>> during reads to cover all possible cases.
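>>>>>
>>>>> To make the shadowing case concrete (reusing the items table from
>>>>> the original example):
>>>>>
>>>>> cqlsh> insert into items(a, b, c1) values ('AAA', 'BBB', 'old');
>>>>> cqlsh> update items using ttl 60 set c1 = 'new'
>>>>>        where a = 'AAA' and b = 'BBB';
>>>>>
>>>>> Once the 60 seconds are up, the expired cell must still be kept as
>>>>> a tombstone: a replica that missed the update would otherwise
>>>>> resurrect the non-TTLed 'old' value.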
>>>>>
>>>>>
>>>>> On Fri, Jan 12, 2018 at 5:28 PM Python_Max <python....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thank you for the response.
>>>>>>
>>>>>> I know about the option of setting a TTL per column or even per
>>>>>> item in a collection. However, in my example the entire row has
>>>>>> expired; shouldn't Cassandra be able to detect this situation and
>>>>>> spawn a single tombstone for the entire row instead of many?
>>>>>> Is there any reason not to do this, other than that no one needs
>>>>>> it? Is this suitable for a feature request or improvement?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> On Wed, Jan 10, 2018 at 4:52 PM, DuyHai Doan <doanduy...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> "The question is why Cassandra creates a tombstone for every column
>>>>>>> instead of single tombstone per row?"
>>>>>>>
>>>>>>> --> Simply because technically it is possible to set a different
>>>>>>> TTL value on each column of a CQL row.
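>>>>>>>
>>>>>>> For example, with the items table below:
>>>>>>>
>>>>>>> cqlsh> insert into items(a, b, c1) values ('AAA', 'BBB', 'C111')
>>>>>>>        using ttl 60;
>>>>>>> cqlsh> update items using ttl 3600 set c2 = 'C222'
>>>>>>>        where a = 'AAA' and b = 'BBB';
>>>>>>>
>>>>>>> Here c1 expires after a minute while c2 lives for an hour, so
>>>>>>> expiry has to be tracked (and tombstoned) per cell rather than
>>>>>>> per row.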
>>>>>>>
>>>>>>> On Wed, Jan 10, 2018 at 2:59 PM, Python_Max <python....@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello, C* users and experts.
>>>>>>>>
>>>>>>>> I have (one more) question about tombstones.
>>>>>>>>
>>>>>>>> Consider the following example:
>>>>>>>> cqlsh> create keyspace test_ttl with replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}; use test_ttl;
>>>>>>>> cqlsh> create table items(a text, b text, c1 text, c2 text, c3 text, primary key (a, b));
>>>>>>>> cqlsh> insert into items(a,b,c1,c2,c3) values('AAA', 'BBB', 'C111', 'C222', 'C333') using ttl 60;
>>>>>>>> bash$ nodetool flush
>>>>>>>> bash$ sleep 60
>>>>>>>> bash$ nodetool compact test_ttl items
>>>>>>>> bash$ sstabledump mc-2-big-Data.db
>>>>>>>>
>>>>>>>> [
>>>>>>>>   {
>>>>>>>>     "partition" : {
>>>>>>>>       "key" : [ "AAA" ],
>>>>>>>>       "position" : 0
>>>>>>>>     },
>>>>>>>>     "rows" : [
>>>>>>>>       {
>>>>>>>>         "type" : "row",
>>>>>>>>         "position" : 58,
>>>>>>>>         "clustering" : [ "BBB" ],
>>>>>>>>         "liveness_info" : { "tstamp" : "2018-01-10T13:29:25.777Z",
>>>>>>>> "ttl" : 60, "expires_at" : "2018-01-10T13:30:25Z", "expired" : true },
>>>>>>>>         "cells" : [
>>>>>>>>           { "name" : "c1", "deletion_info" : { "local_delete_time"
>>>>>>>> : "2018-01-10T13:29:25Z" }
>>>>>>>>           },
>>>>>>>>           { "name" : "c2", "deletion_info" : { "local_delete_time"
>>>>>>>> : "2018-01-10T13:29:25Z" }
>>>>>>>>           },
>>>>>>>>           { "name" : "c3", "deletion_info" : { "local_delete_time"
>>>>>>>> : "2018-01-10T13:29:25Z" }
>>>>>>>>           }
>>>>>>>>         ]
>>>>>>>>       }
>>>>>>>>     ]
>>>>>>>>   }
>>>>>>>> ]
>>>>>>>>
>>>>>>>> The question is why Cassandra creates a tombstone for every
>>>>>>>> column instead of a single tombstone per row?
>>>>>>>>
>>>>>>>> In my production environment I have a table with ~30 columns,
>>>>>>>> and it gives me a warning for 30k tombstones and 300 live rows.
>>>>>>>> That is 30 times more than it could be.
>>>>>>>> Can this behavior be tuned in some way?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best regards,
>>>>>>>> Python_Max.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Python_Max.
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> -----------------
>>>>> Alexander Dejanovski
>>>>> France
>>>>> @alexanderdeja
>>>>>
>>>>> Consultant
>>>>> Apache Cassandra Consulting
>>>>> http://www.thelastpickle.com
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Python_Max.
>>>>
>>>
>>>
>>> --
>>> -----------------
>>> Alexander Dejanovski
>>> France
>>> @alexanderdeja
>>>
>>> Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>
>>
>>
>> --
>> Best regards,
>> Python_Max.
>>
>
>
> --
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>



-- 
Best regards,
Python_Max.
