Thank you guys for the answers - I expected this but wanted to verify (who knows how smart Cassandra can be in the background! :-) )

@Jeff: unfortunately the records we will pick up for deletion are not necessarily "neighbours" in terms of creation time, so forming contiguous ranges is not possible...

Just one more question left in this case...
Since this approach will generate lots of row tombstones over this "wide" table, what would be your recommended table setup here (in terms of gc_grace_seconds, compaction, compression, etc.)? Currently we have the default setup for everything, which I believe should be fine-tuned a bit better...
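To be explicit, these are the kinds of options I mean - just a sketch showing the current defaults, with a made-up table name:

-- illustrative only: table name is made up, values are the stock defaults
ALTER TABLE customer_records
  WITH gc_grace_seconds = 864000
   AND compaction  = {'class': 'SizeTieredCompactionStrategy'}
   AND compression = {'class': 'LZ4Compressor'};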

FYI: this table gets ~500k new UUID-keyed rows every day in each partition...

thanks a lot!

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 04.09.2020 at 16:33, Jeff Jirsa wrote:

As someone else pointed out, it's the same number of tombstones. Doing distinct queries gives you a bit more flexibility to retry if one fails, but multiple in one command avoids some contention on the memtable partition objects.

If you happen to be using type 1 UUIDs (timeuuid) AND you're deleting contiguous ranges, you could do a DELETE ... WHERE uuid >= ? AND uuid <= ?

This would trade lots of tombstones for a single range tombstone, but may not match your model.
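For illustration (names here are hypothetical, assuming the uuid is a timeuuid clustering column), that kind of range delete would be something like:

-- hypothetical schema: PRIMARY KEY ((customer_id), event_id), event_id is a timeuuid
DELETE FROM events
 WHERE customer_id = ?
   AND event_id >= minTimeuuid('2020-09-01 00:00:00')
   AND event_id <  maxTimeuuid('2020-09-02 00:00:00');

That writes one range tombstone per partition instead of one tombstone per deleted row.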



On Sep 3, 2020, at 11:57 PM, Attila Wind <attilaw@swf.technology> wrote:



Hi C* gurus,

I'm looking for the best strategy to delete records from a "wide" table.
"wide" means the table stores records which have a UUID-style id as an element of the key - within each partition

So yes, it's not the partitioning key... The partitioning key is actually kind of a customerId at the moment, and actually I'm not even sure this is the right model for this table... Given that the number of customerIds <<< the number of UUIDs, probably not. But let's exclude this for a moment and come back to my main question!
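To make the model concrete, a rough sketch of what I mean (names are made up, not our real schema):

CREATE TABLE customer_records (
    customer_id text,
    record_id   uuid,   -- the UUID-style id element mentioned above
    payload     text,
    PRIMARY KEY ((customer_id), record_id)
);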

So the question:
when I delete records from this table, given that I can and will delete in "batch fashion" (imagine kind of a scheduled job which collects - let's say - 1000 records) every time I do deletes...

Would there be a difference (in terms of generated tombstones) if I were to

a) issue deletes one-by-one, like
DELETE FROM ... WHERE ... uuid = 'a'
DELETE FROM ... WHERE ... uuid = 'b'
...
DELETE FROM ... WHERE ... uuid = 'z'

or

b) issue the deletes in a grouped fashion, like
DELETE FROM ... WHERE ... uuid in ('a', 'b', ... 'z')

?

or is there any other way to efficiently delete that I'm missing here?
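For concreteness, against a hypothetical table like the sketch above, the two variants would look roughly like this (names are made up):

-- a) one statement per row
DELETE FROM customer_records WHERE customer_id = ? AND record_id = ?;

-- b) one statement with an IN list on the clustering column
DELETE FROM customer_records WHERE customer_id = ? AND record_id IN (?, ?, ?);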

thanks!

--
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

