Hi experts, 

I have a design question: 
We want to store a large amount of data (> 30 million rows containing blobs), 
which will be deleted on a monthly basis depending on the type of data 
(not in the same order in which the data entered the system). 
Some data would survive for only two months, other data for 3-5 years. 

The choice now is between two approaches. The first is a single table with 
TTL per partition and partitions per deletion month (the month in which the 
data should be deleted), which would allow a single delete command followed 
by a compaction. The second is multiple tables, one per deletion month, so 
that the deletion process would simply drop the table. 
Data is retrieved per record, and we know both the retention period and the 
id (uuid) of the addressed record, so the multiple-table approach can be 
handled by the application. 
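
To illustrate how the application could route a record to its table in the 
multiple-table approach, here is a minimal sketch. It assumes that the creation 
date of a record is known in addition to its retention period; the table prefix 
`data_`, the column name `id` and all function names are hypothetical and only 
serve the illustration. The functions only build CQL strings, they do not talk 
to a cluster. 

```python
from datetime import date

TABLE_PREFIX = "data_"  # hypothetical naming scheme, not an existing schema


def deletion_month(created: date, retention_months: int) -> date:
    """Return the first day of the month in which the record is due for deletion."""
    # Work in "months since year 0" so that divmod handles the year rollover.
    total = created.year * 12 + (created.month - 1) + retention_months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, 1)


def table_for(created: date, retention_months: int) -> str:
    """Name of the per-deletion-month table that holds the record."""
    due = deletion_month(created, retention_months)
    return f"{TABLE_PREFIX}{due.year:04d}_{due.month:02d}"


def select_cql(created: date, retention_months: int) -> str:
    """CQL to read one record by id; the uuid is bound as a parameter by the caller."""
    return f"SELECT * FROM {table_for(created, retention_months)} WHERE id = %s"


def drop_cql(due_month: date) -> str:
    """CQL for the monthly cleanup job: drop the table whose deletion month has arrived."""
    return f"DROP TABLE IF EXISTS {TABLE_PREFIX}{due_month.year:04d}_{due_month.month:02d}"


if __name__ == "__main__":
    created = date(2017, 11, 15)
    print(table_for(created, 2))    # record kept for two months
    print(table_for(created, 60))   # record kept for five years
    print(select_cql(created, 2))
    print(drop_cql(date(2018, 1, 1)))
```

With this scheme the cleanup job never issues row-level deletes; it only needs 
the current month to know which table to drop. 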

Since it would be one table per deletion month, I do not expect more than 
1000-2000 tables, depending on the 
retention period of the data. 

The benefit of creating multiple tables would be that there are no tombstones, 
while the drawback is that more tables take more memory on the nodes. 
The one-table approach would make the compaction process take longer and 
produce more I/O activity, because the compaction would rewrite multiple 
SSTables internally. 

Any thoughts on this? 
We want to use 9 nodes running Cassandra 3.11 on Linux; the total amount of 
data is expected to be ~15-20 TB. 
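
For a rough feeling of the sizes involved, here is a back-of-the-envelope 
sketch. The replication factor of 3 is purely an assumption for illustration 
(it is not decided above), as is reading the 15-20 TB as the volume before 
replication and 30 million as the row count; the function names are made up. 

```python
def per_node_tb(total_tb: float, nodes: int, replication_factor: int = 1) -> float:
    """Average data volume per node in TB, assuming an even distribution."""
    return total_tb * replication_factor / nodes


def avg_row_kb(total_tb: float, rows: float) -> float:
    """Average row size in KB (decimal units: 1 TB = 1e12 bytes, 1 KB = 1e3 bytes)."""
    return total_tb * 1e12 / rows / 1e3


if __name__ == "__main__":
    NODES = 9
    ROWS = 30e6   # lower bound on the row count mentioned above
    ASSUMED_RF = 3  # assumption, not a decided value

    for total in (15, 20):
        print(
            f"{total} TB: "
            f"{per_node_tb(total, NODES):.2f} TB/node without replication, "
            f"{per_node_tb(total, NODES, ASSUMED_RF):.2f} TB/node with RF={ASSUMED_RF}, "
            f"about {avg_row_kb(total, ROWS):.0f} KB per row on average"
        )
```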

Thank you very much, 

Marcus Haarmann 
