If it were me, I would go with one table. Managing many tables is a lot of
work, and I would also worry about how reliable the switch-over between tables is.
Regarding tombstones, there are a few ways to fight them:
- Keep partitions at a reasonable size (a big partition with many tombstones
  will be a problem).
- Avoid reading tombstones as much as possible. In the application code, put
  "timestamp > expiry time" in the WHERE condition so the query does not touch
  tombstones. On the table side, make the timestamp column a clustering key in
  descending order for better performance.
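
A minimal sketch of both ideas, in Python, that only builds CQL text and bind
values and does not talk to a cluster. Everything in it is an assumption for
illustration: the table blobs_by_month, its columns, the monthly bucket used as
the partition key, and one retention period per query (rows written with
USING TTL equal to that retention). It shows a clustering column in DESC order
plus a "ts > cutoff" bound, and a read plan that skips buckets older than the
cutoff month.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

# Hypothetical schema: names are illustrative, not from this thread.
# A monthly bucket in the partition key keeps partitions bounded in size;
# ts is a clustering column in DESC order so the newest rows come first.
SCHEMA = """
CREATE TABLE IF NOT EXISTS blobs_by_month (
    bucket text,
    ts     timestamp,
    id     uuid,
    data   blob,
    PRIMARY KEY ((bucket), ts, id)
) WITH CLUSTERING ORDER BY (ts DESC, id ASC)
"""

# Rows are written with a TTL equal to the retention period (bound as ?).
INSERT = (
    "INSERT INTO blobs_by_month (bucket, ts, id, data) "
    "VALUES (?, ?, ?, ?) USING TTL ?"
)

# The ts > ? bound keeps the read slice away from rows older than the
# retention cutoff, i.e. the rows that have expired into tombstones.
QUERY = "SELECT id, ts, data FROM blobs_by_month WHERE bucket = ? AND ts > ?"


def bucket_for(ts: datetime) -> str:
    """Monthly bucket name such as '2018-02', used as the partition key."""
    return f"{ts.year:04d}-{ts.month:02d}"


def live_buckets(now: datetime, retention: timedelta) -> List[str]:
    """Buckets that may still hold unexpired rows, newest first."""
    cutoff = now - retention
    year, month = now.year, now.month
    buckets = []
    while (year, month) >= (cutoff.year, cutoff.month):
        buckets.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return buckets


def read_plan(now: datetime, retention: timedelta) -> List[Tuple[str, tuple]]:
    """(statement, bind values) pairs: one bounded query per live bucket."""
    cutoff = now - retention
    return [(QUERY, (b, cutoff)) for b in live_buckets(now, retention)]


if __name__ == "__main__":
    now = datetime(2018, 2, 1, tzinfo=timezone.utc)
    for stmt, params in read_plan(now, timedelta(days=60)):
        print(stmt, params)
```

With a 60-day retention on 2018-02-01, the plan queries only the 2018-02,
2018-01 and 2017-12 buckets, each bounded by the same cutoff, so fully expired
buckets are never read at all.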

James

On Thu, Feb 1, 2018 at 3:16 AM, Marcus Haarmann <marcus.haarm...@midoco.de>
wrote:

> Hi experts,
>
> I have a design issue here:
> We want to store large amounts of data (more than 30 million rows containing
> blobs), which will be deleted on a monthly basis depending on the type of
> data (not in the same order in which the data entered the system).
> Some data would survive for only two months, other data for 3-5 years.
>
> The choice now is either to have only one table, with a TTL per partition and
> partitions per deletion month (the month in which the data should be deleted),
> which allows a single delete command followed by a compaction, or to have
> multiple tables (one per deletion month, where the deletion process would just
> drop the table).
> Data is retrieved per record, and we know both the retention period and the
> id (uuid) of the addressed record, so multiple tables can be handled.
>
> Since it would be one table per deletion month, I do not expect more than
> 1000-2000 tables, depending on the
> retention period of the data.
>
> The benefit of creating multiple tables would be that there are no
> tombstones, while more tables take more memory on the nodes.
> The one-table approach would make compaction take longer and produce more
> I/O, because compaction would rewrite multiple SSTables internally.
>
> Any thoughts on this?
> We plan to use 9 nodes running Cassandra 3.11 on Linux, with a total
> expected data volume of ~15-20 TB.
>
> Thank you very much,
>
> Marcus Haarmann
>
