On Sun, Mar 13, 2011 at 7:10 PM, Karl Hiramoto <k...@hiramoto.org> wrote:
>
> Hi,
>
> I'm looking for advice on reducing disk usage. I've run out of disk space
> two days in a row while running a nightly scheduled nodetool repair &&
> nodetool compact cronjob.
>
> I have 6 nodes RF=3  each with 300 GB drives at a hosting company.   
> GCGraceSeconds = 260000 (3.1 days)
>
> Every column in the database has a TTL of 86400 (24 hours)   to handle 
> deletion of stale data.   50% of the time the data is only written once, read 
> 0 or many times then expires. The other 50% of the time it's written multiple 
> times, resetting the TTL to 24 hours each time.

As it turns out, the compaction algorithm is pretty much the worst
possible for this use case. Because we compact files that have a
similar size, the older a column gets, the less often it is compacted.
If you always set a fixed TTL for all columns, you would want to do
some compaction of recent sstables, for the sake of not having too many
sstables, but you would also want to compact old sstables whose
contents are guaranteed to just go away. And for those, it's actually
fine to compact them alone (only for the sake of purging).
But the way compaction works, you will end up with big sstables of
stuff that is expired, and you may not even be able to compact them,
simply because compaction "thinks" it doesn't have enough room.
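
To make the point above concrete, here is a rough Python sketch. It is
a simplified model, not Cassandra's actual code: the SSTable class, its
max_expiry field, the 0.5x/1.5x bucket bounds and the threshold of 4
are all assumptions made for illustration. It groups sstables of
similar size into buckets, and separately picks out sstables whose
contents are all expired and past GCGraceSeconds.

```python
from dataclasses import dataclass

MB = 1024 ** 2
GB = 1024 ** 3

@dataclass
class SSTable:
    name: str
    size_bytes: int
    max_expiry: int  # hypothetical field: latest (write time + TTL) in the file

def size_buckets(sstables, low=0.5, high=1.5):
    """Group sstables whose size is close to the running bucket average."""
    buckets = []
    for t in sorted(sstables, key=lambda s: s.size_bytes):
        for b in buckets:
            avg = sum(x.size_bytes for x in b) / len(b)
            if low * avg <= t.size_bytes <= high * avg:
                b.append(t)
                break
        else:
            buckets.append([t])  # no similar-sized bucket: start a new one
    return buckets

def compactable(buckets, min_threshold=4):
    """Only buckets with enough similar-sized files get compacted."""
    return [b for b in buckets if len(b) >= min_threshold]

def purgeable_alone(sstables, now, gc_grace):
    """Sstables in which every column is expired and past the grace period."""
    return [t for t in sstables if t.max_expiry + gc_grace <= now]

DAY = 86400
now = 10 * DAY
tables = [SSTable(f"recent-{i}", 10 * MB, now + DAY) for i in range(4)]
tables.append(SSTable("old-big", 40 * GB, 2 * DAY))

buckets = size_buckets(tables)
picked = compactable(buckets)
dead = purgeable_alone(tables, now, gc_grace=260000)
```

In this toy setup only the four small recent files form a compactable
bucket. The big old file sits alone in its bucket and is never picked,
even though everything in it could be dropped by compacting it alone.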

But I do think that your use case (having a CF where all columns have
the same TTL and you only rely on it for deletion) is a very useful
one, and we should handle it better. In particular, CASSANDRA-1610
could be an easy way to get this.

CASSANDRA-1537 is probably also a partial but possibly sufficient
solution. It is also probably easier than CASSANDRA-1610, and I'll try
to give it a shot asap; it has been on my todo list for way too long.

> One question,  since I use a TTL is it safe to set GCGraceSeconds  to 0?   I 
> don't manually delete ever, I just rely on the TTL for deletion, so are 
> forgotten deletes an issue?

The rule is this. Say you think that m is a reasonable value for
GCGraceSeconds. That is, you make sure that you'll always bring failed
nodes back up and run repair within m seconds. Then, if you always use
a TTL of n (in your case 24 hours), the actual GCGraceSeconds that you
should set is m - n.

So setting a GCGrace of 0 in your case would be roughly equivalent to
setting a GCGrace of 24h on a "normal" CF. That's probably a bit low.
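
As a quick worked example of the m - n rule, here is a small Python
sketch. The function names are made up for illustration, and clamping
the result at 0 is my own assumption for the case where the TTL is
longer than the repair window.

```python
def gc_grace_for_ttl_cf(repair_window_s: int, ttl_s: int) -> int:
    """GCGraceSeconds to set when every column has the same TTL (m - n)."""
    return max(0, repair_window_s - ttl_s)  # clamp at 0: an assumption

def equivalent_normal_gc_grace(configured_gc_grace_s: int, ttl_s: int) -> int:
    """What a configured GCGraceSeconds amounts to on a CF without TTLs."""
    return configured_gc_grace_s + ttl_s

TTL = 86400  # 24 hours, as in the original question

# If the current 260000 s is the real repair window m:
suggested = gc_grace_for_ttl_cf(260000, TTL)    # 173600 s, about 2 days
# Setting GCGraceSeconds to 0 with this TTL:
effective = equivalent_normal_gc_grace(0, TTL)  # 86400 s, i.e. 24 hours
```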

--
Sylvain


>
>
>
> cfstats:
>  Read Count: 32052
>         Read Latency: 3.1280378135529765 ms.
>         Write Count: 9704525
>         Write Latency: 0.009527474760485443 ms.
>         Pending Tasks: 0
>                 Column Family: Offer
>                 SSTable count: 12
>                 Space used (live): 59865089091
>                 Space used (total): 76111577830
>                 Memtable Columns Count: 39355
>                 Memtable Data Size: 14726313
>                 Memtable Switch Count: 414
>                 Read Count: 32052
>                 Read Latency: 3.128 ms.
>                 Write Count: 9704525
>                 Write Latency: 0.010 ms.
>                 Pending Tasks: 0
>                 Key cache capacity: 1000
>                 Key cache size: 1000
>                 Key cache hit rate: 2.4805931214280473E-4
>                 Row cache: disabled
>                 Compacted row minimum size: 36
>                 Compacted row maximum size: 1597
>                 Compacted row mean size: 1319
>
