I see the same thing here. I tried to do the maths, accounting for
timestamps, column names, keys and raw data, but in the end Cassandra reports
a cluster size 2 to 3 times bigger than the raw data. I am surely
missing something in my formula, but I have a lot of free hard drive space, so
it's not a big issue for me. Just puzzling.
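
For what it's worth, here is a rough sketch of that kind of estimate, using
the numbers quoted below (300,000 rows, 100 KB values, one column per row,
a 10-byte column name, 106 GB on disk). The function name, the 32-byte row
key and the decision to ignore index, bloom filter and commit log overhead
are my own assumptions, not anything Cassandra reports, so treat the result
as a lower bound rather than an exact figure.

```python
def estimate_raw_bytes(rows, value_bytes, column_name_bytes, key_bytes,
                       timestamp_bytes=8, columns_per_row=1):
    """Naive payload estimate: key + (name + timestamp + value) per column.

    Deliberately ignores per-row indexes, bloom filters, length prefixes
    and any SSTables that have not been compacted away yet.
    """
    per_column = column_name_bytes + timestamp_bytes + value_bytes
    return rows * (key_bytes + columns_per_row * per_column)


rows = 300_000
value_bytes = 100 * 1000   # 100 KB per row, decimal units as in the thread
key_bytes = 32             # hypothetical key length, not given in the thread

estimate = estimate_raw_bytes(rows, value_bytes,
                              column_name_bytes=10, key_bytes=key_bytes)
on_disk = 106 * 10**9      # 106 GB reported on disk

ratio = on_disk / estimate
print(f"estimated payload: {estimate / 10**9:.2f} GB")
print(f"on-disk / estimate: {ratio:.2f}x")
```

With values this large, names, timestamps and keys barely move the total,
so they cannot account for a multiple of the raw size on their own.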

On Wed, Jul 7, 2010 at 7:17 PM, Peter Schuller
<peter.schul...@infidyne.com> wrote:

> > I am thinking that the timestamps and column names should be included in the
> > column family stats, which basically says 300,000 rows that are 100KB each = 30
> > GB.  My rows only have 1 column so there should only be one timestamp.  My
> > column name is only 10 bytes long.
> >
> > This doesn't explain why 30 GB of data is taking up 106 GB of disk 24 hours
> > after all writes have completed.  Compactions should be complete, no?
>
> Nope, it sounds fishy to me. Presuming that compaction is not actively
> running in the background still (should be obvious from logs and/or
> CPU usage and/or disk I/O).
>
> --
> / Peter Schuller
>
