I see the same thing here. I have tried to work out the expected size from the timestamps, column names, keys, and raw data, but in the end Cassandra reports a cluster size 2 to 3 times bigger than the raw data. I am surely missing something in my formula. I also have a lot of free disk space, so it's not a big issue for me, just puzzling.
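
For reference, the kind of back-of-the-envelope estimate I mean looks roughly like the sketch below. It uses the figures from the quoted message (300,000 rows, one 100 KB column per row, a 10-byte column name, 106 GB on disk). The 32-byte key size is a made-up placeholder, KB and GB are taken as decimal units, and the function ignores index files, bloom filters, replication, and any SSTables not yet compacted away, so treat it as a lower bound rather than what Cassandra actually writes.

```python
def estimate_raw_bytes(rows, value_bytes, column_name_bytes, key_bytes,
                       timestamp_bytes=8, columns_per_row=1):
    """Rough lower bound on stored bytes: key + (name + timestamp + value) per column.

    Deliberately ignores per-column length fields, indexes, bloom filters,
    replication, and obsolete SSTables awaiting cleanup.
    """
    per_column = column_name_bytes + timestamp_bytes + value_bytes
    per_row = key_bytes + columns_per_row * per_column
    return rows * per_row


# Figures from the quoted message; key_bytes=32 is a hypothetical placeholder.
estimate = estimate_raw_bytes(
    rows=300_000,
    value_bytes=100 * 1000,   # 100 KB per row, decimal units assumed
    column_name_bytes=10,
    key_bytes=32,
)
on_disk = 106 * 10**9         # 106 GB reported on disk

print(f"estimated raw size: {estimate / 1e9:.2f} GB")
print(f"on-disk / estimate: {on_disk / estimate:.2f}x")
```

With these inputs, names, timestamps, and keys add only about 0.05% on top of the values, so per-column metadata cannot account for the gap on its own.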

On Wed, Jul 7, 2010 at 7:17 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:

> > I am thinking that the timestamps and column names should be included in the
> > column family stats, which basically says 300,000 rows that are 100KB each=30
> > GB. My rows only have 1 column so there should only be one timestamp. My
> > column name is only 10 bytes long.
> >
> > This doesn't explain why 30 GB of data is taking up 106 GB of disk 24 hours
> > after all writes have completed. Compactions should be complete, no?
>
> Nope, it sounds fishy to me. Presuming that compaction is not actively
> running in the background still (should be obvious from logs and/or
> CPU usage and/or disk I/O).
>
> --
> / Peter Schuller