> We're playing around with Cassandra trying to get a feel for it. Can someone 
> please explain the difference between load (from nodetool) and what's actually 
> stored on disk? Sometimes these numbers mirror each other and sometimes the 
> disk usage is up to 2x the load reported, as you can see below...
[snip]
> Runs 3, 5, 6, 9, and 12 don't seem to match up well. Can someone explain this 
> please?

Probably there are obsolete sstables that have not yet been removed.
Removal of sstables is somewhat delayed because it relies on GC to
avoid synchronization complexities in the implementation. See:

   http://wiki.apache.org/cassandra/MemtableSSTable

I believe sstables that are obsolete will not count towards load.

You can trigger a GC, restart the Cassandra nodes, or just wait until
the obsolete files disappear (generating enough activity to trigger a
CMS sweep of the heap should be enough, assuming you use CMS).
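
If you want to check whether obsolete sstables account for the gap, a
rough sketch along these lines might help. It sums the file sizes in
one data directory and reports separately the bytes belonging to
sstables that have a marker file next to them. The function name and
the "-Compacted" marker suffix are assumptions on my part (verify the
naming against the files your version actually writes), and the
"everything before the last dash" grouping is likewise a guess at the
file naming scheme.

```python
import os
from collections import defaultdict


def summarize_sstable_dir(data_dir, marker_suffix="-Compacted"):
    """Return (total_bytes, obsolete_bytes) for the files directly in data_dir.

    ASSUMPTION: an obsolete sstable is flagged by a sibling file named
    <sstable prefix> + marker_suffix. Adjust marker_suffix if your
    Cassandra version names things differently.
    """
    sizes = defaultdict(int)  # sstable prefix -> bytes across its files
    obsolete = set()          # prefixes that have a marker file

    for name in os.listdir(data_dir):
        path = os.path.join(data_dir, name)
        if not os.path.isfile(path):
            continue
        if name.endswith(marker_suffix):
            prefix = name[:-len(marker_suffix)]
            obsolete.add(prefix)
        else:
            # ASSUMPTION: component files look like <prefix>-<Component>.db
            prefix = name.rsplit("-", 1)[0]
        sizes[prefix] += os.path.getsize(path)

    total_bytes = sum(sizes.values())
    obsolete_bytes = sum(sizes[p] for p in obsolete)
    return total_bytes, obsolete_bytes
```

Calling summarize_sstable_dir() on a column family's data directory
(whatever path your configuration points at) gives two numbers; if
total minus obsolete is close to the load nodetool reports, the
difference is most likely just sstables waiting to be removed.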

--
/ Peter Schuller
