Sounds like you're seeing the bug in 0.7.0 that prevents deletion of non-Data.db files (i.e. your Index.db files) post-compaction. This is fixed in 0.7.1. (https://issues.apache.org/jira/browse/CASSANDRA-2059)

On Wed, Feb 2, 2011 at 8:15 AM, Omer van der Horst Jansen <ome...@gmail.com> wrote:
> We're using Cassandra as the back end for a home-grown session
> management system. That system was originally built back in 2005 using
> BerkeleyDB/Java and a data distribution system that used UDP multicast.
> Maintenance was becoming increasingly painful.
>
> I wrote a prototype replacement service using Cassandra 0.6 but
> decided to wait for the availability of official TTL support in 0.7
> before switching over.
>
> The new system has been running in production now for a little over a
> week. My main issue is that Cassandra is using far more disk space
> than I expected it to. The vast bulk of disk space seems to be used
> for *Index.db files. I'm hoping that the 10-day GCGraceSeconds
> interval that kicks in on Friday will help me there.
>
> Most of our apps that use this service generate their own session
> keys. I assume they do so by hashing and salting a user ID and/or
> calling something like java.util.UUID.randomUUID().
>
> My schema is currently very simple -- there's a single CF containing a
> (binary) payload column and a column that indicates whether or not the
> data has been compressed. We have a few rogue apps that store
> humongous XML documents in the session and compression helps to deal
> with that. That's also why memcached wasn't going to work in our
> scenario.
>
> On Tue, Feb 1, 2011 at 12:18 PM, Kallin Nagelberg
> <kallin.nagelb...@gmail.com> wrote:
>> Hey,
>> I am currently investigating Cassandra for storing what are
>> effectively web sessions. Our production environment has about 10 high
>> end servers behind a load balancer, and we'd like to add distributed
>> session support. My main concerns are performance, consistency, and
>> the ability to create unique session keys. The last thing we would
>> want is users picking up each other's sessions. After spending a few
>> days investigating Cassandra I'm thinking of creating a single
>> keyspace with a single super-column-family. The scf would store a few
>> standard columns, and a supercolumn of arbitrary session attributes,
>> like:
>>
>> 0s809sdf8s908sf90s: {
>>   prop1: x,
>>   created: timestamp,
>>   lastAccessed: timestamp,
>>   prop2: y,
>>   arbitraryProperties: {
>>     someRandomProperty1: xxyyzz,
>>     someRandomProperty2: xxyyzz,
>>     someRandomProperty3: xxyyzz
>>   }
>> }
>>
>> Does this sound like a reasonable use case? We are on a tight timeline
>> and I'm currently on the fence about getting something like this up
>> and running in that time.
>>
>> Thanks,
>> -Kal
>>
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
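
For anyone following the thread, here is a rough Java sketch of the two ideas Omer describes: generating a session key with java.util.UUID.randomUUID(), and storing a binary payload next to a flag that says whether it was compressed. It is only an illustration under assumptions, not his actual code: the class name SessionPayloads, the Encoded holder, the 4096-byte threshold, and the choice of gzip are all hypothetical, and no Cassandra client calls are shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.UUID;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; none of these names come from the thread.
class SessionPayloads {
    // Assumed cutoff: payloads smaller than this are stored uncompressed.
    static final int COMPRESS_THRESHOLD_BYTES = 4096;

    // Mirrors the two columns described above: payload bytes plus a compressed flag.
    static final class Encoded {
        final byte[] payload;
        final boolean compressed;

        Encoded(byte[] payload, boolean compressed) {
            this.payload = payload;
            this.compressed = compressed;
        }
    }

    // Random (version 4) UUID rendered as a string, usable as a row key.
    static String newSessionKey() {
        return UUID.randomUUID().toString();
    }

    // Gzip the payload only when it reaches the threshold.
    static Encoded encode(byte[] raw) throws IOException {
        if (raw.length < COMPRESS_THRESHOLD_BYTES) {
            return new Encoded(raw, false);
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(raw);
        }
        return new Encoded(buf.toByteArray(), true);
    }

    // Reverse of encode: gunzip only when the flag says so.
    static byte[] decode(Encoded e) throws IOException {
        if (!e.compressed) {
            return e.payload;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(e.payload))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bigDocument = new byte[100_000]; // stand-in for a large XML session blob
        Encoded stored = encode(bigDocument);
        System.out.println("key=" + newSessionKey()
                + " compressed=" + stored.compressed
                + " storedBytes=" + stored.payload.length);
    }
}
```

Keeping the flag as its own column means readers never have to sniff the payload to decide whether to decompress it, and small sessions skip the gzip overhead entirely.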