Hi Kevin,

C* version: 1.2.xx
Astyanax: 1.56.xx

We do basically the same thing in one of our production clusters, but rather than dropping SSTables, we drop column families. We time-bucket our CFs, and when a CF has passed some time threshold (tracked in metadata or embedded in the CF name), it is dropped. This means a home-grown system does the bookkeeping and maintenance rather than relying on C*'s inner workings. It is unfortunate that we have to maintain a system which maintains CFs, but we've been in a pretty good state for the last 12 months using this method.

Some caveats:

By default, C* takes a snapshot of your data when a table is dropped. You can leave that on and have something else clear up the snapshots, or if you're less paranoid, set auto_snapshot: false in the cassandra.yaml file.

Cassandra does not handle 'quick' schema changes very well, and we found that only one node should be used for these changes. When adding or removing column families, we have a single, property-defined C* node that is designated as the schema node. After making a schema change, we had to throw in an artificial delay to ensure that the change had propagated through the cluster before making the next one. And of course, relying on a single node being up for schema changes is less than ideal, so handling failover to a new node is important.

The final, and hardest, problem is that C* can't really handle schema changes while a node is being bootstrapped (new nodes, replacing a dead node). If a column family is dropped but the new node has not yet received that data from its replica, the node will fail to bootstrap when it finally begins to receive that data, because there is no column family for the data to be written to. That node will be stuck in the joining state, and its system keyspace needs to be wiped and re-synced to attempt to get back to a happy state. This unfortunately means we have to stop schema changes when a node needs to be replaced, but we have this flow down pretty well.

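To make the bookkeeping concrete, here is a minimal sketch of the time-bucket logic, assuming daily buckets with the date embedded in the CF name. The `events_YYYYMMDD` naming, the 30-day threshold, the 60-second delay, and the `drop_cf` / `bootstrap_in_progress` callables are all hypothetical placeholders for illustration, not details of the system described above and not part of any Cassandra or Astyanax API.

```python
from datetime import date, datetime, timedelta
import time

PREFIX = "events_"          # hypothetical CF name prefix
RETENTION_DAYS = 30         # hypothetical retention threshold
SCHEMA_SETTLE_SECONDS = 60  # hypothetical delay between schema changes


def bucket_name(day):
    """Name of the daily column family for a given date, e.g. events_20140512."""
    return PREFIX + day.strftime("%Y%m%d")


def expired_buckets(cf_names, today, retention_days=RETENTION_DAYS):
    """Return the bucketed CFs whose embedded date is older than the threshold."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in cf_names:
        if not name.startswith(PREFIX):
            continue  # not one of the time-bucketed CFs
        try:
            day = datetime.strptime(name[len(PREFIX):], "%Y%m%d").date()
        except ValueError:
            continue  # suffix is not a date; leave this CF alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


def drop_expired(cf_names, today, drop_cf, bootstrap_in_progress,
                 settle_seconds=SCHEMA_SETTLE_SECONDS, sleep=time.sleep):
    """Drop expired CFs one at a time, pausing between schema changes.

    drop_cf and bootstrap_in_progress are caller-supplied callables: the first
    issues the drop against the single designated schema node, the second
    reports whether any node is currently joining or being replaced.
    """
    dropped = []
    for name in expired_buckets(cf_names, today):
        if bootstrap_in_progress():
            break  # no schema changes while a node is bootstrapping
        drop_cf(name)
        dropped.append(name)
        sleep(settle_seconds)  # crude wait for the change to propagate
    return dropped
```

With today set to 2014-05-12 and the 30-day threshold, a CF named events_20140401 would be selected for dropping and events_20140510 would be kept. Passing the drop and bootstrap checks in as callables keeps the schema-node failover and cluster-state detection outside this sketch, since those depend on the client and tooling in use.
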
Hope this helps,
Jeremy Powell

On Mon, May 12, 2014 at 5:53 PM, Kevin Burton <bur...@spinn3r.com> wrote:

> We have a log only data structure… everything is appended and nothing is
> ever updated.
>
> We should be totally fine with having lots of SSTables sitting on disk
> because even if we did a major compaction the data would still look the
> same.
>
> By 'lots' I mean maybe 1000 max. Maybe 1GB each.
>
> However, I would like a way to delete older data.
>
> One way to solve this could be to just drop an entire SSTable if all the
> records inside have tombstones.
>
> Is this possible, to just drop a specific SSTable?
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> Skype: *burtonator*
> blog: http://burtonator.wordpress.com
> … or check out my Google+
> profile<https://plus.google.com/102718274791889610666/posts>
> <http://spinn3r.com>
> War is peace. Freedom is slavery. Ignorance is strength. Corporations are
> people.