Hi,

In our project, many distributed modules send each other binary blobs, about 100-200 KB each on average. Small JSONs are sent over a message queue, while Cassandra is used as temporary storage for the blobs. We use Cassandra instead of an in-memory distributed cache like Couch for the following reasons: (1) we don't want to be limited by RAM size; (2) we make intensive use of ordered composite keys and range queries (it is not a simple key/value cache).
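To make the access pattern concrete, here is a minimal sketch of the hourly keyspace naming and the kind of ordered-composite-key range query we rely on. The table and column names (`blobs`, `stream_id`, `seq`) are made up for illustration, not our real schema:

```python
from datetime import datetime

def keyspace_name(ts: datetime) -> str:
    """Hourly keyspace name in the yyyy_MM_dd_HH format described above."""
    return ts.strftime("%Y_%m_%d_%H")

# Hypothetical schema illustrating an ordered composite key:
# partition key = stream_id, clustering key = seq (ordered within a partition).
CREATE_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS {ks}.blobs (
    stream_id text,
    seq       bigint,
    payload   blob,
    PRIMARY KEY (stream_id, seq)
) WITH CLUSTERING ORDER BY (seq ASC);
"""

def range_query(ks: str, stream_id: str, lo: int, hi: int) -> str:
    """Range scan over the clustering column -- the pattern a plain
    key/value cache cannot express."""
    return (f"SELECT seq, payload FROM {ks}.blobs "
            f"WHERE stream_id = '{stream_id}' AND seq >= {lo} AND seq < {hi};")
```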
We don't use the TTL mechanism for several reasons. The main one is that we need to reclaim free disk space immediately, not after 10 days (gc_grace_seconds). We are very limited in disk space because traffic is intensive and the blobs are big. So what we do is create a new keyspace named yyyy_MM_dd_HH every hour, and when the disk becomes full, a script running from crontab on each node drops the keyspace with the "IF EXISTS" flag and deletes the whole keyspace folder. That way the process is very clean and no garbage is left on disk. The keyspace is created by the first module in the flow on an hourly basis, and its name is sent over the message queue to avoid possible problems. All modules read and write with consistency ONE, and of course there is no replication.

It actually works nicely, but we have a couple of problems:

1) When a new keyspace with its column families has just been created (every round hour), other modules sometimes fail to read/write data, and we lose the request. Can it be that creating a keyspace and its column families is an async operation, or that there is schema propagation time between nodes?

2) We read and write intensively, and usually I don't need the data for more than 1-2 hours. What optimizations can I do to increase read performance on my small cluster?

Cluster configuration - 3 identical nodes: i7 3 GHz, 120 GB SSD, 16 GB RAM, CentOS 6.

Hope it's not too much text :)

Thanks,
Pavel
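P.S. For reference, the cron cleanup boils down to something like the sketch below. It is simplified and the `keep_hours` threshold is made up: the real script decides based on actual disk usage, runs the generated `DROP KEYSPACE IF EXISTS` via cqlsh, and then removes the keyspace's data folder.

```python
from datetime import datetime, timedelta

FMT = "%Y_%m_%d_%H"  # hourly keyspace naming scheme from the post

def parse_keyspace(name: str) -> datetime:
    return datetime.strptime(name, FMT)

def keyspaces_to_drop(names, now, keep_hours=2):
    """Return hourly keyspaces older than `keep_hours`, oldest first,
    so the cron job can reclaim disk space immediately."""
    cutoff = now - timedelta(hours=keep_hours)
    old = [n for n in names if parse_keyspace(n) < cutoff]
    return sorted(old, key=parse_keyspace)

def drop_statement(ks: str) -> str:
    # IF EXISTS makes the drop idempotent when it runs on all three nodes.
    return f"DROP KEYSPACE IF EXISTS {ks};"
```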