Hi,

In our project, many distributed modules send each other binary blobs,
averaging 100-200 KB each. Small JSON messages are sent over a message
queue, while Cassandra is used as temporary storage for the blobs. We use
Cassandra instead of an in-memory distributed cache like Couch for the
following reasons: (1) we don't want to be limited by RAM size; (2) we make
intensive use of ordered composite keys and range queries (it is not a
simple key/value cache).

We don't use the TTL mechanism, for several reasons. The major one is that
we need to reclaim free disk space immediately, not after 10 days
(gc_grace). Disk space is very limited because traffic is intensive and
the blobs are big.

So what we did is create a new keyspace every hour, named yyyy_MM_dd_HH.
When the disk becomes full, a script running in crontab on each node drops
a keyspace with the "IF EXISTS" flag and deletes the whole keyspace
folder. That way the whole process is very clean and no garbage is left on
disk.
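For illustration, here is a minimal sketch of how the hourly keyspace name
could be derived; the strftime format string is my assumption based on the
yyyy_MM_dd_HH pattern mentioned above:

```python
from datetime import datetime, timezone

def hourly_keyspace_name(now=None):
    """Build the hourly keyspace name in the yyyy_MM_dd_HH pattern."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y_%m_%d_%H")
```

For example, hourly_keyspace_name(datetime(2013, 5, 7, 14)) returns
"2013_05_07_14". Underscores are used because keyspace names cannot
contain characters like '-' or ':'.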

The keyspace is created on an hourly basis by the first module in the
flow, and its name is sent over the message queue to avoid possible
problems. All modules read and write with consistency ONE, and of course
there is no replication.

Actually it works nicely, but we have several problems:
1) When a new keyspace with its column families has just been created
(every round hour), other modules sometimes fail to read/write data, and
we lose the request. Can it be that creation of a keyspace and column
families is an async operation, or that there is propagation time between
nodes?
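As a client-side workaround while the new schema propagates between nodes,
the failing read/write could be retried briefly. This is a generic sketch,
not tied to any particular driver; `do_request` is a hypothetical callable
standing in for the actual Cassandra operation:

```python
import time

def retry_on_schema_race(do_request, attempts=5, delay=0.5):
    """Retry an operation that may fail while a freshly created
    keyspace/column family is still propagating between nodes.

    `do_request` is a hypothetical callable wrapping the real
    Cassandra read/write. For simplicity any exception is treated
    as retryable here; a real client should only retry errors that
    indicate the keyspace/column family is not yet visible."""
    for attempt in range(attempts):
        try:
            return do_request()
        except Exception:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(delay)
```

A bounded retry with a short delay papers over the propagation window
without hiding genuine, persistent failures.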

2) We read and write intensively, and usually I don't need the data for
more than 1-2 hours. What optimizations can I do to increase read
performance on my small cluster? Cluster configuration: 3 identical nodes,
each with an i7 3 GHz CPU, 120 GB SSD, 16 GB RAM, CentOS 6.

I hope this isn't too much text :)

Thanks,
  Pavel
