On 6/9/2011 8:40 AM, Edward Capriolo wrote:
Some of these things are challenges, and a few are being worked on in
one way or another.
1) The dynamic snitch was implemented to detect slow-acting nodes and
re-balance load.
2) You can budget bootstrap with rsync, as long as you know what data
to copy where. 0.7.X made the data-moving process more efficient.
Still, moving just 1 TB of data over a T-1 would take about 61 days,
or you could ship it on disk in a couple of days.
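A quick check of the T-1 figure: a T-1 line runs at a nominal 1.544 Mbit/s,
so 1 TB (taken here as 10^12 bytes) needs roughly 60 days even at full line
rate, in line with the ~61 days above once a little overhead is allowed for.
This is a minimal sketch assuming decimal units and an otherwise idle link;
`transfer_days` is just an illustrative helper.

```python
# Back-of-the-envelope transfer time for moving data over a network link.
# Assumes decimal units (1 TB = 10**12 bytes) and a link running at its full
# nominal rate with no protocol overhead, so real transfers will be slower.

T1_BITS_PER_SEC = 1_544_000  # nominal T-1 line rate (1.544 Mbit/s)


def transfer_days(num_bytes: int, bits_per_sec: float) -> float:
    """Return how many days it takes to move num_bytes at bits_per_sec."""
    seconds = num_bytes * 8 / bits_per_sec
    return seconds / 86_400  # seconds per day


one_tb = 10**12
print(f"1 TB over a T-1: {transfer_days(one_tb, T1_BITS_PER_SEC):.1f} days")
# prints "1 TB over a T-1: 60.0 days"
```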
3) There are many cases where different partitioning strategies could
theoretically be better. The question is: what is best for the normal
use case?
4) Compressed SSTables are on the way. This will be nice because it
can help make the most of the disk cache.
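To illustrate why compression helps the disk cache: if SSTables are stored
compressed on disk, the OS page cache holds the compressed pages, so the same
RAM covers more logical data. The sketch below uses Python's `zlib` as a
stand-in codec and a made-up, very repetitive sample, so the ratio it reports
is an assumption, not a prediction for real data. `effective_cache_bytes` is a
hypothetical helper, and this is not how Cassandra implements compressed
SSTables.

```python
import zlib


def effective_cache_bytes(cache_bytes: int, sample: bytes) -> float:
    """Logical bytes that fit in cache_bytes if all data compresses like sample."""
    ratio = len(sample) / len(zlib.compress(sample))
    return cache_bytes * ratio


# Hypothetical, highly repetitive row data; real data usually compresses less.
rows = b"".join(b"user:%06d,status:active,country:US\n" % i for i in range(10_000))
cache = 8 * 1024**3  # assumed 8 GiB of RAM available for the page cache
print(f"compression ratio: {len(rows) / len(zlib.compress(rows)):.1f}x")
print(f"logical data cached: {effective_cache_bytes(cache, rows) / 1024**3:.1f} GiB")
```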
5) Compactions *are* a good thing. You can already disable minor
compactions by setting the compaction thresholds to 0. That is not
great, because smaller compactions can run really fast and you want
those to happen regularly. Another way I take care of this is by
forcing major compactions on my own schedule. This makes it very
unlikely that a large compaction will kick in at random during peak
time. 0.8.X has multi-threaded compaction and a throttling limit, so
that looks promising.
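For the throttling limit, the general idea is to cap how many bytes per second
compaction may process, so a big compaction stretches out over time instead of
saturating disk I/O. Below is a generic rate-limiter sketch of that idea, not
Cassandra's implementation; the `Throttle` class and its parameters are
hypothetical. A fake clock stands in for real time so the example runs
instantly.

```python
import time


class Throttle:
    """Cap throughput at rate_bytes_per_sec by sleeping whenever work runs ahead.

    A generic sketch of compaction-style I/O throttling (hypothetical class).
    """

    def __init__(self, rate_bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate_bytes_per_sec
        self.clock = clock
        self.sleep = sleep
        self.start = clock()
        self.done = 0  # total bytes processed so far

    def acquire(self, nbytes):
        """Record nbytes of work, sleeping until that total is within budget."""
        self.done += nbytes
        # Earliest moment at which self.done bytes are allowed at self.rate.
        allowed_at = self.start + self.done / self.rate
        delay = allowed_at - self.clock()
        if delay > 0:
            self.sleep(delay)


class FakeClock:
    """Simulated clock: sleeping just advances the current time."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


MB = 1024 * 1024
clock = FakeClock()
throttle = Throttle(16 * MB, clock=clock.now, sleep=clock.sleep)
for _ in range(64):  # "compact" 64 MB in 1 MB chunks
    throttle.acquire(1 * MB)
print(f"simulated time: {clock.t:.1f} s")  # prints "simulated time: 4.0 s"
```

Compacting 64 MB under a 16 MB/s cap takes 4 simulated seconds; a higher cap
finishes sooner but competes harder with live reads and writes for disk I/O.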
More nodes vs. fewer nodes: +1 for more nodes. This does not mean you
need to go very small, but the larger disk configurations are just more
painful, unless you can get very/very/very fast disks.
Even with a massive RAID-0? At some point, shouldn't disk I/O
throughput be fast enough to compete with cache speeds?