On 6/9/2011 8:40 AM, Edward Capriolo wrote:



Some of these things are challenges, and a few are being worked on in one way or another.

1) The dynamic snitch was implemented to detect slow-acting nodes and route read requests away from them.

2) You can do a "budget bootstrap" with rsync, as long as you know which data to copy where. 0.7.X made the data-moving process more efficient.

Still, moving just 1 TB of data over a T-1 line would take about 61 days. Or you could ship it on disks in a couple of days.


3) There are many cases where a different partitioning strategy could, in theory, do better. The question is which one is best for the common use case.

4) Compressed SSTables are on the way. This will be nice because more data will fit in the disk cache.

5) Compactions *are* a good thing. You can already turn off automatic compaction by setting the compaction thresholds to 0. That is not ideal, because small compactions run very fast and you want those to happen regularly. Another approach I use is forcing major compactions on my own schedule. This makes it very unlikely that a large compaction will kick in at random during peak time. 0.8.X adds multi-threaded compaction and a compaction throughput throttle, so that looks promising.
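Item 1 can be illustrated with a minimal sketch of latency-aware replica ordering, the idea behind the dynamic snitch. This is not Cassandra's implementation: the class name, the EWMA scoring, and the `alpha` value are assumptions made for illustration. Each host's score is a moving average of observed read latency, and replicas are tried fastest-first.

```python
class LatencyAwareSelector:
    """Hypothetical latency-aware replica chooser (not Cassandra's code).

    Tracks an exponentially weighted moving average (EWMA) of read
    latency per host and orders replicas fastest-first.
    """

    def __init__(self, alpha: float = 0.75) -> None:
        # Weight given to the newest latency sample (assumed value).
        self.alpha = alpha
        self.scores: dict[str, float] = {}

    def record(self, host: str, latency_ms: float) -> None:
        """Fold a new latency observation into the host's score."""
        prev = self.scores.get(host)
        if prev is None:
            self.scores[host] = latency_ms
        else:
            self.scores[host] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def order_replicas(self, replicas: list[str]) -> list[str]:
        """Return replicas sorted by score, lowest (fastest) first.

        Hosts with no samples yet get a score of 0.0, so they are tried
        first and quickly acquire a real score.
        """
        return sorted(replicas, key=lambda h: self.scores.get(h, 0.0))


sel = LatencyAwareSelector()
sel.record("10.0.0.1", 5.0)
sel.record("10.0.0.2", 40.0)
sel.record("10.0.0.3", 12.0)
print(sel.order_replicas(["10.0.0.2", "10.0.0.3", "10.0.0.1"]))
# prints ['10.0.0.1', '10.0.0.3', '10.0.0.2']
```

A slow node is not removed from the ring in this model; it simply drifts to the back of the list until its latency improves.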
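The T-1 estimate in item 2 is simple arithmetic, sketched here under stated assumptions: 1 TB means 10^12 bytes, the T-1 runs at its nominal 1.544 Mbit/s, and there is no protocol overhead. The function name `transfer_days` is hypothetical.

```python
def transfer_days(num_bytes: float, link_bps: float) -> float:
    """Days needed to move num_bytes over a link of link_bps bits per second.

    Assumes the link runs at full line rate with zero protocol overhead,
    so real transfers would take longer.
    """
    seconds = num_bytes * 8 / link_bps
    return seconds / 86_400  # seconds per day


ONE_TB = 10**12       # decimal terabyte (assumed; a TiB would be ~10% more)
T1_BPS = 1_544_000    # nominal T-1 line rate, 1.544 Mbit/s

print(f"{transfer_days(ONE_TB, T1_BPS):.1f} days")  # prints 60.0 days
```

At the nominal rate this comes out to roughly two months, in line with the estimate above; any real-world overhead only makes it longer.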
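The compaction thresholds in item 5 belong to size-tiered compaction, which groups SSTables of similar size and compacts a group once it has enough members. The sketch below is a simplified model, not Cassandra's code: the function names, the 0.5x to 1.5x "similar size" window, and the defaults of 4 and 32 are illustrative assumptions, and the real compaction manager handles more cases.

```python
def bucket_by_size(sizes, low=0.5, high=1.5):
    """Group SSTable sizes into buckets of roughly similar size.

    A size joins the first bucket whose average it falls within
    [low * avg, high * avg]; otherwise it starts a new bucket.
    """
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if low * avg <= size <= high * avg:
                bucket.append(size)
                break
        else:  # no existing bucket matched
            buckets.append([size])
    return buckets


def minor_compaction_candidates(sizes, min_threshold=4, max_threshold=32):
    """Return the buckets that would trigger a minor compaction.

    A threshold of 0 is treated as 'minor compaction disabled'.
    At most max_threshold SSTables from a bucket are compacted at once.
    """
    if min_threshold == 0 or max_threshold == 0:
        return []
    return [
        bucket[:max_threshold]
        for bucket in bucket_by_size(sizes)
        if len(bucket) >= min_threshold
    ]


sizes_mb = [10, 11, 12, 9, 200, 210, 5000]
print(minor_compaction_candidates(sizes_mb))        # prints [[9, 10, 11, 12]]
print(minor_compaction_candidates(sizes_mb, 0, 0))  # prints []
```

In this model the small SSTables are merged cheaply and often, while the 5000 MB table waits for similar-sized peers; setting the thresholds to 0 stops the cheap merges too, which is why forcing major compactions off-peak is the gentler option.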

More nodes vs. fewer nodes: +1 for more nodes. This does not mean you need to go very small, but larger disk configurations are just more painful, unless you can get very, very, very fast disks.

Even with a massive RAID-0? At some point, shouldn't disk I/O throughput get fast enough to compete with cache speeds?

