Re: Ideas for Big Data Support

AJ Thu, 09 Jun 2011 08:25:05 -0700

On 6/9/2011 8:40 AM, Edward Capriolo wrote:

Some of these things are challenges, and a few are being worked on in one way or another.
1) The dynamic snitch was implemented to detect slow-acting nodes and re-balance load.
2) You can do a budget bootstrap with rsync, as long as you know which data to copy where. 0.7.X made the data-moving process more efficient.
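
"Knowing which data to copy where" mostly comes down to token ranges. Below is a minimal sketch, assuming a single ring with replication factor 1, where each node owns the range from its predecessor's token (exclusive) up to its own token (inclusive). `range_to_copy` and `in_range` are hypothetical helpers for illustration, not part of Cassandra or rsync.

```python
import bisect

def in_range(token, start, end):
    """True if token falls in the ring range (start, end], handling wraparound."""
    if start < end:
        return start < token <= end
    # start >= end: the range wraps past the top of the ring
    # (start == end covers the whole ring).
    return token > start or token <= end

def range_to_copy(existing_tokens, new_token):
    """Return ((start, end), current_owner) for a node joining at new_token.

    Assumes replication factor 1: the joining node takes over the range
    between its predecessor's token and its own, which is currently held
    by its successor on the ring.
    """
    ring = sorted(existing_tokens)
    if new_token in ring:
        raise ValueError("token already taken")
    i = bisect.bisect_left(ring, new_token)
    predecessor = ring[i - 1]        # i == 0 wraps to the last token
    successor = ring[i % len(ring)]  # i == len(ring) wraps to the first
    return (predecessor, new_token), successor

(start, end), source = range_to_copy([0, 100, 200, 300], 150)
print(f"copy rows with tokens in ({start}, {end}] from the node at token {source}")
```

With a higher replication factor, the same range also lives on the next RF-1 nodes around the ring, so a real copy plan has more sources to choose from.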

Still, moving only 1 TB of data over a T-1 would take 61 days. Or you could ship it in a couple of days.
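
A quick back-of-the-envelope check of that figure, assuming the nominal T-1 line rate of 1.544 Mbit/s, a decimal terabyte (10^12 bytes), and a link running flat out with no protocol overhead. It lands at roughly 60 days, in line with the estimate above; real transfers would be slower.

```python
def transfer_days(num_bytes, link_bits_per_sec):
    """Idealized days to push num_bytes over a link running at full rate.

    Ignores protocol overhead, retransmits and contention.
    """
    seconds = num_bytes * 8 / link_bits_per_sec
    return seconds / 86_400

T1_BPS = 1.544e6   # nominal T-1 line rate, bits per second
ONE_TB = 10**12    # decimal terabyte, in bytes

print(f"{transfer_days(ONE_TB, T1_BPS):.1f} days")  # prints 60.0 days
```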

3) There are many cases where different partition strategies can theoretically be better. The question is: what is best for the normal use case?
4) Compressed SSTables are on the way. This will be nice because they can help make the most of disk caches.
5) Compactions *are* a good thing. You can already disable minor compactions by setting the compaction thresholds to 0. That is not great, because smaller compactions can run really fast and you want those to happen regularly. Another way I take care of this is by forcing major compactions on my own schedule, which makes it very unlikely that a larger compaction will kick off at random during peak time. 0.8.X has multi-threaded compaction and a throttling limit, so that looks promising.
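
To see why a threshold of 0 turns minor compactions off, here is a simplified sketch loosely modeled on size-tiered compaction, not Cassandra's actual code. It assumes SSTables are bucketed by similar size (within 0.5x to 1.5x of a bucket's average), that a bucket is compacted once it holds at least `min_threshold` files (capped at `max_threshold`), and that a threshold of 0 means "never". The function names and the 4/32 defaults are illustrative.

```python
def bucket_by_size(sizes, low=0.5, high=1.5):
    """Group SSTable sizes into buckets of similar size.

    A size joins the first bucket whose average it falls within
    [low * avg, high * avg]; otherwise it starts a new bucket.
    """
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if low * avg <= size <= high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets

def minor_compaction_candidates(sizes, min_threshold=4, max_threshold=32):
    """Buckets that would trigger a minor compaction.

    A threshold of 0 disables minor compaction entirely; major
    compactions can still be forced on your own schedule.
    """
    if min_threshold == 0 or max_threshold == 0:
        return []
    return [sorted(b)[:max_threshold]
            for b in bucket_by_size(sizes)
            if len(b) >= min_threshold]

# Hypothetical SSTable sizes in MB.
print(minor_compaction_candidates([10, 11, 12, 10, 500, 520, 2000]))
```

The four small tables get merged quickly and cheaply, while the large ones wait until enough similar-sized peers accumulate. That is the "small compactions are fast, let them run" point above.
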
More nodes vs. fewer nodes: +1 for more nodes. This does not mean you need to go very small, but the larger disk configurations are just more painful, unless you can get very/very/very fast disks.
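
One way to see why big nodes hurt is the time it takes to rebuild a node after a failure. A rough sketch, assuming data is spread evenly and a replacement node can be re-streamed at a fixed rate. The cluster size (12 TB of unique data), replication factor 3, and 50 MB/s streaming rate are hypothetical numbers chosen only for illustration.

```python
def per_node_load(total_data_tb, replication_factor, nodes, stream_mb_per_sec):
    """Per-node data (TB) and idealized hours to re-stream it to a replacement node."""
    per_node_tb = total_data_tb * replication_factor / nodes
    seconds = per_node_tb * 1_000_000 / stream_mb_per_sec  # 1 TB = 10**6 MB (decimal)
    return per_node_tb, seconds / 3600

# Hypothetical cluster: 12 TB of unique data, RF=3, 50 MB/s streaming.
for nodes in (6, 12, 24, 48):
    tb, hours = per_node_load(12, 3, nodes, 50)
    print(f"{nodes:2d} nodes: {tb:5.2f} TB/node, ~{hours:4.1f} h to rebuild one node")
```

Doubling the node count halves both the data at risk per node and the rebuild window, which is the same trade-off that makes compaction and repair on very large nodes painful.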

Even with a massive RAID-0? At some point, shouldn't disk I/O throughput be fast enough to compete with cache speeds?
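
An idealized model of what striping buys you, assuming reads are spread evenly across the disks. The per-disk figures (100 MB/s sequential, 8 ms per random read) are hypothetical. Bandwidth and random IOPS scale with the disk count, but a single small read still lands on one disk and pays one seek, while a page-cache hit involves no seek at all.

```python
def raid0_estimate(disks, per_disk_mb_per_sec, per_disk_seek_ms):
    """Idealized RAID-0 numbers: (sequential MB/s, random reads/s, per-read latency ms).

    Striping lets large reads and many concurrent small reads use every
    disk at once, so bandwidth and IOPS scale with the disk count. A
    single small read still hits one disk, so its latency is unchanged.
    """
    sequential = disks * per_disk_mb_per_sec
    random_iops = disks * 1000 / per_disk_seek_ms  # assumes reads spread evenly
    latency_ms = per_disk_seek_ms
    return sequential, random_iops, latency_ms

# Hypothetical per-disk figures: 100 MB/s sequential, 8 ms per random read.
for n in (1, 4, 12):
    print(n, raid0_estimate(n, 100, 8))
```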

