What's the best disk configuration for hadoop? SSD's Raid levels, etc?

2013-05-11 Thread David Parks
We've got a cluster of 10x 8-core/24GB nodes, currently with one 4TB disk each (3 disk slots max). They chug away OK, only slightly IO bound on average. I'm going to upgrade the disk configuration at some point (we do need more space on HDFS) and I'm thinking about what's best hardware-wise:

Re: What's the best disk configuration for hadoop? SSD's Raid levels, etc?

2013-05-11 Thread Mirko Kämpf
Hi David, on your first point: the rule of thumb is one disk per CPU (or one per 1.5 to 2 CPUs). In your case more parallel IO would be possible with more disks, but as you wrote, your processing is only slightly IO bound. Things might change, and an SSD could speed up the shuffle/sort phase, but I suggest to do
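
As a rough illustration of that rule of thumb applied to the nodes described in the original post (8 cores, 3 disk slots), here is a minimal Python sketch. The function name and the reading of the rule as "between 1 and 2 cores per disk" are assumptions made for the example, not anything stated in the thread.

```python
import math

def disks_per_node(cores, cores_per_disk_low=1.0, cores_per_disk_high=2.0):
    """Hypothetical helper: range of disks suggested by the rule of thumb.

    Assumes the rule means one disk per 1 to 2 cores, so the minimum
    comes from 2 cores per disk and the maximum from 1 core per disk.
    """
    min_disks = math.ceil(cores / cores_per_disk_high)
    max_disks = math.ceil(cores / cores_per_disk_low)
    return min_disks, max_disks

cores = 8   # per node, from the original post
slots = 3   # max disk slots per node, from the original post

low, high = disks_per_node(cores)
print(f"rule of thumb suggests {low} to {high} disks per node")
print(f"available slots: {slots}; short of the low end by {max(0, low - slots)}")
```

Under these assumptions the nodes cannot reach even the low end of the range, so filling all three slots would be the closest they can get.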

Re: What's the best disk configuration for hadoop? SSD's Raid levels, etc?

2013-05-11 Thread Ted Dunning
This sounds (with no real evidence) like you are a bit light on memory for that number of cores. That could cause you to be spilling map outputs early and very much slowing things down. On Fri, May 10, 2013 at 11:30 PM, David Parks davidpark...@yahoo.com wrote: We’ve got a cluster of 10x