We've got a cluster of 10x 8-core/24 GB nodes, each currently with one 4 TB
disk (3 disk slots max). They chug away OK for now, only slightly IO bound on
average.
I'm going to upgrade the disk configuration at some point (we do need more
space on HDFS) and I'm thinking about what's best hardware-wise:
Hi David,
on your first point:
the rule of thumb is one disk per CPU (or per 1.5 to 2 CPUs). In your
case more parallel IO could be possible with more disks, but
as you wrote, your processing is not very IO bound. Things might change, and an
SSD could speed up the shuffle/sort phase, but I suggest to do
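
To put rough numbers on that rule of thumb, here is a minimal sketch. It assumes "CPU" means a core and uses the 8-core, 3-slot nodes from the original post; the function name is made up for illustration.

```python
import math

def disks_by_rule_of_thumb(cores, cores_per_disk):
    """Disks suggested when one disk serves `cores_per_disk` cores (rounded up)."""
    return math.ceil(cores / cores_per_disk)

cores = 8        # per node, from the original post
disk_slots = 3   # per node, from the original post

# The rule of thumb quoted above: one disk per 1, 1.5 or 2 cores.
for ratio in (1, 1.5, 2):
    wanted = disks_by_rule_of_thumb(cores, ratio)
    print(f"1 disk per {ratio} cores: {wanted} disks wanted, {disk_slots} slots available")
```

At every ratio in that range the suggested disk count is above the 3 slots each node has, so under this rule the chassis, not the CPU count, is the limit.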
This sounds (with no real evidence) like you are a bit light on memory for
that number of cores. That could cause you to be spilling map outputs
early and very much slowing things down.
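
As a rough illustration of the memory point, the sketch below works out memory per core for the nodes described and the point at which a map task's sort buffer would start spilling. The buffer size and spill percentage are assumed example values, not settings reported in this thread, and the function names are hypothetical.

```python
def memory_per_core_gb(node_memory_gb, cores):
    """Physical memory available per core, ignoring OS and daemon overhead."""
    return node_memory_gb / cores

def spill_threshold_mb(sort_buffer_mb, spill_percent):
    """Map output held in the sort buffer before a spill to disk begins."""
    return sort_buffer_mb * spill_percent / 100

# 24 GB and 8 cores per node, from the original post.
print(memory_per_core_gb(24, 8))

# Assumed values for illustration only: 100 MB sort buffer, spill at 80% full.
print(spill_threshold_mb(100, 80))
```

A map task whose output is larger than that threshold spills more than once and has to merge the spill files afterwards, which is the slowdown described above.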
On Fri, May 10, 2013 at 11:30 PM, David Parks davidpark...@yahoo.com wrote:
We’ve got a cluster of 10x