All,

I'm setting up a 4-node Hadoop 2.5.1 cluster. Each node has the following drives:
1 - 500 GB drive (OS disk)
1 - 500 GB drive
1 - 2 TB drive
1 - 3 TB drive

In my past experience I've had lots of issues with non-uniform drive sizes in HDFS, but unfortunately getting all 3 TB or all 2 TB drives wasn't an option for this cluster.

My thought is to use the 2 TB and 3 TB drives for HDFS and the 500 GB drive for intermediate data. Most of our jobs don't make heavy use of intermediate data, and this way I still get a good amount of HDFS space (2 TB) per node before I run into issues. I may then use the AvailableSpaceVolumeChoosingPolicy to help balance blocks across the two HDFS drives.
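As a sketch of how this layout might map onto Hadoop 2.x configuration, the small Python script below prints the <property> entries for hdfs-site.xml and yarn-site.xml. The mount points (/data/disk2tb and so on) are hypothetical placeholders. The property names are the standard dfs.datanode.data.dir, dfs.datanode.fsdataset.volume.choosing.policy and yarn.nodemanager.local-dirs keys; I'm assuming MapReduce intermediate (spill/shuffle) files land in yarn.nodemanager.local-dirs under YARN, which is worth verifying for your setup.

```python
# Sketch: emit Hadoop <property> entries for the proposed disk layout.
# Mount points are HYPOTHETICAL; substitute the real ones on each node.
import xml.etree.ElementTree as ET

HDFS_DIRS = ["/data/disk2tb/hdfs", "/data/disk3tb/hdfs"]  # 2 TB and 3 TB drives
LOCAL_DIRS = ["/data/disk500gb/yarn-local"]              # 500 GB drive

HDFS_SITE = {
    # Comma-separated list of DataNode block directories.
    "dfs.datanode.data.dir": ",".join(HDFS_DIRS),
    # Prefer volumes with more free space instead of the round-robin default.
    "dfs.datanode.fsdataset.volume.choosing.policy":
        "org.apache.hadoop.hdfs.server.datanode.fsdataset."
        "AvailableSpaceVolumeChoosingPolicy",
}

YARN_SITE = {
    # Local scratch space for containers (assumed home of intermediate data).
    "yarn.nodemanager.local-dirs": ",".join(LOCAL_DIRS),
}


def to_configuration_xml(props):
    """Build a Hadoop-style <configuration> document from a dict."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print("hdfs-site.xml:\n" + to_configuration_xml(HDFS_SITE))
    print("yarn-site.xml:\n" + to_configuration_xml(YARN_SITE))
```

Keeping the HDFS and scratch directories on separate physical drives means shuffle traffic does not compete with block reads and writes on the same spindle, which is the main point of the proposed split.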

If necessary, I could put intermediate data on one of the OS disk's partitions (/home), but that doesn't seem ideal.

Does anybody have recommendations for making the best use of storage in this scenario?

Thanks,
Brian