I would consider a JBOD with a 16-64 MB stride. This would be my choice where one or more steps (e.g. MR) will be IO-bound. Otherwise one or more tasks will be hit with the poor read/write throughput of having large amounts of data behind a single spindle.

On Nov 12, 2014 8:37 AM, "Brian C. Huffman" <[email protected]> wrote:
> All,
>
> I'm setting up a 4-node Hadoop 2.5.1 cluster. Each node has the following
> drives:
> 1 - 500GB drive (OS disk)
> 1 - 500GB drive
> 1 - 2TB drive
> 1 - 3TB drive
>
> In past experience I've had lots of issues with non-uniform drive sizes
> for HDFS, but unfortunately it wasn't an option to get all 3TB or 2TB
> drives for this cluster.
>
> My thought is to set up the 2TB and 3TB drives as HDFS and the 500GB drive
> as intermediate data. Most of our jobs don't make heavy use of
> intermediate data, but at least this way I get a good amount of space
> (2TB) per node before I run into issues. Then I may end up using the
> AvailableSpaceVolumeChoosingPolicy to help with balancing the blocks.
>
> If necessary I could put intermediate data on one of the OS partitions
> (/home), but this doesn't seem ideal.
>
> Anybody have any recommendations regarding the optimal use of storage in
> this scenario?
>
> Thanks,
> Brian
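For reference, the AvailableSpaceVolumeChoosingPolicy Brian mentions is enabled per DataNode in hdfs-site.xml. A minimal sketch follows, assuming the 2TB and 3TB drives are mounted at /data/disk2tb and /data/disk3tb (the mount points are illustrative, not from the thread):

    <!-- hdfs-site.xml: sketch only; mount points below are assumptions -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/data/disk2tb/hdfs,/data/disk3tb/hdfs</value>
    </property>
    <property>
      <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
      <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
    </property>
    <!-- Volumes within this many bytes of free space (10 GB here) are
         treated as equal; beyond that, new replicas favor the emptier
         volume with the given probability. Both values shown are the
         Hadoop 2.x defaults. -->
    <property>
      <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
      <value>10737418240</value>
    </property>
    <property>
      <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
      <value>0.75</value>
    </property>

And, assuming the spare 500GB drive is mounted at /data/disk500gb (again an assumption), the intermediate/shuffle data can be directed there in yarn-site.xml:

    <!-- yarn-site.xml: put NodeManager local (intermediate) data on the
         spare 500GB drive; mount point is an assumption -->
    <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>/data/disk500gb/yarn/local</value>
    </property>

Raising the preference fraction above 0.75 pushes more new blocks onto the emptier (here, larger) volumes, at the cost of concentrating write load on fewer spindles.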
