I have a 1-node pseudo-distributed cluster with plenty of RAM and 5 hard disks. As an experiment, I set mapreduce.cluster.local.dir to point to a ram disk. For this experiment I am running an 8GB terasort, so I made a 9GB ram disk. This change sped up the job's run time by ~16% versus pointing mapreduce.cluster.local.dir at a comma-separated list of directories on the 5 hard disks.
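
In case it is useful, the change looks roughly like this: the ram disk is just a RAM-backed mount (I used something along the lines of mount -t tmpfs -o size=9g tmpfs /mnt/ramdisk; the mount point and size are only examples), and mapred-site.xml points the local dir at it instead of at the disks:

  <!-- mapred-site.xml: mapreduce.cluster.local.dir takes a comma-separated
       list of directories; here it is a single directory on the ram disk -->
  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/mnt/ramdisk/mapred/local</value>
  </property>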

I have two questions about this:

- Will this work in a cluster situation where, say, I have a 12GB ram disk per cluster node and I am running a 128GB terasort, or does the cluster.local.dir free space per node have to be big enough to hold all of that node's intermediate results? My hunch is that it does (rough back-of-envelope after these questions), but I am not sure.

- From googling I found very little about people using ram disks with Hadoop in this way, so it seems like there is a technical reason not to, perhaps the size-related issue I mentioned above. Are there other gotchas with using a ram disk like this? It seems like a quick-and-dirty way to get some extra performance.
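
For the first question, my rough back-of-envelope, assuming terasort's intermediate (map output) data is about the same size as the input and is spread evenly across the nodes:

  128 GB intermediate data / 12 GB ramdisk per node  ->  at least ~11 nodes,
  and realistically more, to leave headroom for spill and merge files.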

Thanks,
Eric

