Using ram disk for cluster.local.dir

2011-07-13 Thread Eric Caspole
I have a 1-node pseudo-cluster with plenty of RAM and 5 HDs. As an experiment, I set mapreduce.cluster.local.dir to point to a ram disk. For this experiment I am running an 8GB terasort, so I made a 9GB ram disk. This change sped up the run time of the job by ~16% versus pointing mapreduce.clu

Re: Lack of data locality in Hadoop-0.20.2

2011-07-13 Thread Virajith Jalaparti
Hi Matei, Using the fair scheduler of the Cloudera distribution seems to have (mostly) solved the problem. Thanks a lot for the suggestion. -Virajith On Tue, Jul 12, 2011 at 7:23 PM, Matei Zaharia wrote: > Hi Virajith, > > The default FIFO scheduler just isn't optimized for locality for small >

The last task's status is always initializing

2011-07-13 Thread Michael Hu
Hi all: I ran into a very weird problem. When I input some data, and this data has to be split into more than 2 tasks, the last task's status is always *initializing*, and no node can complete the task. I checked the tasktracker, userlogs, and datanode logs, and there are no error reports. Pleas