Re: Problems with MR Job running really slowly

2011-11-06 Thread Florin P
Hello!   The advice I gave was based on my own experience; it helped me solve my issues when I was sending a lot of data to the reducers.   I can offer just those two pieces of advice, and I hope the real experts from Hadoop and Cloudera will help you further. I'm particularly interested in this subject

Re: Problems with MR Job running really slowly

2011-11-06 Thread Steve Lewis
1) I am varying both the number of mappers and reducers, trying to determine three things: a) what options I need so that mappers and reducers - are not killed with "GC overhead limit exceeded" - minimize execution time for the cluster. I use a custom Splitte
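The kind of knobs being varied here can be set per job in mapred-site.xml (or on the JobConf). This is an illustrative fragment only; the values are assumptions, not recommendations:

```xml
<!-- Illustrative mapred-site.xml fragment; values are placeholders to tune. -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>8</value> <!-- number of reduce tasks for the job -->
</property>
<property>
  <name>mapred.child.java.opts</name>
  <!-- More heap per task JVM is the usual first response to
       "GC overhead limit exceeded" in a map or reduce task. -->
  <value>-Xmx1024m</value>
</property>
```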

Fwd: Re: Do failed task attempts stick around the jobcache on local disk?

2011-11-06 Thread Uma Maheswara Rao G 72686
forwarding to mapreduce --- Begin Message --- Am I being completely silly asking about this? Does anyone know? On Wed, Nov 2, 2011 at 6:27 PM, Meng Mao wrote: > Is there any mechanism in place to remove failed task attempt directories > from the TaskTracker's jobcache? > > It seems like for

Keeping intermediate results

2011-11-06 Thread Yaron Gonen
Hi, Suppose I have chained M/R jobs that traverse a graph and look for nodes with a specific value. Every time a Map encounters that value, I'd like to keep that node in the final result. I can of course save it with a special key and use a condition in the Reducer, but is there a more formal or el
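One more formal alternative to the special-key trick is Hadoop's MultipleOutputs, which lets a mapper write matches to a named side output while the regular traversal output still flows to the reducer. A minimal sketch follows; it requires the Hadoop MapReduce libraries on the classpath, and the names `matches`, `NodeFilterMapper`, and `TARGET_VALUE` are illustrative assumptions, not anything from the original thread:

```java
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class NodeFilterMapper extends Mapper<Text, Text, Text, Text> {
    // Hypothetical value being searched for in the graph.
    private static final String TARGET_VALUE = "42";
    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<Text, Text>(context);
    }

    @Override
    protected void map(Text nodeId, Text nodeValue, Context context)
            throws IOException, InterruptedException {
        if (TARGET_VALUE.equals(nodeValue.toString())) {
            // Send the hit to the named side output instead of tagging it
            // with a special key for the reducer to untangle.
            mos.write("matches", nodeId, nodeValue);
        }
        // The normal traversal output continues through the regular channel.
        context.write(nodeId, nodeValue);
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        mos.close();
    }
}
```

The driver would register the side output with `MultipleOutputs.addNamedOutput(job, "matches", TextOutputFormat.class, Text.class, Text.class)`; each chained job can then append its matches without the reducer having to filter them back out.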

Different ways of configuring the memory to the TaskTracker child process (Mapper and Reduce Tasks)

2011-11-06 Thread Praveen Sripati
Hi, What is the difference between setting the mapred.job.map.memory.mb and mapred.child.java.opts using -Xmx to control the maximum memory used by a Mapper and Reduce task? Which one takes precedence? Thanks, Praveen
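The two settings are not alternatives: `-Xmx` in `mapred.child.java.opts` caps the Java heap inside each spawned task JVM, while `mapred.job.map.memory.mb` is the total memory the TaskTracker accounts for (and memory-aware schedulers such as the CapacityScheduler use for scheduling and enforcement) per map task. The heap must fit inside the latter with room left for JVM overhead. An illustrative mapred-site.xml fragment, with placeholder values:

```xml
<property>
  <name>mapred.child.java.opts</name>
  <!-- Caps the JVM heap of each child map/reduce task at 512 MB. -->
  <value>-Xmx512m</value>
</property>
<property>
  <name>mapred.job.map.memory.mb</name>
  <!-- Total memory accounted for per map task; must cover the heap
       plus JVM and native-memory overhead. -->
  <value>768</value>
</property>
```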

Re: Problems with MR Job running really slowly

2011-11-06 Thread Florin P
Hello!   How many reducers are you using?   Regarding the performance parameters, first you can increase the io.sort.mb parameter. It seems that you are sending a large amount of data to the reducers. By increasing the value of this parameter, in the shuffle phase, the framework w
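For reference, io.sort.mb sizes the map-side sort buffer (its default in this Hadoop era is 100), and a larger buffer means fewer spills to local disk before the shuffle; the buffer must still fit inside the task heap set via -Xmx. An illustrative fragment, with values that are assumptions to be tuned:

```xml
<property>
  <name>io.sort.mb</name>
  <value>200</value> <!-- larger map-side sort buffer, fewer disk spills -->
</property>
<property>
  <name>io.sort.spill.percent</name>
  <value>0.80</value> <!-- start spilling when the buffer is 80% full (the default) -->
</property>
```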