Hello!
The advice I gave was based on my experience; it helped me solve my own issues when I was sending a lot of data to the reducers.
I can give you just these two pieces of advice, and I hope the real experts from Hadoop and Cloudera will help you further. I'm particularly interested in this subject.
1) I am varying both the number of mappers and reducers, trying to determine three things:
a) What options I need to set for mappers and reducers so that they:
- are not killed with "GC overhead limit exceeded"
- minimize execution time for the cluster
(A sketch of this kind of tuning follows below.)
I use a custom Splitter
forwarding to mapreduce
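A hedged sketch of the kind of tuning experiment described above, written as a Hadoop driver snippet. The property names follow the pre-YARN mapred.* naming used elsewhere in this thread; the heap size and reducer count are placeholder values to vary between runs, not recommendations.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TuningDriver {
    public static Job configureJob(Configuration conf) throws Exception {
        // Give each task JVM more heap, to avoid "GC overhead limit exceeded".
        conf.set("mapred.child.java.opts", "-Xmx1024m");

        Job job = new Job(conf, "tuning-experiment");
        // Vary this between runs and compare total execution time on the cluster.
        job.setNumReduceTasks(8);
        return job;
    }
}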
--- Begin Message ---
Am I being completely silly asking about this? Does anyone know?
On Wed, Nov 2, 2011 at 6:27 PM, Meng Mao wrote:
> Is there any mechanism in place to remove failed task attempt directories
> from the TaskTracker's jobcache?
>
> It seems like for
Hi,
Suppose I have chained M/R jobs that traverse a graph and look for nodes
with a specific value. Every time a Map encounters that value, I'd like to
keep that node in the final result.
I can of course save it with a special key and use a condition in the Reducer, but is there a more formal or elegant way to do this?
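A minimal sketch of the special-key approach described in the question, assuming each input record is a tab-separated "nodeId<TAB>nodeValue" line; the class names, the TARGET_VALUE constant, and the "FOUND:" marker prefix are illustrative assumptions, not from the original post.

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class NodeFilter {
    static final String TARGET_VALUE = "42";   // value we are searching for
    static final String MARKER = "FOUND:";     // special-key prefix

    public static class TraverseMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assume each input line is "nodeId<TAB>nodeValue".
            String[] parts = value.toString().split("\t");
            if (parts.length == 2 && TARGET_VALUE.equals(parts[1])) {
                // Tag the node so it survives the rest of the chain.
                context.write(new Text(MARKER + parts[0]), new Text(parts[1]));
            } else {
                context.write(new Text(parts[0]),
                        new Text(parts.length > 1 ? parts[1] : ""));
            }
        }
    }

    public static class KeepFoundReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Only nodes carrying the marker key are written to the final result.
            if (key.toString().startsWith(MARKER)) {
                for (Text v : values) {
                    context.write(key, v);
                }
            }
        }
    }
}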
Hi,
What is the difference between setting the mapred.job.map.memory.mb and
mapred.child.java.opts using -Xmx to control the maximum memory used by a
Mapper and Reduce task? Which one takes precedence?
Thanks,
Praveen
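A hedged sketch of how the two settings from the question above might be set together in a driver. The values are illustrative; the usual expectation is that the -Xmx heap fits inside the mapred.job.map.memory.mb limit, since -Xmx bounds only the child JVM heap while the memory.mb value is the limit the framework monitors the whole task process against.

import org.apache.hadoop.conf.Configuration;

public class MemorySettingsExample {
    public static Configuration buildConf() {
        Configuration conf = new Configuration();
        // Upper bound (in MB) that the framework enforces on a map task's memory.
        conf.setInt("mapred.job.map.memory.mb", 2048);
        // JVM heap for the child task process; kept below the limit above to
        // leave room for non-heap memory (stacks, native buffers, etc.).
        conf.set("mapred.child.java.opts", "-Xmx1536m");
        return conf;
    }
}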
Hello!
How many reducers are you using?
Regarding the performance parameters, first you can increase the size of the io.sort.mb parameter.
It seems that you are sending a large amount of data to the reducers. By increasing the value of this parameter, the framework will, during the shuffle phase, keep more map output in memory before spilling it to disk.
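A minimal sketch of the suggestion above, assuming a driver-side Configuration; the 200 MB buffer and the io.sort.factor value are illustrative assumptions, not recommendations from the original reply.

import org.apache.hadoop.conf.Configuration;

public class SortBufferTuning {
    public static Configuration withLargerSortBuffer() {
        Configuration conf = new Configuration();
        // Size (in MB) of the in-memory buffer used to sort map output.
        conf.setInt("io.sort.mb", 200);
        // How many spill files are merged at once; a larger value means fewer
        // merge passes when spills to disk do occur.
        conf.setInt("io.sort.factor", 50);
        return conf;
    }
}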