Why most of the free reduce slots are NOT used for my Hadoop Jobs? Thanks.

2012-03-10 Thread WangRamon
Hi All, I'm using Hadoop-0.20-append. The cluster contains 3 nodes, and each node has 14 map and 14 reduce slots. Here is the configuration:

    mapred.tasktracker.map.tasks.maximum = 14
    mapred.tasktracker.reduce.tasks.maximum = 14
    map…

Re: Why most of the free reduce slots are NOT used for my Hadoop Jobs? Thanks.

2012-03-10 Thread Joey Echeverria
What does the jobtracker web page say is the total reduce capacity? -Joey

RE: Why most of the free reduce slots are NOT used for my Hadoop Jobs? Thanks.

2012-03-10 Thread WangRamon
Joey, here is the information:

    Cluster Summary (Heap Size is 481.88 MB/1.74 GB)
    Maps                  0
    Reduces               6
    Total Submissions     11
    Nodes                 3
    Map Task Capacity     42
    Reduce Task Capacity  42
    Avg. Tasks/Node       …
    Blacklisted Nodes     …
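The digest cuts off before any resolution, but the summary itself points at one common cause, offered here as an assumption rather than as the thread's actual answer: only 6 reduces are running against a reduce capacity of 42, which is exactly what you see when a job requests fewer reduce tasks than the cluster can run (the default, mapred.reduce.tasks, is 1). In the 0.20 new API the count is set per job; a minimal sketch, with an illustrative class and job name:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ReducerCount {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "example");
        // Request one reduce task per free slot; the default of 1
        // leaves the remaining reduce slots idle by design.
        job.setNumReduceTasks(42);
        // ... mapper/reducer classes and input/output paths elided ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }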

Mapper Record Spillage

2012-03-10 Thread Hans Uhlig
I am attempting to speed up a mapping process whose input is GZIP-compressed CSV files. The files range from 1-2GB, and I am running on a cluster where each node has a total of 32GB of memory available to use. I have attempted to tweak mapred.map.child.jvm.opts with -Xmx4096mb and io.sort.mb to 2048 to ac…
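One aside worth knowing when pushing io.sort.mb this high (a known 0.20 limitation, stated from general knowledge rather than from this thread): the map-side sort buffer is a single Java byte array, so the value must stay below 2048:

    io.sort.mb = 2048  ->  2048 * 2^20 bytes = 2^31 bytes
    Integer.MAX_VALUE  =  2^31 - 1

A 2048 MB buffer cannot be indexed by an int, and the map task rejects the setting at startup; 2047 is the practical ceiling.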

RE: Mapper Record Spillage

2012-03-10 Thread WangRamon
How many map/reduce task slots do you have for each node? If the total number is 10, then you will use 10 * 4096mb of memory when all tasks are running, which is bigger than the total memory of 32G you have for each node.
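Spelled out with the numbers in that message:

    10 task slots x 4096 MB = 40960 MB, about 40 GB of potential task heap
    40 GB requested  >  32 GB of physical RAM per node

and that is before the OS, the DataNode, and the TaskTracker daemons get any memory at all.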

Re: Mapper Record Spillage

2012-03-10 Thread Hans Uhlig
I am attempting to specify this for a single job during its creation/submission, not via the general construct. I am using the new API, so I am adding the values to the conf passed into new Job();
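A minimal sketch of that approach, assuming the 0.20 new API (the class name, job name, and I/O setup are illustrative; the property name is the one Harsh confirms below). One gotcha worth flagging: new Job(conf) takes a copy of the Configuration, so the overrides must be set before the Job is constructed.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class PerJobMapHeap {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Set overrides BEFORE new Job(conf): the Job copies the conf,
        // so later set() calls on this object are not seen by the job.
        conf.set("mapred.map.child.java.opts", "-Xmx4096m"); // note "m", not "mb"
        conf.setInt("io.sort.mb", 1024); // must be below 2048 in 0.20
        Job job = new Job(conf, "gzip-csv-map");
        // ... mapper class and input/output paths elided ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }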

Re: Mapper Record Spillage

2012-03-10 Thread Harsh J
Hans, it's possible you have a typo: mapred.map.child.jvm.opts is not a real property. Perhaps you wanted "mapred.map.child.java.opts"? Additionally, the computation you need to do is that (# of map slots on a TT * per-map-task heap requirement) should at least be < (Total RAM - 2/…
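As a worked example of that inequality (the slot count here is hypothetical, since Hans never states his): with 8 map slots per TaskTracker at -Xmx4096m,

    8 map slots x 4096 MB = 32768 MB = 32 GB

which already consumes the node's entire RAM before the OS, the DataNode, and the TaskTracker daemons get anything, so either the per-task heap or the slot count has to come down.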

Re: Mapper Record Spillage

2012-03-10 Thread Hans Uhlig
That was a typo in my email, not in the configuration. Is the memory reserved for the tasks when the task tracker starts? You seem to be suggesting that I need to set the memory to be the same for all map tasks. Is there no way to override it for a single map task?

Re: Mapper Record Spillage

2012-03-10 Thread Harsh J
Hans, you can change the memory requirements for tasks of a single job, but not for a single task inside that job. This is briefly how the 0.20 framework (by default) works: the TT has a notion only of "slots", and carries a maximum _number_ of simultaneous slots it may run. It does not know what each t…