[…] more than one dfs block, you lose the data locality
scheduling benefits." (https://issues.apache.org/jira/browse/HADOOP-2560)
On Tue, Jul 26, 2011 at 12:53 AM, Anfernee Xu wrote:
I have a generic question about how the number of mapper tasks is
calculated, as far as I know, the number is primarily based on the number of
splits, say if I have 5 splits and I have 10 tasktracker running in the
cluster, I will have 5 mapper tasks running in my MR job, right?
But what I found […]
Great! mapred.map.tasks and mapred.task.partition work perfectly for me, even
for the local job runner.
Thanks
On Dec 3, 2010, at 5:59 PM, Harsh J wrote:
> Hi,
>
> (Answers may be 0.20 specific)
>
> On Sat, Dec 4, 2010 at 6:41 AM, Jason wrote:
>> In my mapper code I need to know the total number of mappers which is
>> the same as the number of input splits.
> BTW, why not take task attempt id context.getTaskAttemptID() as the
> prefix of unique id ? The task attempt id for each task should be
> different
The reason is that I would prefer not to have big gaps in my int id sequence,
so I'd rather store the mapper task ID in the low bits (suffix instead of
prefix).
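As a minimal sketch of that low-bits scheme (pure Java; the totalMaps and
partition values are hypothetical stand-ins for what
conf.getInt("mapred.map.tasks", 1) and conf.getInt("mapred.task.partition", 0)
would return inside a real job):

```java
public class UniqueIdSketch {
    // totalMaps stands in for mapred.map.tasks, partition for
    // mapred.task.partition; in a real mapper both would come from the
    // job Configuration.
    static long uniqueId(long recordIndex, int totalMaps, int partition) {
        // Keep the task id in the low bits (suffix): the k-th record of
        // each mapper maps to k * totalMaps + partition, so the merged
        // id sequence stays dense instead of leaving big gaps.
        return recordIndex * totalMaps + partition;
    }

    public static void main(String[] args) {
        int totalMaps = 5;
        for (int partition = 0; partition < totalMaps; partition++) {
            for (long rec = 0; rec < 3; rec++) {
                System.out.println("map " + partition + " record " + rec
                        + " -> id " + uniqueId(rec, totalMaps, partition));
            }
        }
    }
}
```

The sequence stays gap-free as long as the mappers emit roughly equal record
counts; a mapper that finishes early simply leaves its slots in the tail unused.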
Hi,
On Sat, Dec 4, 2010 at 7:36 AM, Shrijeet Paliwal
wrote:
> A note on mapred.map.tasks.
> Hadoop does not honor mapred.map.tasks all the time. It is just a hint
> for the framework; the actual number of map tasks launched may be
> different. *I think*.
>
This is true pre-job-submission. The InputFormat's computed splits determine
the actual number of map tasks once the job is submitted.
>>mapred.map.tasks is set for every job before launch and is the total
>>number of maps that are going to run for a successful result.
A note on mapred.map.tasks.
Hadoop does not honor mapred.map.tasks all the time. It is just a hint
for the framework; the actual number of map tasks launched may be
different. *I think*.
Minor correction, it is:
mapred.tip.id is the task's id (contains various info about the task,
map/reduce).
mapred.task.id is the task's _attempt_ id (basically tip id, with
attempt information, map/reduce).
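A small illustration of the relationship between the two strings (pure Java;
tipIdOf is a hypothetical helper, not a Hadoop API, and the sample id is made
up in the 0.20 format):

```java
public class TaskIdSketch {
    // Hypothetical helper: derive the tip id (mapred.tip.id) from an
    // attempt id (mapred.task.id) by dropping the trailing attempt
    // counter and swapping the "attempt" prefix for "task".
    static String tipIdOf(String attemptId) {
        int cut = attemptId.lastIndexOf('_');
        return "task" + attemptId.substring("attempt".length(), cut);
    }

    public static void main(String[] args) {
        String attempt = "attempt_201012040001_0002_m_000003_0";
        // prints task_201012040001_0002_m_000003
        System.out.println(tipIdOf(attempt));
    }
}
```

The "_m_" marker carries the map/reduce information mentioned above; reduce
attempts use "_r_" in the same position.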
On Sat, Dec 4, 2010 at 7:29 AM, Harsh J wrote:
> Hi,
>
> (Answers may be 0.20 specific)
>
Hi,
(Answers may be 0.20 specific)
On Sat, Dec 4, 2010 at 6:41 AM, Jason wrote:
> In my mapper code I need to know the total number of mappers which is the
> same as number of input splits.
> (I need it for unique int Id generation)
mapred.map.tasks is set for every job before launch and is the total
number of maps that are going to run for a successful result.
You can use the following code to get the number of mapper tasks (note that
getInputFormatClass and getSplits throw checked exceptions you must handle):
InputFormat inputFormat = ReflectionUtils.newInstance(
        context.getInputFormatClass(),
        context.getConfiguration());
int numMapTasks = inputFormat.getSplits(context).size();
In my mapper code I need to know the total number of mappers which is the same
as number of input splits.
(I need it for unique int Id generation)
Basically I'm looking for an analog of context.getNumReduceTasks() but can't
find it.
Thanks
Hi
I am using Pig jobs on Hadoop, but it always runs only 4 mappers
simultaneously.
How can I increase the number of mappers that run simultaneously?
Which config setting do I have to change?
Thanks,
Rahul
> […] the 8 core machines are more
> powerful (faster, more mem, more disk).
>
> Currently, I am using only the 3 machines (each with 8 cores), and the
> max number of mapper tasks is 8.
> I may use one of the 2 core machines as the master, but it turns out I
> need a powerful master.
>
> Is there any way to specify that some machines run, say, 8 mapper
> tasks, while some machines run only 2 tasks?
hadoop-daemon.sh also needs to be modified - it would wipe your custom
config files:
if [ "$HADOOP_MASTER" != "" ]; then
  echo rsync from $HADOOP_MASTER
  rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' \
    --exclude='contrib/hod/logs/*' $HADOOP_MASTER/ "$HADOOP_HOME"
fi
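If the goal is to keep per-machine config files while still rsync-ing
everything else from $HADOOP_MASTER, one untested option (the extra exclude
pattern is my assumption, not something stated in the thread) is to add the
conf directory to the exclude list in that same hadoop-daemon.sh block:

```
if [ "$HADOOP_MASTER" != "" ]; then
  echo rsync from $HADOOP_MASTER
  # Added --exclude='conf/*' so each machine's customized
  # mapred-site.xml survives the sync from the master copy.
  rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' \
    --exclude='conf/*' \
    --exclude='contrib/hod/logs/*' $HADOOP_MASTER/ "$HADOOP_HOME"
fi
```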
On Jul 14, 2010, at 11:50 AM, Shaojun Zhao wrote:
> Is there any way to specify that some machines run, say, 8 mapper
> tasks, while some machines run only 2 tasks?
A custom mapred-site.xml per machine.
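As a sketch of that suggestion (the property name is the 0.20-era one; the
values are illustrative), the mapred-site.xml on each 8-core machine could cap
map slots higher than on the 2-core machines:

```xml
<!-- mapred-site.xml on an 8-core tasktracker; use a value of 2
     in the same file on the 2-core machines -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
</configuration>
```

The tasktracker reads this at startup, so each tasktracker has to be restarted
after the edit.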
Hi,
I am running mapreduce on 5 machines, where I have 8 cores for 3 of
them, but 2 cores for 2 of them, and the 8 core machines are more
powerful (faster, more mem, more disk).
Currently, I am using only the 3 machines (each with 8 cores), and the
max number of mapper tasks is 8.
I may use one of the 2 core machines as the master, but it turns out I
need a powerful master.