Re: How the number of mapper tasks is calculated

2011-07-25 Thread Chiku Singh
re than one dfs block, you lose the data locality scheduling benefits." (https://issues.apache.org/jira/browse/HADOOP-2560) On Tue, Jul 26, 2011 at 12:53 AM, Anfernee Xu wrote: > I have a generic question about how the number of mapper tasks is > calculated, as far as I know, the number

How the number of mapper tasks is calculated

2011-07-25 Thread Anfernee Xu
I have a generic question about how the number of mapper tasks is calculated. As far as I know, the number is based primarily on the number of splits: if I have 5 splits and 10 tasktrackers running in the cluster, I will have 5 mapper tasks running in my MR job, right? But what I found
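For readers following the split arithmetic, here is a minimal sketch of how a single splittable file might be cut into splits. It is modelled on my understanding of the 0.20-era org.apache.hadoop.mapred.FileInputFormat (split size = max(minSize, min(goalSize, blockSize)), with a 10% slop on the last split); the class and method names are hypothetical, the code does not call Hadoop, and the source for your version should be checked before relying on it.

```java
/**
 * Hypothetical, Hadoop-free sketch of split counting for ONE splittable,
 * non-empty file. Assumption: it mirrors the old-API FileInputFormat logic.
 */
final class SplitCountSketch {
    // Assumed slop factor: a final chunk up to 10% over the split size is not split again.
    private static final double SPLIT_SLOP = 1.1;

    /** Split size: never below minSize, otherwise the smaller of goal and block size. */
    static long splitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /**
     * @param fileLength file size in bytes (must be positive)
     * @param blockSize  DFS block size in bytes
     * @param minSize    minimum split size in bytes (the mapred.min.split.size role)
     * @param hintedMaps the mapred.map.tasks hint
     */
    static int countSplits(long fileLength, long blockSize, long minSize, int hintedMaps) {
        if (fileLength <= 0 || blockSize <= 0 || minSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // The hint only influences the goal size; it does not fix the split count.
        long goalSize = fileLength / (hintedMaps <= 0 ? 1 : hintedMaps);
        long size = splitSize(goalSize, minSize, blockSize);

        int splits = 0;
        long remaining = fileLength;
        while (((double) remaining) / size > SPLIT_SLOP) {
            splits++;
            remaining -= size;
        }
        if (remaining != 0) {
            splits++; // the tail becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(countSplits(1024 * mb, 64 * mb, 1, 2));
        System.out.println(countSplits(1024 * mb, 64 * mb, 1, 100));
    }
}
```

Under these assumptions a 1 GB file with 64 MB blocks yields 16 splits when the hint is 2 (the block size caps the split size) and 100 splits when the hint is 100 (the goal size is smaller than a block). The number of tasktrackers does not enter the calculation at all; it only affects how many of those map tasks can run at the same time.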

Re: Is it possible to get the number of mapper tasks?

2010-12-03 Thread Jason
Great! mapred.map.tasks and mapred.task.partition work perfectly for me, even for the local job runner. Thanks On Dec 3, 2010, at 5:59 PM, Harsh J wrote: > Hi, > > (Answers may be 0.20 specific) > > On Sat, Dec 4, 2010 at 6:41 AM, Jason wrote: >> In my mapper code I need to know the total
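One way the two properties named above could be combined into gap-free unique int ids is to interleave a per-task counter with the task's partition number. This is only a sketch of one possible scheme, not necessarily what Jason implemented; the class name is hypothetical, and the constructor arguments stand in for values that a mapper would read from the job configuration under mapred.map.tasks and mapred.task.partition.

```java
/**
 * Hypothetical sketch: each map task emits ids partition, partition + N,
 * partition + 2N, ... where N is the total number of map tasks.
 * Ids from different tasks never collide, and the combined sequence has
 * no large gaps as long as tasks emit similar numbers of ids.
 */
final class InterleavedIds {
    private final int numMapTasks; // assumed to come from mapred.map.tasks
    private final int partition;   // assumed to come from mapred.task.partition
    private long localCounter = 0; // ids handed out so far by this task

    InterleavedIds(int numMapTasks, int partition) {
        if (numMapTasks <= 0 || partition < 0 || partition >= numMapTasks) {
            throw new IllegalArgumentException("need 0 <= partition < numMapTasks");
        }
        this.numMapTasks = numMapTasks;
        this.partition = partition;
    }

    /** Returns the next id for this task, or throws once the int range is used up. */
    int next() {
        long id = localCounter * numMapTasks + partition;
        if (id > Integer.MAX_VALUE) {
            throw new ArithmeticException("int id space exhausted");
        }
        localCounter++;
        return (int) id;
    }

    public static void main(String[] args) {
        InterleavedIds task0 = new InterleavedIds(3, 0);
        InterleavedIds task2 = new InterleavedIds(3, 2);
        System.out.println(task0.next() + " " + task0.next());
        System.out.println(task2.next() + " " + task2.next());
    }
}
```

The scheme depends on every task seeing the same total, so it relies on mapred.map.tasks holding the final map count inside a running task, as described elsewhere in this thread for 0.20.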

Re: Is it possible to get the number of mapper tasks?

2010-12-03 Thread Jason
> BTW, why not take task attempt id context.getTaskAttemptID() as the > prefix of unique id ? The task attempt id for each task should be > different The reason is that I would prefer not to have big gaps in my int id sequence, so I'd rather store mapper task ID in the low bits (suffix instead of

Re: Is it possible to get the number of mapper tasks?

2010-12-03 Thread Harsh J
Hi, On Sat, Dec 4, 2010 at 7:36 AM, Shrijeet Paliwal wrote: > A note on mapred.map.tasks. > Hadoop does not honor mapred.map.tasks all the time. It is just a hint > for the framework, actual number of map tasks launched may be > different. *I think*. > This is true pre-job-submission. The InputF

Re: Is it possible to get the number of mapper tasks?

2010-12-03 Thread Shrijeet Paliwal
>>mapred.map.tasks is set for every job before launch and is the total >>number of maps that are going to run for a successful result. A note on mapred.map.tasks. Hadoop does not honor mapred.map.tasks all the time. It is just a hint for the framework, actual number of map tasks launched may be di

Re: Is it possible to get the number of mapper tasks?

2010-12-03 Thread Harsh J
Minor correction, it is: mapred.tip.id is the task's id (contains various info about the task, map/reduce). mapred.task.id is the task's _attempt_ id (basically tip id, with attempt information, map/reduce). On Sat, Dec 4, 2010 at 7:29 AM, Harsh J wrote: > Hi, > > (Answers may be 0.20 specific) >
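To make the difference between the two ids concrete, here is a small sketch that takes an attempt id string apart. It assumes the usual textual layout attempt_<jobtracker id>_<job number>_<m|r>_<task number>_<attempt number>, and that the jobtracker identifier contains no underscore; the class is hypothetical and Hadoop-free. In real job code Hadoop's own TaskAttemptID.forName(String) is the better choice.

```java
/** Hypothetical parser for attempt id strings such as attempt_200707121733_0003_m_000005_0. */
final class AttemptIdSketch {

    private static String[] parts(String attemptId) {
        String[] p = attemptId.split("_");
        if (p.length != 6 || !p[0].equals("attempt")) {
            throw new IllegalArgumentException("not an attempt id: " + attemptId);
        }
        return p;
    }

    /** The task (tip) id: the attempt id without the trailing attempt number. */
    static String tipId(String attemptId) {
        String[] p = parts(attemptId);
        return "task_" + p[1] + "_" + p[2] + "_" + p[3] + "_" + p[4];
    }

    /** True for a map task, false for a reduce task. */
    static boolean isMap(String attemptId) {
        return parts(attemptId)[3].equals("m");
    }

    /** Zero-based index of the task within its job. */
    static int taskIndex(String attemptId) {
        return Integer.parseInt(parts(attemptId)[4]);
    }

    /** Attempt number; it grows when a task is retried or speculatively re-run. */
    static int attemptNumber(String attemptId) {
        return Integer.parseInt(parts(attemptId)[5]);
    }

    public static void main(String[] args) {
        String id = "attempt_200707121733_0003_m_000005_0";
        System.out.println(tipId(id) + " index=" + taskIndex(id)
                + " attempt=" + attemptNumber(id) + " map=" + isMap(id));
    }
}
```

The task index is stable across retries while the attempt number is not, which is why an id scheme that must be reproducible would normally key on the task id rather than the attempt id.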

Re: Is it possible to get the number of mapper tasks?

2010-12-03 Thread Harsh J
Hi, (Answers may be 0.20 specific) On Sat, Dec 4, 2010 at 6:41 AM, Jason wrote: > In my mapper code I need to know the total number of mappers which is the > same as number of input splits. > (I need it for unique int Id generation) mapred.map.tasks is set for every job before launch and is th

Re: Is it possible to get the number of mapper tasks?

2010-12-03 Thread Jeff Zhang
You can use the following code to get the number of mapper tasks: InputFormat inputFormat = ReflectionUtils.newInstance( context.getInputFormatClass(), context.getConfiguration()); int

Is it possible to get the number of mapper tasks?

2010-12-03 Thread Jason
In my mapper code I need to know the total number of mappers, which is the same as the number of input splits. (I need it for unique int id generation.) Basically I'm looking for an analog of context.getNumReduceTasks() but can't find it. Thanks

Number Of Mapper

2010-09-14 Thread Rahul Malviya
Hi, I am running Pig jobs on Hadoop, but it always runs only 4 mappers simultaneously. How can I increase the number of mappers that run simultaneously? What config do I have to change? Thanks, Rahul

Re: specify different number of mapper tasks for different machines

2010-08-31 Thread Vitaliy Semochkin
> them, but 2 cores for 2 of them, and the 8 core machines are more >>>> powerful (faster, more mem, more disk). >>>> >>>> Currently, I am using only the 3 machines (each with 8 cores), and the >>>> max number of mapper tasks is 8. >>

Re: specify different number of mapper tasks for different machines

2010-08-30 Thread Vitaliy Semochkin
>>> >>> I am running mapreduce on 5 machines, where I have 8 cores for 3 of >>> them, but 2 cores for 2 of them, and the 8 core machines are more >>> powerful (faster, more mem, more disk). >>> >>> Currently, I am using only the 3 machines (each wit

Re: specify different number of mapper tasks for different machines

2010-08-30 Thread Shaojun Zhao
Zhao wrote: >> Hi, >> >> I am running mapreduce on 5 machines, where I have 8 cores for 3 of >> them, but 2 cores for 2 of them, and the 8 core machines are more >> powerful (faster, more mem, more disk). >> >> Currently, I am using only the 3 machines (ea

Re: specify different number of mapper tasks for different machines

2010-08-30 Thread Vitaliy Semochkin
nes are more > powerful (faster, more mem, more disk). > > Currently, I am using only the 3 machines (each with 8 cores), and the > max number of mapper tasks is 8. > I may use one of the 2 core machine as the master, but it turns out I > need a powerful master. > > Is there a

Re: specify different number of mapper tasks for different machines

2010-07-14 Thread Ted Yu
hadoop-daemon.sh also needs to be modified - it would wipe your custom config files:

    if [ "$HADOOP_MASTER" != "" ]; then
      echo rsync from $HADOOP_MASTER
      rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $HADOOP_MASTER/ "$HADOOP_HOME"
    fi

On

Re: specify different number of mapper tasks for different machines

2010-07-14 Thread Allen Wittenauer
On Jul 14, 2010, at 11:50 AM, Shaojun Zhao wrote: > Is there any way to specify that some machines run, say, 8 mapper > tasks, while some machines run only 2 tasks? A custom mapred-site.xml per machine.

specify different number of mapper tasks for different machines

2010-07-14 Thread Shaojun Zhao
Hi, I am running mapreduce on 5 machines, where I have 8 cores for 3 of them, but 2 cores for 2 of them, and the 8 core machines are more powerful (faster, more mem, more disk). Currently, I am using only the 3 machines (each with 8 cores), and the max number of mapper tasks is 8. I may use one