Re: number of mapper tasks

2013-01-29 Thread Marcelo Elias Del Valle
Hello, I have been able to make this work. I don't know why, but when the input file is zipped (read as an input stream), it creates only 1 mapper. However, when it's not zipped, it creates more mappers (running 3 instances, it created 4 mappers, and running 5 instances, it created 8 mappers).

Re: number of mapper tasks

2013-01-29 Thread Vinod Kumar Vavilapalli
I tried looking at your code, but it's a bit involved. Instead of trying to run the job, try unit-testing your input format. Test getSplits(): whatever number of splits that method returns will be the number of mappers that run. You can also use LocalJobRunner for this - set

Re: number of mapper tasks

2013-01-28 Thread Harsh J
I'm unfamiliar with EMR myself (perhaps the question fits EMR's own boards) but here's my take anyway: On Mon, Jan 28, 2013 at 9:24 PM, Marcelo Elias Del Valle mvall...@gmail.com wrote: Hello, I am using hadoop with TextInputFormat, a mapper and no reducers. I am running my jobs at Amazon

Re: number of mapper tasks

2013-01-28 Thread Marcelo Elias Del Valle
Sorry for asking too many questions, but the answers are really helping. 2013/1/28 Harsh J ha...@cloudera.com This seems CPU-oriented. You probably want the NLineInputFormat? See http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html . This

Re: number of mapper tasks

2013-01-28 Thread Marcelo Elias Del Valle
Just to follow up on my last question, I have implemented the getSplits method in my input format: https://github.com/mvallebr/CSVInputFormat/blob/master/src/main/java/org/apache/hadoop/mapreduce/lib/input/CSVNLineInputFormat.java However, it still doesn't create more than 2 map tasks. Is there

Re: number of mapper tasks

2013-01-28 Thread Vinod Kumar Vavilapalli
Regarding your original question, you can use the min and max split settings to control the number of maps: http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html . See #setMinInputSplitSize and #setMaxInputSplitSize. Or use mapred.min.split.size
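
As a rough illustration of how the min and max split settings translate into a map count: to my knowledge, the new-API FileInputFormat picks a split size of max(minSize, min(maxSize, blockSize)) and cuts each splittable file into chunks of that size, while a file compressed with a non-splittable codec such as gzip becomes a single split (which would match the single mapper reported for the zipped input elsewhere in this thread). The sketch below reproduces only that arithmetic in plain Java, without Hadoop on the classpath. The class and method names are made up for illustration, and it ignores the small slack Hadoop allows on the last split, so treat the counts as estimates.

```java
// Sketch only: plain-Java arithmetic, no Hadoop classes involved.
// SplitSizeSketch and its method names are hypothetical.
final class SplitSizeSketch {

    // Clamp the block size between the configured minimum and maximum split size.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Estimate how many splits (and therefore map tasks) one non-empty file yields.
    static long estimateSplits(long fileLength, long splitSize, boolean splittable) {
        if (fileLength <= 0 || splitSize <= 0) {
            throw new IllegalArgumentException("sketch covers positive sizes only");
        }
        if (!splittable) {
            return 1; // e.g. a gzip-compressed file is read by a single mapper
        }
        // Ceiling division, written so it cannot overflow for large sizes.
        return fileLength / splitSize + (fileLength % splitSize == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        long fileLength = 1024 * MB; // assumed 1 GiB input file
        long blockSize = 64 * MB;    // assumed 64 MiB block size

        long defaults = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        long capped = computeSplitSize(blockSize, 1L, 16 * MB);

        System.out.println("default settings: " + estimateSplits(fileLength, defaults, true));
        System.out.println("max split 16 MiB: " + estimateSplits(fileLength, capped, true));
        System.out.println("gzip input:       " + estimateSplits(fileLength, defaults, false));
    }
}
```

Under these assumed sizes, lowering the maximum split size raises the number of splits, and raising the minimum split size above the block size lowers it.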

Re: Is it pissible get a number of mapper tasks?

2010-12-03 Thread Harsh J
Hi, (Answers may be 0.20 specific) On Sat, Dec 4, 2010 at 6:41 AM, Jason urg...@gmail.com wrote: In my mapper code I need to know the total number of mappers, which is the same as the number of input splits. (I need it for unique int ID generation.) mapred.map.tasks is set for every job before

Re: Is it pissible get a number of mapper tasks?

2010-12-03 Thread Harsh J
Minor correction, it is: mapred.tip.id is the task's id (contains various info about the task, map/reduce). mapred.task.id is the task's _attempt_ id (basically tip id, with attempt information, map/reduce). On Sat, Dec 4, 2010 at 7:29 AM, Harsh J qwertyman...@gmail.com wrote: Hi, (Answers may

Re: Is it pissible get a number of mapper tasks?

2010-12-03 Thread Jason
BTW, why not take the task attempt id, context.getTaskAttemptID(), as the prefix of the unique id? The task attempt id for each task should be different. The reason is that I would prefer not to have big gaps in my int id sequence, so I'd rather store the mapper task ID in the low bits (suffix instead of
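
One possible reading of the scheme described above, offered as an assumption rather than as code from the thread: reserve just enough low bits to hold the largest mapper index, put the mapper's index there, and put a per-mapper counter in the remaining high bits. That would explain why the total number of mappers has to be known in advance. The sketch below is plain Java with hypothetical class and method names. In a real job the task index would presumably come from the task ID (for example context.getTaskAttemptID().getTaskID().getId()) and the mapper count from the job configuration; neither is wired in here.

```java
// Sketch only: hypothetical names, no Hadoop classes involved.
final class MapperIdSketch {

    // Number of low bits needed to store task indexes 0 .. numMappers - 1.
    static int bitsForTasks(int numMappers) {
        if (numMappers < 1) {
            throw new IllegalArgumentException("numMappers must be at least 1");
        }
        return 32 - Integer.numberOfLeadingZeros(numMappers - 1);
    }

    // Pack a per-mapper counter (high bits) and the mapper index (low bits) into one int.
    static int makeId(int localCounter, int taskIndex, int numMappers) {
        if (localCounter < 0 || taskIndex < 0 || taskIndex >= numMappers) {
            throw new IllegalArgumentException("counter or task index out of range");
        }
        int bits = bitsForTasks(numMappers);
        long id = ((long) localCounter << bits) | taskIndex;
        return Math.toIntExact(id); // throws ArithmeticException if the id no longer fits in an int
    }

    public static void main(String[] args) {
        int numMappers = 4; // assumed total number of map tasks
        for (int task = 0; task < numMappers; task++) {
            for (int counter = 0; counter < 3; counter++) {
                System.out.println("task " + task + ", record " + counter
                        + " -> id " + makeId(counter, task, numMappers));
            }
        }
    }
}
```

With the index in the low bits, gaps appear only when the mapper count is not a power of two or when mappers emit different numbers of records, which is smaller than the gaps left by prefixing each id with a full task attempt id.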

Re: specify different number of mapper tasks for different machines

2010-08-31 Thread Vitaliy Semochkin
the 3 machines (each with 8 cores), and the max number of mapper tasks is 8. I may use one of the 2-core machines as the master, but it turns out I need a powerful master. Is there any way to specify that some machines run, say, 8 mapper tasks, while some machines run only 2 tasks? What I can

Re: specify different number of mapper tasks for different machines

2010-08-30 Thread Shaojun Zhao
of mapper tasks is 8. I may use one of the 2-core machines as the master, but it turns out I need a powerful master. Is there any way to specify that some machines run, say, 8 mapper tasks, while some machines run only 2 tasks? What I can imagine is to extend the slave file, and have machine1:8

Re: specify different number of mapper tasks for different machines

2010-08-30 Thread Vitaliy Semochkin
machines, where I have 8 cores for 3 of them, but 2 cores for 2 of them, and the 8-core machines are more powerful (faster, more mem, more disk). Currently, I am using only the 3 machines (each with 8 cores), and the max number of mapper tasks is 8. I may use one of the 2-core machines