Hello,
I have been able to make this work. I don't know why, but when the
input file is zipped (read as an input stream) it creates only 1 mapper.
However, when it's not zipped, it creates more mappers (running 3 instances
it created 4 mappers, and running 5 instances it created 8 mappers).
I tried looking at your code; it's a bit involved. Instead of trying to run
the job, try unit-testing your input format. Test getSplits(): whatever
number of splits that method returns is the number of mappers
that will run.
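As a plain-Java illustration of why the split count drives the mapper count (the file sizes and split size below are made-up numbers; gzip's non-splittability is the usual reason a zipped file yields a single split and therefore a single mapper):

```java
public class SplitCountSketch {
    // FileInputFormat-style split counting: an uncompressed file is cut
    // into roughly ceil(fileSize / splitSize) pieces, but a gzipped file
    // is not splittable, so it always becomes exactly one split.
    static long countSplits(long fileSize, long splitSize, boolean splittable) {
        if (!splittable) {
            return 1; // e.g. a .gz input stream: one split, one mapper
        }
        return (fileSize + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        long splitSize = 64L * 1024 * 1024; // hypothetical 64 MB split size
        // A 200 MB plain-text file -> 4 splits -> 4 mappers.
        System.out.println(countSplits(200L * 1024 * 1024, splitSize, true));
        // The same data gzipped -> 1 split -> 1 mapper.
        System.out.println(countSplits(200L * 1024 * 1024, splitSize, false));
    }
}
```

This is why a unit test against getSplits() is enough: assert on the size of the returned list and you know the mapper count without running the job.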
You can also use LocalJobRunner for this - set
I'm unfamiliar with EMR myself (perhaps the question fits EMR's own
boards) but here's my take anyway:
On Mon, Jan 28, 2013 at 9:24 PM, Marcelo Elias Del Valle
mvall...@gmail.com wrote:
Hello,
I am using Hadoop with TextInputFormat, a mapper, and no reducers. I am
running my jobs at Amazon
Sorry for asking so many questions, but the answers are really helping.
2013/1/28 Harsh J ha...@cloudera.com
This seems CPU-oriented. You probably want the NLineInputFormat? See
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html
.
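A back-of-the-envelope sketch in plain Java of what NLineInputFormat does to the mapper count (one split per N input lines, so the mapper count scales with line count rather than byte count; the numbers are illustrative):

```java
public class NLineSketch {
    // NLineInputFormat makes one split per N input lines, so the number
    // of mappers is ceil(totalLines / linesPerMap) rather than a function
    // of the file size in bytes.
    static long mappersFor(long totalLines, long linesPerMap) {
        return (totalLines + linesPerMap - 1) / linesPerMap; // ceiling division
    }

    public static void main(String[] args) {
        // 1000 input lines with 100 lines per map -> 10 mappers.
        System.out.println(mappersFor(1000, 100));
        // 1001 lines -> 11 mappers: the final, partial split still runs.
        System.out.println(mappersFor(1001, 100));
    }
}
```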
Just to complement the last question, I have implemented the getSplits
method in my input format:
https://github.com/mvallebr/CSVInputFormat/blob/master/src/main/java/org/apache/hadoop/mapreduce/lib/input/CSVNLineInputFormat.java
However, it still doesn't create more than 2 map tasks. Is there
Regarding your original question, you can use the min and max split settings to
control the number of maps:
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html
. See #setMinInputSplitSize and #setMaxInputSplitSize. Or use
mapred.min.split.size
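For reference, the clamping FileInputFormat applies when sizing splits is, to the best of my reading of the stable API, splitSize = max(minSize, min(maxSize, blockSize)). A plain-Java sketch of that formula (block and split sizes are hypothetical):

```java
public class SplitSizeSketch {
    // FileInputFormat computes the effective split size by clamping the
    // block size between the configured min and max split sizes:
    //   splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // hypothetical 64 MB HDFS block
        // Raising the min split size above the block size -> fewer, larger
        // splits, hence fewer mappers.
        System.out.println(computeSplitSize(blockSize, 128L * 1024 * 1024, Long.MAX_VALUE));
        // Lowering the max split size below the block size -> more, smaller
        // splits, hence more mappers.
        System.out.println(computeSplitSize(blockSize, 1, 32L * 1024 * 1024));
    }
}
```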
Hi,
(Answers may be 0.20 specific)
On Sat, Dec 4, 2010 at 6:41 AM, Jason urg...@gmail.com wrote:
In my mapper code I need to know the total number of mappers, which is the
same as the number of input splits.
(I need it for unique int ID generation)
mapred.map.tasks is set for every job before
Minor correction, it is:
mapred.tip.id is the task's id (contains various info about the task,
map/reduce).
mapred.task.id is the task's _attempt_ id (basically tip id, with
attempt information, map/reduce).
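To make the distinction concrete, here is a small plain-Java sketch using a made-up attempt id. The string shapes (task_..., attempt_...) follow the usual Hadoop naming, but treat the exact example id as hypothetical:

```java
public class TaskIdSketch {
    // A task attempt id looks like: attempt_<jobts>_<jobseq>_m_<task>_<attempt#>
    // Its task (tip) id is the same string with the "attempt_" prefix swapped
    // for "task_" and the trailing attempt number removed.
    static String tipIdOf(String attemptId) {
        String noPrefix = attemptId.substring("attempt_".length());
        String noAttempt = noPrefix.substring(0, noPrefix.lastIndexOf('_'));
        return "task_" + noAttempt;
    }

    public static void main(String[] args) {
        // Hypothetical ids for illustration.
        String attempt = "attempt_201012040641_0001_m_000003_0";
        System.out.println(tipIdOf(attempt)); // task_201012040641_0001_m_000003
    }
}
```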
On Sat, Dec 4, 2010 at 7:29 AM, Harsh J qwertyman...@gmail.com wrote:
BTW, why not take the task attempt id (context.getTaskAttemptID()) as the
prefix of the unique id? The task attempt id for each task should be
different.
The reason is that I would prefer not to have big gaps in my int id sequence,
so I'd rather store the mapper task ID in the low bits (suffix instead of
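The "task id in the low bits" scheme above can be sketched in plain Java. The bit width reserved for the task index is an assumption; pick it large enough for your maximum mapper count:

```java
public class UniqueIdSketch {
    static final int TASK_BITS = 16; // assumption: at most 2^16 mapper tasks

    // Pack a per-mapper counter into the high bits and the mapper's task
    // index into the low bits ("suffix"), so each mapper's ids stay dense
    // instead of leaving huge gaps between mappers.
    static long uniqueId(long localCounter, int taskIndex) {
        return (localCounter << TASK_BITS) | taskIndex;
    }

    public static void main(String[] args) {
        // Two mappers (task indexes 0 and 1), each emitting counters 0, 1, ...
        // never collide, and ids from the same mapper grow densely.
        System.out.println(uniqueId(0, 0)); // 0
        System.out.println(uniqueId(0, 1)); // 1
        System.out.println(uniqueId(1, 0)); // 65536
    }
}
```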
machines, where I have 8 cores for 3 of
them, but 2 cores for 2 of them, and the 8-core machines are more
powerful (faster, more memory, more disk).
Currently, I am using only the 3 machines (each with 8 cores), and the
max number of mapper tasks is 8.
I may use one of the 2-core machines as the master, but it turns out I
need a powerful master.
Is there any way to specify that some machines run, say, 8 mapper
tasks, while some machines run only 2 tasks?
What I can imagine is to extend the slaves file, and have
machine1:8
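One common approach in classic MapReduce, rather than extending the slaves file, is to set the per-TaskTracker slot limit in each node's own mapred-site.xml. A sketch, assuming Hadoop 1.x property names:

```xml
<!-- mapred-site.xml on each 8-core slave -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>

<!-- mapred-site.xml on each 2-core slave -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
```

Each TaskTracker reads its own copy of this file at startup, so heterogeneous machines can advertise different numbers of map slots.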