Everybody, thanks for all the help.
Chris/Jason, assumption 1) is actually incorrect for my situation.
Nonetheless, I can see how one could use a dynamic-typing approach
to send the additional data as the first keys for each partition. It seems
less than elegant, but doable.
The
I think this kind of partitioner is a little hackish. A more straightforward
approach is to emit the extra data N times under special keys and write a
partitioner that recognizes these keys and dispatches them accordingly
across partitions 0..N-1.
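
To make the special-key idea concrete, here is a minimal sketch of the routing logic only. It is plain Python rather than Hadoop's Partitioner API, and the names SPECIAL_PREFIX, emit_summary and get_partition are hypothetical, made up for illustration. It assumes keys are strings and that ordinary keys never start with the special prefix.

```python
import zlib

# Hypothetical marker for the replicated keys; assumed never to occur
# at the start of an ordinary key.
SPECIAL_PREFIX = "__summary__#"


def emit_summary(summary, num_partitions):
    """Map side: emit the same summary once per partition, each copy
    under a special key that names its target partition."""
    return [(SPECIAL_PREFIX + str(p), summary) for p in range(num_partitions)]


def get_partition(key, num_partitions):
    """Partitioner logic: special keys go to the partition encoded in
    the key; every other key falls back to a stable hash."""
    if key.startswith(SPECIAL_PREFIX):
        target = int(key[len(SPECIAL_PREFIX):])
        return target % num_partitions
    # crc32 is used here only because it is stable across runs.
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```

With N = 4, emit_summary produces four pairs and get_partition sends exactly one of them to each of partitions 0..3, so every reducer receives a copy. In a real job the same branching would presumably live in a custom partitioner class, with N taken from the configured number of reducers.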
Also if this data needs to be shipped to reduc
From my experience, writing data is possible using MO on both the Map and
Reduce sides of a single MR job. All data written to the MO name on the
map side is committed just as it would be if the job were a map-only
job (there's no difference, since a map task does not wait for reduce
tasks to begin - it is
If these assumptions are correct:
0) Each map outputs one result, a few hundred bytes
1) The map output is deterministic, given an input split index
2) Every reducer must see the result from every map
Then just output the result N times, where N is the number of
reducers, using a custom Partition
It was my understanding, based on the FAQ and my personal experience, that
using the MultipleOutputs class, or just relying on the OutputCommitter, only
works for the final phase of the job (e.g. the reduce phase in a
map+reduce job, and the map phase only in the case of reducer=NONE). In the
case I'm t
With just HDFS, IMO the good approach would be (2). See this FAQ on
task-specific HDFS output directories you can use:
http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F.
It'd also be much easier to use the MultipleOutputs class (o
You should be able to use
hadoop job -events to get the task completion events from the
JobTracker. Here is a link:
http://hadoop.apache.org/common/docs/current/commands_manual.html#job
thanks
mahadev
On Sun, Feb 13, 2011 at 8:45 AM, Pedro Costa wrote:
> Hi,
>
> 1 - How do I get the name of t
I'm outputting a small amount of secondary summary information from a map
task that I want to use in the reduce phase of the job. This information is
keyed on a custom input split index.
Each map task outputs this summary information (less than a hundred bytes per
input task). Note that the summar
Hi,
1 - How do I get the names of the map tasks that ran, from the command line?
2 - How do I get the start time and the end time of a map task from the
command line?
--
Pedro
Hi,
I would like to find out how long each Map and Reduce task took to run, from
the command line. How is this possible?
Thanks,
--
Pedro
Hi,
I'm running the GridMix2 examples and I would like to retrieve all the
results produced by the tests and save the files locally, to read them
later, offline. Is there a command for that?
Thanks
--
Pedro