I have an OutputFormat which implements Configurable. I set new config entries
on the job configuration during checkOutputSpecs() so that the tasks will get
those entries through the job configuration. This works fine in 0.20.2, but
stopped working starting from 0.20.203. With 0.20.203, my
I need to save some data in the job config as part of
OutputFormat.checkOutputSpecs(), and have it propagated to map tasks. It seems
that the property is saved correctly when OutputFormat.checkOutputSpecs() is
run, but it can't be found in the map tasks. Any idea why that's the case?
Thanks,
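For reference, the pattern being described could be sketched roughly as below (assuming Hadoop on the classpath; the property name "my.custom.key" is made up for illustration):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical sketch of the pattern in question: checkOutputSpecs() writes a
// property into the job configuration, and the map task reads it back.
public class ConfigPropagationSketch {

    // Inside a custom OutputFormat:
    public void checkOutputSpecs(JobContext context) throws IOException {
        Configuration conf = context.getConfiguration();
        // "my.custom.key" is an illustrative property name.
        conf.set("my.custom.key", "some-value");
    }

    // Inside a Mapper, the same property is expected to be visible:
    public static class MyMapper extends Mapper<Object, Object, Object, Object> {
        @Override
        protected void setup(Context context) {
            // Per the report above: non-null in 0.20.2, null in 0.20.203.
            String v = context.getConfiguration().get("my.custom.key");
        }
    }
}
```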
I'd like to get some idea of how the task scheduler relies on
RecordReader.getProgress() in version 0.20.2.
There are times when I don't have an accurate count of the total records to be
processed, and I wonder about the impact on task scheduling of returning an
inaccurate progress percentage.
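One common workaround when the record count is unknown (a plain-Java sketch, not Hadoop API) is to estimate progress from bytes consumed and clamp the result, since getProgress() only needs to return a value in [0, 1]:

```java
// Minimal sketch: estimate progress from bytes consumed when the total
// record count is unknown, clamped to the [0, 1] range that getProgress()
// is expected to return.
public class ProgressEstimate {
    public static float progress(long bytesRead, long totalBytes) {
        if (totalBytes <= 0) {
            return 0.0f; // nothing known yet; report no progress
        }
        float p = (float) bytesRead / (float) totalBytes;
        // Clamp so an inaccurate estimate never leaves the valid range.
        return Math.min(1.0f, Math.max(0.0f, p));
    }
}
```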
Hi,
I was trying to start up a single-node Hadoop cluster using 0.20.2.
The namenode, datanode and jobtracker all started fine. The task tracker
failed with the following error:
2011-03-16 11:39:47,479 INFO org.apache.hadoop.mapred.TaskTracker: Starting
thread: Map-events fetcher for all re
Hi,
Looking at 0.21's API:
In org.apache.hadoop.mapreduce.RecordReader, there is an initialize() method
that I can use for one-time work;
In org.apache.hadoop.mapreduce.RecordWriter, there is no initialize().
Why is that? Where am I supposed to do the one-time initialization? In the
constru
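One common answer is to do one-time setup in the writer's constructor (the framework hands out a fully constructed writer from OutputFormat.getRecordWriter()), or lazily on the first write. A plain-Java sketch of the lazy pattern, with a StringBuilder standing in for a real output stream:

```java
// Plain-Java sketch of lazy one-time initialization for a writer that has
// no initialize() hook: defer setup to the first write() call.
public class LazyInitWriter {
    private boolean initialized = false;
    private StringBuilder sink; // stands in for a real output stream

    private void initIfNeeded() {
        if (!initialized) {
            sink = new StringBuilder(); // one-time setup happens here
            initialized = true;
        }
    }

    public void write(String record) {
        initIfNeeded();
        sink.append(record).append('\n');
    }

    public String contents() {
        return sink == null ? "" : sink.toString();
    }
}
```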
particular key should go to. I am not sure if that can be
done. Just out of curiosity, why do you need this kind of control over
reduction?
Hari
On Sat, Dec 18, 2010 at 11:54 PM, Jane Chen wrote:
But how does this help me request which host the reduce task is scheduled on?
Thanks,
Jane
Jane,
The partitioner class can be used to achieve this.
(http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/Partitioner.html).
Thanks,
Hari
On Sat, Dec 18, 2010 at 11:13 PM, Jane Chen wrote:
Hi All,
Is there any way to influence where a reduce task is
Hi All,
Is there any way to influence where a reduce task is run? We have a case where
we'd like to choose the host to run the reduce task based on the task's input
key.
Any suggestion is greatly appreciated.
Thanks,
Jane
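For reference, the decision Hari's suggestion controls, sketched in plain Java rather than against Hadoop's Partitioner class: getPartition() maps a key to a reduce partition (a reducer number), not to a host, which is why it answers a slightly different question than Jane's.

```java
// Plain-Java sketch of the decision a custom Partitioner makes: map a key
// to a reducer number in [0, numPartitions). This chooses the partition,
// not the host the reduce task is scheduled on.
public class KeyPartitionSketch {
    public static int getPartition(String key, int numPartitions) {
        // Mask off the sign bit so negative hash codes still yield a
        // non-negative partition index.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```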
directly with mapreduce.*.
> If you are
> using the New API all over (driver, mapper, etc.), you
> should use the
> mapreduce.* only, right?
>
> On Tue, Dec 7, 2010 at 3:05 AM, Jane Chen
> wrote:
> > In Hadoop 0.21, I found InputFormat as an Interface in
> packa
In Hadoop 0.21, I found InputFormat as an interface in package mapred, and as
an abstract class in package mapreduce. The APIs are slightly different.
Which one should I choose to extend from or implement? How are the two
packages intended to be used differently?
Thanks,
Jane
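A minimal skeleton of the new-API choice (assuming Hadoop 0.21 on the classpath; names are illustrative): when the rest of the job uses the new API, extend the abstract class in org.apache.hadoop.mapreduce rather than implementing the org.apache.hadoop.mapred interface.

```java
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Illustrative skeleton only: with the new API, InputFormat is an abstract
// class to extend, overriding getSplits() and createRecordReader().
public class MyInputFormat extends InputFormat<Object, Object> {
    @Override
    public List<InputSplit> getSplits(JobContext context)
            throws IOException, InterruptedException {
        // Compute and return the splits for the job here.
        throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public RecordReader<Object, Object> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        // Return a RecordReader for the given split here.
        throw new UnsupportedOperationException("sketch only");
    }
}
```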