Compatibility issue with 0.20.203.

2011-11-08 Thread Jane Chen
I have an OutputFormat which implements Configurable.  I add new config entries to the job configuration during checkOutputSpecs() so that the tasks will get the config entries through the job configuration.  This works fine in 0.20.2, but stopped working as of 0.20.203.  With 0.20.203, my

Setting Config Property in OutputFormat.checkOutputSpecs().

2011-05-20 Thread Jane Chen
I need to save some data in the job config as part of OutputFormat.checkOutputSpecs(), and have it propagated to map tasks. It seems that the property is saved correctly when OutputFormat.checkOutputSpecs() is run, but it can't be found in the map tasks. Any idea why that's the case? Thanks,

RecordReader Progress Reporting.

2011-03-28 Thread Jane Chen
I'd like to get some idea of how the task scheduler relies on RecordReader.getProgress() in version 0.20.2. There are times when I don't have an accurate count of the total records to be processed, and I wonder what the impact on task scheduling is when an inaccurate progress percentage is returned.
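
When the total record count is unknown, one option is to derive progress from how much of the input has been consumed instead of from records processed. The following is a minimal sketch of that idea in plain Java with no Hadoop dependency. The class name `ProgressEstimate`, the method `fromOffsets`, and its parameters are hypothetical names chosen for illustration; a real RecordReader.getProgress() would return a value computed along these lines. It does not address how the scheduler uses the value.

```java
public final class ProgressEstimate {
    private ProgressEstimate() {}

    /**
     * Fraction of the input consumed, measured by byte offset.
     * start and end are the bounds of the input range (assumed known),
     * pos is the current read offset.
     */
    public static float fromOffsets(long start, long end, long pos) {
        if (end <= start) {
            // Empty or unknown-length range: report no progress
            // instead of dividing by zero.
            return 0.0f;
        }
        float fraction = (pos - start) / (float) (end - start);
        // Clamp so that reading past the boundary never reports
        // a value below 0 or above 1.
        return Math.max(0.0f, Math.min(1.0f, fraction));
    }

    public static void main(String[] args) {
        System.out.println(fromOffsets(0L, 1000L, 250L));
    }
}
```

The clamp keeps the reported value inside [0, 1] even when the estimate is off, so an inaccurate total never produces an out-of-range percentage.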

TaskTracker failed to start: NoClassDefFoundError: Configured.

2011-03-16 Thread Jane Chen
Hi, I was trying to start up a single-node Hadoop cluster using 0.20.2. The namenode, datanode and jobtracker all started fine. The task tracker failed with the following error: 2011-03-16 11:39:47,479 INFO org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for all re

Initialization for record writing.

2011-01-21 Thread Jane Chen
Hi, Looking at 0.21's API, in org.apache.hadoop.mapreduce.RecordReader there is an initialize() method that I can use for one-time work; in org.apache.hadoop.mapreduce.RecordWriter there is no initialize(). Why is that? Where am I supposed to do the one-time initialization? In the constru

Re: How to Influence Reduce Task Location.

2010-12-19 Thread Jane Chen
particular key should go to. I am not sure if that can be done. Just out of curiosity, why do you need this kind of control over reduction? Hari On Sat, Dec 18, 2010 at 11:54 PM, Jane Chen wrote: But how does this help me request which host to schedule the reduce task to? Thanks, Jane

Re: How to Influence Reduce Task Location.

2010-12-18 Thread Jane Chen
Jane, The partitioner class can be used to achieve this. (http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/Partitioner.html). Thanks, Hari On Sat, Dec 18, 2010 at 11:13 PM, Jane Chen wrote: Hi All, Is there any way to influence where a reduce task is
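
For context, a partitioner decides which reduce partition a key is sent to, expressed as an index between 0 and the number of reduce tasks minus one. It does not choose the host on which that reduce task runs, which is the point raised in the follow-up reply. The following is a minimal sketch of the usual hash-and-modulo mapping in plain Java with no Hadoop dependency; the class name `KeyPartitionSketch` and the method `partitionFor` are hypothetical names used only for illustration.

```java
public final class KeyPartitionSketch {
    private KeyPartitionSketch() {}

    /** Maps a key to a partition index in the range [0, numPartitions). */
    public static int partitionFor(Object key, int numPartitions) {
        // Mask off the sign bit so that a negative hashCode
        // cannot produce a negative partition index.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"alpha", "beta", "gamma"}) {
            System.out.println(key + " -> partition " + partitionFor(key, 4));
        }
    }
}
```

Every record with the same key lands in the same partition, so a custom mapping can group chosen keys onto one reduce task, but where that task is scheduled is outside the mapping's control.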

How to Influence Reduce Task Location.

2010-12-18 Thread Jane Chen
Hi All, Is there any way to influence where a reduce task is run? We have a case where we'd like to choose the host to run the reduce task based on the task's input key. Any suggestion is greatly appreciated. Thanks, Jane

Re: InputFormat in mapred vs. mapreduce.

2010-12-07 Thread Jane Chen
directly with mapreduce.*. > If you are using the New API all over (driver, mapper, etc.), you should use the mapreduce.* only, right? > On Tue, Dec 7, 2010 at 3:05 AM, Jane Chen wrote: > > In Hadoop 0.21, I found InputFormat as an Interface in packa

InputFormat in mapred vs. mapreduce.

2010-12-06 Thread Jane Chen
In Hadoop 0.21, I found InputFormat as an interface in package mapred, and as an abstract class in package mapreduce. The APIs are slightly different. Which one should I extend or implement? How are the two packages intended to be used differently? Thanks, Jane