Re: Problems with LinuxTaskController, LocalJobRunner, and localRunner directory

2011-05-06 Thread jeremy
Thanks Todd. Unfortunately, I'm using Cascading on Hadoop, so I'm not sure there's an easy mechanism to force the LocalJobs it fires off to use a different configuration. I'll talk to the Cascading folks and find out. J Quoting Todd Lipcon: Hi Jeremy, That's a good point - we don't currently do a good job of segregating the configurations…

Re: Problems with LinuxTaskController, LocalJobRunner, and localRunner directory

2011-05-06 Thread Todd Lipcon
Hi Jeremy, That's a good point - we don't currently do a good job of segregating the configuration used for the LocalJobRunner (LJR) from the configuration used for the TaskTracker. In particular, I think mapred.local.dir and mapred.system.dir are used by both. You run into the same issue when trying to use LJR on…
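
A workaround along the lines Todd describes is to override the local directories on the job's own Configuration before submitting it, so the LocalJobRunner never touches the TaskTracker's directories. A minimal sketch - the /tmp path is a hypothetical example, not something taken from this thread:

    import org.apache.hadoop.conf.Configuration;

    public class LocalRunnerConf {
        public static Configuration localConf() {
            Configuration conf = new Configuration();
            // Run the job in-process with the LocalJobRunner.
            conf.set("mapred.job.tracker", "local");
            // Point the local runner at a directory the submitting user can
            // write to, instead of the TaskTracker's mapred.local.dir.
            conf.set("mapred.local.dir",
                    "/tmp/" + System.getProperty("user.name") + "/mapred/local");
            return conf;
        }
    }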

Problems with LinuxTaskController, LocalJobRunner, and localRunner directory

2011-05-06 Thread jeremy
Hi, I'm running Hadoop (Cloudera's CDH3) in pseudo-distributed mode, with the LinuxTaskController so that jobs run as the user who submitted them. My program (which uses Cascading) fires off a job using LocalJobRunner (I think to read data from the local filesystem). So…

Re: Multiple Outputs Not Being Written to File

2011-05-06 Thread Joey Echeverria
You need to add a call to MultipleOutputs.close() in your reducer's cleanup: protected void cleanup(Context context) throws IOException, InterruptedException { mos.close(); ... } On Fri, May 6, 2011 at 1:55 PM, Geoffry Roberts wrote: > All, > > I am attempting to take a large file and split it up into a series of > smaller files…
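
For context, a complete reducer wired this way might look like the following sketch; the class name, field name, and Text types are illustrative assumptions, not taken from Geoffry's code:

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class SplitReducer extends Reducer<Text, Text, Text, Text> {
        private MultipleOutputs<Text, Text> mos;

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<Text, Text>(context);
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                // Name each output file after the record's key.
                mos.write(key, value, key.toString());
            }
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            // Without this call, buffered records never reach the files.
            mos.close();
        }
    }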

Re: Passing an Object to All Reducers

2011-05-06 Thread Geoffry Roberts
Steve, Yes, the object is known at startup and is read-only. The mappers don't touch it. I considered serializing to a string, but I was wondering if there wasn't a more elegant way. Thanks On 6 May 2011 10:55, Steve Lewis wrote: > If possible serialize the object as XML then add it as a set of lines to the config…

Re: Passing an Object to All Reducers

2011-05-06 Thread David Rosenstrauch
On 05/06/2011 01:12 PM, Geoffry Roberts wrote: All, I need each of my reducers to have read access to a certain object, or a clone thereof. I can instantiate this object at startup. How can I give my reducers a copy? Serialize it to a string, set it as a configuration setting on the job…
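
Concretely, David's suggestion looks roughly like the sketch below; the property name my.shared.object and the comma-separated encoding are hypothetical stand-ins for whatever string serialization you choose. The driver would first call job.getConfiguration().set("my.shared.object", serializedString) before submitting the job.

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class ConfigPassingReducer extends Reducer<Text, Text, Text, Text> {
        // Read-only object shared by all reducers, decoded once per task.
        private String[] sharedTable;

        @Override
        protected void setup(Context context) {
            // Decode the string the driver stored in the job configuration.
            String encoded = context.getConfiguration().get("my.shared.object", "");
            sharedTable = encoded.split(",");
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                context.write(key, value); // consult sharedTable as needed
            }
        }
    }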

Multiple Outputs Not Being Written to File

2011-05-06 Thread Geoffry Roberts
All, I am attempting to take a large file and split it up into a series of smaller files. I want the smaller files to be named based on values taken from the large file. I am using org.apache.hadoop.mapreduce.lib.output.MultipleOutputs to do this. The job runs without error and produces a set of…
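
Driver-side, this pattern is usually wired up roughly as below, pairing with a reducer like the SplitReducer sketch under Joey's reply above. The class names and path arguments are placeholders, and LazyOutputFormat (which suppresses the empty default part-r-* files) is assumed to be available in your distribution:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class SplitDriver {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "split-large-file");
            job.setJarByClass(SplitDriver.class);
            job.setReducerClass(SplitReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            // Only create files that MultipleOutputs actually writes to.
            LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }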

Re: Passing an Object to All Reducers

2011-05-06 Thread Steve Lewis
If possible, serialize the object as XML, then add it as a set of lines to the config - alternatively, serialize it (maybe as XML) to a known spot in HDFS and read it in during setup in the reducer. I assume this is an object known at the start of the job and not modified by the mapper. On Fri, May 6, 2011, Geoffry Roberts wrote:…
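
The HDFS variant Steve mentions would look something like this in the reducer; the /shared/my-object.xml path is a hypothetical well-known location that the driver writes the serialized object to before submitting the job:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class HdfsObjectReducer extends Reducer<Text, Text, Text, Text> {
        private String sharedObjectXml;

        @Override
        protected void setup(Context context) throws IOException {
            FileSystem fs = FileSystem.get(context.getConfiguration());
            Path path = new Path("/shared/my-object.xml"); // hypothetical
            StringBuilder sb = new StringBuilder();
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(fs.open(path)));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    sb.append(line).append('\n');
                }
            } finally {
                reader.close();
            }
            sharedObjectXml = sb.toString();
            // Deserialize sharedObjectXml into the actual object here.
        }
    }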

Passing an Object to All Reducers

2011-05-06 Thread Geoffry Roberts
All, I need each of my reducers to have read access to a certain object, or a clone thereof. I can instantiate this object at startup. How can I give my reducers a copy? -- Geoffry Roberts