On Fri, Jan 28, 2011 at 6:28 AM, <praveen.pe...@nokia.com> wrote:
> Thanks Tom. Could you elaborate a little more on the second option?
>
> What is the HADOOP_CONF_DIR here, after launching the cluster?
~/.whirr/<cluster-name>

> When you said run in new process, did you mean using command line Whirr tool?

I meant that you could launch Whirr using the CLI, or Java. Then run the job in another process, with HADOOP_CONF_DIR set. The MR jobs you are running can, I assume, be run against an arbitrary cluster, so you should be able to point them at a cluster started by Whirr.

Tom

> I may finally end up writing my own driver for running external mapred jobs
> so I can have more control, but I was just curious to know if option #2 is
> better than writing my own driver.
>
> Praveen
>
> -----Original Message-----
> From: ext Tom White [mailto:t...@cloudera.com]
> Sent: Thursday, January 27, 2011 4:01 PM
> To: whirr-user@incubator.apache.org
> Subject: Re: Running Mapred jobs after launching cluster
>
> If they implement the Tool interface then you can set configuration on them.
> Failing that, you could set HADOOP_CONF_DIR and run them in a new process.
>
> Cheers,
> Tom
>
> On Thu, Jan 27, 2011 at 12:52 PM, <praveen.pe...@nokia.com> wrote:
>> Hmm...
>> I am running some of the map reduce jobs written by me, but some of them are
>> in external libraries (e.g. Mahout) which I don't have control over. Since I
>> can't modify the code in external libraries, is there any other way to make
>> this work?
>>
>> Praveen
>>
>> -----Original Message-----
>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>> Sent: Thursday, January 27, 2011 3:42 PM
>> To: whirr-user@incubator.apache.org
>> Subject: Re: Running Mapred jobs after launching cluster
>>
>> You don't need to add anything to the classpath, but you do need to use the
>> configuration in the org.apache.whirr.service.Cluster object to populate
>> your Hadoop Configuration object so that your code knows which cluster to
>> connect to. See the getConfiguration() method in HadoopServiceController for
>> how to do this.
>>
>> Cheers,
>> Tom
>>
>> On Thu, Jan 27, 2011 at 12:21 PM, <praveen.pe...@nokia.com> wrote:
>>> Hello all,
>>> I wrote a Java class HadoopLauncher that is very similar to
>>> HadoopServiceController. I was successfully able to launch a cluster
>>> programmatically from my application using Whirr. Now I want to copy
>>> files to HDFS and also run a job programmatically.
>>>
>>> When I copy a file to HDFS it is copying to the local file system, not
>>> HDFS. Here is the code I used:
>>>
>>> Configuration conf = new Configuration();
>>> FileSystem hdfs = FileSystem.get(conf);
>>> hdfs.copyFromLocalFile(false, true, new Path(localFilePath),
>>>     new Path(hdfsFileDirectory));
>>>
>>> Do I need to add anything else to the classpath so the Hadoop libraries
>>> know that they need to talk to the dynamically launched cluster? When
>>> running Whirr from the command line I know it uses HADOOP_CONF_DIR to
>>> find the Hadoop config files, but when doing the same from Java I am
>>> wondering how to solve this issue.
>>>
>>> Praveen
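[Editor's note] Tom's "option #2" (set HADOOP_CONF_DIR and run the job in a new process) can be sketched as a shell session. The cluster name "mycluster", the jar, and the driver class are placeholders for illustration, not real artifacts from this thread; Whirr writes the client-side Hadoop config under ~/.whirr/<cluster-name> at launch time.

```shell
# Point external, unmodifiable jobs (e.g. Mahout drivers) at the
# Whirr-launched cluster without touching their code: the hadoop CLI
# picks up the cluster's config from HADOOP_CONF_DIR.
export HADOOP_CONF_DIR=$HOME/.whirr/mycluster   # written by Whirr at launch
hadoop jar my-external-job.jar com.example.MyDriver   # placeholder jar/class
```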
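[Editor's note] The reason Praveen's snippet wrote to the local file system is that a bare `new Configuration()` knows nothing about the dynamically launched cluster, so `FileSystem.get()` falls back to the default (local) file system. A minimal sketch of the fix Tom describes, assuming you extract the namenode and jobtracker hosts from the `org.apache.whirr.service.Cluster` object yourself (as `HadoopServiceController.getConfiguration()` does); the host names, paths, and ports 8020/8021 below are placeholder values, not part of the original thread:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HadoopLauncherSketch {

    // Build a Configuration that targets the Whirr-launched cluster.
    // In practice the two hosts would come from the Cluster object
    // returned when the cluster was launched.
    static Configuration clusterConf(String namenodeHost, String jobtrackerHost) {
        Configuration conf = new Configuration();
        // Without these properties, FileSystem.get(conf) returns the
        // local file system, which is why copyFromLocalFile() appeared
        // to copy locally instead of to HDFS.
        conf.set("fs.default.name", "hdfs://" + namenodeHost + ":8020/");
        conf.set("mapred.job.tracker", jobtrackerHost + ":8021");
        return conf;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder hosts for illustration only.
        Configuration conf = clusterConf("nn.example.com", "jt.example.com");
        FileSystem hdfs = FileSystem.get(conf); // now an HDFS client, not LocalFileSystem
        hdfs.copyFromLocalFile(false, true,
                new Path("/tmp/input.txt"), new Path("/user/praveen/input"));
    }
}
```

The same `conf` can then be handed to a `JobConf`, or to `ToolRunner.run(conf, tool, args)` for external jobs that implement the `Tool` interface, which is Tom's first option.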