On Mon, Jan 31, 2011 at 3:03 PM,  <praveen.pe...@nokia.com> wrote:
> If I anyway have to upload the files to webservers, do I still need the
> patch then? It looks like the script has these properties that I can
> overwrite.

I suggested you look at the patch (WHIRR-55) so you can see how it will be
possible once it's committed. To try it out you need to upload the scripts
to a webserver (since the patch changes one of them).

> BTW I tried with the webserver path and I could not make it work so far.
>
> 1. I copied the scripts/apache folder to my /var/www folder and modified
> the three properties listed below in /var/www/apache/hadoop/post-configure.
> 2. I changed hadoop.properties, adding the following line:
> run-url-base=http://localhost/
> 3. Launched the cluster and verified that the job properties are not what
> I changed them to. They are all defaults.

This looks right to me. If you are using CDH you need to change
cloudera/cdh/post-configure.
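As a concrete sketch of the setup those steps describe (the host name below
is an assumption, and it needs to be reachable from the cluster instances
rather than localhost on each instance):

    # hadoop.properties (sketch) -- point Whirr at the webserver that
    # hosts the modified scripts; replace the host with your own
    run-url-base=http://myhost.example.com/

with the modified script then served at a URL such as
http://myhost.example.com/apache/hadoop/post-configure (or
cloudera/cdh/post-configure for CDH).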
> How do I debug this issue?

You can log into the instances (see the FAQ for how to do this) and look at
the scripts that actually ran (and their output) in the /tmp directory.

Tom

> Praveen
>
> Launched the cluster and I didn't see the child JVM have the 2G allocation.
>
> -----Original Message-----
> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
> Sent: Monday, January 31, 2011 3:02 PM
> To: whirr-user@incubator.apache.org
> Subject: Re: Running Mapred jobs after launching cluster
>
> Hi Praveen,
>
> I think removing the webserver dependency (or making it optional) would be
> a good goal, but we're not there yet. I've just created
> https://issues.apache.org/jira/browse/WHIRR-225 as a place to discuss the
> design and implementation.
>
> In the meantime you could take a look at
> https://issues.apache.org/jira/browse/WHIRR-55, and try using the patch
> there to override some Hadoop properties (until it is committed you will
> still need to upload the scripts to a webserver, since it modifies
> Hadoop's post-configure script).
>
> Hope this helps.
>
> Cheers,
> Tom
>
> BTW what are the security concerns you have? There are no credentials
> embedded in the scripts, so it should be safe to host them publicly, no?
>
> On Mon, Jan 31, 2011 at 11:00 AM,  <praveen.pe...@nokia.com> wrote:
>> Hi Tom,
>> If the Hadoop install is fixed, Whirr must be getting all the default
>> Hadoop properties from the Hadoop install itself, correct? I sent an
>> email about configuring Hadoop properties and you mentioned that I need
>> to put the modified scripts on a webserver that is publicly accessible.
>> I was wondering if there is a place inside the Hadoop install I can
>> change so that I don't need to put the scripts on a webserver (for
>> security reasons). Do you think it is possible? If so, how? I do not
>> mind customizing the jar file for our purposes. I want to change the
>> following properties:
>>
>> mapred.reduce.tasks=24
>> mapred.map.tasks=64
>> mapred.child.java.opts=-Xmx2048m
>>
>> Thanks in advance.
>> Praveen
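As an aside, the three settings listed above can also be applied per job
from the client side when a cluster-wide override isn't available; a
minimal sketch against the stock Hadoop 0.20 API (job setup and submission
omitted):

    import org.apache.hadoop.mapred.JobConf;

    // Per-job overrides (sketch): these affect only jobs submitted with
    // this JobConf, not the cluster-wide defaults in hadoop-site.xml.
    JobConf conf = new JobConf();
    conf.setInt("mapred.reduce.tasks", 24);
    conf.setInt("mapred.map.tasks", 64);
    conf.set("mapred.child.java.opts", "-Xmx2048m");
    // ... set input/output formats and paths, then submit as usual ...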
>>
>> -----Original Message-----
>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>> Sent: Friday, January 28, 2011 4:02 PM
>> To: whirr-user@incubator.apache.org
>> Subject: Re: Running Mapred jobs after launching cluster
>>
>> It is fixed, and currently on 0.20.2. It will be made configurable in
>> https://issues.apache.org/jira/browse/WHIRR-222.
>>
>> Cheers
>> Tom
>>
>> On Fri, Jan 28, 2011 at 12:56 PM,  <praveen.pe...@nokia.com> wrote:
>>> Hi Tom,
>>> So the Hadoop version is not going to change for a given Whirr install?
>>> I thought Whirr was getting the Hadoop install dynamically from a URL
>>> which is always going to have the latest Hadoop version. If that is not
>>> the case I guess I am fine. I just don't want to get a Hadoop version
>>> mismatch six months after our software is released, just because a new
>>> Hadoop version got released.
>>>
>>> Thanks
>>> Praveen
>>>
>>> -----Original Message-----
>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>> Sent: Friday, January 28, 2011 3:35 PM
>>> To: whirr-user@incubator.apache.org
>>> Subject: Re: Running Mapred jobs after launching cluster
>>>
>>> On Fri, Jan 28, 2011 at 12:06 PM,  <praveen.pe...@nokia.com> wrote:
>>>> Thanks Tom. I think I got it working with my own driver, so I will go
>>>> with it for now (unless that proves to be a bad option).
>>>>
>>>> BTW, could you tell me how to stick with one Hadoop version while
>>>> launching a cluster? I have hadoop-0.20.2 in my classpath but it looks
>>>> like Whirr gets the latest Hadoop from the repository. Since the
>>>> latest version may be different depending on the time, I would like to
>>>> stick to one version so that a Hadoop version mismatch won't happen.
>>>
>>> You do need to make sure that the versions are the same. See the Hadoop
>>> integration tests, which specify the version of Hadoop to use in their
>>> POM.
>>>
>>>> Also, what jar files are necessary for launching a cluster using Java?
>>>> Currently I have the CLI version of the jar file, but that's way too
>>>> large since it has everything in it.
>>>
>>> You need Whirr's core and Hadoop jars, as well as their dependencies.
>>> If you look at the POMs in the source code they will tell you the
>>> dependencies.
>>>
>>> Cheers
>>> Tom
>>>
>>>> Thanks
>>>> Praveen
>>>>
>>>> -----Original Message-----
>>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>>> Sent: Friday, January 28, 2011 2:12 PM
>>>> To: whirr-user@incubator.apache.org
>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>
>>>> On Fri, Jan 28, 2011 at 6:28 AM,  <praveen.pe...@nokia.com> wrote:
>>>>> Thanks Tom. Could you elaborate a little more on the second option?
>>>>>
>>>>> What is HADOOP_CONF_DIR here, after launching the cluster?
>>>>
>>>> ~/.whirr/<cluster-name>
>>>>
>>>>> When you said run in a new process, did you mean using the
>>>>> command-line Whirr tool?
>>>>
>>>> I meant that you could launch Whirr using the CLI, or Java. Then run
>>>> the job in another process, with HADOOP_CONF_DIR set.
>>>>
>>>> The MR jobs you are running can, I assume, be run against an arbitrary
>>>> cluster, so you should be able to point them at a cluster started by
>>>> Whirr.
>>>>
>>>> Tom
>>>>
>>>>> I may finally end up writing my own driver for running external
>>>>> mapred jobs so I can have more control, but I was just curious to
>>>>> know if option #2 is better than writing my own driver.
>>>>>
>>>>> Praveen
>>>>>
>>>>> -----Original Message-----
>>>>> From: ext Tom White [mailto:t...@cloudera.com]
>>>>> Sent: Thursday, January 27, 2011 4:01 PM
>>>>> To: whirr-user@incubator.apache.org
>>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>>
>>>>> If they implement the Tool interface then you can set configuration
>>>>> on them. Failing that, you could set HADOOP_CONF_DIR and run them in
>>>>> a new process.
>>>>>
>>>>> Cheers,
>>>>> Tom
>>>>>
>>>>> On Thu, Jan 27, 2011 at 12:52 PM,  <praveen.pe...@nokia.com> wrote:
>>>>>> Hmm...
>>>>>> I am running some map reduce jobs written by me, but some of them
>>>>>> are in external libraries (e.g. Mahout) which I don't have control
>>>>>> over. Since I can't modify the code in external libraries, is there
>>>>>> any other way to make this work?
>>>>>>
>>>>>> Praveen
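For reference, a minimal sketch of the Tool pattern Tom mentions, against
the Hadoop 0.20 API (the driver class name is made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Illustrative driver: ToolRunner runs a GenericOptionsParser over
    // the arguments first, so -D key=value (and -conf file) overrides are
    // already in the Configuration by the time run() is called.
    public class MyJobDriver extends Configured implements Tool {
      public int run(String[] args) throws Exception {
        Configuration conf = getConf(); // includes any -D overrides
        // ... build and submit the job from conf here ...
        return 0;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(),
            new MyJobDriver(), args));
      }
    }

Such a driver can then be invoked with, for example,
-D mapred.child.java.opts=-Xmx2048m on the hadoop jar command line, with no
code changes.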
>>>>>> -----Original Message-----
>>>>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>>>>> Sent: Thursday, January 27, 2011 3:42 PM
>>>>>> To: whirr-user@incubator.apache.org
>>>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>>>
>>>>>> You don't need to add anything to the classpath, but you need to use
>>>>>> the configuration in the org.apache.whirr.service.Cluster object to
>>>>>> populate your Hadoop Configuration object, so that your code knows
>>>>>> which cluster to connect to. See the getConfiguration() method in
>>>>>> HadoopServiceController for how to do this.
>>>>>>
>>>>>> Cheers,
>>>>>> Tom
>>>>>>
>>>>>> On Thu, Jan 27, 2011 at 12:21 PM,  <praveen.pe...@nokia.com> wrote:
>>>>>>> Hello all,
>>>>>>> I wrote a Java class, HadoopLauncher, that is very similar to
>>>>>>> HadoopServiceController. I was successfully able to launch a
>>>>>>> cluster programmatically from my application using Whirr. Now I
>>>>>>> want to copy files to HDFS and also run a job programmatically.
>>>>>>>
>>>>>>> When I copy a file to HDFS it is copied to the local file system,
>>>>>>> not HDFS. Here is the code I used:
>>>>>>>
>>>>>>> Configuration conf = new Configuration();
>>>>>>> FileSystem hdfs = FileSystem.get(conf);
>>>>>>> hdfs.copyFromLocalFile(false, true, new Path(localFilePath),
>>>>>>>     new Path(hdfsFileDirectory));
>>>>>>>
>>>>>>> Do I need to add anything else to the classpath so the Hadoop
>>>>>>> libraries know that they need to talk to the dynamically launched
>>>>>>> cluster?
>>>>>>> When running Whirr from the command line I know it uses
>>>>>>> HADOOP_CONF_DIR to find the Hadoop config files, but when doing the
>>>>>>> same from Java I am wondering how to solve this issue.
>>>>>>>
>>>>>>> Praveen
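For completeness, a rough sketch of the fix Tom describes: populate the
Configuration with the cluster's addresses before asking for a FileSystem.
The addresses below are placeholders; in practice they are derived from the
org.apache.whirr.service.Cluster object, as the getConfiguration() method
in HadoopServiceController shows.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // With a default Configuration, fs.default.name is file:///, so
    // FileSystem.get() returns the local file system -- hence the copy
    // above landing on local disk instead of HDFS.
    Configuration conf = new Configuration();
    // Placeholder addresses: take the real namenode/jobtracker hosts
    // from the Whirr Cluster object (see HadoopServiceController).
    conf.set("fs.default.name", "hdfs://<namenode-host>:8020/");
    conf.set("mapred.job.tracker", "<jobtracker-host>:8021");

    FileSystem hdfs = FileSystem.get(conf); // now talks to HDFS
    hdfs.copyFromLocalFile(false, true,
        new Path(localFilePath), new Path(hdfsFileDirectory));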