After doing the search I found it in /etc/hadoop/conf/hadoop-site.xml, and the property is defined correctly, but all my mapred tasks still get a virtual memory of 500m even though I set the value to -Xmx1600m.
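The symptom above — a value set at launch time not reaching the job — is easier to reason about with a sketch of what the post-configure step does. This is a hypothetical simplification: the variable name CHILD_OPTS and the conf path mirror what is quoted later in this thread, but the real Whirr 0.3.0 post-configure template may differ.

```shell
#!/bin/sh
# Sketch: the CHILD_OPTS shell variable is expanded into hadoop-site.xml
# when the node is configured, so the value only takes effect if THIS
# script (and not a stock copy fetched from the default run-url-base)
# is the one the launcher actually runs on the node.
CHILD_OPTS="-Xmx1600m"
HADOOP_CONF_DIR="./conf"   # on a real node: /usr/local/hadoop-0.20.2/conf

mkdir -p "$HADOOP_CONF_DIR"
cat > "$HADOOP_CONF_DIR/hadoop-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>$CHILD_OPTS</value>
  </property>
</configuration>
EOF

# Sanity check that the value landed in the generated file:
grep -A 1 mapred.child.java.opts "$HADOOP_CONF_DIR/hadoop-site.xml"
```

Checking the generated hadoop-site.xml on the node this way distinguishes "my script never ran" from "my script ran but the job overrode the value".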
Praveen

On Feb 3, 2011, at 4:57 PM, ext Andrei Savu <savu.and...@gmail.com> wrote:

> Did you try to do a file search for *-site.xml? I will take a closer look
> at the install scripts.
>
> -original message-
> Subject: RE: Running Mapred jobs after launching cluster
> From: <praveen.pe...@nokia.com>
> Date: 03/02/2011 23:32
>
> Hi Andrei,
> I checked the /usr/local/hadoop-0.20.2/conf directory on the master node
> and all the *site.xml files are empty. I know some of the properties I
> changed in the post_configure script are working, but I wonder where that
> information is stored. Even the master and slaves files point to just
> localhost. It almost looks like this is not the conf directory.
>
> Praveen
>
> ________________________________
> From: ext praveen.pe...@nokia.com
> Sent: Thursday, February 03, 2011 1:07 PM
> To: whirr-user@incubator.apache.org
> Cc: whirr-user@incubator.apache.org
> Subject: Re: Running Mapred jobs after launching cluster
>
> Yes, I modified the post_configure script. I modified three different
> properties and saw that the other two are overwritten, but not the
> mapred.child.java.opts property.
>
> Praveen
>
> On Feb 3, 2011, at 12:33 PM, ext Andrei Savu <savu.and...@gmail.com> wrote:
>
> Here are the relevant lines from the install scripts:
>
> HADOOP_VERSION=${HADOOP_VERSION:-0.20.2}
> HADOOP_HOME=/usr/local/hadoop-$HADOOP_VERSION
> HADOOP_CONF_DIR=$HADOOP_HOME/conf
>
> Have you tried changing CHILD_OPTS in apache/hadoop/post-configure and
> using that custom script to deploy a cluster?
>
> I don't have a running cluster to check this now.
>
> On Thu, Feb 3, 2011 at 7:18 PM, <praveen.pe...@nokia.com> wrote:
> Hi Tom/all,
> Where are the hadoop config files stored on the cluster nodes?
> I would like to debug this issue since I need to give more memory to the
> child java mapred processes to process huge chunks of data.
>
> Thanks
> Praveen
>
> -----Original Message-----
> From: ext praveen.pe...@nokia.com
> Sent: Wednesday, February 02, 2011 5:23 PM
> To: whirr-user@incubator.apache.org
> Subject: RE: Running Mapred jobs after launching cluster
>
> Can anyone think of a reason why the below property is not honoured when I
> overwrote it along with other properties in post_configure? The other
> properties are correctly overwritten except this one. I need to set the
> mapred tasks' JVM to bigger than 200m.
>
> Praveen
> ________________________________________
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Tuesday, February 01, 2011 11:21 AM
> To: whirr-user@incubator.apache.org
> Subject: RE: Running Mapred jobs after launching cluster
>
> Thanks Tom. Silly me, I should have thought of the property name. It works
> now except for one issue: I ran the wordcount example and saw that the
> number of map and reduce tasks is as I configured in the post_configure
> script, but for some reason the below property in job.xml is always
> -Xmx200m even though I set it to -Xmx1700m. Not sure if this property is
> special in some way.
> mapred.child.java.opts    -Xmx200m
>
> Praveen
> ________________________________________
> From: ext Tom White [tom.e.wh...@gmail.com]
> Sent: Tuesday, February 01, 2011 12:13 AM
> To: whirr-user@incubator.apache.org
> Subject: Re: Running Mapred jobs after launching cluster
>
> Try setting whirr.run-url-base, not run-url-base.
>
> Tom
>
> On Mon, Jan 31, 2011 at 5:33 PM, <praveen.pe...@nokia.com> wrote:
>> I am not using CDH (for now anyway) but the default hadoop. I even
>> changed "localhost" to the IP address and still no luck. It's likely that
>> I am doing something wrong, but I'm having a hard time debugging.
>> Here are the properties I changed in /var/www/apache/hadoop/post-configure,
>> but when I run the job I am not seeing these values:
>>
>> MAX_MAP_TASKS=16
>> MAX_REDUCE_TASKS=24
>> CHILD_OPTS=-Xmx1700m
>>
>> Here is what I see in /tmp/runscript/runscript.sh on the master node. It
>> doesn't look like it used my scripts...
>>
>> installRunUrl || exit 1
>> runurl http://whirr.s3.amazonaws.com/0.3.0-incubating-SNAPSHOT/util/configure-hostnames -c cloudservers
>> runurl http://whirr.s3.amazonaws.com/0.3.0-incubating-SNAPSHOT/sun/java/install
>> runurl http://whirr.s3.amazonaws.com/0.3.0-incubating-SNAPSHOT/apache/hadoop/install -c cloudservers
>>
>> Any suggestions?
>> Praveen
>> ________________________________________
>> From: ext Tom White [tom.e.wh...@gmail.com]
>> Sent: Monday, January 31, 2011 6:23 PM
>> To: whirr-user@incubator.apache.org
>> Subject: Re: Running Mapred jobs after launching cluster
>>
>> On Mon, Jan 31, 2011 at 3:03 PM, <praveen.pe...@nokia.com> wrote:
>>> If I have to upload the files to a webserver anyway, do I still need the
>>> patch? It looks like the script has these properties that I can
>>> overwrite.
>>
>> I suggested you look at the patch (WHIRR-55) so you can see how it
>> will be possible once it's committed. To try it out you need to upload
>> the scripts to a webserver (since the patch changes one of them).
>>
>>>
>>> BTW I tried with the webserver path and could not make it work so far.
>>>
>>> 1. I copied the scripts/apache folder to my /var/www folder and modified
>>>    the 3 properties below in /var/www/apache/hadoop/post-configure.
>>> 2. I changed hadoop.properties and added the following line:
>>>    run-url-base=http://localhost/
>>> 3. Launched the cluster and verified that the job properties are not
>>>    what I changed them to. They are all defaults.
>>
>> This looks right to me. If you are using CDH you need to change
>> cloudera/cdh/post-configure.
>>
>>>
>>> How do I debug this issue?
>>
>> You can log into the instances (see the FAQ for how to do this) and
>> look at the scripts that actually ran (and their output) in the /tmp
>> directory.
>>
>> Tom
>>
>>>
>>> Praveen
>>>
>>> Launched the cluster and I didn't see the child JVM get the 2G allocation.
>>>
>>> -----Original Message-----
>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>> Sent: Monday, January 31, 2011 3:02 PM
>>> To: whirr-user@incubator.apache.org
>>> Subject: Re: Running Mapred jobs after launching cluster
>>>
>>> Hi Praveen,
>>>
>>> I think removing the webserver dependency (or making it optional)
>>> would be a good goal, but we're not there yet. I've just created
>>> https://issues.apache.org/jira/browse/WHIRR-225 as a place to discuss
>>> the design and implementation.
>>>
>>> In the meantime you could take a look at
>>> https://issues.apache.org/jira/browse/WHIRR-55, and try using the patch
>>> there to override some Hadoop properties (you will still need to upload
>>> the scripts to a webserver until it is committed, since it modifies
>>> Hadoop's post-configure script).
>>>
>>> Hope this helps.
>>>
>>> Cheers,
>>> Tom
>>>
>>> BTW what are the security concerns you have? There are no credentials
>>> embedded in the scripts, so it should be safe to host them publicly, no?
>>>
>>> On Mon, Jan 31, 2011 at 11:00 AM, <praveen.pe...@nokia.com> wrote:
>>>> Hi Tom,
>>>> If the hadoop install is fixed, Whirr must be getting all the default
>>>> hadoop properties from the hadoop install itself, correct? I sent an
>>>> email about configuring hadoop properties and you mentioned I need to
>>>> put the modified scripts on a webserver that is publicly accessible. I
>>>> was wondering if there is a place inside the hadoop install I can
>>>> change so that I don't need to put the scripts on a webserver (for
>>>> security reasons).
>>>> Do you think it is possible? If so, how? I don't mind customizing the
>>>> jar file for our purposes. I want to change the following properties:
>>>>
>>>> mapred.reduce.tasks=24
>>>> mapred.map.tasks=64
>>>> mapred.child.java.opts=-Xmx2048m
>>>>
>>>> Thanks in advance.
>>>> Praveen
>>>>
>>>> -----Original Message-----
>>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>>> Sent: Friday, January 28, 2011 4:02 PM
>>>> To: whirr-user@incubator.apache.org
>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>
>>>> It is fixed, and currently on 0.20.2. It will be made configurable in
>>>> https://issues.apache.org/jira/browse/WHIRR-222.
>>>>
>>>> Cheers
>>>> Tom
>>>>
>>>> On Fri, Jan 28, 2011 at 12:56 PM, <praveen.pe...@nokia.com> wrote:
>>>>> Hi Tom,
>>>>> So the hadoop version is not going to change for a given Whirr
>>>>> install? I thought Whirr was getting the hadoop install dynamically
>>>>> from a URL that is always going to have the latest hadoop version. If
>>>>> that is not the case, I guess I am fine. I just don't want to get a
>>>>> hadoop version mismatch six months after our software is released
>>>>> just because a new hadoop version got released.
>>>>>
>>>>> Thanks
>>>>> Praveen
>>>>>
>>>>> -----Original Message-----
>>>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>>>> Sent: Friday, January 28, 2011 3:35 PM
>>>>> To: whirr-user@incubator.apache.org
>>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>>
>>>>> On Fri, Jan 28, 2011 at 12:06 PM, <praveen.pe...@nokia.com> wrote:
>>>>>> Thanks Tom. I think I got it working with my own driver so I will go
>>>>>> with that for now (unless it proves to be a bad option).
>>>>>>
>>>>>> BTW, could you tell me how to stick with one hadoop version while
>>>>>> launching a cluster? I have hadoop-0.20.2 in my classpath, but it
>>>>>> looks like Whirr gets the latest hadoop from the repository. Since
>>>>>> the latest version may differ over time, I would like to stick to one
>>>>>> version so that a hadoop version mismatch won't happen.
>>>>>
>>>>> You do need to make sure that the versions are the same. See the
>>>>> Hadoop integration tests, which specify the version of Hadoop to use
>>>>> in their POM.
>>>>>
>>>>>>
>>>>>> Also, what jar files are necessary for launching a cluster using
>>>>>> Java? Currently I have the CLI version of the jar file, but that's
>>>>>> way too large since it has everything in it.
>>>>>
>>>>> You need Whirr's core and Hadoop jars, as well as their dependencies.
>>>>> If you look at the POMs in the source code they will tell you the
>>>>> dependencies.
>>>>>
>>>>> Cheers
>>>>> Tom
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Praveen
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>>>>> Sent: Friday, January 28, 2011 2:12 PM
>>>>>> To: whirr-user@incubator.apache.org
>>>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>>>
>>>>>> On Fri, Jan 28, 2011 at 6:28 AM, <praveen.pe...@nokia.com> wrote:
>>>>>>> Thanks Tom. Could you elaborate a little more on the second option?
>>>>>>>
>>>>>>> What is the HADOOP_CONF_DIR here, after launching the cluster?
>>>>>>
>>>>>> ~/.whirr/<cluster-name>
>>>>>>
>>>>>>> When you said run in a new process, did you mean using the
>>>>>>> command-line Whirr tool?
>>>>>>
>>>>>> I meant that you could launch Whirr using the CLI, or Java. Then run
>>>>>> the job in another process, with HADOOP_CONF_DIR set.
>>>>>>
>>>>>> The MR jobs you are running can, I assume, be run against an
>>>>>> arbitrary cluster, so you should be able to point them at a cluster
>>>>>> started by Whirr.
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>>>
>>>>>>> I may end up writing my own driver for running external mapred jobs
>>>>>>> so I can have more control, but I was just curious to know if option
>>>>>>> #2 is better than writing my own driver.
>>>>>>>
>>>>>>> Praveen
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: ext Tom White [mailto:t...@cloudera.com]
>>>>>>> Sent: Thursday, January 27, 2011 4:01 PM
>>>>>>> To: whirr-user@incubator.apache.org
>>>>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>>>>
>>>>>>> If they implement the Tool interface then you can set configuration
>>>>>>> on them. Failing that, you could set HADOOP_CONF_DIR and run them in
>>>>>>> a new process.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Tom
>>>>>>>
>>>>>>> On Thu, Jan 27, 2011 at 12:52 PM, <praveen.pe...@nokia.com> wrote:
>>>>>>>> Hmm...
>>>>>>>> I am running some map reduce jobs written by me, but some of them
>>>>>>>> are in external libraries (e.g. Mahout) which I don't have control
>>>>>>>> over. Since I can't modify the code in external libraries, is there
>>>>>>>> any other way to make this work?
>>>>>>>>
>>>>>>>> Praveen
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>>>>>>> Sent: Thursday, January 27, 2011 3:42 PM
>>>>>>>> To: whirr-user@incubator.apache.org
>>>>>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>>>>>
>>>>>>>> You don't need to add anything to the classpath, but you need to
>>>>>>>> use the configuration in the org.apache.whirr.service.Cluster
>>>>>>>> object to populate your Hadoop Configuration object so that your
>>>>>>>> code knows which cluster to connect to. See the getConfiguration()
>>>>>>>> method in HadoopServiceController for how to do this.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Tom
>>>>>>>>
>>>>>>>> On Thu, Jan 27, 2011 at 12:21 PM, <praveen.pe...@nokia.com> wrote:
>>>>>>>>> Hello all,
>>>>>>>>> I wrote a java class HadoopLauncher that is very similar to
>>>>>>>>> HadoopServiceController. I was successfully able to launch a
>>>>>>>>> cluster programmatically from my application using Whirr. Now I
>>>>>>>>> want to copy files to HDFS and also run a job programmatically.
>>>>>>>>>
>>>>>>>>> When I copy a file to HDFS it is copied to the local file system,
>>>>>>>>> not HDFS. Here is the code I used:
>>>>>>>>>
>>>>>>>>> Configuration conf = new Configuration();
>>>>>>>>> FileSystem hdfs = FileSystem.get(conf);
>>>>>>>>> hdfs.copyFromLocalFile(false, true, new Path(localFilePath),
>>>>>>>>>     new Path(hdfsFileDirectory));
>>>>>>>>>
>>>>>>>>> Do I need to add anything else to the classpath so the Hadoop
>>>>>>>>> libraries know that they need to talk to the dynamically launched
>>>>>>>>> cluster? When running Whirr from the command line I know it uses
>>>>>>>>> HADOOP_CONF_DIR to find the hadoop config files, but when doing
>>>>>>>>> the same from Java I am wondering how to solve this issue.
>>>>>>>>>
>>>>>>>>> Praveen
>
> --
> Andrei Savu -- andreisavu.ro
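One detail from the thread above is easy to misread in the quoted text: the override property Tom names must carry the whirr. prefix when it goes into hadoop.properties. Spelled out as a fragment (the URL is just the example value used in the thread, not a recommendation):

```properties
# A bare run-url-base line is ignored; the whirr. prefix is required.
whirr.run-url-base=http://localhost/
```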
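Tom's suggestion to set HADOOP_CONF_DIR and run the job in a new process can be sketched as a shell session. The cluster name myhadoopcluster and the jar/class names are made-up placeholders, not values from the thread:

```shell
# Whirr writes the client-side Hadoop config for a launched cluster under
# ~/.whirr/<cluster-name>. Pointing HADOOP_CONF_DIR there makes any Hadoop
# process started from this shell -- including jobs from external libraries
# such as Mahout -- target that cluster, with no code changes.
export HADOOP_CONF_DIR="$HOME/.whirr/myhadoopcluster"
echo "Submitting against config in: $HADOOP_CONF_DIR"

# With a live cluster one would then run, e.g.:
#   hadoop jar my-job.jar com.example.MyDriver input output
# (left as a comment here since it needs the cluster and the hadoop CLI)
```

This is the same mechanism the Whirr CLI relies on, which is why it also works for jobs whose code you cannot modify to implement Tool.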