After doing the search I found it in /etc/hadoop/conf/hadoop-site.xml, and the property is defined correctly, but all my mapred tasks still get a virtual memory of 500m even though I set the value to -Xmx1600m.
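The symptom above — a value set at launch time not reaching the job — is easier to reason about with a sketch of what the post-configure step does. This is a hypothetical simplification: the variable name CHILD_OPTS and the conf path mirror what is quoted later in this thread, but the real Whirr 0.3.0 post-configure template may differ.

```shell
#!/bin/sh
# Sketch: the CHILD_OPTS shell variable is expanded into hadoop-site.xml
# when the node is configured, so the value only takes effect if THIS
# script (and not a stock copy fetched from the default run-url-base)
# is the one the launcher actually runs on the node.
CHILD_OPTS="-Xmx1600m"
HADOOP_CONF_DIR="./conf"   # on a real node: /usr/local/hadoop-0.20.2/conf

mkdir -p "$HADOOP_CONF_DIR"
cat > "$HADOOP_CONF_DIR/hadoop-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>$CHILD_OPTS</value>
  </property>
</configuration>
EOF

# Sanity check that the value landed in the generated file:
grep -A 1 mapred.child.java.opts "$HADOOP_CONF_DIR/hadoop-site.xml"
```

Checking the generated hadoop-site.xml on the node this way distinguishes "my script never ran" from "my script ran but the job overrode the value".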
Praveen

On Feb 3, 2011, at 4:57 PM, ext Andrei Savu <savu.and...@gmail.com> wrote:

> Did you try to do a file search for *-site.xml? I will take a closer look
> at the install scripts.
>
> -original message-
> Subject: RE: Running Mapred jobs after launching cluster
> From: <praveen.pe...@nokia.com>
> Date: 03/02/2011 23:32
>
> Hi Andrei,
> I checked the /usr/local/hadoop-0.20.2/conf directory on the master node
> and all the *site.xml files are empty. I know some of the properties I
> changed in the post_configure script are working, but I wonder where that
> information is stored. Even the master and slaves files point to just
> localhost. It almost looks like this is not the conf directory.
>
> Praveen
>
> ________________________________
> From: ext praveen.pe...@nokia.com
> Sent: Thursday, February 03, 2011 1:07 PM
> To: whirr-user@incubator.apache.org
> Cc: whirr-user@incubator.apache.org
> Subject: Re: Running Mapred jobs after launching cluster
>
> Yes, I modified the post_configure script. I modified three different
> properties and saw that the other two are overwritten, but not the
> mapred.child.java.opts property.
>
> Praveen
>
> On Feb 3, 2011, at 12:33 PM, ext Andrei Savu <savu.and...@gmail.com> wrote:
>
> Here are the relevant lines from the install scripts:
>
> HADOOP_VERSION=${HADOOP_VERSION:-0.20.2}
> HADOOP_HOME=/usr/local/hadoop-$HADOOP_VERSION
> HADOOP_CONF_DIR=$HADOOP_HOME/conf
>
> Have you tried changing CHILD_OPTS in apache/hadoop/post-configure and
> using that custom script to deploy a cluster?
>
> I don't have a running cluster to check this now.
>
> On Thu, Feb 3, 2011 at 7:18 PM, <praveen.pe...@nokia.com> wrote:
> Hi Tom/all,
> Where are the hadoop config files stored on the cluster nodes?
> I would like to debug this issue since I need to give more memory to the
> child java mapred processes to process huge chunks of data.
>
> Thanks
> Praveen
>
> -----Original Message-----
> From: ext praveen.pe...@nokia.com
> Sent: Wednesday, February 02, 2011 5:23 PM
> To: whirr-user@incubator.apache.org
> Subject: RE: Running Mapred jobs after launching cluster
>
> Can anyone think of a reason why the below property is not honoured when I
> overwrote it along with other properties in post_configure? The other
> properties are correctly overwritten except this one. I need to set the
> mapred tasks' JVM to bigger than 200m.
>
> Praveen
> ________________________________________
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Tuesday, February 01, 2011 11:21 AM
> To: whirr-user@incubator.apache.org
> Subject: RE: Running Mapred jobs after launching cluster
>
> Thanks Tom. Silly me, I should have thought of the property name. It works
> now except for one issue: I ran the wordcount example and saw that the
> number of map and reduce tasks is as I configured in the post_configure
> script, but for some reason the below property in job.xml is always
> -Xmx200m even though I set it to -Xmx1700m. Not sure if this property is
> special in some way.
> mapred.child.java.opts    -Xmx200m
>
> Praveen
> ________________________________________
> From: ext Tom White [tom.e.wh...@gmail.com]
> Sent: Tuesday, February 01, 2011 12:13 AM
> To: whirr-user@incubator.apache.org
> Subject: Re: Running Mapred jobs after launching cluster
>
> Try setting whirr.run-url-base, not run-url-base.
>
> Tom
>
> On Mon, Jan 31, 2011 at 5:33 PM, <praveen.pe...@nokia.com> wrote:
>> I am not using CDH (for now anyway) but the default hadoop. I even
>> changed "localhost" to the IP address and still no luck. It's likely that
>> I am doing something wrong, but I'm having a hard time debugging.
>> Here are the properties I changed in /var/www/apache/hadoop/post-configure,
>> but when I run the job I am not seeing these values:
>>
>> MAX_MAP_TASKS=16
>> MAX_REDUCE_TASKS=24
>> CHILD_OPTS=-Xmx1700m
>>
>> Here is what I see in /tmp/runscript/runscript.sh on the master node. It
>> doesn't look like it used my scripts...
>>
>> installRunUrl || exit 1
>> runurl http://whirr.s3.amazonaws.com/0.3.0-incubating-SNAPSHOT/util/configure-hostnames -c cloudservers
>> runurl http://whirr.s3.amazonaws.com/0.3.0-incubating-SNAPSHOT/sun/java/install
>> runurl http://whirr.s3.amazonaws.com/0.3.0-incubating-SNAPSHOT/apache/hadoop/install -c cloudservers
>>
>> Any suggestions?
>> Praveen
>> ________________________________________
>> From: ext Tom White [tom.e.wh...@gmail.com]
>> Sent: Monday, January 31, 2011 6:23 PM
>> To: whirr-user@incubator.apache.org
>> Subject: Re: Running Mapred jobs after launching cluster
>>
>> On Mon, Jan 31, 2011 at 3:03 PM, <praveen.pe...@nokia.com> wrote:
>>> If I have to upload the files to a webserver anyway, do I still need the
>>> patch? It looks like the script has these properties that I can
>>> overwrite.
>>
>> I suggested you look at the patch (WHIRR-55) so you can see how it
>> will be possible once it's committed. To try it out you need to upload
>> the scripts to a webserver (since the patch changes one of them).
>>
>>>
>>> BTW I tried with the webserver path and could not make it work so far.
>>>
>>> 1. I copied the scripts/apache folder to my /var/www folder and modified
>>>    the 3 properties below in /var/www/apache/hadoop/post-configure.
>>> 2. I changed hadoop.properties and added the following line:
>>>    run-url-base=http://localhost/
>>> 3. Launched the cluster and verified that the job properties are not
>>>    what I changed them to. They are all defaults.
>>
>> This looks right to me. If you are using CDH you need to change
>> cloudera/cdh/post-configure.
>>
>>>
>>> How do I debug this issue?
>>
>> You can log into the instances (see the FAQ for how to do this) and
>> look at the scripts that actually ran (and their output) in the /tmp
>> directory.
>>
>> Tom
>>
>>>
>>> Praveen
>>>
>>> Launched the cluster and I didn't see the child JVM get the 2G allocation.
>>>
>>> -----Original Message-----
>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>> Sent: Monday, January 31, 2011 3:02 PM
>>> To: whirr-user@incubator.apache.org
>>> Subject: Re: Running Mapred jobs after launching cluster
>>>
>>> Hi Praveen,
>>>
>>> I think removing the webserver dependency (or making it optional)
>>> would be a good goal, but we're not there yet. I've just created
>>> https://issues.apache.org/jira/browse/WHIRR-225 as a place to discuss
>>> the design and implementation.
>>>
>>> In the meantime you could take a look at
>>> https://issues.apache.org/jira/browse/WHIRR-55, and try using the patch
>>> there to override some Hadoop properties (you will still need to upload
>>> the scripts to a webserver until it is committed, since it modifies
>>> Hadoop's post-configure script).
>>>
>>> Hope this helps.
>>>
>>> Cheers,
>>> Tom
>>>
>>> BTW what are the security concerns you have? There are no credentials
>>> embedded in the scripts, so it should be safe to host them publicly, no?
>>>
>>> On Mon, Jan 31, 2011 at 11:00 AM, <praveen.pe...@nokia.com> wrote:
>>>> Hi Tom,
>>>> If the hadoop install is fixed, Whirr must be getting all the default
>>>> hadoop properties from the hadoop install itself, correct? I sent an
>>>> email about configuring hadoop properties and you mentioned I need to
>>>> put the modified scripts on a webserver that is publicly accessible. I
>>>> was wondering if there is a place inside the hadoop install I can
>>>> change so that I don't need to put the scripts on a webserver (for
>>>> security reasons).
>>>> Do you think it is possible? If so, how? I don't mind customizing the
>>>> jar file for our purposes. I want to change the following properties:
>>>>
>>>> mapred.reduce.tasks=24
>>>> mapred.map.tasks=64
>>>> mapred.child.java.opts=-Xmx2048m
>>>>
>>>> Thanks in advance.
>>>> Praveen
>>>>
>>>> -----Original Message-----
>>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>>> Sent: Friday, January 28, 2011 4:02 PM
>>>> To: whirr-user@incubator.apache.org
>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>
>>>> It is fixed, and currently on 0.20.2. It will be made configurable in
>>>> https://issues.apache.org/jira/browse/WHIRR-222.
>>>>
>>>> Cheers
>>>> Tom
>>>>
>>>> On Fri, Jan 28, 2011 at 12:56 PM, <praveen.pe...@nokia.com> wrote:
>>>>> Hi Tom,
>>>>> So the hadoop version is not going to change for a given Whirr
>>>>> install? I thought Whirr was getting the hadoop install dynamically
>>>>> from a URL that is always going to have the latest hadoop version. If
>>>>> that is not the case, I guess I am fine. I just don't want to get a
>>>>> hadoop version mismatch six months after our software is released
>>>>> just because a new hadoop version got released.
>>>>>
>>>>> Thanks
>>>>> Praveen
>>>>>
>>>>> -----Original Message-----
>>>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>>>> Sent: Friday, January 28, 2011 3:35 PM
>>>>> To: whirr-user@incubator.apache.org
>>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>>
>>>>> On Fri, Jan 28, 2011 at 12:06 PM, <praveen.pe...@nokia.com> wrote:
>>>>>> Thanks Tom. I think I got it working with my own driver so I will go
>>>>>> with that for now (unless it proves to be a bad option).
>>>>>>
>>>>>> BTW, could you tell me how to stick with one hadoop version while
>>>>>> launching a cluster? I have hadoop-0.20.2 in my classpath, but it
>>>>>> looks like Whirr gets the latest hadoop from the repository. Since
>>>>>> the latest version may differ over time, I would like to stick to one
>>>>>> version so that a hadoop version mismatch won't happen.
>>>>>
>>>>> You do need to make sure that the versions are the same. See the
>>>>> Hadoop integration tests, which specify the version of Hadoop to use
>>>>> in their POM.
>>>>>
>>>>>>
>>>>>> Also, what jar files are necessary for launching a cluster using
>>>>>> Java? Currently I have the CLI version of the jar file, but that's
>>>>>> way too large since it has everything in it.
>>>>>
>>>>> You need Whirr's core and Hadoop jars, as well as their dependencies.
>>>>> If you look at the POMs in the source code they will tell you the
>>>>> dependencies.
>>>>>
>>>>> Cheers
>>>>> Tom
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Praveen
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>>>>> Sent: Friday, January 28, 2011 2:12 PM
>>>>>> To: whirr-user@incubator.apache.org
>>>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>>>
>>>>>> On Fri, Jan 28, 2011 at 6:28 AM, <praveen.pe...@nokia.com> wrote:
>>>>>>> Thanks Tom. Could you elaborate a little more on the second option?
>>>>>>>
>>>>>>> What is the HADOOP_CONF_DIR here, after launching the cluster?
>>>>>>
>>>>>> ~/.whirr/<cluster-name>
>>>>>>
>>>>>>> When you said run in a new process, did you mean using the
>>>>>>> command-line Whirr tool?
>>>>>>
>>>>>> I meant that you could launch Whirr using the CLI, or Java. Then run
>>>>>> the job in another process, with HADOOP_CONF_DIR set.
>>>>>>
>>>>>> The MR jobs you are running can, I assume, be run against an
>>>>>> arbitrary cluster, so you should be able to point them at a cluster
>>>>>> started by Whirr.
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>>>
>>>>>>> I may end up writing my own driver for running external mapred jobs
>>>>>>> so I can have more control, but I was just curious to know if option
>>>>>>> #2 is better than writing my own driver.
>>>>>>>
>>>>>>> Praveen
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: ext Tom White [mailto:t...@cloudera.com]
>>>>>>> Sent: Thursday, January 27, 2011 4:01 PM
>>>>>>> To: whirr-user@incubator.apache.org
>>>>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>>>>
>>>>>>> If they implement the Tool interface then you can set configuration
>>>>>>> on them. Failing that, you could set HADOOP_CONF_DIR and run them in
>>>>>>> a new process.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Tom
>>>>>>>
>>>>>>> On Thu, Jan 27, 2011 at 12:52 PM, <praveen.pe...@nokia.com> wrote:
>>>>>>>> Hmm...
>>>>>>>> I am running some map reduce jobs written by me, but some of them
>>>>>>>> are in external libraries (e.g. Mahout) which I don't have control
>>>>>>>> over. Since I can't modify the code in external libraries, is there
>>>>>>>> any other way to make this work?
>>>>>>>>
>>>>>>>> Praveen
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>>>>>>> Sent: Thursday, January 27, 2011 3:42 PM
>>>>>>>> To: whirr-user@incubator.apache.org
>>>>>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>>>>>
>>>>>>>> You don't need to add anything to the classpath, but you need to
>>>>>>>> use the configuration in the org.apache.whirr.service.Cluster
>>>>>>>> object to populate your Hadoop Configuration object so that your
>>>>>>>> code knows which cluster to connect to. See the getConfiguration()
>>>>>>>> method in HadoopServiceController for how to do this.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Tom
>>>>>>>>
>>>>>>>> On Thu, Jan 27, 2011 at 12:21 PM, <praveen.pe...@nokia.com> wrote:
>>>>>>>>> Hello all,
>>>>>>>>> I wrote a java class HadoopLauncher that is very similar to
>>>>>>>>> HadoopServiceController. I was successfully able to launch a
>>>>>>>>> cluster programmatically from my application using Whirr. Now I
>>>>>>>>> want to copy files to HDFS and also run a job programmatically.
>>>>>>>>>
>>>>>>>>> When I copy a file to HDFS it is copied to the local file system,
>>>>>>>>> not HDFS. Here is the code I used:
>>>>>>>>>
>>>>>>>>> Configuration conf = new Configuration();
>>>>>>>>> FileSystem hdfs = FileSystem.get(conf);
>>>>>>>>> hdfs.copyFromLocalFile(false, true, new Path(localFilePath),
>>>>>>>>>     new Path(hdfsFileDirectory));
>>>>>>>>>
>>>>>>>>> Do I need to add anything else to the classpath so the Hadoop
>>>>>>>>> libraries know that they need to talk to the dynamically launched
>>>>>>>>> cluster? When running Whirr from the command line I know it uses
>>>>>>>>> HADOOP_CONF_DIR to find the hadoop config files, but when doing
>>>>>>>>> the same from Java I am wondering how to solve this issue.
>>>>>>>>>
>>>>>>>>> Praveen
>
> --
> Andrei Savu -- andreisavu.ro
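One detail from the thread above is easy to misread in the quoted text: the override property Tom names must carry the whirr. prefix when it goes into hadoop.properties. Spelled out as a fragment (the URL is just the example value used in the thread, not a recommendation):

```properties
# A bare run-url-base line is ignored; the whirr. prefix is required.
whirr.run-url-base=http://localhost/
```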
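Tom's suggestion to set HADOOP_CONF_DIR and run the job in a new process can be sketched as a shell session. The cluster name myhadoopcluster and the jar/class names are made-up placeholders, not values from the thread:

```shell
# Whirr writes the client-side Hadoop config for a launched cluster under
# ~/.whirr/<cluster-name>. Pointing HADOOP_CONF_DIR there makes any Hadoop
# process started from this shell -- including jobs from external libraries
# such as Mahout -- target that cluster, with no code changes.
export HADOOP_CONF_DIR="$HOME/.whirr/myhadoopcluster"
echo "Submitting against config in: $HADOOP_CONF_DIR"

# With a live cluster one would then run, e.g.:
#   hadoop jar my-job.jar com.example.MyDriver input output
# (left as a comment here since it needs the cluster and the hadoop CLI)
```

This is the same mechanism the Whirr CLI relies on, which is why it also works for jobs whose code you cannot modify to implement Tool.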