Yes, I modified the post_configure script. I modified 3 different properties and saw that the other two are overwritten, but not the mapred.child.java.opts property.
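For reference, an override of the kind described above usually ends up as a property written into hadoop-site.xml on each node. The sketch below is illustrative only (the paths and the hand-written XML are assumptions, not the actual Whirr post-configure script):

```shell
#!/bin/sh
# Sketch of how a post-configure-style script can write the
# mapred.child.java.opts override into hadoop-site.xml.
# CONF_DIR is an illustrative stand-in for $HADOOP_HOME/conf.
CHILD_OPTS="${CHILD_OPTS:--Xmx1700m}"
CONF_DIR="${CONF_DIR:-/tmp/hadoop-conf-demo}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/hadoop-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>$CHILD_OPTS</value>
  </property>
</configuration>
EOF

# Verify the value actually landed in the file:
grep -A 1 "mapred.child.java.opts" "$CONF_DIR/hadoop-site.xml"
```

If a later configuration step rewrites hadoop-site.xml with defaults, an override like this is silently lost, which would match the symptom of the value reverting to -Xmx200m.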
Praveen

On Feb 3, 2011, at 12:33 PM, ext Andrei Savu <savu.and...@gmail.com> wrote:

Here are the relevant lines from the install scripts:

HADOOP_VERSION=${HADOOP_VERSION:-0.20.2}
HADOOP_HOME=/usr/local/hadoop-$HADOOP_VERSION
HADOOP_CONF_DIR=$HADOOP_HOME/conf

Have you tried changing CHILD_OPTS in apache/hadoop/post-configure and using that custom script to deploy a cluster? I don't have a running cluster to check this now.

On Thu, Feb 3, 2011 at 7:18 PM, <praveen.pe...@nokia.com> wrote:

Hi Tom/all,
Where are the hadoop config files stored on the cluster nodes? I would like to debug this issue, since I need to give more memory to the child Java mapred processes to process huge chunks of data.
Thanks
Praveen

-----Original Message-----
From: ext praveen.pe...@nokia.com [mailto:praveen.pe...@nokia.com]
Sent: Wednesday, February 02, 2011 5:23 PM
To: whirr-user@incubator.apache.org
Subject: RE: Running Mapred jobs after launching cluster

Can anyone think of a reason why the property below is not honoured when I overwrote it along with other properties in post_configure? The other properties are correctly overwritten except this one. I need to set the mapred tasks' JVM to bigger than 200m.

Praveen

________________________________________
From: Peddi Praveen (Nokia-MS/Boston)
Sent: Tuesday, February 01, 2011 11:21 AM
To: whirr-user@incubator.apache.org
Subject: RE: Running Mapred jobs after launching cluster

Thanks Tom. Silly me, I should have thought of the property name. It works now except for one issue: I ran the wordcount example and saw that the number
of map and reduce tasks are as I configured in the post_configure script, but for some reason the property below in job.xml is always -Xmx200m even though I set it to -Xmx1700m. Not sure if this property is special in some way.

mapred.child.java.opts    -Xmx200m

Praveen

________________________________________
From: ext Tom White [tom.e.wh...@gmail.com]
Sent: Tuesday, February 01, 2011 12:13 AM
To: whirr-user@incubator.apache.org
Subject: Re: Running Mapred jobs after launching cluster

Try setting whirr.run-url-base, not run-url-base.

Tom

On Mon, Jan 31, 2011 at 5:33 PM, <praveen.pe...@nokia.com> wrote:
> I am not using cdh (for now anyway) but the default hadoop. I even changed
> the "localhost" to the IP address and still no luck. It is likely that I am doing
> something wrong, but I'm having a hard time debugging.
> Here are the properties I changed in /var/www/apache/hadoop/post-configure,
> but when I run the job I am not seeing these values.
> MAX_MAP_TASKS=16
> MAX_REDUCE_TASKS=24
> CHILD_OPTS=-Xmx1700m
>
> Here is what I see in /tmp/runscript/runscript.sh on the master node. It doesn't
> look like it used my scripts...
>
> installRunUrl || exit 1
> runurl http://whirr.s3.amazonaws.com/0.3.0-incubating-SNAPSHOT/util/configure-hostnames -c cloudservers
> runurl http://whirr.s3.amazonaws.com/0.3.0-incubating-SNAPSHOT/sun/java/install
> runurl http://whirr.s3.amazonaws.com/0.3.0-incubating-SNAPSHOT/apache/hadoop/install -c cloudservers
>
> Any suggestions?
> Praveen
> ________________________________________
> From: ext Tom White [tom.e.wh...@gmail.com]
> Sent: Monday, January 31, 2011 6:23 PM
> To: whirr-user@incubator.apache.org
> Subject: Re: Running Mapred jobs after launching cluster
>
> On Mon, Jan 31, 2011 at 3:03 PM, <praveen.pe...@nokia.com> wrote:
>> If I have to upload the files to a webserver anyway, do I still need the
>> patch? It looks like the script has these properties that I can
>> overwrite.
>
> I suggested you look at the patch (WHIRR-55) so you can see how it
> will be possible once it's committed. To try it out you need to upload
> the scripts to a webserver (since the patch changes one of them).
>
>>
>> BTW I tried with the webserver path and I could not make it work so far.
>>
>> 1. I copied the scripts/apache folder to my /var/www folder and modified the below 3
>> properties in /var/www/apache/hadoop/post-configure.
>> 2. I changed hadoop.properties and added the following line:
>> run-url-base=http://localhost/
>> 3. Launched the cluster and verified the job properties are not what I
>> changed them to. They are all defaults.
>
> This looks right to me. If you are using CDH you need to change
> cloudera/cdh/post-configure.
>
>>
>> How do I debug this issue?
>
> You can log into the instances (see the FAQ for how to do this) and
> look at the scripts that actually ran (and their output) in the /tmp
> directory.
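A condensed sketch of the webserver workflow in the numbered steps above, using /tmp stand-ins for /var/www and hadoop.properties. Note that, per Tom's earlier reply in the thread, the property key must be whirr.run-url-base rather than run-url-base:

```shell
#!/bin/sh
# Sketch of serving modified Whirr install scripts from a local webserver.
# DOCROOT and the properties file path are illustrative stand-ins.
DOCROOT="/tmp/www-demo"
mkdir -p "$DOCROOT/apache/hadoop"

# Step 1: put the modified post-configure (with the overridden values)
# under the webserver document root:
cat > "$DOCROOT/apache/hadoop/post-configure" <<'EOF'
MAX_MAP_TASKS=16
MAX_REDUCE_TASKS=24
CHILD_OPTS=-Xmx1700m
EOF

# Step 2: point Whirr at the webserver; the key needs the whirr. prefix:
echo "whirr.run-url-base=http://localhost/" > /tmp/hadoop.properties.demo

cat /tmp/hadoop.properties.demo
```

With the wrong key (run-url-base), Whirr falls back to the default S3 script URLs, which matches the runscript.sh contents shown earlier in the thread.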
>
>
> Tom
>
>>
>> Praveen
>>
>>
>> Launched the cluster and I didn't see the child JVM get the 2G allocation.
>> -----Original Message-----
>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>> Sent: Monday, January 31, 2011 3:02 PM
>> To: whirr-user@incubator.apache.org
>> Subject: Re: Running Mapred jobs after launching cluster
>>
>> Hi Praveen,
>>
>> I think removing the webserver dependency (or making it optional)
>> would be a good goal, but we're not there yet. I've just created
>> https://issues.apache.org/jira/browse/WHIRR-225 as a place to discuss
>> the design and implementation.
>>
>> In the meantime you could take a look at
>> https://issues.apache.org/jira/browse/WHIRR-55, and try using the patch
>> there to override some Hadoop properties (you will still need to upload
>> the scripts to a webserver, however, until it is committed, since it
>> modifies Hadoop's post-configure script).
>>
>> Hope this helps.
>>
>> Cheers,
>> Tom
>>
>> BTW what are the security concerns you have? There are no credentials
>> embedded in the scripts, so it should be safe to host them publicly, no?
>>
>> On Mon, Jan 31, 2011 at 11:00 AM, <praveen.pe...@nokia.com> wrote:
>>> Hi Tom,
>>> If the hadoop install is fixed, Whirr must be getting all the default hadoop
>>> properties from the hadoop install itself, correct? I sent an email about
>>> configuring hadoop properties and you mentioned I need to put the modified
>>> scripts on a webserver that is publicly accessible. I was wondering if
>>> there is a place inside the hadoop install I can change, so that I don't need to
>>> put the scripts on a webserver (for security reasons). Do you think it is
>>> possible? If so, how?
>>> I do not mind customizing the jar file for our
>>> purposes. I want to change the following properties:
>>>
>>> mapred.reduce.tasks=24
>>> mapred.map.tasks=64
>>> mapred.child.java.opts=-Xmx2048m
>>>
>>> Thanks in advance.
>>> Praveen
>>>
>>> -----Original Message-----
>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>> Sent: Friday, January 28, 2011 4:02 PM
>>> To: whirr-user@incubator.apache.org
>>> Subject: Re: Running Mapred jobs after launching cluster
>>>
>>> It is fixed, and currently on 0.20.2. It will be made configurable in
>>> https://issues.apache.org/jira/browse/WHIRR-222.
>>>
>>> Cheers
>>> Tom
>>>
>>> On Fri, Jan 28, 2011 at 12:56 PM, <praveen.pe...@nokia.com> wrote:
>>>> Hi Tom,
>>>> So the hadoop version is not going to change for a given Whirr install? I
>>>> thought Whirr was getting the hadoop install dynamically from a URL which is
>>>> always going to have the latest hadoop version. If that is not the case I
>>>> guess I am fine. I just don't want to get a hadoop version mismatch 6 months
>>>> after our software is released just because a new hadoop version got
>>>> released.
>>>>
>>>> Thanks
>>>> Praveen
>>>>
>>>> -----Original Message-----
>>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>>> Sent: Friday, January 28, 2011 3:35 PM
>>>> To: whirr-user@incubator.apache.org
>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>
>>>> On Fri, Jan 28, 2011 at 12:06 PM, <praveen.pe...@nokia.com> wrote:
>>>>> Thanks Tom.
>>>>> I think I got it working with my own driver, so I will go with
>>>>> it for now (unless that proves to be a bad option).
>>>>>
>>>>> BTW, could you tell me how to stick with one hadoop version while
>>>>> launching a cluster? I have hadoop-0.20.2 in my classpath but it looks
>>>>> like Whirr gets the latest hadoop from the repository. Since the latest
>>>>> version may be different depending on the time, I would like to stick to
>>>>> one version so that a hadoop version mismatch won't happen.
>>>>
>>>> You do need to make sure that the versions are the same. See the Hadoop
>>>> integration tests, which specify the version of Hadoop to use in their POM.
>>>>
>>>>>
>>>>> Also, what jar files are necessary for launching a cluster using Java?
>>>>> Currently I have the CLI version of the jar file, but that's way too large
>>>>> since it has everything in it.
>>>>
>>>> You need Whirr's core and Hadoop jars, as well as their dependencies.
>>>> If you look at the POMs in the source code they will tell you the
>>>> dependencies.
>>>>
>>>> Cheers
>>>> Tom
>>>>
>>>>>
>>>>> Thanks
>>>>> Praveen
>>>>>
>>>>> -----Original Message-----
>>>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>>>> Sent: Friday, January 28, 2011 2:12 PM
>>>>> To: whirr-user@incubator.apache.org
>>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>>
>>>>> On Fri, Jan 28, 2011 at 6:28 AM, <praveen.pe...@nokia.com> wrote:
>>>>>> Thanks Tom. Could you elaborate a little more on the second option?
>>>>>>
>>>>>> What is the HADOOP_CONF_DIR here, after launching the cluster?
>>>>>
>>>>> ~/.whirr/<cluster-name>
>>>>>
>>>>>> When you said run in a new process, did you mean using the command-line
>>>>>> Whirr tool?
>>>>>
>>>>> I meant that you could launch Whirr using the CLI, or Java.
>>>>> Then run the
>>>>> job in another process, with HADOOP_CONF_DIR set.
>>>>>
>>>>> The MR jobs you are running can, I assume, be run against an arbitrary
>>>>> cluster, so you should be able to point them at a cluster started by
>>>>> Whirr.
>>>>>
>>>>> Tom
>>>>>
>>>>>>
>>>>>> I may finally end up writing my own driver for running external mapred
>>>>>> jobs so I can have more control, but I was just curious to know if option
>>>>>> #2 is better than writing my own driver.
>>>>>>
>>>>>> Praveen
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: ext Tom White [mailto:t...@cloudera.com]
>>>>>> Sent: Thursday, January 27, 2011 4:01 PM
>>>>>> To: whirr-user@incubator.apache.org
>>>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>>>
>>>>>> If they implement the Tool interface then you can set configuration on
>>>>>> them. Failing that, you could set HADOOP_CONF_DIR and run them in a new
>>>>>> process.
>>>>>>
>>>>>> Cheers,
>>>>>> Tom
>>>>>>
>>>>>> On Thu, Jan 27, 2011 at 12:52 PM, <praveen.pe...@nokia.com> wrote:
>>>>>>> Hmm...
>>>>>>> I am running some map reduce jobs written by me, but some of them
>>>>>>> are in external libraries (e.g. Mahout) which I don't have control over.
>>>>>>> Since I can't modify the code in external libraries, is there any other
>>>>>>> way to make this work?
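For jobs in external libraries such as the Mahout case above, the HADOOP_CONF_DIR route Tom describes needs no code changes at all. A sketch, using a hypothetical cluster name:

```shell
#!/bin/sh
# Sketch: run an unmodified external MapReduce job against a Whirr-launched
# cluster by pointing HADOOP_CONF_DIR at the client config Whirr writes
# locally. "myhadoopcluster" is a hypothetical cluster name.
export HADOOP_CONF_DIR="$HOME/.whirr/myhadoopcluster"
echo "Hadoop clients in this process will read: $HADOOP_CONF_DIR"
# With that set, something like the following would run against the remote
# cluster (command shown only, since it needs a live cluster):
# hadoop jar mahout-examples.jar <driver-class> <args>
```

Because the environment variable is read by the hadoop launcher itself, this works even for job drivers that do not implement the Tool interface.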
>>>>>>>
>>>>>>> Praveen
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>>>>>> Sent: Thursday, January 27, 2011 3:42 PM
>>>>>>> To: whirr-user@incubator.apache.org
>>>>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>>>>
>>>>>>> You don't need to add anything to the classpath, but you need to use
>>>>>>> the configuration in the org.apache.whirr.service.Cluster object to
>>>>>>> populate your Hadoop Configuration object so that your code knows which
>>>>>>> cluster to connect to. See the getConfiguration() method in
>>>>>>> HadoopServiceController for how to do this.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Tom
>>>>>>>
>>>>>>> On Thu, Jan 27, 2011 at 12:21 PM, <praveen.pe...@nokia.com> wrote:
>>>>>>>> Hello all,
>>>>>>>> I wrote a java class HadoopLanucher that is very similar to
>>>>>>>> HadoopServiceController. I was successfully able to launch a
>>>>>>>> cluster programmatically from my application using Whirr. Now I
>>>>>>>> want to copy files to hdfs and also run a job programmatically.
>>>>>>>>
>>>>>>>> When I copy a file to hdfs it's copying to the local file system, not hdfs.
>>>>>>>> Here is the code I used:
>>>>>>>>
>>>>>>>> Configuration conf = new Configuration();
>>>>>>>> FileSystem hdfs = FileSystem.get(conf);
>>>>>>>> hdfs.copyFromLocalFile(false, true, new Path(localFilePath), new Path(hdfsFileDirectory));
>>>>>>>>
>>>>>>>> Do I need to add anything else to the classpath so the Hadoop
>>>>>>>> libraries know that they need to talk to the dynamically launched
>>>>>>>> cluster?
>>>>>>>> When running Whirr from the command line I know it uses
>>>>>>>> HADOOP_CONF_DIR to find the hadoop config files, but when doing
>>>>>>>> the same from Java I am wondering how to solve this issue.
>>>>>>>>
>>>>>>>> Praveen
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

--
Andrei Savu -- andreisavu.ro
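As a closing cross-check for the HDFS-copy question in the thread: from a shell, the equivalent of populating the Configuration object is to hand the hadoop client the Whirr-written config directory with --config, so it picks up the cluster's filesystem settings instead of defaulting to the local filesystem. The cluster name and paths below are hypothetical:

```shell
#!/bin/sh
# Command-line counterpart of the programmatic copy discussed above.
# "myhadoopcluster" and the file paths are hypothetical.
CONF_DIR="$HOME/.whirr/myhadoopcluster"
# --config makes bin/hadoop read $CONF_DIR instead of its default conf dir:
echo "hadoop --config $CONF_DIR fs -put /local/file /user/praveen/"
# (Command echoed rather than executed, since it needs a live cluster.)
```

In Java, the analogous step is copying the relevant properties from the Whirr Cluster object into the Configuration before calling FileSystem.get, as Tom's reply describes.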