Here are the relevant lines from the install scripts:

  HADOOP_VERSION=${HADOOP_VERSION:-0.20.2}
  HADOOP_HOME=/usr/local/hadoop-$HADOOP_VERSION
  HADOOP_CONF_DIR=$HADOOP_HOME/conf

Have you tried changing CHILD_OPTS in apache/hadoop/post-configure and
using that custom script to deploy a cluster? I don't have a running
cluster to check this right now.
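So with the default version above, the config files on each node end up
under /usr/local/hadoop-0.20.2/conf. Untested, but the change I have in
mind is just editing the variables your earlier mail (quoted below)
already identified in apache/hadoop/post-configure, e.g.:

  # modified apache/hadoop/post-configure -- untested sketch,
  # values taken from earlier in this thread
  MAX_MAP_TASKS=16
  MAX_REDUCE_TASKS=24
  CHILD_OPTS=-Xmx1700m

and then checking the generated hadoop-site.xml on a node to see whether
CHILD_OPTS actually makes it into mapred.child.java.opts.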
On Thu, Feb 3, 2011 at 7:18 PM, <praveen.pe...@nokia.com> wrote:
> Hi Tom/all,
> Where are the hadoop config files stored on the cluster nodes? I would
> like to debug this issue, since I need to give more memory to the child
> Java mapred processes so they can process huge chunks of data.
>
> Thanks
> Praveen
>
> -----Original Message-----
> From: ext praveen.pe...@nokia.com [mailto:praveen.pe...@nokia.com]
> Sent: Wednesday, February 02, 2011 5:23 PM
> To: whirr-user@incubator.apache.org
> Subject: RE: Running Mapred jobs after launching cluster
>
> Can anyone think of a reason why the property below is not honoured when
> I overwrote it along with other properties in post-configure? The other
> properties are correctly overwritten except this one. I need to set the
> mapred child task JVM heap to more than 200m.
>
> Praveen
> ________________________________________
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Tuesday, February 01, 2011 11:21 AM
> To: whirr-user@incubator.apache.org
> Subject: RE: Running Mapred jobs after launching cluster
>
> Thanks Tom. Silly me, I should have thought of the property name. It
> works now except for one issue: I ran the wordcount example and saw that
> the number of map and reduce tasks is as I configured in the
> post-configure script, but for some reason the property below in job.xml
> is always -Xmx200m even though I set it to -Xmx1700m. Not sure if this
> property is special in some way.
>
>   mapred.child.java.opts  -Xmx200m
>
> Praveen
> ________________________________________
> From: ext Tom White [tom.e.wh...@gmail.com]
> Sent: Tuesday, February 01, 2011 12:13 AM
> To: whirr-user@incubator.apache.org
> Subject: Re: Running Mapred jobs after launching cluster
>
> Try setting whirr.run-url-base, not run-url-base.
>
> Tom
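(Side note for the archives: with the scripts copied to a local
webserver, hadoop.properties needs the whirr.-prefixed property name,

  whirr.run-url-base=http://localhost/

rather than the bare run-url-base=... line used in the earlier attempt
quoted below.)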
> On Mon, Jan 31, 2011 at 5:33 PM, <praveen.pe...@nokia.com> wrote:
> > I am not using cdh (for now anyway) but the default hadoop. I even
> > changed the "localhost" to the IP address and still no luck. It is
> > likely that I am doing something wrong, but I'm having a hard time
> > debugging it. Here are the properties I changed in
> > /var/www/apache/hadoop/post-configure, but when I run the job I am
> > not seeing these values:
> >
> >   MAX_MAP_TASKS=16
> >   MAX_REDUCE_TASKS=24
> >   CHILD_OPTS=-Xmx1700m
> >
> > Here is what I see in /tmp/runscript/runscript.sh on the master node.
> > It doesn't look like it used my scripts...
> >
> >   installRunUrl || exit 1
> >   runurl http://whirr.s3.amazonaws.com/0.3.0-incubating-SNAPSHOT/util/configure-hostnames -c cloudservers
> >   runurl http://whirr.s3.amazonaws.com/0.3.0-incubating-SNAPSHOT/sun/java/install
> >   runurl http://whirr.s3.amazonaws.com/0.3.0-incubating-SNAPSHOT/apache/hadoop/install -c cloudservers
> >
> > Any suggestions?
> > Praveen
> > ________________________________________
> > From: ext Tom White [tom.e.wh...@gmail.com]
> > Sent: Monday, January 31, 2011 6:23 PM
> > To: whirr-user@incubator.apache.org
> > Subject: Re: Running Mapred jobs after launching cluster
> >
> > On Mon, Jan 31, 2011 at 3:03 PM, <praveen.pe...@nokia.com> wrote:
> >> If I have to upload the files to a webserver anyway, do I still need
> >> the patch? It looks like the script has these properties that I can
> >> overwrite.
> >
> > I suggested you look at the patch (WHIRR-55) so you can see how it
> > will be possible once it's committed. To try it out you need to
> > upload the scripts to a webserver (since the patch changes one of
> > them).
> >
> >> BTW I tried the webserver path and I could not make it work so far.
> >>
> >> 1. I copied the scripts/apache folder to my /var/www folder and
> >>    modified the three properties mentioned above in
> >>    /var/www/apache/hadoop/post-configure.
> >> 2. I changed hadoop.properties, adding the following line:
> >>    run-url-base=http://localhost/
> >> 3. Launched the cluster and verified the job properties are not what
> >>    I changed them to. They are all defaults.
> >
> > This looks right to me. If you are using CDH you need to change
> > cloudera/cdh/post-configure.
> >
> >> How do I debug this issue?
> >
> > You can log into the instances (see the FAQ for how to do this) and
> > look at the scripts that actually ran (and their output) in the /tmp
> > directory.
> >
> > Tom
> >
> >> Praveen
> >>
> >> Launched the cluster and I didn't see the child JVM get the 2G
> >> allocation.
> >>
> >> -----Original Message-----
> >> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
> >> Sent: Monday, January 31, 2011 3:02 PM
> >> To: whirr-user@incubator.apache.org
> >> Subject: Re: Running Mapred jobs after launching cluster
> >>
> >> Hi Praveen,
> >>
> >> I think removing the webserver dependency (or making it optional)
> >> would be a good goal, but we're not there yet. I've just created
> >> https://issues.apache.org/jira/browse/WHIRR-225 as a place to
> >> discuss the design and implementation.
> >>
> >> In the meantime you could take a look at
> >> https://issues.apache.org/jira/browse/WHIRR-55, and try using the
> >> patch there to override some Hadoop properties (you will still need
> >> to upload the scripts to a webserver until it is committed, since it
> >> modifies Hadoop's post-configure script).
> >>
> >> Hope this helps.
> >>
> >> Cheers,
> >> Tom
> >>
> >> BTW what are the security concerns you have? There are no
> >> credentials embedded in the scripts, so it should be safe to host
> >> them publicly, no?
> >>
> >> On Mon, Jan 31, 2011 at 11:00 AM, <praveen.pe...@nokia.com> wrote:
> >>> Hi Tom,
> >>> If the hadoop install is fixed, Whirr must be getting all the
> >>> default hadoop properties from the hadoop install itself, correct?
> >>> I sent an email about configuring hadoop properties and you
> >>> mentioned I need to put the modified scripts on a webserver that is
> >>> publicly accessible. I was wondering if there is a place inside the
> >>> hadoop install I can change so that I don't need to put the scripts
> >>> on a webserver (for security reasons). Do you think it is possible?
> >>> If so, how? I do not mind customizing the jar file for our
> >>> purposes. I want to change the following properties:
> >>>
> >>>   mapred.reduce.tasks=24
> >>>   mapred.map.tasks=64
> >>>   mapred.child.java.opts=-Xmx2048m
> >>>
> >>> Thanks in advance.
> >>> Praveen
> >>>
> >>> -----Original Message-----
> >>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
> >>> Sent: Friday, January 28, 2011 4:02 PM
> >>> To: whirr-user@incubator.apache.org
> >>> Subject: Re: Running Mapred jobs after launching cluster
> >>>
> >>> It is fixed, and currently on 0.20.2. It will be made configurable
> >>> in https://issues.apache.org/jira/browse/WHIRR-222.
> >>>
> >>> Cheers
> >>> Tom
> >>>
> >>> On Fri, Jan 28, 2011 at 12:56 PM, <praveen.pe...@nokia.com> wrote:
> >>>> Hi Tom,
> >>>> So the hadoop version is not going to change for a given Whirr
> >>>> install? I thought Whirr was getting the hadoop install
> >>>> dynamically from a URL that would always have the latest hadoop
> >>>> version. If that is not the case I guess I am fine. I just don't
> >>>> want to get a hadoop version mismatch six months after our
> >>>> software is released just because a new hadoop version came out.
> >>>>
> >>>> Thanks
> >>>> Praveen
> >>>>
> >>>> -----Original Message-----
> >>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
> >>>> Sent: Friday, January 28, 2011 3:35 PM
> >>>> To: whirr-user@incubator.apache.org
> >>>> Subject: Re: Running Mapred jobs after launching cluster
> >>>>
> >>>> On Fri, Jan 28, 2011 at 12:06 PM, <praveen.pe...@nokia.com> wrote:
> >>>>> Thanks Tom. I think I got it working with my own driver, so I
> >>>>> will go with that for now (unless it proves to be a bad option).
> >>>>>
> >>>>> BTW, could you tell me how to stick with one hadoop version while
> >>>>> launching the cluster? I have hadoop-0.20.2 in my classpath but
> >>>>> it looks like Whirr gets the latest hadoop from the repository.
> >>>>> Since the latest version may change over time, I would like to
> >>>>> pin one version so that a hadoop version mismatch won't happen.
> >>>>
> >>>> You do need to make sure that the versions are the same. See the
> >>>> Hadoop integration tests, which specify the version of Hadoop to
> >>>> use in their POM.
> >>>>
> >>>>> Also, what jar files are necessary for launching a cluster using
> >>>>> Java? Currently I have the CLI version of the jar file, but
> >>>>> that's way too large since it has everything in it.
> >>>>
> >>>> You need Whirr's core and Hadoop jars, as well as their
> >>>> dependencies. If you look at the POMs in the source code they will
> >>>> tell you the dependencies.
> >>>>
> >>>> Cheers
> >>>> Tom
> >>>>
> >>>>> Thanks
> >>>>> Praveen
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
> >>>>> Sent: Friday, January 28, 2011 2:12 PM
> >>>>> To: whirr-user@incubator.apache.org
> >>>>> Subject: Re: Running Mapred jobs after launching cluster
> >>>>>
> >>>>> On Fri, Jan 28, 2011 at 6:28 AM, <praveen.pe...@nokia.com> wrote:
> >>>>>> Thanks Tom. Could you elaborate a little more on the second
> >>>>>> option? What is HADOOP_CONF_DIR here, after launching the
> >>>>>> cluster?
> >>>>>
> >>>>> ~/.whirr/<cluster-name>
> >>>>>
> >>>>>> When you said run in a new process, did you mean using the
> >>>>>> command-line Whirr tool?
> >>>>>
> >>>>> I meant that you could launch Whirr using the CLI, or Java. Then
> >>>>> run the job in another process, with HADOOP_CONF_DIR set.
> >>>>>
> >>>>> The MR jobs you are running can, I assume, be run against an
> >>>>> arbitrary cluster, so you should be able to point them at a
> >>>>> cluster started by Whirr.
> >>>>>
> >>>>> Tom
> >>>>>
> >>>>>> I may finally end up writing my own driver for running external
> >>>>>> mapred jobs so I can have more control, but I was just curious
> >>>>>> whether option #2 is better than writing my own driver.
> >>>>>>
> >>>>>> Praveen
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: ext Tom White [mailto:t...@cloudera.com]
> >>>>>> Sent: Thursday, January 27, 2011 4:01 PM
> >>>>>> To: whirr-user@incubator.apache.org
> >>>>>> Subject: Re: Running Mapred jobs after launching cluster
> >>>>>>
> >>>>>> If they implement the Tool interface then you can set
> >>>>>> configuration on them. Failing that you could set
> >>>>>> HADOOP_CONF_DIR and run them in a new process.
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Tom
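Putting Tom's two suggestions together: HADOOP_CONF_DIR ends up in
~/.whirr/<cluster-name> (see above), and any job whose driver implements
Tool accepts -D property overrides on the command line. So running an
unmodified job against a Whirr-launched cluster from a separate process
might look like this -- an untested sketch, with a made-up cluster name,
jar and class:

  # point the hadoop client at the config files Whirr wrote locally
  export HADOOP_CONF_DIR=~/.whirr/myhadoopcluster
  # Tool-based drivers pick up -D overrides via GenericOptionsParser
  hadoop jar my-job.jar com.example.MyJob \
      -D mapred.child.java.opts=-Xmx1700m input output

Mahout's job drivers implement Tool too, so the same -D mechanism should
work for them without touching their code.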
> >>>>>> On Thu, Jan 27, 2011 at 12:52 PM, <praveen.pe...@nokia.com> wrote:
> >>>>>>> Hmm...
> >>>>>>> Some of the map reduce jobs I am running were written by me,
> >>>>>>> but some are in external libraries (e.g. Mahout) which I don't
> >>>>>>> have control over. Since I can't modify the code in external
> >>>>>>> libraries, is there any other way to make this work?
> >>>>>>>
> >>>>>>> Praveen
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
> >>>>>>> Sent: Thursday, January 27, 2011 3:42 PM
> >>>>>>> To: whirr-user@incubator.apache.org
> >>>>>>> Subject: Re: Running Mapred jobs after launching cluster
> >>>>>>>
> >>>>>>> You don't need to add anything to the classpath, but you need
> >>>>>>> to use the configuration in the org.apache.whirr.service.Cluster
> >>>>>>> object to populate your Hadoop Configuration object so that
> >>>>>>> your code knows which cluster to connect to. See the
> >>>>>>> getConfiguration() method in HadoopServiceController for how to
> >>>>>>> do this.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Tom
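The key point is that a bare new Configuration() knows nothing about the
remote cluster, which is why the copy in the code quoted below lands on
the local file system. A minimal sketch of what populating the
configuration boils down to -- the host names and ports are placeholders;
the real values come from the Whirr Cluster object (per Tom's pointer to
HadoopServiceController.getConfiguration()) or from the files under
~/.whirr/<cluster-name>:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsCopy {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Placeholder addresses: take the real ones from the Whirr
      // Cluster object or the generated hadoop-site.xml.
      conf.set("fs.default.name", "hdfs://namenode-host:8020/");
      conf.set("mapred.job.tracker", "jobtracker-host:8021");
      // With fs.default.name set, this returns HDFS, not the local FS.
      FileSystem fs = FileSystem.get(conf);
      fs.copyFromLocalFile(false, true,
          new Path(args[0]), new Path(args[1]));
    }
  }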
> >>>>>>> On Thu, Jan 27, 2011 at 12:21 PM, <praveen.pe...@nokia.com> wrote:
> >>>>>>>> Hello all,
> >>>>>>>> I wrote a Java class, HadoopLauncher, that is very similar to
> >>>>>>>> HadoopServiceController. I was successfully able to launch a
> >>>>>>>> cluster programmatically from my application using Whirr. Now
> >>>>>>>> I want to copy files to HDFS and also run a job
> >>>>>>>> programmatically.
> >>>>>>>>
> >>>>>>>> When I copy a file to HDFS it ends up on the local file
> >>>>>>>> system, not HDFS. Here is the code I used:
> >>>>>>>>
> >>>>>>>>   Configuration conf = new Configuration();
> >>>>>>>>   FileSystem hdfs = FileSystem.get(conf);
> >>>>>>>>   hdfs.copyFromLocalFile(false, true, new Path(localFilePath),
> >>>>>>>>       new Path(hdfsFileDirectory));
> >>>>>>>>
> >>>>>>>> Do I need to add anything else to the classpath so the Hadoop
> >>>>>>>> libraries know to talk to the dynamically launched cluster?
> >>>>>>>> When running Whirr from the command line I know it uses
> >>>>>>>> HADOOP_CONF_DIR to find the hadoop config files, but I am
> >>>>>>>> wondering how to solve this when doing the same from Java.
> >>>>>>>>
> >>>>>>>> Praveen

--
Andrei Savu -- andreisavu.ro