On Mon, Jan 31, 2011 at 3:03 PM,  <praveen.pe...@nokia.com> wrote:
> If I anyway have to upload the files to webservers, do I still need the
> patch then? It looks like the script has these properties that I can
> overwrite.

I suggested you look at the patch (WHIRR-55) so you can see how it will be
possible once it's committed. To try it out you need to upload the scripts
to a webserver (since the patch changes one of them).

> BTW I tried with the webserver path and I could not make it work so far.
>
> 1. I copied the scripts/apache folder to my /var/www folder and modified
> the three properties listed below in /var/www/apache/hadoop/post-configure.
> 2. I changed hadoop.properties, adding the following line:
> run-url-base=http://localhost/
> 3. Launched the cluster and verified that the job properties are not what
> I changed them to. They are all defaults.

This looks right to me. If you are using CDH you need to change
cloudera/cdh/post-configure.
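As a concrete sketch of the setup those steps describe (the host name below
is an assumption, and it needs to be reachable from the cluster instances
rather than localhost on each instance):

    # hadoop.properties (sketch) -- point Whirr at the webserver that
    # hosts the modified scripts; replace the host with your own
    run-url-base=http://myhost.example.com/

with the modified script then served at a URL such as
http://myhost.example.com/apache/hadoop/post-configure (or
cloudera/cdh/post-configure for CDH).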
> How do I debug this issue?

You can log into the instances (see the FAQ for how to do this) and look at
the scripts that actually ran (and their output) in the /tmp directory.

Tom

> Praveen
>
> Launched the cluster and I didn't see the child JVM have the 2G allocation.
>
> -----Original Message-----
> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
> Sent: Monday, January 31, 2011 3:02 PM
> To: whirr-user@incubator.apache.org
> Subject: Re: Running Mapred jobs after launching cluster
>
> Hi Praveen,
>
> I think removing the webserver dependency (or making it optional) would be
> a good goal, but we're not there yet. I've just created
> https://issues.apache.org/jira/browse/WHIRR-225 as a place to discuss the
> design and implementation.
>
> In the meantime you could take a look at
> https://issues.apache.org/jira/browse/WHIRR-55, and try using the patch
> there to override some Hadoop properties (until it is committed you will
> still need to upload the scripts to a webserver, since it modifies
> Hadoop's post-configure script).
>
> Hope this helps.
>
> Cheers,
> Tom
>
> BTW what are the security concerns you have? There are no credentials
> embedded in the scripts, so it should be safe to host them publicly, no?
>
> On Mon, Jan 31, 2011 at 11:00 AM,  <praveen.pe...@nokia.com> wrote:
>> Hi Tom,
>> If the Hadoop install is fixed, Whirr must be getting all the default
>> Hadoop properties from the Hadoop install itself, correct? I sent an
>> email about configuring Hadoop properties and you mentioned that I need
>> to put the modified scripts on a webserver that is publicly accessible.
>> I was wondering if there is a place inside the Hadoop install I can
>> change so that I don't need to put the scripts on a webserver (for
>> security reasons). Do you think it is possible? If so, how? I do not
>> mind customizing the jar file for our purposes. I want to change the
>> following properties:
>>
>> mapred.reduce.tasks=24
>> mapred.map.tasks=64
>> mapred.child.java.opts=-Xmx2048m
>>
>> Thanks in advance.
>> Praveen
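As an aside, the three settings listed above can also be applied per job
from the client side when a cluster-wide override isn't available; a
minimal sketch against the stock Hadoop 0.20 API (job setup and submission
omitted):

    import org.apache.hadoop.mapred.JobConf;

    // Per-job overrides (sketch): these affect only jobs submitted with
    // this JobConf, not the cluster-wide defaults in hadoop-site.xml.
    JobConf conf = new JobConf();
    conf.setInt("mapred.reduce.tasks", 24);
    conf.setInt("mapred.map.tasks", 64);
    conf.set("mapred.child.java.opts", "-Xmx2048m");
    // ... set input/output formats and paths, then submit as usual ...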
>>
>> -----Original Message-----
>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>> Sent: Friday, January 28, 2011 4:02 PM
>> To: whirr-user@incubator.apache.org
>> Subject: Re: Running Mapred jobs after launching cluster
>>
>> It is fixed, and currently on 0.20.2. It will be made configurable in
>> https://issues.apache.org/jira/browse/WHIRR-222.
>>
>> Cheers
>> Tom
>>
>> On Fri, Jan 28, 2011 at 12:56 PM,  <praveen.pe...@nokia.com> wrote:
>>> Hi Tom,
>>> So the Hadoop version is not going to change for a given Whirr install?
>>> I thought Whirr was getting the Hadoop install dynamically from a URL
>>> which is always going to have the latest Hadoop version. If that is not
>>> the case I guess I am fine. I just don't want to get a Hadoop version
>>> mismatch six months after our software is released, just because a new
>>> Hadoop version got released.
>>>
>>> Thanks
>>> Praveen
>>>
>>> -----Original Message-----
>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>> Sent: Friday, January 28, 2011 3:35 PM
>>> To: whirr-user@incubator.apache.org
>>> Subject: Re: Running Mapred jobs after launching cluster
>>>
>>> On Fri, Jan 28, 2011 at 12:06 PM,  <praveen.pe...@nokia.com> wrote:
>>>> Thanks Tom. I think I got it working with my own driver, so I will go
>>>> with it for now (unless that proves to be a bad option).
>>>>
>>>> BTW, could you tell me how to stick with one Hadoop version while
>>>> launching a cluster? I have hadoop-0.20.2 in my classpath but it looks
>>>> like Whirr gets the latest Hadoop from the repository. Since the
>>>> latest version may be different depending on the time, I would like to
>>>> stick to one version so that a Hadoop version mismatch won't happen.
>>>
>>> You do need to make sure that the versions are the same. See the Hadoop
>>> integration tests, which specify the version of Hadoop to use in their
>>> POM.
>>>
>>>> Also, what jar files are necessary for launching a cluster using Java?
>>>> Currently I have the CLI version of the jar file, but that's way too
>>>> large since it has everything in it.
>>>
>>> You need Whirr's core and Hadoop jars, as well as their dependencies.
>>> If you look at the POMs in the source code they will tell you the
>>> dependencies.
>>>
>>> Cheers
>>> Tom
>>>
>>>> Thanks
>>>> Praveen
>>>>
>>>> -----Original Message-----
>>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>>> Sent: Friday, January 28, 2011 2:12 PM
>>>> To: whirr-user@incubator.apache.org
>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>
>>>> On Fri, Jan 28, 2011 at 6:28 AM,  <praveen.pe...@nokia.com> wrote:
>>>>> Thanks Tom. Could you elaborate a little more on the second option?
>>>>>
>>>>> What is HADOOP_CONF_DIR here, after launching the cluster?
>>>>
>>>> ~/.whirr/<cluster-name>
>>>>
>>>>> When you said run in a new process, did you mean using the
>>>>> command-line Whirr tool?
>>>>
>>>> I meant that you could launch Whirr using the CLI, or Java. Then run
>>>> the job in another process, with HADOOP_CONF_DIR set.
>>>>
>>>> The MR jobs you are running can, I assume, be run against an arbitrary
>>>> cluster, so you should be able to point them at a cluster started by
>>>> Whirr.
>>>>
>>>> Tom
>>>>
>>>>> I may finally end up writing my own driver for running external
>>>>> mapred jobs so I can have more control, but I was just curious to
>>>>> know if option #2 is better than writing my own driver.
>>>>>
>>>>> Praveen
>>>>>
>>>>> -----Original Message-----
>>>>> From: ext Tom White [mailto:t...@cloudera.com]
>>>>> Sent: Thursday, January 27, 2011 4:01 PM
>>>>> To: whirr-user@incubator.apache.org
>>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>>
>>>>> If they implement the Tool interface then you can set configuration
>>>>> on them. Failing that, you could set HADOOP_CONF_DIR and run them in
>>>>> a new process.
>>>>>
>>>>> Cheers,
>>>>> Tom
>>>>>
>>>>> On Thu, Jan 27, 2011 at 12:52 PM,  <praveen.pe...@nokia.com> wrote:
>>>>>> Hmm...
>>>>>> I am running some map reduce jobs written by me, but some of them
>>>>>> are in external libraries (e.g. Mahout) which I don't have control
>>>>>> over. Since I can't modify the code in external libraries, is there
>>>>>> any other way to make this work?
>>>>>>
>>>>>> Praveen
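For reference, a minimal sketch of the Tool pattern Tom mentions, against
the Hadoop 0.20 API (the driver class name is made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Illustrative driver: ToolRunner runs a GenericOptionsParser over
    // the arguments first, so -D key=value (and -conf file) overrides are
    // already in the Configuration by the time run() is called.
    public class MyJobDriver extends Configured implements Tool {
      public int run(String[] args) throws Exception {
        Configuration conf = getConf(); // includes any -D overrides
        // ... build and submit the job from conf here ...
        return 0;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(),
            new MyJobDriver(), args));
      }
    }

Such a driver can then be invoked with, for example,
-D mapred.child.java.opts=-Xmx2048m on the hadoop jar command line, with no
code changes.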
>>>>>> -----Original Message-----
>>>>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>>>>> Sent: Thursday, January 27, 2011 3:42 PM
>>>>>> To: whirr-user@incubator.apache.org
>>>>>> Subject: Re: Running Mapred jobs after launching cluster
>>>>>>
>>>>>> You don't need to add anything to the classpath, but you need to use
>>>>>> the configuration in the org.apache.whirr.service.Cluster object to
>>>>>> populate your Hadoop Configuration object, so that your code knows
>>>>>> which cluster to connect to. See the getConfiguration() method in
>>>>>> HadoopServiceController for how to do this.
>>>>>>
>>>>>> Cheers,
>>>>>> Tom
>>>>>>
>>>>>> On Thu, Jan 27, 2011 at 12:21 PM,  <praveen.pe...@nokia.com> wrote:
>>>>>>> Hello all,
>>>>>>> I wrote a Java class, HadoopLauncher, that is very similar to
>>>>>>> HadoopServiceController. I was successfully able to launch a
>>>>>>> cluster programmatically from my application using Whirr. Now I
>>>>>>> want to copy files to HDFS and also run a job programmatically.
>>>>>>>
>>>>>>> When I copy a file to HDFS it is copied to the local file system,
>>>>>>> not HDFS. Here is the code I used:
>>>>>>>
>>>>>>> Configuration conf = new Configuration();
>>>>>>> FileSystem hdfs = FileSystem.get(conf);
>>>>>>> hdfs.copyFromLocalFile(false, true, new Path(localFilePath),
>>>>>>>     new Path(hdfsFileDirectory));
>>>>>>>
>>>>>>> Do I need to add anything else to the classpath so the Hadoop
>>>>>>> libraries know that they need to talk to the dynamically launched
>>>>>>> cluster?
>>>>>>> When running Whirr from the command line I know it uses
>>>>>>> HADOOP_CONF_DIR to find the Hadoop config files, but when doing the
>>>>>>> same from Java I am wondering how to solve this issue.
>>>>>>>
>>>>>>> Praveen
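For completeness, a rough sketch of the fix Tom describes: populate the
Configuration with the cluster's addresses before asking for a FileSystem.
The addresses below are placeholders; in practice they are derived from the
org.apache.whirr.service.Cluster object, as the getConfiguration() method
in HadoopServiceController shows.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // With a default Configuration, fs.default.name is file:///, so
    // FileSystem.get() returns the local file system -- hence the copy
    // above landing on local disk instead of HDFS.
    Configuration conf = new Configuration();
    // Placeholder addresses: take the real namenode/jobtracker hosts
    // from the Whirr Cluster object (see HadoopServiceController).
    conf.set("fs.default.name", "hdfs://<namenode-host>:8020/");
    conf.set("mapred.job.tracker", "<jobtracker-host>:8021");

    FileSystem hdfs = FileSystem.get(conf); // now talks to HDFS
    hdfs.copyFromLocalFile(false, true,
        new Path(localFilePath), new Path(hdfsFileDirectory));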