On Tue, Jan 11, 2011 at 7:24 PM, <praveen.pe...@nokia.com> wrote:
> Another thing I noticed from whirr.log is below. Looks like it's trying to
> change ownership to the hadoop user, but the hadoop user doesn't exist on
> the hadoop master. Am I missing anything?
No. I have been able to replicate this with the same version you are using;
I believe you have found a bug in Whirr 0.2.0. I suggest using the trunk
version instead: it is stable and works fine with the same properties file
(I have tested it on Rackspace Cloud). Whirr 0.3.0 should be ready for
release in a few weeks.

>
> 2011-01-11 16:28:11,979 DEBUG [jclouds.compute] (user thread 3) << stderr
> from runscript as r...@xx.xx.xx.xx
> + DFS_DATA_DIR=/data/hadoop/hdfs/data
> + MAPRED_LOCAL_DIR=/data/hadoop/mapred/local
> + MAX_MAP_TASKS=2
> + MAX_REDUCE_TASKS=1
> + CHILD_OPTS=-Xmx550m
> + CHILD_ULIMIT=1126400
> + TMP_DIR='/data/tmp/hadoop-${user.name}'
> + mkdir -p /data/hadoop
> + chown hadoop:hadoop /data/hadoop
> chown: invalid user: `hadoop:hadoop'
>
> Praveen
>
> -----Original Message-----
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Tuesday, January 11, 2011 11:27 AM
> To: whirr-user@incubator.apache.org; t...@cloudera.com
> Cc: ham...@cloudera.com
> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>
> whirr.hadoop-install-runurl=cloudera/cdh/install
> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>
> Do these two properties install all the Cloudera packages, or just Hadoop
> from the Cloudera distribution? I am thinking something could be wrong
> here...
>
> Praveen
>
> -----Original Message-----
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Monday, January 10, 2011 7:22 PM
> To: t...@cloudera.com
> Cc: ham...@cloudera.com; whirr-user@incubator.apache.org
> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>
> Here are the properties. Please note that I also tried without specifying
> whirr.hadoop-install-runurl and got the same problem.
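For reference, the chown failure quoted above boils down to the script
changing ownership to a user the image never created. A defensive version of
that fragment would create the owner first; this is only an illustrative
sketch (the function name and layout are not the actual Whirr fix, which
landed in trunk), and note that useradd requires root:

```shell
# Sketch: guard the ownership change so it cannot fail when the target user
# is missing from the image. Illustrative only, not the actual Whirr script.
ensure_owned_dir() {
  local user="$1" dir="$2"
  mkdir -p "$dir"
  # Create the user first if the image does not ship it (needs root).
  if ! id "$user" >/dev/null 2>&1; then
    useradd --system --user-group "$user"
  fi
  chown "$user" "$dir"
}

# Example: prepare /data/hadoop for the hadoop user (run as root):
# ensure_owned_dir hadoop /data/hadoop
```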
>
> whirr.service-name=hadoop
> whirr.cluster-name=relevancycluster
> whirr.instance-templates=1 jt+nn,1 dn+tt
> whirr.provider=cloudservers
> whirr.identity=<rackspace-id>
> whirr.credential=<rackspace-api-password>
> #whirr.private-key-file=/home/hadoop/.ssh/id_rsa
> #whirr.public-key-file=/home/hadoop/.ssh/id_rsa.pub
>
> # Uncomment these lines to run CDH
> whirr.hadoop-install-runurl=cloudera/cdh/install
> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>
> # The size of the instance to use. See
> # http://www.rackspacecloud.com/cloud_hosting_products/serv$
> # id 3: 1GB, 1 virtual core
> # id 4: 2GB, 2 virtual cores
> # id 5: 4GB, 2 virtual cores
> # id 6: 8GB, 4 virtual cores
> # id 7: 15.5GB, 4 virtual cores
> whirr.hardware-id=4
> # Ubuntu 10.04 LTS Lucid
> whirr.image-id=49
>
> ________________________________________
> From: ext Tom White [...@cloudera.com]
> Sent: Monday, January 10, 2011 7:03 PM
> To: Peddi Praveen (Nokia-MS/Boston)
> Cc: ham...@cloudera.com; whirr-user@incubator.apache.org
> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>
> Can you post your Whirr properties file please (with credentials removed)?
>
> Thanks
> Tom
>
> On Mon, Jan 10, 2011 at 3:59 PM, <praveen.pe...@nokia.com> wrote:
>> I am using the latest Whirr. For Hadoop, I actually specified the Cloudera
>> URL in the properties file, but on the Hadoop master machine I saw
>> references to hadoop-0.20. The OS of my client is CentOS, but I am going
>> with the default OS for Hadoop, which is Ubuntu 10.04.
>>
>> On Jan 10, 2011, at 6:38 PM, "ext Tom White" <t...@cloudera.com> wrote:
>>
>>> On Mon, Jan 10, 2011 at 2:22 PM, <praveen.pe...@nokia.com> wrote:
>>>> Looks like Hadoop was installed but never started on the master node.
>>>> There were no files under /var/log/hadoop on the master node either.
>>>>
>>>> r...@hadoop-master:~# netstat -a | grep 50030 returns nothing
>>>>
>>>> Does Whirr install and start Hadoop as "root"? Is that the problem?
>>>> When I try to start Hadoop manually from the Hadoop master, I see the
>>>> following:
>>>>
>>>> --------------------------------
>>>> r...@hadoop-master:~# /etc/alternatives/hadoop-lib/bin/start-all.sh
>>>> starting namenode, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-namenode-184-106-96-62.static.cloud-ips.com.out
>>>> May not run daemons as root. Please specify HADOOP_NAMENODE_USER
>>>
>>> That's the problem. Which versions of Whirr, Hadoop, and the OS are you
>>> using?
>>>
>>> Tom
>>>
>>>> The authenticity of host 'localhost (127.0.0.1)' can't be established.
>>>> RSA key fingerprint is d4:3c:55:4d:76:62:3d:b2:e1:74:a7:6f:bf:92:ab:3d.
>>>> Are you sure you want to continue connecting (yes/no)? yes
>>>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of
>>>> known hosts.
>>>> localhost: starting datanode, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-datanode-184-106-96-62.static.cloud-ips.com.out
>>>> localhost: May not run daemons as root. Please specify
>>>> HADOOP_DATANODE_USER
>>>> localhost: starting secondarynamenode, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-secondarynamenode-184-106-96-62.static.cloud-ips.com.out
>>>> localhost: May not run daemons as root. Please specify
>>>> HADOOP_SECONDARYNAMENODE_USER
>>>> starting jobtracker, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-jobtracker-184-106-96-62.static.cloud-ips.com.out
>>>> May not run daemons as root. Please specify HADOOP_JOBTRACKER_USER
>>>> localhost: starting tasktracker, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-tasktracker-184-106-96-62.static.cloud-ips.com.out
>>>> localhost: May not run daemons as root.
>>>> Please specify HADOOP_TASKTRACKER_USER
>>>> --------------------------------
>>>>
>>>> Praveen
>>>>
>>>> -----Original Message-----
>>>> From: ext Tom White [mailto:t...@cloudera.com]
>>>> Sent: Monday, January 10, 2011 5:08 PM
>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>> Cc: ham...@cloudera.com; whirr-user@incubator.apache.org
>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>
>>>> Can you connect to the jobtracker UI? It's running on the master, port
>>>> 50030. You can also ssh into the machine and look at the logs under
>>>> /var/log/hadoop to see if there are any errors.
>>>>
>>>> Tom
>>>>
>>>> On Mon, Jan 10, 2011 at 12:33 PM, <praveen.pe...@nokia.com> wrote:
>>>>> Hi Tom,
>>>>> Thank you very much for your response. We were able to figure out how
>>>>> to launch and destroy the cluster using the command-line tool. We
>>>>> haven't tried the Java client yet (we will soon). But with the
>>>>> command-line tool we could not access hadoop fs or any of the other
>>>>> Hadoop commands, even though we also ran the proxy script. Here is the
>>>>> error I am getting: my client node is not able to talk to the Hadoop
>>>>> master node. We tried as the hadoop user and as root, but no luck. Do
>>>>> you think we are missing anything?
>>>>>
>>>>> [r...@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs -lsr /
>>>>> 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED: hadoop-site.xml
>>>>> found in the classpath. Usage of hadoop-site.xml is deprecated. Instead
>>>>> use core-site.xml, mapred-site.xml and hdfs-site.xml to override
>>>>> properties of core-default.xml, mapred-default.xml and hdfs-default.xml
>>>>> respectively
>>>>> 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 0 time(s).
>>>>> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 1 time(s).
>>>>> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 2 time(s).
>>>>> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 3 time(s).
>>>>> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 4 time(s).
>>>>> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 5 time(s).
>>>>> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 6 time(s).
>>>>> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 7 time(s).
>>>>> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 8 time(s).
>>>>> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 9 time(s).
>>>>>
>>>>> I should say Whirr is cool so far!
>>>>>
>>>>> Thanks again
>>>>> Praveen
>>>>>
>>>>> -----Original Message-----
>>>>> From: ext Tom White [mailto:t...@cloudera.com]
>>>>> Sent: Monday, January 10, 2011 2:23 PM
>>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>>> Cc: whirr-user@incubator.apache.org; ham...@cloudera.com
>>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>>
>>>>> Hi Praveen,
>>>>>
>>>>> You should be able to do exactly this using Whirr. There is not a lot
>>>>> of documentation describing what you want to do, but I recommend you
>>>>> start by having a look at http://incubator.apache.org/whirr/.
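The connection retries above usually mean the client is not using the
configuration Whirr generated, rather than that the namenode is down: Whirr
writes per-cluster client configuration, including a hadoop-proxy.sh SOCKS
helper, under ~/.whirr/<cluster-name>, and the hadoop command has to be
pointed at that directory. A sketch, assuming that directory layout:

```shell
# Assumed layout: Whirr writes client config for each cluster under
# ~/.whirr/<cluster-name> (hadoop-site.xml, hadoop-proxy.sh, ...).
CLUSTER=relevancycluster   # whirr.cluster-name from the properties file
WHIRR_CLIENT_CONF="$HOME/.whirr/$CLUSTER"

# Point the hadoop CLI at the generated configuration:
export HADOOP_CONF_DIR="$WHIRR_CLIENT_CONF"

# Keep the SOCKS proxy running in another terminal while using the cluster:
# sh "$WHIRR_CLIENT_CONF/hadoop-proxy.sh"
# hadoop fs -lsr /
```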
>>>>> The Hadoop unit tests will show you how to start and stop a cluster
>>>>> from Java and submit a job. E.g.
>>>>>
>>>>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/hadoop/src/test/java/org/apache/whirr/service/hadoop/integration/HadoopServiceController.java
>>>>>
>>>>> Finally, check out the recipes for advice on setting configuration for
>>>>> Rackspace:
>>>>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties
>>>>>
>>>>> Thanks,
>>>>> Tom
>>>>>
>>>>> On Mon, Jan 10, 2011 at 10:27 AM, <praveen.pe...@nokia.com> wrote:
>>>>>> Hello all,
>>>>>> We have a few Hadoop jobs that we run on the Rackspace cloud. The jobs
>>>>>> run for a total of 3 to 5 hours a day. Currently I have manually
>>>>>> installed and configured Hadoop on Rackspace, which is a laborious
>>>>>> process (especially given that we have about 10 environments to
>>>>>> configure). So my question is about automatic creation and destruction
>>>>>> of a Hadoop cluster from a program (preferably Java).
>>>>>> Here is my current deployment:
>>>>>>
>>>>>> Glassfish (Node 1)
>>>>>> MySQL (Node 2)
>>>>>> Hadoop with 1 master and 5 slaves (Nodes 3 to 8)
>>>>>>
>>>>>> We can install Glassfish and MySQL manually, but we would like to
>>>>>> dynamically create/install the Hadoop cluster, start the servers, run
>>>>>> the jobs, and then destroy the cluster on the cloud. The primary
>>>>>> purpose is to make deployment easy and to save costs: since the jobs
>>>>>> run for only a few hours a day, we don't want Hadoop running on the
>>>>>> cloud for the whole day.
>>>>>>
>>>>>> Jeff Hammerbacher from Cloudera suggested I look at Whirr and was
>>>>>> positive that I could do the above steps with it. Has anyone done this
>>>>>> using Whirr on Rackspace? I could not find any examples of dynamically
>>>>>> installing a Hadoop cluster on Rackspace.
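For the Java route, the integration test linked above follows roughly this
lifecycle. This is pseudocode only: exact class and method names vary
between Whirr versions, so treat the linked HadoopServiceController test as
the authoritative example.

```
// Pseudocode of the cluster lifecycle the Whirr Hadoop integration test
// exercises; not actual Whirr API names.
spec    = read cluster properties (provider, identity, credential, templates)
service = look up the "hadoop" service implementation
cluster = service.launch(spec)      // provisions nodes, runs install/configure
conf    = client configuration from cluster (namenode/jobtracker addresses)
submit MapReduce job(s) using conf
service.destroy(spec)               // tear the nodes down when the jobs finish
```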
>>>>>> Any information on this task would be greatly appreciated.
>>>>>>
>>>>>> Thanks
>>>>>> Praveen

--
Andrei Savu -- andreisavu.ro
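For reference, the launch/run/destroy cycle discussed in this thread is
driven from the command line roughly as below. The whirr script and its
launch-cluster/destroy-cluster commands come from the Whirr distribution;
the install location and properties file name are assumptions, so check
bin/whirr --help against your version.

```shell
# Sketch of the daily cycle: launch, run the jobs, destroy to stop paying
# for idle nodes. Paths and file names are assumptions.
WHIRR_HOME=${WHIRR_HOME:-/usr/local/whirr}   # assumed install location
PROPS=hadoop.properties                      # a file like the one in the thread

launch_cluster()  { "$WHIRR_HOME/bin/whirr" launch-cluster  --config "$PROPS"; }
destroy_cluster() { "$WHIRR_HOME/bin/whirr" destroy-cluster --config "$PROPS"; }

# Typical use:
# launch_cluster
# hadoop jar my-jobs.jar ...   # with HADOOP_CONF_DIR pointing at the
#                              # generated client config for the cluster
# destroy_cluster
```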