On Tue, Jan 11, 2011 at 7:24 PM, <praveen.pe...@nokia.com> wrote:
> Another thing I noticed from whirr.log is below. Looks like it's trying to
> change ownership to the hadoop user, but the hadoop user doesn't exist on
> the hadoop master. Am I missing anything?
No. I have been able to replicate this with the same version you are using;
I believe you have found a bug in Whirr 0.2.0. I suggest using the trunk
version instead: it is stable and works fine with the same properties file
(I have tested it on Rackspace Cloud). Whirr 0.3.0 should be ready for
release in a few weeks.

>
> 2011-01-11 16:28:11,979 DEBUG [jclouds.compute] (user thread 3) << stderr
> from runscript as r...@xx.xx.xx.xx
> + DFS_DATA_DIR=/data/hadoop/hdfs/data
> + MAPRED_LOCAL_DIR=/data/hadoop/mapred/local
> + MAX_MAP_TASKS=2
> + MAX_REDUCE_TASKS=1
> + CHILD_OPTS=-Xmx550m
> + CHILD_ULIMIT=1126400
> + TMP_DIR='/data/tmp/hadoop-${user.name}'
> + mkdir -p /data/hadoop
> + chown hadoop:hadoop /data/hadoop
> chown: invalid user: `hadoop:hadoop'
>
> Praveen
>
> -----Original Message-----
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Tuesday, January 11, 2011 11:27 AM
> To: whirr-user@incubator.apache.org; t...@cloudera.com
> Cc: ham...@cloudera.com
> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>
> whirr.hadoop-install-runurl=cloudera/cdh/install
> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>
> Do these two properties install all the Cloudera packages, or just Hadoop
> from the Cloudera distribution? I am thinking something could be wrong
> here...
>
> Praveen
>
> -----Original Message-----
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Monday, January 10, 2011 7:22 PM
> To: t...@cloudera.com
> Cc: ham...@cloudera.com; whirr-user@incubator.apache.org
> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>
> Here are the properties. Please note that I also tried without specifying
> whirr.hadoop-install-runurl and got the same problem.
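For reference, the chown failure quoted above boils down to the script
changing ownership to a user the image never created. A defensive version of
that fragment would create the owner first; this is only an illustrative
sketch (the function name and layout are not the actual Whirr fix, which
landed in trunk), and note that useradd requires root:

```shell
# Sketch: guard the ownership change so it cannot fail when the target user
# is missing from the image. Illustrative only, not the actual Whirr script.
ensure_owned_dir() {
  local user="$1" dir="$2"
  mkdir -p "$dir"
  # Create the user first if the image does not ship it (needs root).
  if ! id "$user" >/dev/null 2>&1; then
    useradd --system --user-group "$user"
  fi
  chown "$user" "$dir"
}

# Example: prepare /data/hadoop for the hadoop user (run as root):
# ensure_owned_dir hadoop /data/hadoop
```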
>
> whirr.service-name=hadoop
> whirr.cluster-name=relevancycluster
> whirr.instance-templates=1 jt+nn,1 dn+tt
> whirr.provider=cloudservers
> whirr.identity=<rackspace-id>
> whirr.credential=<rackspace-api-password>
> #whirr.private-key-file=/home/hadoop/.ssh/id_rsa
> #whirr.public-key-file=/home/hadoop/.ssh/id_rsa.pub
>
> # Uncomment these lines to run CDH
> whirr.hadoop-install-runurl=cloudera/cdh/install
> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>
> # The size of the instance to use. See
> # http://www.rackspacecloud.com/cloud_hosting_products/serv$
> # id 3: 1GB, 1 virtual core
> # id 4: 2GB, 2 virtual cores
> # id 5: 4GB, 2 virtual cores
> # id 6: 8GB, 4 virtual cores
> # id 7: 15.5GB, 4 virtual cores
> whirr.hardware-id=4
> # Ubuntu 10.04 LTS Lucid
> whirr.image-id=49
>
> ________________________________________
> From: ext Tom White [...@cloudera.com]
> Sent: Monday, January 10, 2011 7:03 PM
> To: Peddi Praveen (Nokia-MS/Boston)
> Cc: ham...@cloudera.com; whirr-user@incubator.apache.org
> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>
> Can you post your Whirr properties file please (with credentials removed)?
>
> Thanks
> Tom
>
> On Mon, Jan 10, 2011 at 3:59 PM, <praveen.pe...@nokia.com> wrote:
>> I am using the latest Whirr. For Hadoop, I actually specified the Cloudera
>> URL in the properties file, but on the Hadoop master machine I saw
>> references to hadoop-0.20. The OS of my client is CentOS, but I am going
>> with the default OS for Hadoop, which is Ubuntu 10.04.
>>
>> On Jan 10, 2011, at 6:38 PM, "ext Tom White" <t...@cloudera.com> wrote:
>>
>>> On Mon, Jan 10, 2011 at 2:22 PM, <praveen.pe...@nokia.com> wrote:
>>>> Looks like Hadoop was installed but never started on the master node.
>>>> There were no files under /var/log/hadoop on the master node either.
>>>>
>>>> r...@hadoop-master:~# netstat -a | grep 50030 returns nothing
>>>>
>>>> Does Whirr install and start Hadoop as "root"? Is that the problem?
>>>> When I try to start Hadoop manually from the Hadoop master, I see the
>>>> following:
>>>>
>>>> --------------------------------
>>>> r...@hadoop-master:~# /etc/alternatives/hadoop-lib/bin/start-all.sh
>>>> starting namenode, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-namenode-184-106-96-62.static.cloud-ips.com.out
>>>> May not run daemons as root. Please specify HADOOP_NAMENODE_USER
>>>
>>> That's the problem. Which versions of Whirr, Hadoop, and the OS are you
>>> using?
>>>
>>> Tom
>>>
>>>> The authenticity of host 'localhost (127.0.0.1)' can't be established.
>>>> RSA key fingerprint is d4:3c:55:4d:76:62:3d:b2:e1:74:a7:6f:bf:92:ab:3d.
>>>> Are you sure you want to continue connecting (yes/no)? yes
>>>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of
>>>> known hosts.
>>>> localhost: starting datanode, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-datanode-184-106-96-62.static.cloud-ips.com.out
>>>> localhost: May not run daemons as root. Please specify
>>>> HADOOP_DATANODE_USER
>>>> localhost: starting secondarynamenode, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-secondarynamenode-184-106-96-62.static.cloud-ips.com.out
>>>> localhost: May not run daemons as root. Please specify
>>>> HADOOP_SECONDARYNAMENODE_USER
>>>> starting jobtracker, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-jobtracker-184-106-96-62.static.cloud-ips.com.out
>>>> May not run daemons as root. Please specify HADOOP_JOBTRACKER_USER
>>>> localhost: starting tasktracker, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-tasktracker-184-106-96-62.static.cloud-ips.com.out
>>>> localhost: May not run daemons as root.
>>>> Please specify HADOOP_TASKTRACKER_USER
>>>> --------------------------------
>>>>
>>>> Praveen
>>>>
>>>> -----Original Message-----
>>>> From: ext Tom White [mailto:t...@cloudera.com]
>>>> Sent: Monday, January 10, 2011 5:08 PM
>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>> Cc: ham...@cloudera.com; whirr-user@incubator.apache.org
>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>
>>>> Can you connect to the jobtracker UI? It's running on the master, port
>>>> 50030. You can also ssh into the machine and look at the logs under
>>>> /var/log/hadoop to see if there are any errors.
>>>>
>>>> Tom
>>>>
>>>> On Mon, Jan 10, 2011 at 12:33 PM, <praveen.pe...@nokia.com> wrote:
>>>>> Hi Tom,
>>>>> Thank you very much for your response. We were able to figure out how
>>>>> to launch and destroy the cluster using the command-line tool. We
>>>>> haven't tried the Java client yet (we will soon). But with the
>>>>> command-line tool we could not access hadoop fs or any of the other
>>>>> Hadoop commands, even though we also ran the proxy script. Here is the
>>>>> error I am getting: my client node is not able to talk to the Hadoop
>>>>> master node. We tried as the hadoop user and as root, but no luck. Do
>>>>> you think we are missing anything?
>>>>>
>>>>> [r...@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs -lsr /
>>>>> 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED: hadoop-site.xml
>>>>> found in the classpath. Usage of hadoop-site.xml is deprecated. Instead
>>>>> use core-site.xml, mapred-site.xml and hdfs-site.xml to override
>>>>> properties of core-default.xml, mapred-default.xml and hdfs-default.xml
>>>>> respectively
>>>>> 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 0 time(s).
>>>>> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 1 time(s).
>>>>> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 2 time(s).
>>>>> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 3 time(s).
>>>>> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 4 time(s).
>>>>> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 5 time(s).
>>>>> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 6 time(s).
>>>>> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 7 time(s).
>>>>> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 8 time(s).
>>>>> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server:
>>>>> 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried
>>>>> 9 time(s).
>>>>>
>>>>> I should say Whirr is cool so far!
>>>>>
>>>>> Thanks again
>>>>> Praveen
>>>>>
>>>>> -----Original Message-----
>>>>> From: ext Tom White [mailto:t...@cloudera.com]
>>>>> Sent: Monday, January 10, 2011 2:23 PM
>>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>>> Cc: whirr-user@incubator.apache.org; ham...@cloudera.com
>>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>>
>>>>> Hi Praveen,
>>>>>
>>>>> You should be able to do exactly this using Whirr. There is not a lot
>>>>> of documentation describing what you want to do, but I recommend you
>>>>> start by having a look at http://incubator.apache.org/whirr/.
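The connection retries above usually mean the client is not using the
configuration Whirr generated, rather than that the namenode is down: Whirr
writes per-cluster client configuration, including a hadoop-proxy.sh SOCKS
helper, under ~/.whirr/<cluster-name>, and the hadoop command has to be
pointed at that directory. A sketch, assuming that directory layout:

```shell
# Assumed layout: Whirr writes client config for each cluster under
# ~/.whirr/<cluster-name> (hadoop-site.xml, hadoop-proxy.sh, ...).
CLUSTER=relevancycluster   # whirr.cluster-name from the properties file
WHIRR_CLIENT_CONF="$HOME/.whirr/$CLUSTER"

# Point the hadoop CLI at the generated configuration:
export HADOOP_CONF_DIR="$WHIRR_CLIENT_CONF"

# Keep the SOCKS proxy running in another terminal while using the cluster:
# sh "$WHIRR_CLIENT_CONF/hadoop-proxy.sh"
# hadoop fs -lsr /
```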
>>>>> The Hadoop unit tests will show you how to start and stop a cluster
>>>>> from Java and submit a job. E.g.
>>>>>
>>>>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/hadoop/src/test/java/org/apache/whirr/service/hadoop/integration/HadoopServiceController.java
>>>>>
>>>>> Finally, check out the recipes for advice on setting configuration for
>>>>> Rackspace:
>>>>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties
>>>>>
>>>>> Thanks,
>>>>> Tom
>>>>>
>>>>> On Mon, Jan 10, 2011 at 10:27 AM, <praveen.pe...@nokia.com> wrote:
>>>>>> Hello all,
>>>>>> We have a few Hadoop jobs that we run on the Rackspace cloud. The jobs
>>>>>> run for a total of 3 to 5 hours a day. Currently I have manually
>>>>>> installed and configured Hadoop on Rackspace, which is a laborious
>>>>>> process (especially given that we have about 10 environments to
>>>>>> configure). So my question is about automatic creation and destruction
>>>>>> of a Hadoop cluster from a program (preferably Java).
>>>>>> Here is my current deployment:
>>>>>>
>>>>>> Glassfish (Node 1)
>>>>>> MySQL (Node 2)
>>>>>> Hadoop with 1 master and 5 slaves (Nodes 3 to 8)
>>>>>>
>>>>>> We can install Glassfish and MySQL manually, but we would like to
>>>>>> dynamically create/install the Hadoop cluster, start the servers, run
>>>>>> the jobs, and then destroy the cluster on the cloud. The primary
>>>>>> purpose is to make deployment easy and to save costs: since the jobs
>>>>>> run for only a few hours a day, we don't want Hadoop running on the
>>>>>> cloud for the whole day.
>>>>>>
>>>>>> Jeff Hammerbacher from Cloudera suggested I look at Whirr and was
>>>>>> positive that I could do the above steps with it. Has anyone done this
>>>>>> using Whirr on Rackspace? I could not find any examples of dynamically
>>>>>> installing a Hadoop cluster on Rackspace.
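For the Java route, the integration test linked above follows roughly this
lifecycle. This is pseudocode only: exact class and method names vary
between Whirr versions, so treat the linked HadoopServiceController test as
the authoritative example.

```
// Pseudocode of the cluster lifecycle the Whirr Hadoop integration test
// exercises; not actual Whirr API names.
spec    = read cluster properties (provider, identity, credential, templates)
service = look up the "hadoop" service implementation
cluster = service.launch(spec)      // provisions nodes, runs install/configure
conf    = client configuration from cluster (namenode/jobtracker addresses)
submit MapReduce job(s) using conf
service.destroy(spec)               // tear the nodes down when the jobs finish
```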
>>>>>> Any information on this task would be greatly appreciated.
>>>>>>
>>>>>> Thanks
>>>>>> Praveen

--
Andrei Savu -- andreisavu.ro
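For reference, the launch/run/destroy cycle discussed in this thread is
driven from the command line roughly as below. The whirr script and its
launch-cluster/destroy-cluster commands come from the Whirr distribution;
the install location and properties file name are assumptions, so check
bin/whirr --help against your version.

```shell
# Sketch of the daily cycle: launch, run the jobs, destroy to stop paying
# for idle nodes. Paths and file names are assumptions.
WHIRR_HOME=${WHIRR_HOME:-/usr/local/whirr}   # assumed install location
PROPS=hadoop.properties                      # a file like the one in the thread

launch_cluster()  { "$WHIRR_HOME/bin/whirr" launch-cluster  --config "$PROPS"; }
destroy_cluster() { "$WHIRR_HOME/bin/whirr" destroy-cluster --config "$PROPS"; }

# Typical use:
# launch_cluster
# hadoop jar my-jobs.jar ...   # with HADOOP_CONF_DIR pointing at the
#                              # generated client config for the cluster
# destroy_cluster
```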