Andrei,

The release candidate code does work. Perhaps something is different relative to the patched frankenstein I was using, or perhaps I had some local corruption or a config problem.
It sets up everything as whatever my local user is, by default, and the override via whirr.cluster-user works as well.

In any case, at the rate AWS seems to be changing the configuration of 'amazon linux', perhaps it's less useful than I thought. Last week the default AMIs in the console had a bunch of spare disk space on the /media/ephemeral0 partition, which I could symlink /mnt to in the install_cdh_hadoop.sh script, and then HDFS would have a decent amount of space. Now there is no such thing, so I suppose I would have to launch an EBS volume per node and mount that. This is now tipping over into the "too much trouble" zone for me. And in the meantime I got all my native stuff (hadoop-lzo and R/Rhipe) working on Ubuntu, so I think I'm going to use the Alestic image from the recipe for a while.

If there's an obvious candidate up there for a "reasonably modern redhat-derivative AMI from a source on the good lists that behaves well," I'd like to know what it is. By 'reasonably modern' I mean having default python >= 2.5. I liked the old custom of having /mnt be a separate partition of a decent size. I hope this is just a glitch with AWS.

I suspect the space may still be there, because jclouds/whirr is showing (e.g.) in the output:

    volumes=[[id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false],
             [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false]

So theoretically the disk space is still there on those non-boot, non-durable devices, but I cannot mount them.

I also tried the cluster AMI, because I am intrigued by the possibilities for good performance. Sounds great for hadoop, doesn't it?
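[Editor's sketch.] The /mnt trick described above (pointing /mnt at the roomy ephemeral partition inside install_cdh_hadoop.sh) could look roughly like the following. The helper name `relocate_mnt`, the device name /dev/sdb, and the mkfs step are assumptions inferred from the jclouds volume listing in this thread, not something the thread confirms; check the instance's actual block-device mapping before using anything like this.

```shell
#!/bin/sh
# Hypothetical helper in the spirit of the install_cdh_hadoop.sh hack:
# format/mount a spare ephemeral device and replace /mnt with a symlink
# to it, so HDFS data directories get the space.
relocate_mnt() {
    dev=$1        # e.g. /dev/sdb, per the jclouds volume listing
    data_dir=$2   # e.g. /media/ephemeral0

    if [ ! -b "$dev" ]; then
        echo "no block device at $dev; skipping /mnt relocation"
        return 0
    fi

    # Format and mount the ephemeral disk unless it is mounted already.
    if ! grep -qs " $data_dir " /proc/mounts; then
        mkfs.ext3 -q "$dev"
        mkdir -p "$data_dir"
        mount "$dev" "$data_dir"
    fi

    # Replace /mnt with a symlink to the big partition.
    rm -rf /mnt
    ln -s "$data_dir" /mnt
}

# On a node you would call, e.g.: relocate_mnt /dev/sdb /media/ephemeral0
```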
But it won't even start the nodes, giving this:

    Configuring template
    Unexpected error while starting 1 nodes, minimum 1 nodes for [hadoop-namenode, hadoop-jobtracker] of cluster bhcLA
    java.util.concurrent.ExecutionException: org.jclouds.http.HttpResponseException: command: POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with response: HTTP/1.1 400 Bad Request; content: [Non-Windows AMIs with a virtualization type of 'hvm' currently may only be used with Cluster Compute instance types.]
            at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
            at java.util.concurrent.FutureTask.get(FutureTask.java:83)
            at org.apache.whirr.cluster.actions.BootstrapClusterAction$StartupProcess.waitForOutcomes(BootstrapClusterAction.java:307)
            at org.apache.whirr.cluster.actions.BootstrapClusterAction$StartupProcess.call(BootstrapClusterAction.java:260)
            at org.apache.whirr.cluster.actions.BootstrapClusterAction$StartupProcess.call(BootstrapClusterAction.java:221)
            at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
            at java.util.concurrent.FutureTask.run(FutureTask.java:138)
            at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
            at java.lang.Thread.run(Thread.java:680)
    Caused by: org.jclouds.http.HttpResponseException: command: POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with response: HTTP/1.1 400 Bad Request; content: [Non-Windows AMIs with a virtualization type of 'hvm' currently may only be used with Cluster Compute instance types.]
            at org.jclouds.aws.handlers.ParseAWSErrorFromXmlContent.handleError(ParseAWSErrorFromXmlContent.java:75)

There must be something a bit more involved to specifying cluster instances in the Amazon API, perhaps not (yet) supported by jclouds? I'm afraid I don't need this enough right now to justify digging further.

Anyway, thanks for all your help and advice on this.
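[Editor's note.] The 400 above is EC2 itself rejecting the request, not a local Whirr failure: an HVM AMI is only accepted together with a Cluster Compute instance type, so the two properties have to be set as a pair. A hedged sketch of that pairing, using the IDs from this thread (whether jclouds then handles the rest is exactly the open question raised here):

```properties
# HVM AMIs are only allowed on Cluster Compute instance types,
# so these two must be set together (IDs taken from this thread):
whirr.hardware-id=cc1.4xlarge
whirr.image-id=us-east-1/ami-321eed5b
```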
--Ben

On Mar 17, 2011, at 7:01 PM, Andrei Savu wrote:

> Strange! I will try your properties file tomorrow.
>
> If you want to try again you can find the artifacts for 0.4.0 RC1 here:
> http://people.apache.org/~asavu/whirr-0.4.0-incubating-candidate-1
>
> On Thu, Mar 17, 2011 at 8:41 PM, Benjamin Clark <b...@daltonclark.com> wrote:
>> Andrei,
>>
>> Thanks for looking at this. Unfortunately it does not seem to work.
>>
>> Using the Amazon linux 64-bit AMI with no whirr.cluster-user, or if I set it
>> to 'ben' or whatever else, I get this:
>>
>> 1) SshException on node us-east-1/i-62de280d:
>> org.jclouds.ssh.SshException: ec2-user@72.44.35.254:22: Error connecting to session.
>>         at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:252)
>>         at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:206)
>>         at org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call(RunScriptOnNodeAsInitScriptUsingSsh.java:90)
>>         at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:70)
>>
>> So it doesn't seem to be honoring that property, and it's definitely not
>> allowing me to log in to any nodes, as 'ben', 'ec2-user' or 'root'.
>>
>> The ubuntu AMI from the recipes continues to work fine.
>>
>> Here's the full config file I'm using.
>> I grabbed the recipe from trunk and put my stuff back in, to make sure
>> I'm not missing a new setting:
>>
>> whirr.cluster-name=bhcTL
>> whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2 hadoop-datanode+hadoop-tasktracker
>> whirr.hadoop-install-function=install_cdh_hadoop
>> whirr.hadoop-configure-function=configure_cdh_hadoop
>> whirr.provider=aws-ec2
>> whirr.identity=${env:AWS_ACCESS_KEY_ID}
>> whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
>> whirr.private-key-file=${sys:user.home}/.ssh/id_rsa-hkey
>> whirr.public-key-file=${sys:user.home}/.ssh/id_rsa-hkey.pub
>> whirr.cluster-user=ben
>> # Amazon linux 32-bit -- works
>> #whirr.hardware-id=c1.medium
>> #whirr.image-id=us-east-1/ami-d59d6bbc
>> # Ubuntu 10.04 LTS Lucid. See http://alestic.com/ -- works
>> #whirr.hardware-id=c1.xlarge
>> #whirr.image-id=us-east-1/ami-da0cf8b3
>> # Amazon linux 64-bit as of 3/11 -- doesn't work
>> whirr.hardware-id=c1.xlarge
>> whirr.image-id=us-east-1/ami-8e1fece7
>> # Cluster compute -- doesn't work
>> #whirr.hardward-id=cc1.4xlarge
>> #whirr.image-id=us-east-1/ami-321eed5b
>> whirr.location-id=us-east-1d
>> hadoop-hdfs.dfs.permissions=false
>> hadoop-hdfs.dfs.replication=2
>>
>> --Ben
>>
>> On Mar 17, 2011, at 1:08 PM, Andrei Savu wrote:
>>
>>> Ben, could you give it one more try using the current trunk?
>>>
>>> You can specify the user by setting the option whirr.cluster-user
>>> (defaults to the current system user).
>>>
>>> On Wed, Mar 16, 2011 at 11:23 PM, Benjamin Clark <b...@daltonclark.com> wrote:
>>>> Andrei,
>>>>
>>>> Thanks.
>>>>
>>>> After patching with 158, it launches fine as me on that Ubuntu image from
>>>> the recipe (i.e. on my client machine I am 'ben', so now the AWS user that
>>>> has sudo, and as whom I can log in, is also 'ben'), so that looks good.
>>>>
>>>> But it's now doing this with amazon linux (ami-da0cf8b3, which was the
>>>> default 64-bit AMI a few days ago, and may still be) during launch:
>>>>
>>>> 1) SshException on node us-east-1/i-b2678ddd:
>>>> org.jclouds.ssh.SshException: ben@50.16.96.211:22: Error connecting to session.
>>>>         at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:252)
>>>>         at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:206)
>>>>         at org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call(RunScriptOnNodeAsInitScriptUsingSsh.java:90)
>>>>         at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:70)
>>>>         at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:45)
>>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>
>>>> So it seems as if the key part of jclouds authentication setup is still
>>>> failing for the amazon linux/ec2-user scenario, i.e. it is trying to set
>>>> up as the local user, but failing.
>>>>
>>>> Is there a property for the user it launches as? Or does it just use
>>>> whichever user you are locally, instead of ec2-user/ubuntu/root depending
>>>> on the default, as before?
>>>>
>>>> I can switch to ubuntu, but I have a fair amount of native code setup in
>>>> my custom scripts and would prefer to stick with a redhattish version if
>>>> possible.
>>>>
>>>> Looking ahead, I want to benchmark plain old 64-bit instances against
>>>> cluster instances, to see if the allegedly improved networking gives us a
>>>> boost, and the available ones I see are Suse and Amazon linux.
>>>> When I switch to the amazon linux one, like so:
>>>>
>>>> whirr.hardward-id=cc1.4xlarge
>>>> whirr.image-id=us-east-1/ami-321eed5b
>>>>
>>>> I get a different problem:
>>>>
>>>> Exception in thread "main" java.util.NoSuchElementException: hardwares
>>>> don't support any images: [biggest=false, fastest=false, imageName=null,
>>>> imageDescription=Amazon Linux AMI x86_64 HVM EBS EXT4,
>>>> imageId=us-east-1/ami-321eed5b, imageVersion=ext4, location=[id=us-east-1,
>>>> scope=REGION, description=us-east-1, parent=aws-ec2, iso3166Codes=[US-VA],
>>>> metadata={}], minCores=0.0, minRam=0, osFamily=unrecognized, osName=null,
>>>> osDescription=amazon/amzn-hvm-ami-2011.02.1-beta.x86_64-ext4, osVersion=,
>>>> osArch=hvm, os64Bit=true, hardwareId=m1.small]
>>>> [[id=cc1.4xlarge, providerId=cc1.4xlarge, name=null,
>>>> processors=[[cores=4.0, speed=4.0], [cores=4.0, speed=4.0]], ram=23552,
>>>> volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false,
>>>> isBootDevice=true], [id=null, type=LOCAL, size=840.0, device=/dev/sdb,
>>>> durable=false, isBootDevice=false], [id=null, type=LOCAL, size=840.0,
>>>> device=/dev/sdc, durable=false, isBootDevice=false]], supportsI
>>>>
>>>> but I imagine that if using cluster instances is going to be possible,
>>>> support for amazon linux will be needed.
>>>>
>>>> --Ben
>>>>
>>>> On Mar 16, 2011, at 4:07 PM, Andrei Savu wrote:
>>>>
>>>>> I've seen something similar while testing Whirr: WHIRR-264 [0]. We are
>>>>> going to commit WHIRR-158 [1] tomorrow and it should fix the problem
>>>>> you are seeing. We should be able to restart the vote for the 0.4.0
>>>>> release after fixing this issue.
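[Editor's note.] One thing worth double-checking in the config snippet just above: the property is spelled whirr.hardward-id rather than whirr.hardware-id. If Whirr silently ignores the unknown key, the cc1.4xlarge constraint never reaches jclouds, which would explain the hardwareId=m1.small in the exception (the default hardware cannot host an HVM image). A corrected sketch; whether it then launches cleanly is not confirmed by this thread:

```properties
# Note the spelling: hardware, not hardward.
whirr.hardware-id=cc1.4xlarge
whirr.image-id=us-east-1/ami-321eed5b
```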
>>>>>
>>>>> [0] https://issues.apache.org/jira/browse/WHIRR-264
>>>>> [1] https://issues.apache.org/jira/browse/WHIRR-158
>>>>>
>>>>> -- Andrei Savu / andreisavu.ro
>>>>>
>>>>> On Wed, Mar 16, 2011 at 6:54 PM, Benjamin Clark <b...@daltonclark.com> wrote:
>>>>>> I have been using the whirr 0.4 branch to launch clusters of c1.medium
>>>>>> amazon linux machines (whirr.image-id=us-east-1/ami-d59d6bbc, which was
>>>>>> the default for new amazon linux instances a few days ago) with good
>>>>>> success. I took the default hadoop-ec2.properties recipe and modified
>>>>>> it slightly to suit my needs. I'm now trying with basically the same
>>>>>> properties file, but when I use
>>>>>>
>>>>>> whirr.hardware-id=c1.xlarge
>>>>>>
>>>>>> and then either this (from the recipe):
>>>>>>
>>>>>> # Ubuntu 10.04 LTS Lucid. See http://alestic.com/
>>>>>> whirr.image-id=us-east-1/ami-da0cf8b3
>>>>>>
>>>>>> or this:
>>>>>>
>>>>>> # Amazon linux 64-bit, default as of 3/11:
>>>>>> whirr.image-id=us-east-1/ami-8e1fece7
>>>>>>
>>>>>> I get a failure to install the right public key, so that I can't log
>>>>>> into the name node (or any other nodes, for that matter).
>>>>>>
>>>>>> My whole config file is this:
>>>>>>
>>>>>> whirr.cluster-name=bhcL4
>>>>>> whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,4 hadoop-datanode+hadoop-tasktracker
>>>>>> whirr.hadoop-install-function=install_cdh_hadoop
>>>>>> whirr.hadoop-configure-function=configure_cdh_hadoop
>>>>>> whirr.provider=aws-ec2
>>>>>> whirr.identity=...
>>>>>> whirr.credential=...
>>>>>> whirr.private-key-file=${sys:user.home}/.ssh/id_rsa-formyhadoop
>>>>>> whirr.public-key-file=${sys:user.home}/.ssh/id_rsa-formyhadoop.pub
>>>>>> whirr.hardware-id=c1.xlarge
>>>>>> #whirr.hardware-id=c1.medium
>>>>>> # Ubuntu 10.04 LTS Lucid.
>>>>>> See http://alestic.com/
>>>>>> whirr.image-id=us-east-1/ami-da0cf8b3
>>>>>> # Amazon linux as of 3/11:
>>>>>> #whirr.image-id=us-east-1/ami-8e1fece7
>>>>>> # If you choose a different location, make sure whirr.image-id is updated too
>>>>>> whirr.location-id=us-east-1d
>>>>>> hadoop-hdfs.dfs.permissions=false
>>>>>> hadoop-hdfs.dfs.replication=2
>>>>>>
>>>>>> Am I doing something wrong here? I tried with both
>>>>>> whirr.location-id=us-east-1d and whirr.location-id=us-east-1.
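[Editor's note.] As the recipe comment above says, the image and location must agree: EC2 AMI IDs are region-scoped, so the region prefix in whirr.image-id has to match the region named by whirr.location-id (an availability zone within that region, such as us-east-1d, is fine). A minimal sketch using the Ubuntu AMI from this thread:

```properties
# Both values are in us-east-1; a zone inside the region also works.
whirr.location-id=us-east-1d
whirr.image-id=us-east-1/ami-da0cf8b3
```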