Andrei,

The release candidate code does work. Perhaps something is different relative to the patched frankenstein I was using, or perhaps I had some local corruption or a config problem.
It sets up everything as whatever my local user is, by default, and the override via whirr.cluster-user works as well.

In any case, at the rate AWS seems to be changing the configuration of 'amazon linux', perhaps it's less useful than I thought. Last week the default AMIs in the console had a bunch of spare disk space on the /media/ephemeral0 partition, which I could symlink /mnt to in the install_cdh_hadoop.sh script, and then HDFS would have a decent amount of space. Now there is no such thing, so I suppose I would have to launch an EBS volume per node and mount that. This is now tipping over into the "too much trouble" zone for me. And in the meantime I got all my native stuff (hadoop-lzo and R/Rhipe) working on Ubuntu, so I think I'm going to use the Alestic image from the recipe for a while.

If there's an obvious candidate up there for a "reasonably modern redhat-derivative AMI from a source on the good lists that behaves well," I'd like to know what it is. By 'reasonably modern' I mean having default python >= 2.5. I liked the old custom of having /mnt be a separate partition of a decent size. I hope this is just a glitch with AWS.

I suspect the space may still be there, because jclouds/whirr is showing (e.g.) in the output:

    volumes=[[id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false],
             [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false]

So theoretically the disk space is still there on those non-boot, non-durable devices, but I cannot mount them.

I also tried the cluster AMI, because I am intrigued by the possibilities for good performance. Sounds great for hadoop, doesn't it?
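[Editor's sketch.] The /mnt trick described above (pointing /mnt at the roomy ephemeral partition inside install_cdh_hadoop.sh) could look roughly like the following. The helper name `relocate_mnt`, the device name /dev/sdb, and the mkfs step are assumptions inferred from the jclouds volume listing in this thread, not something the thread confirms; check the instance's actual block-device mapping before using anything like this.

```shell
#!/bin/sh
# Hypothetical helper in the spirit of the install_cdh_hadoop.sh hack:
# format/mount a spare ephemeral device and replace /mnt with a symlink
# to it, so HDFS data directories get the space.
relocate_mnt() {
    dev=$1        # e.g. /dev/sdb, per the jclouds volume listing
    data_dir=$2   # e.g. /media/ephemeral0

    if [ ! -b "$dev" ]; then
        echo "no block device at $dev; skipping /mnt relocation"
        return 0
    fi

    # Format and mount the ephemeral disk unless it is mounted already.
    if ! grep -qs " $data_dir " /proc/mounts; then
        mkfs.ext3 -q "$dev"
        mkdir -p "$data_dir"
        mount "$dev" "$data_dir"
    fi

    # Replace /mnt with a symlink to the big partition.
    rm -rf /mnt
    ln -s "$data_dir" /mnt
}

# On a node you would call, e.g.: relocate_mnt /dev/sdb /media/ephemeral0
```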
But it won't even start the nodes, giving this:

    Configuring template
    Unexpected error while starting 1 nodes, minimum 1 nodes for [hadoop-namenode, hadoop-jobtracker] of cluster bhcLA
    java.util.concurrent.ExecutionException: org.jclouds.http.HttpResponseException: command: POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with response: HTTP/1.1 400 Bad Request; content: [Non-Windows AMIs with a virtualization type of 'hvm' currently may only be used with Cluster Compute instance types.]
            at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
            at java.util.concurrent.FutureTask.get(FutureTask.java:83)
            at org.apache.whirr.cluster.actions.BootstrapClusterAction$StartupProcess.waitForOutcomes(BootstrapClusterAction.java:307)
            at org.apache.whirr.cluster.actions.BootstrapClusterAction$StartupProcess.call(BootstrapClusterAction.java:260)
            at org.apache.whirr.cluster.actions.BootstrapClusterAction$StartupProcess.call(BootstrapClusterAction.java:221)
            at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
            at java.util.concurrent.FutureTask.run(FutureTask.java:138)
            at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
            at java.lang.Thread.run(Thread.java:680)
    Caused by: org.jclouds.http.HttpResponseException: command: POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with response: HTTP/1.1 400 Bad Request; content: [Non-Windows AMIs with a virtualization type of 'hvm' currently may only be used with Cluster Compute instance types.]
            at org.jclouds.aws.handlers.ParseAWSErrorFromXmlContent.handleError(ParseAWSErrorFromXmlContent.java:75)

There must be something a bit more involved to specifying cluster instances in the Amazon API, perhaps not (yet) supported by jclouds? I'm afraid I don't need this enough right now to justify digging further.

Anyway, thanks for all your help and advice on this.
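[Editor's note.] The 400 above is EC2 itself rejecting the request, not a local Whirr failure: an HVM AMI is only accepted together with a Cluster Compute instance type, so the two properties have to be set as a pair. A hedged sketch of that pairing, using the IDs from this thread (whether jclouds then handles the rest is exactly the open question raised here):

```properties
# HVM AMIs are only allowed on Cluster Compute instance types,
# so these two must be set together (IDs taken from this thread):
whirr.hardware-id=cc1.4xlarge
whirr.image-id=us-east-1/ami-321eed5b
```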
--Ben

On Mar 17, 2011, at 7:01 PM, Andrei Savu wrote:

> Strange! I will try your properties file tomorrow.
>
> If you want to try again you can find the artifacts for 0.4.0 RC1 here:
> http://people.apache.org/~asavu/whirr-0.4.0-incubating-candidate-1
>
> On Thu, Mar 17, 2011 at 8:41 PM, Benjamin Clark <b...@daltonclark.com> wrote:
>> Andrei,
>>
>> Thanks for looking at this. Unfortunately it does not seem to work.
>>
>> Using the Amazon linux 64-bit AMI with no whirr.cluster-user, or if I set it
>> to 'ben' or whatever else, I get this:
>>
>> 1) SshException on node us-east-1/i-62de280d:
>> org.jclouds.ssh.SshException: ec2-user@72.44.35.254:22: Error connecting to session.
>>         at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:252)
>>         at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:206)
>>         at org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call(RunScriptOnNodeAsInitScriptUsingSsh.java:90)
>>         at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:70)
>>
>> So it doesn't seem to be honoring that property, and it's definitely not
>> allowing me to log in to any nodes, as 'ben', 'ec2-user' or 'root'.
>>
>> The ubuntu AMI from the recipes continues to work fine.
>>
>> Here's the full config file I'm using.
>> I grabbed the recipe from trunk and put my stuff back in, to make sure
>> I'm not missing a new setting:
>>
>> whirr.cluster-name=bhcTL
>> whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2 hadoop-datanode+hadoop-tasktracker
>> whirr.hadoop-install-function=install_cdh_hadoop
>> whirr.hadoop-configure-function=configure_cdh_hadoop
>> whirr.provider=aws-ec2
>> whirr.identity=${env:AWS_ACCESS_KEY_ID}
>> whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
>> whirr.private-key-file=${sys:user.home}/.ssh/id_rsa-hkey
>> whirr.public-key-file=${sys:user.home}/.ssh/id_rsa-hkey.pub
>> whirr.cluster-user=ben
>> # Amazon linux 32-bit -- works
>> #whirr.hardware-id=c1.medium
>> #whirr.image-id=us-east-1/ami-d59d6bbc
>> # Ubuntu 10.04 LTS Lucid. See http://alestic.com/ -- works
>> #whirr.hardware-id=c1.xlarge
>> #whirr.image-id=us-east-1/ami-da0cf8b3
>> # Amazon linux 64-bit as of 3/11 -- doesn't work
>> whirr.hardware-id=c1.xlarge
>> whirr.image-id=us-east-1/ami-8e1fece7
>> # Cluster compute -- doesn't work
>> #whirr.hardward-id=cc1.4xlarge
>> #whirr.image-id=us-east-1/ami-321eed5b
>> whirr.location-id=us-east-1d
>> hadoop-hdfs.dfs.permissions=false
>> hadoop-hdfs.dfs.replication=2
>>
>> --Ben
>>
>> On Mar 17, 2011, at 1:08 PM, Andrei Savu wrote:
>>
>>> Ben, could you give it one more try using the current trunk?
>>>
>>> You can specify the user by setting the option whirr.cluster-user
>>> (defaults to the current system user).
>>>
>>> On Wed, Mar 16, 2011 at 11:23 PM, Benjamin Clark <b...@daltonclark.com> wrote:
>>>> Andrei,
>>>>
>>>> Thanks.
>>>>
>>>> After patching with 158, it launches fine as me on that Ubuntu image from
>>>> the recipe (i.e. on my client machine I am 'ben', so now the AWS user that
>>>> has sudo, and as whom I can log in, is also 'ben'), so that looks good.
>>>>
>>>> But it's now doing this with amazon linux (ami-da0cf8b3, which was the
>>>> default 64-bit AMI a few days ago, and may still be) during launch:
>>>>
>>>> 1) SshException on node us-east-1/i-b2678ddd:
>>>> org.jclouds.ssh.SshException: ben@50.16.96.211:22: Error connecting to session.
>>>>         at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:252)
>>>>         at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:206)
>>>>         at org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call(RunScriptOnNodeAsInitScriptUsingSsh.java:90)
>>>>         at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:70)
>>>>         at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:45)
>>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>
>>>> So it seems as if the key part of jclouds authentication setup is still
>>>> failing for the amazon linux/ec2-user scenario, i.e. it is trying to set
>>>> up as the local user, but failing.
>>>>
>>>> Is there a property for the user it launches as? Or does it just use
>>>> whichever user you are locally, instead of ec2-user/ubuntu/root depending
>>>> on the default, as before?
>>>>
>>>> I can switch to ubuntu, but I have a fair amount of native code setup in
>>>> my custom scripts and would prefer to stick with a redhattish version if
>>>> possible.
>>>>
>>>> Looking ahead, I want to benchmark plain old 64-bit instances against
>>>> cluster instances, to see if the allegedly improved networking gives us a
>>>> boost, and the available ones I see are Suse and Amazon linux.
>>>> When I switch to the amazon linux one, like so:
>>>>
>>>> whirr.hardward-id=cc1.4xlarge
>>>> whirr.image-id=us-east-1/ami-321eed5b
>>>>
>>>> I get a different problem:
>>>>
>>>> Exception in thread "main" java.util.NoSuchElementException: hardwares
>>>> don't support any images: [biggest=false, fastest=false, imageName=null,
>>>> imageDescription=Amazon Linux AMI x86_64 HVM EBS EXT4,
>>>> imageId=us-east-1/ami-321eed5b, imageVersion=ext4, location=[id=us-east-1,
>>>> scope=REGION, description=us-east-1, parent=aws-ec2, iso3166Codes=[US-VA],
>>>> metadata={}], minCores=0.0, minRam=0, osFamily=unrecognized, osName=null,
>>>> osDescription=amazon/amzn-hvm-ami-2011.02.1-beta.x86_64-ext4, osVersion=,
>>>> osArch=hvm, os64Bit=true, hardwareId=m1.small]
>>>> [[id=cc1.4xlarge, providerId=cc1.4xlarge, name=null,
>>>> processors=[[cores=4.0, speed=4.0], [cores=4.0, speed=4.0]], ram=23552,
>>>> volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false,
>>>> isBootDevice=true], [id=null, type=LOCAL, size=840.0, device=/dev/sdb,
>>>> durable=false, isBootDevice=false], [id=null, type=LOCAL, size=840.0,
>>>> device=/dev/sdc, durable=false, isBootDevice=false]], supportsI
>>>>
>>>> but I imagine that if using cluster instances is going to be possible,
>>>> support for amazon linux will be needed.
>>>>
>>>> --Ben
>>>>
>>>> On Mar 16, 2011, at 4:07 PM, Andrei Savu wrote:
>>>>
>>>>> I've seen something similar while testing Whirr: WHIRR-264 [0]. We are
>>>>> going to commit WHIRR-158 [1] tomorrow and it should fix the problem
>>>>> you are seeing. We should be able to restart the vote for the 0.4.0
>>>>> release after fixing this issue.
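[Editor's note.] One thing worth double-checking in the config snippet just above: the property is spelled whirr.hardward-id rather than whirr.hardware-id. If Whirr silently ignores the unknown key, the cc1.4xlarge constraint never reaches jclouds, which would explain the hardwareId=m1.small in the exception (the default hardware cannot host an HVM image). A corrected sketch; whether it then launches cleanly is not confirmed by this thread:

```properties
# Note the spelling: hardware, not hardward.
whirr.hardware-id=cc1.4xlarge
whirr.image-id=us-east-1/ami-321eed5b
```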
>>>>>
>>>>> [0] https://issues.apache.org/jira/browse/WHIRR-264
>>>>> [1] https://issues.apache.org/jira/browse/WHIRR-158
>>>>>
>>>>> -- Andrei Savu / andreisavu.ro
>>>>>
>>>>> On Wed, Mar 16, 2011 at 6:54 PM, Benjamin Clark <b...@daltonclark.com> wrote:
>>>>>> I have been using the whirr 0.4 branch to launch clusters of c1.medium
>>>>>> amazon linux machines (whirr.image-id=us-east-1/ami-d59d6bbc, which was
>>>>>> the default for new amazon linux instances a few days ago) with good
>>>>>> success. I took the default hadoop-ec2.properties recipe and modified
>>>>>> it slightly to suit my needs. I'm now trying with basically the same
>>>>>> properties file, but when I use
>>>>>>
>>>>>> whirr.hardware-id=c1.xlarge
>>>>>>
>>>>>> and then either this (from the recipe):
>>>>>>
>>>>>> # Ubuntu 10.04 LTS Lucid. See http://alestic.com/
>>>>>> whirr.image-id=us-east-1/ami-da0cf8b3
>>>>>>
>>>>>> or this:
>>>>>>
>>>>>> # Amazon linux 64-bit, default as of 3/11:
>>>>>> whirr.image-id=us-east-1/ami-8e1fece7
>>>>>>
>>>>>> I get a failure to install the right public key, so that I can't log
>>>>>> into the name node (or any other nodes, for that matter).
>>>>>>
>>>>>> My whole config file is this:
>>>>>>
>>>>>> whirr.cluster-name=bhcL4
>>>>>> whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,4 hadoop-datanode+hadoop-tasktracker
>>>>>> whirr.hadoop-install-function=install_cdh_hadoop
>>>>>> whirr.hadoop-configure-function=configure_cdh_hadoop
>>>>>> whirr.provider=aws-ec2
>>>>>> whirr.identity=...
>>>>>> whirr.credential=...
>>>>>> whirr.private-key-file=${sys:user.home}/.ssh/id_rsa-formyhadoop
>>>>>> whirr.public-key-file=${sys:user.home}/.ssh/id_rsa-formyhadoop.pub
>>>>>> whirr.hardware-id=c1.xlarge
>>>>>> #whirr.hardware-id=c1.medium
>>>>>> # Ubuntu 10.04 LTS Lucid.
>>>>>> See http://alestic.com/
>>>>>> whirr.image-id=us-east-1/ami-da0cf8b3
>>>>>> # Amazon linux as of 3/11:
>>>>>> #whirr.image-id=us-east-1/ami-8e1fece7
>>>>>> # If you choose a different location, make sure whirr.image-id is updated too
>>>>>> whirr.location-id=us-east-1d
>>>>>> hadoop-hdfs.dfs.permissions=false
>>>>>> hadoop-hdfs.dfs.replication=2
>>>>>>
>>>>>> Am I doing something wrong here? I tried with both
>>>>>> whirr.location-id=us-east-1d and whirr.location-id=us-east-1.
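[Editor's note.] As the recipe comment above says, the image and location must agree: EC2 AMI IDs are region-scoped, so the region prefix in whirr.image-id has to match the region named by whirr.location-id (an availability zone within that region, such as us-east-1d, is fine). A minimal sketch using the Ubuntu AMI from this thread:

```properties
# Both values are in us-east-1; a zone inside the region also works.
whirr.location-id=us-east-1d
whirr.image-id=us-east-1/ami-da0cf8b3
```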