Hi,
I am exploring options to deploy a small Hadoop cluster on EC2. I read
about Whirr and would like to try it out. I have a few questions before I
dive into this:
1. My laptop currently runs Ubuntu 11.10 (Oneiric Ocelot) with Hadoop
0.20.2+923.142 (CDH3u2). Is there a Whirr release compatible with this
Hadoop version? I have read this post:
http://ashenfad.blogspot.com/2011/01/hadoop-cluster-on-ec2-using-cloudera.html
2. Is it really necessary to run the same Hadoop version on my laptop as
on the instances that Whirr launches?
3. Is there an example Whirr config file that shows how to create an
EC2 instance with Hadoop 0.20.2, Hive, Sqoop and Flume? I guess I can
configure Whirr to download and install all the latest Hadoop ecosystem
tools and then create a custom AMI out of that first instance.
Please let me know the caveats and other fine points I need to know to use
all the latest packages and Whirr. What should I keep in mind while I begin
to use Whirr?
Many thanks,
PD/