choices for deploying a small hadoop cluster on EC2

2011-11-29 Thread Periya.Data
Hi All, I am just beginning to learn how to deploy a small (3-node) cluster on EC2. After some quick Googling, I see the following approaches: 1. Use Whirr for quick deployment and teardown. It uses CDH3. Does it support persistence (EBS)? 2. CDH Cloud Scripts

Re: choices for deploying a small hadoop cluster on EC2

2011-11-29 Thread Prashant Sharma
Yes, the Pallet library: https://github.com/pallet/pallet-hadoop-example On Wed, Nov 30, 2011 at 1:58 AM, Periya.Data wrote: > Hi All, I am just beginning to learn how to deploy a small cluster (a 3 node cluster) on EC2. After some quick Googling, I see the following approaches:

Re: choices for deploying a small hadoop cluster on EC2

2011-11-29 Thread John Conwell
I'm a big fan of Whirr, though I don't think it supports EBS persistence. My Hadoop deployment strategy has always been to store input and output data on S3, spin up my Hadoop cluster with either Whirr or Elastic MapReduce, run the job, store output data on S3, and kill the cluster. On Tue, Nov 29,
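
The launch/run/destroy cycle described in this message can be sketched roughly as below. This is only an illustration, not the poster's actual setup: the function name, the properties file name, the jar name, and the bucket paths are all hypothetical, and it assumes the job jar takes its input and output locations as its two arguments. The sketch only builds the commands; it does not run them.

```python
import shlex


def ephemeral_cluster_plan(whirr_config, job_jar, input_uri, output_uri):
    """Return the ordered commands for one launch/run/destroy cycle.

    All arguments are illustrative placeholders. The job is assumed to
    accept its input and output locations as positional arguments.
    """
    return [
        # 1. Spin up the cluster described by the Whirr properties file.
        ["whirr", "launch-cluster", "--config", whirr_config],
        # 2. Run the job, reading from and writing to S3.
        ["hadoop", "jar", job_jar, input_uri, output_uri],
        # 3. Tear the cluster down; the data stays in S3.
        ["whirr", "destroy-cluster", "--config", whirr_config],
    ]


if __name__ == "__main__":
    plan = ephemeral_cluster_plan(
        "hadoop.properties",          # hypothetical Whirr config file
        "my-job.jar",                 # hypothetical job jar
        "s3n://example-bucket/input",   # hypothetical input location
        "s3n://example-bucket/output",  # hypothetical output location
    )
    for command in plan:
        print(shlex.join(command))
```

Because nothing is kept on the cluster itself, this pattern sidesteps the EBS persistence question: the cluster can be destroyed after every job.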

Re: choices for deploying a small hadoop cluster on EC2

2011-11-29 Thread Konstantin Boudnik
I'd suggest you use the bits produced by BigTop (cross-posting to the bigtop-dev@ list), which also has Puppet recipes allowing for fully automated deployment and configuration. BigTop also uses the Jenkins EC2 plugin for the deployment part, and it seems to work really well! Cos On Tue, Nov 29, 2011 at 12:28PM, Per

Re: choices for deploying a small hadoop cluster on EC2

2011-11-29 Thread Periya.Data
Thanks for all your help and replies. Though I am leaning towards option 1 or 2, I looked up BigTop, an Incubator project at Apache. I could not find enough info about it on its website. I have a few more questions, and I hope they are appropriate for these mailing lists. 1. Cos: Can you please point me to a