Hey Andrei,

From: Andrei Savu <[email protected]>
Reply-To: [email protected]
Date: Wed, 11 Jan 2012 02:40:04 -0800
To: [email protected]
Subject: Re: [newbie] Unable to setup hadoop cluster on ec2
> > > Have you tried to log in on the remote machines? Are the daemons
> > > running as expected? (check with jps)
> >
> > Maybe I need to RTFM, but should the Hadoop processes be started by
> > the user? If so, how do I do this?
>
> The Hadoop daemons are started by Whirr at the end of the configuration
> scripts. Unfortunately, due to timing issues they sometimes fail to
> start (e.g. a datanode trying to start before the namenode). BTW, for
> 0.7.1 / 0.8.0 we are working on adding the ability to restart a
> specific service through Whirr:
> https://issues.apache.org/jira/browse/WHIRR-421
>
> > Not a full-blown warehouse at this point, but it might contain a
> > week's worth of data. I could use Amazon EMR, but I'm thinking using
> > Whirr would minimize the changes to our jobs running on the Hadoop
> > cluster. What are your thoughts?
>
> I think you should use Apache Whirr the same way you would use Amazon
> EMR. Store data in S3 and start the Hadoop cluster only when you need
> to process things. Are you running jobs on a continuous basis?

Yes, we're going to be running jobs on a continuous basis.

> > Also, how can I specify EBS volumes for these machines?
>
> Unfortunately there is no easy way to do this with the current
> implementation. Do you want to take the lead on this? See
> https://issues.apache.org/jira/browse/WHIRR-290

I may not have the bandwidth to ramp up, but I would appreciate it if you
could send me some pointers on getting started!

Thanks,
Madhu

> Thanks,
> --
> Andrei Savu / andreisavu.ro
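For reference, the daemon check suggested in the thread (log in to each node and run jps) can be sketched as a small shell helper. This is an illustrative sketch, not part of Whirr: the helper name and the daemon list (standard Hadoop 0.20/1.x process names) are assumptions you would adapt to your setup.

```shell
# Illustrative helper: read `jps` output on stdin and report any expected
# Hadoop daemons that are not running. Daemon names are Hadoop 0.20/1.x;
# trim the list per node role (e.g. only DataNode/TaskTracker on workers).
expected="NameNode SecondaryNameNode JobTracker DataNode TaskTracker"

missing_daemons() {
  out="$(cat)"
  for d in $expected; do
    # -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
    echo "$out" | grep -qw "$d" || echo "missing: $d"
  done
}

# Typical use over SSH (assumes the key-based login Whirr sets up;
# hostname is a placeholder):
#   ssh ec2-user@<namenode-host> jps | missing_daemons
```

If a daemon is reported missing, checking its log under the Hadoop logs directory on that node is usually the next step.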
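The EMR-style workflow suggested above (keep data in S3, bring the cluster up only when jobs need to run) looks roughly like this with the Whirr CLI. The cluster name and instance counts below are illustrative, and the credentials are read from environment variables you would set yourself:

```shell
# Write a minimal Whirr recipe. The whirr.* keys are standard Whirr
# properties; the cluster name and node counts here are just examples.
cat > hadoop.properties <<'EOF'
whirr.cluster-name=logs-cluster
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,3 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
EOF

# Ephemeral, EMR-style lifecycle:
#   whirr launch-cluster  --config hadoop.properties
#   ... run jobs with S3 input/output paths ...
#   whirr destroy-cluster --config hadoop.properties
```

Since the cluster is destroyed after each run, anything worth keeping has to live in S3 rather than HDFS.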
