Hey Andrei,

From: Andrei Savu <[email protected]>
Reply-To: [email protected]
Date: Wed, 11 Jan 2012 02:40:04 -0800
To: [email protected]
Subject: Re: [newbie] Unable to setup hadoop cluster on ec2



Have you tried to login on the remote machines? Are the daemons running as 
expected? (check with jps)
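A minimal sketch of that check (the host name and key path are placeholders, not values from this thread):

```shell
# SSH to a cluster node and list the running JVM processes with jps.
# On a healthy master you would expect NameNode and JobTracker;
# on a worker, DataNode and TaskTracker.
ssh -i ~/.ssh/whirr_id_rsa [email protected] 'jps'
```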

Maybe I need to RTFM, but should the Hadoop processes be started by the user? If so, how do I do this?

The Hadoop daemons are started by Whirr at the end of the configuration 
scripts. Unfortunately, due to timing issues, they sometimes fail to start 
(e.g. a datanode trying to start before the namenode is up).
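Until WHIRR-421 lands, one workaround (a sketch, assuming HADOOP_HOME points at the Hadoop install on the image) is to log in to the affected node and restart the missing daemon by hand:

```shell
# On a worker node where jps shows no DataNode, start it manually
# once the namenode is reachable:
$HADOOP_HOME/bin/hadoop-daemon.sh start datanode
# Likewise for the tasktracker, if it is missing too:
$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker
```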

BTW for 0.7.1 / 0.8.0 we are working on adding the ability to restart a 
specific service through Whirr:
https://issues.apache.org/jira/browse/WHIRR-421


Not a full-blown warehouse at this point, but it might contain a week's worth 
of data. I could use Amazon EMR, but I'm thinking that using Whirr would 
minimize the changes to our jobs running on the Hadoop cluster. What are your 
thoughts?

I think you should use Apache Whirr the same way you would use Amazon EMR: 
store the data in S3 and start the Hadoop cluster only when you need to 
process things. Are you running jobs on a continuous basis?
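That EMR-style workflow can be sketched roughly like this (the cluster name, instance counts, and template mix below are examples, not recommendations for your workload):

```shell
# Write a minimal Whirr recipe; the quoted heredoc keeps ${env:...}
# literal so Whirr resolves the AWS keys from the environment at launch.
cat > hadoop.properties <<'EOF'
whirr.cluster-name=weekly-processing
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,3 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
EOF

# Launch, run the jobs (reading input from and writing results back
# to S3), then tear the cluster down when the batch is done.
bin/whirr launch-cluster --config hadoop.properties
# ... submit MapReduce jobs here ...
bin/whirr destroy-cluster --config hadoop.properties
```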


Yes, we're going to be running jobs on a continuous basis.


Also, how can I specify EBS volumes for these machines?

Unfortunately, there is no easy way to do this with the current 
implementation. Would you like to take the lead on this?

See https://issues.apache.org/jira/browse/WHIRR-290


I may not have the bandwidth to ramp up, but I would appreciate it if you 
could send me some pointers on getting started!

Thanks,
Madhu


Thanks,

-- Andrei Savu / andreisavu.ro
