>> Have you tried to log in on the remote machines? Are the daemons running as
>> expected? (check with jps)
>
> Maybe I need to RTFM, but should the Hadoop processes be started by the user?
> If so, how do I do this?
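Checking by hand might look like this (host and user names are placeholders; the daemon names assume the classic Hadoop 0.20-era layout that Whirr deploys):

```shell
# SSH into a node started by Whirr and list the running Java daemons with jps.
# "master-host" and "worker-host" are placeholders for your actual instances.
ssh whirr-user@master-host jps
# A healthy master should list NameNode and JobTracker.
ssh whirr-user@worker-host jps
# A healthy worker should list DataNode and TaskTracker.
```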
The Hadoop daemons are started by Whirr at the end of the configuration scripts. Unfortunately, due to timing issues they sometimes fail to start (e.g. a datanode trying to start before the namenode). BTW, for 0.7.1 / 0.8.0 we are working on adding the ability to restart a specific service through Whirr: https://issues.apache.org/jira/browse/WHIRR-421

> Not a full-blown warehouse at this point, but it might contain a week's worth
> of data. I could use Amazon EMR, but I'm thinking using Whirr would minimize
> the changes to our jobs running on the Hadoop cluster. What are your thoughts?

I think you should use Apache Whirr the same way you would use Amazon EMR: store data in S3 and start the Hadoop cluster only when you need to process things. Are you running jobs on a continuous basis?

> Also, how can I specify EBS volumes for these machines?

Unfortunately there is no easy way to do this with the current implementation. Do you want to take the lead on this? See https://issues.apache.org/jira/browse/WHIRR-290

Thanks,

--
Andrei Savu / andreisavu.ro
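For reference, a minimal sketch of that EMR-style workflow (the recipe file name, cluster name, and instance counts are examples only; the property names follow the standard Whirr Hadoop recipes):

```shell
# hadoop.properties -- a minimal Whirr recipe (values are placeholders):
#   whirr.cluster-name=adhoc-hadoop
#   whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,3 hadoop-datanode+hadoop-tasktracker
#   whirr.provider=aws-ec2
#   whirr.identity=${env:AWS_ACCESS_KEY_ID}
#   whirr.credential=${env:AWS_SECRET_ACCESS_KEY}

# Start the cluster only when there is work to do:
whirr launch-cluster --config hadoop.properties

# ... run your jobs, reading input from and writing results back to S3 ...

# Tear the cluster down when the jobs finish:
whirr destroy-cluster --config hadoop.properties
```

Because the data lives in S3 rather than on the cluster's disks, destroying the cluster between runs loses nothing.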
