some nodes terminating at startup

Selwyn McCracken Mon, 07 Mar 2011 13:51:56 -0800

Hi Whirrers,

I have been successfully launching smaller clusters with whirr (<= 4
data nodes).


When I try to scale to something larger (8+ nodes), some of the nodes
terminate during the startup process, and frequently it is the name
node.

I have reviewed the logs and there doesn't seem to be anything I can
spot (in fact the whirr script hangs and never exits, so the log never
completes).

I suspect something is timing out if the cluster is being launched serially...

Has there been any progress made in adding nodes to an already running
cluster? This might help to work around this problem, and make it
easier for my benchmarking tests, where I am trying to show a linear
decrease in processing time as the number of nodes increases. That is,
I won't have to start a fresh cluster and reload the data into HDFS for
each test run.
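
A minimal sketch of how the "linear decrease" could be checked once
timings exist, assuming one wall-clock time per cluster size for the same
job. The function name scaling_report and the numbers are hypothetical
placeholders, not measurements from these runs:

```python
def scaling_report(timings):
    """Return (nodes, speedup, efficiency) for each cluster size.

    timings maps data-node count -> wall-clock seconds for the same job.
    Speedup is measured against the smallest cluster; an efficiency of 1.0
    means perfectly linear scaling.
    """
    base_nodes = min(timings)
    base_time = timings[base_nodes]
    report = []
    for nodes in sorted(timings):
        speedup = base_time / timings[nodes]
        ideal = nodes / base_nodes  # speedup expected under linear scaling
        report.append((nodes, speedup, speedup / ideal))
    return report


# Hypothetical placeholder timings in seconds, not real results.
example = {2: 1200.0, 4: 600.0, 8: 400.0}
for nodes, speedup, efficiency in scaling_report(example):
    print(f"{nodes} nodes: speedup {speedup:.2f}x, efficiency {efficiency:.2f}")
```

An efficiency well below 1.0 at the larger sizes would indicate that
processing time is not decreasing linearly with node count.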

Anyway, here is the recipe I have been using:

whirr.cluster-name=hadoop8l
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,8 hadoop-datanode+hadoop-tasktracker
whirr.hadoop-install-function=install_cdh_hadoop
whirr.hadoop-configure-function=configure_cdh_hadoop
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.hardware-id=m1.large
whirr.image-id=us-east-1/ami-da0cf8b3
whirr.location-id=us-east-1

Any help greatly appreciated.
Selwyn
