I wanted to elaborate on what happened.
A Hadoop slave was added to a live cluster. It turns out, I think, that
mapred-site.xml was not configured with the correct master host. In any
case, when these commands were run:
* |$ hadoop mradmin -refreshNodes|
* |$ hadoop dfsadmin
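Since a wrong master host in mapred-site.xml is the suspect here, for reference this is roughly what the JobTracker setting looks like in Hadoop 1.x (the hostname and port below are placeholders, not values from this thread):

```xml
<!-- mapred-site.xml on the new slave: must point at the real JobTracker. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- placeholder host/port; a stale or wrong value here leaves the
         slave talking to the wrong (or no) master -->
    <value>master.example.com:9001</value>
  </property>
</configuration>
```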
I assume you are on Linux. I am also assuming that your tasks are so
resource-intensive that they are taking down nodes. You should enable
per-task limits, see
http://hadoop.apache.org/docs/stable/cluster_setup.html#Memory+monitoring
What it does is that jobs are now forced to provide, up front, their
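For what it's worth, the memory monitoring that page describes is driven by a handful of mapred-site.xml properties. The names below are the Hadoop 1.x ones and the values are only illustrative; check the linked page against your exact version:

```xml
<!-- Illustrative values only: slot sizes and per-job caps for memory monitoring. -->
<property>
  <name>mapred.cluster.map.memory.mb</name>
  <value>2048</value>  <!-- memory per map slot -->
</property>
<property>
  <name>mapred.cluster.max.map.memory.mb</name>
  <value>4096</value>  <!-- largest map memory a single job may request -->
</property>
<property>
  <name>mapred.job.map.memory.mb</name>
  <value>2048</value>  <!-- what a job requests up front for its map tasks -->
</property>
```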
Yes, I mentioned below we're running RHEL.
In this case, when I went to add the node, I ran hadoop mradmin
-refreshNodes (as the hadoop user) and the master node went completely
nuts: the system load jumped to 60 (top was frozen on the console) and
the machine required a hard reboot.
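Not related to the root cause, but since top itself froze: when a box is that loaded, reading /proc/loadavg directly is much cheaper than running top. A small Linux-only sketch (the 4x-cores threshold is my arbitrary choice, not anything from Hadoop):

```shell
#!/bin/sh
# Read the 1-minute load average straight from /proc/loadavg (Linux-only).
load=$(cut -d' ' -f1 /proc/loadavg)
cores=$(nproc)
# Flag the box as overloaded past 4x the core count -- an arbitrary cutoff;
# a load of 60 on a small master is far beyond it either way.
threshold=$((cores * 4))
if [ "${load%.*}" -ge "$threshold" ]; then
    echo "OVERLOADED: load ${load} on ${cores} cores"
else
    echo "ok: load ${load} on ${cores} cores"
fi
```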
We recently experienced a couple of situations that brought one or more
Hadoop nodes down (unresponsive). One was related to a bug in a
utility we use (ffmpeg) that was resolved by compiling a new version.
The next, today, occurred after attempting to join a new node to the
cluster.