I have an update - and it's a weird one. In the course of trying to fix my other problem (http://mail-archives.apache.org/mod_mbox/hadoop-user/201806.mbox/ajax/%3C3276cbc6-6f04-9dcb-9be0-186334a0cf7a%40att.net%3E) I hit upon some strange behavior germane to this problem.

I cut my 3.1.0 cluster down to two nodes, removing the dual-homed machine, to see whether that restored proper operation. When I tried to run start-dfs.sh I got something to the effect of:

   hostname msba02b,msba02c not found

In other words, it was taking the output of my sed command (as shown below; a comma-separated list of the host names in the workers file, as expected by the --hostnames option) and interpreting the whole thing as a single hostname.

So I thought, is the hdfs datanode command not liking the absence of periods, since my host tables and Hadoop configs now contain atomic hostnames? I made up a fake FQDN for everything; didn't make any difference.

Finally, ***removing the --hostnames option entirely from the command*** - making it look like it did in the original distribution - did the trick.

To me, this makes no sense at all; how could the hdfs datanode command work one way when the machines use dynamic DNS and another way when using host tables for name resolution?

Just as an FYI and hopefully leading to a bug-fix: the command to start Datanode daemons in .../sbin/start-dfs.sh and .../sbin/stop-dfs.sh from the prebuilt Hadoop distribution (both 3.0.1 and 3.1.0) won't run as written.

Here's the command that errors out:

    hadoop_uservar_su hdfs datanode "${HADOOP_HDFS_HOME}/bin/hdfs" \
        --workers \
        --config "${HADOOP_CONF_DIR}" \
        --daemon start \
        datanode ${dataStartOpt}

What happens is that it thinks the path to the workers file is the network name of a datanode.

To get this to work properly, I use the --hostnames option and supply as a value a comma-delimited version of the one-name-per-line workers file like so:

    hadoop_uservar_su hdfs datanode "${HADOOP_HDFS_HOME}/bin/hdfs" \
        --workers \
        --hostnames `sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g' ${HADOOP_CONF_DIR}/workers` \
        --config "${HADOOP_CONF_DIR}" \
        --daemon start \
        datanode ${dataStartOpt}
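
For anyone who wants to check what that sed expression actually hands to --hostnames, here is a minimal sketch that runs it outside of Hadoop. It assumes GNU sed and a throwaway two-line file standing in for ${HADOOP_CONF_DIR}/workers; the temp file and the variable names are made up for illustration, and the paste line is only an alternative I believe is equivalent, not something taken from the Hadoop scripts.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ${HADOOP_CONF_DIR}/workers: one host name per line.
workers_file="$(mktemp)"
printf 'msba02b\nmsba02c\n' > "${workers_file}"

# Same sed expression as in the command above: pull every line into the
# pattern space, then replace each embedded newline with a comma.
joined_sed="$(sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g' "${workers_file}")"

# Assumed-equivalent alternative: paste in serial mode with a comma delimiter.
joined_paste="$(paste -sd, "${workers_file}")"

echo "sed:   ${joined_sed}"
echo "paste: ${joined_paste}"

rm -f "${workers_file}"
```

With the two names from the error message earlier in this mail, both lines should print msba02b,msba02c, which is the single string that start-dfs.sh reported as one hostname.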

