Having an issue with host names on my new Hadoop cluster.

The cluster is currently 1 name node and 2 data nodes, running in a cloud vendor data center. General operations of the cluster are all fine - i.e., the name node and data nodes can talk just fine, I can read/write to/from HDFS, yada yada.

The problem is when I try to view the DFS through the web GUI. The http://<namenode>:50070/dfsnodelist.jsp page lists the data nodes, but the links don't work properly.

I think the reason is that I don't have DNS entries set up for the slave machines. And their /etc/hosts files are somewhat sketchy/sparse, e.g.:

[r...@hddata01 conf]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 004.admin.lax1 004 localhost.localdomain localhost hddata01
::1             localhost6.localdomain6 localhost6

(Given the above hosts file, we would internally think of the node as being named "hddata01". But again, there's no DNS entry for that.)
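(One workaround I can think of is adding entries for every node to each machine's /etc/hosts, along these lines - where the 10.0.0.x addresses and the "hdname01" name-node alias are just placeholders for our actual internal values:

10.0.0.10       hdname01
10.0.0.11       hddata01
10.0.0.12       hddata02

But that means hand-maintaining the file on every node, so I'd prefer a Hadoop-level fix if one exists.)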

So the data nodes all appear (incorrectly) in the HDFS node list page as "004", with an erroneous link to http://004.admin.lax1:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F - which is obviously a broken link.

Is there any way to fix this issue without setting up DNS entries for the data nodes? e.g., is there any way to tell Hadoop to only use IP addresses in the GUI?

I also did some googling on this issue today, and saw mention of a "slave.host.name" configuration setting that sounded like it might solve the problem. But it doesn't appear to be well documented, and it wasn't clear that this was the solution.
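From the little I could find, it sounds like slave.host.name would be set in each data node's own configuration to force the hostname that node reports to the name node - something like (my guess, not verified):

<property>
  <name>slave.host.name</name>
  <value>hddata01</value>
</property>

But I don't know whether it accepts a plain IP address, or whether it actually affects the links generated on dfsnodelist.jsp.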

Any suggestions much appreciated!

TIA,

DR
