On EC2 instances, the started master node is resolved to its public IP address 
instead of its public DNS name, which prevents workers from connecting back to 
the master
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                 Key: WHIRR-128
                 URL: https://issues.apache.org/jira/browse/WHIRR-128
             Project: Whirr
          Issue Type: Bug
    Affects Versions: 0.3.0
         Environment: Running Hadoop (Apache or CDH distribution) on EC2 instances 
(Ubuntu, CentOS, or Fedora).
The same issue occurs in Whirr's integration tests.
            Reporter: Tibor Kiss


The problem is related to how reverse address lookup behaves on EC2 instances.
After isolating the problem, I wrote a very simple app that reproduces the 
cause of the issue.
Pass the public IP address of an EC2 instance as the first argument to the 
following small program.
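    import java.net.InetAddress;

    public class ReverseLookup {
        public static void main(String[] args) throws Exception {
            // args[0]: public IP address of the EC2 instance
            InetAddress namenodePublicAddress = InetAddress.getByName(args[0]);
            System.out.println("getHostAddress: " + namenodePublicAddress.getHostAddress());
            System.out.println("getHostName: " + namenodePublicAddress.getHostName());
            System.out.println("getCanonicalHostName: " + namenodePublicAddress.getCanonicalHostName());
        }
    }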

When I run it on my laptop, I get:
getHostAddress: 50.16.71.64
getHostName: ec2-50-16-71-64.compute-1.amazonaws.com
getCanonicalHostName: ec2-50-16-71-64.compute-1.amazonaws.com

When I run it on the EC2 instance, I get:
getHostAddress: 50.16.71.64
getHostName: 50.16.71.64
getCanonicalHostName: 50.16.71.64 

My laptop runs the same CentOS 5.5 as my EC2 instance, and /etc/resolv.conf 
contains a nameserver entry in both cases.
For some unknown reason, java.net.InetAddress's getHostName() and 
getCanonicalHostName() do not resolve the reverse DNS name of an EC2 public 
address when run on an EC2 instance.
Other resolver tools, however, resolve that reverse DNS name correctly.
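
One way to compare against those other resolvers from Java itself is to issue 
the PTR query directly rather than going through InetAddress. The following is 
only a sketch under stated assumptions: it handles IPv4 only, it relies on the 
JDK's bundled JNDI DNS provider picking up the system nameserver, and the class 
and method names (PtrLookup, ptrName, reverseLookup) are hypothetical and not 
part of Whirr. Whether it returns the public DNS name on an EC2 instance has 
not been verified here.

```java
import java.util.Hashtable;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical helper (not Whirr code): reverse lookup via an explicit PTR query.
class PtrLookup {

    // Builds the in-addr.arpa query name for a dotted IPv4 address by
    // reversing the four octets.
    static String ptrName(String ipv4) {
        String[] octets = ipv4.trim().split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Not a dotted IPv4 address: " + ipv4);
        }
        return octets[3] + "." + octets[2] + "." + octets[1] + "." + octets[0]
                + ".in-addr.arpa";
    }

    // Asks the nameserver configured on the host for the PTR record.
    // Throws NamingException if the name does not exist or the query fails;
    // returns null if the answer carries no PTR attribute.
    static String reverseLookup(String ipv4) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);
        try {
            Attributes attrs = ctx.getAttributes(ptrName(ipv4), new String[] {"PTR"});
            Attribute ptr = attrs.get("PTR");
            if (ptr == null) {
                return null;
            }
            String name = ptr.get().toString();
            // DNS answers are fully qualified; drop the trailing dot.
            return name.endsWith(".") ? name.substring(0, name.length() - 1) : name;
        } finally {
            ctx.close();
        }
    }
}
```

Calling PtrLookup.reverseLookup(args[0]) next to the three InetAddress calls 
above would show whether the two lookup paths disagree on the same machine.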

The Whirr codebase contains some getHostName() calls which, because of the 
symptom described above, cause /etc/hadoop/conf/hadoop-site.xml on the worker 
nodes to be filled in with IP addresses instead of DNS names. 
It is important to use the public DNS name of the EC2 instance because Amazon's 
nameserver can resolve it to either the external or the internal IP address, 
whichever is better for direct communication. In the case of a Hadoop cluster, 
the security group in use does not allow communication between nodes via their 
public IP addresses, and therefore the worker nodes cannot contact the services 
on the master node. The Hadoop logs clearly show that the workers cannot 
connect to the master.
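
The symptom can be detected before a bad value reaches the configuration: when 
the reverse lookup fails, getHostName() returns the textual IP address, so 
comparing it with getHostAddress() reveals the failure. A minimal sketch of 
such a guard follows; the class and method names (HostNameCheck, isUnresolved, 
requireDnsName) are hypothetical and are not taken from the Whirr codebase.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical guard (not Whirr code) around InetAddress.getHostName().
class HostNameCheck {

    // True when the "host name" is missing or is just the IP address literal,
    // which is what getHostName() returns when the reverse lookup fails.
    static boolean isUnresolved(String hostName, String hostAddress) {
        return hostName == null || hostName.equals(hostAddress);
    }

    // Returns the DNS name for the given public IP address, or fails loudly
    // instead of silently handing back the IP address.
    static String requireDnsName(String publicIp) throws UnknownHostException {
        InetAddress address = InetAddress.getByName(publicIp);
        String hostName = address.getHostName();
        if (isUnresolved(hostName, address.getHostAddress())) {
            throw new IllegalStateException(
                    "Reverse lookup failed for " + publicIp
                    + "; refusing to use the IP address as the master host name");
        }
        return hostName;
    }
}
```

With the values reported above, the laptop output passes this check while the 
EC2 output (getHostName: 50.16.71.64) is flagged as unresolved.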
