[jira] Commented: (WHIRR-128) In ec2 instances instead of public dns names a public ip address is resolved for the started master node which causes workers to not be able to connect back to the master

Tibor Kiss (JIRA) Mon, 08 Nov 2010 08:23:35 -0800

    [ https://issues.apache.org/jira/browse/WHIRR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929616#action_12929616 ]


Tibor Kiss commented on WHIRR-128:
----------------------------------

I made a new patch in which:
 - I moved resolveAddress out into a DnsUtil class and placed it in the 
services/hadoop module only, because currently this is the only service module 
affected by the problem.
 - I wrote an integration test for resolveAddress which also runs on hosts 
with a multi-interface configuration. Basically, I check all the interfaces, 
and for each one that has a reverse address I do a cross-check to prove that 
the obtained reverse name is still valid in the forward direction. This test 
works on ec2 too, where the same test logic would fail with 
java.net.InetAddress's getHostName(ip) <-> getByName(reverse).
 - I also removed the tabs.

The Rackspace test remains. Tom, could you please do a check on Rackspace?
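
The reverse/forward cross-check described in the comment above can be sketched 
roughly as follows. This is only an illustration, not the code from the patch: 
the class name ReverseForwardCheck, the methods ptrName and verifiedReverse, 
and the two pluggable lookup functions are all hypothetical, and the lookups 
are stubbed with in-memory maps so that the logic can be followed without a 
real resolver. It assumes Java 8 or later and IPv4 addresses only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a reverse -> forward DNS cross-check (IPv4 only).
class ReverseForwardCheck {

    // Builds the PTR query name for a dotted IPv4 address by reversing
    // the octets and appending the in-addr.arpa suffix.
    public static String ptrName(String ipv4) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ipv4);
        }
        return octets[3] + "." + octets[2] + "." + octets[1] + "." + octets[0]
                + ".in-addr.arpa";
    }

    // Returns the reverse name of ip only if a forward lookup of that name
    // lists ip again; returns null when either direction does not match.
    // Both lookup functions are assumptions standing in for a real resolver.
    public static String verifiedReverse(String ip,
                                         Function<String, String> reverseLookup,
                                         Function<String, List<String>> forwardLookup) {
        String name = reverseLookup.apply(ptrName(ip));
        if (name == null) {
            return null; // no reverse record at all
        }
        List<String> addresses = forwardLookup.apply(name);
        return (addresses != null && addresses.contains(ip)) ? name : null;
    }

    public static void main(String[] args) {
        // Stub records using the address and name from the issue description.
        Map<String, String> ptrRecords = new HashMap<>();
        ptrRecords.put("64.71.16.50.in-addr.arpa",
                "ec2-50-16-71-64.compute-1.amazonaws.com");
        Map<String, List<String>> aRecords = new HashMap<>();
        aRecords.put("ec2-50-16-71-64.compute-1.amazonaws.com",
                Arrays.asList("50.16.71.64"));

        System.out.println(
                verifiedReverse("50.16.71.64", ptrRecords::get, aRecords::get));
    }
}
```

In a real test the two functions would be backed by an actual DNS resolver and 
the check would be repeated for every address of every network interface.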

> In ec2 instances instead of public dns names a public ip address is resolved 
> for the started master node which causes workers to not be able to connect 
> back to the master
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: WHIRR-128
>                 URL: https://issues.apache.org/jira/browse/WHIRR-128
>             Project: Whirr
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.3.0
>         Environment: Running hadoop (apache or CDH distro) in ec2 instances 
> (Ubuntu or CentOS or Fedora).
> The same issue with the integration test of whirr.
>            Reporter: Tibor Kiss
>            Assignee: Tibor Kiss
>             Fix For: 0.3.0
>
>         Attachments: compare-myhost-with-ec2.txt, on-ec2-after-patch.tar.gz, 
> on-ec2-before-patch.tar.gz, whirr-trunk.patch
>
>
> The problem is related to how the reverse address is resolved on ec2 
> instances.
> After isolating the problem I could write a very simple app which reproduces 
> the cause of the issue.
> Pass the public ip address of the ec2 instance as the first argument to the 
> following small piece of code.
>     InetAddress namenodePublicAddress = InetAddress.getByName(args[0]);
>     System.out.println("getHostAddress: " + namenodePublicAddress.getHostAddress());
>     System.out.println("getHostName: " + namenodePublicAddress.getHostName());
>     System.out.println("getCanonicalHostName: " + namenodePublicAddress.getCanonicalHostName());
> If I run it on my laptop I get
> getHostAddress: 50.16.71.64
> getHostName: ec2-50-16-71-64.compute-1.amazonaws.com
> getCanonicalHostName: ec2-50-16-71-64.compute-1.amazonaws.com
> If I run it on the ec2 instance I get
> getHostAddress: 50.16.71.64
> getHostName: 50.16.71.64
> getCanonicalHostName: 50.16.71.64 
> My laptop runs the same CentOS 5.5 as my ec2 instance, and /etc/resolv.conf 
> contains a nameserver entry in both cases.
> For some unknown reason, java.net.InetAddress's getHostName() and 
> getCanonicalHostName() do not resolve reverse dns names for ec2 public 
> addresses when running on an ec2 instance.
> Other resolver tools, however, resolve that reverse dns name correctly.
> In the whirr codebase there are some getHostName() calls which, because of 
> the symptom described above, cause /etc/hadoop/conf/hadoop-site.xml on the 
> worker nodes to be filled with ip addresses instead of dns names. As we know, 
> it is important to use the public dns name of the ec2 instance, because 
> amazon's nameserver can resolve it to either an external or an internal ip 
> address, whichever is better for direct communication. In the case of a 
> hadoop cluster, the security group in use does not allow communication 
> between nodes via public ip addresses, and therefore the worker nodes cannot 
> contact the services on the master node. The hadoop logs clearly show that 
> the workers cannot connect to the master.
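
The symptom in the quoted description (getHostName() handing back the ip 
address itself) can be detected before the value is written into a 
configuration file. The following is a minimal sketch of such a guard, not 
code from whirr: the class HostNameGuard and the method isIpFallback are 
hypothetical names introduced only for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical guard that detects a failed reverse lookup.
class HostNameGuard {

    // True when the "host name" is just the textual ip address again,
    // which is what InetAddress returns if no reverse name was found.
    public static boolean isIpFallback(String hostName, String hostAddress) {
        return hostName == null || hostName.equals(hostAddress);
    }

    // Convenience overload; getHostName() may trigger a reverse lookup.
    public static boolean isIpFallback(InetAddress address) {
        return isIpFallback(address.getHostName(), address.getHostAddress());
    }

    public static void main(String[] args) throws UnknownHostException {
        if (args.length != 1) {
            System.err.println("usage: HostNameGuard <public-ip>");
            return;
        }
        InetAddress master = InetAddress.getByName(args[0]);
        if (isIpFallback(master)) {
            System.err.println("no reverse dns name for " + master.getHostAddress());
        } else {
            System.out.println(master.getHostName());
        }
    }
}
```

With the values shown in the description, the laptop output would pass this 
guard and the ec2 output would be flagged.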

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
