[
https://issues.apache.org/jira/browse/WHIRR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929616#action_12929616
]
Tibor Kiss commented on WHIRR-128:
----------------------------------
I made a new patch in which:
- I moved resolveAddress out into a DnsUtil class and placed it in the
services/hadoop module only, because currently that is the only service module
affected by the problem.
- I wrote an integration test for resolveAddress that also runs on hosts
with multiple network interfaces. Basically I check all the interfaces, and
for each one that has a reverse address I do a cross-check to prove that the
reverse name obtained still resolves back in the forward direction. This test
works on ec2 too, where the same test logic would fail with
java.net.InetAddress's getHostName(ip) <-> getByName(reverse).
- I also removed the tabs.
The Rackspace test remains. Tom, could you please do a check on Rackspace?
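
For anyone reviewing without the patch at hand, the cross-check described above
can be sketched roughly as follows. This is only an illustration, not the code
from the patch: the class name ReverseCrossCheck, its method names, and the
sample records are made up, and the reverse and forward resolvers are passed in
as functions because the JDK's own lookups are exactly what misbehaves on ec2.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration; not the DnsUtil class from the patch.
class ReverseCrossCheck {

    // Returns true when ip has no reverse name (nothing to cross-check), or
    // when the reverse name resolves forward to addresses that include ip.
    static boolean isForwardConfirmed(String ip,
                                      Function<String, String> reverse,
                                      Function<String, List<String>> forward) {
        String name = reverse.apply(ip);
        if (name == null || name.equals(ip)) {
            return true; // no reverse address for this interface, skip it
        }
        List<String> addresses = forward.apply(name);
        return addresses != null && addresses.contains(ip);
    }

    // Collects the textual addresses of every local network interface.
    static List<String> localAddresses() throws SocketException {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result;
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            for (InetAddress address : Collections.list(nic.getInetAddresses())) {
                result.add(address.getHostAddress());
            }
        }
        return result;
    }

    public static void main(String[] args) throws SocketException {
        // Made-up records standing in for real DNS answers.
        Map<String, String> ptr = Map.of("10.0.0.5", "node1.example.internal");
        Map<String, List<String>> a =
                Map.of("node1.example.internal", List.of("10.0.0.5"));

        System.out.println("10.0.0.5 confirmed: "
                + isForwardConfirmed("10.0.0.5", ptr::get, a::get));

        // In a real test the resolvers would query DNS for each of these.
        System.out.println("local addresses: " + localAddresses());
    }
}
```

In a real integration test the two functions would be backed by whichever
resolver is under test, and the check would be applied to every entry returned
by localAddresses().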
> In ec2 instances instead of public dns names a public ip address is resolved
> for the started master node which causes workers to not be able to connect
> back to the master
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: WHIRR-128
> URL: https://issues.apache.org/jira/browse/WHIRR-128
> Project: Whirr
> Issue Type: Bug
> Components: core
> Affects Versions: 0.3.0
> Environment: Running hadoop (apache or CDH distro) in ec2 instances
> (Ubuntu or CentOS or Fedora).
> The same issue with the integration test of whirr.
> Reporter: Tibor Kiss
> Assignee: Tibor Kiss
> Fix For: 0.3.0
>
> Attachments: compare-myhost-with-ec2.txt, on-ec2-after-patch.tar.gz,
> on-ec2-before-patch.tar.gz, whirr-trunk.patch
>
>
> The problem is related to the way reverse addresses are resolved on ec2
> instances.
> After isolating the problem I was able to write a very simple app which
> reproduces the cause of the issue.
> Pass the public ip address of your ec2 instance in args to the following
> small piece of code.
> InetAddress namenodePublicAddress = InetAddress.getByName(args[0]);
> System.out.println("getHostAddress: " +
>     namenodePublicAddress.getHostAddress());
> System.out.println("getHostName: " + namenodePublicAddress.getHostName());
> System.out.println("getCanonicalHostName: " +
>     namenodePublicAddress.getCanonicalHostName());
> If I run it on my laptop I get:
> getHostAddress: 50.16.71.64
> getHostName: ec2-50-16-71-64.compute-1.amazonaws.com
> getCanonicalHostName: ec2-50-16-71-64.compute-1.amazonaws.com
> If I run it on the ec2 instance I get:
> getHostAddress: 50.16.71.64
> getHostName: 50.16.71.64
> getCanonicalHostName: 50.16.71.64
> My laptop runs the same CentOS 5.5 as my ec2 instance, and /etc/resolv.conf
> contains a nameserver entry in both cases.
> For some unknown reason, java.net.InetAddress's getHostName() and
> getCanonicalHostName() do not resolve reverse dns names for ec2 public
> addresses when running on an ec2 instance.
> But any other resolver tool resolves that reverse dns name correctly.
> The whirr codebase contains some getHostName() calls which, because of the
> symptom described above, cause /etc/hadoop/conf/hadoop-site.xml on the
> worker nodes to be filled in with ip addresses instead of dns names. As we
> know, it is important to use the public dns name of the ec2 instance,
> because amazon's nameserver can resolve it to either an external or an
> internal ip address, whichever is better for direct communication. In the
> case of a hadoop cluster, the security group in use does not allow
> communication between nodes over the public ip address, and therefore the
> worker nodes cannot contact the services on the master node. The hadoop
> logs clearly show that the workers cannot connect to the master.
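
Regarding the quoted observation that other resolver tools do return the
reverse name: one way to ask DNS for the PTR record directly from Java, without
going through InetAddress, is the JDK's JNDI DNS provider. The sketch below is
an assumption about how such a lookup could look and is not necessarily what
the patch does. The class name PtrLookup and its method names are made up, and
it handles dotted-quad IPv4 addresses only.

```java
import java.util.Hashtable;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical illustration; not taken from the patch.
class PtrLookup {

    // Builds the reverse-lookup name for an IPv4 address by reversing the
    // octets and appending the in-addr.arpa suffix.
    static String toArpaName(String ipv4) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("not a dotted-quad IPv4 address: " + ipv4);
        }
        return octets[3] + "." + octets[2] + "." + octets[1] + "." + octets[0]
                + ".in-addr.arpa";
    }

    // Queries the PTR record through the JNDI DNS provider. With no provider
    // URL set, the provider is expected to fall back to the nameservers
    // configured on the host.
    static String queryPtr(String ipv4) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);
        try {
            Attributes attrs = ctx.getAttributes(toArpaName(ipv4), new String[] {"PTR"});
            Attribute ptr = attrs.get("PTR");
            return ptr == null ? null : ptr.get().toString();
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) throws NamingException {
        // args[0] is the public ip address, as in the reproducer quoted above.
        System.out.println(toArpaName(args[0]) + " -> " + queryPtr(args[0]));
    }
}
```

The name returned by the provider is normally fully qualified with a trailing
dot, so it may need trimming before it is written into a configuration file.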
--
This message is automatically generated by JIRA.