[ 
https://issues.apache.org/jira/browse/WHIRR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928657#action_12928657
 ] 

Tom White commented on WHIRR-128:
---------------------------------

Thanks for submitting this, Tibor. Overall it looks good. A few comments:

* Could you put the resolving logic (resolveAddress) outside the Service class? 
It's really an implementation detail and Service is a public interface for 
users.
* Is there a test we could write for resolveAddress?
* We should test that this works with Rackspace too. (I've got some credentials 
and can do that.)
* There are some tabs in the patch.
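
One way the first two points could fit together, as a rough sketch only: the
class name AddressResolver and the injected reverseLookup function are
hypothetical and are not taken from the attached patch. Keeping the logic in a
small helper with a pluggable lookup would let a unit test exercise it without
touching DNS. The sketch assumes Java 8 or later.

```java
import java.util.function.UnaryOperator;

/** Hypothetical helper that lives outside Service; not the code from the patch. */
final class AddressResolver {
    private final UnaryOperator<String> reverseLookup;

    /** reverseLookup maps an IP address to a host name, or echoes the IP (or returns null) on failure. */
    AddressResolver(UnaryOperator<String> reverseLookup) {
        this.reverseLookup = reverseLookup;
    }

    /** Returns the DNS name for the IP, or throws if the lookup only echoed the IP back. */
    String resolveAddress(String ip) {
        String name = reverseLookup.apply(ip);
        if (name == null || name.equals(ip)) {
            throw new IllegalStateException("No reverse DNS name found for " + ip);
        }
        return name;
    }

    public static void main(String[] args) {
        // Stub lookup standing in for a real resolver, so no network access is needed.
        AddressResolver resolver = new AddressResolver(
            ip -> "50.16.71.64".equals(ip) ? "ec2-50-16-71-64.compute-1.amazonaws.com" : ip);
        System.out.println(resolver.resolveAddress("50.16.71.64"));
    }
}
```

A test could then pass in a stub that echoes the IP back and check that the
failure is reported rather than silently written into the configuration.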


> In ec2 instances instead of public dns names a public ip address is resolved 
> for the started master node which causes workers to not be able to connect 
> back to the master
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: WHIRR-128
>                 URL: https://issues.apache.org/jira/browse/WHIRR-128
>             Project: Whirr
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.3.0
>         Environment: Running hadoop (apache or CDH distro) in ec2 instances 
> (Ubuntu or CentOS or Fedora).
> The same issue with the integration test of whirr.
>            Reporter: Tibor Kiss
>            Assignee: Tibor Kiss
>             Fix For: 0.3.0
>
>         Attachments: compare-myhost-with-ec2.txt, on-ec2-after-patch.tar.gz, 
> on-ec2-before-patch.tar.gz, whirr-trunk.patch
>
>
> The problem is related to how reverse address lookups are resolved on ec2 
> instances.
> After isolating the problem, I wrote a very simple app that reproduces the 
> cause of the issue.
> Pass in args the public ip address of the ec2 instance on which you are 
> running the following small piece of code:
>     InetAddress namenodePublicAddress = InetAddress.getByName(args[0]);
>     System.out.println("getHostAddress: " + namenodePublicAddress.getHostAddress());
>     System.out.println("getHostName: " + namenodePublicAddress.getHostName());
>     System.out.println("getCanonicalHostName: " + namenodePublicAddress.getCanonicalHostName());
> If I run it on my laptop I get:
> getHostAddress: 50.16.71.64
> getHostName: ec2-50-16-71-64.compute-1.amazonaws.com
> getCanonicalHostName: ec2-50-16-71-64.compute-1.amazonaws.com
> If I run it on the ec2 instance I get:
> getHostAddress: 50.16.71.64
> getHostName: 50.16.71.64
> getCanonicalHostName: 50.16.71.64 
> My laptop runs the same CentOS 5.5 as my ec2 instance, and /etc/resolv.conf 
> contains a nameserver entry in both cases.
> For some unknown reason, java.net.InetAddress's getHostName() and 
> getCanonicalHostName() do not resolve reverse dns names for ec2 public 
> addresses when run on an ec2 instance.
> Other resolver tools, however, resolve that reverse dns name correctly.
> The whirr codebase contains some getHostName() calls which, because of the 
> symptom described above, cause /etc/hadoop/conf/hadoop-site.xml on the 
> worker nodes to be filled in with ip addresses instead of dns names. It is 
> important to use the public dns name of the ec2 instance because amazon's 
> nameserver can resolve it to either an external or an internal ip address, 
> whichever is better for direct communication. In the hadoop cluster, the 
> security group in use does not allow communication between nodes over 
> public ip addresses, so the worker nodes cannot contact the services on 
> the master node. The hadoop logs clearly show that the workers cannot 
> connect to the master.
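
Since the description notes that other resolver tools do return the reverse
name, one possible direction is to query the PTR record directly instead of
going through InetAddress. The following is a minimal sketch under stated
assumptions: the class PtrLookup and its methods are hypothetical, it is not
necessarily what whirr-trunk.patch does, and whether it behaves differently
from InetAddress on an ec2 instance has not been verified here. It uses the
JNDI DNS provider that ships with the JDK and handles IPv4 addresses only.

```java
import java.util.Hashtable;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

/** Hypothetical helper; an illustration, not the patch's implementation. */
final class PtrLookup {
    /** Builds the in-addr.arpa name for a dotted IPv4 address by reversing its octets. */
    static String reverseName(String ipv4) {
        String[] o = ipv4.split("\\.");
        if (o.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ipv4);
        }
        return o[3] + "." + o[2] + "." + o[1] + "." + o[0] + ".in-addr.arpa";
    }

    /** Queries the PTR record through JNDI; throws NamingException if the query fails. */
    static String lookupPtr(String ipv4) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);
        try {
            Attribute ptr = ctx.getAttributes(reverseName(ipv4), new String[] {"PTR"}).get("PTR");
            if (ptr == null) {
                return null;
            }
            String name = ptr.get().toString();
            // PTR values are usually fully qualified with a trailing dot; strip it if present.
            return name.endsWith(".") ? name.substring(0, name.length() - 1) : name;
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) throws NamingException {
        // Requires network access to the configured nameserver.
        System.out.println(lookupPtr(args[0]));
    }
}
```

Running it with the public ip address as the only argument on both the laptop
and the ec2 instance would show whether a direct PTR query avoids the symptom.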

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
