Hi Barnet,

Allow me to interject.
Are you saying that you run the master on your local machine and launch Open MPI
processes on EC2?  That is, 1) tcp://192.168.1.101:35272 is a port on your local
system, and 2) the EC2 instance is trying to connect to your local machine’s port
35272 and hangs.  Is that correct?

My situation is a bit different.  I am running two EC2 instances and trying to
run mpirun across both of them.  My ssh debug output looks quite similar to
yours, and the mpirun behavior is also very similar.  Here’s what I captured:
  Sending command:  orted --daemonize -mca ess env -mca orte_ess_jobid 1025769472
      -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2
      --hnp-uri "1025769472.0;tcp://10.118.23.4:60941"
And here’s what I did on the instance from which I issued mpirun:
  [tsakai@ip-10-118-23-4 ~]$ nslookup `hostname`
  Server:         172.16.0.23
  Address:        172.16.0.23#53

  Non-authoritative answer:
  Name:   ip-10-118-23-4.ec2.internal
  Address: 10.118.23.4

So that TCP endpoint does belong to this instance, and yet the remote instance
cannot connect to it.  No router (which might perform address translation) is
involved, but the same thing you describe appears to be happening.  Incidentally,
here’s how I ran mpirun:
  [tsakai@ip-10-118-23-4 ~]$ mpirun -app app.ac
With app.ac file:
  [tsakai@ip-10-118-23-4 ~]$ cat app.ac
  -H ip-10-118-23-4.ec2.internal -np 1 /bin/hostname
  -H ip-10-118-23-4.ec2.internal -np 1 /bin/hostname
  -H ip-10-118-18-172.ec2.internal -np 1 /bin/hostname
  -H ip-10-118-18-172.ec2.internal -np 1 /bin/hostname

The first two lines spawn /bin/hostname on this instance
(ip-10-118-23-4.ec2.internal) and the bottom two lines spawn it on the remote
instance (ip-10-118-18-172.ec2.internal).
Here’s the security group used for these instances:

  Connection   Protocol   From port   To port   Source
  ----------   --------   ---------   -------   ---------
  SSH          tcp        22          22        0.0.0.0/0
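
One way to check whether anything other than port 22 gets through is to try a
plain TCP connection from the remote instance back to the address and port named
in --hnp-uri while mpirun is hanging.  Below is a minimal Python sketch; the
helper name tcp_port_reachable is hypothetical, and it only reports whether a
connection completes, not why it failed (a refused connection, an unreachable
host, and a timeout from a silently dropping rule all come back as False).

```python
import socket


def tcp_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) completes within timeout.

    Hypothetical helper for illustration.  Any OSError (connection refused,
    no route to host, or socket timeout) is reported as "not reachable".
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, run on ip-10-118-18-172 while mpirun is hanging,
tcp_port_reachable("10.118.23.4", 22) should succeed if SSH works, and
tcp_port_reachable("10.118.23.4", 60941) tests the port from the --hnp-uri
above, which the security group shown does not list.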

Am I making sense?

Regards,

Tena




On 2/16/11 8:56 PM, "Barnet Wagman" <b...@norbl.com> wrote:

  I've run into a problem involving access to a remote host via a router, and I
think I need to understand how Open MPI determines IP addresses.  If there's
anything posted on this subject, please point me to it.

 Here's the problem:

 I've installed Open MPI (1.4.3) on a remote system (an Amazon EC2 instance).  If
the local system I'm working on has a static IP address (and a direct
connection to the internet), there's no problem.  But if the local system
accesses the internet through a router (which itself gets its IP via DHCP), a
call to the mpirun command hangs.

 This is not a firewall problem: I've disabled the firewalls on all the systems
involved (and on the router).

 It is also not an ssh problem.  The ssh connection is being made, and it
appears that the application has been launched on the remote system.  After the
mpirun command has been launched locally, a ps on the remote system shows the
process

orted --daemonize -mca ess env -mca orte_ess_jobid 1187643392
    -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2
    --hnp-uri 1187643392.0;tcp://192.168.1.101:35272


 While I don't really understand the orted process, I assume this indicates
that a command to execute an app has been received and that Open MPI is trying
to run it.

 I suspect that the problem is related to the '--hnp-uri ...
tcp://192.168.1.101' argument.  192.168.1.101 is the address of my local system
on my local network (behind the router), which of course is not accessible
over the internet.  It appears that Open MPI is transmitting the local (static)
IP address to the remote host.
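
The reasoning above can be checked mechanically: pull the address out of the
--hnp-uri value and ask whether it is publicly routable.  The following is a
minimal Python sketch that assumes only the "name;tcp://ip:port" form seen in
the two orted command lines in this thread; the function names parse_hnp_uri
and is_internet_routable are hypothetical, and this is not how Open MPI itself
parses the URI.

```python
import ipaddress
from urllib.parse import urlsplit


def parse_hnp_uri(hnp_uri: str):
    """Split a value like "1187643392.0;tcp://192.168.1.101:35272" into
    ("1187643392.0", [("192.168.1.101", 35272)]).

    Assumes the single-address "name;tcp://ip:port" form shown in this thread.
    """
    name, *contacts = hnp_uri.strip('"').split(";")
    endpoints = []
    for contact in contacts:
        parts = urlsplit(contact)  # scheme "tcp", netloc "ip:port"
        endpoints.append((parts.hostname, parts.port))
    return name, endpoints


def is_internet_routable(ip: str) -> bool:
    """True for globally routable addresses; False for RFC 1918 ranges such as
    192.168.0.0/16 and 10.0.0.0/8."""
    return ipaddress.ip_address(ip).is_global
```

Both addresses in this thread (192.168.1.101 and 10.118.23.4) come out as not
internet-routable.  A private address is only usable by hosts on the same private
network: the two EC2 instances appear to share one, while an EC2 instance has no
path to a home LAN behind a NAT router.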

 It would help to know how Open MPI determines and distributes IP addresses,
and whether there's any way to control this.
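
As background for why a host behind a router can only hand out its LAN address,
here is a minimal Python sketch that asks the operating system which local
source address it would use to reach a given destination.  It does not show
Open MPI's actual address-selection logic; the function name
local_address_toward and the default port are illustrative assumptions.

```python
import socket


def local_address_toward(dest_ip: str, dest_port: int = 9) -> str:
    """Return the local IP the OS would use as the source when sending to dest_ip.

    Connecting a UDP socket sends no packets; it only selects a route, and
    getsockname() then reports the local address bound for that route.
    The port is arbitrary because nothing is actually transmitted.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((dest_ip, dest_port))
        return s.getsockname()[0]
```

On a machine behind a NAT router, calling this with the EC2 instance's public
address would typically return the LAN address (192.168.1.101 here), because
the public address belongs to the router and is not configured on any interface
of the local machine.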

 Any thoughts on dealing with this would be greatly appreciated.

 Thanks,

 bw
