Hi Kevin,

Thanks for your reply.
Dasher is physically located under my desk and vixen is in a
secure data center.

>  does dasher have any network interfaces that vixen does not?

No, I don't think so.
Here is more definitive info:
  [tsakai@dasher Rmpi]$ ifconfig
  eth0      Link encap:Ethernet  HWaddr 00:1A:A0:E1:84:A9
            inet addr:172.16.0.116  Bcast:172.16.3.255  Mask:255.255.252.0
            inet6 addr: fe80::21a:a0ff:fee1:84a9/64 Scope:Link
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
            RX packets:2347 errors:0 dropped:0 overruns:0 frame:0
            TX packets:1005 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:100
            RX bytes:531809 (519.3 KiB)  TX bytes:269872 (263.5 KiB)
            Memory:c2200000-c2220000

  lo        Link encap:Local Loopback
            inet addr:127.0.0.1  Mask:255.0.0.0
            inet6 addr: ::1/128 Scope:Host
            UP LOOPBACK RUNNING  MTU:16436  Metric:1
            RX packets:74 errors:0 dropped:0 overruns:0 frame:0
            TX packets:74 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:0
            RX bytes:7824 (7.6 KiB)  TX bytes:7824 (7.6 KiB)

  [tsakai@dasher Rmpi]$

However, vixen has two ethernet interfaces:
  [tsakai@vixen Rmpi]$ cat moo
  [root@vixen ec2]# /sbin/ifconfig
  eth0      Link encap:Ethernet  HWaddr 00:1A:A0:1C:00:31
            inet addr:10.1.1.2  Bcast:192.168.255.255  Mask:255.0.0.0
            inet6 addr: fe80::21a:a0ff:fe1c:31/64 Scope:Link
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
            RX packets:61913135 errors:0 dropped:0 overruns:0 frame:0
            TX packets:61923635 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:47832124690 (44.5 GiB)  TX bytes:54515478860 (50.7 GiB)
            Interrupt:185 Memory:ea000000-ea012100

  eth1      Link encap:Ethernet  HWaddr 00:1A:A0:1C:00:33
            inet addr:172.16.1.107  Bcast:172.16.3.255  Mask:255.255.252.0
            inet6 addr: fe80::21a:a0ff:fe1c:33/64 Scope:Link
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
            RX packets:5204431112 errors:0 dropped:0 overruns:0 frame:0
            TX packets:8935796075 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:371123590892 (345.6 GiB)  TX bytes:13424246629869 (12.2 TiB)
            Interrupt:193 Memory:ec000000-ec012100

  lo        Link encap:Local Loopback
            inet addr:127.0.0.1  Mask:255.0.0.0
            inet6 addr: ::1/128 Scope:Host
            UP LOOPBACK RUNNING  MTU:16436  Metric:1
            RX packets:244169216 errors:0 dropped:0 overruns:0 frame:0
            TX packets:244169216 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:0
            RX bytes:1190976360356 (1.0 TiB)  TX bytes:1190976360356 (1.0 TiB)

  [root@vixen ec2]#
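
As a quick cross-check of the two listings, here is a minimal sketch that works out which IPv4 networks the two hosts have in common. It uses only Python's standard ipaddress module. The addresses and netmasks are copied from the ifconfig output above; the helper names (network_of, shared_networks, unshared_interfaces) are made up for illustration and are not part of OpenMPI or any other tool.

```python
import ipaddress

# IPv4 address/netmask pairs copied from the ifconfig output above
# (loopback omitted, since it is identical on both hosts).
interfaces = {
    ("dasher", "eth0"): "172.16.0.116/255.255.252.0",
    ("vixen", "eth0"): "10.1.1.2/255.0.0.0",
    ("vixen", "eth1"): "172.16.1.107/255.255.252.0",
}


def network_of(addr_with_mask):
    """Return the IPv4 network that an 'address/netmask' string belongs to."""
    return ipaddress.ip_interface(addr_with_mask).network


# Map each (host, interface) pair to its network.
networks = {key: network_of(value) for key, value in interfaces.items()}


def shared_networks(host_a, host_b):
    """Networks on which both hosts have an interface."""
    nets_a = {net for (host, _), net in networks.items() if host == host_a}
    nets_b = {net for (host, _), net in networks.items() if host == host_b}
    return nets_a & nets_b


def unshared_interfaces(host, other):
    """Interfaces of `host` whose network has no counterpart on `other`."""
    common = shared_networks(host, other)
    return sorted(
        ifname
        for (h, ifname), net in networks.items()
        if h == host and net not in common
    )


if __name__ == "__main__":
    print("shared networks:", shared_networks("dasher", "vixen"))
    print("vixen-only interfaces:", unshared_interfaces("vixen", "dasher"))
    print("dasher-only interfaces:", unshared_interfaces("dasher", "vixen"))
```

With the addresses above, both machines sit on 172.16.0.0/22 (dasher's eth0 and vixen's eth1), and vixen's eth0 on 10.0.0.0/8 is the only interface with no counterpart on the other host. This only compares addresses; it says nothing about which interface OpenMPI actually selects.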

Please see the mail posting that follows this, my reply to Ashley,
who nailed the problem precisely.

Regards,

Tena


On 2/14/11 1:35 PM, "kevin.buck...@ecs.vuw.ac.nz"
<kevin.buck...@ecs.vuw.ac.nz> wrote:

> 
> This probably shows my lack of understanding as to how OpenMPI
> negotiates the connectivity between nodes when given a choice
> of interfaces but anyway:
> 
>  does dasher have any network interfaces that vixen does not?
> 
> The scenario I am imagining would be that you ssh into dasher
> from vixen using a "network" that both share and similarly, when
> you mpirun from vixen, the network that OpenMPI uses is constrained
> by the interfaces that can be seen from vixen, so you are fine.
> 
> However when you are on dasher, mpirun sees another interface which
> it takes a liking to and so tries to use that, but that interface
> is not available to vixen so the OpenMPI processes spawned there
> terminate when they can't find that interface so as to talk back
> to dasher's controlling process.
> 
> I know that you are no longer working with VMs but it's along those
> lines that I was thinking: extra network interfaces that you assume
> won't be used but which are and which could then be overcome by use
> of an explicit
> 
>  --mca btl_tcp_if_exclude virbr0
> 
> or some such construction (virbr0 used as an example here).
> 
> Kevin

