Hi Kevin,

Thanks for your reply.
Dasher is physically located under my desk and vixen is in a secure data center.

> does dasher have any network interfaces that vixen does not?

No, I don't think so.
Here is more definitive info:

  [tsakai@dasher Rmpi]$ ifconfig
  eth0      Link encap:Ethernet  HWaddr 00:1A:A0:E1:84:A9
            inet addr:172.16.0.116  Bcast:172.16.3.255  Mask:255.255.252.0
            inet6 addr: fe80::21a:a0ff:fee1:84a9/64 Scope:Link
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
            RX packets:2347 errors:0 dropped:0 overruns:0 frame:0
            TX packets:1005 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:100
            RX bytes:531809 (519.3 KiB)  TX bytes:269872 (263.5 KiB)
            Memory:c2200000-c2220000

  lo        Link encap:Local Loopback
            inet addr:127.0.0.1  Mask:255.0.0.0
            inet6 addr: ::1/128 Scope:Host
            UP LOOPBACK RUNNING  MTU:16436  Metric:1
            RX packets:74 errors:0 dropped:0 overruns:0 frame:0
            TX packets:74 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:0
            RX bytes:7824 (7.6 KiB)  TX bytes:7824 (7.6 KiB)

  [tsakai@dasher Rmpi]$

However, vixen has two ethernet interfaces:

  [tsakai@vixen Rmpi]$ cat moo
  [root@vixen ec2]# /sbin/ifconfig
  eth0      Link encap:Ethernet  HWaddr 00:1A:A0:1C:00:31
            inet addr:10.1.1.2  Bcast:192.168.255.255  Mask:255.0.0.0
            inet6 addr: fe80::21a:a0ff:fe1c:31/64 Scope:Link
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
            RX packets:61913135 errors:0 dropped:0 overruns:0 frame:0
            TX packets:61923635 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:47832124690 (44.5 GiB)  TX bytes:54515478860 (50.7 GiB)
            Interrupt:185 Memory:ea000000-ea012100

  eth1      Link encap:Ethernet  HWaddr 00:1A:A0:1C:00:33
            inet addr:172.16.1.107  Bcast:172.16.3.255  Mask:255.255.252.0
            inet6 addr: fe80::21a:a0ff:fe1c:33/64 Scope:Link
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
            RX packets:5204431112 errors:0 dropped:0 overruns:0 frame:0
            TX packets:8935796075 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:371123590892 (345.6 GiB)  TX bytes:13424246629869 (12.2 TiB)
            Interrupt:193 Memory:ec000000-ec012100

  lo        Link encap:Local Loopback
            inet addr:127.0.0.1  Mask:255.0.0.0
            inet6 addr: ::1/128 Scope:Host
            UP LOOPBACK RUNNING  MTU:16436  Metric:1
            RX packets:244169216 errors:0 dropped:0 overruns:0 frame:0
            TX packets:244169216 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:0
            RX bytes:1190976360356 (1.0 TiB)  TX bytes:1190976360356 (1.0 TiB)

  [root@vixen ec2]#

Please see the mail posting that follows this, my reply to Ashley,
who nailed the problem precisely.

Regards,

Tena


On 2/14/11 1:35 PM, "kevin.buck...@ecs.vuw.ac.nz"
<kevin.buck...@ecs.vuw.ac.nz> wrote:

>
> This probably shows my lack of understanding as to how OpenMPI
> negotiates the connectivity between nodes when given a choice
> of interfaces but anyway:
>
> does dasher have any network interfaces that vixen does not?
>
> The scenario I am imagining would be that you ssh into dasher
> from vixen using a "network" that both share and similarly, when
> you mpirun from vixen, the network that OpenMPI uses is constrained
> by the interfaces that can be seen from vixen, so you are fine.
>
> However when you are on dasher, mpirun sees another interface which
> it takes a liking to and so tries to use that, but that interface
> is not available to vixen so the OpenMPI processes spawned there
> terminate when they can't find that interface so as to talk back
> to dasher's controlling process.
>
> I know that you are no longer working with VMs but it's along those
> lines that I was thinking: extra network interfaces that you assume
> won't be used but which are and which could then be overcome by use
> of an explicit
>
>   --mca btl_tcp_if_exclude virbr0
>
> or some such construction (virbr0 used as an example here).
>
> Kevin
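
For anyone reading this thread later: a quick way to see which interfaces the two hosts actually have in common is to compare the subnets from the ifconfig output above. The following is only a rough sketch using Python's standard ipaddress module. The dictionaries and the shared_networks helper are my own illustration (nothing from OpenMPI), with the addresses and netmasks copied from the listings in this message.

```python
import ipaddress

# Addresses and netmasks copied from the ifconfig output in this thread.
dasher = {"eth0": "172.16.0.116/255.255.252.0"}
vixen = {
    "eth0": "10.1.1.2/255.0.0.0",
    "eth1": "172.16.1.107/255.255.252.0",
}


def shared_networks(a, b):
    """Return (iface_a, iface_b, network) for interface pairs on the same subnet.

    Hypothetical helper for illustration; it only compares network prefixes
    and knows nothing about routing or firewalls.
    """
    pairs = []
    for name_a, addr_a in a.items():
        net_a = ipaddress.ip_interface(addr_a).network
        for name_b, addr_b in b.items():
            net_b = ipaddress.ip_interface(addr_b).network
            if net_a == net_b:
                pairs.append((name_a, name_b, str(net_a)))
    return pairs


print(shared_networks(dasher, vixen))
# prints [('eth0', 'eth1', '172.16.0.0/22')]
```

So dasher's eth0 and vixen's eth1 sit on the same 172.16.0.0/22 network, while vixen's eth0 (10.1.1.2) has no counterpart on dasher. That appears to be the kind of extra interface Kevin's --mca btl_tcp_if_exclude suggestion is aimed at, although whether it is really the cause here is for the follow-up posting to say.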