Hi
I've been fighting to run comparative IMB tests using OpenMPI 1.6.3
on the same node, using an Intel TrueScale card and the onboard
Ethernet.
It turns out that all of the problems were due to the IPv6 addresses
being firewalled on the nodes, while OpenMPI was trying to use the IPv6
addresses of the nodes in spite of my explicitly specifying the IPv4
interface and subnet, as in the following example:
mpirun --mca btl ^openib --mca mtl ^psm --mca btl_tcp_if_include eth0
--mca btl_tcp_if_include 10.141.0.0/16 --mca btl_base_verbose 30 -np 2
--hostfile ./hostfile ./IMB-MPI1 pingpong
. . .
[node041:16301] select: initializing btl component tcp
[node041:16301] btl: tcp: Searching for include address+prefix:
10.141.0.0 / 16
[node041:16301] btl: tcp: Found match: 10.141.0.41 (eth0)
[node041:16301] select: init of component tcp returned success
[node041:16301] btl: tcp: attempting to connect() to address
2002:bccb:3a13:141:225:90ff:fe58:5986 on port 4
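To make the mismatch in the output above easy to spot in a longer log,
here is a minimal Python sketch that pulls the connect() targets out of
the verbose output and reports any that fall outside the include
prefix. It is only a log-checking aid, not how OpenMPI itself selects
addresses; the function names are hypothetical and the regular
expression is an assumption modelled on the messages quoted above.

```python
import ipaddress
import re

# Assumed message format, taken from the btl_base_verbose output quoted above.
CONNECT_RE = re.compile(r"attempting to connect\(\) to address\s+(\S+)")

# The two relevant messages from the post, wrapped as they were in the mail.
SAMPLE_LOG = """\
[node041:16301] btl: tcp: Found match: 10.141.0.41 (eth0)
[node041:16301] btl: tcp: attempting to connect() to address
2002:bccb:3a13:141:225:90ff:fe58:5986 on port 4
"""


def connect_targets(log_text):
    """Return the addresses named in 'attempting to connect()' messages."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [ipaddress.ip_address(a) for a in CONNECT_RE.findall(flat)]


def outside_include(log_text, include_cidr):
    """Return connect() targets that are not inside include_cidr."""
    net = ipaddress.ip_network(include_cidr)
    # Membership is always False when the IP versions differ, so any
    # IPv6 target is reported against an IPv4 include prefix.
    return [a for a in connect_targets(log_text) if a not in net]


if __name__ == "__main__":
    for addr in outside_include(SAMPLE_LOG, "10.141.0.0/16"):
        print(f"outside include prefix: {addr} (IPv{addr.version})")
```

Run against the full verbose output, this should list every peer
address the tcp component tried that was not covered by the
btl_tcp_if_include value.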
When I tried to exclude the IPv6 addresses as well, I was told that
--mca btl_tcp_if_include and --mca btl_tcp_if_exclude are mutually
exclusive, so I assume that this is not the expected behaviour.
I also cannot find a command line switch in the documentation to disable
IPv6 or IPv4.
To fix this I first manually deleted the IPv6 addresses on the two
nodes, and it worked as expected. I then re-enabled them and
unfirewalled the v6 addresses, and it also worked correctly using those
(in spite of my specifying the IPv4 address explicitly).
This is all running on Scientific Linux release 6.3.
I haven't tried to reproduce this on a node without a TrueScale card in
it, but I do not see why this would make any difference to the tcp
component.
Thanks
Antony