Hi

I've been struggling to run a comparative test of IMB (the Intel MPI Benchmarks) using Open MPI 1.6.3 on the same nodes, comparing an Intel TrueScale card against the onboard Ethernet.

It turns out that all of the problems were due to the IPv6 addresses being firewalled on the nodes: Open MPI was trying to use the nodes' IPv6 addresses despite my explicitly specifying the IPv4 address, as in the following example:

mpirun --mca btl ^openib --mca mtl ^psm --mca btl_tcp_if_include eth0 --mca btl_tcp_if_include 10.141.0.0/16 --mca btl_base_verbose 30 -np 2 --hostfile ./hostfile ./IMB-MPI1 pingpong
. . .
[node041:16301] select: initializing btl component tcp
[node041:16301] btl: tcp: Searching for include address+prefix: 10.141.0.0 / 16
[node041:16301] btl: tcp: Found match: 10.141.0.41 (eth0)
[node041:16301] select: init of component tcp returned success
[node041:16301] btl: tcp: attempting to connect() to address 2002:bccb:3a13:141:225:90ff:fe58:5986 on port 4

When I tried to exclude the IPv6 addresses as well, I was told that --mca btl_tcp_if_include and --mca btl_tcp_if_exclude are mutually exclusive, so I assume that this is not the expected behaviour.

I also cannot find a command-line switch in the documentation to disable IPv6 or IPv4.
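In case it helps, the full set of parameters the TCP BTL accepts can be listed with ompi_info. Some Open MPI builds also expose a btl_tcp_disable_family parameter (6 to disable IPv6), but I have not verified that it exists in 1.6.3, so treat this as a guess:

```shell
# List every MCA parameter the TCP BTL understands; any knob for
# disabling an address family should show up here.
ompi_info --param btl tcp

# If btl_tcp_disable_family is available (unverified for 1.6.3),
# a value of 6 should keep the TCP BTL off IPv6 entirely:
mpirun --mca btl ^openib --mca mtl ^psm \
       --mca btl_tcp_disable_family 6 \
       -np 2 --hostfile ./hostfile ./IMB-MPI1 pingpong
```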

To fix this, I first manually deleted the IPv6 addresses on the two nodes, and it then worked as expected. I then re-enabled them and unfirewalled the v6 addresses, and it also worked correctly using those (in spite of my specifying the IPv4 address explicitly).
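For anyone wanting to reproduce the workaround, deleting and restoring the address with iproute2 looks roughly like this (the address and interface are taken from my logs above; the /64 prefix length is an assumption, so check `ip -6 addr show` first):

```shell
# Remove the global IPv6 address from eth0 (run as root on each node;
# the /64 prefix is assumed, verify with "ip -6 addr show eth0"):
ip -6 addr del 2002:bccb:3a13:141:225:90ff:fe58:5986/64 dev eth0

# ...run the benchmark...

# Restore the address afterwards:
ip -6 addr add 2002:bccb:3a13:141:225:90ff:fe58:5986/64 dev eth0
```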

This is all running on Scientific Linux release 6.3.

I haven't tried to reproduce this on a node without a TrueScale card in it, but I do not see why that would make any difference to the tcp component.

Thanks

Antony
