I'm running Intel's IMB benchmark on an InfiniBand cluster, though other 
benchmarks that Open MPI has handled fine in the past are also performing 
poorly.

The cluster has DDR IB, and the fabric isn't seeing the kind of symbol errors 
that indicate a bad fabric; (non-mpi) bandwidth tests over the IB fabric are 
in the expected range.

When the number of processes in IMB becomes greater than one node can handle, 
the bandwidth reported by IMB's 'Sendrecv' and 'Exchange' tests drops from 
1.9 GB/sec (4 processes, or one process per core on the first node) to 20 
MB/sec with 8 processes (on two nodes).

In other words, when we move from using shared memory and 'self' to an actual 
network interface, IMB reports _really_ lousy performance, roughly 30x lower 
than what I've recorded for SDR IB.  (For the same test on a different cluster 
using SDR IB & Open MPI, I've clocked ~650 MB/sec, quite a bit higher than 20 
MB/sec.)

On this cluster, however, IMB's reported bandwidth remains the same from 2 to 
36 nodes over DDR InfiniBand:  ~20 MB/sec.

We've used the OFED 1.1.1 and 1.2 driver releases so far.

The command line is pretty simple:
mpirun -np 128 -machinefile <foo> -mca btl openib,sm,self ./IMB-MPI1

As far as I'm aware, our command-line excludes TCP/IP (and hence ethernet) 
from being used; yet we're seeing speeds that are far below the abilities of 
InfiniBand.

I've used Open MPI quite a bit, since before the 1.0 days; I've been dealing 
with IB for even longer.  (And the guy I'm writing on behalf of has used Open 
MPI on large IB systems as well.)

Even when we specify that only the 'openib' module be used, we are seeing 20 
MB/sec.

Oddly enough, the management ethernet is 10/100, and 20 MB/sec seems 'in the 
same ballpark' as would be reported by IMB when 10/100 ethernet is used.
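
As a rough sanity check on that ballpark, here is a minimal Python sketch of
the arithmetic.  It rests on assumptions that aren't measurements from this
cluster: that the IB links are 4x with 8b/10b encoding, and that IMB's
Sendrecv figure counts traffic in both directions.  The function and variable
names are purely illustrative.

```python
# Back-of-the-envelope peak rates, in MB/sec (1 MB = 10**6 bytes).
# Assumptions (not measured on this cluster): 4x IB links with 8b/10b
# encoding, and a Sendrecv figure that sums both directions.

def peak_mb_per_sec(signal_mbit, data_bits=1, coded_bits=1):
    """Payload rate of a link, given its signalling rate in Mbit/sec."""
    return signal_mbit * data_bits / coded_bits / 8

fast_ethernet = peak_mb_per_sec(100)        # 10/100 management network
sdr_4x = peak_mb_per_sec(10_000, 8, 10)     # 4x SDR InfiniBand (assumed width)
ddr_4x = peak_mb_per_sec(20_000, 8, 10)     # 4x DDR InfiniBand (assumed width)

observed = 20.0  # MB/sec reported by IMB once a second node is involved

# Full-duplex ceiling for Fast Ethernet if both directions are counted.
ethernet_sendrecv_ceiling = 2 * fast_ethernet

print(f"Fast Ethernet, one direction:  {fast_ethernet:.1f} MB/sec")
print(f"Fast Ethernet, both directions: {ethernet_sendrecv_ceiling:.1f} MB/sec")
print(f"4x SDR IB, one direction:      {sdr_4x:.0f} MB/sec")
print(f"4x DDR IB, one direction:      {ddr_4x:.0f} MB/sec")
print(f"Observed / Ethernet ceiling:   {observed / ethernet_sendrecv_ceiling:.0%}")
print(f"Observed / DDR one direction:  {observed / ddr_4x:.0%}")
```

If those assumptions hold, 20 MB/sec is a large fraction of what Fast Ethernet
could deliver in a Sendrecv test and a tiny fraction of the DDR rate, which is
why the coincidence bothers me.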

We aren't receiving any error messages from Open MPI (as you normally would 
when part of the fabric is down).

So we're left a bit stumped:  We're getting speeds you would expect from 100 
Mbit ethernet, but we're specifying the IB interface, and not receiving any 
errors from Open MPI.  There isn't an unusual number of symbol errors (i.e. 
errors are low, not increasing, etc.) on the IB fabric, and the SM is up and 
operational.

One more tidbit that is probably insignificant, but I'll mention anyway:  We 
are running IBM's GPFS via IPoIB, so there is a little bit of IB traffic from 
GPFS - which is also a configuration we've used with no problems in the past.

Any ideas on what I can do to verify that Open MPI is in fact using the IB 
fabric?
-- 
Troy Telford
