On Oct 5, 2008, at 1:22 PM, Lenny Verkhovsky wrote:

You should probably use -mca btl tcp,self -mca btl_openib_if_include ib0.8109


Really? I thought we only took OpenFabrics device names in the openib_if_include MCA param...? It looks like ib0.8109 is an IPoIB device name.


Lenny.


On 10/3/08, Matt Burgess <burgess.m...@gmail.com> wrote:
Hi,


I'm trying to get Open MPI working over openib partitions. On this cluster, the partition number is 0x109. The IB interfaces are pingable over the appropriate ib0.8109 interface:

d2:/opt/openmpi-ib # ifconfig ib0.8109
ib0.8109 Link encap:UNSPEC HWaddr 80-00-00-4A-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.21.48.2  Bcast:10.21.255.255  Mask:255.255.0.0
          inet6 addr: fe80::202:c902:26:ca01/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:16811 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15848 errors:0 dropped:1 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:102229428 (97.4 Mb)  TX bytes:102324172 (97.5 Mb)
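
For reference, the partition number 0x109 and the 0x8109 seen in the interface name and in the P_Key value differ only by the InfiniBand full-membership bit (0x8000). The sketch below shows that arithmetic; the function names full_pkey and ipoib_child_name are made up for illustration, and the interface naming reflects how Linux IPoIB child interfaces are usually named, so treat it as an assumption for other stacks.

```shell
# Sketch: derive the full-membership P_Key and the IPoIB child interface
# name from a base partition number. Function names are hypothetical.

full_pkey() {
    # $1 = base partition number, e.g. 0x109
    # Set the full-membership bit (0x8000) on the partition number.
    printf '0x%04x\n' $(( $1 | 0x8000 ))
}

ipoib_child_name() {
    # $1 = parent interface, e.g. ib0
    # $2 = base partition number, e.g. 0x109
    printf '%s.%04x\n' "$1" $(( $2 | 0x8000 ))
}

full_pkey 0x109              # prints 0x8109
ipoib_child_name ib0 0x109   # prints ib0.8109
```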


I have tried the following:

/opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile -mca btl openib,self -mca btl_openib_max_btls 1 -mca btl_openib_ib_pkey_val 0x8109 -mca btl_openib_ib_pkey_ix 1 /cluster/pallas/x86_64-ib/IMB-MPI1

but I just get a RETRY EXCEEDED ERROR. Is there an MCA parameter I am missing?
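
One thing that may be worth checking with a command like the one above is whether the index given to btl_openib_ib_pkey_ix really is the slot in the port's P_Key table that holds 0x8109. On Linux the table is normally exposed under /sys/class/infiniband/<device>/ports/<port>/pkeys/, one file per index. The helper below is only a sketch: find_pkey_index is a hypothetical name, and the device and port in the usage line are placeholders, not values taken from this cluster.

```shell
# Sketch: print the index of a P_Key within a P_Key table directory.
# find_pkey_index is a hypothetical helper, not part of Open MPI or OFED.
#   $1 = P_Key value to look for, e.g. 0x8109
#   $2 = directory with one file per table index, e.g.
#        /sys/class/infiniband/<device>/ports/<port>/pkeys
find_pkey_index() {
    want=$(printf '0x%04x' $(( $1 )))
    for f in "$2"/*; do
        [ -r "$f" ] || continue          # skip unreadable or missing entries
        val=$(cat "$f")
        [ -n "$val" ] || continue        # skip empty entries
        have=$(printf '0x%04x' $(( val )))
        if [ "$have" = "$want" ]; then
            basename "$f"                # the file name is the table index
            return 0
        fi
    done
    return 1                             # P_Key not present in this table
}
```

Usage would look like: find_pkey_index 0x8109 /sys/class/infiniband/<device>/ports/<port>/pkeys, with <device> and <port> replaced by the names reported on the node. A non-zero exit status means the value was not found in that table.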

I was successful using tcp only:

/opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile -mca btl tcp,self -mca btl_openib_max_btls 1 -mca btl_openib_ib_pkey_val 0x8109 /cluster/pallas/x86_64-ib/IMB-MPI1



Thanks,
Matt Burgess

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
Cisco Systems
