Hi

I was able to use the following rankfile successfully on Linux with
openmpi-1.6.4rc3r27923, but it doesn't work with a patched
openmpi-1.6.4rc4r28022 (patch.diff from Eugene). Perhaps this
information helps track down the error.

tyr rankfiles 114 cat rf_ex_linpc 
# mpiexec -report-bindings -rf rf_ex_linpc hostname
rank 0=linpc0 slot=0:0-1,1:0-1
rank 1=linpc1 slot=0:0-1
rank 2=linpc1 slot=1:0
rank 3=linpc1 slot=1:1
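To make the slot notation above concrete, here is a small sketch that expands each "socket:core-range" slot list into (socket, core) pairs and draws them the way -report-bindings does. It assumes two sockets with two cores each per node, which is what the rc3 binding output shows. It only handles the comma-separated "socket:core" and "socket:lo-hi" forms used in this rankfile. The helper names are hypothetical and are not part of Open MPI.

```python
# Hypothetical helpers (not Open MPI code): expand rankfile slot lists and
# render them like -report-bindings, assuming 2 sockets x 2 cores per node.

def parse_rankfile_line(line):
    """Split 'rank 0=linpc0 slot=0:0-1,1:0-1' into (0, 'linpc0', '0:0-1,1:0-1')."""
    left, slot_list = line.split(" slot=")
    rank_part, host = left.split("=")
    rank = int(rank_part.split()[1])
    return rank, host, slot_list

def expand_slots(slot_list):
    """Turn '0:0-1,1:0' into [(0, 0), (0, 1), (1, 0)] as (socket, core) pairs."""
    pairs = []
    for item in slot_list.split(","):
        socket, cores = item.split(":")
        if "-" in cores:
            lo, hi = cores.split("-")
            core_range = range(int(lo), int(hi) + 1)
        else:
            core_range = [int(cores)]
        pairs.extend((int(socket), core) for core in core_range)
    return pairs

def render(pairs, sockets=2, cores_per_socket=2):
    """Draw bound cores as 'B' and free cores as '.', one [..] group per socket."""
    bound = set(pairs)
    return "".join(
        "[" + " ".join("B" if (s, c) in bound else "." for c in range(cores_per_socket)) + "]"
        for s in range(sockets)
    )

RANKFILE = """\
# mpiexec -report-bindings -rf rf_ex_linpc hostname
rank 0=linpc0 slot=0:0-1,1:0-1
rank 1=linpc1 slot=0:0-1
rank 2=linpc1 slot=1:0
rank 3=linpc1 slot=1:1
"""

if __name__ == "__main__":
    for line in RANKFILE.splitlines():
        if line.startswith("rank"):  # skip the '#' comment line
            rank, host, slots = parse_rankfile_line(line)
            print(rank, host, render(expand_slots(slots)))
```

Run against the rankfile above, this prints the same binding pictures that rc3 reports for ranks 0 to 3 (for example "[B B][B B]" for rank 0 on linpc0).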


linpc1 rankfiles 99 mpiexec -report-bindings -rf rf_ex_linpc hostname
------------------------------------------------------------------------
The rankfile that was used claimed that a host was either not
allocated or oversubscribed its slots.  Please review your rank-slot
assignments and your host allocation to ensure a proper match.  Also,
some systems may require using full hostnames, such as
"host1.example.com" (instead of just plain "host1").

  Host: linpc0
------------------------------------------------------------------------

linpc1 rankfiles 100 ompi_info | grep "MPI:"
                Open MPI: 1.6.4rc4r28022
linpc1 rankfiles 101 exit



tyr rankfiles 110 ssh linpc1
linpc1 fd1026 96 cd .../prog/mpi/rankfiles/
linpc1 rankfiles 97 mpiexec -report-bindings -rf rf_ex_linpc hostname
[linpc1:21351] MCW rank 1 bound to socket 0[core 0-1]:
  [B B][. .] (slot list 0:0-1)
[linpc1:21351] MCW rank 2 bound to socket 1[core 0]:
  [. .][B .] (slot list 1:0)
[linpc1:21351] MCW rank 3 bound to socket 1[core 1]:
  [. .][. B] (slot list 1:1)
[linpc0:08012] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]:
  [B B][B B] (slot list 0:0-1,1:0-1)

linpc1 rankfiles 98 ompi_info | grep "MPI:"
                Open MPI: 1.6.4rc3r27923
linpc1 rankfiles 99 


I will build an unpatched openmpi-1.6.4rc4 and check whether the
above rankfile works. Unfortunately, I can only check tomorrow,
because new packages are mirrored to all machines overnight, so the
new build won't be available on both machines today. I will let you
know the result.


Kind regards

Siegmar
