Hi

When I use a rankfile, I get an error message that I don't understand:

[jody@plankton tests]$ mpirun -np 3 -rf rankfile -hostfile testhosts ./HelloMPI
--------------------------------------------------------------------------
Rankfile claimed host plankton that was not allocated or
oversubscribed it's slots:

--------------------------------------------------------------------------
[plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in
file rmaps_rank_file.c at line 108
[plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in
file base/rmaps_base_map_job.c at line 87
[plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in
file base/plm_base_launch_support.c at line 77
[plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in
file plm_rsh_module.c at line 990
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished



Without the '-rf rankfile' option, everything works as expected.

My hostfile:
[jody@plankton tests]$ cat testhosts
# The following node is a quad-processor machine, and we absolutely
# want to disallow over-subscribing it:
plankton slots=3  max-slots=3
# The following nodes are dual-processor machines:
nano_00  slots=2 max-slots=2
nano_01  slots=2 max-slots=2
nano_02  slots=2 max-slots=2
nano_03  slots=2 max-slots=2
nano_04  slots=2 max-slots=2
nano_05  slots=2 max-slots=2
nano_06  slots=2 max-slots=2

My rankfile:
[jody@plankton neander]$ cat rankfile
rank  0=nano_00  slot=1
rank  1=plankton slot=0
rank  2=nano_01  slot=1

My Open MPI version: 1.3.2

I get the same error if I use a rankfile that contains only the single line
  rank  0=plankton  slot=0
(plankton is my local machine) and call mpirun with -np 1.
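
A rough way to cross-check the two files above is sketched below in Python. It only tests the two conditions the error message names: every host in the rankfile must appear in the hostfile, and no host may be given more ranks than its max-slots (or slots) value. This is an assumption about what the message means, not Open MPI's actual mapping logic, and the function names are made up for illustration. The slot= value is deliberately ignored, since it names a processor on the host rather than a hostfile slot count.

```python
import re
from collections import Counter


def parse_hostfile(text):
    """Return {hostname: slot limit}, skipping comments and blank lines."""
    hosts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        opts = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        # Prefer max-slots, fall back to slots, then to 1 (an assumption).
        hosts[fields[0]] = int(opts.get("max-slots", opts.get("slots", "1")))
    return hosts


def parse_rankfile(text):
    """Return (rank, hostname) pairs from 'rank N=host slot=S' lines."""
    pairs = []
    for line in text.splitlines():
        m = re.match(r"\s*rank\s+(\d+)\s*=\s*(\S+)\s+slot=\S+", line)
        if m:
            pairs.append((int(m.group(1)), m.group(2)))
    return pairs


def check_rankfile(hostfile_text, rankfile_text):
    """Return a list of human-readable problems; empty if none are found."""
    hosts = parse_hostfile(hostfile_text)
    per_host = Counter(host for _, host in parse_rankfile(rankfile_text))
    problems = []
    for host, count in per_host.items():
        if host not in hosts:
            problems.append(f"{host}: not listed in hostfile")
        elif count > hosts[host]:
            problems.append(f"{host}: {count} ranks requested, limit is {hosts[host]}")
    return problems


if __name__ == "__main__":
    # File names as used in the mpirun command above.
    with open("testhosts") as hf, open("rankfile") as rf:
        for problem in check_rankfile(hf.read(), rf.read()):
            print(problem)
```

By this check, the hostfile and rankfile shown above are consistent: each of the three hosts is listed and receives one rank, well under its limit.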

What does the "Rankfile claimed..." message mean?
Did I make an error in my rankfile?
If yes, what would be the correct way to write it?

Thank you
  Jody
