Yes, somehow I'm not seeing all the output that I expect to see.  Can you make 
sure that, if you're copy-and-pasting from the email, you're actually using 
"dash dash" in front of "mca" and "machinefile" (rather than a copy-and-pasted 
"em dash")?

--
Jeff Squyres
jsquy...@cisco.com
________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Gilles Gouaillardet 
via users <users@lists.open-mpi.org>
Sent: Sunday, November 13, 2022 9:18 PM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
Subject: Re: [OMPI users] [OMPI devel] There are not enough slots available in 
the system to satisfy the 2, slots that were requested by the application

There is a typo in your command line.
You should use --mca (minus minus) instead of -mca

Also, you can try --machinefile instead of -machinefile
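
For reference, the corrected invocation with ASCII double dashes would look like the following (the hostfile name "hosts" is taken from the earlier messages; mpirun itself is not executed here, the snippet only sanity-checks the command string):

```shell
# Corrected command with ASCII "--" before mca and machinefile:
cmd='mpirun -n 2 --machinefile hosts --mca rmaps_base_verbose 100 --mca ras_base_verbose 100 hostname'

# Sanity check: the command should contain only printable ASCII characters.
if printf '%s' "$cmd" | LC_ALL=C grep -q '[^ -~]'; then
    echo "non-ASCII character found: retype the dashes by hand"
else
    echo "ok: ASCII only"
fi
```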

Cheers,

Gilles

There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:

  –mca

On Mon, Nov 14, 2022 at 11:04 AM timesir via users 
<users@lists.open-mpi.org> wrote:

(py3.9) ➜  /share  mpirun -n 2 -machinefile hosts –mca rmaps_base_verbose 100 
--mca ras_base_verbose 100  which mpirun
[computer01:04570] mca: base: component_find: searching NULL for ras components
[computer01:04570] mca: base: find_dyn_components: checking NULL for ras 
components
[computer01:04570] pmix:mca: base: components_register: registering framework 
ras components
[computer01:04570] pmix:mca: base: components_register: found loaded component 
simulator
[computer01:04570] pmix:mca: base: components_register: component simulator 
register function successful
[computer01:04570] pmix:mca: base: components_register: found loaded component 
pbs
[computer01:04570] pmix:mca: base: components_register: component pbs register 
function successful
[computer01:04570] pmix:mca: base: components_register: found loaded component 
slurm
[computer01:04570] pmix:mca: base: components_register: component slurm 
register function successful
[computer01:04570] mca: base: components_open: opening ras components
[computer01:04570] mca: base: components_open: found loaded component simulator
[computer01:04570] mca: base: components_open: found loaded component pbs
[computer01:04570] mca: base: components_open: component pbs open function 
successful
[computer01:04570] mca: base: components_open: found loaded component slurm
[computer01:04570] mca: base: components_open: component slurm open function 
successful
[computer01:04570] mca:base:select: Auto-selecting ras components
[computer01:04570] mca:base:select:(  ras) Querying component [simulator]
[computer01:04570] mca:base:select:(  ras) Querying component [pbs]
[computer01:04570] mca:base:select:(  ras) Querying component [slurm]
[computer01:04570] mca:base:select:(  ras) No component selected!

======================   ALLOCATED NODES   ======================
    computer01: slots=1 max_slots=0 slots_inuse=0 state=UP
        Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
        aliases: 192.168.180.48
    192.168.60.203: slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
        Flags: SLOTS_GIVEN
        aliases: NONE
=================================================================

======================   ALLOCATED NODES   ======================
    computer01: slots=1 max_slots=0 slots_inuse=0 state=UP
        Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
        aliases: 192.168.180.48
    hepslustretest03: slots=1 max_slots=0 slots_inuse=0 state=UP
        Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
        aliases: 192.168.60.203,172.17.180.203,172.168.10.23,172.168.10.143
=================================================================
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:

  –mca

Either request fewer procs for your application, or make more slots
available for use.

A "slot" is the PRRTE term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which PRRTE processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, PRRTE defaults to the number of processor cores

In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------



On 2022/11/13 23:42, Jeff Squyres (jsquyres) wrote:
Interesting.  It says:

[computer01:106117] AVAILABLE NODES FOR MAPPING:
[computer01:106117] node: computer01 daemon: 0 slots_available: 1

This is why it tells you you're out of slots: you're asking for 2, but it only 
found 1.  This means it's not seeing your hostfile somehow.

I should have asked you to run with 2 variables last time -- can you re-run 
with "mpirun --mca rmaps_base_verbose 100 --mca ras_base_verbose 100 ..."?

Turning on the RAS verbosity should show us what the hostfile component is 
doing.

--
Jeff Squyres
jsquy...@cisco.com
________________________________
From: 龙龙 <mrlong...@gmail.com>
Sent: Sunday, November 13, 2022 3:13 AM
To: Jeff Squyres (jsquyres) <jsquy...@cisco.com>; 
Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI devel] There are not enough slots available in the system to 
satisfy the 2, slots that were requested by the application


(py3.9) ➜ /share mpirun –version

mpirun (Open MPI) 5.0.0rc9

Report bugs to https://www.open-mpi.org/community/help/

(py3.9) ➜ /share cat hosts

192.168.180.48 slots=1
192.168.60.203 slots=1

(py3.9) ➜ /share mpirun -n 2 -machinefile hosts –mca rmaps_base_verbose 100 
which mpirun

[computer01:106117] mca: base: component_find: searching NULL for rmaps 
components
[computer01:106117] mca: base: find_dyn_components: checking NULL for rmaps 
components
[computer01:106117] pmix:mca: base: components_register: registering framework 
rmaps components
[computer01:106117] pmix:mca: base: components_register: found loaded component 
ppr
[computer01:106117] pmix:mca: base: components_register: component ppr register 
function successful
[computer01:106117] pmix:mca: base: components_register: found loaded component 
rank_file
[computer01:106117] pmix:mca: base: components_register: component rank_file 
has no register or open function
[computer01:106117] pmix:mca: base: components_register: found loaded component 
round_robin
[computer01:106117] pmix:mca: base: components_register: component round_robin 
register function successful
[computer01:106117] pmix:mca: base: components_register: found loaded component 
seq
[computer01:106117] pmix:mca: base: components_register: component seq register 
function successful
[computer01:106117] mca: base: components_open: opening rmaps components
[computer01:106117] mca: base: components_open: found loaded component ppr
[computer01:106117] mca: base: components_open: component ppr open function 
successful
[computer01:106117] mca: base: components_open: found loaded component rank_file
[computer01:106117] mca: base: components_open: found loaded component 
round_robin
[computer01:106117] mca: base: components_open: component round_robin open 
function successful
[computer01:106117] mca: base: components_open: found loaded component seq
[computer01:106117] mca: base: components_open: component seq open function 
successful
[computer01:106117] mca:rmaps:select: checking available component ppr
[computer01:106117] mca:rmaps:select: Querying component [ppr]
[computer01:106117] mca:rmaps:select: checking available component rank_file
[computer01:106117] mca:rmaps:select: Querying component [rank_file]
[computer01:106117] mca:rmaps:select: checking available component round_robin
[computer01:106117] mca:rmaps:select: Querying component [round_robin]
[computer01:106117] mca:rmaps:select: checking available component seq
[computer01:106117] mca:rmaps:select: Querying component [seq]
[computer01:106117] [prterun-computer01-106117@0,0]: Final mapper priorities
[computer01:106117] Mapper: ppr Priority: 90
[computer01:106117] Mapper: seq Priority: 60
[computer01:106117] Mapper: round_robin Priority: 10
[computer01:106117] Mapper: rank_file Priority: 0
[computer01:106117] mca:rmaps: mapping job prterun-computer01-106117@1

[computer01:106117] mca:rmaps: setting mapping policies for job 
prterun-computer01-106117@1 inherit TRUE hwtcpus FALSE
[computer01:106117] mca:rmaps[358] mapping not given - using bycore
[computer01:106117] setdefaultbinding[365] binding not given - using bycore
[computer01:106117] mca:rmaps:ppr: job prterun-computer01-106117@1 not using 
ppr mapper PPR NULL policy PPR NOTSET
[computer01:106117] mca:rmaps:seq: job prterun-computer01-106117@1 not using 
seq mapper
[computer01:106117] mca:rmaps:rr: mapping job prterun-computer01-106117@1
[computer01:106117] AVAILABLE NODES FOR MAPPING:
[computer01:106117] node: computer01 daemon: 0 slots_available: 1
[computer01:106117] mca:rmaps:rr: mapping by Core for job 
prterun-computer01-106117@1 slots 1 num_procs 2

________________________________

There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:

which

Either request fewer procs for your application, or make more slots
available for use.

A “slot” is the PRRTE term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which PRRTE processes are run:

  1.  Hostfile, via “slots=N” clauses (N defaults to number of
processor cores if not provided)
  2.  The –host command line parameter, via a “:N” suffix on the
hostname (N defaults to 1 if not provided)
  3.  Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4.  If none of a hostfile, the –host command line parameter, or an
RM is present, PRRTE defaults to the number of processor cores

In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
–use-hwthread-cpus option.

Alternatively, you can use the –map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.

________________________________
On 2022/11/8 05:46, Jeff Squyres (jsquyres) wrote:
In the future, can you please just mail one of the lists?  This particular 
question is probably more of a users type of question (since we're not talking 
about the internals of Open MPI itself), so I'll reply just on the users list.

For what it's worth, I'm unable to replicate your error:


$ mpirun --version

mpirun (Open MPI) 5.0.0rc9


Report bugs to https://www.open-mpi.org/community/help/

$ cat hostfile

mpi002 slots=1

mpi005 slots=1

$ mpirun -n 2 --machinefile hostfile hostname

mpi002

mpi005

Can you try running with "--mca rmaps_base_verbose 100" so that we can get some 
debugging output and see why the slots aren't working for you?  Show the full 
output, like I did above (e.g., cat the hostfile, and then mpirun with the MCA 
param and all the output).  Thanks!

--
Jeff Squyres
jsquy...@cisco.com
________________________________
From: devel <devel-boun...@lists.open-mpi.org> on behalf of mrlong via devel 
<de...@lists.open-mpi.org>
Sent: Monday, November 7, 2022 3:37 AM
To: de...@lists.open-mpi.org <de...@lists.open-mpi.org>; Open MPI Users 
<users@lists.open-mpi.org>
Cc: mrlong <mrlong...@gmail.com>
Subject: [OMPI devel] There are not enough slots available in the system to 
satisfy the 2, slots that were requested by the application


Two machines, each with 64 cores. The contents of the hosts file are:

192.168.180.48 slots=1
192.168.60.203 slots=1

Why do I get the following error when running with Open MPI 5.0.0rc9?

(py3.9) [user@machine01 share]$ mpirun -n 2 --machinefile 
hosts hostname
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:

  hostname

Either request fewer procs for your application, or make more slots
available for use.

A "slot" is the PRRTE term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which PRRTE processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, PRRTE defaults to the number of processor cores

In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.
