*(py3.9) ➜ /share mpirun -n 2 --machinefile hosts --mca
rmaps_base_verbose 100 --mca ras_base_verbose 100 which mpirun*
[computer01:39342] mca: base: component_find: searching NULL for ras
components
[computer01:39342] mca: base: find_dyn_components: checking NULL for
ras components
[computer01:39342] pmix:mca: base: components_register: registering
framework ras components
[computer01:39342] pmix:mca: base: components_register: found loaded
component simulator
[computer01:39342] pmix:mca: base: components_register: component
simulator register function successful
[computer01:39342] pmix:mca: base: components_register: found loaded
component pbs
[computer01:39342] pmix:mca: base: components_register: component pbs
register function successful
[computer01:39342] pmix:mca: base: components_register: found loaded
component slurm
[computer01:39342] pmix:mca: base: components_register: component
slurm register function successful
[computer01:39342] mca: base: components_open: opening ras components
[computer01:39342] mca: base: components_open: found loaded component
simulator
[computer01:39342] mca: base: components_open: found loaded component pbs
[computer01:39342] mca: base: components_open: component pbs open
function successful
[computer01:39342] mca: base: components_open: found loaded component
slurm
[computer01:39342] mca: base: components_open: component slurm open
function successful
[computer01:39342] mca:base:select: Auto-selecting ras components
[computer01:39342] mca:base:select:( ras) Querying component [simulator]
[computer01:39342] mca:base:select:( ras) Querying component [pbs]
[computer01:39342] mca:base:select:( ras) Querying component [slurm]
[computer01:39342] mca:base:select:( ras) No component selected!
[computer01:39342] mca: base: component_find: searching NULL for rmaps
components
[computer01:39342] mca: base: find_dyn_components: checking NULL for
rmaps components
[computer01:39342] pmix:mca: base: components_register: registering
framework rmaps components
[computer01:39342] pmix:mca: base: components_register: found loaded
component ppr
[computer01:39342] pmix:mca: base: components_register: component ppr
register function successful
[computer01:39342] pmix:mca: base: components_register: found loaded
component rank_file
[computer01:39342] pmix:mca: base: components_register: component
rank_file has no register or open function
[computer01:39342] pmix:mca: base: components_register: found loaded
component round_robin
[computer01:39342] pmix:mca: base: components_register: component
round_robin register function successful
[computer01:39342] pmix:mca: base: components_register: found loaded
component seq
[computer01:39342] pmix:mca: base: components_register: component seq
register function successful
[computer01:39342] mca: base: components_open: opening rmaps components
[computer01:39342] mca: base: components_open: found loaded component ppr
[computer01:39342] mca: base: components_open: component ppr open
function successful
[computer01:39342] mca: base: components_open: found loaded component
rank_file
[computer01:39342] mca: base: components_open: found loaded component
round_robin
[computer01:39342] mca: base: components_open: component round_robin
open function successful
[computer01:39342] mca: base: components_open: found loaded component
seq
[computer01:39342] mca: base: components_open: component seq open
function successful
[computer01:39342] mca:rmaps:select: checking available component ppr
[computer01:39342] mca:rmaps:select: Querying component [ppr]
[computer01:39342] mca:rmaps:select: checking available component
rank_file
[computer01:39342] mca:rmaps:select: Querying component [rank_file]
[computer01:39342] mca:rmaps:select: checking available component
round_robin
[computer01:39342] mca:rmaps:select: Querying component [round_robin]
[computer01:39342] mca:rmaps:select: checking available component seq
[computer01:39342] mca:rmaps:select: Querying component [seq]
[computer01:39342] [prterun-computer01-39342@0,0]: Final mapper priorities
[computer01:39342] Mapper: ppr Priority: 90
[computer01:39342] Mapper: seq Priority: 60
[computer01:39342] Mapper: round_robin Priority: 10
[computer01:39342] Mapper: rank_file Priority: 0
====================== ALLOCATED NODES ======================
computer01: slots=1 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
aliases: 192.168.180.48
192.168.60.203: slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
Flags: SLOTS_GIVEN
aliases: NONE
=================================================================
====================== ALLOCATED NODES ======================
computer01: slots=1 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
aliases: 192.168.180.48
hepslustretest03: slots=1 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
aliases:
192.168.60.203,hepslustretest03.ihep.ac.cn,172.17.180.203,172.168.10.23,172.168.10.143
=================================================================
[computer01:39342] mca:rmaps: mapping job prterun-computer01-39342@1
[computer01:39342] mca:rmaps: setting mapping policies for job
prterun-computer01-39342@1 inherit TRUE hwtcpus FALSE
[computer01:39342] mca:rmaps[358] mapping not given - using bycore
[computer01:39342] setdefaultbinding[365] binding not given - using bycore
[computer01:39342] mca:rmaps:ppr: job prterun-computer01-39342@1 not
using ppr mapper PPR NULL policy PPR NOTSET
[computer01:39342] mca:rmaps:seq: job prterun-computer01-39342@1 not
using seq mapper
[computer01:39342] mca:rmaps:rr: mapping job prterun-computer01-39342@1
[computer01:39342] AVAILABLE NODES FOR MAPPING:
[computer01:39342] node: computer01 daemon: 0 slots_available: 1
[computer01:39342] mca:rmaps:rr: mapping by Core for job
prterun-computer01-39342@1 slots 1 num_procs 2
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:
which
Either request fewer procs for your application, or make more slots
available for use.
A "slot" is the PRRTE term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which PRRTE processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, PRRTE defaults to the number of processor cores
In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --map-by :OVERSUBSCRIBE option to
ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
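[Editor's note: the slot accounting described above can be checked against a hostfile directly. A minimal sketch with hypothetical hostnames; only the "slots=N" fields matter here.]

```shell
# Hypothetical two-node hostfile; "mpirun -n N" needs N <= total slots
# unless --map-by :OVERSUBSCRIBE is given.
cat > /tmp/hosts <<'EOF'
node01 slots=2
node02 slots=2
EOF

# Sum the declared slots the same way PRRTE counts them.
total=$(awk -F'slots=' '/slots=/ {s += $2} END {print s}' /tmp/hosts)
echo "total slots: $total"   # prints "total slots: 4"
```

With this hostfile, "mpirun -n 4" fits exactly; "mpirun -n 5" would trigger the same "not enough slots" message quoted above.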
On 2022/11/15 02:04, users-requ...@lists.open-mpi.org wrote:
Send users mailing list submissions to
users@lists.open-mpi.org
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.open-mpi.org/mailman/listinfo/users
or, via email, send a message with subject or body 'help' to
users-requ...@lists.open-mpi.org
You can reach the person managing the list at
users-ow...@lists.open-mpi.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of users digest..."
Today's Topics:
1. Re: [OMPI devel] There are not enough slots available in the
system to satisfy the 2, slots that were requested by the
application (Jeff Squyres (jsquyres))
2. Re: Tracing of openmpi internal functions
(Jeff Squyres (jsquyres))
----------------------------------------------------------------------
Message: 1
Date: Mon, 14 Nov 2022 17:04:24 +0000
From: "Jeff Squyres (jsquyres)"<jsquy...@cisco.com>
To: Open MPI Users<users@lists.open-mpi.org>
Subject: Re: [OMPI users] [OMPI devel] There are not enough slots
available in the system to satisfy the 2, slots that were requested by
the application
Message-ID:
<bl0pr11mb29801261edb4fd0e9ef2f4ecc0...@bl0pr11mb2980.namprd11.prod.outlook.com>
Content-Type: text/plain; charset="utf-8"
Yes, somehow I'm not seeing all the output that I expect to see. Can you make
sure that, if you're copy-and-pasting from the email, you're actually using
"dash dash" in front of "mca" and "machinefile" (vs. a copy-and-pasted "em dash")?
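[Editor's note: a minimal sketch of the check Jeff describes; the pasted command string below is a hypothetical example.]

```python
# Scan a pasted command line for typographic dashes that the shell passes
# through as literal characters instead of treating as option prefixes.
SUSPECT = {
    "\u2013": "en dash",
    "\u2014": "em dash",
    "\u2212": "minus sign",
}

def find_bad_dashes(cmd: str):
    """Return (index, name) pairs for every typographic dash in cmd."""
    return [(i, SUSPECT[ch]) for i, ch in enumerate(cmd) if ch in SUSPECT]

# An em dash pasted in place of "--" makes mpirun treat "—mca" as the
# application name, producing the "not enough slots" error above.
pasted = "mpirun -n 2 \u2014mca rmaps_base_verbose 100 which mpirun"
print(find_bad_dashes(pasted))  # → [(12, 'em dash')]
```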
--
Jeff Squyres
jsquy...@cisco.com
________________________________
From: users<users-boun...@lists.open-mpi.org> on behalf of Gilles Gouaillardet via
users<users@lists.open-mpi.org>
Sent: Sunday, November 13, 2022 9:18 PM
To: Open MPI Users<users@lists.open-mpi.org>
Cc: Gilles Gouaillardet<gilles.gouaillar...@gmail.com>
Subject: Re: [OMPI users] [OMPI devel] There are not enough slots available in
the system to satisfy the 2, slots that were requested by the application
There is a typo in your command line.
You should use --mca (minus minus) instead of -mca
Also, you can try --machinefile instead of -machinefile
Cheers,
Gilles
There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:
—mca
On Mon, Nov 14, 2022 at 11:04 AM timesir via users
<users@lists.open-mpi.org> wrote:
(py3.9) ➜ /share mpirun -n 2 -machinefile hosts —mca rmaps_base_verbose 100
--mca ras_base_verbose 100 which mpirun
[computer01:04570] mca: base: component_find: searching NULL for ras components
[computer01:04570] mca: base: find_dyn_components: checking NULL for ras
components
[computer01:04570] pmix:mca: base: components_register: registering framework
ras components
[computer01:04570] pmix:mca: base: components_register: found loaded component
simulator
[computer01:04570] pmix:mca: base: components_register: component simulator
register function successful
[computer01:04570] pmix:mca: base: components_register: found loaded component
pbs
[computer01:04570] pmix:mca: base: components_register: component pbs register
function successful
[computer01:04570] pmix:mca: base: components_register: found loaded component
slurm
[computer01:04570] pmix:mca: base: components_register: component slurm
register function successful
[computer01:04570] mca: base: components_open: opening ras components
[computer01:04570] mca: base: components_open: found loaded component simulator
[computer01:04570] mca: base: components_open: found loaded component pbs
[computer01:04570] mca: base: components_open: component pbs open function
successful
[computer01:04570] mca: base: components_open: found loaded component slurm
[computer01:04570] mca: base: components_open: component slurm open function
successful
[computer01:04570] mca:base:select: Auto-selecting ras components
[computer01:04570] mca:base:select:( ras) Querying component [simulator]
[computer01:04570] mca:base:select:( ras) Querying component [pbs]
[computer01:04570] mca:base:select:( ras) Querying component [slurm]
[computer01:04570] mca:base:select:( ras) No component selected!
====================== ALLOCATED NODES ======================
computer01: slots=1 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
aliases: 192.168.180.48
192.168.60.203: slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
Flags: SLOTS_GIVEN
aliases: NONE
=================================================================
====================== ALLOCATED NODES ======================
computer01: slots=1 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
aliases: 192.168.180.48
hepslustretest03: slots=1 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
aliases: 192.168.60.203,172.17.180.203,172.168.10.23,172.168.10.143
=================================================================
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:
—mca
Either request fewer procs for your application, or make more slots
available for use.
A "slot" is the PRRTE term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which PRRTE processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, PRRTE defaults to the number of processor cores
In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
On 2022/11/13 23:42, Jeff Squyres (jsquyres) wrote:
Interesting. It says:
[computer01:106117] AVAILABLE NODES FOR MAPPING:
[computer01:106117] node: computer01 daemon: 0 slots_available: 1
This is why it tells you you're out of slots: you're asking for 2, but it only
found 1. This means it's not seeing your hostfile somehow.
I should have asked you to run with 2 variables last time -- can you re-run with
"mpirun --mca rmaps_base_verbose 100 --mca ras_base_verbose 100 ..."?
Turning on the RAS verbosity should show us what the hostfile component is
doing.
--
Jeff Squyres
jsquy...@cisco.com
________________________________
From: mrlong<mrlong...@gmail.com>
Sent: Sunday, November 13, 2022 3:13 AM
To: Jeff Squyres (jsquyres)<jsquy...@cisco.com>; Open MPI
Users<users@lists.open-mpi.org>
Subject: Re: [OMPI devel] There are not enough slots available in the system to
satisfy the 2, slots that were requested by the application
(py3.9) ➜ /share mpirun --version
mpirun (Open MPI) 5.0.0rc9
Report bugs to https://www.open-mpi.org/community/help/
(py3.9) ? /share cat hosts
192.168.180.48 slots=1
192.168.60.203 slots=1
(py3.9) ➜ /share mpirun -n 2 -machinefile hosts —mca rmaps_base_verbose 100
which mpirun
[computer01:106117] mca: base: component_find: searching NULL for rmaps
components
[computer01:106117] mca: base: find_dyn_components: checking NULL for rmaps
components
[computer01:106117] pmix:mca: base: components_register: registering framework
rmaps components
[computer01:106117] pmix:mca: base: components_register: found loaded component
ppr
[computer01:106117] pmix:mca: base: components_register: component ppr register
function successful
[computer01:106117] pmix:mca: base: components_register: found loaded component
rank_file
[computer01:106117] pmix:mca: base: components_register: component rank_file
has no register or open function
[computer01:106117] pmix:mca: base: components_register: found loaded component
round_robin
[computer01:106117] pmix:mca: base: components_register: component round_robin
register function successful
[computer01:106117] pmix:mca: base: components_register: found loaded component
seq
[computer01:106117] pmix:mca: base: components_register: component seq register
function successful
[computer01:106117] mca: base: components_open: opening rmaps components
[computer01:106117] mca: base: components_open: found loaded component ppr
[computer01:106117] mca: base: components_open: component ppr open function
successful
[computer01:106117] mca: base: components_open: found loaded component rank_file
[computer01:106117] mca: base: components_open: found loaded component
round_robin
[computer01:106117] mca: base: components_open: component round_robin open
function successful
[computer01:106117] mca: base: components_open: found loaded component seq
[computer01:106117] mca: base: components_open: component seq open function
successful
[computer01:106117] mca:rmaps:select: checking available component ppr
[computer01:106117] mca:rmaps:select: Querying component [ppr]
[computer01:106117] mca:rmaps:select: checking available component rank_file
[computer01:106117] mca:rmaps:select: Querying component [rank_file]
[computer01:106117] mca:rmaps:select: checking available component round_robin
[computer01:106117] mca:rmaps:select: Querying component [round_robin]
[computer01:106117] mca:rmaps:select: checking available component seq
[computer01:106117] mca:rmaps:select: Querying component [seq]
[computer01:106117] [prterun-computer01-106117@0,0]: Final mapper priorities
[computer01:106117] Mapper: ppr Priority: 90
[computer01:106117] Mapper: seq Priority: 60
[computer01:106117] Mapper: round_robin Priority: 10
[computer01:106117] Mapper: rank_file Priority: 0
[computer01:106117] mca:rmaps: mapping job prterun-computer01-106117@1
[computer01:106117] mca:rmaps: setting mapping policies for job
prterun-computer01-106117@1 inherit TRUE hwtcpus FALSE
[computer01:106117] mca:rmaps[358] mapping not given - using bycore
[computer01:106117] setdefaultbinding[365] binding not given - using bycore
[computer01:106117] mca:rmaps:ppr: job prterun-computer01-106117@1 not using
ppr mapper PPR NULL policy PPR NOTSET
[computer01:106117] mca:rmaps:seq: job prterun-computer01-106117@1 not using
seq mapper
[computer01:106117] mca:rmaps:rr: mapping job prterun-computer01-106117@1
[computer01:106117] AVAILABLE NODES FOR MAPPING:
[computer01:106117] node: computer01 daemon: 0 slots_available: 1
[computer01:106117] mca:rmaps:rr: mapping by Core for job
prterun-computer01-106117@1 slots 1 num_procs 2
________________________________
There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:
which
Either request fewer procs for your application, or make more slots
available for use.
A "slot" is the PRRTE term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which PRRTE processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, PRRTE defaults to the number of processor cores
In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.
________________________________
On 2022/11/8 05:46, Jeff Squyres (jsquyres) wrote:
In the future, can you please just mail one of the lists? This particular
question is probably more of a users type of question (since we're not talking
about the internals of Open MPI itself), so I'll reply just on the users list.
For what it's worth, I'm unable to replicate your error:
$ mpirun --version
mpirun (Open MPI) 5.0.0rc9
Report bugs to https://www.open-mpi.org/community/help/
$ cat hostfile
mpi002 slots=1
mpi005 slots=1
$ mpirun -n 2 --machinefile hostfile hostname
mpi002
mpi005
Can you try running with "--mca rmaps_base_verbose 100" so that we can get some
debugging output and see why the slots aren't working for you? Show the full output,
like I did above (e.g., cat the hostfile, and then mpirun with the MCA param and all the
output). Thanks!
--
Jeff Squyres
jsquy...@cisco.com
________________________________
From: devel<devel-boun...@lists.open-mpi.org> on
behalf of mrlong via devel<de...@lists.open-mpi.org>
Sent: Monday, November 7, 2022 3:37 AM
To: de...@lists.open-mpi.org; Open MPI
Users<users@lists.open-mpi.org>
Cc: mrlong<mrlong...@gmail.com>
Subject: [OMPI devel] There are not enough slots available in the system to
satisfy the 2, slots that were requested by the application
Two machines, each with 64 cores. The contents of the hosts file are:
192.168.180.48 slots=1
192.168.60.203 slots=1
Why do you get the following error when running with openmpi 5.0.0rc9?
(py3.9) [user@machine01 share]$ mpirun -n 2 --machinefile
hosts hostname
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:
hostname
Either request fewer procs for your application, or make more slots
available for use.
A "slot" is the PRRTE term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which PRRTE processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, PRRTE defaults to the number of processor cores
In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.
------------------------------
Message: 2
Date: Mon, 14 Nov 2022 18:04:06 +0000
From: "Jeff Squyres (jsquyres)"<jsquy...@cisco.com>
To: "users@lists.open-mpi.org" <users@lists.open-mpi.org>
Cc: arun c<arun.edar...@gmail.com>
Subject: Re: [OMPI users] Tracing of openmpi internal functions
Message-ID:
<bl0pr11mb2980b144bc115f202701558dc0...@bl0pr11mb2980.namprd11.prod.outlook.com>
Content-Type: text/plain; charset="us-ascii"
Open MPI uses plug-in modules for its implementations of the MPI collective
algorithms. From that perspective, once you understand that infrastructure,
it's exactly the same regardless of whether the MPI job is using intra-node or
inter-node collectives.
We don't have much in the way of detailed internal function call tracing inside
Open MPI itself, due to performance considerations. You might want to look
into flamegraphs, or something similar...?
--
Jeff Squyres
jsquy...@cisco.com
________________________________
From: users<users-boun...@lists.open-mpi.org> on behalf of arun c via
users<users@lists.open-mpi.org>
Sent: Saturday, November 12, 2022 9:46 AM
To: users@lists.open-mpi.org
Cc: arun c<arun.edar...@gmail.com>
Subject: [OMPI users] Tracing of openmpi internal functions
Hi All,
I am new to openmpi and trying to learn the internals (source code
level) of data transfer during collective operations. At first, I will
limit it to intra-node (between cpu cores, and sockets) to minimize
the scope of learning.
What are the best options (Looking for only free and open methods) for
tracing the openmpi code? (say I want to execute alltoall collective
and trace all the function calls and event callbacks that happened
inside the libmpi.so on all the cores)
Linux kernel has something called ftrace, it gives a neat call graph
of all the internal functions inside the kernel with time, is
something similar available?
--Arun
------------------------------
Subject: Digest Footer
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
------------------------------
End of users Digest, Vol 4818, Issue 1
**************************************