Did you receive my email?

On Tue, Nov 15, 2022 at 12:33, timesir <mrlong...@gmail.com> wrote:
> (py3.9) ➜ /share mpirun -n 2 --machinefile hosts --mca rmaps_base_verbose 100 --mca ras_base_verbose 100 which mpirun
> [computer01:39342] mca: base: component_find: searching NULL for ras components
> [computer01:39342] mca: base: find_dyn_components: checking NULL for ras components
> [computer01:39342] pmix:mca: base: components_register: registering framework ras components
> [computer01:39342] pmix:mca: base: components_register: found loaded component simulator
> [computer01:39342] pmix:mca: base: components_register: component simulator register function successful
> [computer01:39342] pmix:mca: base: components_register: found loaded component pbs
> [computer01:39342] pmix:mca: base: components_register: component pbs register function successful
> [computer01:39342] pmix:mca: base: components_register: found loaded component slurm
> [computer01:39342] pmix:mca: base: components_register: component slurm register function successful
> [computer01:39342] mca: base: components_open: opening ras components
> [computer01:39342] mca: base: components_open: found loaded component simulator
> [computer01:39342] mca: base: components_open: found loaded component pbs
> [computer01:39342] mca: base: components_open: component pbs open function successful
> [computer01:39342] mca: base: components_open: found loaded component slurm
> [computer01:39342] mca: base: components_open: component slurm open function successful
> [computer01:39342] mca:base:select: Auto-selecting ras components
> [computer01:39342] mca:base:select:( ras) Querying component [simulator]
> [computer01:39342] mca:base:select:( ras) Querying component [pbs]
> [computer01:39342] mca:base:select:( ras) Querying component [slurm]
> [computer01:39342] mca:base:select:( ras) No component selected!
> [computer01:39342] mca: base: component_find: searching NULL for rmaps components
> [computer01:39342] mca: base: find_dyn_components: checking NULL for rmaps components
> [computer01:39342] pmix:mca: base: components_register: registering framework rmaps components
> [computer01:39342] pmix:mca: base: components_register: found loaded component ppr
> [computer01:39342] pmix:mca: base: components_register: component ppr register function successful
> [computer01:39342] pmix:mca: base: components_register: found loaded component rank_file
> [computer01:39342] pmix:mca: base: components_register: component rank_file has no register or open function
> [computer01:39342] pmix:mca: base: components_register: found loaded component round_robin
> [computer01:39342] pmix:mca: base: components_register: component round_robin register function successful
> [computer01:39342] pmix:mca: base: components_register: found loaded component seq
> [computer01:39342] pmix:mca: base: components_register: component seq register function successful
> [computer01:39342] mca: base: components_open: opening rmaps components
> [computer01:39342] mca: base: components_open: found loaded component ppr
> [computer01:39342] mca: base: components_open: component ppr open function successful
> [computer01:39342] mca: base: components_open: found loaded component rank_file
> [computer01:39342] mca: base: components_open: found loaded component round_robin
> [computer01:39342] mca: base: components_open: component round_robin open function successful
> [computer01:39342] mca: base: components_open: found loaded component seq
> [computer01:39342] mca: base: components_open: component seq open function successful
> [computer01:39342] mca:rmaps:select: checking available component ppr
> [computer01:39342] mca:rmaps:select: Querying component [ppr]
> [computer01:39342] mca:rmaps:select: checking available component rank_file
> [computer01:39342] mca:rmaps:select: Querying component [rank_file]
> [computer01:39342] mca:rmaps:select: checking available component round_robin
> [computer01:39342] mca:rmaps:select: Querying component [round_robin]
> [computer01:39342] mca:rmaps:select: checking available component seq
> [computer01:39342] mca:rmaps:select: Querying component [seq]
> [computer01:39342] [prterun-computer01-39342@0,0]: Final mapper priorities
> [computer01:39342]   Mapper: ppr Priority: 90
> [computer01:39342]   Mapper: seq Priority: 60
> [computer01:39342]   Mapper: round_robin Priority: 10
> [computer01:39342]   Mapper: rank_file Priority: 0
>
> ======================   ALLOCATED NODES   ======================
>     computer01: slots=1 max_slots=0 slots_inuse=0 state=UP
>         Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
>         aliases: 192.168.180.48
>     192.168.60.203: slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>         Flags: SLOTS_GIVEN
>         aliases: NONE
> =================================================================
>
> ======================   ALLOCATED NODES   ======================
>     computer01: slots=1 max_slots=0 slots_inuse=0 state=UP
>         Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
>         aliases: 192.168.180.48
>     hepslustretest03: slots=1 max_slots=0 slots_inuse=0 state=UP
>         Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
>         aliases: 192.168.60.203,hepslustretest03.ihep.ac.cn,172.17.180.203,172.168.10.23,172.168.10.143
> =================================================================
> [computer01:39342] mca:rmaps: mapping job prterun-computer01-39342@1
> [computer01:39342] mca:rmaps: setting mapping policies for job prterun-computer01-39342@1 inherit TRUE hwtcpus FALSE
> [computer01:39342] mca:rmaps[358] mapping not given - using bycore
> [computer01:39342] setdefaultbinding[365] binding not given - using bycore
> [computer01:39342] mca:rmaps:ppr: job prterun-computer01-39342@1 not using ppr mapper PPR NULL policy PPR NOTSET
> [computer01:39342] mca:rmaps:seq: job prterun-computer01-39342@1 not using seq mapper
> [computer01:39342] mca:rmaps:rr: mapping job prterun-computer01-39342@1
> [computer01:39342] AVAILABLE NODES FOR MAPPING:
> [computer01:39342]     node: computer01 daemon: 0 slots_available: 1
> [computer01:39342] mca:rmaps:rr: mapping by Core for job prterun-computer01-39342@1 slots 1 num_procs 2
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 2
> slots that were requested by the application:
>
>   which
>
> Either request fewer procs for your application, or make more slots
> available for use.
>
> A "slot" is the PRRTE term for an allocatable unit where we can
> launch a process. The number of slots available are defined by the
> environment in which PRRTE processes are run:
>
>   1. Hostfile, via "slots=N" clauses (N defaults to number of
>      processor cores if not provided)
>   2. The --host command line parameter, via a ":N" suffix on the
>      hostname (N defaults to 1 if not provided)
>   3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
>   4. If none of a hostfile, the --host command line parameter, or an
>      RM is present, PRRTE defaults to the number of processor cores
>
> In all the above cases, if you want PRRTE to default to the number
> of hardware threads instead of the number of processor cores, use the
> --use-hwthread-cpus option.
>
> Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
> number of available slots when deciding the number of processes to
> launch.
> --------------------------------------------------------------------------
>
> On 2022/11/15 02:04, users-requ...@lists.open-mpi.org wrote:
> Send users mailing list submissions to
>         users@lists.open-mpi.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.open-mpi.org/mailman/listinfo/users
> or, via email, send a message with subject or body 'help' to
>         users-requ...@lists.open-mpi.org
>
> You can reach the person managing the list at
>         users-ow...@lists.open-mpi.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of users digest..."
>
> Today's Topics:
>
>    1. Re: [OMPI devel] There are not enough slots available in the
>       system to satisfy the 2, slots that were requested by the
>       application (Jeff Squyres (jsquyres))
>    2. Re: Tracing of openmpi internal functions (Jeff Squyres (jsquyres))
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 14 Nov 2022 17:04:24 +0000
> From: "Jeff Squyres (jsquyres)" <jsquy...@cisco.com>
> To: Open MPI Users <users@lists.open-mpi.org>
> Subject: Re: [OMPI users] [OMPI devel] There are not enough slots
>         available in the system to satisfy the 2, slots that were requested by
>         the application
> Message-ID:
>         <bl0pr11mb29801261edb4fd0e9ef2f4ecc0...@bl0pr11mb2980.namprd11.prod.outlook.com>
> Content-Type: text/plain; charset="utf-8"
>
> Yes, somehow I'm not seeing all the output that I expect to see. Can you
> ensure that, if you're copy-and-pasting from the email, it's actually
> using "dash dash" in front of "mca" and "machinefile" (vs. a copy-and-pasted
> "em dash")?
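The copy-and-paste pitfall described here can be checked mechanically. The sketch below is a hypothetical helper (not part of Open MPI) that flags "smart" Unicode dashes in a pasted command line; mpirun only recognizes plain ASCII hyphens as option prefixes:

```python
# Sketch: detect "smart" dashes that break mpirun option parsing.
# An en dash (U+2013) or em dash (U+2014) pasted from rich text looks
# like "-" or "--" on screen, but mpirun treats the word that follows
# it as the application name instead of an option.
SMART_DASHES = {"\u2013": "en dash", "\u2014": "em dash"}

def find_smart_dashes(cmdline: str):
    """Return (index, name) pairs for every non-ASCII dash in cmdline."""
    return [(i, SMART_DASHES[ch])
            for i, ch in enumerate(cmdline) if ch in SMART_DASHES]

# "–mca" pasted from an email: the leading character is an en dash, not "--".
pasted = "mpirun -n 2 -machinefile hosts \u2013mca rmaps_base_verbose 100 which mpirun"
print(find_smart_dashes(pasted))
```

A clean command line (all ASCII `--` prefixes) produces an empty list, which is one quick way to rule this problem out before re-running.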
> --
> Jeff Squyres
> jsquy...@cisco.com
> ________________________________
> From: users <users-boun...@lists.open-mpi.org> on behalf of Gilles Gouaillardet via users <users@lists.open-mpi.org>
> Sent: Sunday, November 13, 2022 9:18 PM
> To: Open MPI Users <users@lists.open-mpi.org>
> Cc: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> Subject: Re: [OMPI users] [OMPI devel] There are not enough slots available in the system to satisfy the 2, slots that were requested by the application
>
> There is a typo in your command line.
> You should use --mca (minus minus) instead of -mca
>
> Also, you can try --machinefile instead of -machinefile
>
> Cheers,
>
> Gilles
>
> There are not enough slots available in the system to satisfy the 2
> slots that were requested by the application:
>
>   –mca
>
> On Mon, Nov 14, 2022 at 11:04 AM timesir via users <users@lists.open-mpi.org> wrote:
>
> (py3.9) ➜ /share mpirun -n 2 -machinefile hosts –mca rmaps_base_verbose 100 --mca ras_base_verbose 100 which mpirun
> [computer01:04570] mca: base: component_find: searching NULL for ras components
> [computer01:04570] mca: base: find_dyn_components: checking NULL for ras components
> [computer01:04570] pmix:mca: base: components_register: registering framework ras components
> [computer01:04570] pmix:mca: base: components_register: found loaded component simulator
> [computer01:04570] pmix:mca: base: components_register: component simulator register function successful
> [computer01:04570] pmix:mca: base: components_register: found loaded component pbs
> [computer01:04570] pmix:mca: base: components_register: component pbs register function successful
> [computer01:04570] pmix:mca: base: components_register: found loaded component slurm
> [computer01:04570] pmix:mca: base: components_register: component slurm register function successful
> [computer01:04570] mca: base: components_open: opening ras components
> [computer01:04570] mca: base: components_open: found loaded component simulator
> [computer01:04570] mca: base: components_open: found loaded component pbs
> [computer01:04570] mca: base: components_open: component pbs open function successful
> [computer01:04570] mca: base: components_open: found loaded component slurm
> [computer01:04570] mca: base: components_open: component slurm open function successful
> [computer01:04570] mca:base:select: Auto-selecting ras components
> [computer01:04570] mca:base:select:( ras) Querying component [simulator]
> [computer01:04570] mca:base:select:( ras) Querying component [pbs]
> [computer01:04570] mca:base:select:( ras) Querying component [slurm]
> [computer01:04570] mca:base:select:( ras) No component selected!
> ======================   ALLOCATED NODES   ======================
>     computer01: slots=1 max_slots=0 slots_inuse=0 state=UP
>         Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
>         aliases: 192.168.180.48
>     192.168.60.203: slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>         Flags: SLOTS_GIVEN
>         aliases: NONE
> =================================================================
>
> ======================   ALLOCATED NODES   ======================
>     computer01: slots=1 max_slots=0 slots_inuse=0 state=UP
>         Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
>         aliases: 192.168.180.48
>     hepslustretest03: slots=1 max_slots=0 slots_inuse=0 state=UP
>         Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
>         aliases: 192.168.60.203,172.17.180.203,172.168.10.23,172.168.10.143
> =================================================================
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 2
> slots that were requested by the application:
>
>   –mca
>
> Either request fewer procs for your application, or make more slots
> available for use.
>
> A "slot" is the PRRTE term for an allocatable unit where we can
> launch a process. The number of slots available are defined by the
> environment in which PRRTE processes are run:
>
>   1. Hostfile, via "slots=N" clauses (N defaults to number of
>      processor cores if not provided)
>   2. The --host command line parameter, via a ":N" suffix on the
>      hostname (N defaults to 1 if not provided)
>   3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
>   4. If none of a hostfile, the --host command line parameter, or an
>      RM is present, PRRTE defaults to the number of processor cores
>
> In all the above cases, if you want PRRTE to default to the number
> of hardware threads instead of the number of processor cores, use the
> --use-hwthread-cpus option.
> Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
> number of available slots when deciding the number of processes to
> launch.
> --------------------------------------------------------------------------
>
> On 2022/11/13 23:42, Jeff Squyres (jsquyres) wrote:
> Interesting. It says:
>
> [computer01:106117] AVAILABLE NODES FOR MAPPING:
> [computer01:106117]     node: computer01 daemon: 0 slots_available: 1
>
> This is why it tells you you're out of slots: you're asking for 2, but it
> only found 1. This means it's not seeing your hostfile somehow.
>
> I should have asked you to run with 2 variables last time -- can you re-run
> with "mpirun --mca rmaps_base_verbose 100 --mca ras_base_verbose 100 ..."?
>
> Turning on the RAS verbosity should show us what the hostfile component is doing.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> ________________________________
> From: mrlong <mrlong...@gmail.com>
> Sent: Sunday, November 13, 2022 3:13 AM
> To: Jeff Squyres (jsquyres) <jsquy...@cisco.com>; Open MPI Users <users@lists.open-mpi.org>
> Subject: Re: [OMPI devel] There are not enough slots available in the system to satisfy the 2, slots that were requested by the application
>
> (py3.9) ➜ /share mpirun --version
> mpirun (Open MPI) 5.0.0rc9
>
> Report bugs to https://www.open-mpi.org/community/help/
>
> (py3.9) ➜ /share cat hosts
> 192.168.180.48 slots=1
> 192.168.60.203 slots=1
>
> (py3.9) ➜ /share mpirun -n 2 -machinefile hosts –mca rmaps_base_verbose 100 which mpirun
>
> [computer01:106117] mca: base: component_find: searching NULL for rmaps components
> [computer01:106117] mca: base: find_dyn_components: checking NULL for rmaps components
> [computer01:106117] pmix:mca: base: components_register: registering framework rmaps components
> [computer01:106117] pmix:mca: base: components_register: found loaded component ppr
> [computer01:106117] pmix:mca: base: components_register: component ppr register function successful
> [computer01:106117] pmix:mca: base: components_register: found loaded component rank_file
> [computer01:106117] pmix:mca: base: components_register: component rank_file has no register or open function
> [computer01:106117] pmix:mca: base: components_register: found loaded component round_robin
> [computer01:106117] pmix:mca: base: components_register: component round_robin register function successful
> [computer01:106117] pmix:mca: base: components_register: found loaded component seq
> [computer01:106117] pmix:mca: base: components_register: component seq register function successful
> [computer01:106117] mca: base: components_open: opening rmaps components
> [computer01:106117] mca: base: components_open: found loaded component ppr
> [computer01:106117] mca: base: components_open: component ppr open function successful
> [computer01:106117] mca: base: components_open: found loaded component rank_file
> [computer01:106117] mca: base: components_open: found loaded component round_robin
> [computer01:106117] mca: base: components_open: component round_robin open function successful
> [computer01:106117] mca: base: components_open: found loaded component seq
> [computer01:106117] mca: base: components_open: component seq open function successful
> [computer01:106117] mca:rmaps:select: checking available component ppr
> [computer01:106117] mca:rmaps:select: Querying component [ppr]
> [computer01:106117] mca:rmaps:select: checking available component rank_file
> [computer01:106117] mca:rmaps:select: Querying component [rank_file]
> [computer01:106117] mca:rmaps:select: checking available component round_robin
> [computer01:106117] mca:rmaps:select: Querying component [round_robin]
> [computer01:106117] mca:rmaps:select: checking available component seq
> [computer01:106117] mca:rmaps:select: Querying component [seq]
> [computer01:106117] [prterun-computer01-106117@0,0]: Final mapper priorities
> [computer01:106117]   Mapper: ppr Priority: 90
> [computer01:106117]   Mapper: seq Priority: 60
> [computer01:106117]   Mapper: round_robin Priority: 10
> [computer01:106117]   Mapper: rank_file Priority: 0
> [computer01:106117] mca:rmaps: mapping job prterun-computer01-106117@1
> [computer01:106117] mca:rmaps: setting mapping policies for job prterun-computer01-106117@1 inherit TRUE hwtcpus FALSE
> [computer01:106117] mca:rmaps[358] mapping not given - using bycore
> [computer01:106117] setdefaultbinding[365] binding not given - using bycore
> [computer01:106117] mca:rmaps:ppr: job prterun-computer01-106117@1 not using ppr mapper PPR NULL policy PPR NOTSET
> [computer01:106117] mca:rmaps:seq: job prterun-computer01-106117@1 not using seq mapper
> [computer01:106117] mca:rmaps:rr: mapping job prterun-computer01-106117@1
> [computer01:106117] AVAILABLE NODES FOR MAPPING:
> [computer01:106117]     node: computer01 daemon: 0 slots_available: 1
> [computer01:106117] mca:rmaps:rr: mapping by Core for job prterun-computer01-106117@1 slots 1 num_procs 2
>
> ________________________________
>
> There are not enough slots available in the system to satisfy the 2
> slots that were requested by the application:
>
>   which
>
> Either request fewer procs for your application, or make more slots
> available for use.
>
> A "slot" is the PRRTE term for an allocatable unit where we can
> launch a process.
> The number of slots available are defined by the
> environment in which PRRTE processes are run:
>
>   1. Hostfile, via "slots=N" clauses (N defaults to number of
>      processor cores if not provided)
>   2. The --host command line parameter, via a ":N" suffix on the
>      hostname (N defaults to 1 if not provided)
>   3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
>   4. If none of a hostfile, the --host command line parameter, or an
>      RM is present, PRRTE defaults to the number of processor cores
>
> In all the above cases, if you want PRRTE to default to the number
> of hardware threads instead of the number of processor cores, use the
> --use-hwthread-cpus option.
>
> Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
> number of available slots when deciding the number of processes to
> launch.
>
> ________________________________
> On 2022/11/8 05:46, Jeff Squyres (jsquyres) wrote:
> In the future, can you please just mail one of the lists? This particular
> question is probably more of a users type of question (since we're not
> talking about the internals of Open MPI itself), so I'll reply just on the
> users list.
>
> For what it's worth, I'm unable to replicate your error:
>
> $ mpirun --version
> mpirun (Open MPI) 5.0.0rc9
>
> Report bugs to https://www.open-mpi.org/community/help/
>
> $ cat hostfile
> mpi002 slots=1
> mpi005 slots=1
>
> $ mpirun -n 2 --machinefile hostfile hostname
> mpi002
> mpi005
>
> Can you try running with "--mca rmaps_base_verbose 100" so that we can get
> some debugging output and see why the slots aren't working for you? Show the
> full output, like I did above (e.g., cat the hostfile, and then mpirun with
> the MCA param and all the output). Thanks!
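The "slots=N" accounting that the error message describes can be approximated with a short sketch. This is a simplified illustration, not PRRTE's actual parser; in particular, it uses a fixed default of one slot per host, whereas real PRRTE defaults to the number of processor cores when "slots=N" is omitted:

```python
def total_slots(hostfile_text: str, default_slots: int = 1) -> int:
    """Sum 'slots=N' over hostfile lines; assume default_slots when omitted.
    (Real PRRTE defaults to the host's core count instead of a constant.)"""
    total = 0
    for line in hostfile_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        slots = default_slots
        for token in line.split()[1:]:  # tokens after the hostname
            if token.startswith("slots="):
                slots = int(token.split("=", 1)[1])
        total += slots
    return total

hosts = """192.168.180.48 slots=1
192.168.60.203 slots=1"""
print(total_slots(hosts))  # 2 slots, enough for "mpirun -n 2 ..."
```

With both hosts visible, the hostfile above yields 2 slots, so "mpirun -n 2" should map; the thread's failure mode is that the second host was never being counted at all.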
> --
> Jeff Squyres
> jsquy...@cisco.com
> ________________________________
> From: devel <devel-boun...@lists.open-mpi.org> on behalf of mrlong via devel <de...@lists.open-mpi.org>
> Sent: Monday, November 7, 2022 3:37 AM
> To: de...@lists.open-mpi.org; Open MPI Users <users@lists.open-mpi.org>
> Cc: mrlong <mrlong...@gmail.com>
> Subject: [OMPI devel] There are not enough slots available in the system to satisfy the 2, slots that were requested by the application
>
> Two machines, each with 64 cores. The contents of the hosts file are:
>
> 192.168.180.48 slots=1
> 192.168.60.203 slots=1
>
> Why do we get the following error when running with Open MPI 5.0.0rc9?
>
> (py3.9) [user@machine01 share]$ mpirun -n 2 --machinefile hosts hostname
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 2
> slots that were requested by the application:
>
>   hostname
>
> Either request fewer procs for your application, or make more slots
> available for use.
>
> A "slot" is the PRRTE term for an allocatable unit where we can
> launch a process. The number of slots available are defined by the
> environment in which PRRTE processes are run:
>
>   1. Hostfile, via "slots=N" clauses (N defaults to number of
>      processor cores if not provided)
>   2. The --host command line parameter, via a ":N" suffix on the
>      hostname (N defaults to 1 if not provided)
>   3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
>   4. If none of a hostfile, the --host command line parameter, or an
>      RM is present, PRRTE defaults to the number of processor cores
>
> In all the above cases, if you want PRRTE to default to the number
> of hardware threads instead of the number of processor cores, use the
> --use-hwthread-cpus option.
>
> Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
> number of available slots when deciding the number of processes to
> launch.
>
> ------------------------------
>
> Message: 2
> Date: Mon, 14 Nov 2022 18:04:06 +0000
> From: "Jeff Squyres (jsquyres)" <jsquy...@cisco.com>
> To: "users@lists.open-mpi.org" <users@lists.open-mpi.org>
> Cc: arun c <arun.edar...@gmail.com>
> Subject: Re: [OMPI users] Tracing of openmpi internal functions
> Message-ID:
>         <bl0pr11mb2980b144bc115f202701558dc0...@bl0pr11mb2980.namprd11.prod.outlook.com>
> Content-Type: text/plain; charset="us-ascii"
>
> Open MPI uses plug-in modules for its implementations of the MPI collective
> algorithms. From that perspective, once you understand that infrastructure,
> it's exactly the same regardless of whether the MPI job is using intra-node
> or inter-node collectives.
>
> We don't have much in the way of detailed internal function call tracing
> inside Open MPI itself, due to performance considerations.
> You might want to look into flamegraphs, or something similar.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> ________________________________
> From: users <users-boun...@lists.open-mpi.org> on behalf of arun c via users <users@lists.open-mpi.org>
> Sent: Saturday, November 12, 2022 9:46 AM
> To: users@lists.open-mpi.org
> Cc: arun c <arun.edar...@gmail.com>
> Subject: [OMPI users] Tracing of openmpi internal functions
>
> Hi All,
>
> I am new to Open MPI and trying to learn the internals (at the source code
> level) of data transfer during collective operations. At first, I will
> limit it to intra-node (between CPU cores and sockets) to minimize
> the scope of learning.
>
> What are the best options (looking for only free and open methods) for
> tracing the Open MPI code? (Say I want to execute an alltoall collective
> and trace all the function calls and event callbacks that happen
> inside libmpi.so on all the cores.)
>
> The Linux kernel has something called ftrace, which gives a neat call graph
> of all the internal functions inside the kernel with timings; is
> something similar available?
>
> --Arun
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> users mailing list
> us...@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
>
> ------------------------------
>
> End of users Digest, Vol 4818, Issue 1
> **************************************
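As a rough user-space analogue of the ftrace call graph mentioned above, here is a sketch of call-event tracing using Python's standard profiling hook. It is only an illustration of the idea: it records Python-level calls, not the C internals of libmpi.so (for those, sampling tools such as perf plus flamegraphs, as suggested in the reply, are the usual route):

```python
import sys

def trace_calls(func, *args, **kwargs):
    """Run func and return the names of Python functions called, in order.
    Analogous in spirit to ftrace's function tracer, but for Python frames."""
    calls = []

    def profiler(frame, event, arg):
        # 'call' fires on every Python function entry; C calls arrive
        # as 'c_call' events and are ignored here.
        if event == "call":
            calls.append(frame.f_code.co_name)

    sys.setprofile(profiler)
    try:
        func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always unhook, even if func raises
    return calls

def inner():
    return 42

def outer():
    return inner()

print(trace_calls(outer))  # entry order of the Python-level calls
```

The same hook receives 'return' events too, so timing each call (as ftrace does) is a small extension of this skeleton.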