Subject: Re: [EXTERNAL] [OMPI users] Slurm or OpenMPI error?
Hello Kurt,
The host name looks a little odd. Do you by chance have a reproducer and
instructions on how you’re running it that we could try?
Howard
From: users on behalf of "Mccall, Kurt E. (MSFC-EV41) via users"
Using OpenMPI 5.0.3 and Slurm 20.11.8.
Is this error message issued by Slurm or by OpenMPI? A Google search on the
error message yielded nothing.
--
At least one of the requested hosts is not included in the current application.
Since you are using Slurm and MPI_Comm_spawn(), it is important to understand
whether you are using mpirun or srun.
FWIW, --mpi=pmix is an srun option. You can run srun --mpi=list to find the
available options.
Cheers,
Gilles
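For reference, checking and selecting the PMIx plugin from within a Slurm
allocation might look like this (the application name is a placeholder):
  $ srun --mpi=list                       # list the PMI plugins this Slurm build supports
  $ srun --mpi=pmix -n 2 ./my_mpi_app     # launch through the PMIx plugin, if listed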
On Sat, Jun 17, 2023 at 2:53 AM Mccall, Kurt E. (MSFC-EV41)
My crystal orb thinks you might
want to try the --mpi=pmix flag for srun as documented for Slurm+OpenMPI:
https://slurm.schedmd.com/mpi_guide.html#open_mpi
-Joachim
From: users on behalf of Mccall, Kurt E. (MSFC-EV41) via users
My job immediately crashes with the error message below. I don’t know where
to begin looking for the cause
of the error, or what information to provide to help you understand it. Maybe
you could clue me in 😊.
I am on RedHat 4.18.0, using Slurm 20.11.8 and OpenMPI 4.1.5 compiled with gcc 8.5.
Hi,
If a single process needs to exit, MPI_Finalize will pause at a barrier,
possibly waiting for pending communications to complete. Does OpenMPI have any
means to disable this behavior, so that a single process can exit normally if
the application calls for it?
Thanks,
Kurt
Just an update: I eliminated the error below by telling MPI_Comm_spawn to
create non-MPI processes, via the info key:
MPI_Info_set(info, "ompi_non_mpi", "true");
If you still want to pursue this matter, let me know.
Kurt
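For context, a minimal sketch of passing that key to MPI_Comm_spawn (the worker
path and process count are made up; with ompi_non_mpi the children never call
MPI_Init, so the returned intercommunicator is not used for communication):
  MPI_Info info;
  MPI_Info_create(&info);
  MPI_Info_set(info, "ompi_non_mpi", "true");   /* children are not MPI processes */
  MPI_Comm child;
  MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, info, 0,
                 MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
  MPI_Info_free(&info);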
From: Mccall, Kurt E. (MSFC-EV41)
Sent: Thursday, March 17, 2022 5:58 PM
My job successfully spawned a large number of subprocesses via MPI_Comm_spawn,
filling up the available cores. When some of those subprocesses terminated,
it attempted to spawn more. It appears that the latter calls to
MPI_Comm_spawn caused this error:
[n022.cluster.com:08996] [[56319,0],0]
are not fully connected. In general, specifying which interface OMPI can use
(with --mca btl_tcp_if_include x.y.z.t/s) solves the problem.
George.
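As an illustration of that option (the subnet is made up; an interface name such
as eth0 also works):
  mpirun --mca btl_tcp_if_include 192.168.1.0/24 -np 2 ./my_app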
On Wed, Mar 16, 2022 at 5:11 PM Mccall, Kurt E. (MSFC-EV41) via users wrote:
I'm using OpenMpi 4.1.2 under Slurm 20.11.8. My 2-process job is successfully
launched, but when the main process rank 0
attempts to create an intercommunicator with process rank 1 on the other node:
MPI_Comm intercom;
MPI_Intercomm_create(MPI_COMM_SELF, 0, MPI_COMM_WORLD, 1, /* tag */ 0, &intercom);
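(The tag argument above is a placeholder; whichever value is used, rank 1 must
issue the mirror-image call with the same tag for the intercommunicator to be
formed, roughly:)
  MPI_Comm intercom;
  MPI_Intercomm_create(MPI_COMM_SELF, 0, MPI_COMM_WORLD, 0, /* same tag */ 0, &intercom);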
The Slurm MPI User's Guide at https://slurm.schedmd.com/mpi_guide.html#open_mpi
has a note that states:
NOTE: OpenMPI has a limitation that does not support calls to MPI_Comm_spawn()
from within a Slurm allocation. If you need to use the MPI_Comm_spawn()
function you will need to use another MPI implementation.
Subject: [EXTERNAL] Re: [OMPI users] Reserving slots and filling them after job
launch with MPI_Comm_spawn
Could you please ensure it was configured with --enable-debug and then add
"--mca rmaps_base_verbose 5" to the mpirun cmd line?
On Nov 3, 2021, at 9:10 AM, Mccall, Kurt E. (MSFC-EV41) via users wrote:
eed to use a hostfile.
As a workaround, I would suggest you try to
mpirun --map-by node -np 21 ...
Cheers,
Gilles
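For example (a sketch; the Torque resource request and the counts are made up),
an allocation that reserves more slots than launched managers leaves room for
the later MPI_Comm_spawn calls:
  # reserve 21 nodes with 4 slots each from Torque
  qsub -l nodes=21:ppn=4 job.sh
  # inside job.sh: one manager per node; the other 3 slots per node stay free for spawns
  mpirun --map-by node -np 21 ./manager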
On Wed, Nov 3, 2021 at 6:06 PM Mccall, Kurt E. (MSFC-EV41) via users wrote:
I'm using OpenMPI 4.1.1 compiled with Nvidia's nvc++ 20.9, and compiled with
Torque support.
I want to reserve multiple slots on each node, and then launch a single manager
process on each node. The remaining slots would be filled up as the manager
spawns new processes with MPI_Comm_spawn on its own node.
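Under those assumptions, the manager side might look roughly like this (the
worker path, the count of 4, and the use of the reserved "host" info key to pin
children to the manager's node are illustrative; the node still needs free slots
for the spawn to succeed):
  char host[MPI_MAX_PROCESSOR_NAME];
  int len;
  MPI_Get_processor_name(host, &len);
  MPI_Info info;
  MPI_Info_create(&info);
  MPI_Info_set(info, "host", host);   /* request placement on this manager's node */
  MPI_Comm workers;
  MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, info, 0,
                 MPI_COMM_SELF, &workers, MPI_ERRCODES_IGNORE);
  MPI_Info_free(&info);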
How can I run OpenMPI's Memchecker on a process created by MPI_Comm_spawn()?
I've configured OpenMPI 4.0.3 for Memchecker, along with Valgrind 3.15.0 and it
works quite well on processes created directly by mpiexec.
I tried to do something analogous by pre-pending "valgrind" onto the command
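Presumably that means spawning valgrind itself as the command, with the real
worker binary passed as its first argument, something like (paths illustrative):
  char *args[] = { "./worker", NULL };   /* the real program becomes valgrind's argument */
  MPI_Comm child;
  MPI_Comm_spawn("valgrind", args, 1, MPI_INFO_NULL, 0,
                 MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);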
GIVEN 0x10 // the number of slots was specified - used only in non-managed environments
#define PRRTE_NODE_NON_USABLE 0x20 // the node is hosting a tool and is NOT to be used for jobs
On Apr 13, 2020, at 2:15 PM, Mccall, Kurt E. (MSFC-EV41) via users wrote:
My application is behaving correctly on node n006, and incorrectly on the lower
numbered nodes. The flags in the error message below may give a clue as to
why. What is the meaning of the flag values 0x11 and 0x13?
== ALLOCATED NODES ==
n006
From: Peter Kjellström
Sent: Thursday, November 21, 2019 3:40 AM
To: Mccall, Kurt E. (MSFC-EV41)
Cc: users@lists.open-mpi.org
Subject: [EXTERNAL] Re: [OMPI users] Please help me interpret MPI output
On Wed, 20 Nov 2019 17:38:19 +0000
"Mccall, Kurt E. \(MSFC-EV41\) via users"
wrote:
Hi,
My job is behaving differently on its two nodes, refusing to MPI_Comm_spawn() a
process on one of them but succeeding on the other. Please help me interpret
the output that MPI is producing - I am hoping it will yield clues as to what
is different between the two nodes.
Here is one instance:
I'm trying to debug a problem with my job, launched with the mpiexec options
-display-map and -display-allocation, but I don't know how to interpret the
output. For example, mpiexec displays the following when a job is spawned by
MPI_Comm_spawn():
== ALLOCATED NODES =
> Something is odd here, though -- I have two separately compiled OpenMpi
> directories, one with and one without Torque support (via the --with-tm
> configure flag). Ompi_info chose the one without Torque support. Why would
> it choose one over the other? The one with Torque support is w
Just to double check, does ompi_info show that you have C++ exception support?
-
$ ompi_info --all | grep exceptions
C++ exceptions: yes
-
Indeed it does:
$ ompi_info --all | grep exceptions
Configure command line: '--prefix=/opt/openmpi_pgc' '--enable-mpi-cxx'
'--enable-cx
it be
something else?
Kurt
From: Jeff Squyres (jsquyres)
Subject: [EXTERNAL] Re: [OMPI users] OpenMpi not throwing C++ exceptions
On Nov 7, 2019, at 3:02 PM, Mccall, Kurt E. (MSFC-EV41) via users wrote:
My program is failing in MPI_Comm_spawn, but it seems to simply terminate the
job rather than throwing an exception that I can catch. Here is the
abbreviated error message:
[n001:32127] *** An error occurred in MPI_Comm_spawn
[n001:32127] *** reported by process [1679884289,1]
[n001:32127] ***
I am trying to launch a number of manager processes, one per node, and then have
each of those managers spawn, on its own node, a number of workers. For
this example,
I have 2 managers and 2 workers per manager. I'm following the instructions at
this link
https://stackoverflow.com/questi
Ralph
On Aug 6, 2019, at 3:58 AM, Mccall, Kurt E. (MSFC-EV41) via users wrote:
Hi,
MPI_Comm_spawn() is failing with the error message "All nodes which are
allocated for this job are already filled". I compiled OpenMpi 4.0.1 with the
Portland Group C++ compiler, v. 19.5.0, both with and without Torque/Maui
support. I thought that not using Torque/Maui support would gi