[OMPI users] Slurm or OpenMPI error?

2024-07-01 Thread Mccall, Kurt E. (MSFC-EV41) via users
name looks a little odd. Do you by chance have a reproducer and instructions on how you’re running it that we could try? Howard From: users <users-boun...@lists.open-mpi.org> on behalf of "Mccall, Kurt E. (MSFC-EV41) via users" <users@lists.open-mpi.org> Re

[OMPI users] Slurm or OpenMPI error?

2024-07-01 Thread Mccall, Kurt E. (MSFC-EV41) via users
Subject: [EXTERNAL] [OMPI users] Slurm or OpenMPI error? Hello Kurt, The host name looks a little odd. Do you by chance have a reproducer and instructions on how you’re running it that we could try? Howard From: users <users-boun...@lists.open-mpi.org> on behalf of "Mccall, Kurt E. (MSF

[OMPI users] Slurm or OpenMPI error?

2024-07-01 Thread Mccall, Kurt E. (MSFC-EV41) via users
Using OpenMPI 5.0.3 and Slurm 20.11.8. Is this error message issued by Slurm or by OpenMPI? A Google search on the error message yielded nothing. -- At least one of the requested hosts is not included in the current allocation

Re: [OMPI users] [EXTERNAL] [BULK] Re: OpenMPI crashes with TCP connection error

2023-06-17 Thread Mccall, Kurt E. (MSFC-EV41) via users
application. Since you are using Slurm and MPI_Comm_spawn(), it is important to understand whether you are using mpirun or srun. FWIW, --mpi=pmix is a srun option. You can srun --mpi=list to find the available options. Cheers, Gilles On Sat, Jun 17, 2023 at 2:53 AM Mccall, Kurt E. (MSFC-EV41)

Re: [OMPI users] OpenMPI crashes with TCP connection error

2023-06-16 Thread Mccall, Kurt E. (MSFC-EV41) via users
crystal orb thinks you might want to try the --mpi=pmix flag for srun as documented for slurm+openmpi: https://slurm.schedmd.com/mpi_guide.html#open_mpi -Joachim From: users <users-boun...@lists.open-mpi.org> on behalf of Mccall, Kurt E. (MSFC-EV41

[OMPI users] OpenMPI crashes with TCP connection error

2023-06-15 Thread Mccall, Kurt E. (MSFC-EV41) via users
My job immediately crashes with the error message below. I don’t know where to begin looking for the cause of the error, or what information to provide to help you understand it. Maybe you could clue me in 😊. I am on RedHat 4.18.0, using Slurm 20.11.8 and OpenMPI 4.1.5 compiled with gcc 8.5

[OMPI users] Disabling barrier in MPI_Finalize

2022-09-09 Thread Mccall, Kurt E. (MSFC-EV41) via users
Hi, If a single process needs to exit, MPI_Finalize will pause at a barrier, possibly waiting for pending communications to complete. Does OpenMPI have any means to disable this behavior, so that a single process can exit normally if the application calls for it? Thanks, Kurt
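A minimal sketch of the scenario being asked about (hypothetical two-rank job, not OpenMPI-specific advice): rank 0 wants to leave early, and its MPI_Finalize is where the wait can happen.

    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            // Rank 0 is finished; by default MPI_Finalize may block here
            // until pending communication involving this process completes.
            MPI_Finalize();
            return 0;
        }

        // Remaining ranks continue working among themselves.
        MPI_Finalize();
        return 0;
    }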

Re: [OMPI users] OpenMpi crash in MPI_Comm_spawn / developer message

2022-03-18 Thread Mccall, Kurt E. (MSFC-EV41) via users
Just an update: I eliminated the error below by telling MPI_Comm_spawn to create non-MPI processes, via the info key: MPI_Info_set(info, "ompi_non_mpi", "true"); If you still want to pursue this matter, let me know. Kurt From: Mccall, Kurt E. (MSFC-EV41) Sent: Thursday, March 17, 2022 5:58 PM
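For reference, a sketch of the workaround described, assuming a placeholder ./worker binary; since the children never call MPI_Init, the returned intercommunicator is presumably unusable and is left untouched here:

    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        MPI_Info info;
        MPI_Info_create(&info);
        // OpenMPI-specific key: the children are plain executables
        // that never call MPI_Init.
        MPI_Info_set(info, "ompi_non_mpi", "true");

        MPI_Comm intercomm;
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, info, 0,
                       MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }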

[OMPI users] OpenMpi crash in MPI_Comm_spawn / developer message

2022-03-17 Thread Mccall, Kurt E. (MSFC-EV41) via users
My job successfully spawned a large number of subprocesses via MPI_Comm_spawn, filling up the available cores. When some of those subprocesses terminated, it attempted to spawn more. It appears that the latter calls to MPI_Comm_spawn caused this error: [n022.cluster.com:08996] [[56319,0],0]
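Not the poster's code, but a sketch of the spawn-in-waves pattern at issue; one common requirement is that each intercommunicator be disconnected on both sides before the children's slots can be reused:

    #include <mpi.h>

    // Sketch: run one wave of hypothetical ./worker children, then
    // release the intercommunicator so the next wave can claim slots.
    void run_wave(int nworkers) {
        MPI_Comm kids;
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, nworkers, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &kids, MPI_ERRCODES_IGNORE);
        // ... exchange work with the children over 'kids' ...
        MPI_Comm_disconnect(&kids);  // the workers must also disconnect
    }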

Re: [OMPI users] MPI_Intercomm_create error

2022-03-16 Thread Mccall, Kurt E. (MSFC-EV41) via users
are not fully connected. In general, specifying which interface OMPI can use (with --mca btl_tcp_if_include x.y.z.t/s) solves the problem. George. On Wed, Mar 16, 2022 at 5:11 PM Mccall, Kurt E. (MSFC-EV41) via users <users@lists.open-mpi.org> wrote: I’m using OpenMpi 4.1.2 under
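The --mca flag above goes on the mpirun command line; the same parameter can also be supplied through OpenMPI's OMPI_MCA_* environment variables, usually exported in the shell so every daemon inherits it. A sketch of the in-process variant, with a made-up subnet, set before MPI_Init reads the MCA parameters:

    #include <cstdlib>
    #include <mpi.h>

    int main(int argc, char** argv) {
        // Equivalent of "mpirun --mca btl_tcp_if_include 10.1.0.0/16 ...".
        // The subnet is illustrative -- use one that all your nodes share.
        setenv("OMPI_MCA_btl_tcp_if_include", "10.1.0.0/16", 1);
        MPI_Init(&argc, &argv);
        // ...
        MPI_Finalize();
        return 0;
    }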

[OMPI users] MPI_Intercomm_create error

2022-03-16 Thread Mccall, Kurt E. (MSFC-EV41) via users
I'm using OpenMpi 4.1.2 under Slurm 20.11.8. My 2 process job is successfully launched, but when the main process rank 0 attempts to create an intercommunicator with process rank 1 on the other node: MPI_Comm intercom; MPI_Intercomm_create(MPI_COMM_SELF, 0, MPI_COMM_WORLD, 1, tag, &intercom); Op
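The archived snippet lost its fifth argument, reconstructed above as a placeholder tag: MPI_Intercomm_create takes a tag between the remote leader and the output communicator. A minimal sketch for the two-rank case (tag value arbitrary, but identical on both sides):

    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int tag = 123;  // arbitrary, must match on both sides
        int remote_leader = (rank == 0) ? 1 : 0;

        MPI_Comm intercom;
        // Each side's local group is just itself; the peer communicator
        // (MPI_COMM_WORLD) is used to reach the other leader.
        MPI_Intercomm_create(MPI_COMM_SELF, 0, MPI_COMM_WORLD,
                             remote_leader, tag, &intercom);

        MPI_Comm_free(&intercom);
        MPI_Finalize();
        return 0;
    }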

[OMPI users] OpenMPI, Slurm and MPI_Comm_spawn

2022-03-08 Thread Mccall, Kurt E. (MSFC-EV41) via users
The Slurm MPI User's Guide at https://slurm.schedmd.com/mpi_guide.html#open_mpi has a note that states: NOTE: OpenMPI has a limitation that does not support calls to MPI_Comm_spawn() from within a Slurm allocation. If you need to use the MPI_Comm_spawn() function you will need to use another MPI implementation

Re: [OMPI users] Reserving slots and filling them after job launch with MPI_Comm_spawn

2021-11-05 Thread Mccall, Kurt E. (MSFC-EV41) via users
Subject: [EXTERNAL] Re: [OMPI users] Reserving slots and filling them after job launch with MPI_Comm_spawn Could you please ensure it was configured with --enable-debug and then add "--mca rmaps_base_verbose 5" to the mpirun cmd line? On Nov 3, 2021, at 9:10 AM, Mcc

Re: [OMPI users] Reserving slots and filling them after job launch with MPI_Comm_spawn

2021-11-03 Thread Mccall, Kurt E. (MSFC-EV41) via users
need to use a hostfile. As a workaround, I would suggest you try to mpirun --map-by node -np 21 ... Cheers, Gilles On Wed, Nov 3, 2021 at 6:06 PM Mccall, Kurt E. (MSFC-EV41) via users <users@lists.open-mpi.org> wrote: I’m using OpenMPI 4.1.1 compiled with Nvidia’s nvc++ 20.9, and c

[OMPI users] Reserving slots and filling them after job launch with MPI_Comm_spawn

2021-11-03 Thread Mccall, Kurt E. (MSFC-EV41) via users
I'm using OpenMPI 4.1.1, compiled with Nvidia's nvc++ 20.9 and with Torque support. I want to reserve multiple slots on each node, and then launch a single manager process on each node. The remaining slots would be filled up as the manager spawns new processes with MPI_Comm_spawn on
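A sketch of one way the reserved slots could be filled (./worker is a placeholder; the standard "host" info key pins each child to the manager's own node):

    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        char host[MPI_MAX_PROCESSOR_NAME];
        int len;
        MPI_Get_processor_name(host, &len);

        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "host", host);  // spawn onto this node only

        MPI_Comm workers;
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, info, 0,
                       MPI_COMM_SELF, &workers, MPI_ERRCODES_IGNORE);

        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }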

[OMPI users] Memchecker and MPI_Comm_spawn

2020-05-09 Thread Mccall, Kurt E. (MSFC-EV41) via users
How can I run OpenMPI's Memchecker on a process created by MPI_Comm_spawn()? I've configured OpenMPI 4.0.3 for Memchecker, along with Valgrind 3.15.0, and it works quite well on processes created directly by mpiexec. I tried to do something analogous by prepending "valgrind" onto the command
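The prepending the post describes would look roughly as below (a sketch; ./worker and the valgrind location are placeholders). MPI_Comm_spawn's argv does not repeat the command name, so the real binary becomes valgrind's first argument:

    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        // The spawned command is valgrind itself; the worker binary
        // is passed to it as an argument.
        char* child_argv[] = { (char*)"./worker", (char*)0 };
        MPI_Comm intercomm;
        MPI_Comm_spawn("valgrind", child_argv, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

        MPI_Finalize();
        return 0;
    }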

Re: [OMPI users] Meaning of mpiexec error flags

2020-04-14 Thread Mccall, Kurt E. (MSFC-EV41) via users
cified - used only in non-managed environments #define PRRTE_NODE_NON_USABLE 0x20 // the node is hosting a tool and is NOT to be used for jobs On Apr 13, 2020, at 2:15 PM, Mccall, Kurt E. (MSFC-EV41) via users <users@lists.open-mpi.org> wrote: My ap

Re: [OMPI users] Meaning of mpiexec error flags

2020-04-13 Thread Mccall, Kurt E. (MSFC-EV41) via users
GIVEN 0x10 // the number of slots was specified - used only in non-managed environments #define PRRTE_NODE_NON_USABLE 0x20 // the node is hosting a tool and is NOT to be used for jobs On Apr 13, 2020, at 2:15 PM, Mccall, Kurt E. (MSFC-EV41) via users

[OMPI users] Meaning of mpiexec error flags

2020-04-13 Thread Mccall, Kurt E. (MSFC-EV41) via users
My application is behaving correctly on node n006, and incorrectly on the lower numbered nodes. The flags in the error message below may give a clue as to why. What is the meaning of the flag values 0x11 and 0x13? == ALLOCATED NODES == n006
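The replies above quote the node-flag defines; since the flags are bit masks, the two values decompose as below. The names for the 0x01 and 0x02 bits are recalled from the orte/prrte node-flag headers and should be treated as an assumption:

    /* From the defines quoted in the replies above (0x10, 0x20), plus
     * the low bits as defined in the orte/prrte headers (assumed): */
    #define NODE_FLAG_DAEMON_LAUNCHED   0x01  /* daemon running on the node (assumed name) */
    #define NODE_FLAG_LOCATION_VERIFIED 0x02  /* node location verified (assumed name) */
    #define NODE_FLAG_SLOTS_GIVEN       0x10  /* number of slots was specified */

    /* 0x11 = 0x10 | 0x01        : slots given, daemon launched
     * 0x13 = 0x10 | 0x02 | 0x01 : slots given, location verified, daemon launched */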

Re: [OMPI users] [EXTERNAL] Re: Please help me interpret MPI output

2019-11-21 Thread Mccall, Kurt E. (MSFC-EV41) via users
Peter Kjellström Sent: Thursday, November 21, 2019 3:40 AM To: Mccall, Kurt E. (MSFC-EV41) Cc: users@lists.open-mpi.org Subject: [EXTERNAL] Re: [OMPI users] Please help me interpret MPI output On Wed, 20 Nov 2019 17:38:19 +0000 "Mccall, Kurt E. (MSFC-EV41) via users" wrote: > Hi, >

[OMPI users] Please help me interpret MPI output

2019-11-20 Thread Mccall, Kurt E. (MSFC-EV41) via users
Hi, My job is behaving differently on its two nodes, refusing to MPI_Comm_spawn() a process on one of them but succeeding on the other. Please help me interpret the output that MPI is producing - I am hoping it will yield clues as to what is different between the two nodes. Here is one instan

[OMPI users] Interpreting the output of --display-map and --display-allocation

2019-11-18 Thread Mccall, Kurt E. (MSFC-EV41) via users
I'm trying to debug a problem with my job, launched with the mpiexec options -display-map and -display-allocation, but I don't know how to interpret the output. For example, mpiexec displays the following when a job is spawned by MPI_Comm_spawn(): == ALLOCATED NODES =

Re: [OMPI users] OpenMpi not throwing C++ exceptions

2019-11-07 Thread Mccall, Kurt E. (MSFC-EV41) via users
> Something is odd here, though -- I have two separately compiled OpenMpi > directories, one with and one without Torque support > (via the -with-tm configure flag). >Ompi_info chose the one without Torque > support. Why would it choose one over the other? > The one with Torque support is w

Re: [OMPI users] OpenMpi not throwing C++ exceptions

2019-11-07 Thread Mccall, Kurt E. (MSFC-EV41) via users
Just to double check, does ompi_info show that you have C++ exception support? - $ ompi_info --all | grep exceptions C++ exceptions: yes - Indeed it does: $ ompi_info --all | grep exceptions Configure command line: '--prefix=/opt/openmpi_pgc' '--enable-mpi-cxx' '--enable-cx

Re: [OMPI users] OpenMpi not throwing C++ exceptions

2019-11-07 Thread Mccall, Kurt E. (MSFC-EV41) via users
it be something else? Kurt From: Jeff Squyres (jsquyres) Subject: [EXTERNAL] Re: [OMPI users] OpenMpi not throwing C++ exceptions On Nov 7, 2019, at 3:02 PM, Mccall, Kurt E. (MSFC-EV41) via users <users@lists.open-mpi.org> wrote: My program is failing in MPI_Comm_spawn, but it se

[OMPI users] OpenMpi not throwing C++ exceptions

2019-11-07 Thread Mccall, Kurt E. (MSFC-EV41) via users
My program is failing in MPI_Comm_spawn, but it seems to simply terminate the job rather than throwing an exception that I can catch. Here is the abbreviated error message: [n001:32127] *** An error occurred in MPI_Comm_spawn [n001:32127] *** reported by process [1679884289,1] [n001:32127] ***
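This is the expected behavior of the default error handler (MPI_ERRORS_ARE_FATAL): errors abort the job before any exception can propagate. A sketch of how a catchable exception could be requested through the C++ bindings discussed in the replies above (built with --enable-mpi-cxx and with C++ exception support; ./worker is a placeholder):

    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI::Init(argc, argv);
        // Errors are raised on the communicator used by the call, so its
        // handler must be the throwing one rather than the fatal default.
        MPI::COMM_SELF.Set_errhandler(MPI::ERRORS_THROW_EXCEPTIONS);
        try {
            MPI::Intercomm kids =
                MPI::COMM_SELF.Spawn("./worker", MPI::ARGV_NULL, 1,
                                     MPI::INFO_NULL, 0);
        } catch (MPI::Exception& e) {
            // e.Get_error_code() / e.Get_error_string() describe the failure.
        }
        MPI::Finalize();
        return 0;
    }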

[OMPI users] MPI_Comm_spawn: no allocated resources for the application ...

2019-10-25 Thread Mccall, Kurt E. (MSFC-EV41) via users
I am trying to launch a number of manager processes, one per node, and then have each of those managers spawn, on its own node, a number of workers. For this example, I have 2 managers and 2 workers per manager. I'm following the instructions at this link https://stackoverflow.com/questi

[OMPI users] MPI_Comm_Spawn failure: All nodes already filled

2019-08-07 Thread Mccall, Kurt E. (MSFC-EV41) via users
Ralph On Aug 6, 2019, at 3:58 AM, Mccall, Kurt E. (MSFC-EV41) via users <users@lists.open-mpi.org> wrote: Hi, MPI_Comm_spawn() is failing with the error message “All nodes which are allocated for this

Re: [OMPI users] [EXTERNAL] Re: MPI_Comm_Spawn failure: All nodes already filled

2019-08-07 Thread Mccall, Kurt E. (MSFC-EV41) via users
Ralph On Aug 6, 2019, at 3:58 AM, Mccall, Kurt E. (MSFC-EV41) via users

[OMPI users] MPI_Comm_Spawn failure: All nodes already filled

2019-08-06 Thread Mccall, Kurt E. (MSFC-EV41) via users
Hi, MPI_Comm_spawn() is failing with the error message "All nodes which are allocated for this job are already filled". I compiled OpenMpi 4.0.1 with the Portland Group C++ compiler, v. 19.5.0, both with and without Torque/Maui support. I thought that not using Torque/Maui support would gi