Re: [OMPI users] "MCW rank 0 is not bound (or bound to all available processors)" when running multiple jobs concurrently

2024-04-21 Mehmet Oren via users
Sent: Wednesday, April 17, 2024 5:11 PM; To: Open MPI Users; Cc: Greg Samonds, Adnane Khattabi, Philippe Rouchon; Subject: Re: [OMPI users] "MCW rank 0 is not bound (or bound to all available processors)" when running multiple jobs concurrently
Hi Greg, I am not an openmpi expert …

2024-04-17 Greg Samonds via users
… Cc: Greg Samonds, Adnane Khattabi, Philippe Rouchon; Subject: Re: [OMPI users] "MCW rank 0 is not bound (or bound to all available processors)" when running multiple jobs concurrently
Hi Greg, I am not an openmpi expert but I just wanted to share my experience with HPC-X. 1. Default …

2024-04-17 Mehmet Oren via users
… Regards, Mehmet
From: users on behalf of Greg Samonds via users; Sent: Tuesday, April 16, 2024 5:50 PM; To: Open MPI Users; Cc: Greg Samonds, Adnane Khattabi, Philippe Rouchon; Subject: Re: [OMPI users] "MCW rank 0 is not bound (or bound to all available processors)" when running multiple jobs concurrently …

2024-04-16 Greg Samonds via users
… Thanks again! Regards, Greg
From: users On Behalf Of Gilles Gouaillardet via users; Sent: Tuesday, April 16, 2024 12:59 AM; To: Open MPI Users; Cc: Gilles Gouaillardet; Subject: Re: [OMPI users] "MCW rank 0 is not bound (or bound to all available processors)" when running multiple jobs concurrently …

2024-04-15 Gilles Gouaillardet via users
Greg, if Open MPI was built with UCX, your jobs will likely use UCX (and the shared memory provider) even when running on a single node. If you want to avoid using UCX, you can run "mpirun --mca pml ob1 --mca btl self,sm ..." instead. What is a typical mpirun command line used under the hood by your "make test"?
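A minimal sketch of applying Gilles's suggestion when mpirun is launched indirectly by "make test", assuming a POSIX shell and an Open MPI release whose shared-memory BTL is named sm (as in the command above). Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<param>, so exporting them affects every mpirun that make spawns without editing the test recipes. The helper names use_ob1_sm and ompi_has_ucx are hypothetical, and the UCX check only looks for a "pml: ucx" line in ompi_info output.

```shell
#!/bin/sh
# Sketch only: use_ob1_sm and ompi_has_ucx are hypothetical helper names.

# Succeed if ompi_info is on PATH and lists a UCX PML component,
# i.e. this Open MPI build can pick UCX for point-to-point traffic.
ompi_has_ucx() {
    command -v ompi_info >/dev/null 2>&1 || return 1
    ompi_info | grep -qi 'pml: ucx'
}

# Export the same MCA settings as "--mca pml ob1 --mca btl self,sm" so
# that every mpirun started from this shell (e.g. by "make test") uses
# the ob1 PML with only the self and shared-memory BTLs, bypassing UCX.
use_ob1_sm() {
    OMPI_MCA_pml=ob1
    OMPI_MCA_btl=self,sm
    export OMPI_MCA_pml OMPI_MCA_btl
}

if ompi_has_ucx; then
    echo "This Open MPI build includes the UCX PML; forcing ob1 + self,sm."
fi
use_ob1_sm
```

Usage: source this file in the script that starts "make test -j 20", then run make as before. For a single run, passing the flags directly is equivalent, e.g. "mpirun --mca pml ob1 --mca btl self,sm -np 4 ./some_test", where ./some_test is a placeholder.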

[OMPI users] "MCW rank 0 is not bound (or bound to all available processors)" when running multiple jobs concurrently

2024-04-15 Greg Samonds via users
Hello, we're running into issues with jobs failing non-deterministically when running multiple jobs concurrently within a "make test" framework. "make test" is launched from a shell script running inside a Podman container, and we typically run with "-j 20" and "-np 4" (20 jobs …