Hi Greg,

I am not an Open MPI expert, but I just wanted to share my experience with HPC-X.

  1.  The default HPC-X builds that ship with the MOFED drivers are built with
UCX, and as Gilles stated, specifying ob1 will not change the layer Open MPI
uses. You can try to exclude UCX and let Open MPI decide the layer itself by
adding "--mca pml ^ucx" to your command line (see the example commands after
this list).
  2.  HPC-X comes with two scripts, mpivars.sh and mpivars.csh, under its bin
folder. Sourcing mpivars.sh before running your job may be a better option than
setting LD_LIBRARY_PATH by hand: the script sets up all required paths and
environment variables and fixes most runtime problems.
  3.
And also, please check hwloc and it is dependencies which usually are not 
present with default os installations and container images.
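
For example, a minimal sketch of the three points above (the HPC-X path, the
application name, and the exact location of mpivars.sh are placeholders, so
adjust them to your installation):

  source /path/to/hpcx/ompi/bin/mpivars.sh   # point 2: sets PATH, LD_LIBRARY_PATH, etc.
  mpirun --mca pml ^ucx -np 4 ./your_app     # point 1: exclude UCX, let Open MPI pick another pml
  ldconfig -p | grep hwloc                   # point 3: verify hwloc libraries are present in the image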

Regards,
Mehmet
________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Greg Samonds via 
users <users@lists.open-mpi.org>
Sent: Tuesday, April 16, 2024 5:50 PM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Greg Samonds <greg.samo...@esi-group.com>; Adnane Khattabi 
<adnane.khatt...@esi-group.com>; Philippe Rouchon 
<philippe.rouc...@esi-group.com>
Subject: Re: [OMPI users] "MCW rank 0 is not bound (or bound to all available 
processors)" when running multiple jobs concurrently


Hi Gilles,



Thanks for your assistance.



I tried the recommended settings but got an error saying “sm” is no longer 
available in Open MPI 3.0+, and to use “vader” instead.  I then tried with 
“--mca pml ob1 --mca btl self,vader” but ended up with the original error:



[podman-ci-rocky-8.8:09900] MCW rank 3 is not bound (or bound to all available processors)
[podman-ci-rocky-8.8:09899] MCW rank 2 is not bound (or bound to all available processors)
[podman-ci-rocky-8.8:09898] MCW rank 1 is not bound (or bound to all available processors)
[podman-ci-rocky-8.8:09897] MCW rank 0 is not bound (or bound to all available processors)



Program received signal SIGILL: Illegal instruction.

Backtrace for this error:
#0  0xffffa202a917 in ???
#1  0xffffa20299a7 in ???
#2  0xffffa520079f in ???
#3  0xffffa1d0380c in ???
#4  0xffffa1d56fe7 in ???
#5  0xffffa1d57be7 in ???
#6  0xffffa1d5a5f7 in ???
#7  0xffffa1d5b35b in ???
#8  0xffffa17b8db7 in get_print_name_buffer
        at util/name_fns.c:106
#9  0xffffa17b8e1b in orte_util_print_jobids
        at util/name_fns.c:171
#10  0xffffa17b91eb in orte_util_print_name_args
        at util/name_fns.c:143
#11  0xffffa1822e93 in _process_name_print_for_opal
        at runtime/orte_init.c:68
#12  0xffff9ebe5e6f in process_event
        at /build-result/src/hpcx-v2.17.1-gcc-mlnx_ofed-redhat8-cuda12-aarch64/ompi-821f7a18fb5f87c7840032d0251fb36675505a64/opal/mca/pmix/pmix3x/pmix3x.c:255
#13  0xffffa16ec3cf in event_process_active_single_queue
        at /build-result/src/hpcx-v2.17.1-gcc-mlnx_ofed-redhat8-cuda12-aarch64/ompi-821f7a18fb5f87c7840032d0251fb36675505a64/opal/mca/event/libevent2022/libevent/event.c:1370
#14  0xffffa16ec3cf in event_process_active
        at /build-result/src/hpcx-v2.17.1-gcc-mlnx_ofed-redhat8-cuda12-aarch64/ompi-821f7a18fb5f87c7840032d0251fb36675505a64/opal/mca/event/libevent2022/libevent/event.c:1440
#15  0xffffa16ec3cf in opal_libevent2022_event_base_loop
        at /build-result/src/hpcx-v2.17.1-gcc-mlnx_ofed-redhat8-cuda12-aarch64/ompi-821f7a18fb5f87c7840032d0251fb36675505a64/opal/mca/event/libevent2022/libevent/event.c:1644
#16  0xffffa16a9d93 in progress_engine
        at runtime/opal_progress_threads.c:105
#17  0xffffa1e678b7 in ???
#18  0xffffa1d03afb in ???
#19  0xffffffffffffffff in ???



The typical mpiexec options for each job include “-np 4 --allow-run-as-root 
--bind-to none --report-bindings” and a “-x LD_LIBRARY_PATH=…” which passes the 
HPC-X and application environment.
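
Put together, a representative command looks like the sketch below (the library
paths and executable name are placeholders rather than the actual values from
our scripts):

  mpiexec -np 4 --allow-run-as-root --bind-to none --report-bindings \
          -x LD_LIBRARY_PATH=/path/to/hpcx/libs:/path/to/app/libs ./solver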



I will get back to you with a core dump once I figure out the best way to 
generate and retrieve it from within our CI infrastructure.
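
(A minimal sketch of what I plan to try, assuming our CI and container settings
permit it: raise the core file limit in the job shell before mpiexec and check
where the kernel writes cores,

  ulimit -c unlimited                  # allow core files for processes launched from this shell
  cat /proc/sys/kernel/core_pattern    # core file location/pattern, inherited from the host

then copy the resulting core file out of the container.)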



Thanks again!



Regards,

Greg



From: users <users-boun...@lists.open-mpi.org> On Behalf Of Gilles Gouaillardet 
via users
Sent: Tuesday, April 16, 2024 12:59 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
Subject: Re: [OMPI users] "MCW rank 0 is not bound (or bound to all available 
processors)" when running multiple jobs concurrently



Greg,



If Open MPI was built with UCX, your jobs will likely use UCX (and the shared 
memory provider) even if running on a single node.

You can

mpirun --mca pml ob1 --mca btl self,sm ...

if you want to avoid using UCX.
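
To confirm whether the build actually includes UCX support, something like the
following should list the UCX components if they are present (assuming
ompi_info from the same installation is in your PATH):

  ompi_info | grep -i ucx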



What is a typical mpirun command line used under the hood by your "make test"?

Though the warning might be ignored, SIGILL is definitely an issue.

I encourage you to have your app dump a core in order to figure out where this
is coming from.





Cheers,



Gilles



On Tue, Apr 16, 2024 at 5:20 AM Greg Samonds via users 
<users@lists.open-mpi.org> wrote:

Hello,



We’re running into issues with jobs failing in a non-deterministic way when 
running multiple jobs concurrently within a “make test” framework.



Make test is launched from within a shell script running inside a Podman 
container, and we’re typically running with “-j 20” and “-np 4” (20 jobs 
concurrently with 4 procs each).  We’ve also tried reducing the number of jobs 
to no avail.  Each time the battery of test cases is run, about 2 to 4 
different jobs out of around 200 fail with the following errors:

[podman-ci-rocky-8.8:03528] MCW rank 1 is not bound (or bound to all available 
processors)
[podman-ci-rocky-8.8:03540] MCW rank 3 is not bound (or bound to all available 
processors)
[podman-ci-rocky-8.8:03519] MCW rank 0 is not bound (or bound to all available 
processors)
[podman-ci-rocky-8.8:03533] MCW rank 2 is not bound (or bound to all available 
processors)

Program received signal SIGILL: Illegal instruction.

Some info about our setup:

  *   Ampere Altra 80 core ARM machine
  *   Open MPI 4.1.7a1 from HPC-X v2.18
  *   Rocky Linux 8.6 host, Rocky Linux 8.8 container
  *   Podman 4.4.1
  *   This machine has a Mellanox ConnectX-6 Lx NIC; however, we’re avoiding
the Mellanox software stack by running in a container, and these are single-node
jobs only



We tried passing “--bind-to none” to the running jobs, and while this seemed to
reduce the number of failing jobs on average, it didn’t eliminate the issue.



We also encounter the following warning:



[1712927028.412063] [podman-ci-rocky-8:3519 :0]            sock.c:514  UCX  
WARN  unable to read somaxconn value from /proc/sys/net/core/somaxconn file



…however, as far as I can tell this is probably unrelated: it occurs because the
associated file isn’t accessible inside the container, and after checking the
UCX source I can see that SOMAXCONN is picked up from the system headers anyway.
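
(A quick way to confirm that, with the image name below as a placeholder for our
CI image:

  podman run --rm <ci-image> cat /proc/sys/net/core/somaxconn

If that read fails inside the container while it works on the host, the warning
is only about the inaccessible proc path and should be harmless.)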



If anyone has hints about how to workaround this issue we’d greatly appreciate 
it!



Thanks,

Greg
