Hello,

While working in an HPC support role, I was asked to resolve an apparent
discrepancy between OpenMPI's 'mpi_cart_rank' behavior and the MPI
standard [1, 2], which says "[out]-of-range coordinates are erroneous for
non-periodic dimensions." In our environment [3], mpi_cart_rank on a
topology with non-periodic dimensions did not report an error when asked
to look up an out-of-range coordinate ('-1', for example). Instead it
returned a rank computed from an implicitly shifted coordinate. This
behavior is caused by the configure flag "--with-mpi-param-check=no",
which is set in the contrib/platform/mellanox/optimized platform file [4]
and ultimately appears to disable the coordinate bounds check in
ompi/mpi/c/cart_rank.c#L85-L91 [5]. We initially thought this could be a
bug, especially after reading the earlier thread 'MPI_Cart_rank:
Out-of-range coordinates are erroneous for non-periodic dimensions' [6].
However, realizing that our build disables all parameter checking makes
me reluctant to call it a 'bug'.

I'm relatively new to the MPI world and have searched this list's archives,
but found nothing specific to my question. This is a general question for
other OpenMPI users and cluster admins about the build option
--with-mpi-param-check=no. I'm looking for opinions based on experience
supporting diverse user code in shared OpenMPI installations:

In a shared cluster deployment in a high-performance environment, are there
good arguments for permanently disabling MPI parameter checking
(--with-mpi-param-check=no)? Is the main motivation to eliminate the
runtime overhead of the conditional test, present in every MPI function,
that decides whether to validate parameters? Is that overhead substantial?
I haven't found any recommendation to use the configure flag
'--with-mpi-param-check=no', apart from indirectly through the Mellanox
optimized platform file [4]. Are other site installers here intentionally
(permanently) disabling parameter checking in shared installations? Is
anyone disabling parameter checking at runtime by default? Are there other
considerations?

My impression is that it would be safe to compile out parameter checking
if you know your MPI code passes only legal parameter values to all MPI
functions; otherwise, it would be prudent to leave parameter checking
enabled (or at least controllable at runtime).


1. MPI 4, Section 8.5.5, p. 406
2. MPI 3.1, Section 7.5.5, p. 305
3. OpenMPI 4.1.5 and 4.0.3, configured with
"--with-platform=contrib/platform/mellanox/optimized", as found in
https://linux.mellanox.com/public/repo/mlnx_ofed/5.8-3.0.7.0/rhel9.2/x86_64/openmpi-4.1.5a1-1.58307.x86_64.rpm
(configure line shown by: /usr/mpi/gcc/openmpi-4.1.5a1/bin/ompi_info | grep "Configure command")
4. https://github.com/open-mpi/ompi/blob/42b829b3b3190dd1987d113fd8c2810eb8584007/contrib/platform/mellanox/optimized#L55
5. https://github.com/open-mpi/ompi/blob/42b829b3b3190dd1987d113fd8c2810eb8584007/ompi/mpi/c/cart_rank.c#L85-L91
6. https://www.mail-archive.com/users@lists.open-mpi.org/msg07705.html


Eli
