Hi Esthela,

As George mentions, it is indeed libpsm2 that prints this error. Opcode 0xCC 
is a disconnect retry. Several scenarios could produce it, but put simply, it 
is a late-arriving message from an already disconnected endpoint. Which 
version of Intel Omni-Path Software or libpsm2 is installed on your system? 
We have not seen this error since the release of IFS 10.3.0, so I suggest 
updating and testing again.

https://downloadcenter.intel.com/download/26567/Intel-Omni-Path-Fabric-Software-Including-Intel-Omni-Path-Host-Fabric-Interface-Driver-?v=t
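
If it helps to spot or filter these messages in job output while testing, 
here is a minimal Python sketch that pulls the fields out of the line quoted 
in this thread. The regular expression is assumed from that single sample 
line, not from libpsm2 documentation, and the names parse_psm2_eager_line and 
KNOWN_OPCODES are hypothetical helpers, not part of any library. Only the one 
opcode discussed here is mapped.

```python
import re

# Pattern assumed from the single sample line in this thread; libpsm2 may
# format other messages differently.
PSM2_EAGER_RE = re.compile(
    r"Received eager message\(s\) ptype=(0x[0-9a-fA-F]+) "
    r"opcode=(0x[0-9a-fA-F]+) from an unknown process \(err=(\d+)\)"
)

# Only the opcode identified in this thread is listed here.
KNOWN_OPCODES = {0xCC: "disconnect retry"}


def parse_psm2_eager_line(line):
    """Return the fields of a PSM2 'unknown process' line, or None."""
    match = PSM2_EAGER_RE.search(line)
    if match is None:
        return None
    opcode = int(match.group(2), 16)
    return {
        "ptype": int(match.group(1), 16),
        "opcode": opcode,
        "err": int(match.group(3)),
        "meaning": KNOWN_OPCODES.get(opcode, "unknown"),
    }


if __name__ == "__main__":
    sample = ("Received eager message(s) ptype=0x1 opcode=0xcc "
              "from an unknown process (err=49)")
    print(parse_psm2_eager_line(sample))
```

Lines that do not match return None, so the function can be run over a whole 
log without special handling.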

Thanks,

_MAC

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of George 
Bosilca
Sent: Thursday, April 27, 2017 7:46 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Received eager message(s) from an unknown process 
error on KNL

Esthela,

This error message is generated internally by the PSM2 library, so you will not 
be able to get rid of it simply by recompiling Open MPI.

  George.


On Thu, Apr 27, 2017 at 8:21 PM, Gallardo, Esthela 
<egallar...@miners.utep.edu> wrote:
Hello,

I am currently running a couple of benchmarks on two Intel Xeon Phi 7250 
second-generation KNL MIC compute nodes using Open MPI 2.1.0. While trying to 
run the osu_bcast benchmark with 8 MPI tasks (4 on each node), I noticed the 
following error in my output:

Received eager message(s) ptype=0x1 opcode=0xcc from an unknown process (err=49)

I have tried running the benchmark in the following manners:
mpirun -np 8 ./osu_bcast
mpirun -np 8 -hostfile hosti --npernode 4  ./osu_bcast
mpirun -np 8 -hostfile hosti --npernode 4 --mca mtl psm2  ./osu_bcast

However, none of these changes the error message at the end. Note that the 
error does not really affect the results of the benchmark, so it is possible 
that the error occurs in MPI_Finalize.

Also, in order to try to avoid getting this error, I tried to build the library 
with both of these configurations:
 ./configure --prefix=<path_to_build_folder> CC=icc CXX=icpc FC=ifort 
CFLAGS="-xCORE-AVX2 -axMIC-AVX512" CXXFLAGS="-xCORE-AVX2 -axMIC-AVX512" 
FFLAGS="-xCORE-AVX2 -axMIC-AVX512" LDFLAGS="-xCORE-AVX2 -axMIC-AVX512"

 ./configure --prefix=<path_to_build_folder> --enable-orterun-prefix-by-default 
--with-cma=yes --with-psm2 CC=icc CXX=icpc FC=ifort --disable-shared 
--enable-static  --without-slurm

However, this did not help prevent the occurrence of the error either. I was 
wondering if anyone has encountered this issue before, and what can be done in 
order to get rid of the error message.

Thank you,

Esthela Gallardo



_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
