Since it is happening on this cluster and not on others, have you checked the
InfiniBand counters to ensure it’s not a bad cable or something along those
lines? I believe the command is ibdiagnet (or something similar).
Collin
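For anyone who wants to run that check, here is a minimal sketch using the standard infiniband-diags tools (ibstat, ibqueryerrors, perfquery); this assumes they are installed on the compute node, which may not be true on every system:

  # Show the local HCA's link state, rate, and port status
  ibstat

  # Scan the fabric and report any ports whose error counters exceed thresholds
  ibqueryerrors

  # Dump the raw port counters (SymbolErrorCounter, LinkDownedCounter,
  # PortRcvErrors, ...) for the local port
  perfquery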
From: users On Behalf Of Bart Willems via users
Sent: Thursday, June
> it instructs mpirun to treat the HWTs as independent CPUs, so you would have 4 slots in this case.
>
>
> > On Jun 8, 2020, at 11:28 AM, Collin Strassburger via users wrote:
Hello David,
The slot calculation is based on physical cores rather than logical cores. The 4 CPUs you are seeing there are logical CPUs. Since your processor has 2 threads per core, you have two physical cores, yielding a total of 4 logical cores (which is what lscpu reports). On machine
Cheers,
Gilles
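A quick way to see the physical/logical split being described, and how the hardware-thread option changes the slot count, is sketched below (./a.out is just a placeholder executable):

  # Physical vs. logical view of the machine
  lscpu | egrep 'Socket|Core|Thread|^CPU\(s\)'

  # Default: one slot per physical core (2 on this machine), so -np 4 would oversubscribe
  mpirun -np 2 ./a.out

  # Treat each hardware thread as a slot, giving 4 slots on this machine
  mpirun --use-hwthread-cpus -np 4 ./a.out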
On April 6, 2020, at 23:22, Collin Strassburger via users <users@lists.open-mpi.org> wrote:
Hello,
Just a quick comment on this; is your code written in C/C++ or Fortran?
Fortran has issues with writing at a decent speed regardless of MPI setup and
as such should be avoided for file IO (yet I still occasionally see it
implemented).
Collin
From: users On Behalf Of Dong-In Kang via users
Wonderful! I am happy to confirm that this resolves the issue!
Many thanks to everyone for their assistance,
Collin
I agree that it is odd that the issue does not appear until after the Mellanox
drivers have been installed (and the configure flags set to use them). As
requested, here are the results:
Input: mpirun -np 128 --mca odls_base_verbose 10 --mca state_base_verbose 10
hostname
Output:
[Gen2Node3:54
Subject: Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD 7742 when
utilizing 100+ processors per node
Does it work with pbs but not Mellanox? Just trying to isolate the problem.
On Jan 28, 2020, at 6:39 AM, Collin Strassburger via users <users@lists.open-mpi.org> wrote:
From: users On Behalf Of Ralph Castain via users
Sent: Tuesday, January 28, 2020 11:02 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Ralph Castain <r...@open-mpi.org>
Subject: Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD 7742 when
utilizing 100+ processors per node
Hello,
I have done some additional testi
here:
https://www.open-mpi.org/community/help/
Thanks!
On Jan 27, 2020, at 12:00 PM, Collin Strassburger via users <users@lists.open-mpi.org> wrote:
Hello,
I had initially thought the same thing about the streams, but I have 2 sockets
with 64 cores each. Additionally, I h
On 1/27/2020 11:29 AM, Collin Strassburger via users wrote:
Hello Howard,
To remove potential interactions, I have found that the issue persists without
ucx and hcoll
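For reference, a rough sketch of how ucx and hcoll can also be excluded at run time rather than at build time, assuming Open MPI 4.x component names (./my_app and the process count are placeholders):

  # Force the ob1 PML instead of UCX, and exclude the hcoll collective component
  mpirun --mca pml ob1 --mca coll ^hcoll -np 128 ./my_app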
:38, Collin Strassburger via users <users@lists.open-mpi.org> wrote:
Hello,
I am having difficulty with OpenMPI versions 4.0.2 and 3.1.5. Both of these
versions cause the same error (error code 63) when utilizing more than 100
cores on a single node. The processors I am utilizing are AMD Epyc "Rome"
7742s. The OS is CentOS 8.1. I have tried compiling with bo
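For context, a rough sketch of the two kinds of builds being compared in this thread, with and without the Mellanox-provided libraries; the prefix and library paths below are placeholders, not the exact flags used here:

  # Plain build, no Mellanox acceleration libraries
  ./configure --prefix=/opt/openmpi-4.0.2
  make -j 8
  make install

  # Build against Mellanox OFED's UCX and HCOLL
  ./configure --prefix=/opt/openmpi-4.0.2 \
      --with-ucx=/opt/ucx \
      --with-hcoll=/opt/mellanox/hcoll
  make -j 8
  make install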