Hi everyone,
is there anything new on this issue? Should I report it on github as well?
Regards, Götz Waschk
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
Hi everyone,
I have a cluster of 32 nodes with Infiniband, four of them
additionally have a 10G Mellanox Ethernet card for faster I/O. If my
job based on openmpi 1.10.6 ends up on one of these nodes, it will
crash:
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.
> (similar to dshbak)
>
> Cheers,
>
> Gilles
>
>
> Noam Bernstein wrote:
>
> On Dec 1, 2017, at 8:10 AM, Götz Waschk wrote:
>
On Fri, Dec 1, 2017 at 10:13 AM, Götz Waschk wrote:
> I have attached my slurm job script, it will simply do an mpirun
> IMB-MPI1 with 1024 processes. I haven't set any mca parameters, so for
> instance, vader is enabled.
I have tested again, with
mpirun --mca btl "^vader" ...
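For context, Open MPI selects its point-to-point transports (BTLs) at run time, and the `^` prefix excludes a component. A minimal sketch of the two selection styles; the binary name and process count are placeholders, not taken from the original job script:

```shell
# Exclude only the vader shared-memory BTL; all other available BTLs
# (openib, tcp, self, ...) remain eligible. Quote the caret so the
# shell does not treat it specially.
mpirun --mca btl "^vader" -np 1024 ./IMB-MPI1

# The inverse form: name the allowed BTLs explicitly.
# "self" (loopback) must always appear in an explicit list.
mpirun --mca btl openib,self -np 1024 ./IMB-MPI1
```

These lines require a working MPI installation and allocation, so they are illustrative rather than directly runnable here.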
On Thu, Nov 30, 2017 at 6:32 PM, Jeff Squyres (jsquyres)
wrote:
> Ah, I was misled by the subject.
>
> Can you provide more information about "hangs", and your environment?
>
> You previously cited:
>
> - E5-2697A v4 CPUs and Mellanox ConnectX-3 FDR Infiniband
> - SLURM
> - Open MPI v3.0.0
> - IMB
> Can you upgrade to v1.10.7? It's the last release in the v1.10 series, and
> has all the latest bug fixes.
>
>> On Nov 30, 2017, at 9:53 AM, Götz Waschk wrote:
>>
>> Hi everyone,
>>
>> I have managed to solve the first part of this problem. It was caused
>> by the quota on /tmp; the result was the
openmpi crash from a bus error.
After setting TMPDIR in slurm, I was finally able to run IMB-MPI1 with
1024 cores and openmpi 1.10.6.
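The TMPDIR workaround can be sketched as a Slurm batch fragment; the scratch path and resource counts below are assumptions for illustration, not the original script:

```shell
#!/bin/bash
#SBATCH --nodes=32
#SBATCH --ntasks=1024

# Open MPI places its shared-memory session directories under TMPDIR
# (by default /tmp); pointing TMPDIR at a filesystem without a
# restrictive quota avoids the SIGBUS when the backing file cannot grow.
export TMPDIR=/scratch/$USER/openmpi-tmp   # assumed scratch location
mkdir -p "$TMPDIR"

mpirun ./IMB-MPI1
```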
But now for the new problem: with openmpi3, the same test (IMB-MPI1,
1024 cores, 32 nodes) hangs after about 30 minutes of runtime. Any
idea on this?
Regards, Götz Waschk
Hi everyone,
so how do I proceed with this problem, do you need more information?
Should I open a bug report on github?
Regards, Götz Waschk
On Thu, Mar 23, 2017 at 2:37 PM, Götz Waschk wrote:
> I have also tried mpirun --mca coll ^tuned --mca btl tcp,openib , this
> finished fine, but was quite slow. I am currently testing with mpirun
> --mca coll ^tuned
This one ran also fine.
Hi Gilles,
On Thu, Mar 23, 2017 at 10:33 AM, Gilles Gouaillardet
wrote:
> mpirun --mca btl openib,self ...
Looks like this didn't finish, I had to terminate the job during the
Gather with 32 processes step.
> Then can you try
> mpirun --mca coll ^tuned --mca btl tcp,self ...
As mentioned, this combination finished fine.
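Taken together, the runs in this thread toggle one layer at a time to isolate the fault; roughly:

```shell
# 1. Disable only the "tuned" collective component (default transports kept):
mpirun --mca coll ^tuned ./IMB-MPI1

# 2. Same, but also force TCP so InfiniBand is out of the picture:
mpirun --mca coll ^tuned --mca btl tcp,self ./IMB-MPI1

# 3. InfiniBand only, no shared-memory transport:
mpirun --mca btl openib,self ./IMB-MPI1
```

If only the openib-based runs hang, the transport is suspect; if `^tuned` alone helps, the collective algorithms are. These invocations need an MPI installation, so they are a sketch of the method rather than a runnable test.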
Hi Gilles,
I'm currently testing and here are some preliminary results:
On Thu, Mar 23, 2017 at 10:33 AM, Gilles Gouaillardet
wrote:
> Can you please try
> mpirun --mca btl tcp,self ...
this failed to produce the program output, there were lots of errors like this:
[pax11-00][[54124,1],31][btl_
On Thu, Mar 23, 2017 at 9:59 AM, Åke Sandgren wrote:
> E5-2697A which version? v4?
Hi, yes, that one:
Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz
Regards, Götz
Hi Howard,
I had tried to send config.log of my 2.1.0 build, but I guess it was
too big for the list. I'm trying again with a compressed file.
I have based it on the OpenHPC package. Unfortunately, it still
crashes even with the vader btl disabled, using this command line:
mpirun --mca btl "^vader" IMB-MPI1
Hi Åke,
I have E5-2697A CPUs and Mellanox ConnectX-3 FDR Infiniband. I'm using
EL7.3 as the operating system.
Regards, Götz Waschk
On Thu, Mar 23, 2017 at 9:28 AM, Åke Sandgren wrote:
> Since I'm seeing similar Bus errors from both openmpi and other places
> on our system I
--------------------------------------------------------------------------
mpirun noticed that process rank 320 with PID 21920 on node pax11-10
exited on signal 7 (Bus error).
--------------------------------------------------------------------------
Regards, Götz Waschk
0 on node pax11-17
exited on signal 7 (Bus error).
--------------------------------------------------------------------------
The program is started from the slurm batch system using mpirun. The
same application is working fine when using mvapich2 instead.
Regards, Götz Waschk
# Window creation: MPI_Win_allocate
# Synchronization: MPI_Win_flush
#       Size        Bandwidth (MB/s)
           1                   28.56
           2                   58.74
So it wasn't fixed for RHEL 6.6.
Regards, Götz
Hi,
I had tested 1.8.4rc1 and it wasn't fixed. I can try again though,
maybe I had made an error.
Regards, Götz Waschk
On Mon, Dec 8, 2014 at 3:17 PM, Joshua Ladd wrote:
> Hi,
>
> This should be fixed in OMPI 1.8.3. Is it possible for you to give 1.8.3 a
> shot?
>
> Be
() to detect OFED 2.0.
Regards, Götz Waschk
file in etc under the
configured prefix.
Regards, Götz Waschk
On Tue, Jan 31, 2012 at 8:19 PM, Daniel Milroy
wrote:
> Hello,
>
> I have built OpenMPI 1.4.5rc2 with Intel 12.1 compilers in an HPC
> environment. We are running RHEL 5, kernel 2.6.18-238 with Intel Xeon
> X5660 cpus. You can find my build options below. In an effort to
> test the OpenMPI buil
On Tue, Jan 31, 2012 at 5:20 PM, Richard Walsh
wrote:
> in the malloc.c routine in 1.5.5. Perhaps you should lower the optimization
> level to zero and see what you get.
Hi Richard,
thanks for the suggestion. I was able to solve the problem by
upgrading the Intel Compiler to version 12.1.2.
remedy. I would also try things with the very latest release.
Yes, the mpicc crash happened every time, I could reproduce that.
I have only tested the most basic code, the cpi.c example. The funny
thing is that mpirun -np 8 cpi doesn't always crash; sometimes it
finishes just fine.
Regards, Götz Waschk
mpirun noticed that process rank 6 with PID 13662 on node pax8e.ifh.de
exited on signal 11 (Segmentation fault).
I am using RHEL6.1 and the affected Intel 12.1 compiler.
Regards, Götz Waschk
On Fri, Oct 8, 2010 at 8:27 PM, Richard Walsh
wrote:
> Regarding building HD5 ... the OpenMPI 1.4.1 wrapper using the May 2010
> release of the Intel Compiler Toolkit Cluster Edition (ICTCE) worked for me.
> Here is my config.log header:
> $ ./configure CC=mpicc CXX=mpiCC F77=mpif77 FC=mpif90 --e
On Wed, Oct 6, 2010 at 4:35 PM, Jeff Squyres wrote:
> On Oct 6, 2010, at 10:07 AM, Götz Waschk wrote:
>>> Do -Wl,-rpath and -Wl,-soname= work any better?
>> Yes, with these options, it builds fine. But the command line is
>> generated by libtool, so how can I make libtool
On Wed, Oct 6, 2010 at 2:43 PM, Tim Prince wrote:
>> libtool: link: mpif90 -shared .libs/H5f90global.o
>> .libs/H5fortran_types.o .libs/H5_ff.o .libs/H5Aff.o .libs/H5Dff.o
>> .libs/H5Eff.o .libs/H5Fff.o .libs/H5Gff.o .libs/H5Iff.o .libs/H5Lff.o
>> .libs/H5Off.o .libs/H5Pff.o .libs/H5Rff.o .libs/H
an.so.6: No such file: No such file or directory
make[1]: *** [libhdf5_fortran.la] Error 1
hdf5 builds fine with static libraries only, but they become huge. It
looks like libtool or mpif90 or something else is calling ifort with
the wrong options. Any idea on how to fix this?
Regards, Götz Waschk
On Fri, Jun 25, 2010 at 9:14 AM, Srinivas Gopal wrote:
> I'm trying to build CCSM4 for which I'm using open mpi 1.4.1. $MPICH_PATH
> is set /usr/local (output of $which mpirun is /usr/local/bin/mpirun) and
> LIB_MPI is set to $(MPICH_PATH)/lib in its Macros file. However build
> process exits w
/mpi.h
Regards, Götz Waschk
idea?
Regards, Götz Waschk
--
AL I:40: Do what thou wilt shall be the whole of the Law.
/*
Benchmark of MPI_Sendrecv_replace as in get_overlaps_spinor_tslice of ddhqet
hs 15.12.2009
*/
#include <stdio.h>   /* header names reconstructed; the archive stripped the <...> parts */
#include <stdlib.h>
#include <math.h>
#include <mpi.h>
#define NMEAS 11
#define L0 32
#define L1 16
#define L2 16
On 5/21/07, Pak Lui wrote:
I have tried using SSH instead of rsh before, but I didn't use it with
the kerberos auth. I can see you've tried to run qrsh -inherit via ssh
already before the mpirun line and verify the connection works.
Hi Pak Lui,
I have tested it a bit more, the culprit is the kerberos auth.
ld be?
Regards, Götz Waschk
hard stack size limit of 10240 and the problem is
gone.
Sorry for the noise, regards, Götz Waschk
On 4/27/07, Götz Waschk wrote:
I'm testing my new cluster installation with the hpcc benchmark and
openmpi 1.2.1 on RHEL5 32 bit. I have some trouble with using a
threaded BLAS implementation. I have tried ATLAS 3.7.30 compiled with
pthread support. It crashes as reported here:
[...]
I h
This is because we have to use our own memory manager code to
get around the memory pinning problem with Myrinet/GM and
InfiniBand. You might want to configure with --without-memory-
manager and see if that helps with your crashes.
I have tried that, same result.
Regards, Götz Waschk
Hello everyone,
I've found a bug trying to build openmpi 1.2.1 with progress threads
and gm btl support. Gcc had no problem with the missing header but
pgcc 7.0 complained. Check the attached patch.
Regards, Götz Waschk
--- op
CPU. If I set the maximum number of threads for Goto
BLAS to 1, hpcc is working fine again.
openmpi was compiled without thread support.
Can you give me a hint?
Regards, Götz Waschk
results of such conversions are undefined.
This is PGI 7.0-2. So maybe the documentation should be changed to
'The Portland Group compilers prior to version 7.0 require ...'
Regards, Götz Waschk
delete all *gridengine*
plugin files from lib/openmpi/
Regards, Götz Waschk
There are several ways to set the limit. It seems you are hitting
the hard limit, if you want to set a higher value, you have to modify
/etc/security/limits.conf as defined in the comments. This is part of
pam, so you have to make sure your ssh session is using pam.
Regards, Götz Waschk
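The relevant knob lives in /etc/security/limits.conf; an illustrative fragment (the values are examples only, not a recommendation):

```
# /etc/security/limits.conf  --  read by pam_limits
# <domain>   <type>   <item>   <value, KB for stack>
*            soft     stack    10240
*            hard     stack    unlimited
```

As noted above, this only takes effect for sessions that go through PAM, so sshd must have UsePAM enabled for remote ranks to inherit the new limit.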
what exactly you have tried and please
include the complete error message as well.
Regards, Götz Waschk
you might hit the Xeon
bottleneck.
Regards, Götz Waschk
ng by itself by looking at SGE's environment variables.
Regards, Götz Waschk - DESY Zeuthen
case is still reproducible
with the fresh 7.0 version of pgf90. We can only hope that they'll fix
it in a later version. You are right, it is easy to avoid this, so it
is no big problem.
Regards, Götz Waschk
The linking step is done separately
from the compilation of conftest.f90. The Portland Group support is
still trying to figure out if this is a bug in their compiler.
Regards, Götz Waschk
Hi everybody,
first I'd like to introduce myself, my name is Götz Waschk and I'm
working at DESY in the computing department.
The default shell here is /bin/zsh. mpirun has support for setting
PATH and LD_LIBRARY_PATH for a list of known shells (bash, ksh, csh,
...) but not for zsh.
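As a workaround until mpirun knows about zsh, the variables can be exported from the user's own zsh startup files; a sketch assuming a hypothetical install prefix /opt/openmpi:

```shell
# Put this in ~/.zshenv so it is read even for non-interactive ssh/rsh
# commands (which is how mpirun launches remote processes); ~/.zshrc is
# only read by interactive shells.
OMPI_PREFIX=/opt/openmpi   # hypothetical prefix; adjust to your install
export PATH="$OMPI_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```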