Re: [OMPI users] IMB-MPI1 hangs after 30 minutes with Open MPI 3.0.0 (was: Openmpi 1.10.4 crashes with 1024 processes)

2018-03-26 Thread Götz Waschk
Hi everyone, is there anything new on this issue? Should I report it on GitHub as well? Regards, Götz Waschk

[OMPI users] Problem with Mellanox device selection

2017-12-18 Thread Götz Waschk
Hi everyone, I have a cluster of 32 nodes with Infiniband; four of them additionally have a 10G Mellanox Ethernet card for faster I/O. If my job, based on openmpi 1.10.6, ends up on one of these nodes, it will crash: No OpenFabrics connection schemes reported that they were able to be used on a spe
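
The crash described here is the openib BTL aborting because it cannot pick a usable OpenFabrics device on the mixed IB/10G-Ethernet nodes. A minimal sketch of one possible workaround, not taken from this thread: restrict the openib BTL to the InfiniBand HCA so the Ethernet adapter is never considered. The device name and port (mlx4_0:1) are placeholders for the site-specific HCA.

    # Only let the openib BTL use the IB HCA; device/port are placeholders
    mpirun --mca btl openib,sm,self --mca btl_openib_if_include mlx4_0:1 ./my_app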

Re: [OMPI users] IMB-MPI1 hangs after 30 minutes with Open MPI 3.0.0 (was: Openmpi 1.10.4 crashes with 1024 processes)

2017-12-01 Thread Götz Waschk
ar to > dshbak) > > Cheers, > > Gilles > > > Noam Bernstein wrote: > > On Dec 1, 2017, at 8:10 AM, Götz Waschk wrote: > > On Fri, Dec 1, 2017 at 10:13 AM, Götz Waschk wrote: > > I have attached my slurm job script, it will simply do an mpirun > IMB-MPI1 w

Re: [OMPI users] IMB-MPI1 hangs after 30 minutes with Open MPI 3.0.0 (was: Openmpi 1.10.4 crashes with 1024 processes)

2017-12-01 Thread Götz Waschk
On Fri, Dec 1, 2017 at 10:13 AM, Götz Waschk wrote: > I have attached my slurm job script, it will simply do an mpirun > IMB-MPI1 with 1024 processes. I haven't set any mca parameters, so for > instance, vader is enabled. I have tested again, with mpirun --mca btl "^vader"
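
The attached job script is not reproduced in the archive; the sketch below reconstructs what the messages describe — a Slurm batch job running IMB-MPI1 over 1024 ranks on 32 nodes with the vader BTL disabled. The ranks-per-node split is assumed rather than quoted from the thread.

    #!/bin/bash
    #SBATCH --nodes=32              # 32 nodes, as stated in the thread
    #SBATCH --ntasks-per-node=32    # assumption: 32 x 32 = 1024 ranks
    # mpirun picks up the Slurm allocation; disable the vader BTL as tested above
    mpirun --mca btl "^vader" IMB-MPI1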

Re: [OMPI users] IMB-MPI1 hangs after 30 minutes with Open MPI 3.0.0 (was: Openmpi 1.10.4 crashes with 1024 processes)

2017-12-01 Thread Götz Waschk
On Thu, Nov 30, 2017 at 6:32 PM, Jeff Squyres (jsquyres) wrote: > Ah, I was misled by the subject. > > Can you provide more information about "hangs", and your environment? > > You previously cited: > > - E5-2697A v4 CPUs and Mellanox ConnectX-3 FDR Infiniband > - SLURM > - Open MPI v3.0.0 > - IMB

Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-11-30 Thread Götz Waschk
's the last release in the v1.10 series, and > has all the latest bug fixes. > >> On Nov 30, 2017, at 9:53 AM, Götz Waschk wrote: >> >> Hi everyone, >> >> I have managed to solve the first part of this problem. It was caused >> by the quota on /tmp, that'

Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-11-30 Thread Götz Waschk
lt was the openmpi crash from a bus error. After setting TMPDIR in slurm, I was finally able to run IMB-MPI1 with 1024 cores and openmpi 1.10.6. But now for the new problem: with openmpi3, the same test (IMB-MPI1, 1024 cores, 32 nodes) hangs after about 30 minutes of runtime. Any idea on this? Reg
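
The fix described here — pointing TMPDIR away from the quota-limited /tmp so the shared-memory backing files have room — could look like the following inside the Slurm job script; the scratch path is a placeholder, not taken from the thread.

    # Put Open MPI's session directory / shm backing files on node-local scratch
    export TMPDIR=/scratch/$SLURM_JOB_ID          # placeholder path
    srun --ntasks-per-node=1 mkdir -p "$TMPDIR"   # create it on every allocated node
    mpirun IMB-MPI1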

Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-28 Thread Götz Waschk
Hi everyone, so how do I proceed with this problem? Do you need more information? Should I open a bug report on GitHub? Regards, Götz Waschk

Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-23 Thread Götz Waschk
On Thu, Mar 23, 2017 at 2:37 PM, Götz Waschk wrote: > I have also tried mpirun --mca coll ^tuned --mca btl tcp,openib, this > finished fine, but was quite slow. I am currently testing with mpirun > --mca coll ^tuned This one also ran fine.
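
For reference, the mpirun variants tested across this thread, with the outcomes reported in the messages above and below; the trailing dots stand for the application arguments, which are not quoted in the previews.

    mpirun --mca coll ^tuned --mca btl tcp,openib ...  # finished fine, but quite slow
    mpirun --mca coll ^tuned ...                       # also ran fine
    mpirun --mca btl openib,self ...                   # did not finish; terminated during Gather
    mpirun --mca btl tcp,self ...                      # no program output, btl errors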

Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-23 Thread Götz Waschk
Hi Gilles, On Thu, Mar 23, 2017 at 10:33 AM, Gilles Gouaillardet wrote: > mpirun --mca btl openib,self ... Looks like this didn't finish; I had to terminate the job during the Gather step with 32 processes. > Then can you try > mpirun --mca coll ^tuned --mca btl tcp,self ... As mentioned, this

Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-23 Thread Götz Waschk
Hi Gilles, I'm currently testing and here are some preliminary results: On Thu, Mar 23, 2017 at 10:33 AM, Gilles Gouaillardet wrote: > Can you please try > mpirun --mca btl tcp,self ... This failed to produce the program output; there were lots of errors like this: [pax11-00][[54124,1],31][btl_

Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-23 Thread Götz Waschk
On Thu, Mar 23, 2017 at 9:59 AM, Åke Sandgren wrote: > E5-2697A which version? v4? Hi, yes, that one: Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz Regards, Götz

Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-23 Thread Götz Waschk
Hi Howard, I had tried to send the config.log of my 2.1.0 build, but I guess it was too big for the list. I'm trying again with a compressed file. I have based it on the OpenHPC package. Unfortunately, it still crashes even with the vader btl disabled, using this command line: mpirun --mca btl "^vader" IMB-

Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-23 Thread Götz Waschk
Hi Åke, I have E5-2697A CPUs and Mellanox ConnectX-3 FDR Infiniband. I'm using EL7.3 as the operating system. Regards, Götz Waschk On Thu, Mar 23, 2017 at 9:28 AM, Åke Sandgren wrote: > Since i'm seeing similar Bus errors from both openmpi and other places > on our system I'

Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-22 Thread Götz Waschk
of error message *** -- mpirun noticed that process rank 320 with PID 21920 on node pax11-10 exited on signal 7 (Bus error). -- Reg

[OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-22 Thread Götz Waschk
0 on node pax11-17 exited on signal 7 (Bus error). -- The program is started from the slurm batch system using mpirun. The same application is working fine when using mvapich2 instead. Regards, Götz W

Re: [OMPI users] Warning about not enough registerable memory on SL6.6

2014-12-08 Thread Götz Waschk
indow creation: MPI_Win_allocate # Synchronization: MPI_Win_flush # Size Bandwidth (MB/s) 1 28.56 2 58.74 So it wasn't fixed for RHEL 6.6. Regards, Götz On Mon, Dec 8, 2014 at 4:00 PM, Götz Waschk wrote: > Hi, > > I had tested 1.8

Re: [OMPI users] Warning about not enough registerable memory on SL6.6

2014-12-08 Thread Götz Waschk
Hi, I had tested 1.8.4rc1 and it wasn't fixed. I can try again though, maybe I had made an error. Regards, Götz Waschk On Mon, Dec 8, 2014 at 3:17 PM, Joshua Ladd wrote: > Hi, > > This should be fixed in OMPI 1.8.3. Is it possible for you to give 1.8.3 a > shot? > > Be

[OMPI users] Warning about not enough registerable memory on SL6.6

2014-12-08 Thread Götz Waschk
() to detect OFED 2.0. Regards, Götz Waschk
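
The preview is cut short, but the warning named in the subject is the usual "not enough registered memory" message from the openib BTL. One commonly documented workaround for mlx4 hardware — not something reported in this thread — is to raise the HCA's MTT module parameters so more memory can be registered; the values below are illustrative only.

    # Illustrative only: allow more registerable memory on mlx4 HCAs
    # (requires root and a reload of mlx4_core, or a reboot)
    echo "options mlx4_core log_num_mtt=24 log_mtts_per_seg=1" >> /etc/modprobe.d/mlx4_core.conf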

[OMPI users] Open-MPI 1.6 searches for default hostfile in the wrong directory

2012-05-14 Thread Götz Waschk
ile in etc under the configured prefix. Regards, Götz Waschk
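
For context, the default hostfile searched for lives at <prefix>/etc/openmpi-default-hostfile. A sketch of pointing mpirun at an explicit file instead of relying on that search path; the hostfile path and process count are placeholders.

    # Point mpirun at an explicit default hostfile (path is a placeholder)
    mpirun --default-hostfile /path/to/hostfile -np 4 ./a.out
    # equivalently, via the MCA parameter
    mpirun --mca orte_default_hostfile /path/to/hostfile -np 4 ./a.out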

Re: [OMPI users] Segfault on mpirun with OpenMPI 1.4.5rc2

2012-02-01 Thread Götz Waschk
On Tue, Jan 31, 2012 at 8:19 PM, Daniel Milroy wrote: > Hello, > > I have built OpenMPI 1.4.5rc2 with Intel 12.1 compilers in an HPC > environment.  We are running RHEL 5, kernel 2.6.18-238 with Intel Xeon > X5660 cpus.  You can find my build options below.  In an effort to > test the OpenMPI buil

Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ...

2012-02-01 Thread Götz Waschk
On Tue, Jan 31, 2012 at 5:20 PM, Richard Walsh wrote: > in the malloc.c routine in 1.5.5.  Perhaps you should lower the optimization > level to zero and see what you get. Hi Richard, thanks for the suggestion. I was able to solve the problem by upgrading the Intel Compiler to version 12.1.2 and r

Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ...

2012-01-31 Thread Götz Waschk
emedy.  I would also try things with the very latest release. Yes, the mpicc crash happened every time; I could reproduce that. I have only tested the most basic code, the cpi.c example. The funny thing is that mpirun -np 8 cpi doesn't always crash; sometimes it finishes just fine. Regards, Götz Waschk

Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ...

2012-01-30 Thread Götz Waschk
process rank 6 with PID 13662 on node pax8e.ifh.de exited on signal 11 (Segmentation fault). I am using RHEL6.1 and the affected Intel 12.1 compiler. Regards, Götz Waschk

Re: [OMPI users] hdf5 build error using openmpi and Intel Fortran

2010-10-09 Thread Götz Waschk
On Fri, Oct 8, 2010 at 8:27 PM, Richard Walsh wrote: > Regarding building HD5 ... the OpenMPI 1.4.1 wrapper using the May 2010 > release of the Intel Compiler Toolkit Cluster Edition (ICTCE) worked for me. > Here is my config.log header: >  $ ./configure CC=mpicc CXX=mpiCC F77=mpif77 FC=mpif90 --e

Re: [OMPI users] hdf5 build error using openmpi and Intel Fortran

2010-10-08 Thread Götz Waschk
On Wed, Oct 6, 2010 at 4:35 PM, Jeff Squyres wrote: > On Oct 6, 2010, at 10:07 AM, Götz Waschk wrote: >>> Do -Wl,-rpath and -Wl,-soname= work any better? >> Yes, with these options, it builds fine. But the command line is >> generated by libtool, so how can I make libt
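
The working form reported here passes the rpath and soname options through the compiler driver to the linker with -Wl, rather than as bare options that ifort misparses. A minimal sketch; the object list, library path, and soname are illustrative, loosely modeled on the failing libtool line quoted below.

    # Bare -rpath/-soname confuse ifort when invoked through mpif90 (see the libtool line below);
    # handing the options to ld explicitly works:
    mpif90 -shared .libs/*.o \
        -Wl,-rpath,/usr/lib64/openmpi/lib \
        -Wl,-soname=libhdf5_fortran.so.6 \
        -o .libs/libhdf5_fortran.so.6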

Re: [OMPI users] hdf5 build error using openmpi and Intel Fortran

2010-10-06 Thread Götz Waschk
On Wed, Oct 6, 2010 at 2:43 PM, Tim Prince wrote: >> libtool: link: mpif90 -shared  .libs/H5f90global.o >> .libs/H5fortran_types.o .libs/H5_ff.o .libs/H5Aff.o .libs/H5Dff.o >> .libs/H5Eff.o .libs/H5Fff.o .libs/H5Gff.o .libs/H5Iff.o .libs/H5Lff.o >> .libs/H5Off.o .libs/H5Pff.o .libs/H5Rff.o .libs/H

[OMPI users] hdf5 build error using openmpi and Intel Fortran

2010-10-06 Thread Götz Waschk
an.so.6: No such file: No such file or directory make[1]: *** [libhdf5_fortran.la] Fehler 1 hdf5 builds fine with static libraries only, but they become huge. It looks like libtool or mpif90 or something else is calling ifort with the wrong options. Any idea on how to fix this? Regards, Götz Waschk

Re: [OMPI users] Unable to include mpich library

2010-06-25 Thread Götz Waschk
On Fri, Jun 25, 2010 at 9:14 AM, Srinivas Gopal wrote: >    I'm trying to build CCSM4 for which I'm using open mpi 1.4.1. $MPICH_PATH > is set to /usr/local (output of $which mpirun is /usr/local/bin/mpirun) and > LIB_MPI is set to $(MPICH_PATH)/lib in its Macros file. However the build > process exits w
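
Rather than hand-feeding MPICH_PATH/LIB_MPI, the Open MPI compiler wrappers can report exactly the flags they would add, which is usually the safer way to fill in a Macros file. The --showme options are standard Open MPI wrapper options; wiring them into the CCSM4 Macros file is a suggestion, not something taken from this message.

    # Show the compile- and link-time flags the Open MPI wrappers would use
    mpicc --showme:compile
    mpif90 --showme:link
    # Simplest fix: set CC/FC in the Macros file to mpicc/mpif90 so no MPI paths are needed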

Re: [OMPI users] mpi.h file is missing in openmpi

2010-03-25 Thread Götz Waschk
/mpi.h Regards, Götz Waschk

[OMPI users] openib btl slows down application

2010-01-04 Thread Götz Waschk
idea? Regards, Götz Waschk -- AL I:40: Do what thou wilt shall be the whole of the Law. /* Benchmark of MPI_Sendrecv_replace as in get_overlaps_spinor_tslice of ddhqet hs 15.12.2009 */ #include #include #include #include #define NMEAS 11 #define L032 #define L116 #define L216

Re: [OMPI users] Gridengine integration problems

2007-05-22 Thread Götz Waschk
On 5/21/07, Pak Lui wrote: I have tried using SSH instead of rsh before but I didn't use it with the kerberos auth. I can see you've tried to run qrsh -inherit via ssh already before the mpirun line and verify the connection works. Hi Pak Lui, I have tested it a bit more; the culprit is the kerb

[OMPI users] Gridengine integration problems

2007-05-21 Thread Götz Waschk
ld be? Regards, Götz Waschk -- AL I:40: Do what thou wilt shall be the whole of the Law. [Attachments: openmpi.job, openmpi.job.e6205663, openmpi.job.o6205663]

Re: [OMPI users] Problem running hpcc with a threaded BLAS

2007-05-14 Thread Götz Waschk
hard stack size limit of 10240 and the problem is gone. Sorry for the noise, regards, Götz Waschk -- AL I:40: Do what thou wilt shall be the whole of the Law.
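
The resolution quoted here was the per-process stack limit; a sketch of checking it and applying the 10240 KB value mentioned before launching the threaded-BLAS hpcc run. The process count is illustrative.

    ulimit -Ss; ulimit -Hs    # show current soft/hard stack limits (KB)
    ulimit -Hs 10240          # the hard limit reported to make the problem go away
    mpirun -np 4 hpcc         # process count is illustrative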

Re: [OMPI users] Problem running hpcc with a threaded BLAS

2007-05-11 Thread Götz Waschk
On 4/27/07, Götz Waschk wrote: I'm testing my new cluster installation with the hpcc benchmark and openmpi 1.2.1 on RHEL5 32 bit. I have some trouble with using a threaded BLAS implementation. I have tried ATLAS 3.7.30 compiled with pthread support. It crashes as reported here: [...] I h

Re: [OMPI users] Problem running hpcc with a threaded BLAS

2007-05-02 Thread Götz Waschk
is is because we have to use our own memory manager code to get around the memory pinning problem with Myrinet/GM and InfiniBand. You might want to configure with --without-memory-manager and see if that helps with your crashes. I have tried that, same result. Regards, Götz Waschk -- AL I:40: Do

[OMPI users] [PATCH] small build fix for gm btl

2007-04-27 Thread Götz Waschk
Hello everyone, I've found a bug trying to build openmpi 1.2.1 with progress threads and gm btl support. Gcc had no problem with the missing header but pgcc 7.0 complained. Check the attached patch. Regards, Götz Waschk -- AL I:40: Do what thou wilt shall be the whole of the Law. --- op

[OMPI users] Problem running hpcc with a threaded BLAS

2007-04-27 Thread Götz Waschk
CPU. If I set the maximum number of threads for Goto BLAS to 1, hpcc is working fine again. openmpi was compiled without thread support. Can you give me a hint? Regards, Götz Waschk -- AL I:40: Do what thou wilt shall be the whole of the Law.

Re: [OMPI users] Portland Group Compiler "-Msignextend" flag

2007-04-26 Thread Götz Waschk
t of such conversions are undefined. This is PGI 7.0-2. So maybe the documentation should be changed to 'The Portland Group compilers prior to version 7.0 require ...' Regards, Götz Waschk -- AL I:40: Do what thou wilt shall be the whole of the Law.

Re: [OMPI users] Disabling Tight Integration with SGE

2007-04-25 Thread Götz Waschk
delete all *gridengine* plugin files from lib/openmpi/ Regards, Götz Waschk -- AL I:40: Do what thou wilt shall be the whole of the Law.
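
Spelled out as a command, with the installation prefix as a placeholder:

    # Disable SGE tight integration by removing the gridengine plugins
    rm /opt/openmpi/lib/openmpi/*gridengine*    # prefix is a placeholder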

Re: [OMPI users] OpenMPI run with the SGE launcher, orte PE calrification

2007-04-10 Thread Götz Waschk
veral ways to set the limit. It seems you are hitting the hard limit; if you want to set a higher value, you have to modify /etc/security/limits.conf as described in the comments. This is part of PAM, so you have to make sure your ssh session is using PAM. Regards, Götz Waschk -- AL I:40: Do what
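
A sketch of the configuration this points at; which resource limit was actually being hit is not visible in the preview, so the memlock entries below are placeholders, and the sshd setting is the standard way to make PAM (and thus pam_limits) apply to ssh sessions.

    # /etc/security/limits.conf  (limit item is a placeholder)
    *   soft   memlock   unlimited
    *   hard   memlock   unlimited
    # /etc/ssh/sshd_config -- ensure ssh logins go through PAM so the limits apply
    UsePAM yes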

Re: [OMPI users] (no subject)

2007-04-04 Thread Götz Waschk
what exactly you have tried and please include the complete error message as well. Regards, Götz Waschk -- AL I:40: Do what thou wilt shall be the whole of the Law.

Re: [OMPI users] Odd behavior with slots=4

2007-03-28 Thread Götz Waschk
you might hit the Xeon bottleneck. Regards, Götz Waschk -- AL I:40: Do what thou wilt shall be the whole of the Law.

Re: [OMPI users] OpenMPI run with the SGE launcher, orte PE calrification

2007-03-28 Thread Götz Waschk
ng by itself by looking at SGE's environment variables. Regards, Götz Waschk - DESY Zeuthen -- AL I:40: Do what thou wilt shall be the whole of the Law.

Re: [OMPI users] Build problem with the pgi compiler

2007-03-19 Thread Götz Waschk
ase is still reproducible with the fresh 7.0 version of pgf90. We can only hope that they'll fix it in a later version. You are right, it is easy to avoid this, so it is no big problem. Regards, Götz Waschk -- AL I:40: Do what thou wilt shall be the whole of the Law.

[OMPI users] Build problem with the pgi compiler

2007-03-19 Thread Götz Waschk
he linking step is done separately from the compilation of conftest.f90. The Portland Group support is still trying to figure out if this is a bug in their compiler. Regards, Götz Waschk -- AL I:40: Do what thou wilt shall be the whole of the Law.

[OMPI users] mpirun does not set the PATH and LD_LIBRARY_PATH under zsh

2007-03-01 Thread Götz Waschk
Hi everybody, first I'd like to introduce myself: my name is Götz Waschk and I'm working at DESY in the computing department. The default shell here is /bin/zsh. mpirun has support for setting PATH and LD_LIBRARY_PATH for a list of known shells (bash, ksh, csh, ...) but not for zsh.
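
A commonly used workaround for shells mpirun does not know how to initialize — not something proposed in this message, which only reports the problem — is to give mpirun the installation prefix explicitly; the path and process count are placeholders.

    # Let mpirun set the remote PATH/LD_LIBRARY_PATH from the installation prefix
    mpirun --prefix /opt/openmpi -np 4 ./a.out
    # (building Open MPI with --enable-mpirun-prefix-by-default makes this the default)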